HSBC Group is seeking an Associate Director, Software Engineering, to lead Site Reliability Engineering (SRE) practices and production infrastructure. In this role, you will troubleshoot production incidents, design scalable and secure cloud infrastructure, and mentor junior engineers to embed reliability throughout our systems.
What You'll Do
- Lead complex troubleshooting and root cause analysis efforts for incidents impacting production.
- Design, architect, and enhance scalable, highly available, and secure infrastructure using cloud, container, and orchestration technologies.
- Champion the adoption and refinement of SRE practices, including defining SLIs/SLOs and automating operational processes.
- Develop and maintain comprehensive monitoring, logging, and alerting systems using modern observability tools.
- Drive advancements in deployment automation, CI/CD pipelines, and infrastructure-as-code.
- Guide, mentor, and coach junior SREs and engineers.
- Collaborate with software development, QA, product, and operations teams to embed reliability throughout the software lifecycle.
- Participate in and lead on-call rotations, improve incident response, and perform blameless postmortems.
What We're Looking For
- A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field, or equivalent experience.
- At least 10 years of hands-on experience in IT, with significant experience in SRE, DevOps, or Production Support.
- Advanced hands-on expertise with containers and orchestration (Docker, Kubernetes) and with cloud platforms (AWS, GCP, Azure).
- Deep experience with monitoring, log management, and observability platforms.
- Fluency in at least one programming or scripting language (Python, Bash, Go, etc.).
- Experience implementing and maturing SRE principles at the organizational level.
- Excellent problem-solving, analytical, communication, and mentoring skills.
- Proven track record in high-availability, mission-critical, or 24x7 operational environments.
Nice to Have
- Experience with infrastructure-as-code (Terraform, Ansible, or similar tools).
- SRE and cloud certifications (e.g., Google Cloud Professional Cloud DevOps Engineer, AWS Certified DevOps Engineer, CKA/CKAD).
- Experience with microservices, distributed systems, and high-throughput architectures.
- Experience with AIOps to optimize production operations.
Technical Stack
- Cloud: AWS, GCP, Azure
- Containers & Orchestration: Kubernetes, Docker
- Observability: Prometheus, Grafana, ELK, Datadog, Splunk
- Infrastructure-as-Code: Terraform, Ansible, Helm
- Languages: Python, Bash, Go
HSBC is committed to building a culture where all employees are valued and respected, and where their opinions count.