Singapore, Singapore On-site Employment

Thoughtworks is hiring a Lead Service Reliability Engineer

About the Role

Thoughtworks is looking for a Lead Service Reliability Engineer to join our DAMO service line. You will take a multifaceted approach to ensure technical excellence and operational efficiency, championing SRE principles to evolve our infrastructure towards a more customer-focused and agile model.

What You'll Do

  • Understand SRE goals from both technical and business perspectives.
  • Provide solutions to improve reliability and fault tolerance, and to reduce incident detection and recovery times (MTTD, MTTR).
  • Enhance the incident management process, including prioritization, triage, communication, and post-mortem analysis.
  • Manage client stakeholder expectations during incidents, provide technical analysis and remediation plans, and interface with C-level executives as needed.
  • Act as a liaison with client engineering teams, building trust and influencing senior stakeholders for better decision-making.
  • Identify opportunities to enhance system performance and reliability aligned with business SLAs, SLOs, and KPIs.
  • Collaborate with Thoughtworks application development leads and solution architects to recommend design changes and reliability best practices.
  • Oversee and mentor other SREs on the team, contributing to their growth.

What We're Looking For

  • Ability to program in one or more high-level languages such as Python, Golang, Shell scripting, Ruby, or Java.
  • Familiarity with DevOps and GitOps practices, integrating observability automation into CI/CD pipelines (e.g., GitLab, Jenkins, CircleCI).
  • In-depth knowledge of configuration management and Infrastructure as Code tools (e.g., Terraform, Ansible, ARM, CloudFormation).
  • Expertise in observability, logs, tracing, and monitoring tools (e.g., Grafana, Prometheus, Graylog, Jaeger, Zipkin, ELK stack).
  • Strong understanding of container-based architecture and hands-on experience with orchestration tools (e.g., Kubernetes, AWS EKS, Docker Swarm, Nomad).
  • In-depth experience in application and infrastructure performance tuning and scaling under heavy load scenarios.
  • Good understanding of SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortems, synthetic monitoring, distributed tracing, end-user monitoring, and performance testing.
  • Experience with network load balancing, security tech stacks, Transport Layer Security (TLS), certificate management, and standard networking protocols.
  • Strong communication and articulation skills, with proficiency in English.
  • Ability to convey resolutions to audiences with varying technical/business proficiency and bring them to consensus.
  • Excellent problem-solving and analytical skills with a focus on continuous improvement.
  • Good listening and presentation skills.
  • Ability to solve challenging and difficult-to-debug issues with a determined attitude.
  • Ability to collaborate with cross-functional teams for capacity planning, scalability assessments, and solution design.
  • Ability to work under pressure with composure during production incidents.
  • Ability to understand client requirements and break them down into technical and business aspects.
  • Willingness to join a 24x7 on-call team that works on a rotation and as-needed basis.

Technical Stack

  • Languages: Python, Golang, Shell scripting, Ruby, Java
  • CI/CD: GitLab, Jenkins, CircleCI
  • Infrastructure as Code: Terraform, Ansible, ARM, CloudFormation
  • Observability: Grafana, Prometheus, Graylog, Jaeger, Zipkin, ELK stack
  • Orchestration: Kubernetes, AWS EKS, Docker Swarm, Nomad

Team & Environment

You will be part of the DAMO service line, collaborating with Thoughtworks application development leads, solution architects, and client engineering teams.

Work Mode

This is an on-site position.

Thoughtworks is an equal opportunity employer.

Required Skills
Python, Golang, Shell scripting, Terraform, Ansible, Kubernetes, AWS EKS, Prometheus, Grafana, CI/CD, DevOps, GitOps, Observability
About company
Thoughtworks

A leading technology consultancy that helps clients solve complex business problems using technology, with a focus on innovation and continuous learning.

Job Details
Department: Information Technology
Category: Infrastructure
Posted: 14 days ago