Thoughtworks is looking for a Lead Systems Support Engineer to play a critical role in keeping complex application systems efficient, stable and available. You will lead a team to achieve operational success, strengthen incident management and DevOps practices, and deliver inventive solutions directly to clients.
What You'll Do
- Understand complex application systems and debug business-impacting issues.
- Use incident management processes and application monitoring metrics to generate reports and take corrective actions.
- Leverage logging techniques for alerting, monitoring and identifying the root cause of incidents.
- Follow standards and best practices to bring operational efficiencies, stability and availability to systems.
- Lead the planning and execution of system upgrades, migrations and maintenance activities, minimizing downtime.
- Use continuous delivery practices to evolve and support high-quality software and bring value to customers early.
- Efficiently use DevOps tools and practices to deploy and run software.
- Act as a mentor for less-experienced peers through both technical knowledge and leadership skills.
- Apply the latest technology thinking from the Technology Radar to solve client problems.
What We're Looking For
- Experience working in Python.
- Good understanding of the AWS cloud platform.
- Experience with application monitoring tools such as Datadog, Prometheus or Grafana, including generating reports and taking corrective actions.
- Strong debugging and triaging skills to troubleshoot code effectively.
- Ability to conduct system performance analysis, identify bottlenecks and implement optimization strategies.
- Ability to perform predictive analysis and work with development teams to identify issues proactively.
- Experience working with a relational or non-relational database.
- Experience with CI/CD tools like Jenkins, GitHub Actions, Buildkite or Azure Pipelines.
- Ability to ensure bug fixes and enhancements are high-quality and well-tested.
- High-level understanding of architectures such as monolithic, N-tier, layered, microservices and serverless.
- Good communication and articulation skills.
- Resilience in ambiguous situations and the ability to approach challenges from multiple perspectives.
- Ability to influence clients on processes including incident management, support levels and scope of work.
- Ability to advocate for and implement cloud best practices in resource optimization, monitoring and alerting.
- Ability to advocate for and implement security best practices.
- Comfort with Agile methods, such as Scrum or Kanban.
- Willingness to be part of a rotation-based 24x7 on-call team and handle multiple engagements.
Nice to Have
- Enjoyment of influencing others and advocating for technical excellence while staying open to change.
- Presence in the external tech community and willingness to share expertise via speaking, open source contributions, or blogs.
Technical Stack
- Python
- AWS
- Datadog, Prometheus, Grafana
- Jenkins, GitHub Actions, Buildkite, Azure Pipelines
Team & Environment
You will join a dynamic and inclusive community of bright and supportive colleagues. Thoughtworks is focused on revolutionizing tech through purposeful work, balancing autonomy with a strong cultivation culture. Your career development is entirely up to you, supported by interactive tools, numerous development programs and teammates.
Work Mode
This is a fully remote position.
Thoughtworks is an equal opportunity employer committed to diversity and inclusion.




