London, United Kingdom · Hybrid · Full-time

Thought Machine is hiring a Senior Site Reliability Engineer

About the Role

The role involves building and maintaining scalable systems that support high-availability services, combining software engineering practices with operational rigor to improve system resilience and efficiency.

Responsibilities

  • Design and implement reliable, scalable infrastructure for cloud-native applications
  • Develop automation tools to reduce manual intervention in system operations
  • Monitor system performance and proactively address potential issues
  • Respond to incidents with clear escalation paths and post-incident reviews
  • Improve system uptime and reduce mean time to recovery
  • Collaborate with development teams to enhance service reliability
  • Define and track key reliability metrics and service level objectives
  • Troubleshoot complex production issues across distributed systems
  • Optimize resource usage and cost-efficiency in cloud environments
  • Contribute to disaster recovery and business continuity planning
  • Implement observability solutions including logging, metrics, and tracing
  • Enforce best practices in configuration management and deployment safety
  • Support CI/CD pipelines with reliability-focused testing and validation
  • Drive improvements in system architecture for fault tolerance
  • Participate in on-call rotations with a focus on sustainable operations
  • Mentor engineers in reliability principles and operational discipline
  • Evaluate new technologies for improving system stability
  • Document system behavior, failure modes, and recovery procedures
  • Ensure compliance with security and operational standards
  • Work across time zones to support global service operations

Nice to Have

  • Experience with financial technology or regulated environments
  • Familiarity with formal incident management frameworks
  • Contributions to open-source projects related to infrastructure
  • Background in performance tuning and load testing
  • Knowledge of networking protocols and distributed consensus algorithms

Compensation

Competitive salary with performance-based incentives

Work Arrangement

Hybrid work model with flexible remote options

Team

Collaborative engineering team focused on building resilient, cloud-native systems

Our Tech Stack

  • We use Google Cloud Platform as our primary infrastructure
  • Services are containerized using Docker and orchestrated with Kubernetes
  • Infrastructure is managed through Terraform for consistent deployments
  • Monitoring is powered by Prometheus and Grafana
  • Logging and tracing are handled via Fluentd and OpenTelemetry

Engineering Culture

  • We value transparency, ownership, and continuous learning
  • Engineers are encouraged to propose and lead technical initiatives
  • Blameless postmortems are standard practice after incidents
  • We maintain a strong focus on documentation and knowledge sharing
  • Team members are supported in attending conferences and training

Visa sponsorship is available for qualified candidates requiring work authorization

About Thought Machine

Thought Machine is a technology company building modern core banking systems. Its mission is to create technology that can run the world’s banks according to the best designs and software practices of the modern age, replacing legacy infrastructure with robust, scalable solutions.

The company’s core banking engine, Vault Core, was launched in 2015 and enables banks to operate with greater agility, security, and efficiency. Thought Machine provides a cloud-native, microservices-based platform that supports digital transformation for financial institutions globally.

With clients including Lloyds Banking Group, SEB, Standard Chartered, JPMorgan Chase, and Intesa Sanpaolo, Thought Machine has established itself as a leader in fintech innovation. The company continues to expand internationally, serving banks and fintechs across Europe, Asia, North America, Australasia, and the Middle East.

Recognized as one of the best places to work and one of the fastest-growing companies in Europe, Thought Machine combines deep expertise in banking and technology to drive the future of financial services.

Job Details

Department: Engineering, Run Engineering
Category: Infrastructure
Posted: 4 months ago