As the SRE Lead, you'll establish and lead a dedicated reliability engineering practice from the ground up. Your focus will be on building systems that are inherently stable, scalable, and fault-tolerant, ensuring seamless performance as the platform grows. You'll set the technical and cultural foundation for reliability across engineering teams, driving practices that balance innovation with operational excellence.
Key Responsibilities
- Define and implement service-level objectives, indicators, and error budgets to align engineering speed with system stability
- Design and run chaos engineering exercises to proactively uncover weaknesses before they impact users
- Develop comprehensive runbooks and foster a blameless post-incident learning culture
- Lead performance testing initiatives, including load simulations and architectural assessments of readiness for 10x scale
- Automate repetitive operational tasks to reduce toil and free engineers for higher-value work
- Establish defensive design patterns such as circuit breakers, rate limiting, and graceful degradation
- Validate system behavior under stress using synthetic and adversarial workloads
- Integrate observability and resilience practices early in the development lifecycle
- Create frameworks that enable rapid iteration without sacrificing system integrity
- Recruit, develop, and lead a small team of SREs who amplify reliability practices across engineering
- Collaborate with engineering leaders and product teams to elevate how reliability is prioritized and measured
What You Bring
- Proven experience in high-pressure domains like FinTech, payments, or critical SaaS platforms where uptime is essential
- Deep technical fluency in SLO design, observability tooling, automation, and chaos engineering
- Strong grasp of resilient architecture and systems thinking, with a focus on graceful failure modes
- Ability to influence stakeholders and align teams without direct authority
- Leadership experience in hiring, coaching engineers, and setting technical direction
Nice to Have
- Familiarity with investment technology concepts to better align reliability with business needs
- Hands-on experience with Golang, Kubernetes, Google Cloud Platform, Postgres, Kafka, or Datadog
Environment & Culture
The organization values transparency, empowerment, and collective progress. You'll work in a flexible hybrid model with team hubs in Berlin, Tallinn, and London, with periodic in-person collaboration in Berlin. The culture emphasizes curiosity, inclusivity, and shared success, with simplifying complexity and owning outcomes as core principles. Continuous learning is supported through development budgets and access to coaching. Diversity and inclusion are actively upheld across all levels of the company.


