As a Senior Site Reliability Engineer (Resilience), you'll drive the evolution of large-scale, multi-cloud systems by designing automation and improving platform reliability. You'll work across engineering teams to build resilient infrastructure for cloud-hosted and serverless services, ensuring systems scale efficiently and operate consistently under real-world demands.
What You'll Do
- Design and implement automation to streamline system engineering workflows and strengthen platform stability.
- Scale global infrastructure to meet growing demand through code-driven tooling and maintainable systems.
- Lead incident response and problem management efforts to reduce recurring customer impact and improve resolution efficiency.
- Collaborate across time zones in a follow-the-sun on-call model, with on-call shifts falling primarily during your regular working hours.
- Foster a culture of shared ownership, operational rigor, and continuous learning within engineering teams.
Requirements
- Demonstrated experience applying engineering principles to improve platform reliability and reduce operational toil.
- Customer-focused mindset with the ability to assess and resolve operational issues through an SRE lens.
- Strong software engineering background enabling effective collaboration on system design and implementation.
Preferred Qualifications
- Hands-on experience with public cloud platforms and managed Kubernetes services.
- Proven experience with Infrastructure-as-Code tools such as Crossplane or Terraform in SaaS environments.
- Experience operating large-scale Kubernetes clusters across multiple cloud providers.
- Proficiency in Golang or other programming languages for building system-level tools.
- Familiarity with container technologies like Docker and distributed Linux environments.
- Track record of improving alerting, incident response, and observability systems using tools like Prometheus, Influx, Graphite, or the Elastic Stack.
- Experience mentoring engineers and promoting knowledge sharing in distributed teams.
- Background in inclusive communication practices that strengthen team and partner relationships.
- Remote work experience in self-directed, globally distributed teams.
Benefits
- Compensation aligned with role impact, not prior salary history.
- Comprehensive health coverage for employees and dependents in many regions.
- Flexible work arrangements with support for remote and asynchronous collaboration.
- Generous annual vacation allowance.
- Up to $2,000 in matched donations for charitable giving or community service.
- 40 hours annually dedicated to volunteer activities.
- Minimum of 16 weeks of parental leave.
- Commitment to diversity, equity, and inclusion across a global workforce.
- Clear pathways for professional development regardless of age, background, or tenure.


