As a Senior Site Reliability Engineer, you will be central to building and maintaining the reliability of our cloud platform. You'll work closely with engineering teams to design robust, scalable systems and ensure high availability across all infrastructure layers, including data and control planes.

What You’ll Do

Partner with engineering teams to implement secure, reliable, and high-performing infrastructure solutions.
Define and manage service level objectives and agreements to maintain consistent reliability standards.
Ensure comprehensive monitoring and alerting coverage across all critical components to enable rapid incident detection and resolution.
Lead and improve incident response workflows, including post-mortem reviews and communication with support teams during outages.
Drive initiatives to test system resilience, including chaos engineering programs aligned with organizational priorities.
Oversee on-call operations, refine escalation paths, and establish best practices to reduce downtime and improve response efficiency.

What We’re Looking For

A degree in Computer Science or a related field, or equivalent practical experience.
Minimum of 8 years in site reliability, systems engineering, or a similar role.
Production experience with ClickHouse and proficiency in Go or Python.
Strong familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
Hands-on experience with Kubernetes, Docker Swarm, or similar container orchestration tools.
Proven track record with automation tools like Terraform, Ansible, or Puppet.
Exceptional debugging skills and a methodical approach to solving complex production issues.
Strong communication abilities and a collaborative mindset, with a focus on shared business outcomes.
A sense of ownership and accountability for system performance and uptime.

Preferred Background

Deep understanding of distributed databases and SQL, particularly ClickHouse, is highly valued.

Work Environment

This is a remote-first role open to candidates in the US and other countries. We support a flexible, globally distributed workforce with teams across 20 countries. You’ll receive a $500 stipend for home office setup and have opportunities to connect in person during company-wide gatherings.

Compensation & Benefits

Base salary range: $141,000 – $208,000 USD (or $157,000 – $230,000 in premium markets like San Francisco or NYC).
Equity is offered to all new hires through stock options.
Employer contributions toward healthcare coverage.
Flexible time-off policy in the US; generous leave in other regions.

Our Commitment

We believe in inclusive growth and equal opportunity. We do not discriminate based on race, color, religion, age, sex, national origin, disability status, genetics, veteran status, sexual orientation, gender identity, or any other protected characteristic under applicable laws.

Chime is hiring a Senior Site Reliability Engineer - Remote
