This senior role involves hands-on leadership in reliability, automation, and infrastructure design within a small, high-impact SRE team. The engineer will shape platform scalability and resilience on Google Cloud Platform (GCP), focusing on long-term system health and operational excellence.

Responsibilities

Lead the design and automation of GCP infrastructure with an emphasis on reliability, scalability, security, and cost efficiency.
Establish and refine service level indicators (SLIs), service level objectives (SLOs), and error budgets using Cloud Monitoring and Datadog to align system performance with business impact.
Develop strategies for multi-region deployment, disaster recovery, and capacity planning to support sustained platform growth.
Design and enhance cloud networking solutions, including VPC architecture, Cloud Armor, DNS, and secure connectivity for internal and external services.
Implement and advance infrastructure-as-code and GitOps workflows using Terraform, Kubernetes, Helm, and ArgoCD for consistent, auditable deployments.
Mentor engineers through technical reviews, incident post-mortems, and collaborative problem-solving to build team-wide expertise.
Investigate and apply LLM-powered automation to reduce manual effort and accelerate incident resolution.

Requirements

Minimum of 8 years of experience in software, infrastructure, or site reliability engineering roles.
At least 5 years managing production systems on Google Cloud Platform, covering compute, networking, storage, IAM, and observability.
Extensive hands-on work with Kubernetes (GKE), Helm, container technologies, Terraform, and ArgoCD.
Proficiency in Python, Go, or TypeScript/JavaScript for building automation tools and internal systems.
Proven experience in defining, monitoring, and acting on SLIs, SLOs, and error budgets.
Solid understanding of relational, distributed, and in-memory data stores such as MySQL, Cloud SQL, Cloud Spanner, and Redis, including performance optimization and high availability.
Demonstrated ability to lead incident response, conduct root cause analysis, and implement systemic fixes.

Nice to Have

Experience working in fintech or other regulated industries.
Familiarity with CI/CD tools such as GitHub Actions, Jenkins, Tekton, or CircleCI.
Background in fast-paced, high-growth startup environments.

Tech Stack

GCP, Kubernetes, GKE, Helm, Terraform, ArgoCD, Cloud Monitoring, Datadog, VPC, Cloud Armor, VPN, DNS, Python, Go, TypeScript, JavaScript, MySQL, Cloud SQL, Cloud Spanner, Redis

Benefits

Opportunity to solve complex technical challenges, grow alongside top engineers, and contribute to financial empowerment for millions.
Flexible work hours and a virtual-first culture with a home office stipend.
Premium medical, dental, and vision insurance coverage.
Generous paid leave for parents and caregivers.
401(k) plan with company matching contributions.
Access to financial advisors and wellness resources.
Flexible paid time off and expanded company holidays, including Juneteenth and Winter Break.
One or two in-person company gatherings per year, plus regular virtual events to foster team connection.

Compensation

Competitive salary and equity package commensurate with experience

Work Arrangement

Virtual-first, with a home office stipend and optional in-person events

Team

3–4 engineers; small, high-leverage SRE team reporting to the Director of DevX & Infrastructure Engineering

Member-centric
Helpful
Transparent
Persistent
Better together
Values diversity, inclusion, and empowerment

Additional Information

Eligible candidates must reside in the United States, excluding Hawaii.
In-person company events occur once or twice per year.
The organization operates as virtual-first.
Equal Employment Opportunity employer that does not discriminate on any legally protected basis.
Complies with the City of Los Angeles’ Fair Chance Initiative for Hiring Ordinance.
Encourages applications from individuals who may experience imposter syndrome.

Dave is hiring a Staff Site Reliability Engineer
