This senior role involves hands-on leadership in reliability, automation, and infrastructure design within a small, high-impact SRE team. The engineer will shape platform scalability and resilience across Google Cloud Platform, focusing on long-term system health and operational excellence.
Responsibilities
- Lead the design and automation of GCP infrastructure with an emphasis on reliability, scalability, security, and cost efficiency.
- Establish and refine service level indicators, objectives, and error budgets using Cloud Monitoring and Datadog to align system performance with business impact.
- Develop strategies for multi-region deployment, disaster recovery, and capacity planning to support sustained platform growth.
- Design and enhance cloud networking solutions, including VPC architecture, Cloud Armor, DNS, and secure connectivity for internal and external services.
- Implement and advance infrastructure-as-code and GitOps workflows using Terraform, Kubernetes, Helm, and ArgoCD for consistent, auditable deployments.
- Mentor engineers through technical reviews, incident post-mortems, and collaborative problem-solving to build team-wide expertise.
- Investigate and apply LLM-powered automation to reduce manual effort and accelerate incident resolution.
Requirements
- Minimum of 8 years of experience in software, infrastructure, or site reliability engineering roles.
- At least 5 years managing production systems in Google Cloud Platform, covering compute, networking, storage, IAM, and observability.
- Extensive hands-on work with Kubernetes (GKE), Helm, container technologies, Terraform, and ArgoCD.
- Proficiency in Python, Go, or TypeScript/JavaScript for building automation tools and internal systems.
- Proven experience in defining, monitoring, and acting on SLIs, SLOs, and error budgets.
- Solid understanding of relational and distributed databases such as MySQL, Cloud SQL, Cloud Spanner, and Redis, including performance optimization and high availability.
- Demonstrated ability to lead incident response, conduct root cause analysis, and implement systemic fixes.
Nice to Have
- Experience working in fintech or other regulated industries.
- Familiarity with CI/CD tools such as GitHub Actions, Jenkins, Tekton, or CircleCI.
- Background in fast-paced, high-growth startup environments.
Tech Stack
GCP, Kubernetes, GKE, Helm, Terraform, ArgoCD, Cloud Monitoring, Datadog, VPC, Cloud Armor, VPN, DNS, Python, Go, TypeScript, JavaScript, MySQL, Cloud SQL, Cloud Spanner, Redis
Benefits
- Opportunity to solve complex technical challenges, grow alongside top engineers, and contribute to financial empowerment for millions.
- Flexible work hours and a virtual-first culture with a home office stipend.
- Premium medical, dental, and vision insurance coverage.
- Generous paid leave for parents and caregivers.
- 401(k) plan with company matching contributions.
- Access to financial advisors and wellness resources.
- Flexible paid time off and expanded company holidays, including Juneteenth and Winter Break.
- In-person company gatherings once or twice per year, plus regular virtual events to foster team connection.
Compensation
Competitive salary and equity package commensurate with experience
Work Arrangement
Virtual first, with home office stipend and optional in-person events
Team
3–4 engineers; small, high-leverage SRE team reporting to the Director of DevX & Infrastructure Engineering
Culture
- Member-centric
- Helpful
- Transparent
- Persistent
- Better together
- Values diversity, inclusion, and empowerment
Additional Information
- Eligible candidates must reside in the United States, excluding Hawaii.
- In-person company events occur once or twice per year.
- The organization operates as virtual first.
- The organization is an Equal Employment Opportunity employer and does not discriminate on any legally protected basis.
- The organization complies with the City of Los Angeles’ Fair Chance Initiative for Hiring Ordinance.
- Candidates who may experience imposter syndrome are encouraged to apply.