Favorited is hiring a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of the infrastructure powering our real-time platform. You will play a key role in building and maintaining systems that support high-traffic applications used by a rapidly growing global audience.
What You'll Do
- Design, implement, and maintain highly reliable and scalable infrastructure supporting real-time applications.
- Build automation and tooling to improve system reliability, deployment processes, and operational efficiency.
- Develop and maintain monitoring, logging, and alerting systems to ensure high availability and rapid incident response.
- Partner closely with engineering teams to improve service reliability, performance, and observability.
- Support incident response, root cause analysis, and postmortems, ensuring lessons learned are incorporated into system improvements.
- Optimize infrastructure for performance, cost efficiency, and scalability.
- Manage and scale containerized environments using Docker, Kubernetes, and related orchestration technologies.
- Help define and enforce reliability standards, SLOs, and operational best practices across engineering teams.
- Continuously evaluate new infrastructure tools and practices to improve system resilience and developer productivity.
What We're Looking For
- 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
- Experience managing infrastructure for large-scale systems supporting millions of users.
- Strong expertise with cloud infrastructure, ideally Google Cloud Platform (GCP).
- Hands-on experience with Kubernetes or similar container orchestration platforms, and with distributed systems.
- Experience implementing monitoring and observability systems (Prometheus, Grafana, Datadog, or similar).
- Strong scripting or programming experience in languages such as Python, Go, or TypeScript.
- Deep understanding of reliability engineering practices, including SLOs, SLIs, and incident management.
- Strong collaboration skills and ability to work cross-functionally with engineering teams.
Nice to Have
- Experience supporting real-time streaming, gaming, or large-scale consumer applications.
- Familiarity with event-driven architectures and large-scale data processing systems.
- Experience optimizing infrastructure costs in high-growth environments.
Technical Stack
- Google Cloud Platform (GCP)
- Kubernetes, Docker
- Prometheus, Grafana, Datadog
- Python, Go, TypeScript
Benefits & Compensation
- Base salary: $150k–$200k, plus equity (options)
- Unlimited PTO
- 401(k) plan
- Comprehensive health insurance
- Paid company holidays
Work Mode
This role is onsite in Santa Monica.
Favorited is an equal opportunity employer.



