Santa Monica, California, United States | On-site Employment | USD 150,000 - 200,000 Yearly

Favorited is hiring a Senior Site Reliability Engineer

About the Role

Favorited is hiring a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of the infrastructure powering our real-time platform. You will play a key role in building and maintaining systems that support high-traffic applications used by a rapidly growing global audience.

What You'll Do

  • Design, implement, and maintain highly reliable and scalable infrastructure supporting real-time applications.
  • Build automation and tooling to improve system reliability, deployment processes, and operational efficiency.
  • Develop and maintain monitoring, logging, and alerting systems to ensure high availability and rapid incident response.
  • Partner closely with engineering teams to improve service reliability, performance, and observability.
  • Support incident response, root cause analysis, and postmortems, ensuring learnings are incorporated into system improvements.
  • Optimize infrastructure for performance, cost efficiency, and scalability.
  • Manage and scale containerized environments using Docker, Kubernetes, and related orchestration technologies.
  • Help define and enforce reliability standards, SLOs, and operational best practices across engineering teams.
  • Continuously evaluate new infrastructure tools and practices to improve system resilience and developer productivity.

What We're Looking For

  • 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
  • Experience managing infrastructure for large-scale systems supporting millions of users.
  • Strong expertise with cloud infrastructure, ideally Google Cloud Platform (GCP).
  • Hands-on experience with Kubernetes, container orchestration, and distributed systems.
  • Experience implementing monitoring and observability systems (Prometheus, Grafana, Datadog, or similar).
  • Strong scripting or programming experience in languages such as Python, Go, or TypeScript.
  • Deep understanding of reliability engineering practices including SLOs, SLIs, and incident management.
  • Strong collaboration skills and ability to work cross-functionally with engineering teams.

Nice to Have

  • Experience supporting real-time streaming, gaming, or large-scale consumer applications.
  • Familiarity with event-driven architectures and large-scale data processing systems.
  • Experience optimizing infrastructure costs in high-growth environments.

Technical Stack

  • Google Cloud Platform (GCP)
  • Kubernetes, Docker
  • Prometheus, Grafana, Datadog
  • Python, Go, TypeScript

Benefits & Compensation

  • Base salary: $150k - $200k + equity (options)
  • Unlimited PTO
  • 401(k) plan
  • Comprehensive health insurance
  • Paid company holidays

Work Mode

This role is on-site in Santa Monica.

Favorited is an equal opportunity employer.

Required Skills
Google Cloud Platform (GCP), Kubernetes, Docker, Prometheus, Grafana, Datadog, Python, Go, TypeScript, Site Reliability Engineering, DevOps, Infrastructure Engineering, Distributed Systems, Monitoring, Observability
About company
favorited

At favorited, we believe that digital communities should be more than just spaces to watch content. Our platform is a place to connect, engage, and play, and empowers creators by enhancing audience participation and fostering deeper connections.

Job Details
Department: Information Technology
Category: Infrastructure
Posted 14 days ago