This role is central to ensuring the stability, performance, and scalability of our real-time digital platform. As a Senior Site Reliability Engineer, you will bridge the gap between development and operations by building resilient systems, enforcing reliability standards, and driving automation across the infrastructure lifecycle. You will work closely with engineering teams to proactively identify risks, reduce toil, and improve system observability, all while maintaining service uptime under high-load conditions. Your expertise will directly influence the platform’s ability to support millions of concurrent users and deliver seamless interactive experiences. This position requires a strategic mindset, deep technical proficiency, and a commitment to operational excellence in a dynamic, fast-moving environment.

Responsibilities

Design and maintain scalable, reliable infrastructure for real-time applications.
Create automation tools to enhance system reliability and streamline deployment workflows.
Implement monitoring, logging, and alerting systems for rapid incident detection and resolution.
Collaborate with engineering teams to improve service performance, reliability, and observability.
Lead incident response efforts, conduct root cause analysis, and apply learnings to prevent recurrence.
Optimize infrastructure for performance, cost efficiency, and long-term scalability.
Scale containerized environments using Kubernetes, Docker, and orchestration technologies.
Establish and uphold reliability standards, including SLOs and operational best practices.
Evaluate emerging tools and methodologies to enhance system resilience and engineering productivity.

Requirements

Minimum of six years of experience in Site Reliability Engineering, DevOps, or infrastructure roles.
Proven experience managing infrastructure for large-scale systems serving millions of users.
Strong technical background with cloud platforms, particularly Google Cloud Platform (GCP).
Hands-on expertise with Kubernetes, containerization, and distributed systems.
Experience building monitoring and observability solutions using tools like Prometheus, Grafana, or Datadog.
Proficiency in scripting or programming languages such as Python, Go, or TypeScript.
Solid understanding of SLOs, SLIs, and incident management frameworks.
Demonstrated ability to collaborate effectively across engineering teams.

Nice to Have

Experience supporting real-time streaming, gaming, or large-scale consumer-facing applications.
Knowledge of event-driven architectures and large-scale data processing systems.
Track record of optimizing infrastructure costs in high-growth environments.

Tech Stack

Google Cloud Platform (GCP), Kubernetes, Docker, Prometheus, Grafana, Datadog, Python, Go, TypeScript

Benefits

Unlimited paid time off to support work-life balance.
401(k) plan for long-term financial planning.
Comprehensive health insurance coverage.
Paid company holidays for rest and rejuvenation.
Competitive base salary reflecting experience and impact.

Compensation

$150k to $200k base salary, plus equity in the form of options.

Work Arrangement

On-site in Santa Monica

Fast-paced and collaborative work environment
Emphasis on high standards and personal initiative
Open and respectful communication practices
Culture of real-time feedback and continuous improvement
Work intensity aligned with ambitious goals
Focus on gaming and interactive digital experiences
Encouragement of self-driven projects and innovation

Additional Information

This is a full-time, on-site role located in Santa Monica.
The platform emphasizes real-time interaction, engagement, and gamified experiences.
The company supports creators and audience participation within digital communities.

favorited is hiring a Senior Site Reliability Engineer
