What You'll Do

Design and deploy reliability-focused solutions that strengthen system resilience and reduce operational burden. Champion automation-first strategies to streamline incident response, capacity planning, and service monitoring across distributed platforms.

Work closely with engineering and product teams to refine Service Level Objectives (SLOs) and Service Level Indicators (SLIs), ensuring realistic error budgets and measurable reliability standards. Enhance CI/CD workflows to support robust DevOps practices and faster, safer deployments.

Lead root cause analyses following incidents, promoting a blameless culture where insights drive systemic improvements. Participate in on-call rotations to maintain continuous service availability and contribute to disaster recovery planning.

Mentor engineers across teams, sharing SRE principles and helping embed reliability thinking into daily workflows. Stay informed on emerging technologies and practices, evaluating tools that advance observability, scalability, and system health.

Requirements

Demonstrate a strong automation mindset, using scripting to eliminate repetitive tasks and reduce toil
Possess deep knowledge of SLOs, SLIs, error budgets, and architectures built for high availability
Have experience managing production incidents and leading post-incident reviews with a focus on continuous improvement
Apply best practices in logging, monitoring, and alerting to ensure full system observability
Show practical understanding of data structures and modern data processing engines
Communicate effectively with technical and non-technical stakeholders to advocate for reliability initiatives
Display a commitment to coaching others and fostering a culture of operational excellence

Preferred Qualifications

Five or more years in software engineering, site reliability, or cloud infrastructure roles
Hands-on experience with DevOps platforms such as GitHub, Azure DevOps, GitLab, or Jenkins
Proficiency in building cloud-native, service-oriented systems at scale
Strong programming skills in Python, Go, Java, or C# (.NET)
Familiarity with observability tools like Prometheus, Grafana, or OpenTelemetry
Experience improving CI/CD pipelines and automating deployment workflows
Background in global SaaS environments requiring 24/7 uptime
Knowledge of redundancy, failover, and disaster recovery strategies
Ability to collaborate across technical and business functions
Experience with Agile methodologies and delivering complex technical projects
Skill in problem-solving, analysis, and clear communication
Exposure to Chaos Engineering or AIOps concepts is a plus

Benefits

Comprehensive health, dental, and vision insurance
Parental leave for primary and secondary caregivers
Flexible work arrangements
Two company-wide breaks each year, each lasting a week
Additional time off beyond standard vacation
Long-term incentive program
Annual training investment for professional growth

Relativity is hiring a Senior Engineer - Site Reliability Engineering
