Locations: California, Boca Raton, Minneapolis, Dayton, Alpharetta, or Saint Cloud
Salary: USD 95,300 – 158,800 / year

RELX is hiring a Senior Site Reliability Engineer I

The Senior Site Reliability Engineer I plays a critical role in ensuring the reliability, scalability, and performance of our distributed systems. This position bridges the gap between development and operations by driving automation, improving system observability, and reducing manual intervention through proactive engineering solutions. The engineer will lead initiatives to define service level objectives, implement robust monitoring and alerting, and ensure rapid incident response and recovery. This role requires deep technical expertise, strong collaboration skills, and a relentless focus on system resilience and operational excellence.

Responsibilities

  • Develops monitoring queries and defines service level objectives to measure system reliability
  • Assists senior engineers during incident response and contributes to root cause analyses
  • Conducts post-incident reviews with detailed reporting on impact, timeline, and follow-up actions
  • Participates in disaster recovery testing to validate system resilience
  • Implements automation solutions and deploys code in production environments
  • Documents SRE practices and contributes to internal knowledge resources
  • Supports the design of infrastructure layouts and deployment processes
  • Tests system availability, reliability, and recovery capabilities in non-production settings
  • Benchmarks performance data to inform production readiness assessments
  • Applies advanced DevOps expertise in cloud infrastructure, CI/CD, containerization, and security
  • Participates in on-call rotations to support critical incident resolution
  • Validates failover mechanisms across geographic regions for production systems
  • Automates system recovery using Infrastructure-as-Code and configuration management tools
  • Leads scenario modeling for SLO breaches and designs responsive workflows
  • Writes advanced scripts for automated incident response, including rollbacks and failovers
  • Analyzes operational toil through ticket trends and recommends process improvements
  • Executes independent projects to eliminate repetitive manual work
  • Applies deep observability knowledge to diagnose complex system issues
  • Builds reusable observability dashboards and configurations via code templates
  • Guides appropriate error budget and SLO definitions for services
  • Collaborates with cross-functional teams to migrate applications to standardized platforms
  • Provides technical guidance on implementing new platform features

Requirements

  • Proven experience across core SRE practices and principles
  • Understanding of monitoring and tracing in distributed systems with interdependencies
  • Ability to automate recovery processes to maintain service level agreements
  • Prior on-call experience supporting incident resolution
  • Track record of improving processes through practical contributions
  • Advanced hands-on skills in DevOps including monitoring, networking, cloud storage, containers, orchestration, CI/CD, and cloud security
  • Experience creating monitoring logic and setting performance baselines
  • History of supporting senior staff during major incidents
  • Active participation in post-mortem and RCA processes
  • Involvement in disaster recovery validation exercises
  • Direct experience deploying automation in production systems
  • Contributions to SRE documentation and knowledge repositories
  • Support in developing infrastructure diagrams and deployment workflows
  • Testing of system reliability and recoverability outside production
  • Documenting benchmark results for production readiness
  • On-call participation for major incident recovery
  • Testing of regional failover for systems and components
  • Automating recovery using Infrastructure-as-Code and configuration scripts
  • Producing comprehensive RCAs with executive summaries and risk assessments
  • Leading SLO breach scenario planning and response workflows

Tech Stack

Azure (including AKS), Terraform, GitHub, CI/CD pipelines, Java debugging, Helm charts, JFrog

Benefits

  • Comprehensive health, dental, and vision insurance
  • 401(k) plan with company match
  • Generous paid time off and flexible work arrangements

Compensation

Competitive salary based on experience and qualifications

Additional Information

  • This role requires occasional on-call availability to support production systems
  • Candidates must be authorized to work in the United States without sponsorship

Required Skills

Terraform, GitHub, CI/CD, Helm

About the company

RELX is a global provider of information-based analytics and decision tools for professional and business customers.

Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 4 months ago