North America Employment

UnitedHealth Group / Optum is hiring a Lead Site Reliability Engineer

About the Role

UnitedHealth Group / Optum is looking for a Lead Site Reliability Engineer to operate at the intersection of development and operations. You will ensure systems meet stringent production SLAs while empowering development teams to ship code faster and safer. Our mission is to help people live healthier lives and make the health system work better for everyone, supported by a culture guided by inclusion, talented peers, and comprehensive benefits.

What You'll Do

  • Champion the philosophy of 'automation first' to eliminate manual, repetitive operational tasks (toil).
  • Design and implement robust automation solutions to allow engineers to focus on strategic projects.
  • Implement and manage comprehensive monitoring, logging, and alerting systems to provide deep visibility into application performance and infrastructure health.
  • Develop dashboards and tools that enable rapid detection and resolution of incidents.
  • Act as a catalyst for DevOps culture and practices across development teams.
  • Provide the tools, infrastructure, and guardrails necessary to accelerate the software delivery lifecycle securely and reliably.
  • Lead the design and implementation of automated operational workflows for existing services and new service onboarding, including provisioning, deployment, scaling, and self-healing capabilities.
  • Oversee incident response management, lead root cause analyses (post-mortems), and ensure action items are completed to prevent recurrence.
  • Manage and optimize cloud infrastructure costs and efficiency using Infrastructure as Code (IaC) principles.

What We're Looking For

  • Undergraduate degree or equivalent experience.
  • 10+ years of experience in SRE, DevOps, Software Engineering, or a related operational capacity within a high-traffic production environment.
  • Extensive experience in managing critical production systems, incident response, and leading post-mortem processes.
  • Proven experience managing infrastructure and applications within a major public cloud environment (AWS, Azure, or GCP) at scale.
  • Proven track record of automating complex, manual operational processes and improving engineering efficiency.
  • Hands-on experience implementing and managing monitoring and logging stacks (e.g., Prometheus, Grafana, ELK stack/Elasticsearch, Datadog, Splunk).
  • Solid experience with Infrastructure as Code tools such as Terraform or CloudFormation, and configuration management tools (Ansible, Chef, or Puppet).
  • Proficiency in programming languages (e.g., Python, Go, Ruby, or Java/C#) used for automation, tooling development, and services management.
  • Proven expertise in cloud platforms (e.g., AWS services such as EC2, S3, RDS, Lambda, EKS/ECS).
  • Proven expertise in Docker and Kubernetes for container orchestration and management (mandatory).
  • Proven expertise in building and maintaining robust CI/CD pipelines (e.g., GitHub Actions (GHA), Jenkins, GitLab CI, Azure DevOps) and strong Git practices.

Technical Stack

  • Monitoring & Logging: Prometheus, Grafana, ELK stack/Elasticsearch, Datadog, Splunk
  • Infrastructure as Code: Terraform, CloudFormation
  • Configuration Management: Ansible, Chef, Puppet
  • Languages: Python, Go, Ruby, Java, C#
  • AWS Services: EC2, S3, RDS, Lambda, EKS/ECS
  • Containers & Orchestration: Docker, Kubernetes
  • CI/CD: GHA, Jenkins, GitLab CI, Azure DevOps
  • Version Control: Git

UnitedHealth Group is committed to mitigating environmental impact and enabling and delivering equitable care that addresses health disparities.

Required Skills
Prometheus, Grafana, ELK stack, Terraform, AWS, Azure, GCP, Ansible, Chef, Puppet, incident response, post-mortem, automation
About company
UnitedHealth Group / Optum

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. It connects people with care, pharmacy benefits, data, and resources.

Job Details
Department: Information Technology
Category: Infrastructure
Posted: 14 days ago