Remote (Global), 40+ countries

Red Hat is hiring a Senior Site Reliability Engineer

About the Role

This role bridges development and operations by applying software engineering principles to infrastructure and operations problems. The focus is on building and maintaining reliable systems at scale.

Responsibilities

  • Design and implement scalable monitoring solutions for distributed systems
  • Develop automation tools to improve system reliability and reduce manual intervention
  • Respond to and resolve critical production incidents promptly
  • Collaborate with development teams to enhance application performance and resilience
  • Drive post-incident reviews and implement corrective actions
  • Optimize system performance and availability across cloud environments
  • Maintain and improve CI/CD pipelines for faster and safer deployments
  • Enforce best practices in configuration management and infrastructure as code
  • Support capacity planning and system scalability initiatives
  • Contribute to disaster recovery planning and execution
  • Evaluate and integrate new technologies that improve system stability
  • Ensure compliance with security and operational standards
  • Mentor junior engineers and share operational knowledge
  • Participate in on-call rotations for critical systems
  • Improve observability through logging, tracing, and metrics collection
  • Troubleshoot complex cross-system issues in production environments
  • Promote a culture of blameless post-mortems and continuous improvement
  • Work closely with product teams to influence system design for reliability
  • Automate routine operational tasks to increase efficiency
  • Monitor system health and proactively address potential failures

Nice to Have

  • Master's degree in computer science or related field
  • Experience supporting mission-critical enterprise systems
  • Contributions to open-source projects
  • Familiarity with service mesh technologies
  • Knowledge of large-scale data replication and consistency models
  • Experience with performance benchmarking and tuning
  • Background in software development with production code contributions
  • Exposure to edge computing or hybrid cloud architectures
  • Certifications in cloud or systems administration
  • Track record of improving system uptime and reducing incident frequency

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid remote and office-based work model

Team

Collaborative engineering team focused on system reliability and scalability

Why This Role Matters

  • This position plays a key role in maintaining the stability and performance of large-scale services used by global customers.
  • Engineers in this role directly influence the reliability and efficiency of core infrastructure platforms.

Technology Environment

  • Work is conducted in a Linux-based, open-source environment with extensive use of cloud-native technologies.
  • Primary tools include Kubernetes, Prometheus, Git, and Ansible, running on public and private cloud infrastructures.

Available for qualified candidates

Required Skills

OpenShift, Kubernetes, Linux, AWS, GCP, Azure, Prometheus, Ansible, Puppet, Chef, RHEL, CentOS, Fedora, Site Reliability Engineering, Infrastructure as Code

About the Company

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies.

Job Details

Category: Infrastructure
Posted 9 months ago