Remote (Global)

Red Hat is hiring a Site Reliability Engineer

About the Role

The role involves maintaining and improving the reliability of large-scale systems by combining software engineering and operations expertise to support critical infrastructure and services.

Responsibilities

  • Monitor system performance and respond to incidents
  • Design and maintain scalable infrastructure systems
  • Implement automated solutions for operational tasks
  • Collaborate with development teams to improve code deployability
  • Troubleshoot complex production issues
  • Develop tools to enhance system observability
  • Support continuous integration and delivery pipelines
  • Optimize system reliability and uptime
  • Participate in on-call rotation for critical systems
  • Document system architecture and operational procedures
  • Enforce security and compliance standards
  • Contribute to disaster recovery planning
  • Evaluate new technologies for operational efficiency
  • Improve monitoring and alerting frameworks
  • Work with distributed systems and cloud platforms
  • Ensure efficient resource utilization across environments
  • Drive incident post-mortem analysis and follow-up actions
  • Promote best practices in configuration management
  • Support containerized application deployments
  • Maintain high availability for critical services
  • Assist in capacity planning and forecasting
  • Integrate feedback loops for system improvements
  • Collaborate on performance tuning initiatives
  • Support global infrastructure with low-latency requirements
  • Contribute to internal knowledge sharing

Nice to Have

  • Experience with large-scale production environments
  • Background in open-source contributions
  • Familiarity with service mesh technologies
  • Knowledge of database administration
  • Experience with infrastructure as code tools
  • Understanding of site reliability engineering principles
  • Exposure to global team collaboration
  • Proficiency with automation frameworks
  • Experience in agile development environments
  • Strong grasp of system architecture patterns

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility for remote operations

Team

Collaborative engineering environment focused on system stability and performance

About the Team

  • This team focuses on building resilient systems using open-source technologies.
  • Engineers work closely with development and operations groups to deliver reliable services.

Technology Stack

  • Primary tools include Linux, Kubernetes, Prometheus, and Git.
  • Cloud platforms and containerized environments are central to operations.

Available for qualified candidates

Required Skills
  • OpenShift Administration
  • Linux Administration
  • AWS Technologies
  • CI/CD (Tekton, GitHub Actions)
  • Ansible
  • Terraform
  • Monitoring (Grafana, Prometheus)
  • Cloud Infrastructure
  • Python/GoLang
  • Incident Management
About the Company
Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies.
Job Details
Category: Infrastructure
Posted: 6 months ago