Red Hat is hiring a Senior Site Reliability Engineer to develop, scale, and operate our OpenShift managed cloud services. You will contribute to running OpenShift at scale by enabling customer self-service, improving monitoring, and eliminating toil through automation.
What You'll Do
- Contribute code to increase the scalability and reliability of the service.
- Contribute software tests and participate in peer review to increase codebase quality.
- Mentor and develop peers’ capabilities through knowledge sharing and collaboration.
- Participate in a regular on-call schedule, including occasional paid shifts on weekends and holidays.
- Practice sustainable incident response and blameless postmortems.
- Resolve customer issues escalated from the Red Hat Global Support team.
- Work within a small agile team to develop SRE software, support peers, plan, and self-improve.
- Proactively use AI-assisted development tools for code generation, auto-completion, and intelligent suggestions.
- Demonstrate proficiency in using LLMs for brainstorming, research, summarizing documentation, and drafting communications.
- Participate in AI-assisted code reviews, using tools that provide real-time feedback.
- Explore and experiment with emerging AI technologies relevant to software development.
- Collaborate with cross-functional teams to identify opportunities for AI integration and share successful use cases.
What We're Looking For
- Bachelor's degree in Computer Science or a related technical field involving software or systems engineering, or hands-on experience demonstrating ability and interest in SRE.
- Experience programming in at least one of these languages: Python, Golang, Java, C, C++, or a comparable language.
- Experience working with public clouds such as AWS, GCP, or Azure.
- Ability to collaboratively troubleshoot and solve problems in a team setting.
- Basic understanding of Unix/Linux operating systems.
Nice to Have
- Experience troubleshooting an as-a-service offering (SaaS, PaaS, etc.).
- Experience working with complex distributed systems.
- Direct experience with Kubernetes or OpenShift.
- Demonstrated ability to debug and optimize code and to automate routine tasks.
- 5+ years of experience managing Linux servers running RHEL, CentOS, or Fedora hosted at a cloud provider.
- 3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus.
- 3+ years of experience with enterprise configuration management software such as Ansible by Red Hat, Puppet, or Chef.
- 2+ years of experience programming with at least one object-oriented language; Golang, Java, or Python are preferred.
- 2+ years of experience delivering a hosted service.
- Demonstrated ability to quickly and accurately troubleshoot system issues.
- Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP.
- Solid communication skills and experience working directly with and presenting to customers.
- 1+ years of experience with Kubernetes is a plus.
- 1+ years of experience with Docker-based containers is a plus.
Technical Stack
- Platforms: OpenShift, Kubernetes
- Operating Systems: Linux (RHEL, CentOS, Fedora)
- Cloud Providers: AWS, GCP, Azure
- Monitoring: Prometheus
- Configuration Management: Ansible, Puppet, Chef
- Languages: Python, Golang, Java, C, C++
- Containers: Docker
- AI Tools: GitHub Copilot, Cursor, Claude Code, Google Gemini
Team & Environment
You will join a small agile team within a global SRE organization. Our culture relies on teamwork, openness, and transparency. We learn from failures in a blameless environment to support continuous improvement. We encourage teams to proactively, thoughtfully, and ethically use AI to simplify workflows and boost efficiency. We embrace change, have a strong growth mindset, and welcome associates to bring their best ideas, no matter their title or tenure.
Work Mode
This is a global role. We welcome applicants from 40+ countries.
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer.




