The Lead Site Reliability Administrator plays a critical role in ensuring the stability, scalability, and performance of cloud-based services. This position bridges the gap between development and operations by implementing automation, proactive monitoring, and robust incident management practices. The role requires deep technical expertise in cloud infrastructure, containerization, and DevOps tooling, along with strong leadership skills to guide reliability initiatives across teams. The administrator will be responsible for maintaining high availability, driving continuous improvement, and ensuring systems meet stringent service level objectives in a fast-paced, globally distributed environment.

Responsibilities

Design and implement solutions to improve service availability, performance, and operational stability
Automate routine tasks and processes within a cloud DevOps environment to increase efficiency
Develop proactive monitoring and alerting systems to reduce incident frequency
Respond to incidents in accordance with defined service level agreements
Provide ongoing feedback to development teams on system defects, stability, and improvement opportunities
Create and maintain runbooks and operational patterns for production application support
Collaborate with IT and development teams to define and implement KPI monitoring and real-time transaction tracking
Lead change validation efforts for deployments by infrastructure and development teams
Participate in advanced troubleshooting of production issues reported by users or customers
Take ownership of incident resolution, including root cause analysis and participation in SWAT investigations
Work rotating shifts as required to support continuous operations
Participate in on-call rotations to ensure 24/7/365 system coverage

Requirements

Extensive experience with Linux systems and proficiency in scripting languages such as Shell, Python, Perl, or JavaScript
Hands-on experience with public cloud platforms including Google Cloud, AWS, and Azure, as well as platform and orchestration technologies like Kubernetes, Cloud Foundry, and BOSH
Operational knowledge of container and cluster management technologies such as Docker, rkt, and Mesos, along with microservices and RESTful architectures
Proficiency with continuous delivery and automation tools and practices such as GitOps, Ansible, Rundeck, or Argo CD
Experience supporting middleware and Java-based applications including Apache, Tomcat, Spring, Struts, and Spark
Familiarity with relational and NoSQL databases including Oracle, Postgres, MariaDB, and Cassandra
Strong understanding of monitoring and observability tools such as New Relic, Dynatrace, AppDynamics, Zabbix, and check_mk, as well as logging platforms like Graylog and Kibana
Experience with messaging and search technologies including Kafka, RabbitMQ, Solr, and Elasticsearch
Proven ability to diagnose and resolve complex issues in high-volume environments with adherence to security and ITIL standards
Demonstrated leadership and collaboration skills with the ability to manage multiple priorities and work across teams

Tech Stack

Linux, Shell, Python, Perl, JavaScript, Google Cloud, AWS, Azure, Kubernetes, Cloud Foundry, BOSH, Docker, rkt, Mesos, microservices, RESTful architectures, GitOps, Ansible, Rundeck, Argo CD, Apache, Tomcat, Spring, Struts, Spark

Benefits

Comprehensive benefits package supporting physical, emotional, and financial wellbeing
Eligibility for variable and commission-based compensation
Vacation entitlement
Paid time off

Compensation

$103,250 - $153,250. Compensation may vary based on the candidate’s education, experience, skills, and geographical location, and on alignment with internal equity and the external market.

Team

Part of a cloud DevOps organization, collaborating cross-functionally with development teams and IT business partners

Innovation
Creativity
Collaboration
AI-First
Future-Driven
Human-Centered

Additional Information

This role operates in a dynamic, agile environment with frequent deployments and rapid incident response cycles.
Candidates must be comfortable working in a high-pressure, on-call environment with mission-critical systems.
Strong documentation and communication skills are essential for effective cross-team collaboration.
Opportunities for professional growth and specialization in cloud-native technologies are supported.
Regular participation in post-incident reviews and system improvement initiatives is expected.

OpenText is hiring a Lead Site Reliability Administrator
