United States USD 103,250 – 153,250 / year

OpenText is hiring a Lead Site Reliability Administrator

The Lead Site Reliability Administrator plays a critical role in ensuring the stability, scalability, and performance of cloud-based services. This position bridges the gap between development and operations by implementing automation, proactive monitoring, and robust incident management practices. The role requires deep technical expertise in cloud infrastructure, containerization, and DevOps tooling, along with strong leadership skills to guide reliability initiatives across teams. The administrator will be responsible for maintaining high availability, driving continuous improvement, and ensuring systems meet stringent service level objectives in a fast-paced, globally distributed environment.

Responsibilities

  • Design and implement solutions to improve service availability, performance, and operational stability
  • Automate routine tasks and processes within a cloud DevOps environment to increase efficiency
  • Develop proactive monitoring and alerting systems to reduce incident frequency
  • Respond to incidents in accordance with defined service level agreements
  • Provide ongoing feedback to development teams on system defects, stability, and improvement opportunities
  • Create and maintain runbooks and operational patterns for production application support
  • Collaborate with IT and development teams to define and implement KPI monitoring and real-time transaction tracking
  • Lead change validation efforts for deployments by infrastructure and development teams
  • Participate in advanced troubleshooting of production issues reported by users or customers
  • Take ownership of incident resolution, including root cause analysis and participation in SWAT investigations
  • Work rotating shifts as required to support continuous operations
  • Participate in on-call rotations to ensure 24/7/365 system coverage

Requirements

  • Extensive experience with Linux systems and proficiency in scripting languages such as Shell, Python, Perl, or JavaScript
  • Hands-on experience with public cloud platforms including Google Cloud, AWS, and Azure, as well as platform and orchestration technologies such as Kubernetes, Cloud Foundry, and BOSH
  • Operational knowledge of containerization technologies such as Docker, rkt, and Mesos, along with microservices and RESTful architectures
  • Proficiency with continuous delivery and automation practices and tools such as GitOps, Ansible, Rundeck, or Argo CD
  • Experience supporting middleware and Java-based applications including Apache, Tomcat, Spring, Struts, and Spark
  • Familiarity with relational and NoSQL databases including Oracle, Postgres, MariaDB, and Cassandra
  • Strong understanding of monitoring and observability tools such as New Relic, Dynatrace, AppDynamics, Zabbix, and check_mk, as well as logging platforms like Graylog and Kibana
  • Experience with messaging and search technologies including Kafka, RabbitMQ, Solr, and Elasticsearch
  • Proven ability to diagnose and resolve complex issues in high-volume environments with adherence to security and ITIL standards
  • Demonstrated leadership and collaboration skills with the ability to manage multiple priorities and work across teams

Tech Stack

Linux, Shell, Python, Perl, JavaScript, Google Cloud, AWS, Azure, Kubernetes, Cloud Foundry, BOSH, Docker, rkt, Mesos, microservices, RESTful architectures, GitOps, Ansible, Rundeck, Argo CD, Apache, Tomcat, Spring, Struts, Spark

Benefits

  • Comprehensive benefits package supporting physical, emotional, and financial wellbeing
  • Eligibility for variable and commission-based compensation
  • Vacation entitlement
  • Paid time off

Compensation

$103,250 - $153,250 per year. Compensation may vary based on a candidate’s education, experience, skills, geographical location, and alignment with internal equity and the external market.

Team

Part of a cloud DevOps organization, collaborating cross-functionally with development teams and IT business partners

  • Innovation
  • Creativity
  • Collaboration
  • AI-First
  • Future-Driven
  • Human-Centered

Additional Information

  • This role operates in a dynamic, agile environment with frequent deployments and rapid incident response cycles.
  • Candidates must be comfortable working in a high-pressure, on-call environment with mission-critical systems.
  • Strong documentation and communication skills are essential for effective cross-team collaboration.
  • Opportunities for professional growth and specialization in cloud-native technologies are supported.
  • Regular participation in post-incident reviews and system improvement initiatives is expected.

Required Skills

Linux, Python, Perl, JavaScript, GCP, AWS, Microsoft Azure, Kubernetes, Docker, Microservices

About OpenText

OpenText is a global leader in information management. At OpenText, AI is at the heart of everything we do—powering innovation, transforming work, and empowering digital knowledge workers.

Job Details

  • Department: Engineering
  • Category: Infrastructure
  • Posted: 3 months ago