Responsibilities
- Design and maintain large-scale OpenShift clusters.
- Oversee core cluster components including API server, scheduler, controllers, MCO, ingress, registry, and SDN/OVN-Kubernetes.
- Execute zero-downtime upgrades and manage version transitions and rollout/rollback strategies.
- Create cluster capacity models, resource plans, HA/DR topology, and infrastructure/worker segmentation.
- Manage Subscription channels, InstallPlans, CSV transitions, CRDs, Catalog Sources, and operator dependencies.
- Diagnose and resolve operator failures, operand issues, and API deprecations.
- Provide guidance to developers on operator usage and GitOps workflows.
- Serve as the primary technical escalation point for cluster, node, networking, storage, registry, ingress, and workload issues.
- Conduct root cause analysis, develop problem prevention plans, and offer stability recommendations.
- Manage CSI drivers, snapshotting, cloning, PV/PVC design, and storage topology.
- Implement and manage Velero/OADP backup and application-level restore workflows.
- Define disaster recovery strategy, namespace recovery, and cluster rebuild workflows.
- Implement CIS benchmarks, RBAC/SCC policies, audit logs, and TLS/hardening configurations.
- Manage authentication integrations (AD/LDAP/OAuth).
- Ensure compliance with enterprise governance, patching standards, image policies, and vulnerability remediation.
- Automate infrastructure operations using Ansible, Terraform, Helm, and Bash/Python scripting.
- Integrate OpenShift with CI/CD pipelines (GitOps, Jenkins, Tekton, Argo CD).
- Develop reusable automation frameworks for cluster provisioning and operational workflows.
- Configure and optimize Prometheus, Alertmanager, Grafana, and the Loki/EFK stack.
- Create actionable dashboards, alert rules, and log routing pipelines.
- Lead performance tuning and SLO/SLI management.
- Mentor L1/L2 support teams and provide knowledge transfer, SOPs, and runbooks.
- Lead war-rooms, incident bridges, and cross-team collaboration (Network, Security, VMware, Storage).
- Represent the platform team in architecture and customer-facing discussions.
Requirements
- Proficient in Linux (RHEL/SUSE/Oracle Linux) administration.
- Deep knowledge of Kubernetes internals and OpenShift 3.x/4.x architecture.
- Experience with VMware vSphere (HA/DRS/Networking) and container registries.
- Expertise in Machine Config Operator, Ingress/Route architecture, SDN/OVN networking, node lifecycle operations, CRI-O/Docker, CNI, and CSI.
- Strong troubleshooting skills using oc/kubectl, journald, tcpdump, strace, Wireshark, and systemd tools.
- Solid understanding of Git, DevOps, automation, and infrastructure-as-code workflows.
- Strong Linux/RHCOS experience and working knowledge of systemd, SELinux, and OS hardening.
- Expertise with oc, kubectl, YAML, RBAC, quotas, projects, and Operators.
- Operational experience with VMware vSphere: cluster HA/DRS, VM placement and sizing, and datastore troubleshooting.
- Fundamentals of storage: CSI, ONTAP/Trident, NFS, iSCSI, PV/PVC lifecycle.
- Fundamentals of networking: SDN, routing, MTU, ingress controllers, load balancers.
- Fundamentals of backup and restore (Velero/PowerProtect).
- Experience with monitoring tools (Prometheus, Alertmanager, Grafana).
- Ability to develop and maintain runbooks, SOPs, and automation scripts with quarterly reviews and version history.
- Ability to collaborate with engineers on issue escalation and resolution.
- Experience working with cross-functional teams (Cloud, Security, Network, Developers, Firewall teams).
- Ability to handle on-call rotation and work in a 24/7 support environment.
- Understanding of ITSM workflows and SLAs for P1 through P4 incidents.
- Ability to collaborate with VMware, Network, Storage, Security, and Application teams.
- Familiarity with CI/CD concepts and automation workflows.
- Experience with Bash/Shell scripting.
Nice to Have
- Experience with various ticketing and monitoring tools.
- Vendor coordination skills.
- Red Hat Certified Specialist in OpenShift Administration.
- Certifications in Terraform/Ansible, or Certified Kubernetes Administrator (CKA).
Benefits
- Regular team meetings.
- Employee referral program.
- Comprehensive learning and development opportunities.
- Company pension plan.
- Highly international, high-performance culture.
- Diverse and inclusive work environment.
- Women's network.
Compensation
Not specified
Work Arrangement
On-site
Team
Cross-functional teams
Other
Cloud AWS Admin


