About the Role
Responsibilities
- Develop and manage infrastructure automation using tools like Ansible, Terraform, Helm charts, and Golang to scale and maintain the cloud environment efficiently
- Apply best practices in Infrastructure-as-Code (IaC), automating cloud operations to improve deployment speed and reduce errors
- Investigate and integrate new open-source tools and technologies to enhance cloud capabilities and introduce new features
- Troubleshoot and resolve issues in the cloud, keeping critical components running smoothly and up to date
- Work with technologies such as OpenShift Kubernetes Distribution (OKD), OpenStack, Ceph, and Kubernetes
- Collaborate with other teams, contribute to documentation, and assist developers by providing best practices for deploying and maintaining services on the cloud platform
- Contribute to runbooks and participate in the on-call rotation to ensure system reliability
- Work with advanced monitoring and alerting systems to ensure the health and performance of cloud services
Requirements
- Strong experience with Linux, networking, OpenStack, and Ceph, including deploying, configuring, and managing clusters
- Knowledge of Kubernetes and OKD, and familiarity with GitOps methodologies
- Familiarity with programming (preferably Golang), including experience writing Kubernetes controllers/operators and Prometheus exporters
- A sharp analytical mindset with the ability to troubleshoot, identify root causes, and resolve complex issues within the cloud infrastructure
- Ability to work effectively within a team, support users, and help build new cloud features
- Quick to learn new technologies and tools, staying ahead of the curve to drive improvements in cloud services
Nice to Have
- Experience with Infrastructure-as-Code (IaC) tools such as Terraform and Ansible
Additional Information
- Collaboration with other teams is expected
- Contribution to documentation and runbooks is required