About the Role
Responsibilities
- Design, build, and maintain infrastructure across AWS, GCP, and Azure using Infrastructure as Code (IaC) principles
- Implement and optimize CI/CD pipelines with tools such as Argo and CircleCI to enable rapid, reliable deployments
- Manage and scale Kubernetes clusters in production environments, ensuring high availability and optimal resource utilization
- Administer and optimize cloud databases and other data stores, including MongoDB, Redis, and RDS, for performance and reliability
- Develop monitoring, alerting, and observability solutions to identify and resolve issues before they impact users
- Automate routine operational tasks to reduce manual toil and improve system reliability
- Respond to incidents and conduct post-mortem analyses to drive continuous improvement
- Collaborate with development teams to design systems with reliability, scalability, and operational excellence in mind
- Document infrastructure architecture, runbooks, and operational procedures
- Evaluate and implement new tools and technologies to improve platform capabilities
Work Arrangement
Remote (Worldwide)