Responsibilities
- Design, deploy, and operate Kubernetes clusters (EKS or self-managed) on AWS, ensuring high availability and security
- Build and maintain CI/CD pipelines and internal developer tooling to improve engineering velocity
- Automate infrastructure provisioning and operational tasks using Python and tools like Terraform, OpenTofu, and Ansible
- Define and enforce platform standards around observability, cost management, and incident response
- Partner with application teams to support containerized workloads and resolve infrastructure bottlenecks
- Collaborate with Customer Success teams by providing reliable and scalable tooling that supports seamless customer onboarding, integrations, and service delivery
Requirements
- Solid hands-on experience with Kubernetes (cluster administration, Helm, RBAC, networking, etc.)
- Strong AWS knowledge across core services: EC2, EKS, IAM, VPC, S3, RDS, and related tooling
- Proficiency in Python for scripting, automation, and building internal tools
- Familiarity with infrastructure-as-code practices (Terraform, OpenTofu, and Ansible)
- A collaborative mindset and comfort working in a fast-moving environment
- Knowledge of multi-account AWS strategies, AWS Organizations, and landing zone patterns for enterprise-scale environments
Nice to Have
- Experience with service meshes (e.g., Istio) for managing microservice communication, traffic policies, and mutual TLS
- Experience with GitOps workflows using ArgoCD or Flux for declarative, version-controlled infrastructure and application delivery
- Exposure to container security and vulnerability scanning tools such as Falco or Trivy, and to policy enforcement tools such as OPA or Kyverno
- Experience with observability stacks like Prometheus, Grafana, or the ELK/OpenSearch stack for metrics, logging, and distributed tracing
Work Arrangement
Hybrid