Lead infrastructure engineering for a global multi-cloud SaaS platform: design, build, and operate large-scale, multi-region Kubernetes environments across public and private clouds, with a focus on reliability, scalability, and operational excellence.
Responsibilities
- Design and deploy multi-cluster, multi-region Kubernetes environments using EKS, GKE, and AKS
- Construct scalable infrastructure spanning multiple regions and cloud providers
- Own production infrastructure from design through operation
- Lead incident response, conduct postmortems, and implement changes to prevent future issues
- Develop and maintain reusable Terraform modules for complex infrastructure setups
- Manage extensive configuration across clusters, regions, and environments using GitOps
- Design and refine ArgoCD ApplicationSets and Helm chart structures
- Build automated deployment pipelines for safe, scalable releases across hundreds of microservices
- Analyze system performance, identify bottlenecks, and implement optimizations
- Improve attainment of service level objectives (SLOs) through capacity planning, autoscaling, and architectural refinements
- Develop and maintain monitoring, alerting, and observability systems using Prometheus, Grafana, Loki, and custom tools
- Increase visibility into distributed systems with complex interactions
- Enforce security policies, compliance standards, and secure-by-design principles across cloud environments
- Architect secure multi-tenant infrastructure
- Mentor team members, define best practices, and guide technical direction
- Collaborate with platform, SRE, and product teams to deliver resilient infrastructure
Requirements
- Minimum of five years of experience in cloud infrastructure engineering, with strong expertise in at least one major cloud provider, preferably AWS
- Extensive Kubernetes experience, including cluster design, operators and controllers, and multi-cluster management
- Proficiency in Infrastructure as Code tools such as Terraform, CloudFormation, or similar
- Expertise in GitOps workflows using ArgoCD, Flux, or equivalent, including ApplicationSets and advanced deployment patterns
- In-depth knowledge of Linux systems and networking
- Hands-on experience with distributed systems including Elasticsearch, PostgreSQL, Redis, Kafka, and RabbitMQ
- Experience with monitoring and observability tools such as Prometheus, Grafana, ELK stack, or similar
- Strong analytical and problem-solving skills for debugging complex distributed systems
- Proven experience implementing cloud security, compliance frameworks such as SOC 2 and ISO 27001, and secure design principles
- Excellent communication skills for collaboration across time zones and with distributed teams
- Self-motivated with a demonstrated ability to own and resolve complex issues from start to finish
Nice to Have
- Experience designing and managing multi-cloud architectures and cloud-agnostic solutions
- Contributions to open-source infrastructure projects
- Familiarity with service mesh technologies such as Istio or Linkerd
- Knowledge of chaos engineering and reliability testing practices
- Experience in cost optimization and FinOps
Tech Stack
Kubernetes, EKS, GKE, AKS, Terraform, GitOps, ArgoCD, Helm, Prometheus, Grafana, Loki, ELK stack, AWS, GCP, Azure, CloudFormation, Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ, Istio, Linkerd
Benefits
- Work on large-scale infrastructure with hundreds of Kubernetes clusters and thousands of services across the globe
- Exercise deep technical ownership by designing, building, and operating critical systems
- Use a modern technology stack including Kubernetes, GitOps, Infrastructure as Code, and cloud-native tools
- Make a measurable impact: your infrastructure decisions affect millions of users
- Grow professionally by collaborating with experienced engineers on complex technical challenges
Work Arrangement
Global, working with distributed teams across time zones
Team
A group of talented engineers.
- Committed to building highly reliable, scalable, and secure solutions
- Experiencing significant growth and success
- Seeking top-tier talent passionate about making a meaningful impact in their field
- Focused on customer-driven innovation and networking
- Agile and responsive to ensure customer and partner success
Additional Information
- Collaborate with distributed teams across multiple time zones
- Must be self-driven with a proven history of end-to-end problem ownership
- Company headquarters located in San Jose, California
- Serves 50,000 customers worldwide, including half of the Fortune 50
- Complies with equal employment opportunity laws and prohibits discrimination and harassment