Requirements
- 6–9+ years in SRE / Platform / Infrastructure Engineering
- Proven experience scaling Kubernetes in high-throughput production environments
- Deep knowledge of:
  - Scheduler behavior
  - StatefulSets & persistent workloads
  - Autoscaling strategies (HPA, VPA, KEDA, custom scaling)
  - Resource management & performance tuning
  - Multi-cluster and multi-region architectures
- Experience diagnosing production failures at cluster scale
- Strong Terraform or Crossplane experience
- GitOps workflows (ArgoCD / Flux) experience
- CI/CD reliability experience
- Automation-first mindset
- AWS production experience
- Proficiency in Go (strongly preferred) or equivalent systems language
Nice to Have
- Experience with Web3 concepts (e.g., blockchain node lifecycle, forks, reorgs, or RPC issues)
- Experience with oracle systems, token architectures, or decentralized services
- Experience scaling stateful high-availability distributed systems
- Experience building internal platform primitives
- Experience implementing custom autoscaling logic
- Experience designing SLO strategies and error-budget usage
- Experience improving diagnosability and observability frameworks
- Experience working in high-ambiguity environments
- Experience operating blockchain infrastructure in production
- Certified Kubernetes Administrator (CKA)
- Experience contributing to Kubernetes ecosystem projects
- Experience building multi-tenant platform infrastructure
- Experience working in high-security and/or SOC 2 / ISO 27001-compliant environments
- Experience with chaos engineering practices or implementation