6–9+ years in SRE / Platform / Infrastructure Engineering
Proven experience scaling Kubernetes in high-throughput production environments
Deep knowledge of:
- Scheduler behavior
- StatefulSets & persistent workloads
- Autoscaling strategies (HPA, VPA, KEDA, custom scaling)
- Resource management & performance tuning
- Multi-cluster and multi-region architectures
Experience diagnosing production failures at cluster scale
Strong Terraform or Crossplane experience
GitOps workflows (ArgoCD / Flux) experience
CI/CD reliability experience
Automation-first mindset
AWS production experience
Proficiency in Go (strongly preferred) or an equivalent systems language

Experience with web3 concepts (e.g. blockchain node lifecycle, forks, reorgs, or RPC issues)
Experience with oracle systems, token architectures, or decentralized services
Experience scaling stateful high-availability distributed systems
Experience building internal platform primitives
Experience implementing custom autoscaling logic
Experience designing SLO strategies and error-budget usage
Experience improving diagnosability and observability frameworks
Experience working in high-ambiguity environments
Experience operating blockchain infrastructure in production
Certified Kubernetes Administrator (CKA)
Experience contributing to Kubernetes ecosystem projects
Experience building multi-tenant platform infrastructure
Experience working in high-security and/or SOC 2 / ISO 27001-compliant environments
Experience with chaos engineering practices or implementation

Chainlink Labs is hiring a Senior Site Reliability Engineer, Node Platform
