Requirements
- 5+ years of engineering management experience, ideally including time spent leading teams on critical-path production infrastructure at scale
- Deep systems background: load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates whose work goes down to the kernel and framework level
- Shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was
- Run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline
- Results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions
- Build strong relationships across team boundaries; this is a seam role, and much of the job is making sure other teams can rely on yours
- Curious about machine learning systems. You don't need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems
Nice to Have
- Experience with LLM inference serving — KV caching, continuous batching, request scheduling, prefill/decode disaggregation
- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale
- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement
- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging — enough to follow and evaluate the technical work, not necessarily to do it daily
- Led teams at supercomputing or hyperscaler infrastructure scale
- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery