Requirements
- 5+ years of engineering management experience, ideally including time spent leading teams on critical-path production infrastructure at scale
- Deep systems background: load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates whose work goes down to the kernel and framework level
- Shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was
- Run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline
- Results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions
- Build strong relationships across team boundaries; this is a seam role, and much of the job is making sure other teams can rely on yours
- Curious about machine learning systems. You don't need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems
Nice to Have
- Experience with LLM inference serving — KV caching, continuous batching, request scheduling, prefill/decode disaggregation
- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale
- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement
- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging — enough to follow and evaluate the technical work, not necessarily to do it daily
- Led teams at supercomputing or hyperscaler infrastructure scale
- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery