Anthropic is looking for a Staff Software Engineer to join the Inference team. You will work end to end, identifying and addressing key infrastructure blockers to serve Claude to millions of users while enabling breakthrough AI research. The team is responsible for the entire stack from intelligent request routing to fleet-wide orchestration across diverse AI accelerators.
What You'll Do
- Build and maintain the critical systems that serve Claude to millions of users worldwide.
- Serve models via the industry's largest compute-agnostic inference deployments.
- Maximize compute efficiency to keep pace with explosive customer growth.
- Enable breakthrough research by providing high-performance inference infrastructure for scientists.
- Tackle complex distributed-systems challenges across multiple accelerator families and emerging AI hardware on multiple cloud platforms.
- Design intelligent routing algorithms that optimize request distribution across thousands of accelerators.
- Autoscale our compute fleet to dynamically match supply with demand across production, research, and experimental workloads.
- Build production-grade deployment pipelines for releasing new models to millions of users.
- Integrate new AI accelerator platforms to maintain a hardware-agnostic competitive advantage.
- Contribute to new inference features (e.g., structured sampling, prompt caching).
- Support inference for new model architectures.
- Analyze observability data to tune performance based on real-world production workloads.
- Manage multi-region deployments and geographic routing for global customers.
What We're Looking For
- Significant software engineering experience, particularly with distributed systems.
- At least a Bachelor's degree in a related field or equivalent experience.
- Familiarity with performance optimization, distributed systems, large-scale service orchestration, and intelligent request routing.
Nice to Have
- Familiarity with LLM inference optimization, batching and caching strategies, and multi-accelerator deployments.
- Experience building high-performance, large-scale distributed systems.
- Experience implementing and deploying machine learning systems at scale.
- Experience with load balancing, request routing, or traffic management systems.
- Experience with Kubernetes and cloud infrastructure (AWS, GCP).
- Experience with Python or Rust.
Technical Stack
- Languages: Python, Rust
- Orchestration: Kubernetes
- Cloud Platforms: AWS, GCP
Team & Environment
You will join the Inference team, a fast-growing group of committed researchers, engineers, policy experts, and business leaders. We are a highly collaborative group that holds frequent research discussions and values strong communication skills. Anthropic is a public benefit corporation headquartered in San Francisco, and our mission is to create reliable, interpretable, and steerable AI systems.
Benefits & Compensation
- Competitive compensation: €295,000 - €355,000
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Lovely office space.
Work Mode
This is a hybrid role. You will work from Anthropic offices with flexible working arrangements.
Anthropic is an equal opportunity employer.



