Requirements
- 5+ years of demonstrated experience building large-scale, fault-tolerant, distributed systems and API microservices.
- Strong background in designing, analyzing, and improving the efficiency, scalability, and stability of complex systems.
- Excellent understanding of low-level OS concepts: multi-threading, memory management, networking, and storage performance.
- Expert-level programming skills in one or more of Rust, Go, Python, and TypeScript.
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field, or equivalent practical experience.
Nice to Have
- Knowledge of modern LLMs and generative models, and of how they are served in production, is a plus.
- Experience working with the open source ecosystem around inference is highly valuable; familiarity with SGLang, vLLM, or NVIDIA Dynamo is especially useful.
- Experience with Kubernetes or container orchestration is a strong plus.
- Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies (InfiniBand, NVLink, MPI) is a plus.