NVIDIA is looking for a Senior Software Engineer - NIM Factory Infrastructure to design and build factory automation for NVIDIA Inference Microservices (NIMs). You will apply deep technical expertise to create an efficient, scalable, and reliable automation pipeline that transforms AI models into validated, deployable NIMs.
What You'll Do
- Develop, analyze, and optimize factory infrastructure that takes an AI model in and produces a deployable service validated across Cloud, On-prem, and Kubernetes environments.
- Define the group's technical strategies and roadmaps, and iterate on them rapidly to deliver and improve the NIM factory.
- Develop harnesses, automate hardware acceptance, analyze benchmarks, gather data, and perform statistical analysis of NIM system health and performance.
- Design and develop scalable, reliable factory acceptance and performance tuning for hardware platforms.
- Collaborate with multiple AI model teams to understand requirements and build efficient infrastructure that improves team productivity.
- Define metrics and drive improvements based on user feedback.
- Mentor teammates and collaborate across the team and with other teams.
What We're Looking For
- History of using advanced programming skills to build tooling and automation for hardware system characterization and benchmarking.
- Proven experience debugging and analyzing performance of compute applications and systems.
- Deep technical expertise working with system software and platform layers, including kernel, device drivers, memory, storage, networking, and PCIe devices.
- Experience working with hardware clusters, distributed systems, networking, GPU interconnects (PCIe, NVLink), and node and cluster interconnects (InfiniBand).
- Passion for building platform engineering components and automation of system benchmarking and characterization.
- Excellent interpersonal skills and the ability to lead multi-functional efforts.
- BS or MS in Computer Science, Computer Engineering or related field (or equivalent experience).
- 5+ years of proven experience developing performant microservices, cloud software and/or tooling.
Nice to Have
- Experience delivering optimized system engineering environments for inference applications on data center and consumer-grade hardware platforms.
- History of building and deploying automated benchmarking solutions in Cloud and On-prem environments, along with their associated CI/CD pipelines.
- Prior experience working with large-scale compute infrastructure solutions.
Technical Stack
- Docker
- Kubernetes
- Cloud
- On-prem
- GPU
Benefits & Compensation
- Compensation: 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4
- Equity: Eligible
- Benefits (via NVIDIA benefits page)
NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.




