NVIDIA is looking for a Senior AI Performance and Efficiency Engineer to play a pivotal role in advancing AI and ML research on GPU clusters. You will improve researcher efficiency by driving improvements across the entire stack, collaborating closely with customers to identify and resolve infrastructure and application bottlenecks.
What You'll Do
- Collaborate closely with AI/ML researchers to make their ML models more efficient, leading to significant productivity improvements and cost savings.
- Build tools and frameworks, and apply ML techniques, to detect and analyze efficiency bottlenecks and deliver productivity improvements for researchers.
- Work with researchers on a variety of innovative ML workloads across robotics, autonomous vehicles, LLMs, video, and more.
- Collaborate across engineering organizations to deliver efficiency in usage of hardware, software, and infrastructure.
- Proactively monitor fleet-wide utilization, identify existing and emerging inefficiency patterns, and deliver scalable solutions to address them.
- Keep up to date with the most recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration within the organization.
What We're Looking For
- BS in Computer Science or a related field (or equivalent experience).
- 8+ years of experience designing and operating large-scale compute infrastructure.
- Strong understanding of modern ML techniques and tools.
- Experience investigating and resolving training and inference performance issues end to end.
- Debugging and optimization experience with Nsight Systems and Nsight Compute.
- Experience with debugging large-scale distributed training using NCCL.
- Proficiency in programming & scripting languages such as Python, Go, Bash.
- Familiarity with cloud computing platforms (e.g., AWS, GCP, Azure).
- Experience with parallel computing frameworks and paradigms.
- Dedication to ongoing learning and staying updated on new technologies and innovative methods in the AI/ML infrastructure sector.
- Excellent communication and collaboration skills, with the ability to work effectively with teams and individuals of different backgrounds.
Nice to Have
- Background with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking.
- Experience with Machine Learning and Deep Learning concepts, algorithms and models.
- Familiarity with InfiniBand, IPoIB, and RDMA.
- Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads.
- Familiarity with deep learning frameworks like PyTorch and TensorFlow.
Technical Stack
- Languages: Python, Go, Bash
- Cloud: AWS, GCP, Azure
- NVIDIA Tools: Nsight Systems, Nsight Compute, NCCL, CUDA
- Frameworks: PyTorch, TensorFlow
- Infrastructure: InfiniBand, RDMA, Lustre, GPFS
Benefits & Compensation
- Competitive salaries
- Comprehensive benefits package
- Equity
- Compensation: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5, plus equity.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.