Germany (Remote); Zurich, Switzerland; Munich, Germany; Berlin, Germany · Remote · Full-time

NVIDIA is hiring a Senior HPC and AI Networking Performance Research and Analysis Engineer

About the Role

As a Senior HPC and AI Networking Performance Research and Analysis Engineer, you will investigate and improve the performance of AI workloads running on large-scale GPU and CPU systems. Your primary focus will be distributed deep learning applications, particularly large language model (LLM) training and inference, where communication patterns and network efficiency play a critical role.

Key Responsibilities

  • Conduct in-depth profiling and analysis of AI workloads to uncover performance bottlenecks, especially in communication and data transfer layers
  • Design and execute benchmarking strategies to evaluate system behavior under real-world conditions
  • Collaborate with hardware and software teams to assess performance across CPUs, GPUs, host channel adapters, and network switches
  • Develop and apply simulation models, performance tools, and analytical methods to diagnose system limitations
  • Investigate low-level system interactions to determine root causes of performance issues
  • Establish performance baselines and define testing strategies for emerging technologies
  • Guide optimization efforts to achieve maximum system throughput and efficiency

Qualifications

Applicants should hold a Bachelor's degree in Computer Science or Software Engineering and bring at least six years of hands-on experience in high-performance networking. Essential skills include deep familiarity with RDMA, MPI, NCCL, and networking protocols such as RoCE. Proficiency in Python, Bash, and C is required, along with strong Linux system knowledge.

Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks like TensorFlow or PyTorch is necessary. Demonstrated ability in performance analysis, problem solving, and cross-team collaboration is essential.

Preferred Background

  • Proven track record in benchmarking AI workloads, especially for distributed LLM training
  • Strong understanding of CUDA and NCCL internals
  • Comprehensive knowledge of system architecture, including CPUs (Intel, AMD, ARM), GPUs, memory, and PCI subsystems
  • Familiarity with congestion control mechanisms in high-speed networks

Required Skills

RDMA, MPI, NCCL, RoCE, CUDA, TensorFlow, PyTorch, Python, Bash, C, Performance Analysis, Distributed Deep Learning, Collective Communication, HPC, Networking
About company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Department: Performance group
Category: Data
Posted: 2 months ago