NVIDIA is hiring a Senior Software Development Engineer, TensorRT-LLM

About the Role

The role involves designing and improving software for large-scale language model inference, with a focus on performance, correctness, and integration within AI ecosystems.

Responsibilities

  • Develop and optimize core components for large language model inference
  • Improve runtime performance and memory efficiency on GPU architectures
  • Collaborate on building scalable inference solutions for production use
  • Implement new features in the inference engine for advanced model support
  • Work closely with research and framework teams to integrate new technologies
  • Diagnose and resolve performance bottlenecks in distributed systems
  • Contribute to kernel optimization for tensor operations
  • Support model compatibility across different AI frameworks
  • Ensure correctness and numerical accuracy in inference outputs
  • Participate in code reviews and maintain high code quality standards
  • Document technical designs and system behavior for team reference
  • Drive improvements in latency, throughput, and scalability
  • Assist in debugging complex system-level issues
  • Contribute to open-source projects related to inference
  • Stay current with advancements in AI model architectures
  • Collaborate on benchmarking and profiling tools
  • Optimize for real-time and batch inference scenarios
  • Support deployment in cloud and data center environments
  • Work on model quantization and compression techniques
  • Integrate security and safety features into inference pipelines

Nice to Have

  • Advanced degree in computer science or related technical field
  • Direct experience with large language model inference
  • Contributions to open-source deep learning projects
  • Experience with model quantization techniques
  • Knowledge of safety and alignment in AI systems
  • Familiarity with CI/CD pipelines for AI software
  • Background in compiler optimization for AI workloads
  • Experience with kernel fusion and graph optimization
  • Published work in systems or AI conferences
  • Leadership in technical design and architecture

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model

Team

Part of the accelerated computing and AI software team

About the Team

  • The team focuses on building high-performance software for AI inference, enabling faster and more efficient deployment of large language models across industries.
  • Work is centered on pushing the boundaries of what's possible with GPU-accelerated computing in AI applications.

Why Join Us

  • Opportunity to work on cutting-edge AI technologies at scale.
  • Collaborative environment with experts in systems, machine learning, and compilers.
  • Impact on real-world AI deployment across cloud, enterprise, and research domains.

Available for qualified candidates

Required Skills
C++, Python, TensorFlow, PyTorch, CUDA, OpenCL, CMake, LLM, Deep Learning, High-Performance Computing, GPU Programming, Distributed Systems, Model Optimization
About the Company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Category: Other
Posted: 7 months ago