About the Role

The role involves designing and improving software for large language model inference at scale, with a focus on performance, correctness, and integration with the wider AI ecosystem.

Responsibilities

Develop and optimize core components for large language model inference
Improve runtime performance and memory efficiency on GPU architectures
Collaborate on building scalable inference solutions for production use
Implement new features in the inference engine for advanced model support
Work closely with research and framework teams to integrate new technologies
Diagnose and resolve performance bottlenecks in distributed systems
Contribute to kernel optimization for tensor operations
Support model compatibility across different AI frameworks
Ensure correctness and numerical accuracy in inference outputs
Participate in code reviews and maintain high code quality standards
Document technical designs and system behavior for team reference
Drive improvements in latency, throughput, and scalability
Assist in debugging complex system-level issues
Contribute to open-source projects related to inference
Stay current with advancements in AI model architectures
Collaborate on benchmarking and profiling tools
Optimize for real-time and batch inference scenarios
Support deployment in cloud and data center environments
Work on model quantization and compression techniques
Integrate security and safety features into inference pipelines

Nice to Have

Advanced degree in computer science or related technical field
Direct experience with large language model inference
Contributions to open-source deep learning projects
Experience with model quantization techniques
Knowledge of safety and alignment in AI systems
Familiarity with CI/CD pipelines for AI software
Background in compiler optimization for AI workloads
Experience with kernel fusion and graph optimization
Published work in systems or AI conferences
Leadership in technical design and architecture

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model

Team

Part of the accelerated computing and AI software team

About the Team

The team focuses on building high-performance software for AI inference, enabling faster and more efficient deployment of large language models across industries.
Work centers on advancing GPU-accelerated computing for AI applications.

Why Join Us

Opportunity to work on cutting-edge AI technologies at scale.
Collaborative environment with experts in systems, machine learning, and compilers.
Impact on real-world AI deployment across cloud, enterprise, and research domains.

Available for qualified candidates

NVIDIA is hiring a Senior Software Development Engineer, TensorRT-LLM
