About the Role

This role involves developing and optimizing system software that supports partner integration and deployment of high-performance computing solutions, with a focus on performance and scalability across distributed systems and on close collaboration with partner teams.

Responsibilities

Design and optimize low-level software components for distributed computing environments
Collaborate with partner engineering teams to integrate communication libraries
Improve performance and scalability of system-level software in GPU-accelerated clusters
Diagnose and resolve complex software issues impacting partner deployments
Develop tools and frameworks to streamline integration workflows
Support debugging and tuning of communication primitives across hardware platforms
Contribute to the evolution of collective communication algorithms
Work closely with hardware and driver teams to ensure compatibility
Produce technical documentation for internal and external stakeholders
Assist partners in adopting optimized communication libraries
Analyze system bottlenecks and propose architectural improvements
Ensure software reliability under high-load conditions
Participate in code reviews and maintain code quality standards
Implement testing strategies for cross-platform validation
Stay current with advancements in parallel computing and networking
Optimize software for diverse data center configurations
Support performance benchmarking and profiling activities
Integrate feedback from partners into product enhancements
Contribute to open-source projects related to communication layers
Facilitate knowledge transfer between internal and external teams
Ensure compliance with software interface standards
Develop proof-of-concept implementations for new features
Collaborate on defining roadmap priorities for system software
Troubleshoot interoperability issues across software stacks
Promote best practices in system-level software development

Compensation

Competitive salary and benefits package commensurate with experience

Work Arrangement

Hybrid work model with flexibility based on role and location

Team

Part of a global engineering team focused on system software and partner collaboration

About the Team

This team focuses on developing core communication libraries that power large-scale AI and high-performance computing systems, enabling seamless integration across diverse hardware and software environments.

Why This Role Matters

The work directly impacts the efficiency and scalability of distributed computing solutions used by leading research and enterprise organizations worldwide.

Limited sponsorship may be available for qualified candidates

Nvidia is hiring a Senior System Software Engineer, NCCL - Partner Enablement

