About the Role

Develop and maintain scalable AI infrastructure solutions that support high-performance computing environments for advanced machine learning applications.

Responsibilities

Architect robust systems for AI-driven data center operations
Optimize infrastructure performance for large-scale model training
Collaborate on developing automation tools for system deployment
Ensure reliability and scalability of compute and storage resources
Integrate hardware and software components for AI workloads
Troubleshoot complex system-level issues in distributed environments
Support the deployment of AI-optimized server platforms
Improve system efficiency through performance benchmarking
Develop monitoring solutions for infrastructure health
Work closely with research and development teams on system requirements
Implement secure and resilient infrastructure designs
Contribute to the design of next-generation data center technologies
Manage firmware and driver integration for AI platforms
Enable low-latency communication across computing nodes
Support rapid iteration cycles for infrastructure upgrades
Ensure compatibility across heterogeneous computing architectures
Drive improvements in power and thermal efficiency
Participate in defining standards for AI infrastructure
Maintain documentation for system configurations and processes
Respond to critical production incidents with speed and precision
Evaluate emerging technologies for potential integration
Optimize resource utilization in virtualized environments
Support global deployment of AI infrastructure solutions
Collaborate across engineering disciplines for end-to-end delivery
Contribute to capacity planning for growing AI workloads

Nice to Have

Master’s or PhD in a technical field
Direct experience with AI model training infrastructure
Background in cloud-scale data center operations
Experience with RDMA and high-speed interconnects
Knowledge of AI frameworks like TensorFlow or PyTorch
Familiarity with hardware accelerators beyond GPUs
Experience with large-scale fleet management
Contributions to open-source infrastructure projects
Published work in systems or AI conferences
Experience with formal verification methods

Compensation

Competitive salary and comprehensive benefits package including equity incentives and performance bonuses.

Work Arrangement

Hybrid work model with flexibility based on role and location.

Team

Part of a global engineering team focused on advancing AI infrastructure at scale.

About DGXC Lepton

DGXC Lepton is a specialized initiative focused on accelerating AI infrastructure innovation through tightly integrated hardware and software solutions.
The team develops next-generation platforms that power large-scale AI training and inference workloads.
Projects emphasize performance, scalability, and efficiency in data center environments.

Impact

Engineers contribute directly to foundational technologies enabling breakthroughs in artificial intelligence.
This work affects a wide range of applications, from scientific research to industrial AI deployment.
Solutions are deployed globally across enterprise and cloud environments.

Visa sponsorship is available for qualified candidates.

NVIDIA is hiring an AI Infrastructure Engineer, DGXC Lepton

Similar Jobs

Database Platform Engineer

Principal + Staff Software Engineers

Lead Engineer – Platform & Infrastructure

KTO - Platform Engineer - SRE - Lever

Data & ML Platform Engineer (Hybrid)

Senior Engineer - Site Reliability Engineering

Related Articles

Become an AI Developer: Your Career Guide

Kubernetes Remote Jobs: AI & Cloud-Native Careers in 2026