About the Role
Develop and maintain scalable AI infrastructure solutions that support high-performance computing environments for advanced machine learning applications.
Responsibilities
- Architect robust systems for AI-driven data center operations
- Optimize infrastructure performance for large-scale model training
- Collaborate on developing automation tools for system deployment
- Ensure reliability and scalability of compute and storage resources
- Integrate hardware and software components for AI workloads
- Troubleshoot complex system-level issues in distributed environments
- Support the deployment of AI-optimized server platforms
- Improve system efficiency through performance benchmarking
- Develop monitoring solutions for infrastructure health
- Work closely with research and development teams on system requirements
- Implement secure and resilient infrastructure designs
- Contribute to the design of next-generation data center technologies
- Manage firmware and driver integration for AI platforms
- Enable low-latency communication across computing nodes
- Support rapid iteration cycles for infrastructure upgrades
- Ensure compatibility across heterogeneous computing architectures
- Drive improvements in power and thermal efficiency
- Participate in defining standards for AI infrastructure
- Maintain documentation for system configurations and processes
- Respond to critical production incidents with speed and precision
- Evaluate emerging technologies for potential integration
- Optimize resource utilization in virtualized environments
- Support global deployment of AI infrastructure solutions
- Collaborate across engineering disciplines for end-to-end delivery
- Contribute to capacity planning for growing AI workloads
Nice to Have
- Master’s or PhD in a technical field
- Direct experience with AI model training infrastructure
- Background in cloud-scale data center operations
- Experience with RDMA and high-speed interconnects
- Knowledge of AI frameworks like TensorFlow or PyTorch
- Familiarity with hardware accelerators beyond GPUs
- Experience with large-scale fleet management
- Contributions to open-source infrastructure projects
- Published work in systems or AI conferences
- Experience with formal verification methods
Compensation
Competitive salary and a comprehensive benefits package, including equity incentives and performance bonuses.
Work Arrangement
Hybrid work model with flexibility based on role and location.
Team
You will be part of a global engineering team focused on advancing AI infrastructure at scale.
About DGXC Lepton
- DGXC Lepton is a specialized initiative focused on accelerating AI infrastructure innovation through tightly integrated hardware and software solutions.
- The team develops next-generation platforms that power large-scale AI training and inference workloads.
- Projects emphasize performance, scalability, and efficiency in data center environments.
Impact
- Engineers contribute directly to foundational technologies enabling breakthroughs in artificial intelligence.
- The work affects a wide range of applications, from scientific research to industrial AI deployment.
- Solutions are deployed globally across enterprise and cloud environments.
Visa sponsorship is available for qualified candidates.

