NVIDIA is looking for a Solutions Architect, DGX Cloud to guide and enable the successful adoption at scale of DGX Cloud and NVIDIA AI Enterprise Software in production. You will work closely with DGX Cloud Partners as their trusted technical advisor to ensure they accomplish their business goals.
What You'll Do
- Work closely with DGX Cloud Partners, become their trusted technical advisor, advocate for their needs, and ensure they are successful in accomplishing their business goals with the platform.
- Shorten NVIDIA Cloud Partner onboarding time and improve cluster manageability and reliability.
- Scale knowledge, reach, and opportunities by building and educating vertical teams and communities on DGX Cloud and NVIDIA Reference Architectures.
- Communicate findings gathered from the field to our Reference Architecture teams.
- Provide technical education and facilitate field product feedback to improve DGX Cloud.
- Enable partners to participate in the DGX Cloud Ecosystem with the goal of end-user satisfaction and increased sales.
What We're Looking For
- Strong foundational expertise, from a BS, MS, or Ph.D. degree in Engineering, Mathematics, Physics, Computer Science, or Data Science (or equivalent experience).
- 5+ years of proven experience with one or more Cloud Service Providers (AWS, Azure, GCP, or OCI), NVIDIA Cloud Partners (CoreWeave, Lambda Labs, Crusoe, etc.), and cloud-native architectures and software.
- Demonstrated experience in technical leadership, strong understanding of NVIDIA technologies, and success in working with customers.
- Expertise with parallel filesystems (e.g. Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni-Path, RoCE, and Gig-E).
- Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Kubernetes, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster solutions.
- Proficient in deploying GPU applications with Slurm, Kubernetes, Docker, Helm, and container registries.
- Experience with Linux-based configuration management and monitoring solutions, system administration, OS installation, configuration, and troubleshooting.
- Knowledge of networking technologies (e.g. router, firewall, load balancer, DNS, VPN) for complex infrastructure configuration.
Nice to Have
- Experience using DGX Cloud and NVIDIA AI Enterprise software, including Base Command Manager, NeMo, and NVIDIA's Inference Microservices.
- Experience with AI application development and deployment.
- Background in deploying and configuring observability tooling, including Grafana, Prometheus, W&B, Nagios, and Zabbix.
- Experience with high-performance or large-scale computing environments.
Technical Stack
- Cloud: AWS, Azure, GCP, OCI
- Storage: Lustre, GPFS, BeeGFS, WekaIO
- Interconnect: InfiniBand, Omni-Path, RoCE, Gig-E
- AI/ML: Machine Learning, Deep Learning, MLOps, LLMOps
- Orchestration: Slurm, Kubernetes, MPI
- Infrastructure as Code: Ansible, Terraform
- Containers: Docker, Helm
- Observability: Grafana, Prometheus, W&B, Nagios, Zabbix
Team & Environment
Part of the DGX Cloud SA Segment Team.
Benefits & Compensation
- Compensation range: 148,000 USD - 235,750 USD + equity.
- Equity and benefits.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.



