Full-time

NVIDIA is hiring a Solutions Architect, DGX Cloud

About the Role

NVIDIA is looking for a Solutions Architect, DGX Cloud, to guide and enable successful, large-scale production adoption of DGX Cloud and NVIDIA AI Enterprise software. You will work closely with DGX Cloud Partners as their trusted technical advisor to help them accomplish their business goals.

What You'll Do

  • Work closely with DGX Cloud Partners, become their trusted technical advisor, advocate for their needs, and ensure they are successful in accomplishing their business goals with the platform.
  • Shorten NVIDIA Cloud Partner onboarding time and improve cluster manageability and reliability.
  • Scale knowledge, reach, and opportunities by building and educating vertical teams and communities on DGX Cloud and NVIDIA Reference Architectures.
  • Communicate findings gathered from the field to our Reference Architecture teams.
  • Provide technical education and facilitate field product feedback to improve DGX Cloud.
  • Enable partners to participate in the DGX Cloud Ecosystem with the goal of end-user satisfaction and increased sales.

What We're Looking For

  • Strong foundational expertise from a BS, MS, or Ph.D. in Engineering, Mathematics, Physics, Computer Science, or Data Science (or equivalent experience).
  • 5+ years of proven experience with one or more Cloud Service Providers (AWS, Azure, GCP, or OCI), NVIDIA Cloud Partners (CoreWeave, Lambda Labs, Crusoe, etc.), and cloud-native architectures and software.
  • Demonstrated experience in technical leadership, strong understanding of NVIDIA technologies, and success in working with customers.
  • Expertise with parallel filesystems (e.g. Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni-Path, RoCE, and Gig-E).
  • Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Kubernetes, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster solutions.
  • Proficiency in deploying GPU applications with Slurm, Kubernetes, Docker, Helm, and container registries.
  • Experience with Linux-based configuration management and monitoring solutions, system administration, OS installation, configuration, and troubleshooting.
  • Knowledge of networking technologies (e.g. router, firewall, load balancer, DNS, VPN) for complex infrastructure configuration.

Nice to Have

  • Experience using DGX Cloud and NVIDIA AI Enterprise software, including Base Command Manager, NeMo, and NVIDIA Inference Microservices (NIM).
  • Experience with AI application development and deployment.
  • Background deploying and configuring observability tooling, including Grafana, Prometheus, W&B, Nagios, and Zabbix.
  • Experience with high performance or large-scale computing environments.

Technical Stack

  • Cloud: AWS, Azure, GCP, OCI
  • Storage: Lustre, GPFS, BeeGFS, WekaIO
  • Interconnect: InfiniBand, Omni-Path, RoCE, Gig-E
  • AI/ML: Machine Learning, Deep Learning, MLOps, LLMOps
  • Orchestration: Slurm, Kubernetes, MPI
  • Infrastructure as Code: Ansible, Terraform
  • Containers: Docker, Helm
  • Observability: Grafana, Prometheus, W&B, Nagios, Zabbix

Team & Environment

Part of the DGX Cloud SA Segment Team.

Benefits & Compensation

  • Compensation range: 148,000 USD to 235,750 USD.
  • Equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.

Required Skills
AWS, Azure, GCP, OCI, InfiniBand, Lustre, GPFS, BeeGFS, WekaIO, Omni-Path, High Performance Computing, Distributed Storage, Cloud Architecture
About the Company
NVIDIA

NVIDIA is the platform upon which every new AI‑powered application is built.

Job Details
Category: Infrastructure
Posted 7 months ago