Australia Employment

NVIDIA is hiring a Solution Architect

About the Role

NVIDIA is hiring a Senior AI/HPC Engineer to join its Infrastructure Specialist team. In this role, you will be the technical face to the customer, interacting with partners and internal teams to analyze, define, and implement large-scale AI/HPC projects across Networking, System Design, and Automation.

What You'll Do

  • Deploy, manage, and maintain AI/HPC infrastructure in Linux-based environments.
  • Act as the domain expert for customers from planning calls through implementation.
  • Create comprehensive handover documentation and perform knowledge transfers for sophisticated systems.
  • Provide feedback to internal teams through bug reports, workarounds, and suggested improvements.

What We're Looking For

  • A BS/MS/PhD or equivalent experience in Computer Science, Engineering, Physics, Mathematics, or a related field.
  • 5+ years providing in-depth support and deployment services for hardware and software products.
  • Expert knowledge and experience with Linux System Administration, including process management, package management, kernel management, boot troubleshooting, and performance optimization.
  • Experience with cluster management technologies and schedulers such as SLURM, LSF, or UGE.
  • Scripting proficiency and strong organizational skills with the ability to prioritize tasks with limited supervision.
  • Excellent verbal and written English skills and good interpersonal skills for resolving critical customer issues.
  • Industry-standard Linux certifications.
  • Experience with advanced networking, including routing, tuning, and monitoring.

Nice to Have

  • Hands-on experience with MPI (e.g., OpenMPI, MPICH), including distributed communication programming and cluster debugging.
  • In-depth understanding of NCCL principles and expertise in collective communication optimization for NVIDIA GPU clusters.
  • Experience deploying and optimizing high-speed networks (InfiniBand/Ethernet) and understanding their impact on GPU cluster performance.
  • Familiarity with automation tools like Ansible, Salt, or Puppet for batch configuration and operational automation.
  • Knowledge and hands-on experience with Kubernetes for container orchestration, resource scheduling, and integration with HPC environments.

Technical Stack

  • Linux, SLURM, LSF, UGE
  • MPI (OpenMPI, MPICH), NCCL
  • InfiniBand, Ethernet
  • Ansible, Salt, Puppet, Kubernetes

Team & Environment

You will join NVIDIA's Infrastructure Specialist team, a diverse and supportive environment where everyone is inspired to do their best work.

NVIDIA is an equal opportunity employer and values diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Required Skills
Linux, SLURM, LSF, UGE, MPI, NCCL, InfiniBand, Ethernet, Ansible, Salt, Cluster Management, Scripting, System Administration
About the Company
NVIDIA

NVIDIA is the platform upon which every new AI‑powered application is built.

Job Details
Department: Information Technology
Category: Infrastructure
Posted: 14 days ago