NVIDIA is hiring a Senior HPC DevOps Engineer

Responsibilities

  • Architect, deploy, and manage large-scale high-performance computing and artificial intelligence clusters with advanced monitoring, logging, and alerting capabilities.
  • Build and use infrastructure-as-code tools to enable consistent, scalable provisioning of systems and environments.
  • Create and manage continuous integration and continuous delivery workflows to automate software deployment and improve release efficiency.
  • Write automation scripts and develop tools to streamline system deployment, configuration, and operational oversight.
  • Design and implement advanced networking automation solutions for complex infrastructure environments.
  • Diagnose and resolve technical issues across hardware, operating systems, and applications to ensure optimal performance and reliability.
  • Act as a technical expert by creating and disseminating best practices across internal engineering teams.
  • Contribute to research and development initiatives, including proofs of concept and proof-of-value projects for emerging technologies.

Other

  • The company supports diversity and is dedicated to fostering an inclusive workplace for all team members.
  • Employment decisions are made without regard to race, religion, color, national origin, sex, gender identity, sexual orientation, age, marital status, veteran status, or disability.
  • Applicants and employees receive reasonable accommodations to participate in hiring processes, perform job duties, and access employment benefits.

Required Skills

Jenkins, Ansible, Puppet/Chef, Linux, Redhat/CentOS, Ubuntu, InfiniBand, Ethernet, Slurm, Kubernetes, Lustre, HPC, DevOps, Windows

About company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Category: infrastructure
Posted 5 months ago