NVIDIA is hiring a Senior HPC DevOps Engineer to help build the supercomputers and HPC clusters of the future. In this role, you will be a key player in groundbreaking advancements in artificial intelligence and GPU computing, driving the latest breakthroughs in at-scale system design and tuning.
What You'll Do
- Design, implement, and maintain large-scale HPC/AI clusters with state-of-the-art monitoring, logging, and alerting systems.
- Utilize and develop tools to manage infrastructure as code, ensuring scalable and repeatable deployments.
- Develop and maintain continuous integration and continuous delivery (CI/CD) pipelines to automate and streamline deployment processes.
- Develop scripts and tools to automate deployment, configuration management, and operational monitoring.
- Develop complex networking automation.
- Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.
- Serve as a technical resource, developing and sharing best practices with internal teams.
- Support R&D activities and engage in proofs of concept (PoCs) and proofs of value (PoVs) for future improvements.
What We're Looking For
- B.Sc. in Computer Science, Engineering, or a related field with 5+ years of experience.
- Deep knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
- Advanced proficiency in programming and scripting languages, with a solid understanding of object-oriented programming principles.
- Familiarity with Jenkins, Ansible, Puppet/Chef.
- Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.
- Deep understanding of networking technologies such as InfiniBand and Ethernet.
- Experience with job scheduling and orchestration tools such as Slurm and Kubernetes.
- Background with multiple storage solutions like Lustre, GPFS, ZFS, and XFS.
- Expertise with virtual systems (VMware, Hyper-V, KVM, Citrix).
- Familiarity with cloud platforms (AWS, Azure, Google Cloud).
Nice to Have
- Proven networking experience, or strong networking knowledge gained through professional training.
- Knowledge of CPU and/or GPU architecture.
- Understanding of Kubernetes and container-related microservice technologies.
- Experience with GPU-focused hardware/software (DGX, CUDA).
- Background with RDMA (InfiniBand or RoCE) fabrics.
Technical Stack
- Automation & CI/CD: Jenkins, Ansible, Puppet/Chef
- Operating Systems: Windows, Linux (Red Hat/CentOS, Ubuntu)
- Networking: InfiniBand, Ethernet
- Orchestration: Slurm, Kubernetes
- Storage: Lustre, GPFS, ZFS, XFS
- Virtualization: VMware, Hyper-V, KVM, Citrix
- Cloud Platforms: AWS, Azure, Google Cloud
- GPU Technologies: DGX, CUDA
NVIDIA values diversity and is committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We provide reasonable accommodations to ensure all individuals can participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment.


