NVIDIA is hiring a Senior Software Engineer – Inference Platform Infrastructure

Responsibilities

  • Develop automated systems for scalable inference operations, including setup, configuration, updates, rollbacks, and regular maintenance, emphasizing consistency and safety.
  • Design and refine deployment strategies for inference tasks on Kubernetes, covering deployment methods, dynamic scaling, multi-cluster setups, GPU resource management, and secure update procedures.
  • Treat platform reliability as a software problem: define and refine service level indicators (SLIs), service level objectives (SLOs), and error budgets, improve alerting quality, and automate responses to recurring issues.
  • Manage and maintain a large-scale infrastructure of GPU and datacenter systems, supporting hardware from early testing through production deployment.

Benefits

  • Eligible for equity compensation
  • Comprehensive benefits package; details are available on the company's official website

Other

  • Job applications will remain open until at least February 21, 2026.
  • Artificial intelligence tools are used in the recruitment process.
  • The company is dedicated to building a diverse and inclusive workplace and adheres to equal opportunity employment practices, prohibiting discrimination based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability, or any other legally protected status.

Required Skills

  • Kubernetes, Containers, Triton, TensorRT-LLM, KServe, Ray Serve, Python, C++, Distributed Systems, GPU Computing, ML Inference, Performance Optimization, Microservices, Cloud Infrastructure

About company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.

Job Details

  • Category: infrastructure
  • Posted: 3 months ago