NVIDIA is looking for a Senior Software Engineer, DevOps - Server Infrastructure to architect the build and deployment process of GPU-based servers for our Metropolis platforms. This role focuses on automating the delivery pipeline and managing infrastructure for AI and machine learning applications in streaming video and data analytics.
What You'll Do
- Build, deploy, and maintain GPU-based servers for use in Metropolis blueprints, platforms, and machine learning applications across test, development, and production environments.
- Lead the design of, and take ownership of, infrastructure components for network topologies, streaming servers, and security.
- Collaborate with different software, IT, Security, and hardware teams across geographies to solve critical problems and performance issues.
- Establish the configuration environment for servers by creating processes and tools for software development, debugging, testing, benchmarking, and documentation.
- Automate provisioning and management of bare-metal servers, internal cloud, Microsoft Azure, and Amazon Web Services (AWS).
- Implement automated monitoring and operating procedures for a range of domains across on-premises and cloud environments.
- Build and maintain infrastructure for the delivery of software artifacts produced by Metropolis application development teams.
- Create detailed documentation that allows customers, partners, and system integrators to replicate the prototyped deployment architecture.
What We're Looking For
- BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
- 8+ years of proven experience in configuration management and Linux server administration in an engineering hardware lab environment.
- Good programming and tooling skills in Python, shell scripting, Ansible, Terraform, Helm templates, Docker, and Docker Compose.
- Good understanding of configuring and managing Elasticsearch, Logstash, Kibana, and the Kafka ecosystem.
- Software build, package, and delivery skills with Jenkins, Pipeline Scripting, Dockerfile, Artifactory integration, Container Registry, and Helm Package repositories.
- Good understanding of the Kubernetes ecosystem and Helm-based application deployment patterns.
- Infrastructure provisioning automation with AWS, GCP, Azure.
- Experience building configuration management, monitoring, and automation tools.
- Familiarity with managing edge servers at large scale, deployed in indoor and outdoor environments.
- Strong interpersonal skills.
Technical Stack
- Languages & Scripting: Python, Shell Scripting
- Infrastructure as Code: Ansible, Terraform
- Containers & Orchestration: Docker, Docker Compose, Kubernetes, Helm
- Observability: Elasticsearch, Logstash, Kibana, Kafka
- CI/CD & Delivery: Jenkins, Artifactory
- Cloud Platforms: AWS, GCP, Azure
- Operating System: Linux
Team & Environment
You will be a key member of the Metropolis team, collaborating with software, IT, Security, and hardware teams across geographies.
Benefits & Compensation
- Compensation: $184,000 USD - $287,500 USD, plus eligibility for equity
- Highly competitive salaries
- Comprehensive benefits package
- Equity eligibility
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.




