As a Senior Solutions Architect specializing in Cloud Infrastructure and DevOps, you will play a central role in shaping how enterprise customers design, deploy, and manage advanced computing environments. Your work will focus on AI and high-performance computing (HPC) platforms, with a strong emphasis on Kubernetes, GPU integration, and infrastructure automation.

Key Responsibilities

Guide clients through the design and optimization of large-scale computing systems, including the implementation of monitoring, logging, and workload orchestration with Kubernetes and Linux-based schedulers
Deliver hands-on technical support across the full stack, from hardware and operating systems to container platforms, networking, and storage
Evaluate existing infrastructure and recommend production-ready, Kubernetes-driven container platforms integrated with enterprise storage and networking
Develop and maintain technical methodologies, operational playbooks, and best practices for internal and customer use
Support research initiatives and lead proof-of-concept projects to validate new architectures, features, and upgrade paths
Produce detailed technical documentation, including runbooks, onboarding guides, and reference architectures
Act as the primary technical advisor for key accounts, influencing long-term decisions around platform architecture and DevOps strategy

Required Qualifications

Advanced degree in Computer Science, Engineering, Physics, Mathematics, or a related field, or equivalent experience
At least eight years of experience in roles focused on cloud infrastructure, automation, and scalable system design
Proven experience deploying and tuning HPC and AI clusters, with strong knowledge of data center networking and system architecture
Hands-on experience deploying and optimizing NVIDIA GPU-based systems, including CUDA integration and GPU workload analysis
Extensive Kubernetes experience, particularly in GPU and HPC environments, covering orchestration, scaling, and resource scheduling
Strong command of Linux systems (Red Hat, Ubuntu), OS security, and networking protocols
Experience with high-performance and parallel file systems such as Lustre and GPFS, local file systems such as ZFS and XFS, and Kubernetes-native storage solutions
Proficiency in scripting (Python, Bash) and Infrastructure-as-Code tools like Ansible and Terraform
Familiarity with observability tools such as Prometheus (metrics), Loki (logs), and Grafana (dashboards) for building resilient, well-monitored systems
Track record of designing scalable technical solutions and advising enterprise clients through architectural reviews and technical workshops

Preferred Skills

Experience with CI/CD pipelines and automated software delivery
Hands-on experience with the NVIDIA GPU Operator and Network Operator for managing GPU and network resources in Kubernetes
Direct experience with NVIDIA Base Command Manager (BCM) for large-scale GPU cluster provisioning and management
Working knowledge of RDMA technologies, including InfiniBand and RoCE, in AI or HPC contexts

Technology Environment

Key tools and platforms include Kubernetes, Linux (Red Hat, Ubuntu), Lustre, GPFS, ZFS, XFS, Python, Bash, Ansible, Terraform, Grafana, Prometheus, Loki, CUDA, NVIDIA GPUs, NVIDIA Base Command Manager (BCM), RDMA, InfiniBand, RoCE, and the NVIDIA GPU and Network Operators.

NVIDIA is hiring a Senior Solutions Architect, Cloud Infrastructure and DevOps
