About the Role

This role involves designing and maintaining highly available cloud services through automation, monitoring, and incident response, ensuring systems scale efficiently and remain stable under load.

Responsibilities

Design and implement scalable infrastructure for cloud platforms
Develop automation tools to streamline operations and reduce manual intervention
Monitor system performance and proactively address potential issues
Respond to incidents with a focus on rapid resolution and root cause analysis
Collaborate with development teams to improve service reliability
Maintain system uptime and optimize availability across services
Create and manage configurations for cloud environments
Support deployment pipelines and continuous integration workflows
Enforce security standards within infrastructure and deployment processes
Document system architecture and operational procedures

Nice to Have

Master's degree in a technical discipline
Experience supporting large-scale production systems
Background in site reliability or platform engineering
In-depth knowledge of CI/CD pipelines
Familiarity with service mesh technologies
Exposure to security compliance frameworks
Contributions to open-source projects
Experience with multi-region cloud deployments
Strong written and verbal communication skills
Ability to mentor junior engineers

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility based on location

Team

Part of the global cloud infrastructure team focused on scalable systems

Why Join Us

Work on cutting-edge cloud infrastructure supporting AI and high-performance computing
Collaborate with world-class engineers solving complex scalability challenges
Opportunity to influence architecture and operational practices

What We Offer

Comprehensive health and wellness benefits
Professional development and training programs
Employee resource groups and inclusive culture

Visa sponsorship available for qualified candidates

NVIDIA is hiring a Senior Site Reliability Engineer, Cloud
