VIANT is looking for a Cloud Reliability Engineer to build and maintain resilient, scalable cloud platforms. You will work within our engineering organization to ensure high availability and performance of our core systems.
What You'll Do
- Design, implement, and manage monitoring, alerting, and observability solutions for cloud infrastructure
- Build automation for deployment, scaling, and recovery processes to improve system reliability
- Lead incident response, conduct post-mortems, and implement preventive measures
- Collaborate with development teams to establish and enforce SLOs, SLIs, and error budgets
- Optimize cloud resource utilization and manage costs effectively
What We're Looking For
- Proven experience designing and operating highly available, large-scale cloud environments
- Strong proficiency with infrastructure-as-code tools like Terraform or CloudFormation
- Expertise in container orchestration with Kubernetes and related ecosystem tools
- Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure)
- Experience implementing comprehensive monitoring with tools such as Prometheus, Grafana, or Datadog
- Strong scripting and automation skills in Python, Go, or Bash
- Systematic approach to problem-solving and incident management
Work Mode
This position is remote. We are also open to candidates based in Irvine, CA.
VIANT is an equal opportunity employer.


