Berkeley, CA Hybrid Employment

Valency is hiring a Senior AI-Native DevOps / Operations Engineer (AMER)

About the Role

The role involves building and operating resilient, automated systems tailored for AI workloads, ensuring high availability, observability, and rapid iteration across distributed environments.

Compensation

Competitive salary and equity package

Work Arrangement

Remote within the Americas

Team

Small, autonomous engineering team focused on AI infrastructure

What You’ll Do

  • Design and implement infrastructure that natively supports AI model training and inference.
  • Automate deployment pipelines for machine learning models and supporting services.
  • Monitor system performance and proactively address reliability concerns.
  • Collaborate with research and engineering teams to operationalize experimental systems.
  • Optimize cloud resource usage for cost efficiency and scalability.
  • Maintain secure, compliant environments aligned with data governance policies.
  • Troubleshoot complex issues across distributed compute and storage layers.
  • Develop tooling to streamline developer workflows and reduce operational overhead.
  • Lead incident response and post-mortem analysis for production systems.
  • Contribute to architectural decisions for long-term platform sustainability.

What We Look For

  • Proven experience with cloud platforms such as AWS, GCP, or Azure.
  • Strong scripting skills in Python, Bash, or similar languages.
  • Familiarity with containerization and orchestration tools like Docker and Kubernetes.
  • Hands-on experience with infrastructure-as-code tools such as Terraform or Pulumi.
  • Deep understanding of networking, security, and identity management in cloud environments.
  • Experience with CI/CD systems and automated testing frameworks.
  • Knowledge of observability tools including logging, metrics, and tracing platforms.
  • Background in managing GPU-accelerated workloads is highly desirable.
  • Ability to debug performance bottlenecks in distributed systems.
  • Clear communication skills for cross-functional collaboration.

Why This Role Stands Out

  • Work directly on infrastructure that powers cutting-edge AI applications.
  • Shape operational practices in a growing technical organization.
  • Solve challenging scalability problems at the intersection of ML and systems engineering.
  • Influence tooling and architecture decisions from an early stage.
  • Operate with high autonomy and measurable impact on product delivery.

Application Process

  • Submit your resume and a brief note explaining your interest.
  • Complete a technical screening focused on real-world scenarios.
  • Participate in a pair-programming session with the engineering team.
  • Attend a final interview with leadership to discuss alignment and expectations.

About the Company
Valency
Valency Systems is a small, dynamic team of engineers, scientists, and researchers building the global hub for the agentic research era. We're based in Berkeley, California, and we're building something that matters. If you care about open science, advancing research at the speed of thought, and using AI to accelerate discovery, we'd love to talk. We're a hybrid team: we usually come together in person 3 days a week at our office, with the option of 2 days of flexible remote work.
Job Details
Department: Engineering & Technology
Category: Infrastructure
Posted: a month ago