About the Role
In this role, you will build and operate resilient, automated systems for AI workloads, ensuring high availability, observability, and rapid iteration across distributed environments.
Compensation
Competitive salary and equity package
Work Arrangement
Remote within the Americas
Team
Small, autonomous engineering team focused on AI infrastructure
What You’ll Do
- Design and implement infrastructure that natively supports AI model training and inference.
- Automate deployment pipelines for machine learning models and supporting services.
- Monitor system performance and proactively address reliability concerns.
- Collaborate with research and engineering teams to operationalize experimental systems.
- Optimize cloud resource usage for cost efficiency and scalability.
- Maintain secure, compliant environments aligned with data governance policies.
- Troubleshoot complex issues across distributed compute and storage layers.
- Develop tooling to streamline developer workflows and reduce operational overhead.
- Lead incident response and post-mortem analysis for production systems.
- Contribute to architectural decisions for long-term platform sustainability.
What We Look For
- Proven experience with cloud platforms such as AWS, GCP, or Azure.
- Strong scripting skills in Python, Bash, or similar languages.
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Hands-on experience with infrastructure-as-code tools such as Terraform or Pulumi.
- Deep understanding of networking, security, and identity management in cloud environments.
- Experience with CI/CD systems and automated testing frameworks.
- Knowledge of observability tools including logging, metrics, and tracing platforms.
- Background in managing GPU-accelerated workloads is highly desirable.
- Ability to debug performance bottlenecks in distributed systems.
- Clear communication skills for cross-functional collaboration.
Why This Role Stands Out
- Work directly on infrastructure that powers cutting-edge AI applications.
- Shape operational practices in a growing technical organization.
- Solve challenging scalability problems at the intersection of ML and systems engineering.
- Influence tooling and architecture decisions from an early stage.
- Operate with high autonomy and measurable impact on product delivery.
Application Process
1. Submit your resume and a brief note explaining your interest.
2. Complete a technical screening focused on real-world scenarios.
3. Participate in a pair-programming session with the engineering team.
4. Final interview with leadership to discuss alignment and expectations.