Bangladesh, South Asia · Remote (Country)

NexGen Cloud is hiring a Lead MLOps Engineer

Responsibilities

Own the design, implementation, and evolution of core MLOps systems across Hyperstack — including the infrastructure and workflows that underpin AI Studio
Build and improve systems that orchestrate model training, fine-tuning, evaluation, and deployment — engineered for long-running, resource-intensive GPU workloads
Own production readiness across ML infrastructure — monitoring, alerting, incident response, and continuous improvement based on real-world usage
Define and embed strong MLOps practices across teams — model versioning, reproducibility, deployment safety, rollback strategies, and environment management
Provide technical leadership through architecture decisions, implementation guidance, and shared standards — working closely with Product, Engineering, and cross-functional teams

Requirements

Proven experience designing, building, and operating production ML infrastructure, platform systems, or MLOps workflows in cloud environments
Hands-on Python development skills, with experience building backend systems, automation, and developer or platform tooling
Experience supporting LLM, generative AI, or fine-tuning workflows in production — including training, evaluation, deployment, inference, and lifecycle management
Production-grade experience with Docker, Kubernetes, CI/CD, and infrastructure-as-code in real, operational environments
Experience owning complex, asynchronous, or resource-intensive workloads end to end — including orchestration, reliability, observability, and incident response
Ability to work cross-functionally and provide technical leadership through influence — shaping standards, direction, and ways of working across engineering teams

Nice to Have

Exposure to GPU-intensive, distributed, or performance-sensitive ML workloads
Experience building internal developer platforms or tooling that improve experimentation, reproducibility, and delivery speed for ML teams
Background in cloud infrastructure, platform products, or technically complex B2B software

Benefits

Competitive salary and an annual discretionary bonus scheme
Employee wellbeing benefits
25 days of holiday, plus public holidays
Flexible working arrangements (remote or hybrid, depending on role and location)
Real ownership and autonomy, with the trust to take initiative and experiment
The opportunity to make a visible, meaningful impact as we scale
Clear career progression and growth opportunities in a fast-growing company
A collaborative, international culture built on trust, transparency, and ownership
The chance to help shape NexGen Cloud's team, culture, and future alongside ambitious, mission-driven colleagues

Required Skills

Docker, Kubernetes, CI/CD

About company

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers, from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature. We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.

Job Details

Department: Software Engineering

Category: Data

Posted a month ago