MLOps Engineer responsible for building and operating reliable, efficient ML inference systems for a fast-growing distributed cloud infrastructure startup. The role focuses on model serving, GPU-based workloads, and decentralized AI/ML architectures in a remote-first EMEA setup.
Responsibilities
- Develop and manage production-grade model serving infrastructure using frameworks such as vLLM, TGI, or Triton
- Create deployment pipelines with blue/green and canary release strategies tailored to ML models (see the canary sketch after this list)
- Build and maintain auto-scaling mechanisms, multi-model serving setups, and intelligent routing for inference requests
- Improve efficiency in GPU usage, memory management, network performance, and model storage systems
- Design monitoring solutions to track inference latency, throughput, GPU utilization, cost, and system health (see the latency probe sketch after this list)
- Maintain model registries and implement CI/CD pipelines for automated, reproducible model deployments
- Oversee end-to-end ML system lifecycle, including development, production deployment, and on-call support
- Establish engineering standards and support platform scalability within a fast-paced startup environment
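To make the canary responsibility above concrete, here is a minimal sketch of one way to shift traffic during a rollout, assuming a Helm chart that exposes a `canaryWeight` value; the release name, chart path, and value key are hypothetical placeholders, not a prescribed setup:

```python
import subprocess

# Hypothetical names for illustration only: assumes Helm is installed
# and the chart routes `canaryWeight` percent of traffic to the canary.
RELEASE = "model-server"
CHART = "./charts/model-server"


def set_canary_weight(percent: int) -> None:
    """Shift `percent` of inference traffic to the canary release."""
    subprocess.run(
        [
            "helm", "upgrade", "--install", RELEASE, CHART,
            "--set", f"canaryWeight={percent}",
        ],
        check=True,
    )


if __name__ == "__main__":
    # Gradual rollout: hold at 5% while latency and error metrics are
    # compared against the stable release, then ramp up.
    set_canary_weight(5)
```

In practice a step like this would run inside a CI/CD pipeline, with automated metric checks gating each weight increase.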
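Similarly, for the monitoring responsibility, here is a minimal latency probe, a sketch assuming a vLLM-style OpenAI-compatible completions endpoint; the endpoint URL and model name are placeholders:

```python
import statistics
import time

import requests

# Placeholders for illustration: assumes a vLLM-style OpenAI-compatible
# server; swap in the real endpoint and model name.
ENDPOINT = "http://localhost:8000/v1/completions"
MODEL = "my-model"


def probe_latency(n: int = 20) -> None:
    """Send n short completion requests and report p50/p95 latency."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        resp = requests.post(
            ENDPOINT,
            json={"model": MODEL, "prompt": "ping", "max_tokens": 1},
            timeout=30,
        )
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    print(f"p50={p50 * 1000:.1f} ms  p95={p95 * 1000:.1f} ms")


if __name__ == "__main__":
    probe_latency()
```

A production version would export these numbers to a metrics backend such as Prometheus rather than print them; vLLM also exposes its own Prometheus metrics endpoint.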
Requirements
- Minimum of 4 years in MLOps, Platform Engineering, SRE, or related infrastructure roles, with a focus on ML systems
- Direct experience with model serving technologies such as vLLM, TGI, Triton, or comparable tools
- Solid background managing containerized GPU workloads in production on orchestration platforms such as Kubernetes
- Proven experience with MLOps tooling including model registries, experiment tracking, and automated deployment systems
- Proficient in Python and infrastructure-as-code tools such as Terraform, Helm, or equivalent
- Strong grasp of distributed systems, performance optimization, and reliability engineering in production environments
- Ability to leverage AI coding assistants effectively to speed up development and debugging tasks
- Demonstrated ownership and ability to work independently in a remote-first setting
Nice to Have
- Experience working with ML platforms like Kubeflow, MLflow, or KubeAI
- Knowledge of GPU scheduling, CUDA/ROCm optimization, or multi-tenant inference architectures
- Track record in optimizing costs across different GPU types and inference workloads
- Background in early-stage startups or building greenfield infrastructure projects
- Proven ability to design and implement production systems from scratch rather than maintaining legacy systems
Tech Stack
vLLM, TGI, Triton, Python, Terraform, Helm, Kubernetes, GPU-based workloads, CUDA, ROCm, CI/CD, model registries, experiment tracking, distributed systems, container orchestration
Benefits
- Lead critical infrastructure development for a rapidly expanding AI-native cloud platform
- Design and implement foundational ML inference systems from the ground up in a high-growth environment
- Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture
- Develop deep expertise in next-generation AI infrastructure and large-scale model serving
- Shape core engineering decisions and define scalable best practices for the organization
Work Arrangement
Global, fully remote role with a focus on EMEA time zones
Team
Distributed team within a fast-scaling startup; collaborates with infrastructure, platform, and applied AI teams
Equal Opportunity
- Fair, transparent, and inclusive recruitment process
- No discrimination based on age, disability, gender, gender identity or expression, marital or civil partner status, pregnancy or maternity, race, religion or belief, sex, or sexual orientation
Additional Information
- Location: Fully remote (EMEA timezone)
- Start date: ASAP
- Languages: Fluent English required
- Industry: Cloud Computing / AI / European Deep-Tech SaaS
- Personal data will be processed lawfully, fairly, and securely under GDPR for recruitment purposes only
- Role includes on-call responsibilities
- Remote-first environment requiring independent work and strong ownership