Sonia Solutions is hiring an ML Platform Engineer to support and enhance our Kubernetes-based infrastructure for AI/ML workloads, with a focus on reliability, scalability, and performance. You will be instrumental in deploying and optimizing LLM inference systems and refining our MLOps practices to power our mission to revolutionize healthcare.
What You'll Do
- Support and enhance Kubernetes-based infrastructure in cloud environments running ML/LLM workloads and general applications
- Deploy and optimize LLM inference systems
- Design, build, and improve MLOps/DevOps pipelines to support the entire development lifecycle
- Manage GPU scheduling and autoscaling with Kubernetes-native tooling
- Ensure observability and alerting across the platform
- Operate and troubleshoot supporting infrastructure
- Contribute to platform reliability, security, and performance through automation and best practices
What We're Looking For
- 5+ years of experience in MLOps or SRE
- Strong hands-on Kubernetes experience, including GitOps (Flux or ArgoCD), Kustomize, Helm, and production troubleshooting
- Familiarity with LLM inference deployment and optimization in Kubernetes (e.g., vLLM, LMCache, llm-d)
- Experience with MLOps supporting tools such as MLflow or Argo Workflows
- Understanding of GPU resource orchestration in Kubernetes environments
- In-depth knowledge of observability tools such as VictoriaMetrics, VictoriaLogs, and Grafana
- Knowledge of database and broker administration (PostgreSQL, Redis, and RabbitMQ)
- Solid scripting skills in Python
- Comfort working with cloud platforms (OVHcloud, AWS, GCP, or Azure)
Nice to Have
- Experience with audio ML models or real-time inference
- Exposure to CI/CD practices tailored for ML systems
- Familiarity with Kubernetes networking, security, or performance tuning
Technical Stack
- Kubernetes, GitOps (Flux, ArgoCD), Kustomize, Helm
- vLLM, LMCache, llm-d, MLflow, Argo Workflows
- VictoriaMetrics, VictoriaLogs, Grafana, PostgreSQL, Redis, RabbitMQ
- Python, OVHcloud, AWS, GCP, Azure
Team & Environment
You will work closely with ML engineers.
Benefits & Compensation
- Full ownership of a mission-critical platform
- A team that values curiosity, learning, and experimentation
- Remote-first setup with the option to work from the Berlin office
- Competitive salary depending on experience
- Work on AI infrastructure that directly impacts healthcare innovation
Work Mode
This is a hybrid role open to candidates in Germany and Luxembourg, with office options in Berlin and Luxembourg.
Sonia Solutions is an equal opportunity employer.


