Gather AI seeks a Senior ML Engineer (Ops) to own the infrastructure engine behind our machine learning platform. This is a hands-on, high-ownership role focused on the "last mile" problem: ensuring sophisticated vision models run reliably at scale in production. You will be the primary builder and maintainer of our MLOps platform, leading the transition from manual deployments to a fully automated, enterprise-grade system.
What You'll Do
- Migrate box and barcode detection pipelines to cloud infrastructure following MLOps best practices.
- Build and maintain CI/CD pipelines for deployment across production and non-production environments.
- Implement automated rollback, canary, and blue-green deployment strategies for ML microservices.
- Build out a multi-tenant MLOps platform using tools like Prefect, ZenML, or similar orchestration frameworks.
- Establish a centralized model registry and versioning system for all production assets.
- Instrument observability across the ML stack — logging, metrics, and distributed tracing — to ensure reliability at scale.
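To give a flavor of the automated-rollback and canary work described above, here is a minimal sketch in plain Python of error-rate-based canary routing. All names, thresholds, and logic are illustrative assumptions for this posting, not Gather AI's actual system:

```python
import random


class CanaryRouter:
    """Illustrative sketch: send a fraction of traffic to a canary model
    and roll back automatically if its error rate exceeds a threshold.
    Names and thresholds are hypothetical, not a real production system."""

    def __init__(self, canary_fraction=0.1, error_threshold=0.05, min_requests=100):
        self.canary_fraction = canary_fraction    # share of traffic sent to the canary
        self.error_threshold = error_threshold    # max tolerated canary error rate
        self.min_requests = min_requests          # sample size before judging the canary
        self.canary_requests = 0
        self.canary_errors = 0
        self.rolled_back = False

    def route(self):
        """Return 'canary' or 'stable' for the next incoming request."""
        if self.rolled_back:
            return "stable"
        return "canary" if random.random() < self.canary_fraction else "stable"

    def record(self, target, ok):
        """Record a request outcome; trigger rollback if the canary is unhealthy."""
        if target != "canary" or self.rolled_back:
            return
        self.canary_requests += 1
        if not ok:
            self.canary_errors += 1
        if (self.canary_requests >= self.min_requests
                and self.canary_errors / self.canary_requests > self.error_threshold):
            self.rolled_back = True  # stop routing traffic to the canary
```

In production this decision loop would typically live in the service mesh or deployment controller (e.g., driven by metrics from the observability stack) rather than in application code; the sketch only shows the shape of the safeguard.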
What We're Looking For
- 6+ years of industry experience (outside academia) in ML engineering, MLOps, or infrastructure engineering.
- Deep operational fluency with Kubernetes and Docker for ML workload orchestration.
- Strong production-grade Python skills with a track record of hardening research code into scalable microservices.
- Hands-on experience with CI/CD for ML (e.g., GitHub Actions, GitLab CI) and model serving frameworks (e.g., KServe, SageMaker, Vertex AI Endpoints).
- Experience with pipeline orchestration and model lifecycle tools such as Airflow, MLflow, Kubeflow, or Flyte.
- Proven ownership of production system reliability, including SRE principles, observability stacks, and automated failure safeguards.
Nice to Have
- Prior experience building end-to-end MLOps pipelines (data, model, and inference) from scratch.
- Domain experience in logistics, supply chain, or robotics-adjacent cloud platforms.
- Familiarity with feature stores and training/serving data consistency patterns.
- Experience with Infrastructure as Code tools such as Terraform.
Technical Stack
- Orchestration: Kubernetes, Docker
- Core Language: Python
- CI/CD: GitHub Actions, GitLab CI
- Model Serving: KServe, SageMaker, Vertex AI Endpoints
- Pipeline Orchestration: Airflow, MLflow, Kubeflow, Flyte
- MLOps Platforms: Prefect, ZenML
- Infrastructure as Code: Terraform
Team & Environment
Our Engineering team builds the systems that turn cutting-edge ML research into reliable, production-grade infrastructure. We operate at the intersection of machine learning, cloud infrastructure, and real-world logistics. We value operational excellence, first-principles thinking, and the kind of engineering that makes complex systems look effortless.