On-site Full-time

Fundamental is hiring an MLOps Team Lead

About the Role

Fundamental is looking for an MLOps Team Lead to lead and mentor a team of MLOps engineers, define the strategic roadmap, and architect scalable ML infrastructure and pipelines. You will bridge the critical gap between research and production in our mission-driven, low-ego environment.

What You'll Do

  • Lead and mentor a team of MLOps engineers, fostering technical growth and a culture of operational excellence
  • Define and drive the MLOps roadmap, aligning infrastructure capabilities with Research, Engineering, and Product objectives
  • Establish best practices, standards, and processes for ML infrastructure, deployment, and operations
  • Own technical decision-making for ML infrastructure architecture and tooling choices
  • Architect and oversee scalable, automated machine learning pipelines, CI/CD workflows, and orchestration frameworks
  • Drive the design and implementation of robust model serving infrastructure using platforms like Triton, TorchServe, TensorFlow Serving, and KServe
  • Define inference architecture strategy optimized for ultra-low latency and high throughput
  • Design and maintain feature stores, robust data pipelines, and scalable storage solutions to efficiently handle large volumes of data
  • Collaborate with research teams to bridge the gap between experimentation and production
  • Define logging, alerting, and monitoring strategy to track model performance, drift, and system reliability

What We're Looking For

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
  • 7+ years of experience in MLOps, with 3+ years in a technical leadership role
  • Strong software engineering skills in Python, with experience in Bash and/or Go
  • Proven track record of building and leading high-performing MLOps or infrastructure teams
  • Experience designing and building MLOps infrastructure from the ground up
  • Deep experience with MLOps platforms (MLflow, WandB, etc.) and frameworks (PyTorch, TensorFlow, etc.)
  • Deep experience with model serving frameworks (Triton, TorchServe, TensorFlow Serving, KServe) for highly scalable, low-latency inference
  • Experience building and managing data pipelines to support both model training and inference
  • Solid experience with Kubernetes on a major cloud provider (AWS, GCP, or Azure) and with infrastructure as code (Terraform, Helm, GitOps)
  • Proficient with observability and monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry)
  • Excellent communication skills, with the ability to translate between research and production contexts

Nice to Have

  • Experience with workflow orchestration tools (Kubeflow, Airflow, Argo Workflows)
  • Experience with FastAPI and backend applications
  • Familiarity with data platforms like Databricks or Snowflake
  • Experience with LLM/foundation model serving and optimization
  • Exposure to SRE practices or cloud security certifications
  • Experience scaling ML infrastructure for AI startups

Technical Stack

  • Languages: Python, Bash, Go
  • MLOps Platforms: MLflow, WandB
  • ML Frameworks: PyTorch, TensorFlow
  • Model Serving: Triton, TorchServe, TensorFlow Serving, KServe
  • Infrastructure: Kubernetes, AWS, GCP, Azure, Terraform, Helm, GitOps
  • Observability: Prometheus, Grafana, Datadog, OpenTelemetry
  • Orchestration & Tools: Kubeflow, Airflow, Argo Workflows, FastAPI, Databricks, Snowflake

Team & Environment

You will lead and mentor a team of MLOps engineers. The company culture is mission-driven and low-ego, valuing diversity of thought, ownership, and a bias toward action.

Benefits & Compensation

  • Competitive compensation with salary and equity
  • Comprehensive health coverage, including medical, dental, and vision, plus a 401(k)
  • Fertility support
  • Paid parental leave for all new parents, including adoptive and surrogacy journeys
  • Relocation support for employees moving to join the team in one of our office locations

We are an equal opportunity employer.

Required Skills
Python, MLflow, WandB, PyTorch, TensorFlow, Triton, TorchServe, TensorFlow Serving, Kubernetes, Docker, AWS, CI/CD, Team Leadership, ML Infrastructure, Bash
About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. It has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that drive enterprise decisions.
