Budapest, Hungary; Remote (Global)

Zyte is hiring a Core & ML Ops Team Lead - Remote

About the Role

Zyte is seeking a Core & ML Ops Team Lead to build the bedrock infrastructure that powers our services at scale. In this hands-on technical leadership role, you will lead a cross-functional squad responsible for designing and maintaining the scalable foundation for all Zyte services.

What You'll Do

  • Design and evolve the core platform, including Kubernetes, Mesos, GPU scheduling/autoscaling, and distributed compute.
  • Own the model platform: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
  • Build the Golden Path, including reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts, high-performance clients, and production‑ready defaults.
  • Operate a secure, multi‑tenant model registry and training platform with standardized experiment/evaluation harnesses.
  • Provide turnkey serving patterns, drift/quality monitoring, and rollback playbooks.
  • Integrate public/open‑source AI capabilities as managed platform services with cost and data‑governance guardrails.
  • Run the squad: own roadmap/prioritization, delivery, mentoring, and high engineering standards.
  • Partner with product engineering, Prod Ops, and Security on adoption and rollout plans.
  • Own container orchestration, GPU provisioning & autoscaling, environment & secret management.
  • Own operators, sidecars, and internal SDKs/libraries (Go/Rust/Python/Java) that enforce the golden path contract.
  • Own observability: logging/metrics/tracing pipelines.
  • Own the billing pipeline: metering/events/cost tracking abstractions.
  • Own the Golden Path: Java, Python, and ML templates, CI/CD blueprints, docs, and scaffold CLI.
  • Own reliability enablement (SRE practices), cost governance, and supply‑chain security.

What We're Looking For

  • 5+ years of experience building distributed systems.
  • 3+ years in MLOps/ML platform engineering, or equivalent impact.
  • Knowledge of Linux/OS internals, networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Deep understanding of Kubernetes.
  • Proficiency developing high-performance services in Java, Rust, Go, or C++.
  • Strong Python skills.
  • Experience with GPU infrastructure (scheduling, containerization, optimization).
  • Track record of designing and operating model platforms in production.
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.

Nice to Have

  • Streaming & workflows: Kafka plus Argo/Temporal/Airflow or equivalents.
  • eBPF‑based observability, perf tooling, or io_uring experience.
  • Cost optimization for ML/AI; multi‑tenant quotas and fairness.
  • Hands‑on experience authoring Golden Paths (service chassis/templates, CI/CD blueprints, CLI scaffolds).
  • SRE practices (SLIs/SLOs, incident management).

Technical Stack

  • Platform: Kubernetes, Mesos, GPU infrastructure
  • Languages: Java, Rust, Go, C++, Python
  • Frameworks: Vert.x, Netty
  • Streaming & Workflow: Kafka, Argo, Temporal, Airflow
  • Systems: eBPF, io_uring

Team & Environment

The Core & MLOps Squad is part of a globally distributed team of over 250 people.

Benefits & Compensation

  • Work from anywhere in a completely remote company.
  • Work with a wide range of open-source technologies and tools.
  • Be part of a self-motivated, progressive, multi-cultural team.
  • Foster and nourish new ideas and bring them to market.

Work Mode

This is a global, fully remote role. You can work from more than 28 countries.

Zyte is an equal opportunity employer.

Required Skills
Kubernetes, Mesos, Java, Rust, Go, C++, Python, Vert.x, Netty, GPU infrastructure, ML Ops, Team Leadership, Distributed Systems, Performance Optimization, Cloud Infrastructure
About company
Zyte

Zyte builds powerful, easy-to-use tools to collect, format, and deliver web data quickly, dependably, and at scale. The data extracted helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Over 3,000 companies and 1 million developers rely on Zyte's tools and services.

Job Details
Category: Management
Posted 5 months ago