Sunnyvale or Bellevue | Hybrid | USD 206,000 – 303,000 / year

CoreWeave is hiring a Principal Engineer

Responsibilities

  • Develop long-term architectural strategies for orchestration platforms across various systems.
  • Serve as a technical expert in scheduling, quota management, fairness, pre-emption, and multi-tenant GPU isolation.
  • Balance performance, reliability, cost, and operational complexity in design decisions.
  • Drive the evolution of Kubernetes-native control planes, including custom operators.
  • Design systems for workload admission, validation, and rollout, including model onboarding processes.
  • Identify and eliminate scaling limitations across schedulers, control planes, registries, networking, and storage.
  • Establish standards for reliability, observability, and operational readiness across orchestration services.
  • Define Service Level Objectives (SLOs), alerting, and incident response practices for critical systems.
  • Ensure systems maintain predictable behavior during failures, peak load, and rapid growth.
  • Write and review production code for Kubernetes controllers, schedulers, admission logic, and internal tooling.
  • Measure and enhance scheduling latency, container startup time, image distribution, and cold-start performance.
  • Conduct architecture and design reviews across infrastructure teams.
  • Provide mentorship to senior and staff engineers, fostering technical leadership growth.
  • Influence platform, infrastructure, security, and product teams through clear technical judgment.
  • Engage with customers and open-source communities on complex technical topics as needed.

Requirements

  • 15+ years of experience in building and operating large-scale distributed systems.
  • In-depth, practical knowledge of Kubernetes and Slurm internals.
  • Experience managing GPU-intensive platforms for AI training, inference, or HPC workloads.
  • Strong background in Go and cloud-native systems development.
  • Proven ability to set technical direction across teams without direct authority.
  • Comfortable making high-impact technical decisions in complex systems.
  • Bachelor’s or Master’s degree in a relevant field, or equivalent experience.

Nice to Have

  • Experience with systems such as Kueue, Kubeflow, Argo Workflows, Ray, Istio, or Knative.
  • Background in ML platform engineering, model onboarding, or lifecycle management.
  • Strong understanding of scheduling strategies, pre-emption, quota enforcement, and elastic scaling.
  • Track record of operating highly reliable systems with clear SLOs and incident processes.
  • Contributions to Kubernetes, ML infrastructure, or related open-source projects.
  • Experience mentoring senior engineers and raising engineering standards.

Benefits

  • Comprehensive medical, dental, and vision insurance fully covered by the employer.
  • Company-paid Life Insurance.
  • Voluntary supplemental life insurance.
  • Short- and long-term disability insurance.
  • Flexible Spending Account.
  • Health Savings Account.
  • Tuition Reimbursement.
  • Employee Stock Purchase Program (ESPP) participation.
  • Mental Wellness Benefits through Spring Health.
  • Family-Forming support provided by Carrot.
  • Paid Parental Leave.
  • Flexible, full-service childcare support with Kinside.
  • 401(k) with a generous employer match.
  • Flexible PTO.
  • Catered lunch each day at office and data center locations.
  • Casual work environment.
  • Innovative and disruptive work culture.

Compensation

Competitive

Work Arrangement

Hybrid

Team

Collaborative

Other

  • A hybrid work environment is prioritized; remote work may be considered for candidates located more than 30 miles from an office, depending on role requirements. New hires attend onboarding at a hub within their first month, and teams gather quarterly to support collaboration.
  • This position requires access to export-controlled information. Each applicant must be (i) a U.S. person, (ii) eligible to access the information without a required export authorization, or (iii) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. The company may decline to pursue any export licensing process for legitimate business reasons.

Required Skills
Kubernetes
About company
CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability.
Job Details
Department: Engineering
Category: Other
Posted: 4 months ago