London, England, United Kingdom Hybrid Employment

Wayve is hiring a Senior Site Reliability Engineer

About the Role

Wayve is seeking a Senior Site Reliability Engineer to join our Vehicle Software team. In this critical role, you will be responsible for keeping Wayve’s autonomous driving fleet reliable, observable, and safe while it operates on public roads. You will work at the boundary of software, hardware, and operations, turning real-world incidents and performance bottlenecks into lasting engineering improvements.

What You'll Do

  • Own and improve the reliability, availability, and performance of vehicle software systems used across the development fleet.
  • Take part in a team on-call rotation, providing out-of-hours support for live systems when required.
  • Build and operate monitoring, logging, alerting, and on-call tooling that enables fast detection, diagnosis, and recovery.
  • Drive incident response and post-incident learning, translating root causes into durable fixes and preventative controls.
  • Design and deliver automation for fleet operations, deployments, and repetitive workflows to reduce manual intervention.
  • Partner closely with Vehicle Software, operations, and platform teams to define SLOs, reliability metrics, and release readiness.
  • Continuously harden the production environment through capacity planning, change management, and reliability-focused reviews.

What We're Looking For

  • Proven experience in an SRE, production reliability, or platform operations role for complex distributed systems.
  • Strong Linux fundamentals and hands-on experience with CI/CD, containers (Docker), and orchestration (Kubernetes).
  • Proficiency in at least one systems or scripting language (Python, C++, or Rust) with a bias for automation.
  • Deep troubleshooting skills across networking, distributed systems, and databases, including performance and availability issues.
  • Experience designing observability stacks and using tools such as Datadog, Prometheus, Grafana, OpenTelemetry, Splunk, or Humio.
  • Clear communication skills, including incident leadership, writing postmortems, and influencing engineering priorities.

Nice to Have

  • Cloud platform experience (AWS, GCP, or Azure), including infrastructure-as-code and secure production operations.
  • Experience with real-time or safety-critical systems, hardware-in-the-loop, or embedded/robotics environments.
  • Familiarity with fleet operations, telemetry pipelines, and operating software on edge devices at scale.
  • Experience defining and running SLOs/SLIs and reliability programs across multiple teams.

Technical Stack

  • Linux, CI/CD, Docker, Kubernetes
  • Languages: Python, C++, Rust
  • Observability: Datadog, Prometheus, Grafana, OpenTelemetry, Splunk, Humio
  • Cloud: AWS, GCP, Azure

Work Mode

This is a hybrid role based in London.

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.

Required Skills
Linux, CI/CD, Docker, Kubernetes, Python, C++, Rust, Datadog, Prometheus, Grafana, OpenTelemetry, Splunk, Humio, Networking, Distributed Systems
About company
Wayve

Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.

Job Details
Department: Engineering
Category: Infrastructure
Posted 14 days ago