About the Role
The candidate will collaborate with engineering teams to build and maintain highly available systems, improve observability, automate operations, and drive reliability best practices across production environments.
Responsibilities
- Design and implement scalable infrastructure for autonomous vehicle operations
- Own on-call incident response and postmortem analysis processes
- Develop automation tools to reduce manual operational overhead
- Enhance monitoring, alerting, and metrics collection systems
- Collaborate with development teams to improve service reliability
- Drive adoption of SRE principles across engineering teams
- Optimize system performance and troubleshoot complex production issues
- Maintain and evolve CI/CD pipelines and deployment strategies
- Ensure infrastructure meets security and compliance standards
- Lead capacity planning and scalability initiatives
- Contribute to disaster recovery and business continuity planning
- Improve logging infrastructure for faster root cause analysis
- Support cloud and edge computing environments
- Work closely with product teams to influence system design
- Promote a blameless postmortem culture and follow through on action items
- Evaluate and integrate new reliability tools and technologies
- Document system architecture and operational procedures
- Mentor junior engineers in SRE best practices
- Participate in system design reviews and provide operational feedback
- Monitor service level objectives and manage error budgets
- Reduce technical debt in production systems
- Implement proactive alerting to minimize mean time to detection
- Support high-availability requirements for real-time vehicle operations
- Contribute to the incident command structure during major outages
- Ensure systems are resilient under peak load conditions
Compensation
Competitive salary and equity package
Work Arrangement
Hybrid or remote with team presence in the Bay Area
Team
Engineering team focused on autonomous middle-mile logistics
Why This Role Matters
- The systems you maintain directly impact the safety and efficiency of autonomous delivery fleets operating in real-world conditions.
- Your work ensures minimal downtime for critical logistics operations serving commercial customers.
- You’ll help scale infrastructure to support rapid geographic and operational expansion.
Tech Stack
- Kubernetes for container orchestration
- AWS and hybrid cloud environments
- Prometheus, Grafana, and the ELK stack for observability
- Terraform for infrastructure as code
- GitLab CI/CD for pipelines
- Go and Python for tooling and automation
- gRPC and REST APIs for service communication
Growth Opportunities
- Opportunity to shape SRE practices in a growing engineering organization.
- Exposure to cutting-edge challenges in autonomous vehicle operations.
- Leadership roles available for staff-level contributors.
- Cross-functional collaboration with AI, robotics, and product teams.
Available for qualified candidates


