Mountain View, CA, USA · On-site · USD 180,000 – 260,000 / year

Gatik AI is hiring a Senior/Staff Site Reliability Engineer

About the Role

The candidate will collaborate with engineering teams to build and maintain highly available systems, improve observability, automate operations, and drive reliability best practices across production environments.

Responsibilities

  • Design and implement scalable infrastructure for autonomous vehicle operations
  • Own on-call incident response and postmortem analysis processes
  • Develop automation tools to reduce manual operational overhead
  • Enhance monitoring, alerting, and metrics collection systems
  • Collaborate with development teams to improve service reliability
  • Drive adoption of SRE principles across engineering teams
  • Optimize system performance and troubleshoot complex production issues
  • Maintain and evolve CI/CD pipelines and deployment strategies
  • Ensure infrastructure meets security and compliance standards
  • Lead capacity planning and scalability initiatives
  • Contribute to disaster recovery and business continuity planning
  • Improve logging infrastructure for faster root cause analysis
  • Support cloud and edge computing environments
  • Work closely with product teams to influence system design
  • Promote blameless postmortem culture and follow-through on action items
  • Evaluate and integrate new reliability tools and technologies
  • Document system architecture and operational procedures
  • Mentor junior engineers in SRE best practices
  • Participate in system design reviews and provide operational feedback
  • Monitor service level objectives and manage error budgets
  • Reduce technical debt in production systems
  • Implement proactive alerting to minimize mean time to detection
  • Support high-availability requirements for real-time vehicle operations
  • Contribute to the incident command structure during major outages
  • Ensure systems are resilient under peak load conditions

Compensation

Competitive salary and equity package

Work Arrangement

Hybrid or remote with team presence in the Bay Area

Team

Engineering team focused on autonomous middle-mile logistics

Why This Role Matters

  • The systems you maintain directly impact the safety and efficiency of autonomous delivery fleets operating in real-world conditions.
  • Your work ensures minimal downtime for critical logistics operations serving commercial customers.
  • You’ll help scale infrastructure to support rapid geographic and operational expansion.

Tech Stack

  • Kubernetes for container orchestration
  • AWS and hybrid cloud environments
  • Prometheus, Grafana, and ELK stack for observability
  • Terraform for infrastructure as code
  • GitLab CI/CD for pipelines
  • Go and Python for tooling and automation
  • gRPC and REST APIs for service communication

Growth Opportunities

  • Opportunity to shape SRE practices in a growing engineering organization.
  • Exposure to cutting-edge challenges in autonomous vehicle operations.
  • Leadership roles available for staff-level contributors.
  • Cross-functional collaboration with AI, robotics, and product teams.

Available for qualified candidates

Required Skills
Docker, Kubernetes, Helm, PostgreSQL, InfluxDB, Airflow, Python, Bash, Infrastructure, Monitoring, Automation
About the company
Gatik AI
Gatik is the leader in autonomous middle-mile logistics, revolutionizing the B2B supply chain with its autonomous transportation-as-a-service (ATaaS) solution. The company focuses on short-haul, B2B logistics for Fortune 500 retailers and launched the world’s first fully driverless commercial transportation service with Walmart.
Job Details
Category: Infrastructure
Posted: 9 months ago