San Francisco, California, United States | Hybrid | USD 194,000 – 267,000 / year

Okta is hiring a Site Reliability Engineer

Responsibilities

  • Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
  • Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and 'observability-driven development.'
  • Automation: Eliminate 'toil' by automating the deployment and scaling of observability agents and collectors.

Requirements

  • GKE: Minimum 5 years of experience scaling and managing observability on Google Cloud Platform.
  • Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
  • SRE Mindset: Minimum 3 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Programming Proficiency: Strong coding skills in Python and Go for building internal tools and automating workflows.
  • Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
  • Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Nice to Have

  • Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
  • Grafana Loki: Experience migrating from Splunk to Grafana Loki.
  • Other Cloud Platforms: Experience managing native observability tools within AWS.

Benefits

  • equity (where applicable)
  • bonus
  • health, dental and vision insurance
  • 401(k)
  • flexible spending account
  • paid leave (including PTO and parental leave)
Required Skills
DevOps, OpenTelemetry
About company
Okta
Okta is a leading identity and access management platform that helps organizations secure their digital interactions and manage user authentication across various systems and applications.
Job Details
Department: Engineering
Category: Infrastructure
Posted 3 months ago