San Francisco, California, United States Hybrid USD 194,000 – 267,000 / year

Okta is hiring a Site Reliability Engineer

Responsibilities

Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
GCP Observability Engineering: Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services.
Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and 'observability-driven development.'
Automation: Eliminate 'toil' by automating the deployment and scaling of observability agents and collectors.

Requirements

GKE: Minimum 5+ years of experience scaling and managing observability on Google Cloud Platform.
Visualization: Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources.
SRE Mindset: Minimum 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
Programming Proficiency: Strong coding skills in Python and Go for building internal tools and automating workflows.
Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/GKE).
Problem Solving: A data-driven approach to debugging complex, cross-service performance bottlenecks.

Nice to Have

Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
Grafana Loki: Experience migrating from Splunk to Grafana Loki.
Other Cloud Platforms: Experience managing native observability tools within AWS.

Benefits

equity (where applicable)
bonus
health, dental and vision insurance
401(k)
flexible spending account
paid leave (including PTO and parental leave)

Required Skills

DevOps
OpenTelemetry

About company

Okta is a leading identity and access management platform that helps organizations secure their digital interactions and manage user authentication across various systems and applications.


Job Details

Department Engineering

Category DevOps & SRE

Posted 5 months ago
