Okta is looking for a Site Reliability Engineer to own and architect our observability ecosystem. In this role, you will move beyond simple monitoring to build a comprehensive, scalable telemetry platform, serving as our expert in Splunk optimization and Grafana dashboarding.
What You'll Do
- Lead the design and tuning of Splunk environments, optimizing indexer performance, search efficiency, and data models.
- Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.
- Design, build, and maintain scalable observability infrastructure using infrastructure-as-code tools such as Terraform.
- Optimize the collection, processing, and storage of telemetry data (metrics, logs, and traces) to ensure high reliability and low latency.
- Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).
- Participate in on-call rotations and lead post-incident reviews to drive systemic improvements.
What We're Looking For
- Deep, hands-on experience with Splunk administration, search optimization (SPL), and architecting complex data pipelines.
- Proven ability to build actionable, intuitive dashboards in Grafana that provide deep operational insights.
- 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
- Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.
- Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.
- Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/EKS).
Nice to Have
- Experience implementing distributed tracing (Jaeger, Tempo, or Honeycomb) to visualize request flow across microservices.
- Experience using Splunk for security orchestration, automation, and response (SOAR) or SIEM workflows.
- Experience managing the native observability tools of AWS, Azure, or GCP.
Technical Stack
- Splunk, Grafana, Terraform, Go, Python, Ruby
- OpenTelemetry, Prometheus
- Linux, Kubernetes, EKS
- AWS, Azure, GCP
Work Mode
This role offers a hybrid work arrangement.
Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran.