This position is no longer available
Bengaluru, Karnataka, India Hybrid

Okta was looking for a Site Reliability Engineer

Okta is looking for a Site Reliability Engineer to own and architect our observability ecosystem. In this role, you will move beyond simple monitoring to build a comprehensive, scalable telemetry platform, serving as our expert in Splunk optimization and Grafana dashboarding.

What You'll Do

  • Lead the design and tuning of Splunk environments, optimizing indexer performance, search efficiency, and data models.
  • Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.
  • Design, build, and maintain scalable observability infrastructure using tools like Terraform.
  • Optimize the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.
  • Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).
  • Participate in on-call rotations and lead post-incident reviews to drive systemic improvements.

What We're Looking For

  • Deep, hands-on experience with Splunk administration, search optimization (SPL), and architecting complex data pipelines.
  • Proven ability to build actionable, intuitive dashboards in Grafana that provide deep operational insights.
  • 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.
  • Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.
  • Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).

Nice to Have

  • Experience implementing distributed tracing (Jaeger, Tempo, or Honeycomb) to visualize request flow across microservices.
  • Experience using Splunk for security orchestration (SOAR) or SIEM-related workflows.
  • Experience managing native observability tools within AWS, Azure, or GCP.

Technical Stack

  • Splunk, Grafana, Terraform, Go, Python, Ruby
  • OpenTelemetry, Prometheus
  • Linux, Kubernetes, EKS
  • AWS, Azure, GCP

Work Mode

This role offers a hybrid work arrangement.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran.

Required Skills
Grafana, Terraform, Go, Python, Ruby, OpenTelemetry, Prometheus, Linux, Kubernetes
About company
Okta
Okta is a leading identity and access management platform that helps organizations secure their digital interactions and manage user authentication across various systems and applications.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 3 months ago