Bengaluru, Karnataka, India | Hybrid | Employment

Okta is hiring a Site Reliability Engineer

About the Role

Okta is looking for a Site Reliability Engineer to own and architect our observability ecosystem. In this role, you will move beyond simple monitoring to build a comprehensive, scalable telemetry platform, serving as our expert in Splunk optimization and Grafana dashboarding.

What You'll Do

  • Lead the design and tuning of Splunk environments, optimizing indexer performance, search efficiency, and data models.
  • Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.
  • Design, build, and maintain scalable observability infrastructure using infrastructure-as-code tools such as Terraform.
  • Optimize the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.
  • Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).
  • Participate in on-call rotations and lead post-incident reviews to drive systemic improvements.

What We're Looking For

  • Deep, hands-on experience with Splunk administration, search optimization (SPL), and architecting complex data pipelines.
  • Proven ability to build actionable, intuitive dashboards in Grafana that provide deep operational insights.
  • At least 3 years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
  • Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.
  • Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.
  • Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).

Nice to Have

  • Experience implementing distributed tracing (Jaeger, Tempo, or Honeycomb) to visualize request flow across microservices.
  • Experience using Splunk for security orchestration (SOAR) or SIEM-related workflows.
  • Experience managing the native observability tools of AWS, Azure, or GCP.

Technical Stack

  • Splunk, Grafana, Terraform, Go, Python, Ruby
  • OpenTelemetry, Prometheus
  • Linux, Kubernetes, EKS
  • AWS, Azure, GCP

Work Mode

This role offers a hybrid work arrangement.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran.

Required Skills

  • Splunk, Grafana, Terraform, Go, Python, Ruby, OpenTelemetry, Prometheus, Linux, Kubernetes
About company
Okta

Okta is a leading identity and access management platform that helps organizations secure their digital interactions and manage user authentication across various systems and applications.

Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 14 days ago