Responsibilities
- Designing, deploying, and maintaining production observability stacks (logs, metrics, traces)
- Scaling observability infrastructure across multiple data centers and Kubernetes clusters
- Owning logging pipeline architecture and refactoring efforts
- Improving distributed tracing coverage and driving adoption across engineering teams
- Managing EKS upgrades, node exporters, agents, and collectors
- Automating operational tasks to reduce toil and improve system stability
- Contributing to compliance and audit readiness (access controls, data handling, pipeline integrity)
- Evaluating and adopting new observability tooling, including judging what is worth pursuing and what isn't
Requirements
- 8+ years of industry experience
- Solid hands-on production experience with observability systems
- Experience with logging pipelines: design, deployment, and refactoring
- Understanding of distributed tracing and SPM (Service Performance Monitoring) built on top of tracing
- Experience with Kubernetes cluster lifecycle management (EKS preferred)
- Practical knowledge of storage trade-offs for observability data at scale
Nice to Have
- Familiarity with OpenTelemetry, Kafka, Vector, and VictoriaMetrics (vmagent, alerting rules) is a strong plus
- Experience using AI to automate infrastructure or observability tasks (e.g., automated RCA, PR generation, deployment workflows)
- Familiarity with AI-assisted tooling selection and workflow integration
- Experience with MCP (Model Context Protocol), using custom or open-source implementations
- Background in cloud account or environment migrations
- Experience preparing infrastructure for compliance/audit processes
- Understanding of network architecture, troubleshooting and incident resolution skills in production environments, and experience writing post-mortems
- Experience with containers and Kubernetes (installation and configuration of operators)
- Basic knowledge of one or more high-level programming languages, such as Python, Go, or Java
Additional Information
- Good communication and collaboration skills in international technology companies
- Interest in modern large-scale distributed storage technologies and architectures
- Good spoken English, sufficient to participate in product-related, architectural, and technical discussions
- A good balance between hands-on work and deeper analytical thinking