Responsibilities
- Design, build, and operate distributed systems that power observability across ClickHouse Cloud
- Own reliability, performance, and cost-efficiency of our telemetry pipeline and storage systems
- Take part in the on-call rotation and help drive root-cause resolution and long-term fixes
- Build tooling and automation to eliminate repetitive operational work
- Help shape the roadmap for observability by identifying bottlenecks and scaling challenges
- Collaborate with other engineering teams to improve their observability posture
- Contribute to design discussions, architecture reviews, and mentor teammates
Requirements
- 5+ years building and running production systems at scale
- Proficiency in Golang
- Experience with Kubernetes, Helm, ArgoCD, and Terraform or similar IaC tools
- Comfortable working with at least one major cloud provider (AWS, GCP, Azure)
- Experience with OpenTelemetry, Prometheus, Grafana, or similar tools
Nice to Have
- Experience with ClickHouse
Team
Structure: hybrid software, systems, and infrastructure engineers


