Responsibilities
- Design, deploy, and maintain ClickHouse clusters on Kubernetes at global scale.
- Build and operate infrastructure for multiple database technologies (PostgreSQL, OpenSearch, etc.) on Kubernetes.
- Develop automation for database provisioning, scaling, backup, and recovery workflows.
- Optimize database performance, reliability, and resource efficiency in containerized environments.
- Create and maintain Kubernetes operators and manifests for database workloads.
- Manage infrastructure through Terraform and Infrastructure as Code practices.
- Lead incident response and post-mortem processes for database-related issues.
- Collaborate with Data Engineers and application teams on architecture and capacity planning.
- Build internal tooling to improve database observability and operational workflows.
Requirements
- 10+ years of experience in infrastructure or database engineering roles.
- Strong experience running stateful workloads on Kubernetes (operators, StatefulSets, persistent volumes).
- Proficiency with Linux systems, performance tuning, and troubleshooting.
- Experience with Infrastructure as Code (Terraform) and configuration management.
- Solid understanding of distributed systems architecture and database internals.
- Proven track record building automation and improving system reliability.
- Experience with cloud infrastructure (AWS preferred).
- Strong analytical and incident management skills.
Nice to Have
- Expertise with ClickHouse administration, optimization, and scaling in production.
- Experience with PostgreSQL, OpenSearch, Redis, or other database systems.
- Familiarity with Golang or Python for tooling or operators.
- Experience with Kafka, VictoriaMetrics, ArgoCD, or Vault.
- Contributions to database-related open-source projects.
- Understanding of database replication, sharding, and high-availability patterns.
Additional Information
- Relocation.