CloudLinux is hiring a Senior Database Reliability Engineer (DBRE) & Architect to lead the evolution of our data platform. You will move us from classic database administration to an internal Database-as-a-Service (DBaaS) model, designing resilient distributed systems that give product teams databases as a reliable service.
What You'll Do
- Design and implement a self-service DBaaS platform using Terraform and Ansible to deploy HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across a heterogeneous environment (Bare Metal, OpenNebula, Kubernetes, Public Clouds).
- Manage and scale rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes of data), tackling sharding, table engine optimization (ReplicatedMergeTree), and building reliable S3 backup pipelines under high load.
- Maintain and scale infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
- Implement SRE practices in data management, replacing manual incident response with automated self-healing and defining and implementing SLOs and SLIs for all databases.
- Lead migrations from legacy solutions to modern cloud patterns, and take part in decisions about Kubernetes operators for stateful workloads.
- Serve as a technical authority for product teams, helping them optimize data schemas and SQL queries for high-load systems.
What We're Looking For
- Deep experience designing and managing a self-service DBaaS or similar internal platform.
- Expert-level knowledge of managing and scaling ClickHouse clusters, including sharding and performance optimization.
- Proven experience automating database infrastructure at scale using infrastructure-as-code tools like Terraform and Ansible.
- Strong background implementing SRE/SLO practices for database services.
- Experience operating databases across multiple environments, including Bare Metal, Kubernetes, and Public Clouds.
- Ability to lead technical decision-making and serve as an authority for engineering teams on data-layer optimization.
Technical Stack
- Databases & Streaming: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka
- Platform & Orchestration: Apache Airflow, Redash, OpenNebula, Kubernetes, Bare Metal
- Cloud Providers: AWS, Google Cloud, Azure, DigitalOcean
- Automation & IaC: Terraform, Ansible
- Languages & Tools: Python/Go, GitLab, Jenkins, Gerrit
- Monitoring & Observability: VictoriaMetrics, Grafana, Loki
Benefits & Compensation
- Remote-first culture.
- Professional development support, including paid training and conferences.
Work Mode
This is a worldwide remote position. CloudLinux is a remote-first company guided by an 'Employees First' principle, and we value results over hours spent in an office.



