CloudLinux is hiring a Senior Database Reliability Engineer & Architect to lead the strategic evolution of our data platform. You will be instrumental in shifting from classic database administration to an Internal Database-as-a-Service model, designing resilient distributed systems and writing code to automate infrastructure.
What You'll Do
- Design and implement a self-service DBaaS platform using Terraform and Ansible, enabling HA cluster deployment (PostgreSQL, ClickHouse, MongoDB, Redis) across Bare Metal, OpenNebula, Kubernetes, and Public Clouds.
- Manage exponentially growing analytics clusters (12+ clusters, tens of terabytes), tackling sharding, ClickHouse table engine optimization, and building reliable S3 backup pipelines under high load.
- Maintain and scale infrastructure for Apache Airflow and Redash, ensuring the reliability of ETL pipelines and visualization tools.
- Implement SRE practices in data management, replacing manual incident response with automated self-healing and defining SLO/SLI for all databases.
- Lead migrations from legacy solutions to modern cloud patterns and participate in decision-making regarding Kubernetes operators for stateful workloads.
- Serve as the technical authority for product teams, helping them optimize data schemas and SQL queries for high-load systems.
What We're Looking For
- Deep PostgreSQL Expertise (5+ years): Knowledge of MVCC internals and locking mechanics, the ability to configure Patroni and PgBouncer, and experience with seamless major version upgrades under load.
- ClickHouse Mastery: Experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and diagnosing performance issues at the data-part level.
- Engineering Mindset (SRE/DevOps): Experience writing complex Terraform modules and Ansible roles. Programming skills in Python or Go for automation are a huge plus.
- Hybrid Environment Experience: Understanding the differences between running DBs on Bare Metal vs. Kubernetes vs. Cloud and knowing how to optimize TCO and disk subsystem performance.
- Systems Approach: Ability to see the big picture, from network packet to application business logic, with an understanding of security (FIPS, audit logs) and Disaster Recovery.
Nice to Have
- Experience building an Internal Developer Platform (IDP).
- Experience operating databases in Kubernetes (CloudNativePG, Altinity Operator).
- Experience working at cloud or hosting providers on similar services.
Technical Stack
- Databases: PostgreSQL 15+ (Patroni, PgBouncer), ClickHouse (Sharded/Replicated), MongoDB, Redis, Kafka
- Data & Analytics: Apache Airflow, Redash (Infrastructure & Integration)
- Infrastructure: Own colocation across 3+ DCs (OpenNebula, Kubernetes, Bare Metal), AWS, Google Cloud, Azure, DO – Hybrid Cloud
- Automation & IaC: Terraform, Ansible, Python/Go, GitLab, Jenkins, Gerrit
- Observability: VictoriaMetrics, Grafana, Loki
Benefits & Compensation
- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours; work from any location worldwide.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- The opportunity to receive a reward for the most innovative idea that the company can patent.
Work Mode
This is a worldwide remote position. CloudLinux is a remote-first company with an 'Employees First' principle. We value results, not hours in the office. Your architectural decisions will determine the stability of services used by thousands of companies globally. We support professional development and pay for training and conferences.

