ClickHouse is hiring a Senior Site Reliability Engineer to be one of the first members of its Reliability Engineering team. You will build and lead processes to ensure the reliability, availability, scalability, and performance of our cloud infrastructure running ClickHouse databases. You'll collaborate with multiple engineering teams to design and implement secure, highly available, and fault-tolerant distributed systems.
What You'll Do
- Collaborate with engineering teams to design and implement scalable, secure, and highly available systems for ClickHouse.
- Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
- Ensure all infrastructure components have monitoring and alerting for timely incident detection and resolution.
- Enhance incident response processes and conduct post-mortem analysis for outages.
- Continuously improve the reliability and performance of our ClickHouse services.
- Plan, enable, and drive chaos engineering initiatives across engineering teams.
- Manage on-call processes to respond to performance and reliability issues.
- Develop software platforms and tools to optimize operational and engineering efficiencies.
What We're Looking For
- Bachelor’s or Master’s degree in Computer Science or a related field.
- At least 8 years of experience in Site Reliability Engineering or a closely related field.
- Previous production experience using ClickHouse.
- Hands-on experience with Go and/or Python.
- Strong knowledge of cloud platforms like AWS, Azure, or Google Cloud Platform.
- Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
- Strong experience with automation and configuration management tools like Ansible, Terraform, or Puppet.
- Strong problem-solving and production debugging skills.
- Passion for efficiency, availability, scalability, and data governance.
- Ability to thrive in a fast-paced environment.
- High level of responsibility, ownership, and accountability.
- Excellent communication and interpersonal skills.
Nice to Have
- Excellent understanding of distributed databases and SQL, particularly ClickHouse.
Technical Stack
- Go, Python
- AWS, Azure, Google Cloud Platform
- Kubernetes, Docker Swarm
- Ansible, Terraform, Puppet
- ClickHouse
Team & Environment
You will join a newly formed Site Reliability Engineering team and collaborate with the Control Plane, Data Plane, Core, Security, Support, and Operations teams.
Benefits & Compensation
- Flexible work environment - ClickHouse is a globally distributed company and remote-friendly.
- Healthcare - Employer contributions towards your healthcare.
- Equity in the company - Every new team member receives stock options.
- Flexible time off in the US, generous entitlement in other countries.
- A $500 home office setup for remote employees.
- Global Gatherings - opportunities to engage with colleagues at company-wide offsites.
- Typical starting salary in the US is $141,000 - $208,000 USD. In US Premium Markets (e.g., San Francisco Bay Area, New York City Metro Area) it is $157,000 - $230,000 USD. Compensation also includes equity in the form of stock options.
Work Mode
This is a remote position open to candidates in the United States.
ClickHouse provides equal employment opportunities to all employees and applicants and prohibits discrimination and harassment of any type based on factors such as race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.





