Continuously enhance the reliability and performance of the core database system.
Develop and improve metrics and alerts to identify and prevent production issues before they impact users.
Investigate common customer issues to find root causes and propose fixes, reports, and improvements.
Refine incident response processes and post-mortem analyses for outages, collaborating with support and cloud teams to inform affected users.
Plan, implement, and lead chaos engineering initiatives across engineering teams based on internal priorities.
Oversee on-call processes to address performance and reliability issues, establishing best practices for issue resolution and minimizing user impact.

Bachelor’s or Master’s degree in Computer Science or a related field.
At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering.
Previous experience operating the core database system or other SQL databases in production.
Scripting experience with Shell or Python, and ability to read and understand C++ code.
Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
Strong problem-solving skills and solid production debugging abilities.
Ability to thrive in a fast-paced environment as part of a global team, with a focus on business goals.
High level of responsibility, ownership, and accountability.
Excellent communication skills.

Excellent understanding of distributed database internals and SQL, particularly the core database system.

Remote (Worldwide)

Site Reliability Engineering team in the core database system

ClickHouse is hiring a Database Reliability Engineer - Core Team
