Act as the primary SRE partner for the DBA team, bringing deep systems and reliability expertise to all managed database platforms.
Own the end-to-end observability stack for databases – define and implement metrics, logs, traces, dashboards, and actionable alerts for PostgreSQL, MongoDB, Redis, OpenSearch, YugabyteDB and related services.
Design, implement, and continuously improve monitoring and alerting specifically for database reliability (replication lag, connection saturation, query latency, storage and WAL usage, vacuum/bloat, backup health, etc.).
Lead and participate in on-call and incident response for database or system incidents.
Build and maintain automation and self-service workflows for the DBA team using infrastructure and configuration management tools (Puppet/Ansible), e.g. cluster provisioning, configuration rollout, user/role management, backup/restore orchestration, and failover procedures.
Develop and maintain runbooks, playbooks, and standard operating procedures for SRE aspects of database operations
Champion an automation-first, “no manual changes” culture for database and platform changes, and actively work to reduce toil through tooling and platform improvements.
Be a member of a weekly 24/7 on-call rotation
Strong collaboration skills as well as the ability to work independently

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
4–7 years of total experience, with a strong focus on SRE / Production Engineering / Platform Engineering
Solid experience running Linux-based production systems at scale
Proficiency in infrastructure and configuration management tools (Puppet)
Strong scripting / programming skills in one or more: Python, Go, Shell (bash), or similar
Solid experience with Git and GitHub – branching and PR workflows, code reviews, and automating operations via GitHub Actions or similar CI systems.
Experience building/maintaining CI/CD pipelines and integrating operational checks and tests
Excellent problem-solving skills and a proactive, "can-do" attitude.
Strong communication and collaboration skills to work effectively with cross-functional teams.
Experience with continuous integration and automation tools such as Puppet, as well as strong Unix/Linux skills, is required.
You should have excellent communication skills.
Must be able to work independently and without direct supervision.

Familiarity with containerization and orchestration technologies (Docker, Kubernetes)
Interest in learning more about how databases behave in production (replication, failover, backups, performance) from a reliability and systems perspective.

Work from (almost) anywhere for up to 20 days per year
Focus on mental health and well-being
Company-paid therapy sessions through SpringHealth
Company-paid subscription to Headspace
Annual company-wide week off: the whole team fully recharges (and returns without a pile-up of work!)
Paid parental leave
Generous paid vacation + time off for your birthday
Paid volunteer time
Focus on your career growth
Development Dollars
Leadership development
Access to thousands of on-demand e-learnings
Travel Discounts
Employee Resource Groups
Quarterly team offsite
Tax optimisation options
Generous health insurance
Pension fund

Remote (Worldwide)

Structure: global team and its portfolio of metasearch brands

While the majority of your responsibilities may align with conventional business hours, there will be instances where you are expected to manage communications - via calls, Slack messages, or emails - outside of regular working hours to effectively collaborate with international colleagues, respond to restaurant partners, and/or address urgent matters.

OpenTable is hiring a Site Reliability Engineer II, Data Platforms
