Responsibilities
- Act as the primary SRE partner for the DBA team, bringing deep systems and reliability expertise to all managed database platforms.
- Own the end-to-end observability stack for databases – define and implement metrics, logs, traces, dashboards, and actionable alerts for PostgreSQL, MongoDB, Redis, OpenSearch, YugabyteDB and related services.
- Design, implement, and continuously improve monitoring and alerting specifically for database reliability (replication lag, connection saturation, query latency, storage and WAL usage, vacuum/bloat, backup health, etc.).
- Lead and participate in on-call and incident response for database or system incidents.
- Build and maintain automation and self-service workflows for the DBA team using infrastructure and configuration management tools (Puppet/Ansible) – e.g. cluster provisioning, configuration rollout, user/role management, backup/restore orchestration, and failover procedures.
- Develop and maintain runbooks, playbooks, and standard operating procedures for SRE aspects of database operations.
- Champion an automation-first, “no manual changes” culture for database and platform changes, and actively work to reduce toil through tooling and platform improvements.
- Be a member of a weekly 24/7 on-call rotation
- Collaborate closely with other teams while also being able to work independently.
Requirements
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
- 4–7 years of total experience, with a strong focus on SRE / Production Engineering / Platform Engineering.
- Solid experience running Linux-based production systems at scale
- Proficiency in Infrastructure and configuration management (Puppet)
- Strong scripting / programming skills in one or more: Python, Go, Shell (bash), or similar
- Solid experience with Git and GitHub – branching and PR workflows, code reviews, and automating operations via GitHub Actions or similar CI systems.
- Experience building/maintaining CI/CD pipelines and integrating operational checks and tests
- Excellent problem-solving skills and a proactive, "can-do" attitude.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Experience with continuous integration and automation tools such as Puppet, as well as strong Unix/Linux skills, is required.
- You should have excellent communication skills.
- Must be able to work independently and without direct supervision.
Nice to Have
- Familiarity with containerization and orchestration technologies (Docker, Kubernetes)
- Interest in learning more about how databases behave in production (replication, failover, backups, performance) from a reliability and systems perspective.
Benefits
- Work from (almost) anywhere for up to 20 days per year
- Focus on mental health and well-being
- Company-paid therapy sessions through SpringHealth
- Company-paid subscription to Headspace
- Annual company-wide week off - the whole team fully recharges (and returns without a pile-up of work!)
- Paid parental leave
- Generous paid vacation + time off for your birthday
- Paid volunteer time
- Focus on your career growth
- Development Dollars
- Leadership development
- Access to thousands of on-demand e-learnings
- Travel Discounts
- Employee Resource Groups
- Quarterly team offsite
- Tax optimisation options
- Generous health insurance
- Pension fund
Work Arrangement
Remote (Worldwide)
Team
Structure: global team and its portfolio of metasearch brands
Additional Information
- While the majority of your responsibilities may align with conventional business hours, there will be instances where you are expected to manage communications - via calls, Slack messages, or emails - outside of regular working hours to effectively collaborate with international colleagues, respond to restaurant partners, and/or address urgent matters.


