Lead improvement projects for our datastores and platform teams that align with the company’s long-term objectives.
Maintain infrastructure uptime, monitor performance, and ensure the infrastructure continues to scale as we grow.
Develop immutable infrastructure patterns and automate infrastructure provisioning as code (Terraform, Python, Golang, Ansible, etc.).
Ensure adherence to PCI, ISO 27001, and SOC 2 security requirements, modifying CI/CD processes when necessary and upholding policies and standards.
Advocate for and implement positive changes in tools and processes through healthy discussions.
Participate in the on-call rotation, demonstrating a systematic approach to incident management.
Participate in day-to-day activities, support requests, and project-related tasks for the team.
Contribute to documentation, maintain ticketing queues, provide project support, troubleshoot issues, and offer after-hours assistance as required.
Provide coaching and mentorship to new hires, fostering their technical growth and integration into the team. Maintain close communication with team members throughout their tenure.

3-5 years of professional experience on a Cloud Engineering, SRE, or DBRE team with infrastructure responsibilities for managing large production workloads.
Proficiency in managing MySQL at scale (horizontal scaling, sharding, InnoDB optimization, query optimization, HA/DR, monitoring, backup strategy, security, automation).
Strong understanding of supporting datastores that back Kubernetes workloads in production.
Proficiency with tools such as Terraform, Ansible, and Git, and with infrastructure-as-code and automated provisioning practices.
Strong experience with Kafka/MSK cluster management, topic configuration, performance tuning, and ensuring high availability and fault tolerance.
Strong experience with distributed caching (Redis, Valkey, Memcached) or similar products.
Experience with Python and/or Golang.
Knowledge of configuration management tools, monitoring systems (Datadog or similar) for database infrastructure, and scaling strategies for handling increased data volumes.
Strong troubleshooting skills to diagnose complex database issues.
Hands-on experience with AWS cloud infrastructure and a grasp of security best practices.

Full access to Udemy courses.
A monthly UDay to invest in yourself, and a budget to spend on whatever helps you improve.
AI is real here. We use it in the way we learn and the way we work. You’ll have the space and tools to experiment, apply, and get better at using AI in practical ways.
You’ll own your work. We trust people to lead, make decisions, and follow through. You don’t need to wait for permission or layers of approval to have an impact.
You’ll build with others. We collaborate openly and shape ideas together. Everyone has a voice, and good thinking is welcomed from any direction.
You’ll see your impact. What you build helps people grow their skills, change their careers, or find a path forward. You’ve got the experience; why not use it to help others gain theirs?

Hybrid

Structure: The team is split between EU and US regions.

Udemy is hiring a Senior Staff Database Reliability Engineer
