Udemy is seeking a Senior Staff Database Reliability Engineer to join our Datastore Infrastructure (DSI) team. You will oversee all aspects of Udemy's critical data infrastructure, including databases, streaming, and caching systems, ensuring their reliability, security, performance, and future growth.
What You'll Do
- Lead improvement projects for datastores and platform teams, aligned with the company's long-term objectives.
- Maintain infrastructure uptime, monitor performance, and ensure infrastructure scales as the company grows.
- Develop immutable infrastructure patterns and automate infrastructure provisioning via code using Terraform, Python, Golang, and Ansible.
- Ensure adherence to PCI, ISO27001, and SOC 2 security requirements, modifying CI/CD processes when necessary.
- Advocate for and implement positive changes in tools and processes.
- Participate in the on-call rotation with a systematic approach to incident management.
- Participate in day-to-day team activities, support requests, and project-related tasks.
- Contribute to documentation, maintain ticketing queues, provide project support, troubleshoot issues, and offer after-hours assistance.
- Provide coaching and mentorship to new hires, fostering their technical growth and integration.
What We're Looking For
- 3-5 years of professional experience in a Cloud Engineering, SRE, or DBRE team managing large production workloads.
- Proficiency in managing MySQL at scale, including horizontal scaling, sharding, InnoDB optimizations, query optimization, HA/DR, monitoring, backups, security, and automation.
- Strong understanding of how to support datastores running behind Kubernetes workloads in production.
- Proficiency with tools like Terraform, Ansible, and Git, and experience working with Infrastructure as Code.
- Strong experience in Kafka/MSK cluster management, topic configuration, performance tuning, and ensuring high availability.
- Strong experience with distributed caching systems (Redis, Valkey, Memcache) or similar products.
- Experience in Python or Golang.
- Knowledge of configuration management tools, monitoring systems like Datadog for database infrastructure, and scaling strategies for increased data volumes.
- Strong troubleshooting skills to diagnose complex database issues.
- Hands-on experience with AWS cloud infrastructure and a grasp of security best practices.
- Adaptability and comfort working in a fast-paced, hands-on environment.
Technical Stack
- Databases: MySQL, PostgreSQL, Aurora, DynamoDB
- Streaming & Caching: Kafka, Redis, Valkey, Memcache
- Infrastructure: Kubernetes, AWS
- Platform Tools: Terraform, Ansible, Git
- Languages: Python, Golang
- Monitoring: Datadog
Team & Environment
The Datastore Infrastructure (DSI) team is part of Udemy's Platform team and is split between EU and US regions.
Benefits & Compensation
- Full access to Udemy courses
- Monthly UDay to invest in yourself
- Budget for self-improvement tools
- Health insurance and other region-specific benefits
Work Mode
This role follows a hybrid work model and is open to candidates in San Francisco, CA; Denver, CO; Austin, TX; Australia; India; Ireland; Mexico; and Türkiye.
At Udemy, we value diversity and inclusion and consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, veteran status, medical condition, or disability.



