São Paulo · Hybrid · Full-time

Braze is hiring a Senior Site Reliability Engineer

Responsibilities

  • Design and operate Braze’s MongoDB infrastructure to meet strict enterprise-grade SLAs, with deep ownership of availability, durability, and query performance
  • Build proactive monitoring and alerting that fires on symptoms before customers feel the impact, backed by rich MongoDB-specific observability (oplog lag, replication health, lock contention, index hit rates, etc.)
  • Lead capacity planning and sharding strategy as data volumes and query patterns evolve
  • Drive root-cause analysis on MongoDB incidents and translate findings into permanent system improvements
  • Partner with product engineering teams to review schema designs, index strategies, and aggregation pipelines – catching scalability anti-patterns before they reach production
  • Build self-service tooling, automation, and runbooks that let engineers interact with MongoDB safely and efficiently without needing to page the platform team
  • Define and enforce connection pool sizing, write-concern defaults, and read-preference standards across the fleet
  • Manage MongoDB cluster lifecycle (provisioning, upgrades, failovers, decommissions) on Kubernetes using the MongoDB Enterprise Kubernetes Operator, with infrastructure defined as code via Terraform and Ansible
  • Develop and maintain automated backup, restore, and point-in-time recovery workflows – tested regularly against real workloads
  • Contribute to internal platform tooling in Ruby and/or Go that reduces operational toil across the SRE organization
  • Participate in a PagerDuty on-call rotation with a clear charter: use every quiet shift to eliminate the next page
  • Lead incident retrospectives with a bias toward systemic fixes, automation, and documentation – not blame
  • Maintain and improve runbooks so that any engineer on the team can respond effectively to MongoDB incidents

Requirements

  • 5+ years of experience as a Software Engineer, DevOps Engineer, or Site Reliability Engineer in a production environment
  • Hands-on MongoDB expertise: replica sets, sharding, index design, aggregation pipelines, explain plans, and performance tuning under real load
  • Strong Linux fundamentals and comfort operating at the OS level (disk I/O, memory, networking, process management)
  • Strong programming skills in one or more of Python, Go, Ruby, or JavaScript; you write maintainable automation, not just one-off scripts. JavaScript or Python experience is a plus for MongoDB shell scripting and aggregation pipeline work
  • Experience with infrastructure-as-code (IaC) tools: Terraform, Ansible, or equivalent
  • Experience with containers and container orchestration: Docker and Kubernetes
  • A systems thinker who reasons about interfaces, failure modes, edge cases, and cascading effects across the stack
  • Bias toward documentation and asynchronous collaboration across global remote teams

Nice to Have

  • Experience running MongoDB at multi-terabyte scale or in a sharded topology
  • Familiarity with MongoDB Atlas, Ops Manager, or Cloud Manager
  • Experience with complementary data technologies in Braze’s stack: Redis, Kafka, Postgres
  • Prior work on database platform engineering or database reliability engineering (DBRE) teams

About the Company

Braze
Braze is the leading customer engagement platform that empowers brands to Be Absolutely Engaging. The platform lets marketers collect and act on data from any source and engage creatively with customers in real time, across channels, from a single platform.

Job Details

Department: MongoDB Platform team
Category: Infrastructure