What You'll Do
Oversee the stability, scalability, and security of enterprise-grade database environments spanning multiple cloud platforms. You'll manage complex database ecosystems including Couchbase, PostgreSQL, DynamoDB, Cosmos DB, and Snowflake, ensuring optimal performance and uptime.
Lead configuration, replication, and monitoring of Couchbase clusters using XDCR, and fine-tune query performance through indexing and execution plan analysis. Implement vertical and horizontal scaling strategies, sharding, and capacity planning to support growing data demands.
Design and enforce backup and recovery frameworks with defined retention, RTO, and RPO targets. Maintain point-in-time recovery capabilities and conduct regular disaster recovery testing. Manage cross-region replication and leverage AWS Backup and native tools for comprehensive data protection.
Respond to critical incidents as part of a 24/7 on-call rotation, leading root cause investigations and resolving issues like replication lag, deadlocks, and resource bottlenecks. Collaborate with application and infrastructure teams during outages and performance events.
Build and refine monitoring with Datadog, CloudWatch, and Azure Monitor—creating dashboards, setting intelligent alerts, and defining escalation paths. Automate routine operations using Python, Bash, or PowerShell, and apply Infrastructure as Code with Terraform or CloudFormation.
Support database migrations across platforms and cloud providers, ensuring data integrity and minimal service disruption. Contribute to architectural decisions, document configurations and SOPs, and guide teams on security best practices including encryption, IAM policies, and secrets management via AWS Secrets Manager and Azure Key Vault.
Requirements
- Minimum of 5 years managing production databases at scale
- Proven experience with AWS database services such as RDS, Aurora, DynamoDB, and DocumentDB
- Hands-on work with Couchbase, including cluster administration, XDCR, and N1QL optimization
- Familiarity with Azure Cosmos DB and Azure Database for PostgreSQL
- Deep knowledge of query tuning, indexing, and database internals
- Experience with backup, restore, PITR, and disaster recovery validation
- Proficiency in monitoring tools like Datadog, CloudWatch, and Azure Monitor
- Scripting skills in Python, Bash, or PowerShell for automation
- Experience with Terraform or CloudFormation
- Strong grasp of database security, encryption, IAM, and compliance standards
- Ability to troubleshoot under pressure and participate in on-call rotations
Benefits
- 90% company-paid healthcare premiums
- HSA contribution from company
- 401K match at 4% with immediate vesting
- Flexible PTO policy—typically 3 to 4 weeks annually
- 10 paid company holidays
- Monthly wellness and lifestyle stipend
- Access to LinkedIn Learning with dedicated time for development
