About the Role
The engineer will play a central role in developing and maintaining the control plane that manages database provisioning, scaling, and lifecycle operations in a distributed cloud environment. This control plane must keep interactions between users and backend systems seamless and secure.
Responsibilities
- Design and implement control plane services for cloud database operations
- Ensure high availability and fault tolerance of management systems
- Develop automation for provisioning and lifecycle management of database instances
- Monitor system health and respond to operational incidents
- Collaborate with product and infrastructure teams to define scalable architectures
- Optimize control plane performance under high load conditions
- Maintain secure communication between management components and data nodes
- Contribute to disaster recovery and failover mechanisms
- Write production-grade code with comprehensive testing and observability
- Support the deployment and operation of multi-cloud environments
- Troubleshoot complex distributed system issues
- Participate in on-call rotations for critical services
- Document system design and operational procedures
- Integrate with identity and access management systems
- Ensure compliance with security and operational standards
- Improve CI/CD pipelines for control plane components
- Evaluate and adopt new technologies to enhance platform capabilities
- Work with telemetry systems to track control plane metrics
- Coordinate with QA teams to validate system behavior
- Assist in capacity planning for management infrastructure
- Respond to customer escalations related to platform functionality
- Maintain backward compatibility during system upgrades
- Support control mechanisms for database clustering and replication
- Implement configuration management at scale
- Contribute to post-mortem analyses after incidents
Nice to Have
- Experience with database clustering technologies
- Knowledge of consensus algorithms like Raft or Paxos
- Background in building self-healing infrastructure
- Familiarity with gRPC and protocol buffers
- Experience with observability stacks like Prometheus and Grafana
- Involvement in large-scale cloud migrations
- Understanding of multi-tenancy architectures
- Prior work on automated failover systems
- Contributions to internal developer platforms
- Exposure to edge computing or low-latency control systems
Compensation
Competitive salary and benefits package
Work Arrangement
Remote-friendly, with hybrid or office-based options depending on location
Team
Part of a core engineering team focused on cloud infrastructure and platform reliability
Tech Stack
Go, Kubernetes, Terraform, Prometheus, Grafana, gRPC, AWS, GCP, CI/CD pipelines, distributed databases
What We Value
- Ownership of systems from design to operation
- Clear technical communication
- Focus on reliability and performance
- Proactive problem solving
- Collaborative engineering culture
This position is open to qualified candidates in select regions.