About the Role
You will develop and maintain core components of a distributed control plane, working across frontend and backend systems to keep it highly available and performant at scale.
Responsibilities
- Design and implement scalable backend services for system orchestration
- Build intuitive web interfaces for managing distributed infrastructure
- Improve reliability and fault tolerance of control plane components
- Collaborate with infrastructure teams to integrate new features
- Diagnose and resolve performance bottlenecks in production systems
- Write clean, maintainable code with comprehensive test coverage
- Contribute to architectural decisions for long-term system growth
- Monitor system behavior and respond to operational alerts
- Develop APIs for internal and external service communication
- Optimize data flow between control and data plane layers
- Support deployment automation and configuration management
- Enhance observability through logging, metrics, and tracing
- Participate in code reviews and knowledge sharing sessions
- Troubleshoot cross-service issues in distributed environments
- Maintain documentation for system design and workflows
- Ensure security best practices across all layers
- Work with database systems to manage state and metadata
- Refactor legacy components to improve maintainability
- Assist in defining service level objectives and error budgets
- Contribute to disaster recovery and high availability strategies
- Evaluate new technologies for system improvements
- Support CI/CD pipelines for rapid and safe deployments
- Collaborate on incident response and postmortem analysis
- Drive improvements in developer experience and tooling
- Ensure backward compatibility during system upgrades
Nice to Have
- Experience with Kubernetes or similar orchestration platforms
- Contributions to open-source distributed systems
- Background in database or query engine development
- Familiarity with gRPC and protocol buffers
- Knowledge of service mesh technologies
- Experience with large-scale production deployments
- Understanding of consensus protocols such as Raft or Paxos
- Exposure to real-time monitoring dashboards
- Prior work on control plane or management plane systems
- Involvement in incident management and postmortems
Compensation
Competitive salary with performance-based incentives
Work Arrangement
Remote with flexible hours
Team
Collaborative engineering team focused on distributed systems
Tech Stack
- High-performance backend services in C++
- Frontend applications using modern JavaScript frameworks
- Distributed coordination via ZooKeeper or similar systems
- Container orchestration using Kubernetes
- Cloud infrastructure on major providers with global reach
- CI/CD pipelines powered by modern automation tools
- Monitoring stack including metrics, logs, and traces
- gRPC for inter-service communication
- SQL-based metadata storage with high availability
Impact
- Your work will directly influence system uptime and scalability
- You will shape tools used by internal teams worldwide
- Your code contributions will affect data reliability and access speed
- Improvements will reduce operational overhead for engineers
- Your features will enable faster service deployments
- You will help define best practices for distributed control logic
Available for qualified candidates