About the Role
This role involves leading the development and optimization of large-scale, low-latency systems that process massive data volumes across distributed environments.
Responsibilities
- Design and maintain systems processing approximately 2 million requests per second
- Ensure sub-millisecond response times across critical services
- Manage infrastructure handling petabyte-scale data storage and processing
- Operate distributed systems across geographically dispersed data centers
- Optimize the performance of JVM-based applications for low latency and high throughput
- Tune Linux kernel parameters to enhance system efficiency and reliability
- Oversee the deployment and management of thousands of servers in company-owned data centers
- Lead incident response and root cause analysis for production issues
- Collaborate on capacity planning and scalability forecasting
- Implement monitoring and observability tooling for real-time insights
- Drive automation of operational workflows and deployment pipelines
- Enforce best practices in configuration management and infrastructure as code
- Evaluate and integrate new technologies to improve system capabilities
- Mentor engineers on performance optimization and systems design
- Participate in architectural reviews and technical decision-making
- Ensure fault tolerance and resilience in distributed components
- Optimize data replication and consistency models across regions
- Support security audits and compliance requirements for infrastructure
- Maintain documentation for system architecture and operational procedures
- Coordinate with cross-functional teams on integration and performance goals
- Improve energy efficiency and resource utilization in data centers
- Troubleshoot network, storage, and compute bottlenecks
- Evaluate hardware for procurement and manage the server lifecycle
- Contribute to disaster recovery planning and execution
- Drive initiatives to reduce technical debt in core systems
Compensation
Competitive salary and equity package commensurate with experience and impact.
Work Arrangement
Hybrid work model with flexibility to work remotely or from company offices.
Team
The team tackles core infrastructure challenges involving extreme scale, low latency, and high availability, operating across multiple data centers with full ownership of hardware and software layers.
The team is responsible for some of the most technically challenging work in the company: handling approximately 2 million requests per second at sub-millisecond latency, managing petabytes of data, running distributed systems across multiple data centers, tuning the JVM and the Linux kernel, and operating its own global data center footprint of thousands of servers.
Reports to
Co-founder & CTO
Visa sponsorship is available for qualified candidates requiring relocation.