About the Role

This role involves leading the development and optimization of large-scale, low-latency systems that process massive data volumes across distributed environments.

Responsibilities

Design and maintain systems processing approximately 2 million requests per second
Ensure sub-millisecond response times across critical services
Manage infrastructure handling petabyte-scale data storage and processing
Operate distributed systems across geographically dispersed data centers
Optimize performance of JVM-based applications for low latency and high throughput
Tune Linux kernel parameters to enhance system efficiency and reliability
Oversee deployment and management of thousands of servers in owned data centers
Lead incident response and root cause analysis for production issues
Collaborate on capacity planning and scalability forecasting
Implement monitoring and observability tooling for real-time insights
Drive automation of operational workflows and deployment pipelines
Enforce best practices in configuration management and infrastructure as code
Evaluate and integrate new technologies to improve system capabilities
Mentor engineers on performance optimization and systems design
Participate in architectural reviews and technical decision-making
Ensure fault tolerance and resilience in distributed components
Optimize data replication and consistency models across regions
Support security audits and compliance requirements for infrastructure
Maintain documentation for system architecture and operational procedures
Coordinate with cross-functional teams on integration and performance goals
Improve energy efficiency and resource utilization in data centers
Troubleshoot network, storage, and compute bottlenecks
Evaluate hardware procurement and server lifecycle management
Contribute to disaster recovery planning and execution
Drive initiatives to reduce technical debt in core systems

Compensation

Competitive salary and equity package commensurate with experience and impact.

Work Arrangement

Hybrid work model with flexibility to work remotely or from company offices.

Team

The team tackles core infrastructure challenges involving extreme scale, low latency, and high availability, operating across multiple data centers with full ownership of hardware and software layers.

The team is responsible for some of the most technically challenging work, including handling approximately 2 million requests per second at sub-millisecond latency, managing petabytes of data, running distributed systems across multiple data centers, optimizing the JVM and the Linux kernel, and operating its own global data center with thousands of servers.

Reports to

Co-founder & CTO

Visa sponsorship is available for qualified candidates requiring relocation.

Kayzen is hiring an LLM Platform Engineer/Lead (m/f/d)
