About the Role
The role involves ensuring system stability, automating operations, and enhancing observability across distributed services that process large-scale data.
Responsibilities
- Design and maintain scalable infrastructure for data processing systems
- Implement automated solutions to reduce manual operational tasks
- Monitor system performance and respond to incidents efficiently
- Collaborate with engineering teams to improve service resilience
- Develop and enforce best practices for deployment and configuration management
- Troubleshoot complex production issues across distributed environments
- Improve system reliability and reduce error rates through proactive measures
- Support incident response and lead post-mortem analyses
- Maintain comprehensive documentation for systems and procedures
- Evaluate and integrate new tools for monitoring and observability
- Ensure infrastructure meets security and compliance standards
- Participate in on-call rotations and support rapid incident resolution
- Drive improvements in system uptime and mean time to recovery
- Work closely with developers to refine service-level objectives
- Contribute to capacity planning and performance testing
- Implement infrastructure as code using modern tooling
- Enhance alerting systems to reduce noise and improve response
- Manage and scale containerized workloads and orchestration platforms
- Support continuous integration and delivery pipelines
- Promote a culture of reliability across engineering teams
Nice to Have
- Master’s degree in a technical field
- Experience with high-throughput data pipelines
- Contributions to open-source infrastructure projects
- Certifications in cloud or systems engineering
- Exposure to observability platforms like Datadog or New Relic
- Background in database administration or tuning
- Familiarity with service mesh technologies
- Experience in fast-growing startup environments
Compensation
Competitive salary and equity package
Work Arrangement
Remote-friendly with flexible hours
Team
Collaborative engineering team focused on data infrastructure and reliability
Why Join Us
- Opportunity to shape the reliability culture of a growing data platform
- Work with cutting-edge technologies at scale
- Impactful role with direct influence on product stability and performance
Benefits
- Health, dental, and vision insurance
- 401(k) plan with company match
- Generous paid time off and parental leave
- Professional development stipend
- Home office setup allowance
Available for qualified candidates


