Responsibilities
- Guide the technical vision for platform infrastructure, including architecture, tooling, and long-term strategic planning
- Establish and implement reliability practices across systems using service level objectives (SLOs), service level indicators (SLIs), error budgets, observability, and incident response frameworks
- Lead cross-functional infrastructure projects from concept to completion, ensuring alignment with organizational objectives and effective delegation
- Spearhead AI integration efforts across engineering to reduce operational burden, speed up development, enhance incident handling, and expand team capabilities
- Develop AI-driven automation for platform operations, including smart alerting, incident classification, self-healing systems, and AI-supported runbooks to minimize manual work
- Support infrastructure capacity planning and drive improvements in cost efficiency
- Provide technical mentorship to senior engineers through code reviews, design critiques, and direct coaching
- Partner with security specialists to strengthen platform defenses, address threats, and meet compliance requirements
- Promote engineering excellence within the SRE team by advocating best practices, managing technical debt, and progressively raising quality standards
- Assist in recruiting, onboarding new hires, and refining team processes to improve operational effectiveness
Work Arrangement
Remote (Worldwide)