Responsibilities
- Lead a team of approximately ten site reliability engineers (SREs) spanning several functional areas, collaborating closely with technical leads to define strategic direction.
- Advance the team’s transition from manual processes to automated, scalable infrastructure by identifying operational toil, setting limits on it, and driving engineering initiatives that reduce it.
- Remain actively involved in technical work, including coding, architectural reviews, incident leadership, and key technical decisions.
- Establish consistent coaching and feedback practices to support engineer growth, particularly in incident management, on-call practices, and resolving systemic issues.
- Improve on-call procedures and incident response effectiveness, ensuring blameless postmortems lead to concrete engineering actions.
- Collaborate with a peer SRE engineering manager across time zones to align on organizational practices, hiring strategies, and operational improvements.
- Support team growth by managing hiring, performance leveling, and career progression for engineers in your region.
- Manage capacity planning, prioritize engineering work across functional areas, and advocate for SRE priorities in cross-organizational engineering discussions.
Work Arrangement
On-site
Team
Team of 20 engineers organized into three functional areas: bare-metal and day-0/day-2 operations, inference platform, and virtual clusters platform.
Other
Relocation assistance is provided.