About the Role
Responsibilities
- Coordinate and support daily activities for SREs on the team, and partner with their managers to determine an approach for managing daily tasks
- Track success on the team based on established goals and objectives
- Work on issues of limited scope with the ability to find and execute solutions to routine problems
- Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices
- Mentor team members and drive initiatives
- Drive the design of a feature while understanding system-wide and architectural concerns
- Understand the basic day-to-day tasks and traits of a production environment and participate in on-call support
- Engage and collaborate with other disciplines in the design, deployment, operation, and optimization of services
- Debug production issues across services and levels of the stack, and practice incident response and blameless postmortems
- Identify opportunities in both processes and tools to improve the overall productivity of the team
- Identify great talent and excite them to join our team
- Provide estimations, track progress, and manage risk as well as team members' time
- Participate in an on-call shift along with other disciplines to respond to incidents
- Become involved in tech communities and add contributions to enhance them
- Lean into our business domain and needs, as well as our company vision, mission, and strategy, to deliver on our short- and long-term goals
Requirements
- Experience in managing Site Reliability Engineering teams
- Ability to work on issues of limited scope and to find and execute solutions to routine problems
- Experience mentoring team members and driving initiatives
- Understanding of system-wide and architectural concerns when driving feature design
- Understanding of the day-to-day operations of a production environment
- Willingness to participate in on-call support
- Experience engaging and collaborating across disciplines in design, deployment, operation, and optimization of services
- Ability to debug production issues across services and levels of the stack
- Experience practicing incident response and blameless postmortems
- Ability to identify opportunities in processes and tools to improve team productivity
- Ability to identify and attract top talent
- Experience providing estimations, tracking progress, managing risk, and managing team members' time
- Willingness to participate in on-call rotations with other disciplines
- Involvement in tech communities and a track record of contributions