The Site Reliability Engineer ensures system reliability and scalability by collaborating with development teams, implementing automation, and maintaining resilient infrastructure through proactive monitoring and incident response.
Responsibilities
- Work closely with development teams to design and implement robust, scalable system architectures
- Define and improve service level objectives to ensure high reliability and performance
- Write and maintain code in Python and Go to support system operations and automation
- Proactively test system resilience by simulating failures and implementing recovery procedures
- Diagnose and resolve application issues using metrics, logs, and distributed tracing
- Participate in on-call rotations, including occasional night shifts, providing timely incident response and support
- Drive improvements in engineering practices across teams to enhance system reliability
Requirements
- Minimum of 3 years of experience in a Site Reliability Engineering or similar role
- Hands-on experience with Kubernetes, Helm, and cloud infrastructure platforms
- Proficiency in writing code using Python or Go
- Solid understanding of application failure modes and incident response
- Experience debugging systems using monitoring metrics and performance data
- Ability to implement and work with distributed tracing mechanisms
- Willingness to participate in on-call rotations and work flexible hours
- Strong collaboration and communication skills in a team environment
Nice to Have
- Familiarity with Google Cloud Platform (GCP)
- Alignment with core values: trust through open communication, strong ownership, and a mindset of continuous improvement
Tech Stack
Kubernetes, Helm, cloud providers, Python, Go, GCP
Benefits
- Flexible working hours to support work-life balance
- Unlimited paid time off
- Flexible benefit for personal hobbies and interests
- Employee Support Program for personal and professional well-being
- Financial assistance in the event of the loss of a family member
- Participation in Employee Resource Groups
- Access to training programs, courses, and industry conferences
- Occasional corporate events and regular team-building activities
- Meals, snacks, and beverages provided at the office
Work Arrangement
Global, with flexible working hours
Team
Over 1,700 people worldwide contribute to product development. The SRE team works with cross-functional groups to proactively identify and resolve infrastructure and application weaknesses.
Core values:
- Trust through open communication and authenticity
- Strong sense of ownership in all responsibilities
- Enthusiasm for continuous improvement and change
Additional Information
- Night shifts may be required as part of on-call duties
- Candidates must be willing to be on call and work flexible hours


