The Site Reliability Engineer ensures system reliability and scalability by collaborating with development teams, implementing automation, and maintaining resilient infrastructure through proactive monitoring and incident response.

Responsibilities

Work closely with development teams to design and implement robust, scalable system architectures
Define and improve service level objectives to ensure high reliability and performance
Write and maintain code in Python and Go to support system operations and automation
Proactively test system resilience by simulating failures and implementing recovery procedures
Diagnose and resolve application issues using metrics, logs, and distributed tracing
Participate in on-call rotations, providing timely incident response and support
Drive improvements in engineering practices across teams to enhance system reliability
Be available for occasional night shifts as part of on-call responsibilities

Requirements

Minimum of 3 years of experience in a Site Reliability Engineering or similar role
Hands-on experience with Kubernetes, Helm, and cloud infrastructure platforms
Proficiency in writing code using Python or Go
Solid understanding of application failure modes and incident response
Experience debugging systems using monitoring metrics and performance data
Ability to implement and work with distributed tracing mechanisms
Willingness to participate in on-call rotations and work flexible hours
Strong collaboration and communication skills in a team environment

Nice to Have

Familiarity with Google Cloud Platform (GCP)
Alignment with core values: trust through open communication, strong ownership, and a mindset of continuous improvement

Tech Stack

Kubernetes, Helm, cloud providers, Python, Go, GCP

Benefits

Flexible working hours to support work-life balance
Unlimited paid time off
A flexible benefit covering personal hobbies and interests
Employee Support Program for personal and professional well-being
Financial assistance in case of family member loss
Participation in Employee Resource Groups
Access to training programs, courses, and industry conferences
Occasional corporate events and team-building activities
Meals, snacks, and beverages provided at the office
Regular team-building opportunities

Work Arrangement

Global, with flexible working hours

Team

Over 1,700 people worldwide contribute to product development. The SRE team works with cross-functional groups to proactively identify and resolve infrastructure and application weaknesses.

Core values:

Trust through open communication and authenticity
Strong sense of ownership in all responsibilities
Enthusiasm for continuous improvement and change

Additional Information

On-call duties may occasionally require night shifts
Candidates must be willing to be on call and work flexible hours

Semrush Inc. is hiring a Site Reliability Engineer