Spain Remote (Global)

Semrush Inc. is hiring a Site Reliability Engineer

The Site Reliability Engineer ensures system reliability and scalability by collaborating with development teams, implementing automation, and maintaining resilient infrastructure through proactive monitoring and incident response.

Responsibilities

  • Work closely with development teams to design and implement robust, scalable system architectures
  • Define and improve service level objectives to ensure high reliability and performance
  • Write and maintain code in Python and Go to support system operations and automation
  • Proactively test system resilience by simulating failures and implementing recovery procedures
  • Diagnose and resolve application issues using metrics, logs, and distributed tracing
  • Participate in on-call rotations, providing timely incident response and support
  • Drive improvements in engineering practices across teams to enhance system reliability
  • Be available for occasional night shifts as part of on-call responsibilities

Requirements

  • Minimum of 3 years of experience in a Site Reliability Engineering or similar role
  • Hands-on experience with Kubernetes, Helm, and cloud infrastructure platforms
  • Proficiency in writing code using Python or Go
  • Solid understanding of application failure modes and incident response
  • Experience debugging systems using monitoring metrics and performance data
  • Ability to implement and work with distributed tracing mechanisms
  • Willingness to participate in on-call rotations and work flexible hours
  • Strong collaboration and communication skills in a team environment

Nice to Have

  • Familiarity with Google Cloud Platform (GCP)
  • Alignment with core values: trust through open communication, strong ownership, and a mindset of continuous improvement

Tech Stack

Kubernetes, Helm, Cloud providers, Python, Go, GCP

Benefits

  • Flexible working hours to support work-life balance
  • Unlimited paid time off
  • Flexible benefit for personal hobbies and interests
  • Employee Support Program for personal and professional well-being
  • Financial assistance in case of family member loss
  • Participation in Employee Resource Groups
  • Access to training programs, courses, and industry conferences
  • Occasional corporate events and regular teambuilding activities
  • Meals, snacks, and beverages provided at the office

Work Arrangement

Global, with flexible working hours

Team

Over 1,700 people worldwide contribute to product development. The SRE team works with cross-functional groups to proactively identify and resolve infrastructure and application weaknesses.

Core values:

  • Trust through open communication and authenticity
  • Strong sense of ownership in all responsibilities
  • Enthusiasm for continuous improvement and change

Additional Information

  • Night shifts may be required as part of on-call duties
  • Candidates must be willing to be on call and work flexible hours

Required Skills

Kubernetes, Helm, Python, Go, GCP
About the Company

Semrush Inc. is a leading online visibility management SaaS platform that enables businesses globally to run search engine optimization, pay-per-click, content, social media and competitive research campaigns and get measurable results from online marketing.
Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 3 months ago