Remote (Global) Full-time

Zapier is hiring a Site Reliability Engineer - Europe

About the Role

Zapier is looking for a Site Reliability Engineer to join our Cloud team. You will focus on strengthening the reliability and scalability of our platform in AWS, designing and operating cloud systems, improving observability, and building resilient services for high-traffic workloads.

What You'll Do

  • Use Terraform and Helm to design and manage AWS infrastructure that scales seamlessly and minimizes downtime.
  • Manage EKS clusters and Lambda functions so they stay cost-efficient and reliable under high traffic.
  • Instrument systems with effective monitoring, logging, and alerting for faster issue detection and resolution.
  • Automate key workflows to reduce manual toil and improve developer productivity.
  • Partner with engineering teams to solve infrastructure challenges and make services more resilient and scalable.
  • Execute critical migrations and integrations between systems smoothly with minimal disruption.
  • Apply site reliability engineering practices to improve uptime and performance.
  • Write documentation and communicate clearly so the team understands systems and best practices.
  • Identify, test, and recommend new tools or approaches to strengthen infrastructure.
  • Explore and apply AI tools to optimize workflows and speed up development and operations.
  • Contribute to business-hours on-call support, handling incidents calmly and learning from them.

What We're Looking For

  • At least 4 years of experience in cloud engineering, systems administration, or a related field.
  • Experience with cloud platforms such as AWS, GCP, or Azure.
  • Proficiency in at least one programming language such as Python, Go, or similar.
  • Experience with automation tools.
  • Ability to solve complex systems challenges and improve performance and reliability.
  • Effective communication skills for documenting processes and sharing knowledge.
  • Alignment with Zapier’s values and ability to thrive in a collaborative, remote work environment.
  • AI fluency: experience using AI tooling for work or personal projects, or a willingness to learn fast and use it regularly.

Technical Stack

  • Infrastructure & Cloud: AWS, Kubernetes, Terraform
  • Data & Messaging: Redis, Kafka, OpenSearch
  • Observability: Grafana, Datadog, Prometheus, Sentry
  • Languages: Python, Go, TypeScript
  • Platform: GitLab, ArgoCD

Team & Environment

You will be part of the Cloud team, which owns AWS governance, cloud spending, and user experience.

Work Mode

This is a remote position open to candidates located in Europe.

Zapier is an equal-opportunity employer and we're excited to work with talented and empathetic people of all identities.

Required Skills
AWS, Kubernetes, Terraform, Prometheus, Grafana, Datadog, Sentry, OpenSearch, Redis, Kafka, Site Reliability Engineering, Incident Management, Capacity Planning, Observability, Automation
About company
Zapier

Zapier is leading the way in AI automation, helping businesses increase productivity and serve their customers better through an extensive platform of integrations, robust workflow automations, and practical AI applications.

Job Details
Category: Infrastructure
Posted: 5 months ago