Zapier is looking for a Site Reliability Engineer to join our Cloud team. You will focus on strengthening the reliability and scalability of our platform on AWS, designing and operating cloud systems, improving observability, and building resilient services for high-traffic workloads.
What You'll Do
- Use Terraform and Helm to design and manage AWS infrastructure that scales seamlessly and minimizes downtime.
- Manage EKS clusters and Lambda functions so they stay cost-efficient and reliable under high traffic.
- Instrument systems with effective monitoring, logging, and alerting for faster issue detection and resolution.
- Automate key workflows to reduce manual toil and improve developer productivity.
- Partner with engineering teams to solve infrastructure challenges and make services more resilient and scalable.
- Execute critical system migrations and integrations with minimal disruption.
- Apply site reliability engineering practices to improve uptime and performance.
- Write documentation and share knowledge so the team understands our systems and best practices.
- Identify, test, and recommend new tools or approaches to strengthen infrastructure.
- Explore and apply AI tools to optimize workflows and speed up development and operations.
- Contribute to business-hours on-call support, handling incidents calmly and learning from them.
What We're Looking For
- At least 4 years of experience in cloud engineering, systems administration, or a related field.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Proficiency in at least one programming language, such as Python or Go.
- Experience with automation tools, such as Terraform or Helm.
- Ability to solve complex systems challenges and improve performance and reliability.
- Effective communication skills for documenting processes and sharing knowledge.
- Alignment with Zapier’s values and ability to thrive in a collaborative, remote work environment.
- AI fluency: you have used AI tools for work or personal projects, or you are willing to learn quickly and use them regularly.
Technical Stack
- Infrastructure & Cloud: AWS, Kubernetes, Terraform
- Data & Messaging: Redis, Kafka, OpenSearch
- Observability: Grafana, Datadog, Prometheus, Sentry
- Languages: Python, Go, TypeScript
- Platform: GitLab, ArgoCD
Team & Environment
You will be part of the Cloud team, which owns AWS governance, cloud spending, and user experience.
Work Mode
This is a remote position open to candidates located in Europe.
Zapier is an equal-opportunity employer and we're excited to work with talented and empathetic people of all identities.