Anaplan is looking for a Site Reliability Engineer to join our team. You will play a critical role in maintaining the reliability, availability, and performance of the Anaplan platform, working to build and scale resilient systems.
What You'll Do
- Build, monitor, and maintain highly available and scalable production systems.
- Develop and implement automation for operational tasks to improve system efficiency and reliability.
- Participate in on-call rotations, troubleshoot complex issues, and lead incident response.
- Collaborate with software engineering teams to design, launch, and operate services.
- Define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Drive continuous improvement in system performance, capacity planning, and cost optimization.
What We're Looking For
- Proven experience in an SRE, DevOps, or similar role focused on large-scale distributed systems.
- Strong programming and scripting skills in languages like Python, Go, or Java.
- Deep experience with cloud infrastructure (AWS, Azure, or GCP), containers (Docker), and container orchestration (Kubernetes).
- Expertise in monitoring and observability tools like Prometheus, Grafana, Datadog, or Splunk.
- Strong understanding of Linux/Unix systems, networking, and security fundamentals.
- Experience with infrastructure-as-code tools like Terraform, Ansible, or CloudFormation.
Nice to Have
- Experience managing data platforms or large-scale databases.
- Knowledge of CI/CD pipelines and GitOps practices.
- Experience in a SaaS or platform engineering environment.
We believe in a hiring and working environment where all people are respected and valued, regardless of gender identity or expression, sexual orientation, religion, ethnicity, age, neurodiversity, disability status, citizenship, or any other aspect that makes people unique.


