This role focuses on maintaining and improving the reliability, scalability, and performance of production platforms through automation, incident resolution, and collaboration with engineering teams. The engineer will lead complex system changes, mentor peers, and advance observability and operational best practices across distributed systems.

Responsibilities

Support production systems with active participation in on-call rotations
Collaborate with service teams to enhance system reliability, performance, and scalability
Design and implement automation solutions to improve operational efficiency
Work cross-functionally with engineering teams to drive platform improvements
Lead critical system changes and help shape operational strategy
Mentor junior and peer SRE team members to strengthen team capabilities
Troubleshoot and resolve production incidents, implementing long-term fixes
Advance automation, monitoring, and observability practices across the platform
Partner with infrastructure teams to evolve and secure the technical platform
Champion and deliver high-impact projects initiated within the SRE function

Requirements

Minimum of six years of experience in site reliability engineering or operations-focused engineering roles
Strong Linux administration skills required from day one
Proven experience managing production Kubernetes environments
Ability to diagnose and resolve issues in complex, distributed systems
Proficiency in automation scripting using languages such as Python or Bash
Familiarity with Git, CI/CD pipelines, and DevOps methodologies

Nice to Have

Direct experience with public cloud platforms such as AWS, GCP, or Azure
Working knowledge of event streaming and integration technologies like Apache Pulsar
Experience with stream and batch processing frameworks including Apache Flink
Background in configuration management and infrastructure as code practices
Understanding of monitoring and observability best practices
Prior experience coaching or mentoring engineers

Tech Stack

Linux, Kubernetes, Python, Bash, Git, CI/CD, DevOps, AWS, GCP, Azure, Apache Pulsar, Apache Flink, Infrastructure as Code, Configuration Management, Observability Tools, Monitoring Systems

Benefits

Supportive environment that values employee growth and development
Recognition of achievements at all levels
Culture that promotes inclusion and diverse perspectives
Focus on achieving challenging objectives
Collaborative and connected work atmosphere

Work Arrangement

Global

Team

Part of the global SRE team within the Infrastructure Engineering department

Winning Culture
Commitment to customers’ success
Diversity of thought and ideas
Leadership regardless of title
Achieving ambitious goals
Celebrating wins – big and small
Strategy-led
Values-based
Disciplined in execution
Inclusive and innovative environment

Additional Information

The company operates under the principles of being strategy-led, values-based, and disciplined in execution
Diversity, Equity, Inclusion and Belonging (DEIB) is a core organizational commitment
Reasonable accommodations are available for individuals with disabilities
Job offers are first communicated verbally and then followed by official written communication from an @anaplan.com email address
All official company emails originate from the @anaplan.com domain

Anaplan is hiring a Senior Site Reliability Engineer