This role focuses on maintaining and improving the reliability, scalability, and performance of production platforms through automation, incident resolution, and collaboration with engineering teams. The engineer will lead complex system changes, mentor peers, and advance observability and operational best practices across distributed systems.
Responsibilities
- Support production systems with active participation in on-call rotations
- Collaborate with service teams to enhance system reliability, performance, and scalability
- Design and implement automation solutions to improve operational efficiency
- Work cross-functionally with engineering teams to drive platform improvements
- Lead critical system changes and help shape operational strategy
- Mentor junior and peer SRE team members to strengthen team capabilities
- Troubleshoot and resolve production incidents, implementing long-term fixes
- Advance automation, monitoring, and observability practices across the platform
- Partner with infrastructure teams to evolve and secure the technical platform
- Champion and deliver high-impact projects initiated within the SRE function
Requirements
- Minimum of six years of experience in site reliability engineering or operations-focused engineering roles
- Strong Linux administration skills required from day one
- Proven experience managing production Kubernetes environments
- Ability to diagnose and resolve issues in complex, distributed systems
- Proficiency in automation scripting using languages such as Python or Bash
- Familiarity with Git, CI/CD pipelines, and DevOps methodologies
Nice to Have
- Direct experience with public cloud platforms such as AWS, GCP, or Azure
- Working knowledge of event streaming and integration technologies like Apache Pulsar
- Experience with stream and batch processing frameworks including Apache Flink
- Background in configuration management and infrastructure as code practices
- Understanding of monitoring and observability best practices
- Prior experience coaching or mentoring engineers
Tech Stack
Linux, Kubernetes, Python, Bash, Git, CI/CD, DevOps, AWS, GCP, Azure, Apache Pulsar, Apache Flink, Infrastructure as Code, Configuration Management, Observability Tools, Monitoring Systems
Benefits
- Supportive environment that values employee growth and development
- Recognition of achievements at all levels
- Culture that promotes inclusion and diverse perspectives
- Focus on achieving challenging objectives
- Collaborative and connected work atmosphere
Work Arrangement
Global
Team
Part of the global SRE team within the Infrastructure Engineering department
Winning Culture
- Commitment to customers’ success
- Diversity of thought and ideas
- Leadership regardless of title
- Achieving ambitious goals
- Celebrating wins – big and small
- Strategy-led
- Values-based
- Disciplined in execution
- Inclusive and innovative environment
Additional Information
- Operates under principles of being strategy-led, values-based, and disciplined in execution
- Diversity, Equity, Inclusion and Belonging (DEIB) is a core organizational commitment
- Reasonable accommodations are available for individuals with disabilities
- Job offers are first communicated verbally, followed by official written communication from an @anaplan.com email address
- All official company emails originate from the @anaplan.com domain


