The Senior Site Reliability Engineer is responsible for maintaining and improving the reliability, scalability, and performance of production platforms. This role includes active on-call participation, leading incident resolution, advancing automation, and working closely with engineering and infrastructure teams to strengthen system resilience and operational best practices.
Responsibilities
- Provide ongoing support for production systems, including participation in on-call rotations
- Collaborate with service and engineering teams to enhance system reliability, scalability, and performance
- Design and implement automation solutions to improve operational efficiency
- Lead complex technical changes and help shape platform operating strategies
- Mentor and guide other members of the SRE team to foster technical growth
- Identify and lead high-impact SRE initiatives that improve system stability
- Establish and maintain standards for production environments and operational practices
- Troubleshoot and resolve critical production incidents, ensuring long-term fixes are implemented
- Work with infrastructure teams to evolve and strengthen platform capabilities
- Drive improvements in observability, reliability, and automation practices across the platform
Requirements
- Minimum of 6 years of experience in site reliability engineering or similar operationally focused engineering roles
- Strong Linux administration skills required from day one
- Proven experience operating production-grade Kubernetes environments
- Demonstrated ability to diagnose and resolve issues in distributed systems
- Proficient in scripting languages such as Python or Bash for automation tasks
- Solid experience with Git, CI/CD pipelines, and DevOps methodologies
Nice to Have
- Hands-on experience with public cloud platforms such as AWS, GCP, or Azure
- Familiarity with event streaming and integration technologies like Apache Pulsar
- Experience with stream-processing or batch-processing frameworks such as Apache Flink
- Knowledge of configuration management and infrastructure-as-code practices
- Understanding of observability and monitoring best practices in distributed systems
- Prior experience mentoring or coaching engineers
Tech Stack
Kubernetes, Linux, Python, Bash, Git, CI/CD, DevOps, AWS, GCP, Azure, Apache Pulsar, Apache Flink, configuration management, infrastructure as code, observability, monitoring
Benefits
- Work in an environment that values inspiration, connection, development, and recognition
- Celebrate achievements, both major milestones and smaller wins
- Benefit from diverse perspectives and inclusive collaboration
- Take on leadership at any level, regardless of formal title
- Focus on achieving ambitious, meaningful goals
- Operate within a framework of strategy-led, values-based, and disciplined execution
Compensation
Competitive salary and comprehensive benefits package aligned with global standards
Work Arrangement
Global — team spans multiple regions with distributed collaboration
Team
Part of a global SRE team within the Infrastructure Engineering organization
Winning Culture
- Commitment to customer success
- Diversity of thought and ideas
- Leadership regardless of title
- Achieving ambitious goals
- Celebrating wins – big and small
- Strategy-led
- Values-based
- Disciplined in execution
Additional Information
- Guided by principles of being strategy-led, values-based, and disciplined in execution
- Inclusive environment that welcomes individuality and unique perspectives
- Be cautious of fraudulent job offers; legitimate offers follow a thorough interview process and are communicated verbally first
- Official communications are sent exclusively from @anaplan.com email addresses
- Accommodations available for candidates and employees with disabilities upon request
- Visa sponsorship may be available depending on role location and candidate eligibility


