Responsibilities
- Ensure high standards of operational performance for designated systems and services within your value stream
- Take part in on-call duties, addressing incidents and guiding them to resolution
- Lead incident postmortems to determine root causes and apply preventive actions
- Create and manage infrastructure as code using Terraform across multiple AWS environments
- Operate and improve EKS clusters, following container orchestration best practices
- Design and deploy monitoring, alerting, and observability systems using Datadog and Splunk
- Develop automated tools and scripts to minimize manual work and increase efficiency
- Work with development teams on deployment approaches, applying progressive delivery methods
- Enhance and maintain CI/CD pipelines using GitLab CI and Jenkins
- Support capacity planning and performance tuning efforts
- Mentor junior SREs on operational best practices and support their professional growth
- Maintain runbooks and documentation of architectural decisions and system behavior
Benefits
- Flexible work environment
- Fluid career paths and internal mobility
- Purpose and well-being
- Work-life balance
- Inclusive environment
- Volunteering opportunities
Work Arrangement
Hybrid


