Site Reliability Engineer at 1016 Meijer Great Lakes LP (Expired)

About the Role

This role combines software engineering and systems operations to build and maintain reliable, scalable systems. The engineer will focus on automation, monitoring, incident response, and system performance.

Responsibilities

Design and maintain infrastructure to ensure high availability and performance
Implement automated deployment and configuration management systems
Monitor systems to detect and resolve issues proactively
Respond to incidents and lead resolution efforts
Conduct post-incident reviews to identify root causes and prevent recurrence
Optimize system reliability and operational efficiency
Collaborate with development teams to improve service resilience
Develop tools and scripts to streamline operations
Manage cloud infrastructure and services
Ensure systems meet security and compliance standards
Participate in on-call rotations for critical systems
Troubleshoot complex production issues
Improve monitoring and alerting systems
Support disaster recovery planning and testing
Drive adoption of best practices in reliability engineering
Work on capacity planning and performance tuning
Integrate reliability into the development lifecycle
Maintain documentation for systems and procedures
Evaluate new technologies for operational improvements
Promote a culture of continuous improvement and learning

Compensation

Competitive salary based on experience

Work Arrangement

Hybrid work model with on-site and remote options

Team

Collaborative engineering environment focused on reliability and scalability

Why Join Us

Opportunity to work on large-scale systems with real impact
Supportive team culture that values innovation and ownership

Technology Stack

Uses modern cloud infrastructure and automation tools
Leverages Kubernetes, Terraform, and observability platforms

1016 Meijer Great Lakes LP was looking for a Site Reliability Engineer