METRO/MAKRO is looking for a Site Reliability Engineer to join the M.Store team. You will be responsible for building and maintaining scalable, resilient systems for METRO's digital products, ensuring the stability and reliability of our cloud-native applications.
What You'll Do
- Ensure the stability and reliability of cloud-native applications deployed on GCP, containerized with Docker and orchestrated via Kubernetes.
- Define, implement, and monitor SLOs, SLAs, and SLIs to measure system performance and user experience.
- Automate infrastructure provisioning using Terraform and manage Kubernetes configurations with Kustomize and Helm.
- Develop and maintain monitoring and alerting systems using Datadog and GCP-native tools.
- Conduct incident analysis and postmortems to drive continuous improvement.
- Collaborate with development teams to integrate reliability practices into CI/CD pipelines using GitHub Actions.
- Manage and troubleshoot database systems, particularly PostgreSQL and Cassandra.
- Apply networking knowledge and Linux system administration skills to troubleshoot and optimize system connectivity and performance.
What We're Looking For
- Educational background in Computer Science or Software Engineering, or equivalent practical experience.
- 5+ years of experience in Site Reliability Engineering.
- Proven experience designing and operating elastic, resilient systems in cloud environments.
- Strong understanding of GCP, Kubernetes, and container orchestration.
- Proficiency in infrastructure as code and configuration management tools (Terraform, Helm, Kustomize).
- Experience with monitoring and observability tools (Datadog, GCP Monitoring).
- Solid scripting skills in Bash and familiarity with automation frameworks.
- Experience with CI/CD pipelines, especially using GitHub Actions.
- Familiarity with networking fundamentals and troubleshooting.
- Strong coding skills and ability to develop reliability-focused tooling.
- Strong problem-solving skills and a process-oriented mindset.
- Ability to work independently and collaboratively in a fast-paced environment.
- Passion for clean code, automation, and continuous improvement.
- Experience working within Agile/Scrum development teams.
- Very good command of English (written and spoken).
Nice to Have
- Availability for on-call duty.
Technical Stack
- Google Cloud Platform (GCP)
- Docker, Kubernetes, Terraform, Kustomize, Helm
- Datadog, GitHub Actions
- PostgreSQL, Cassandra
- Linux
Benefits & Compensation
- Flexible and remote work: create your own schedule.
- People development: individual and company-wide training programs focused on development, leadership, and appreciation.
- Support with individual solutions at every stage of life.
Work Mode
This role is hybrid, based in our offices in Brasov or Cluj.
METRO/MAKRO is an equal opportunity employer.





