France, Germany, Spain, or United Kingdom · Remote (Global) · Employment

Upsun (formerly Platform.sh) is hiring a Site Reliability Engineer

About the Role

Platform.sh is seeking a Site Reliability Engineer to join our Upsun team. In this role, you will help the team move from traditional Cloud Operations to an automation-driven SRE model. Your focus will be on improving infrastructure, automating operational tasks, and streamlining processes to make our systems more reliable, scalable, and efficient.

What You'll Do

  • Refine monitoring and observability using tools like Prometheus, Grafana, and ELK Stack to ensure system visibility aligns with business objectives.
  • Automate deployments and workflows by transitioning manual processes to automated solutions with IaC tools like Terraform and Ansible.
  • Optimize CI/CD pipelines and their architecture to deliver fast, reliable, and scalable releases.
  • Manage and scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt.
  • Support incident response and lead post-mortem analysis to ensure continuous improvement and knowledge sharing.
  • Collaborate with cross-functional engineering and product teams to integrate reliability practices into the development lifecycle.
  • Drive technical innovation by introducing new tools, technologies, and practices that improve system reliability, performance, and scalability.

What We're Looking For

  • A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced hands-on experience with Linux internals, including performance tuning, kernel configurations, and troubleshooting.
  • Proficiency in programming languages such as Go (preferred) or Python for building tools and automating processes.
  • Strong skills in scripting languages like Python, Bash, or Go to automate workflows and manage infrastructure.
  • Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.

Nice to Have

  • Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications.

Technical Stack

  • Monitoring/Observability: Prometheus, Grafana, ELK Stack
  • Infrastructure as Code: Terraform, Ansible
  • Cloud Platforms: AWS, GCP, Azure
  • Languages: Go, Python, Bash
  • Containerization: Docker, Kubernetes

Team & Environment

You will report to the Director, Site Reliability Engineering.

Benefits & Compensation

  • Flexible PTO
  • Comprehensive healthcare coverage (UK, France, Spain)
  • Company stock options
  • Professional development budget
  • Office equipment budget
  • Wellness budget
  • Annual team gatherings
  • Internet reimbursement
  • Inclusive parental leave
  • Remote work travel program

Work Mode

This is a global remote position open to candidates in France, Germany, Spain, and the United Kingdom.

Platform.sh is an equal opportunity employer.

Required Skills
Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Go, Python, Linux, DevOps, SRE
About company
Upsun (formerly Platform.sh)

Upsun is the cloud application platform built for hybrid teams where AI agents write and test code and humans focus on solving problems. Developers, DevOps engineers, and platform teams use Upsun to build, ship, and scale confidently without wrestling with backend infrastructure.

Job Details
Department: Information Technology
Category: Infrastructure
Posted 14 days ago