France (Remote)

Upsun (formerly Platform.sh) is hiring a Site Reliability Engineer

This role is central to advancing the platform's operational model toward automation-driven site reliability engineering. You will improve system resilience, scalability, and efficiency by working closely with engineering and product teams to integrate reliability into every phase of the application lifecycle and optimize cloud infrastructure performance.

Responsibilities

  • Enhance system visibility by improving monitoring and observability with tools such as Prometheus, Grafana, and the ELK Stack, ensuring insights support business goals.
  • Automate infrastructure provisioning and operational workflows using Terraform and Ansible to increase efficiency and reduce manual intervention.
  • Optimize CI/CD pipelines to enable rapid, consistent, and scalable software releases.
  • Operate and scale cloud environments across AWS, GCP, and Azure while reducing technical debt and simplifying operational overhead.
  • Lead incident response efforts and conduct post-mortem reviews to drive systemic improvements and share lessons across teams.
  • Partner with engineering and product groups to embed reliability practices into development processes and system design.
  • Advance technical excellence by evaluating and adopting new tools and methodologies that enhance system performance, reliability, and scalability.

Requirements

  • Demonstrated knowledge of DevOps, Cloud Operations, or Site Reliability Engineering principles with an emphasis on system reliability and scalability.
  • Proven experience managing Linux systems, including performance optimization, kernel-level configuration, and issue resolution.
  • Proficiency in programming languages such as Go or Python, particularly for developing automation tools and system utilities.
  • Strong scripting abilities in Python, Bash, or Go to automate infrastructure tasks and operational workflows.
  • Hands-on experience operating and managing cloud platforms including AWS, GCP, and Azure.
  • Skilled in implementing and maintaining monitoring, logging, and CI/CD systems.
  • Solid problem-solving capabilities, experience with system architecture, and a collaborative approach to cross-team projects.

Nice to Have

  • Experience working with container technologies such as Docker and Kubernetes to deploy and manage scalable, distributed applications.

Tech Stack

Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Docker, Kubernetes, Go, Python, Bash

Benefits

  • Flexible paid time off policy
  • Comprehensive healthcare benefits available in the UK, France, and Spain
  • Eligibility for company stock options
  • Annual professional development allowance
  • Budget for office equipment
  • Wellness stipend
  • Annual in-person team events
  • Reimbursement for internet expenses
  • Inclusive parental leave policy
  • Opportunity to participate in a remote work travel program
  • Work for a certified B Corporation with a mission-driven product
  • Recognized as an award-winning remote workplace
  • Culture that encourages input and values diverse perspectives
  • Global, diverse team with representation across many countries

Work Arrangement

Fully remote position available in France, Germany, Spain, or the United Kingdom, with flexible hours and support for global collaboration.

Team

The team is global, operating across more than 30 countries as a remote, multicultural, distributed organization focused on collaboration and innovation. The role reports to the Director of Site Reliability Engineering.

  • We make a positive impact.
  • We aim for the stars.
  • We care for each other.
  • Curious spirit and thirst for knowledge
  • Eagerness for innovative ideas and cultures
  • Open, welcoming, and inclusive environment
  • Committed to open source
  • Remote-first and globally distributed

Additional Information

  • Remote work is a core component of the company's operational model.
  • Applicants must be legally authorized to work in France, Germany, Spain, or the United Kingdom.
  • Background checks are mandatory for all hires.
  • The hiring process consists of three interviews via Google Meet: a 45-minute call with Talent Acquisition, a 60-minute discussion with the Hiring Manager (Director, SRE), and a 60-minute team interview.
  • The company holds B Corp certification, reflecting its commitment to social and environmental responsibility.
  • Diversity, equity, and inclusion are prioritized, with accommodations available upon request during the hiring process.

Visa sponsorship is not available for this role.

Required Skills
Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Docker, Kubernetes, Go, Python, Bash
About company
Upsun (formerly Platform.sh)
Upsun is the cloud application platform built for hybrid teams where AI agents write and test code and humans focus on solving problems. Developers, DevOps engineers, and platform teams use Upsun to build, ship, and scale confidently without wrestling with backend infrastructure.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago