This role is central to advancing the platform's operational model toward automation-driven site reliability engineering. You will improve system resilience, scalability, and efficiency by working closely with engineering and product teams to integrate reliability into every phase of the application lifecycle and optimize cloud infrastructure performance.

Responsibilities

Enhance system visibility by improving monitoring and observability with tools such as Prometheus, Grafana, and the ELK Stack, ensuring insights support business goals.
Automate infrastructure provisioning and operational workflows using Terraform and Ansible to increase efficiency and reduce manual intervention.
Optimize CI/CD pipelines to enable rapid, consistent, and scalable software releases.
Operate and scale cloud environments across AWS, GCP, and Azure while reducing technical debt and simplifying operational overhead.
Lead incident response efforts and conduct post-mortem reviews to drive systemic improvements and share lessons across teams.
Partner with engineering and product groups to embed reliability practices into development processes and system design.
Advance technical excellence by evaluating and adopting new tools and methodologies that enhance system performance, reliability, and scalability.

Requirements

Demonstrated knowledge of DevOps, Cloud Operations, or Site Reliability Engineering principles with an emphasis on system reliability and scalability.
Proven experience managing Linux systems, including performance optimization, kernel-level configuration, and issue resolution.
Proficiency in programming languages such as Go or Python, particularly for developing automation tools and system utilities.
Strong scripting abilities in Python, Bash, or Go to automate infrastructure tasks and operational workflows.
Hands-on experience operating and managing cloud platforms including AWS, GCP, and Azure.
Skilled in implementing and maintaining monitoring, logging, and CI/CD systems.
Solid problem-solving capabilities, experience with system architecture, and a collaborative approach to cross-team projects.

Nice to Have

Experience working with container technologies such as Docker and Kubernetes to deploy and manage scalable, distributed applications.

Tech Stack

Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Docker, Kubernetes, Go, Python, Bash

Benefits

Flexible paid time off policy
Comprehensive healthcare benefits available in the UK, France, and Spain
Eligibility for company stock options
Annual professional development allowance
Budget for office equipment
Wellness stipend
Annual in-person team events
Reimbursement for internet expenses
Inclusive parental leave policy
Opportunity to participate in a remote work travel program
Work for a certified B Corporation with a mission-driven product
Recognized as an award-winning remote workplace
Culture that encourages input and values diverse perspectives
Global, diverse team with representation across many countries

Work Arrangement

Fully remote position available in France, Germany, Spain, or the United Kingdom, with flexible hours and support for global collaboration.

Team

The team is global, operating across more than 30 countries, and is structured as a remote, multicultural, distributed organization focused on collaboration and innovation. The role reports to the Director of Site Reliability Engineering.

We make a positive impact.
We aim for the stars.
We care for each other.
Curious spirit and thirst for knowledge
Eagerness for innovative ideas and cultures
Open, welcoming, and inclusive environment
Committed to open source
Remote-first and globally distributed

Additional Information

Remote work is a core component of the company's operational model.
Applicants must be legally authorized to work in France, Germany, Spain, or the United Kingdom.
Background checks are mandatory for all hires.
The hiring process consists of three interviews via Google Meet: a 45-minute call with Talent Acquisition, a 60-minute discussion with the Hiring Manager (Director, SRE), and a 60-minute team interview.
The company holds B Corp certification, reflecting its commitment to social and environmental responsibility.
Diversity, equity, and inclusion are prioritized, with accommodations available upon request during the hiring process.

Visa sponsorship is not available for this role.

Upsun (formerly Platform.sh) is hiring a Site Reliability Engineer

