The Site Reliability Engineer will ensure the reliability, performance, security, and cost efficiency of a dual-platform infrastructure. This includes maintaining a custom multi-cloud PaaS and a large-scale AWS enterprise setup. The role balances independent ownership with close collaboration within a small, remote-first team.
Responsibilities
- Maintain system stability through patching, performance tuning, incident response, and ongoing infrastructure health checks
- Lead long-term initiatives such as platform migrations, internal tool development, security enhancements, and monitoring improvements
- Help shape the evolution of the infrastructure to keep it modern, secure, and easy for developers to use
- Support internal technical teams by answering complex infrastructure and operations questions
- Collaborate with external development teams to assist with their daily technical challenges
Requirements
- Fluency in written and spoken English
- Strong experience with Docker, AWS, and cloud-native technologies
- Programming background, preferably in Python or TypeScript, though Go or Java experience is also acceptable
- Proficiency with configuration management and Infrastructure as Code tools, especially Ansible and AWS CDK
- Solid understanding of core systems including Linux, networking, TCP/IP, and load balancing
- Proactive and dependable work ethic with the ability to operate independently
- Comfort handling support responsibilities and communicating professionally with technical clients
Nice to Have
- Practical experience administering and tuning Linux systems
- Familiarity with Django or similar Python web frameworks
- Hands-on use of AWS CDK with TypeScript
- Operational experience with PostgreSQL, Redis, RabbitMQ, or Elasticsearch
- Experience with any of the technologies in our stack
Tech Stack
Docker, Django, Python, TypeScript, Ansible, AWS, EC2, S3, RDS, OpenSearch, Datadog, Redis, Elasticsearch, Nessus, AWS CDK, GitHub Actions, Cloudflare, DynamoDB, API Gateway, Gatsby, Storyblok, Lambda
Compensation
Not specified
Work Arrangement
Remote-first with team members across Europe
Team
Team of 18 people: remote-first, small, and focused, with minimal hierarchy. The team values:
- Curiosity
- Ownership
- Clarity
- Collaborative problem-solving
- Engineer-led decision-making
- Flexibility and adaptability
- Balancing quick fixes with deep refactors
Additional Information
- The company runs two distinct development workflows: 2-week sprints with quarterly OKRs for Divio Cloud, and 3-week Scrum cycles for the Enterprise AWS project
- Support and on-call duties rotate weekly among team members
- The team emphasizes autonomy, responsibility, and open sharing of ideas

