About the Role
We are looking for a Site Reliability Engineer to maintain and improve the reliability, scalability, and performance of our systems. You will work closely with engineering teams to automate infrastructure, monitor services, and resolve incidents efficiently.
Responsibilities
- Monitor system performance and respond to alerts promptly
- Design and implement automated deployment pipelines
- Troubleshoot production issues across distributed systems
- Maintain high availability and low latency for critical services
- Collaborate with developers to make code easier to deploy
- Optimize infrastructure for cost and performance
- Develop scripts and tools to enhance operational efficiency
- Participate in incident response and post-mortem analysis
- Ensure systems meet reliability and uptime targets
- Enforce security and compliance standards in infrastructure
- Manage configuration and version control for production systems
- Scale services to meet growing demand
- Document system architecture and operational procedures
- Improve monitoring and alerting coverage
- Conduct root cause analysis for recurring issues
- Support disaster recovery planning and testing
- Evaluate new technologies for operational improvements
- Promote best practices in system design and operations
- Work with telemetry data to identify performance bottlenecks
- Contribute to capacity planning initiatives
Nice to Have
- Experience with large-scale production systems
- Background in software engineering or systems programming
- Knowledge of service mesh technologies
- Contributions to open-source infrastructure projects
- Certifications in cloud or DevOps platforms
- Experience with high-frequency trading systems
- Familiarity with financial data infrastructure
- Prior work in remote-first organizations
Compensation
Competitive salary based on experience and location
Work Arrangement
Fully remote within the European Union
Team
Small, autonomous teams with high ownership and fast decision-making
Tech Stack
We use Kubernetes for orchestration, Prometheus and Grafana for monitoring, Terraform for infrastructure as code, and GitLab CI for CI/CD pipelines. Our services run on GCP and are written in Go and Python.
Culture
We value transparency, ownership, and continuous learning. Engineers are empowered to make decisions and drive improvements without bureaucracy.