As a Senior Site Reliability Engineer, you will play a key role in maintaining and enhancing the performance, availability, and scalability of our core in-memory database (IMF) and associated services. Hosted on Google Cloud Platform and orchestrated through Kubernetes, these systems are central to our analytics infrastructure.
Key Responsibilities
- Configure and manage Kubernetes components to ensure resilient, high-performing systems.
- Respond to incidents, conduct root cause analysis, and implement preventive measures.
- Participate in a rotating on-call schedule with one-week shifts to support 24/7 system reliability.
- Develop automation tools and scripts in Python and Go to streamline operations and reduce manual intervention.
- Monitor system health using tools like VictoriaMetrics and Grafana, proactively identifying and resolving issues.
- Plan capacity and ensure resource availability during high-demand periods.
- Maintain robust logging and monitoring setups to enable fast detection and diagnosis of problems.
- Ensure reliable backup strategies and efficient recovery processes for critical databases.
- Collaborate with software engineers and product managers to deliver scalable solutions on time.
- Partner with L2 support teams to improve operational workflows and issue resolution.
Required Expertise
- Proven experience in DevOps or Site Reliability Engineering roles.
- Strong understanding of DevOps practices and principles.
- Hands-on experience with Google Cloud Platform and Kubernetes, specifically Google Kubernetes Engine (GKE).
- Solid background in building and managing CI/CD pipelines, particularly with GitLab.
- Proficiency in Python, Go, or shell scripting for automation tasks.
- Experience with monitoring solutions including VictoriaMetrics, Grafana, InfluxDB, and Chronograf.
- Familiarity with logging systems and distributed tracing tools.
- Track record in incident management and post-mortem analysis.
- Excellent problem-solving abilities and attention to detail.
- Strong communication skills, with experience working in distributed, remote teams.
- Ability to operate independently and prioritize tasks effectively in a fast-moving environment.
Technology Environment
You'll work with a modern stack including IMF (a C++-based in-memory database), Apache Kafka, MongoDB, Kubernetes on GCP, gRPC, Python, Go, GitLab, VictoriaMetrics, Grafana, InfluxDB, Chronograf, and Sentry.
Work Model
This role supports remote work within the Central European Time (CET) zone. While the team operates remotely, there are opportunities to meet in person in Brno or Prague (Czechia) or in Bratislava (Slovakia).
Company Culture
We foster a creative, collaborative, and customer-focused environment where technical teams are empowered to innovate. Our culture values strategic thinking, adaptability, and continuous improvement, all within a fast-paced, AI-driven landscape.


