SEON Technologies is looking for a Senior Site Reliability Engineer to play a crucial role in our cloud infrastructure's reliability, scalability, and performance. You will work closely with cross-functional teams to ensure our systems are robust, scalable, and efficient.
What You'll Do
- Ensure the reliability, availability, and performance of our systems by implementing SRE best practices.
- Develop and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and the ELK stack.
- Manage incident response and conduct root cause analysis for production issues.
- Conduct postmortems to learn from failures and drive continuous improvement.
- Continuously monitor and optimize cloud infrastructure for performance and cost-effectiveness.
- Automate routine tasks and processes to increase efficiency.
- Analyze current system capacity and plan for future growth.
- Define, measure, and monitor SLOs and SLIs to ensure services meet reliability targets.
- Work closely with engineering and product teams to provide feedback on new architectures.
- Develop and maintain comprehensive documentation for architecture and troubleshooting.
- Provide on-call support to ensure continuous availability.
- Ensure systems meet security and compliance requirements.
- Stay current with new technologies and evaluate their impact.
What We're Looking For
- 6+ years of experience in an SRE, DevOps, or similar engineering role.
- Strong hands-on experience with Kubernetes (AWS EKS preferred).
- Strong hands-on expertise in Terraform.
- Extensive experience working in a multi-region and multi-account AWS setup.
- Strong experience with monitoring and logging tools like Prometheus, Grafana, Elasticsearch, and Kibana.
- Strong experience deploying and troubleshooting scalable distributed components in a microservice architecture.
- Experience researching and improving customer-critical requests related to latency, availability, and performance.
- Ability to quickly troubleshoot complex infrastructure issues.
- Proficiency with incident management tools like PagerDuty or Opsgenie.
- Familiarity with CI pipelines and tools (GitHub Actions preferred).
- Experience working with GitOps practices and CD tools (ArgoCD preferred).
- A proactive approach to identifying and resolving issues independently.
- Excellent communication and collaboration skills.
Nice to Have
- AWS EKS experience.
- GitHub Actions experience.
- ArgoCD experience.
Technical Stack
- Kubernetes, AWS EKS, Terraform, AWS
- Prometheus, Grafana, ELK stack (Elasticsearch, Kibana)
- PagerDuty, Opsgenie
- GitHub Actions, ArgoCD
Work Mode
This is a hybrid position based in Budapest, Hungary.
SEON is an equal opportunity employer. We strive to embrace what makes each one of us unique; we each have our own story. Whether looking at our current staff or future team members, we believe that everyone has something to contribute, and our employment practices reflect that. We do not make employment decisions based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.



