Xebia is seeking a Senior Site Reliability Engineer - AWS & AI | EU to design, build, and maintain robust, scalable infrastructure for our platforms. You will apply SRE principles to keep systems built on AWS and AI technologies reliable and performant.
What You'll Do
- Design, implement, and manage highly available, scalable infrastructure on AWS
- Build and maintain observability, monitoring, and alerting systems for AI-powered platforms
- Automate operational processes to improve efficiency and reduce toil
- Lead incident response and post-mortems, and implement preventive measures
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Collaborate with development teams to architect reliable, performant systems from inception
What We're Looking For
- Proven experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role
- Deep, hands-on expertise with AWS services and infrastructure as code (e.g., Terraform, CloudFormation)
- Strong background in designing and implementing monitoring, logging, and tracing solutions
- Proficiency in scripting and automation using languages such as Python, Go, or shell
- Experience with containerization and orchestration (e.g., Docker, Kubernetes)
- Solid understanding of CI/CD pipelines and GitOps practices
- Excellent problem-solving and communication skills
Xebia is an equal opportunity employer.



