Site Reliability Engineer (SRE) at Cognigy (Expired)

Cognigy is looking for a Site Reliability Engineer (SRE) to join our Engineering department. You will be instrumental in ensuring the stability and scalability of our product by automating infrastructure, proactively monitoring systems, and mentoring fellow engineers. We need someone passionate about the full software lifecycle and cloud-native technologies.

What You'll Do

Automate the provisioning of Kubernetes clusters and deployment processes for Cognigy.AI.
Proactively monitor product installations and infrastructure to guarantee high availability and performance for our SaaS offerings.
Help engineering teams automate repetitive tasks to improve efficiency and reduce operational overhead.
Provide expertise and mentorship to fellow SREs.
Engage in all phases of software engineering—from inception and coding to testing, deployment, and operations.
Enhance development team productivity by optimizing observability, scalability, availability, and reliability.
Lead postmortems after major incidents to drive continuous improvement.
Stay ahead of industry trends, continuously develop your skills, and share knowledge.
Seek and incorporate customer feedback into our development processes.

What We're Looking For

Several years of experience running containers in production.
Experience building Kubernetes clusters from the ground up.
Experience with managed Kubernetes services like AWS EKS, Azure AKS, or Google GKE.
Hands-on experience with major cloud platforms like AWS, Microsoft Azure, or GCP.
Understanding of networking concepts like VPCs, subnets, internet gateways, and web security best practices.
Multiple years of experience with CI/CD systems (e.g., Jenkins, GitLab).
Proficiency with Infrastructure as Code tools like Terraform, Helm, and Flux.
Expertise in one or more programming languages such as Golang, Python, Ruby, JavaScript, Java, or Perl.
Familiarity with Docker, Kubernetes, message brokers, and NoSQL and SQL databases.
Experience with monitoring tools like Prometheus, Grafana, and the ELK Stack.
A proactive, solution-oriented mindset with full ownership of challenges.
Calm under pressure, able to make urgent decisions, and comfortable being on call.
Eager to collaborate in an international, dynamic, and highly motivated team.

Technical Stack

Orchestration: Kubernetes, AWS EKS, Azure AKS, Google GKE
Infrastructure as Code: Terraform, Helm, Flux
CI/CD: Jenkins, GitLab
Monitoring & Observability: Prometheus, Grafana, ELK Stack
Languages: Golang, Python, Ruby, JavaScript, Java, Perl
Containerization: Docker

Team & Environment

You will be part of the Engineering department, working with cross-functional development teams, Product Support, QA, SRE, Product, People, and Leadership teams. You will report to the Director/Architect and VP of Engineering.

Benefits & Compensation

Attractive and performance-oriented salary.
Company Pension Scheme.
25 days paid leave, plus 5 floating days, plus public holidays.
Flexible working options.
Colleague recognition, reward and celebration events.
Global Employee Assistance Program.
ClassPass membership.
Ongoing learning and development opportunities, including Udemy.
One paid 'Giving Back Day' each year for volunteering.
Subscription to the Calm app for you plus five friends/family members.

Work Mode

This is a hybrid position located in Düsseldorf, Germany.

Cognigy does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity, or any other reason prohibited by law in the provision of employment opportunities and benefits.
