Implement and maintain robust automation for deploying and operating Kong's Managed Gateways across various cloud environments.
Monitor system health, performance, and uptime, striving for 99.99% availability for our core infrastructure.
Resolve complex production incidents efficiently, participating actively in on-call rotations to maintain service continuity.
Build resilient tools and systems that enhance the overall reliability and operational efficiency of our platform.
Proactively prevent and reduce technical debt, ensuring sustainable and scalable operations as Kong grows.
Collaborate closely with engineering teams to design, review, and implement resilient and highly scalable services.

2+ years of experience applying Site Reliability Engineering (SRE) principles and practices in a production environment.
Proficiency in Go or Python for automation, tooling, and infrastructure as code.
Hands-on experience with Kubernetes and major cloud platforms such as AWS, GCP, or Azure.
Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, Datadog).
Solid understanding of networking concepts, distributed systems, and API gateways.

Experience with Kong Gateway or other API management platforms.
Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Active contributions to open-source projects or developer communities.

Kong is hiring an SRE 2, Managed Gateways