Tyk Technologies is looking for a Senior Site Reliability Engineer to join our APAC team. In this role, you will be responsible for optimizing, automating, and improving the performance of Tyk's global Cloud platform. You will lead critical reliability initiatives and ensure high availability and stellar performance for our growing customer base.
What You'll Do
- Lead hands-on maintenance and optimization of our global Cloud platform within defined SLAs, SLIs, and SLOs.
- Collaborate to shape SRE strategy and translate it into actionable technical plans coordinated through Scrum.
- Identify reliability issues, drive root cause analysis, and implement solutions.
- Lead performance tuning and fault finding through analysis of OS and application metrics.
- Design and implement automation for common operational tasks and cloud-operations workflows.
- Develop proactive alerting, a monitoring roadmap, and relevant dashboards; define and track KPIs.
- Participate in an on-call rotation, ensuring effective incident response and resolution within SLAs.
- Conduct blame-free postmortems, document findings, and maintain operational runbooks.
- Drive multi-region and multi-cloud platform expansion with a focus on scalability and automation.
- Optimize infrastructure performance and cost efficiency without impacting service delivery.
- Engage with commercial teams on growth plans and translate them into technical SRE strategies.
- Coordinate penetration testing through provider liaison, technical setup, and environment configuration.
- Champion continuous improvement across processes, communication, and team practices.
- Model excellence in software design and knowledge sharing.
- Plan and execute software upgrades to enhance cloud services.
What We're Looking For
- Proven experience in an SRE role.
- Strong knowledge of cloud technologies and of SLA, SLO, and SLI management.
- Excellent communication and leadership skills.
- Ability to analyze and improve operational processes and performance metrics.
- Experience in software design, automation, and root cause analysis.
- On-call support experience and a customer-focused mindset.
- A collaborative attitude with commercial and technical teams.
- Hands-on experience launching and operating production Kubernetes clusters.
- Expertise designing and operating infrastructure on AWS and other cloud providers.
- Experience operating MongoDB (or other document database) clusters.
- Experience operating Redis (or other key-value storage) clusters.
- Proficiency in administering Linux servers.
- Experience operating Prometheus and Grafana.
- Experience operating a log collection and analysis system.
- Ability to participate in an on-call rotation covering 04:00 - 16:00 UTC.
Technical Stack
- Kubernetes, Go, Python, AWS/EKS, Linux, Terraform, Helm
- MongoDB, Redis, Prometheus, Grafana, Thanos
Benefits & Compensation
- Unlimited paid holidays
- Total flexibility in working hours
- Employee share scheme
- Generous maternity and paternity leave
- Volunteering days
- Employee wellbeing platform
Work Mode
This is a global role with flexibility for remote work.
Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.


