The Senior Site Reliability Engineer will join NinjaOne's Platform Engineering organization to help scale its cloud-based IT solutions for millions of end-users. This role focuses on automation, observability, and ensuring high availability and security of services through proactive infrastructure and application management.
What You'll Do
- Diagnose and resolve complex application and infrastructure issues
- Participate in the 24x7 on-call rotation, Scrum ceremonies, and deployment planning
- Perform Root Cause Analysis (RCA) and provide recommendations for application teams
- Improve availability and reduce customer impact using industry-leading observability tools
- Ensure best-practice and security-minded architecture by influencing design decisions
- Create and maintain technical documentation and SOPs
- Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure
- Other duties as needed
What We're Looking For
- 10+ years of experience in Site Reliability Engineering roles
- Expert-level Linux administration, scripting, and troubleshooting
- Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog)
- Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB, etc.)
- Extensive experience with cloud automation and infrastructure-as-code (IaC) toolsets, primarily CloudFormation
- Good understanding of containers, Fargate, Kubernetes, and overall distributed microservice architectures
- Passion for automation, security, and self-service environments/portals
- Hands-on experience with CI/CD and SDLC (Software Development Life Cycle) processes
- Effective communication skills, both verbal and written
Nice to Have
- Experience with Terraform, Helm, or Ansible
- Experience with AWS CDK is a plus
Technical Stack
- Java
- Kotlin
- C++
- Postgres
- AWS
- VPC
- EC2
- ECS
- Route53
- Fargate
- ALB
- NLB
- Prometheus
- Grafana
- New Relic
- Splunk
- Datadog
- CloudFormation
- Terraform
- Helm
- Ansible
- CDK
- Kubernetes
- Containers
Team & Environment
- SRE team in the Platform Engineering organization
Benefits & Compensation
- Grow personally and professionally with one of the fastest-growing companies
- Access to our Corporate Benefits Platform (with discounts for brands such as Expedia, FitX, Zalando and many more)
- Develop your skills through our renowned training platform
- Receive competitive compensation
- Collaborate with a curious, kind, international, and intercultural workforce
Work Mode
- Hybrid work model
- Available locations: UK, Germany, Berlin
- Fully remote, with the option to work hybrid from the Berlin office
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, veteran status, or any other status protected by applicable law.