Responsibilities
- Ensure the reliability, scalability, and performance of systems and infrastructure.
- Monitor system availability.
- Implement automation for deployment and maintenance tasks.
- Proactively identify areas for optimization.
- Collaborate with the development team to establish and refine service-level objectives.
- Drive incident response and postmortem analysis to minimize service disruptions.
Requirements
- Excellent technical and non-technical communication skills
- Prior experience as an SRE or in a related discipline, with responsibility for maintaining high availability of a cloud-based application, troubleshooting performance bottlenecks, configuring monitoring and alerting, and conducting incident response in a blameless environment
- A knack for reducing manual toil through automation and systematic thinking
- Prior experience working with CI/CD tools, processes, and pipelines-as-code (GitHub Actions, CircleCI)
- At least 5 years of hands-on experience with Python or Golang
- A solid background in configuration management and infrastructure-as-code (Terraform)
- Solid experience with monitoring/observability systems (Grafana, Prometheus, etc.)
- Demonstrated knowledge of container orchestration (Kubernetes/GKE)
- Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative)
Nice to Have
- Experience in FedRAMP or similar secure environments
- Expertise working within highly controlled environments containing sensitive information
- Experience designing and maintaining CI/CD pipelines using commercial solutions
- Experience working on and within GCP and/or AWS
Work Arrangement
Hybrid
Additional Information
- AppOmni is an equal-opportunity employer.
- AppOmni is committed to providing reasonable accommodations to qualified individuals with disabilities and disabled veterans in job application procedures.
- Diversity is valued to foster innovation and growth.


