Empower Retirement is looking for a Site Reliability Engineer to join our team. You'll play a key role in ensuring the reliability, scalability, and performance of our financial services platform, which serves millions of customers. In this role, you will operate production systems while maintaining the high-availability standards critical in fintech.
What You'll Do
- Own operational excellence for assigned systems and services within your value stream.
- Participate in on-call rotations, responding to incidents and driving them to resolution.
- Lead incident postmortems, identifying root causes and implementing preventive measures.
- Build and maintain infrastructure as code using Terraform across multiple AWS environments.
- Manage and optimize EKS clusters, implementing best practices for container orchestration.
- Design and implement monitoring, alerting, and observability solutions using Datadog and Splunk.
- Develop automation tools and scripts to reduce toil and improve operational efficiency.
- Collaborate with development teams on deployment strategies, implementing progressive delivery patterns.
- Maintain and improve CI/CD pipelines in GitLab CI and Jenkins.
- Contribute to capacity planning and performance optimization initiatives.
- Mentor entry-level SREs, providing guidance on operational best practices.
- Document runbooks, architecture decisions, and system behaviors.
What We're Looking For
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
- 2-4 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
- Production experience with Kubernetes, including deployment, troubleshooting, and optimization.
- Proficiency in Infrastructure as Code, particularly Terraform.
- Solid programming skills in Python, Go, or similar languages.
- Experience with observability platforms (Datadog, Splunk, or similar).
- Understanding of CI/CD principles and experience with GitLab CI, Jenkins, or equivalent.
- Knowledge of networking fundamentals and troubleshooting.
- Familiarity with GitOps workflows and practices.
- Experience participating in on-call rotations and incident management.
- Understanding of high-availability architecture patterns.
Nice to Have
- Experience in financial services or highly regulated industries.
- Familiarity with compliance frameworks (SOC 2, PCI DSS).
- Experience with service mesh technologies (Istio, Linkerd).
- AWS certifications (Solutions Architect Associate or higher).
- CKA (Certified Kubernetes Administrator) certification.
- Experience with disaster recovery and business continuity planning.
- Hands-on experience applying SLO/SLI methodologies.
Technical Stack
- Cloud/Infrastructure: AWS, EKS, Kubernetes, Terraform, GitOps
- Monitoring/Observability: Datadog, Splunk, Prometheus
- CI/CD: GitLab CI, Jenkins
- Languages/Tools: Python, Go, Helm
We offer a flexible work environment and celebrate internal mobility. We recognize the importance of purpose, well-being, and work-life balance. Within Empower and our communities, we work hard to create a welcoming and inclusive environment.
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.

