Bangalore, Karnataka, India Employment

Empower Retirement, LLC is hiring a Senior Manager, Site Reliability Engineering

About the Role

Empower Retirement is looking for a Senior Manager of Site Reliability Engineering to lead the team responsible for improving the reliability, scalability, security, and operational excellence of our production platforms and services in the Data Migration organization. You will partner with application development, architecture, platform, data, and security teams to strengthen production stability, accelerate safe delivery, and improve developer and operator experience.

What You'll Do

  • Lead and manage SRE team(s) responsible for production reliability, incident response, and operational readiness across Empower systems and integrated platforms.
  • Establish and evolve SRE operating practices including on-call, incident triage/escalation, post-incident reviews, problem management, and operational governance.
  • Define and implement service reliability standards (e.g., SLIs/SLOs, error budgets, operational runbooks, readiness checklists) that comply with enterprise standards.
  • Drive automation-first approaches that reduce manual effort, increase operational consistency, and improve service resiliency (self-healing, auto-remediation, safe rollbacks).
  • Partner with engineering teams to improve deployment workflows, release governance, rollback planning, and post-deployment verification.
  • Partner with Production Support teams on operations training, execution, maintenance, and escalations.
  • Lead observability strategy and execution: monitoring, alerting, logging, tracing, dashboards, and performance analysis using AWS and third-party tools.
  • Collaborate with data/platform and engineering teams to design and optimize AWS-native infrastructure patterns, including Infrastructure as Code and standardized CI/CD practices.
  • Ensure AWS security best practices are incorporated into reliability operations (IAM least privilege, network segmentation, data protection, vulnerability management).
  • Coordinate with upstream/downstream system owners and data/platform teams to manage dependencies, reduce operational risk, and improve end-to-end reliability.
  • Provide performance management for team members by setting clear objectives, coaching, mentoring, and building career development plans.
  • Assign teams and individuals to reliability initiatives of varying scope; partner with delivery leadership and Agile roles to prioritize work that delivers frequent, measurable improvements.
  • Evaluate emerging SRE, cloud, and automation technologies; recommend and drive improvements for resiliency, efficiency, cost optimization, and operational maturity.
  • Contribute to functional roadmaps and collaborate with leadership on long-term reliability and platform strategy.

What We're Looking For

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or equivalent experience.
  • 8+ years of experience in SRE, production operations, platform engineering, DevOps, or software engineering.
  • 2+ years leading people.
  • Strong AWS experience (e.g., EC2, S3, IAM, RDS, Lambda, VPC, CloudFormation, CloudWatch, ECS/EKS).
  • Demonstrated ability to lead incident response and operational processes, including troubleshooting complex production issues and driving root-cause remediation.
  • Experience designing or governing CI/CD pipelines and release processes (e.g., Jenkins, GitHub Actions, AWS CodePipeline).
  • Proficiency with automation/scripting (Python and/or Java) and strong Linux/shell skills.
  • Experience with Infrastructure as Code (Terraform, CDK, CloudFormation) and standardization of reusable infrastructure patterns.
  • Familiarity with containerization and orchestration concepts (Docker, Kubernetes/EKS) and modern deployment practices (e.g., GitOps).
  • Knowledge of observability tools and practices (CloudWatch and/or tools like Datadog/Splunk), performance monitoring, and operational dashboards.
  • Solid understanding of networking, security models, distributed systems architecture, and operational risk management.
  • Strong communication and cross-team collaboration skills, including the ability to partner with architecture, security, and engineering leadership.

Nice to Have

  • Experience defining and implementing SLOs/SLIs and error-budget-based operating models.
  • Proven track record of reliability improvements through automation, resilience engineering, and measurable reduction of incidents/toil.
  • Experience with cost monitoring and optimization in cloud environments.
  • Experience supporting data-intensive workloads (batch processing, ETL/orchestration patterns, dependency management, and data-quality controls).
  • Experience leading reliability improvements across complex upstream/downstream systems and multi-team ownership boundaries.

Technical Stack

  • AWS: EC2, S3, IAM, RDS, Lambda, VPC, CloudFormation, CloudWatch, ECS/EKS
  • Languages: Python, Java
  • Systems: Linux/Shell
  • Infrastructure as Code: Terraform, CDK, CloudFormation
  • Containerization: Docker, Kubernetes/EKS
  • CI/CD: Jenkins, GitHub Actions, AWS CodePipeline
  • Observability: CloudWatch, Datadog, Splunk

Team & Environment

You will lead SRE team(s) within the Data Migration organization, working with the Enterprise SRE Center of Excellence, the Director, and other SRE Leads.

Empower Retirement is an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.

Required Skills
AWS, EC2, S3, IAM, RDS, Lambda, VPC, CloudFormation, CloudWatch, ECS/EKS, Python, Java, Linux/Shell, Terraform, Docker, Kubernetes, Jenkins, Incident Response, SRE
About company
Empower Retirement, LLC

Empower provides financial services and retirement solutions, helping customers achieve financial freedom.

Job Details
Department: Information Technology
Category: Infrastructure
Posted: 14 days ago