Bangalore, Karnataka, India

Empower Retirement, LLC is hiring a Senior Site Reliability Engineer

Responsibilities

  • Create and deploy resilient, highly available systems that support essential financial transaction workloads
  • Develop cloud infrastructure on AWS following proven practices to balance cost, speed, and system dependability
  • Lead incident management during major outages, coordinating with multiple teams to restore services quickly
  • Manage post-incident reviews for critical events, ensuring corrective actions are defined and followed through
  • Define and monitor service level objectives and indicators to measure system reliability
  • Develop and test disaster recovery and business continuity strategies to ensure operational resilience
  • Build reusable Infrastructure as Code templates using Terraform with modular design and state management
  • Design and tune multi-cluster Kubernetes environments with autoscaling and efficient resource use
  • Implement monitoring solutions using Datadog and Splunk with actionable alerts and dashboards
  • Integrate gradual release strategies like canary and blue-green deployments into GitOps pipelines
  • Develop automation tools that minimize manual operations and increase engineering productivity
  • Collaborate with software teams to enhance application reliability through design input and reviews
  • Coach mid-level and junior SREs through code reviews and technical mentorship
  • Influence system architecture decisions that affect platform stability and growth capacity
  • Promote site reliability engineering principles across engineering groups
  • Take part in on-call schedules and lead initiatives to reduce alert fatigue and operational load
  • Enforce zero-trust security models across all infrastructure layers
  • Ensure systems comply with financial industry regulations and internal policy requirements
  • Review infrastructure changes and deployment methods for security risks
  • Support compliance audits and respond to regulatory questions

Compensation

Competitive salary and benefits package commensurate with experience

Work Arrangement

Hybrid work model with flexibility based on role and location

Team

Part of a dedicated engineering team focused on platform reliability and operational excellence

Responsibilities

  • Design and implement highly available, fault-tolerant systems supporting critical financial transactions
  • Architect infrastructure solutions using AWS best practices, optimizing for cost, performance, and reliability
  • Lead complex incident response efforts, coordinating across teams to restore service rapidly
  • Drive postmortem processes for high-severity incidents, ensuring meaningful action items are identified and completed
  • Establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services
  • Design and implement disaster recovery strategies and business continuity plans
  • Build sophisticated Infrastructure as Code (IaC) solutions using Terraform, incorporating advanced patterns like modules, workspaces, and state management
  • Architect and optimize multi-cluster EKS environments, implementing pod autoscaling, cluster autoscaling, and resource optimization
  • Design observability strategies using Datadog and Splunk, creating meaningful metrics, dashboards, and alerting that enable proactive problem detection
  • Implement progressive delivery mechanisms (canary deployments, blue-green deployments) within GitOps workflows
  • Build automation frameworks that significantly reduce operational toil and improve team efficiency
  • Partner with development teams to improve application reliability, conducting design reviews and providing architectural guidance
  • Mentor and guide junior and intermediate SREs, conducting code reviews and providing technical coaching
  • Contribute to architectural decisions that impact platform reliability and scalability
  • Evangelize SRE best practices across the engineering organization
  • Participate in on-call rotations and drive improvements to reduce on-call burden
  • Implement and maintain zero-trust security controls across infrastructure
  • Ensure systems meet financial services regulatory requirements and internal compliance standards
  • Conduct security reviews of infrastructure changes and deployment processes
  • Participate in audit preparations and respond to compliance-related inquiries

May offer sponsorship for qualified candidates depending on business needs

Required Skills

AWS, Kubernetes, Terraform, EKS, Datadog, ArgoCD, GitLab CI, Jenkins

About the company
Empower Retirement, LLC
Empower is a financial services company focused on transforming financial lives by helping individuals achieve financial freedom through advice, people, and technology. The company offers retirement, wealth management, and financial planning solutions.
Job Details
Department: Information Technology
Category: Infrastructure
Posted 3 months ago