The Site Reliability Engineer plays a critical role in ensuring the stability, scalability, and performance of our cloud-native infrastructure and services. This position bridges the gap between development and operations by applying engineering principles to system reliability, automation, and incident response. The ideal candidate will have deep expertise in Kubernetes and AWS, with a strong focus on proactive monitoring, incident management, and continuous improvement of production systems. You will collaborate closely with development teams to design resilient architectures, enforce SRE best practices, and drive incident postmortems to prevent recurrence. This role requires a commitment to operational excellence, a data-driven mindset, and the ability to thrive in a fast-paced, dynamic environment.
Responsibilities
- Manage and improve Kubernetes platform operations, preferably on Amazon EKS, including upgrades, scaling, and reliability improvements
- Optimize CI/CD workflows to ensure safe deployments through rollback capabilities, progressive delivery strategies, and policy enforcement
- Develop and maintain infrastructure as code using tools such as Terraform, CloudFormation, and Helm, together with GitOps practices
- Enhance system observability by refining alerting, dashboards, and incident runbooks, and by conducting postmortem analyses
Requirements
- Proven experience with Kubernetes in production environments, including debugging, resource optimization, and networking fundamentals
- Hands-on AWS platform knowledge, particularly with services such as EKS, IAM, VPC, load balancers, Route53, and KMS
- Demonstrated ability to automate infrastructure and operational workflows using code-driven methodologies
Nice to Have
- Familiarity with GitOps tools (Argo CD, Flux), service mesh configuration, ingress controllers, and OpenTelemetry
- Experience supporting Java-based microservices and applying runtime performance tuning techniques
Tech Stack
Kubernetes, EKS, CI/CD, Terraform, CloudFormation, Helm, GitOps, Argo CD, Flux, AWS, IAM, VPC, ALB, NLB, ELB, Route53, KMS, OpenTelemetry, Java, microservices
Benefits
- Medical, dental, and vision insurance coverage
- Employee assistance program offering comprehensive support services
- 401(k) retirement savings plan
- Paid time off and recognized holidays
- Paid days allocated for professional learning and development
Compensation
$92,000 - $125,000 plus performance-based incentives tied to individual and company results
Work Arrangement
Hybrid work model based in Atlanta, GA, with an option for remote work
Team
Part of the Technical Products and Services team, delivering strategic consulting, design, advisory services, market research, and contact center analytics
- Solution-focused approach to challenges
- Driven by technology and data intelligence
- Leverages market-leading insights and tools
- Emphasizes human-centered design principles
- Recognized as one of the 'World's Best Workplaces'
- Named among Best Companies for Career Growth and Best Company Culture
Additional Information
- Position located in Atlanta, GA
- Remote work option available
- Full-time employment
- Work involves regular use of computers, keyboards, telephones, headsets, and standard office equipment; primarily sedentary work
- Application deadline is March 13, 2026
- This is an immediate-hire position; the selected candidate is expected to start promptly
- Employer complies with Equal Employment Opportunity and Affirmative Action regulations
- Job Applicant Privacy Notice available for California residents
- Reasonable accommodations are available upon request
- Affirmative Action Plan is available for review