SS&C Eze is looking for a Senior Site Reliability Engineer to ensure the availability, performance, scalability, and reliability of critical systems and services. You'll collaborate closely with infrastructure, engineering, DevOps, and security teams to build robust systems, automate operations, and implement industry best practices.
What You'll Do
- Maintain and improve the uptime, performance, and availability of production systems.
- Define and track SLIs, SLOs, and SLAs to ensure service reliability and user satisfaction.
- Implement and manage monitoring, alerting, and observability tools like Prometheus, Grafana, Datadog, and ELK.
- Participate in on-call rotations, respond to incidents, and perform root cause analysis and postmortems.
- Automate repetitive tasks using scripts, configuration management, and Infrastructure as Code (IaC).
- Develop CI/CD pipelines to streamline deployment and operational processes.
- Analyze system performance and capacity trends to plan for future growth.
- Collaborate with engineering teams to design systems that scale reliably.
- Support cloud and/or hybrid infrastructure (AWS, Azure, GCP, VMware, etc.).
- Manage system provisioning, configuration, and patching via tools such as Ansible, Terraform, or Puppet.
- Act as a bridge between development and operations teams, championing DevOps and SRE principles.
- Contribute to a culture of continuous improvement, reliability, and accountability.
What We're Looking For
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 3+ years of experience in a Site Reliability, DevOps, or Systems Engineering role.
- Experience administering Linux/Unix and Windows systems, including shell scripting.
- Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.).
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Strong knowledge of networking, security, load balancing, and DNS.
- Experience with monitoring/logging tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog).
Nice to Have
- Experience with containerization and orchestration tools (Docker, Kubernetes).
- Familiarity with ITIL processes and incident/change/problem management frameworks.
- Exposure to compliance and security standards (e.g., ISO 27001, SOC 2, HIPAA).
- Experience in large-scale distributed systems and microservices architectures.
Technical Stack
- Operating Systems: Linux/Unix, Windows
- Languages/Scripting: Python, Go, Bash
- Cloud Platforms: AWS, Azure, GCP
- Monitoring/Observability: Prometheus, Grafana, Datadog, ELK, Splunk
- Containers & Orchestration: Docker, Kubernetes
- Infrastructure as Code: Ansible, Terraform, Puppet
- Virtualization: VMware
Team & Environment
You'll join a collaborative operations team that works closely with the infrastructure, engineering, DevOps, and security functions.
Benefits & Compensation
- 401k Matching Program
- Professional Development Reimbursement
- Flexible Personal/Vacation Time Off
- Sick Leave
- Paid Holidays
- Medical, Dental, Vision Insurance
- Employee Assistance Program
- Parental Leave
- Discounts on fitness clubs, travel, and more
Work Mode
This is a hybrid position based in one of the following locations: Kansas City, MO; TX; GA; or FL.
SS&C Technologies is an Equal Employment Opportunity employer and does not discriminate against any applicant for employment or employee on the basis of race, color, religious creed, gender, age, marital status, sexual orientation, national origin, disability, veteran status or any other classification protected by applicable discrimination laws.


