Site Reliability Engineering Manager at Veracode (Expired)

Veracode is looking for a Site Reliability Engineering Manager to lead the reliability, availability, and operational excellence of our production systems. You will define and enforce reliability standards, manage production risk, and ensure services meet agreed-upon service levels.

What You'll Do

Lead a nine-member global Site Reliability Engineering team.
Set objectives and key results (OKRs) and KPIs, and manage team performance.
Act as the primary point of accountability for reliability concerns that span multiple teams, including DevOps, Security, Database, and Product Engineering.
Manage the team on-call schedule and act as the point of escalation for alerts and production incidents.
Create tickets, groom the backlog, and prioritize work in sprints.
Utilize AWS services to design scalable cloud solutions that support critical systems.
Partner with software engineering teams to ensure monitoring and alerting are in place for consistent, scalable, and automated service delivery.
Own the design and enforcement of the organization’s observability strategy.
Drive alert hygiene, standardization, and reduction of alert fatigue across the organization.
Lead efforts to automate infrastructure deployment and management using Terraform, Kubernetes, and other cloud-native tools.
Create automated incident response workflows to handle common infrastructure and application issues.
Collaborate with security teams to ensure systems adhere to industry-standard security practices.
Document and train engineering teams on best practices in reliability, scalability, and operational excellence.
Design, operate, and continuously improve on-call and incident response processes.
Contribute to incident and process post-mortems.
Ensure uptime, SLAs, and availability of critical platform components through process improvements and automation.
Monitor existing applications and infrastructure while working to improve monitoring coverage.
Communicate effectively with project stakeholders and management.
Develop and support processes to maintain uptime, SLAs, and availability of critical platform components.
Troubleshoot and resolve production issues related to systems, networks, and applications.

What We're Looking For

Bachelor's degree in Computer Science, Information Science, Engineering, or a related field, or equivalent experience.
2+ years working as a manager or team lead with direct reports.
5+ years working in an SRE, DevOps, Cloud Engineering, or similar role.
Experience with AWS and automation tools like Terraform, CloudFormation, or Ansible.
Hands-on experience deploying, managing, and troubleshooting Kubernetes clusters.
Hands-on proficiency with observability, monitoring, and alerting tools (Datadog, Sumo Logic, Prometheus, Grafana, etc.).
Familiarity with CI/CD pipelines and repository management tools (e.g., GitLab, Jenkins, GitHub).
Strong programming skills for automation (Python, Go, or similar languages).
Solid understanding of infrastructure as code (IaC) and GitOps methodologies.
Strong communication skills with the ability to collaborate effectively across different teams.
Ability to work in an Agile environment.
Proven experience in troubleshooting production environments and improving system reliability.
Experience with on-call/incident management systems such as PagerDuty, VictorOps, or Opsgenie.

Nice to Have

Experience with service meshes (e.g., Istio) to enhance application observability and security.
Familiarity with advanced Kubernetes features (e.g., StatefulSets, Helm, Operators).
Knowledge of database management and migration processes, including RDS and DMS.

Technical Stack

Cloud & Infrastructure: AWS, Terraform, CloudFormation, Ansible, Kubernetes
Monitoring & Observability: Datadog, Sumo Logic, Prometheus, Grafana
CI/CD & Development: GitLab, Jenkins, GitHub, Python, Go
Additional Tools: Istio, Helm, RDS, DMS

Team & Environment

You will lead a nine-member global team of Site Reliability Engineers.

Veracode provides employment opportunities to all applicants without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.