Responsibilities
- Build and operate the systems that power Vanta’s FedRAMP environments, including automated release, vulnerability remediation, and evidence generation pipelines that meet strict compliance timelines
- Design and maintain Vanta’s vulnerability management platform, automating detection, remediation, and compliance reporting across both FedRAMP and non-FedRAMP environments
- Define and evolve Vanta’s production reliability framework, including SLOs, incident response patterns, observability standards, service catalog, metrics dashboards, and the Vanta SLA definition
- Improve incident response workflows and systems for faster recovery
- Engineer reliability improvements for CI and deploy workflows, reducing production friction and operational load while maintaining deployment velocity
- Collaborate with product teams to embed reliability best practices, guiding operational readiness reviews and helping teams design for resilience
- Lead design and improvement of datacenter and environment build-outs for future FedRAMP levels and regional expansion
- Identify and solve complex scalability and performance challenges, particularly related to service reliability and data throughput
- Work with talented and kind engineers to make a significant impact on our customer base, enabling them to improve their security and prove it
- Contribute to building Vanta’s engineering culture as we grow
Requirements
- Experience operating services in multiple environments that require strict compliance, including FedRAMP
- Experience as a technical lead successfully driving large-scale reliability initiatives across an entire product engineering organization
- Experience in technical leadership roles on infrastructure or platform teams
- Experience with infrastructure, AWS services, and scaling platforms in fast-growing environments
- Experience with TypeScript, React, Node.js, MongoDB, GitHub Actions, and AWS services such as Fargate and ECS
Nice to Have
- Care deeply about empowering other teams to build highly resilient and scalable production services
- Thoughtful about trade-offs, with good product sense when creating highly available infrastructure and services
- Open to using AI to amplify their skills and strengthen their work, demonstrating curiosity, a willingness to learn, and sound judgment in applying AI responsibly to improve efficiency and impact
Work Arrangement
Hybrid
Team
Structure: Enterprise Resilience team
Additional Information
- 16 weeks paid Parental Leave for all new parents
- Health & wellness stipend
- Remote workspace, internet, and cellphone stipend
- Commuter benefits for team members who report to the SF and NYC offices
- Family planning benefits
- Matching 401(k) contribution with immediate vesting
- Flexible PTO policy, plus 80 hours of Sick Time
- 11 company-paid holidays
- Virtual team building activities, lunch and learns, and other company-wide events!
