Responsibilities
- Lead platform engineering initiatives using Kubernetes (EKS), Helm, and Infrastructure as Code
- Design and operate CI/CD platforms and deployment strategies to enable safe, low-risk releases
- Build and maintain strong observability foundations, including metrics, logging, alerting, and dashboards tied to service health
- Design, build, and operate scalable, secure AWS infrastructure, including VPCs, subnets, routing, NAT, VPNs, and Transit Gateway
- Own and evolve cloud networking and connectivity architecture, ensuring secure, reliable, and performant communication between services
- Configure and manage Cloudflare for edge security (WAF, DDoS protection, rate limiting), CDN performance and caching, and DNS and traffic management
- Actively participate in incident response, root cause analysis, and blameless post-mortems to drive durable corrective actions
- Strengthen security and compliance across infrastructure, including IAM, network security, container security, and SOC2 alignment
- Define and improve service reliability through SLIs, SLOs, and error budgets
- Reduce operational toil through automation and standardized platform patterns
- Identify and drive cost optimization, including cloud and network egress efficiency
- Mentor engineers and contribute to a collaborative, high-performance engineering culture
Requirements
- 6+ years of experience in DevOps, Platform Engineering, or SRE roles
- Strong hands-on experience with:
  - AWS, including networking and connectivity design
  - Kubernetes (EKS)
  - Infrastructure as Code (AWS CDK, Terraform, or similar)
  - CI/CD pipelines (GitHub Actions preferred)
- Experience configuring and operating Cloudflare (CDN, WAF, DNS, edge security)
- Hands-on experience with scripting languages such as Python and Bash for automation and operational tooling
- Proven experience owning production systems with high availability, performance, and security requirements
- Solid understanding of cloud networking fundamentals (routing, load balancing, security groups, NACLs)
- Strong understanding of SRE principles and operational excellence
- Strong communication skills and ability to collaborate across teams
- Hands-on experience improving observability platforms (Prometheus, Grafana, OpenTelemetry)
- Prior experience mentoring engineers or leading cross-team initiatives
- Consistent, reliable high-speed internet access
- Dedicated, quiet workspace free from distractions
Nice to Have
- Experience with advanced AWS networking (Transit Gateway, multi-account architectures, private connectivity)
- Experience building or operating internal developer platforms
- Familiarity with SOC2 or regulated environments
- Exposure to AI-assisted operations or intelligent automation
Work Arrangement
Remote (Worldwide)
Team
Team size: 175+. Structure: remote-first, product-led tech company
Additional Information
- Not open to applicants residing in: Alaska, California, Hawaii, Washington, D.C.
- Some flexibility is required during busy seasons or critical “right-now” moments


