Palo Alto Networks is hiring a Senior Site Reliability Engineer (NetSec) for the SASE Platform team. We build and operate highly available, secure, and globally distributed services that protect users, applications, and data for some of the world’s largest enterprises. In this role, you will play a critical part in ensuring our platform is reliable, scalable, performant, and secure from day one.
What You'll Do
- Collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages.
- Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance.
- Build and operate automation for provisioning, deploying, and managing infrastructure at global scale using Infrastructure as Code.
- Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments.
- Drive observability best practices, including metrics, logs, traces, SLIs/SLOs, and data-driven incident analysis.
- Participate in on-call rotations, continuously reducing MTTR through automation, runbooks, and proactive reliability improvements.
- Mentor and guide engineers on large-scale cloud and SASE deployments, fostering a strong SRE culture.
- Participate in architecture and design reviews, bringing a reliability and operational excellence mindset.
- Champion reliability, security, and operational maturity across the organization.
What We're Looking For
- Bachelor’s degree in Engineering, Computer Science, or a related technical field (or equivalent practical experience).
- 5+ years of experience working with Unix/Linux systems (shell, tools, networking, storage, kernel concepts).
- 2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms.
- Strong understanding of distributed systems design, fault tolerance, scalability patterns, and high-availability architectures.
- Experience operating workloads in public cloud environments (AWS, GCP, Azure, or hybrid) at medium to large scale.
- Proficiency in building automation and tools in Python, Java, or similar languages for production environments.
- Strong Infrastructure as Code experience (Terraform, Ansible, Chef, Puppet, or similar).
- Experience designing and operating monitoring, alerting, and observability systems at scale.
- A tools-first mindset with a passion for reducing toil and increasing engineering efficiency.
- Excellent communication skills and the ability to lead discussions across engineering and security teams.
- Experience applying reliability and security frameworks to design, review, and operate production systems.
Nice to Have
- Networking expertise, including TCP/IP, DNS, BGP, routing, load balancing, proxies, VPNs, and cloud networking concepts, all of which are especially relevant to SASE architectures.
- Experience operating or supporting SASE, SD-WAN, Zero Trust, or network security platforms.
- Familiarity with AI/LLM technologies, including using LLMs to improve operational workflows (incident analysis, alert enrichment, runbooks, automation) and integrating AI/ML services into production systems.
- Understanding of reliability, security, and governance considerations for AI-driven services.
Technical Stack
- Unix/Linux
- Kubernetes
- AWS, GCP, Azure
- Python, Java
- Terraform, Ansible, Chef, Puppet
Work Mode
This role offers a hybrid work model.
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.






