Responsibilities
- Own the roadmap and technical strategy for agent-driven cloud infrastructure management.
- Design and operate Perplexity’s cloud networking fabric, including VPC architectures, private connectivity, and peering with hyperscalers and neocloud providers to support low-latency, high-throughput AI workloads.
- Architect and scale compute platforms (Kubernetes/EKS, autoscaling groups, and mixed CPU/GPU fleets) to efficiently serve online request traffic and background workloads across regions.
- Build and maintain secure, isolated deployment topologies for multi-tenant, single-tenant, and customer-owned cloud (BYOC) environments, including cross-account networking, identity, and policy guardrails.
- Implement and evolve multi-region strategies for availability, failover, and data locality, including traffic routing, regional capacity planning, and disaster recovery playbooks.
- Partner with security to deliver enterprise controls such as BYOK/KMS integrations, network isolation, and auditability required for regulated customers.
- Develop automation, tooling, and runbooks that make day-2 operations (provisioning, upgrades, incident response) predictable and repeatable for Perplexity products across all environments.