Responsibilities
- Lead the development and execution of the technical vision and long-term plan for cloud infrastructure powered by autonomous agents.
- Design and manage cloud networking systems, including virtual private clouds, private links, and interconnections with major and emerging cloud providers to enable fast, high-volume AI processing.
- Create and scale computing platforms using container orchestration (Kubernetes/EKS), dynamic scaling groups, and hybrid CPU/GPU resources to handle real-time and batch workloads across multiple regions.
- Develop and sustain secure, segregated deployment architectures for multi-tenant, dedicated-tenant, and customer-controlled cloud environments, incorporating cross-account networking, identity management, and policy enforcement.
- Design and refine multi-region infrastructure strategies to ensure high availability, seamless failover, and optimal data placement, including intelligent traffic distribution, regional resource forecasting, and recovery procedures.
- Collaborate with security teams to implement enterprise-grade safeguards such as customer-managed keys, network segmentation, and full audit trails to meet compliance needs for regulated industries.
- Build automation frameworks, operational tools, and standardized procedures to streamline ongoing operations, including provisioning, updates, and incident resolution, across all product environments.