About the Role
Responsibilities
- Design and implement cloud infrastructure solutions across Azure and AWS, focusing on scalability, reliability, and performance optimization
- Build and maintain CI/CD pipelines using Azure DevOps, GitHub Actions, and AWS CodePipeline with automated testing and deployment strategies
- Implement and optimize containerized applications using Azure Kubernetes Service (AKS), AWS Elastic Kubernetes Service (EKS), and container orchestration with Docker and Kubernetes
- Develop infrastructure-as-code solutions using Terraform, Azure Bicep, ARM Templates, and AWS CloudFormation with emphasis on modularity and reusability
- Implement comprehensive observability solutions using Azure Monitor, AWS CloudWatch, Dynatrace, or similar tools, including dashboards, alerts, and monitoring best practices
- Provision and maintain AI service infrastructure: ensure high availability of model endpoints, implement fallbacks and guardrails, deploy central AI platform components (proxies, gateways, monitoring), and manage observability, logs, and cost attribution workflows
- Design and manage message queuing systems for event-driven architectures using RabbitMQ, Kafka, Azure Service Bus, AWS SQS/SNS, and third-party solutions
- Optimize cloud storage solutions across platforms: Azure (Blob Storage, Files, Data Lake Storage) and AWS (S3, EBS, EFS)
- Implement security best practices across both platforms, including IAM policies and role-based access control (Azure RBAC, AWS IAM); secrets management (Azure Key Vault, AWS Secrets Manager); certificate lifecycle management (Azure Key Vault, AWS Certificate Manager); and network security (Azure NSG/Firewall, AWS Security Groups/Network Firewall)
- Perform performance tuning and scalability optimization for applications, including load testing and capacity planning
- Conduct incident response and root cause analysis, and implement improvements to prevent recurrence
- Collaborate with development teams to ensure infrastructure supports application requirements and deployment needs
- Mentor L1, L2, and L3 engineers on cloud technologies, DevOps practices, and troubleshooting techniques
- Contribute to infrastructure standards, best practices documentation, and runbook development
- Support disaster recovery planning and backup/restore procedures with testing and validation
- Monitor and optimize cloud spending through FinOps practices, implementing cost-saving measures across both platforms
- Work with both Azure and AWS platforms daily to support mission-critical production workloads
The above statements are intended only to describe the general nature of the job and should not be construed as an all-inclusive list of position responsibilities.
Requirements
- 6+ years of experience in cloud infrastructure, DevOps, or SRE roles with proven technical expertise
- Strong proficiency in both Azure and AWS cloud platforms with demonstrated capability to architect and implement solutions in either environment
- Professional-level certifications in both platforms: Azure Solutions Architect Expert (AZ-305) - Required, an
Work Arrangement
Hybrid