Role Overview
As a Senior Cloud & Platform Engineer, you will design and manage the foundational cloud infrastructure that powers a high-performance Data and AI platform. Your work will let engineering teams develop, deploy, and scale applications quickly and securely through a self-service model built on automation and Infrastructure as Code.
Key Responsibilities
- Design and operate the AWS environment supporting data-intensive and AI-driven applications
- Build and maintain a developer-centric platform that allows engineers to provision resources independently and safely
- Implement and manage CI/CD pipelines with integrated testing, security checks, and deployment strategies such as blue/green
- Develop cloud-native networking solutions using VPCs, peering, Transit Gateways, and load balancers optimized for real-time data flow
- Enforce security standards through least-privilege IAM policies, encryption in transit and at rest, and centralized secret management
- Establish comprehensive monitoring and observability using tools like Prometheus, Grafana, Datadog, or New Relic to track system health and performance
- Configure and optimize managed databases, including PostgreSQL and vector databases, for reliability and speed
- Ensure all infrastructure is defined in code, version-controlled, and free from manual configuration
Required Qualifications
- Minimum of 7 years in systems or infrastructure engineering with deep AWS experience
- Proven expertise with Infrastructure as Code using Terraform or Pulumi
- Strong background in container orchestration platforms such as Amazon EKS (Kubernetes) or Amazon ECS, including Fargate, service mesh, and auto-scaling
- Experience building and maintaining CI/CD systems using GitHub Actions, GitLab CI, or Jenkins
- Advanced knowledge of VPC architecture, routing, peering, and load balancing for high-throughput applications
- Hands-on experience with IAM policy design, AWS Secrets Manager, or HashiCorp Vault
- Familiarity with encryption standards and monitoring solutions for cloud environments
Preferred Skills
- Experience with Amazon SageMaker and MLOps platforms like Kubeflow
- Background in managing GPU instances for AI inference workloads
- Proficiency with serverless technologies such as AWS Lambda and Amazon EventBridge
- Track record in cloud cost optimization using Reserved Instances, Savings Plans, and AWS Cost Explorer
- Experience defining and tracking Service Level Objectives (SLOs)
- Participation in blameless post-mortem processes to improve system resilience


