At Cognizant, we are hiring a Senior Site Reliability Engineer for our Cloud Infrastructure & Security services practice. This role is central to ensuring the reliability, performance, and operational excellence of our clients' Snowflake and dbt workloads running on AWS.
What You'll Do
- Ensure reliability for Snowflake and dbt data workloads and migrations.
- Establish and maintain monitoring, alerting, and operational dashboards.
- Manage Snowflake operations, including performance tuning, cost governance, and incident response.
- Debug dbt operations and model dependency issues.
- Implement observability using tools like Grafana, Prometheus, Datadog, or Splunk.
- Build and manage CI/CD pipelines using Terraform and GitLab/Jenkins.
- Apply strong SRE fundamentals: define SLOs, SLAs, and error budgets; conduct RCAs; and automate recurring operational tasks.
- Own incidents end to end, from detection through resolution and root cause analysis.
What We're Looking For
- Expertise in AWS core services (EC2, S3, IAM, VPC, CloudWatch, Lambda).
- Hands-on experience with Snowflake operations, including tuning, cost governance, and incident response.
- Proficiency with dbt operations and debugging model dependencies.
- Experience implementing observability with Grafana, Prometheus, Datadog, or Splunk.
- Proven skill building CI/CD with Terraform and GitLab or Jenkins.
- Strong foundational SRE knowledge: SLOs, SLAs, error budgets, RCA, and automation.
Nice to Have
- Experience with Airflow or Prefect for orchestration.
- Knowledge of secrets management practices.
- Familiarity with Docker/EKS and Python automation.
- Understanding of zero-downtime migration patterns.
Technical Stack
- Data Platforms: Snowflake, dbt
- Cloud: AWS, GCP
- Observability: Grafana, Prometheus, Datadog, Splunk
- Infrastructure as Code: Terraform
- CI/CD: GitLab, Jenkins
- AWS Core: EC2, S3, IAM, VPC, CloudWatch, Lambda