Hyderabad, Telangana, India (Hybrid Employment)

VSolvit is hiring a DevOps / Site Reliability Engineering (SRE) Lead

About the Role

VSolvit is seeking a DevOps / Site Reliability Engineering (SRE) Lead to architect and manage a resilient, secure, and scalable Azure cloud environment. You will be responsible for the end-to-end design and operation of cloud-native infrastructure, CI/CD, observability, and reliability practices.

What You'll Do

  • Design and operate Azure landing zones aligned with enterprise governance.
  • Lead AKS production architecture, including multi-region, high-availability, and zero-downtime deployments.
  • Define SLOs, SLIs, and error budgets across digital platforms.
  • Implement production reliability frameworks including chaos testing and resilience validation.
  • Architect blue-green, canary, and progressive deployment strategies.
  • Build enterprise Infrastructure as Code (IaC) frameworks using Terraform, Bicep, or ARM with Azure DevOps or GitHub Actions.
  • Implement a GitOps model using ArgoCD or Flux, enforce immutable infrastructure patterns, and standardize reusable infrastructure modules.
  • Design secure multi-stage CI/CD pipelines with automated code quality, SAST/DAST, container scanning, and dependency vulnerability scanning.
  • Integrate Azure Key Vault and managed identities into pipelines and enable automated rollback and deployment guardrails.
  • Operate and optimize Azure Kubernetes Service (AKS) and Helm, implementing pod security policies, workload identity, and network policies.
  • Build internal platform templates for developer self-service.
  • Design an enterprise observability stack using Azure Monitor, Log Analytics, Application Insights, Prometheus, and Grafana.
  • Define centralized logging, distributed tracing, and alerting frameworks.
  • Lead production incident response and postmortem analysis, and build real-time dashboards for leadership visibility.
  • Implement Azure Policy and RBAC frameworks and design secure multi-tenant cloud architecture.
  • Integrate Defender for Cloud, Conditional Access, and Identity Federation.
  • Lead cloud controls for SOC2, ISO, and internal audits, and define a least-privilege model across subscriptions.
  • Implement predictive monitoring and anomaly detection, and automate capacity scaling using telemetry insights.
  • Integrate ML-based alert reduction and noise suppression, and enable self-healing infrastructure patterns.
  • Lead DevOps and SRE engineers and establish reliability KPIs and a maturity roadmap.
  • Collaborate with Architecture, Security, Data, and Product teams.
  • Drive platform modernization strategy and mentor teams on cloud-native best practices.

What We're Looking For

  • Proven experience leading the design and implementation of resilient, enterprise-scale Azure cloud platforms.
  • Deep hands-on expertise with Azure Kubernetes Service (AKS) production architecture, including multi-region and high-availability deployments.
  • Strong background in Site Reliability Engineering, including defining SLOs/SLIs, error budgets, and implementing chaos testing.
  • Expert-level proficiency with Infrastructure as Code tools like Terraform, Bicep, or ARM.
  • Extensive experience building and operating secure, multi-stage CI/CD pipelines with integrated security scanning.
  • Demonstrated skill in implementing GitOps (ArgoCD/Flux), observability stacks (Azure Monitor, Prometheus, Grafana), and cloud security controls (Defender for Cloud, RBAC).
  • Experience leading cloud compliance efforts (SOC2, ISO) and designing secure, multi-tenant architectures.
  • Strong leadership skills with experience mentoring teams and collaborating across Architecture, Security, Data, and Product functions.

Technical Stack

  • Azure, Azure Kubernetes Service (AKS)
  • Terraform, Bicep, ARM
  • Azure DevOps, GitHub Actions
  • ArgoCD, Flux, Helm
  • Azure Monitor, Log Analytics, Application Insights
  • Prometheus, Grafana
  • Azure Key Vault, Defender for Cloud

Team & Environment

You will lead DevOps and SRE engineers and collaborate closely with Architecture, Security, Data, and Product teams.

Work Mode

This position follows a hybrid work model and is based in Hyderabad, India.

VSolvit is an equal opportunity employer.

Required Skills
Azure, Kubernetes, AKS, Terraform, Bicep, ARM, Azure DevOps, GitHub Actions, ArgoCD, Flux, Helm, DevOps, SRE, Site Reliability Engineering, Infrastructure
About company
VSolvit

VSolvit is a technology services provider that specializes in cybersecurity, cloud computing, geographic information systems (GIS), business intelligence (BI) systems, data warehousing, engineering services, and custom database and application development.

Job Details
Department: Information Technology
Category: Infrastructure
Posted 14 days ago