Bengaluru or Hyderabad (Hybrid)

DigitalOcean is hiring a Senior Cloud Support Engineer

Requirements

  • 7+ years of progressive experience in technical support, solutions engineering, DevOps, or site reliability engineering roles with consistent demonstration of technical leadership
  • 5+ years in senior technical customer-facing roles with proven ability to manage enterprise customer relationships and complex technical engagements
  • Expert-level Kubernetes knowledge: Production-scale architecture design, cluster operations, advanced troubleshooting, performance optimization, security hardening, and networking (CNI, service meshes, ingress controllers)
  • Deep GPU/AI/ML infrastructure expertise: Multi-GPU and multi-node training, distributed computing frameworks, GPU resource management, inference optimization, and production ML deployment patterns
  • Advanced understanding of production AI/ML pipelines including model training, optimization, deployment, and monitoring at scale
  • Extensive experience with major ML frameworks (PyTorch, TensorFlow, Hugging Face) including distributed training strategies and production deployment patterns
  • Expertise in GPU optimization techniques: CUDA programming concepts, TensorRT, vLLM, model quantization (INT4, INT8, FP8), and inference performance tuning
  • Deep knowledge of MLOps practices: CI/CD for ML, model versioning, experiment tracking, feature stores, and production monitoring
  • Experience with large-scale distributed AI/ML workloads including data parallelism, model parallelism, and mixed-precision training
  • Proven experience designing fault-tolerant, scalable cloud architectures with deep consideration for cost optimization, security, compliance, and operational excellence
  • Expert-level Linux system administration: Kernel tuning, performance profiling, security hardening, advanced troubleshooting, and automation
  • Advanced networking expertise: Deep understanding of TCP/IP, routing protocols, load balancing, CDNs, VPNs, network security, and troubleshooting complex network issues
  • Strong programming skills in Python with experience in at least one additional systems language (Go, Rust, C++, or similar)
  • Extensive experience with infrastructure-as-code (Terraform, CloudFormation, Pulumi) and configuration management tools
  • Exceptional communication abilities: Can translate highly complex technical concepts into clear, actionable guidance for audiences ranging from junior engineers to C-level executives
  • Demonstrated leadership capabilities including mentoring team members, leading cross-functional initiatives, and influencing without direct authority
  • Strong consultative approach: Ability to discover underlying customer needs, challenge assumptions respectfully, and craft solutions that balance technical excellence with business pragmatism
  • Track record of driving organizational improvement through process design, automation, documentation, and strategic initiatives

Nice to Have

  • Kubernetes certifications: CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or CKS (Certified Kubernetes Security Specialist)
  • Advanced cloud certifications: AWS Solutions Architect Professional, GCP Professional Cloud Architect, Azure Solutions Architect Expert
  • GPU/AI certifications: NVIDIA DLI certifications, CUDA programming certifications, or similar specialized credentials
  • Open-source contributions to AI/ML projects, Kubernetes ecosystem, or infrastructure tools
  • Published technical content: Blog posts, whitepapers, solution guides, or technical documentation demonstrating thought leadership
  • Speaking experience at technical conferences, meetups, or webinars on topics related to cloud infrastructure, AI/ML, or DevOps
  • Active participation in technical communities (CNCF, Kubernetes SIGs, AI/ML forums, cloud-native communities)
  • Experience with observability platforms: Prometheus, Grafana, Datadog, New Relic, or similar monitoring/alerting systems
  • Multi-cloud or hybrid-cloud architecture experience: Designing solutions that span AWS, GCP, Azure, and on-premises infrastructure
  • Experience with DigitalOcean or Paperspace products as a user or customer
  • Database expertise: Experience with both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Redis) databases at scale
  • Security & compliance knowledge: Experience with SOC 2, HIPAA, GDPR, or other compliance frameworks in cloud environments

Work Arrangement

Hybrid

Required Skills
Technical Support, DevOps
About company
DigitalOcean
DigitalOcean builds the simplest scalable cloud for a strong community of top talent and the dreamers and builders in the world.
Job Details
Department: Information Technology
Category: Infrastructure
Posted 3 months ago