Requirements

7+ years of progressive experience in technical support, solutions engineering, DevOps, or site reliability engineering roles, with a consistent record of technical leadership
5+ years in senior technical customer-facing roles with proven ability to manage enterprise customer relationships and complex technical engagements
Expert-level Kubernetes knowledge: Production-scale architecture design, cluster operations, advanced troubleshooting, performance optimization, security hardening, and networking (CNI, service meshes, ingress controllers)
Deep GPU/AI/ML infrastructure expertise: Multi-GPU and multi-node training, distributed computing frameworks, GPU resource management, inference optimization, and production ML deployment patterns
Advanced understanding of production AI/ML pipelines including model training, optimization, deployment, and monitoring at scale
Extensive experience with major ML frameworks (PyTorch, TensorFlow, Hugging Face) including distributed training strategies and production deployment patterns
Expertise in GPU optimization techniques: CUDA programming concepts, TensorRT, vLLM, model quantization (INT4, INT8, FP8), and inference performance tuning
Deep knowledge of MLOps practices: CI/CD for ML, model versioning, experiment tracking, feature stores, and production monitoring
Experience with large-scale distributed AI/ML workloads including data parallelism, model parallelism, and mixed-precision training
Proven experience designing fault-tolerant, scalable cloud architectures with deep consideration for cost optimization, security, compliance, and operational excellence
Expert-level Linux system administration: Kernel tuning, performance profiling, security hardening, advanced troubleshooting, and automation
Advanced networking expertise: Deep understanding of TCP/IP, routing protocols, load balancing, CDNs, VPNs, network security, and troubleshooting complex network issues
Strong programming skills in Python with experience in at least one additional systems language (Go, Rust, C++, or similar)
Extensive experience with infrastructure-as-code (Terraform, CloudFormation, Pulumi) and configuration management tools
Exceptional communication abilities: Ability to translate highly complex technical concepts into clear, actionable guidance for audiences ranging from junior engineers to C-level executives
Demonstrated leadership capabilities including mentoring team members, leading cross-functional initiatives, and influencing without direct authority
Strong consultative approach: Ability to discover underlying customer needs, challenge assumptions respectfully, and craft solutions that balance technical excellence with business pragmatism
Track record of driving organizational improvement through process design, automation, documentation, and strategic initiatives

Nice to Have

Kubernetes certifications: CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or CKS (Certified Kubernetes Security Specialist)
Advanced cloud certifications: AWS Solutions Architect Professional, GCP Professional Cloud Architect, Azure Solutions Architect Expert
GPU/AI certifications: NVIDIA DLI certifications, CUDA programming certifications, or similar specialized credentials
Open-source contributions to AI/ML projects, Kubernetes ecosystem, or infrastructure tools
Published technical content: Blog posts, whitepapers, solution guides, or technical documentation demonstrating thought leadership
Speaking experience at technical conferences, meetups, or webinars on topics related to cloud infrastructure, AI/ML, or DevOps
Active participation in technical communities (CNCF, Kubernetes SIGs, AI/ML forums, cloud-native communities)
Experience with observability platforms: Prometheus, Grafana, Datadog, New Relic, or similar monitoring/alerting systems
Multi-cloud or hybrid-cloud architecture experience: Designing solutions that span AWS, GCP, Azure, and on-premises infrastructure
Experience with DigitalOcean or Paperspace products as a user or customer
Database expertise: Experience with both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Redis) databases at scale
Security & compliance knowledge: Experience with SOC 2, HIPAA, GDPR, or other compliance frameworks in cloud environments

Work Arrangement

Hybrid

DigitalOcean is hiring a Senior Cloud Support Engineer
