San Francisco, California Remote (Country) Full-time

Lavendo is hiring a Senior AI/ML Specialist Solutions Architect (AI Infra & Cloud)

About the Role

Lavendo is looking for a Senior AI/ML Specialist Solutions Architect to design and implement scalable AI solutions on a powerful, AI-focused cloud platform leveraging large-scale GPU clusters. In this role, you will work closely with customers, engineering, and product teams to drive technical strategy and ensure successful deployment of machine learning workloads.

What You'll Do

Architect and optimize distributed training and inference systems for large-scale AI models
Design and deliver customer-focused solutions that maximize performance and business value
Lead the transition of ML pipelines from POC to scalable production systems
Build long-term customer relationships, ensuring satisfaction and alignment with strategic goals
Create whitepapers, deliver technical presentations, and host webinars to share insights and best practices
Provide technical leadership and mentor teams on AI infrastructure and deployment strategies
Collaborate with engineering and product teams to prioritize customer feedback and influence product roadmaps

What We're Looking For

5+ years of experience with cloud technologies and infrastructure, ideally in senior MLOps or Solutions Architect roles
Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments
Demonstrated success delivering ML products, scaling from POC to production
Deep knowledge of ML frameworks like PyTorch and JAX
Strong background in the NVIDIA HPC ecosystem (CUDA, NCCL, InfiniBand)
Exceptional communication skills to engage both technical teams and business stakeholders
Legal authorization to work in the United States on a full-time basis without sponsorship

Nice to Have

Programming Languages: Python, Go, Java, C++
Infrastructure as Code (IaC): Terraform, Ansible
Orchestration: Kubernetes (K8s), Slurm
DevOps Tools: Git, Docker, Helm
Big Data Frameworks: Spark, Kafka, Hadoop
Databases: SQL, NoSQL, and vector databases
ML Frameworks: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn

Technical Stack

PyTorch, JAX, CUDA, NCCL, InfiniBand, Python, Go, Java, C++, Terraform, Ansible, Kubernetes, Slurm, Git, Docker, Helm, Spark, Kafka, Hadoop, SQL, NoSQL, vector databases, TensorFlow, Hugging Face, scikit-learn

Benefits & Compensation

Competitive compensation: $225,000 to $315,000 per year (negotiable based on experience and location)
Full medical benefits: 100% company-paid medical, dental, and vision coverage for employees and families
401(k) plan with a 4% match program
Stock options plan
Flexible remote work environment
Company-paid short-term disability, long-term disability, and life insurance coverage
20 weeks of paid parental leave for primary caregivers, 12 weeks for secondary caregivers
Up to $85/month for mobile and internet
Work with state-of-the-art AI and cloud technologies, including the latest NVIDIA GPUs
Be part of a team that operates one of the most powerful commercially available supercomputers
Contribute to sustainable AI infrastructure, with energy-efficient data centers that recover waste heat to warm nearby residential buildings

Work Mode

Remote U.S.
Flexible remote work environment

We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.

Required Skills

PyTorch, JAX, CUDA, NCCL, InfiniBand, Python, Go, Java, C++, Terraform, MLOps, AI/ML, Cloud Infrastructure, NVIDIA HPC, Multi-GPU Optimization

About company

Building AI-centric cloud infrastructure that combines large GPU clusters, high-speed networks, and cloud-native tooling into a platform used by enterprises, startups, and research teams. The goal is to enable serious AI and simulation workloads without requiring customers to build their own supercomputers.

Job Details

Category infrastructure

Posted a month ago