Lavendo is looking for a Senior AI/ML Specialist Solutions Architect to design and implement scalable AI solutions on a powerful, AI-focused cloud platform leveraging large-scale GPU clusters. In this role, you will work closely with customers, engineering, and product teams to drive technical strategy and ensure successful deployment of machine learning workloads.
What You'll Do
- Architect and optimize distributed training and inference systems for large-scale AI models
- Design and deliver customer-focused solutions that maximize performance and business value
- Lead the transition of ML pipelines from POC to scalable production systems
- Build long-term customer relationships, ensuring satisfaction and alignment with strategic goals
- Create whitepapers, deliver technical presentations, and host webinars to share insights and best practices
- Provide technical leadership and mentor teams on AI infrastructure and deployment strategies
- Collaborate with engineering and product teams to prioritize customer feedback and influence product roadmaps
What We're Looking For
- 5+ years of experience with cloud technologies and infrastructure, ideally in senior MLOps or Solutions Architect roles
- Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments
- Demonstrated success delivering ML products, scaling from POC to production
- Deep knowledge of ML frameworks like PyTorch and JAX
- Strong background in the NVIDIA HPC ecosystem (CUDA, NCCL, InfiniBand)
- Exceptional communication skills to engage both technical teams and business stakeholders
- Legal authorization to work in the United States on a full-time basis without sponsorship
Nice to Have
- Programming Languages: Python, Go, Java, C++
- Infrastructure as Code (IaC): Terraform, Ansible
- Orchestration: Kubernetes (K8s), Slurm
- DevOps Tools: Git, Docker, Helm
- Big Data Frameworks: Spark, Kafka, Hadoop
- Databases: SQL, NoSQL, and vector databases
- ML Frameworks: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn
Technical Stack
- PyTorch, JAX, CUDA, NCCL, InfiniBand, Python, Go, Java, C++, Terraform, Ansible, Kubernetes, Slurm, Git, Docker, Helm, Spark, Kafka, Hadoop, SQL, NoSQL, vector databases, TensorFlow, Hugging Face, scikit-learn
Benefits & Compensation
- Competitive compensation: $225,000 to $315,000 per year (negotiable based on experience and location)
- Full medical benefits: 100% company-paid medical, dental, and vision coverage for employees and families
- 401(k) plan with a 4% match program
- Stock options plan
- Flexible remote work environment
- Company-paid short-term disability, long-term disability, and life insurance coverage
- 20 weeks of paid parental leave for primary caregivers, 12 weeks for secondary caregivers
- Up to $85/month for mobile and internet
- Work with state-of-the-art AI and cloud technologies, including the latest NVIDIA GPUs
- Be part of a team that operates one of the most powerful commercially available supercomputers
- Contribute to sustainable AI infrastructure, with energy-efficient data centers that recover waste heat to warm nearby residential buildings
Work Mode
- Remote U.S.
We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.








