About the Role
Lead the design and implementation of machine learning operations (MLOps) frameworks to accelerate AI research. Drive best practices in infrastructure, tooling, and deployment to improve research velocity and model reliability.
Responsibilities
- Architect scalable systems for training and deploying machine learning models
- Develop reproducible workflows for research experimentation
- Collaborate with researchers to transition prototypes into production-ready systems
- Optimize resource utilization across distributed computing environments
- Implement monitoring and observability for ML pipelines
- Establish version control and model tracking standards
- Design automated CI/CD pipelines tailored for ML workflows
- Ensure compliance with security and data governance policies
- Support hybrid cloud and on-prem infrastructure integration
- Lead incident response and root cause analysis for ML systems
- Define performance benchmarks for training and inference
- Promote reusable components and internal tooling
- Coordinate with engineering teams on platform scalability
- Drive documentation and knowledge sharing practices
- Evaluate emerging MLOps tools and frameworks
Nice to Have
- PhD in a technical discipline with research experience
- Direct experience in AI research environments
- Contributions to open-source MLOps projects
- Experience with high-performance computing clusters
- Background in computer-aided design or engineering software domains
Compensation
Competitive salary and benefits package
Work Arrangement
Remote, within the United States or Canada
Team
Part of the AI Research organization, which focuses on advancing machine learning capabilities
Why This Role Matters
This position enables AI research by building robust, scalable infrastructure. The work directly affects how quickly the research team can iterate and how reliable its models are.
What to Expect
You will work remotely with a team distributed across the US and Canada. Expect regular collaboration with researchers, engineers, and platform teams to refine tooling and infrastructure. Leadership is part of the day-to-day work, from setting standards to leading incident response.
