Responsibilities
- Design and implement advanced computing frameworks for AI-centric serverless environments, supporting tasks such as LLM training, inference, agent orchestration, and reinforcement learning
- Evaluate and enhance full-stack AI system efficiency, focusing on distributed task scheduling, inter-node data movement, and memory management across large compute clusters
- Investigate and assess novel technologies in distributed systems, serverless platforms, reinforcement learning methodologies, and AI agents powered by large language models
- Partner with research, product, and infrastructure teams to convert experimental AI and RL concepts into robust, production-grade distributed systems
- Drive technical innovation by contributing to patents, presenting findings, and providing strategic direction in system design
- Monitor advancements in the AI and distributed computing ecosystems, including tools and languages such as Ray, SkyPilot, vLLM, DeepSpeed, and Mojo, to guide technology adoption
Work Arrangement
Remote (Country) — Canada
Other
- The organization supports an equitable, diverse, and accessible hiring experience for all candidates
- Candidates may request accommodations at any point during the recruitment process
- All applications are evaluated personally by the hiring team without automated filtering
- No artificial intelligence systems are employed in candidate screening or selection