About the Role
Design and implement scalable systems for training machine learning models, improving efficiency, reliability, and developer experience across the training lifecycle.
Responsibilities
- Develop and maintain distributed training infrastructure for deep learning models
- Optimize training pipelines for speed, cost, and scalability
- Collaborate with research and applied teams to productionize training workflows
- Diagnose and resolve performance bottlenecks in training jobs
- Improve tooling for monitoring, debugging, and logging of training runs
- Design systems that support rapid experimentation and iteration
- Ensure training platforms meet reliability and reproducibility standards
- Integrate new hardware and distributed training strategies
- Work closely with ML researchers to understand training requirements
- Implement best practices for data loading and preprocessing at scale
- Support versioning and tracking of models and training configurations
- Contribute to cross-team standards for ML infrastructure
- Enhance security and access controls within training environments
- Automate routine maintenance and scaling operations
- Drive improvements in developer experience for ML engineers
Nice to Have
- Master’s or PhD in computer science or a related field
- Experience with MLOps platforms or tools
- Background in computer vision or natural language processing
- Contributions to open-source ML projects
- Experience with hybrid cloud and on-prem training environments
- Knowledge of model parallelism and distributed optimization techniques
Benefits
- Equity offering
- Health and wellness benefits
- Flexible work hours
- Remote work support
- Professional development budget
- Generous paid time off
- Parental leave
- Mental health resources
- Learning and training programs
- Team events and retreats
Compensation
Competitive salary and equity package
Work Arrangement
Remote within Australia
Team
Part of a growing AI/ML team working on core training systems for large-scale models
Our Impact
We build the foundational systems that power intelligent features across a global design platform, enabling millions of people to create with AI.
Engineering Culture
We value ownership, technical excellence, and collaboration. Engineers are empowered to drive projects from concept to production.