About the Role
Responsibilities
- Design, implement, and optimize large-scale machine learning training systems
- Improve training performance across the stack, including GPU utilization, communication overhead, and memory efficiency
- Partner with research and modeling teams to align systems with algorithmic needs
- Evaluate and apply best practices for distributed training using industry-leading frameworks
- Dive deep into low-level optimization, including writing custom CUDA or Triton kernels
- Debug, profile, and tune training workflows to improve scalability
Additional Information
- The position is open to candidates at all experience levels, including experienced hires, 2026 and 2027 graduates, and interns
- The role is also advertised as "Large Model Training Optimization Engineer (Multimodal/Image Generation)" (大模型训练优化工程师(多模态/图像生成)), with a technical focus on operator optimization, distributed training, GPU clusters, and training frameworks
- English proficiency is required