Responsibilities
- Lead the shift from data-parallel to model-parallel training architectures through scalable system design.
- Orchestrate high-volume training operations across multi-pod infrastructures, optimizing network throughput and reducing inter-process communication delays.
- Investigate and apply improvements to transformer models, including structural changes and efficiency techniques to shorten training duration and lower resource demands.
- Develop and refine low-level model implementations, such as custom Pallas kernels, to fully leverage underlying hardware capabilities.
- Collaborate with cross-functional teams and kernel optimization specialists to develop compiler improvements that speed up model execution.
Compensation
Full-time base salary range: $174,000–$252,000
Work Arrangement
Full-time
Team
Cross-functional collaboration with engineering and optimization teams
