Google is hiring a Senior Software Engineer to join a horizontal machine learning infrastructure and efficiency team. You will support the training framework for our foundation recommender model and its customers, with a mission to accelerate product innovations through ML for recommendations and user modeling.
What You'll Do
- Architect and implement the transition from data-parallel to model-parallel training paradigms.
- Design and manage large-scale training runs across multi-pod environments, maximizing data center network bandwidth and minimizing communication bottlenecks.
- Research and integrate transformer model optimizations and novel architectural variants to reduce training time and resource consumption.
- Write and optimize low-level model code, including custom pallas kernels, to maximize performance out of the hardware.
- Work cross-functionally with the team and the Kernel optimization team to co-design and implement compiler-level optimizations that accelerate model execution.
What We're Looking For
- Bachelor’s degree or equivalent practical experience.
- 5 years of experience programming in Python or C++.
- 3 years of experience with ML infrastructure (e.g., model deployment, model evaluation, optimization, data processing, debugging).
Nice to Have
- Master’s degree or PhD in Computer Science, Machine Learning, Computer Engineering, or a related technical field.
- Experience scaling machine learning models (e.g., Large Language Models (LLMs) or foundation models), managing the complexities of transitioning architectures from data-parallel to model, tensor, pipeline-parallel configurations, or related fields.
- Experience with deep learning frameworks (e.g., JAX, PyTorch, or TensorFlow), including a track record of contributing to or modifying their core internals to support novel and emerging use cases.
- Experience with co-designing hardware-aware optimizations to accelerate model execution.
- Knowledge of machine learning compilers (e.g., Accelerated Linear Algebra (XLA) or Multi-Level Intermediate Representation (MLIR)).
Technical Stack
- Python, C++
- JAX, PyTorch, TensorFlow
- XLA, MLIR
Team & Environment
This role is part of the RecML team within Core ML's Applied ML organization.
Benefits & Compensation
- Base salary range: $174,000-$252,000
- Equity included
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.





