Responsibilities
- Define and drive multi-year, multi-team technical strategy for machine learning across Affirm, ensuring alignment with company-wide priorities and influencing the roadmaps of partner teams and platforms.
- Lead the design, implementation, and scaling of advanced ML systems, setting the architectural direction for complex, cross-functional initiatives and ensuring systems remain reliable, extensible, and prepared for increasingly sophisticated modeling workloads.
- Partner deeply with ML Platform, product, engineering, and risk leadership to shape long-term modeling capabilities, define new opportunities for ML impact, and guide the infrastructure evolution required for next-generation ML methods.
- Provide broad technical leadership across the ML organization, mentoring senior engineers, elevating design and code quality, and spreading ML expertise through documentation, talks, and cross-org guidance.
- Drive clarity and alignment on ambiguous, high-stakes technical decisions by resolving cross-team tensions, balancing competing priorities, and exercising judgment that serves the broader engineering organization.
- Champion operational and system excellence at the area level, owning the long-term health, availability, and evolution of critical ML systems, and ensuring robust testing, monitoring, and reliability practices across teams.
Requirements
- 10+ years of experience researching, designing, deploying, and operating large-scale, real-time machine learning systems, with a proven record of driving technical innovation and delivering measurable business impact. A relevant PhD can count for up to 2 years of experience.
- Experience leading end-to-end ML system design, from data architecture and feature pipelines to model training, evaluation, and production deployment.
- Experience with distributed frameworks such as Spark or Ray, or similar large-scale data processing systems.
- Proficiency in Python and ML frameworks, including PyTorch and XGBoost.
- Experience with ML tooling for training orchestration, experimentation, and model monitoring, such as Kubeflow, MLflow, or equivalent internal platforms.
- Strong understanding of representation learning and embedding-based modeling.
- Deep expertise in neural network-based sequence modeling, including Transformer, recurrent, and attention-based architectures, as well as multi-task learning systems.
- Comfort designing and optimizing models that learn from sequential or temporal event data at scale.
- Deep hands-on experience with large-scale distributed ML infrastructure, including streaming or batch data ingestion, feature stores, feature engineering, training pipelines, model serving and inference infrastructure, monitoring, and automated retraining.
- Proven ability to provide strong technical leadership: defining long-term strategy, guiding research direction, and aligning work across teams.
- Recognized as a trusted expert who can drive clarity and execution even in ambiguous problem spaces.
- Demonstrated exceptional judgment, collaboration, and communication skills, enabling effective technical discussions with engineers, researchers, and executives.
- Track record of mentoring senior engineers, fostering technical excellence, and contributing to a culture of continuous learning.
- Strong verbal and written communication skills that support effective collaboration across our global engineering organization.
- Equivalent practical experience or a Bachelor’s degree in a related field.