Responsibilities
- Lead the creation of a scalable platform for training and deploying large machine learning models.
- Develop comprehensive MLOps practices covering data preprocessing, model versioning, experiment logging, and deployment workflows.
- Build and maintain a graph machine learning framework that simplifies development and improves model scalability.
- Partner with machine learning engineers to enhance training performance, reduce costs, and optimize GPU utilization.
- Improve the efficiency of batch data workflows using distributed computing tools and data warehouse technologies.
- Design and manage high-throughput pipelines for constructing and updating massive graph datasets with billions of nodes and edges.