Responsibilities
- Create and implement statistical and machine learning techniques to process and enhance massive unstructured datasets
- Build systems to measure data variety, redundancy, and informational value
- Develop statistical methodologies to reduce risks in training data composition
- Work closely with modeling teams to detect data limitations and improve dataset effectiveness
- Bring experience collaborating across large foundation model projects and early-stage startups
- Lead initiatives in data quality planning and define internal standards for best practices
- Assess third-party datasets for potential integration, prioritizing scalability, accuracy, and impact on model outcomes
- Support the creation of data scorecards to track quality and performance metrics
- Assist in researching and developing automated tools for data preprocessing and validation
