Full-time

Protege is hiring a Senior Machine Learning Researcher / Principal Scientist

About the Role

Protege is hiring a Senior Machine Learning Researcher / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. You'll define what 'high-quality data' means in practice using statistical, computational, and ML-driven methods. This role is central to solving AI's data problem—a generational opportunity.

What You'll Do

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets.
  • Develop frameworks to assess data diversity, duplication, and informativeness.
  • Design statistical approaches to de-risk training datasets.
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance.
  • Provide leadership on data quality strategy and shape internal best practices.
  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance.
  • Help build data scorecards.
  • Contribute to research and development of tools that automate data preprocessing and validation.

What We're Looking For

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field.
  • Strong understanding of AI model training pipelines, including pre-processing and evaluation.
  • Experience working with large, unstructured datasets, especially text.
  • Background in statistical analysis, bias detection, and data validation.
  • Able to identify high-impact problems and drive independent solutions.

Nice to Have

  • Experience with synthetic data generation or augmentation strategies.
  • Publications or open-source contributions in data-centric AI or related areas.
  • Experience developing evaluation frameworks or performance metrics for training data.
  • Cross-functional collaboration with product, infrastructure, or partnership teams.

Team & Environment

This role is part of the Core Data Team.

Protege is an equal opportunity employer.

Required Skills
Machine Learning, Deep Learning, Python, Research, Statistical Analysis, Model Development, Data Analysis, Problem Solving, Communication, Project Leadership
About company
Protege

Protege solves the biggest unmet need in AI — getting access to the right training data. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Job Details
Category: Data
Posted 8 months ago