Full-time

Protege is hiring a Senior Machine Learning Researcher / Principal Scientist

About the Role

Protege is hiring a Senior Machine Learning Researcher / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. You'll define what 'high-quality data' means in practice using statistical, computational, and ML-driven methods. This role is central to solving AI's data problem—a generational opportunity.

What You'll Do

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets.
  • Develop frameworks to assess data diversity, duplication, and informativeness.
  • Design statistical approaches to de-risk training datasets.
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance.
  • Provide leadership on data quality strategy and shape internal best practices.
  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance.
  • Help build data scorecards.
  • Contribute to research and development of tools that automate data preprocessing and validation.

What We're Looking For

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field.
  • Strong understanding of AI model training pipelines, including pre-processing and evaluation.
  • Experience working with large, unstructured datasets, especially text.
  • Background in statistical analysis, bias detection, and data validation.
  • Ability to identify high-impact problems and drive solutions independently.

Nice to Have

  • Experience with synthetic data generation or augmentation strategies.
  • Publications or open-source contributions in data-centric AI or related areas.
  • Experience developing evaluation frameworks or performance metrics for training data.
  • Cross-functional collaboration with product, infrastructure, or partnership teams.

Team & Environment

This role is part of the Core Data Team.

Protege is an equal opportunity employer.

Required Skills
Machine Learning, Deep Learning, Python, Research, Statistical Analysis, Model Development, Data Analysis, Problem Solving, Communication, Project Leadership
About company
Protege

Protege solves the biggest unmet need in AI — getting access to the right training data. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Job Details
Category data
Posted 8 months ago