Full-time

Protege is hiring a Senior Applied Research Scientist

About the Role

Protege is looking for a Senior Applied Research Scientist to join our Core Data Team. You will play a leading role in defining what 'high-quality data' means in practice for training state-of-the-art AI models. Your work will focus on using statistical, computational, and machine learning methods to ensure our datasets are diverse, representative, and high-impact.

What You'll Do

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets.
  • Develop frameworks to assess data diversity, duplication, and informativeness.
  • Design statistical approaches to de-risk training datasets.
  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance.
  • Provide leadership on data quality strategy and shape internal best practices.
  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance.
  • Help build data scorecards.
  • Contribute to research and development of tools that automate data preprocessing and validation.

What We're Looking For

  • A PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field.
  • Strong understanding of AI model training pipelines, including pre-processing and evaluation.
  • Experience working with large, unstructured datasets, especially text.
  • Background in statistical analysis, bias detection, and data validation.
  • Ability to identify high-impact problems and drive solutions independently.

Nice to Have

  • Experience with synthetic data generation or augmentation strategies.
  • Publications or open-source contributions in data-centric AI or related areas.
  • Experience developing evaluation frameworks or performance metrics for training data.
  • Experience collaborating cross-functionally with product, infrastructure, or partnership teams.

Team & Environment

You'll be a senior member of the Core Data Team, working within a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Required Skills
Machine Learning, Deep Learning, Python, PyTorch, TensorFlow, NLP, Computer Vision, Statistical Modeling, Research, Data Analysis, Model Deployment, A/B Testing, Experimentation, Communication, Mentoring
About company
Protege

Protege solves the biggest unmet need in AI — getting access to the right training data. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Job Details
Category: data
Posted a month ago