Protege is looking for an Applied ML Researcher to bridge the gap between our rich healthcare data assets and our customers' specific AI model development needs. You will play a key role in ensuring our datasets are well matched to our customers' needs and well understood by them.
What You'll Do
- Conduct feasibility analyses by querying healthcare datasets to assess patient cohort availability based on complex inclusion/exclusion criteria.
- Collaborate directly with customers to understand their use cases and support effective data integration.
- Ensure customers have a clear understanding of the data’s structure, limitations, and strengths.
- Identify gaps in our data offerings and provide insights to our partnerships team on the highest-priority data acquisitions.
- Evaluate potential data partnerships, ensuring the data is high-quality, well-documented, and commercially viable.
What We're Looking For
- Undergraduate degree or an MS/BS plus industry experience in a quantitative field such as mathematics, economics, statistics, biostatistics, bioinformatics, computer science, or data science.
- Programming proficiency in R, Python, or SQL.
- Hands-on experience working with large-scale healthcare datasets, including one or more of the following: imaging, EHR, genomics, claims, or pathology data.
Nice to Have
- Experience in a customer-facing role.
- Experience with data optimization techniques such as model-based filtering, multimodal data integration, heuristic filtering, and/or target distribution matching.
- Experience applying machine learning or logistic regression techniques to healthcare data.
- Familiarity with third-party data certification or audit processes related to privacy and data quality.
- Ability to think creatively and insightfully about large-scale data problems.
Technical Stack
- R
- Python
- SQL
Team & Environment
We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.