About the Role
Responsibilities
- Mine Unstructured Data: Use NLP (ClinicalBERT / BioBERT) to extract anxiety markers and procedural indicators from thousands of anonymized clinical notes.
- Feature Engineering: Clean and join complex longitudinal datasets including medication dosages (Infusion tables), vitals (Flowsheets), and recovery timestamps.
- Build the Model: Develop and validate an XGBoost classification model that outputs an 'XR Suitability Score', targeting an AUC-ROC above 0.75.
- Work directly with the CEO.
Requirements
- Final-year Master’s student or PhD candidate in Data Science, AI, Medical Informatics, or Applied Math.
- Expert-level knowledge of the Python data stack (Pandas, Scikit-learn, XGBoost).
- Hands-on experience with the HuggingFace library and transformer models.
- Ability to navigate and join massive relational databases (BigQuery/SQL Server).
- Mindset: You are a 'builder' who is comfortable with the ambiguity of clinical data and the fast pace of a scale-up environment.
Compensation
This internship is not focused on earning a high salary; compensation is primarily in the form of learning, freedom, and responsibility.
Additional Information
- Location: Remote, with twice-weekly video calls with the CEO, who is based in the USA.
- Completing the project will be a significant addition to the intern's CV.
- There is no other data scientist on the team, so the intern must be able to work independently.