Valohealth is looking for a Data Scientist, RWD to be a core member of a team building a computational platform for drug discovery. In this role, you will help curate and leverage real-world data (e.g., electronic medical record databases) to support research hypothesis studies and machine learning pipelines.
What You'll Do
- Work in close partnership with a cross-functional data science team to optimize patient data utilization and ensure high-quality datasets for research and development.
- Support data harmonization standards, curation strategy, and mappings of clinical concepts from electronic medical record and biobank registry databases.
- Break down business and research requirements into actionable steps and lead the execution of data analysis or feasibility studies.
- Contribute to preparing deliverables for internal and external stakeholders.
- Support the development and maintenance of user-driven tools to generalize data processes and automate data transformation in real-world data studies and machine learning projects.
- Autonomously perform high-quality and reproducible analysis within a cloud environment using R or Python.
- Be a dynamic team member, championing shared coding standards, participating in code review, and providing regular updates.
What We're Looking For
- Bachelor’s degree with 3+ years of patient data experience, or MS with 1+ year of experience, with a quantitative focus (public health, health economics, biostatistics) or in a quantitative field (computer science, statistics, computational biology, biomedical engineering).
- 1+ year of experience querying and curating structured and unstructured electronic health record data.
- Demonstrated ability to execute robust analytical strategies using health care databases including electronic health records, administrative claims databases, and/or patient registries.
- Must have experience conducting data manipulation, analysis, and visualization in Python and/or R programming languages.
- Exceptional time management, with the ability to prioritize multiple tasks and deliver products on time.
Nice to Have
- Knowledge of medical coding ontologies and data models for real-world data types in the US and globally (ICD, ATC, LOINC, SNOMED, CPT, HCPCS, etc.).
- Familiarity and/or experience with Spark (pyspark) including performance optimization.
- Familiarity and/or experience with cloud computing (AWS), Linux environments, and shell scripting.
- Familiarity with cloud analytics platforms (e.g., Snowflake) for developing ETL pipelines or dashboard visualizations (e.g., Streamlit).
- Experience implementing machine learning models using real-world clinical data and/or translating output into meaningful insights for diverse audiences.
- Familiarity and/or experience with common clinical practices, human physiology, or epidemiology in the cardiovascular, metabolic, and renal disease areas.
- Familiarity with or exposure to traditional drug discovery and development processes and approaches.
Technical Stack
- Python, R, Spark (pyspark), AWS, Linux, Shell scripting, Snowflake, Streamlit
Team & Environment
You will be a core member of a team of data scientists and data engineers.
Benefits & Compensation
- Compensation: $127,000 - $164,000 USD
Work Mode
This is a remote position open to candidates globally.
Valohealth is committed to hiring diverse talent, prioritizing growth and development, fostering an inclusive environment, embracing new ways of learning, solving complex problems, and welcoming diverse perspectives.



