Position: Staff Data Scientist, Machine Learning in Epidemiology and Patient Data Products

Lexington, Massachusetts, United States; Remote

About Us

Valo Health is a human‑centric, AI‑enabled biotechnology company working to make new drugs for patients faster. The company’s Opal Computational Platform transforms drug discovery and development through a unique combination of real‑world data, AI, human translational models and predictive chemistry.

Our talented team of biologists, chemists and engineers, armed with advanced AI/ML tools, works together to break down traditional R&D silos and accelerate the speed and scale of drug discovery and development.

Valo is committed to hiring diverse talent, prioritizing growth and development, fostering an inclusive environment, and creating opportunities for people with different experiences, backgrounds, and voices to work together. We embrace new ways of learning, solve complex problems and welcome diverse perspectives that can help us advance patient‑centric innovation.

Valo is headquartered in Lexington, MA, with additional offices in New York, NY and Tel Aviv, Israel.

Role Overview

As a Staff Data Scientist, you will be a core member of a team building a powerful computational platform for advancing the discovery and development of new medicines. You will develop machine learning tools for patient data and drive their adoption across teams, under the guidance of epidemiology and biology program leads. You will work with a diverse group of scientists and domain experts, cutting across traditional industry boundaries in an innovative startup environment.

What You’ll Do

Lead the development of machine learning methods and analyses of patient data with diverse stakeholders, integrating clinical insights into supervised and unsupervised learning approaches and generating patient profiles.
Perform project‑specific hands‑on analysis and modeling of high‑dimensional longitudinal real‑world data, spanning electronic health records (EHRs), clinical notes, sequencing data, and multi‑omics, using modern data science tools in cloud environments.
Contribute to the design, implementation, and evaluation of innovative machine learning approaches for patient data to provide novel clinical insights.
Embrace scientific uncertainty, curiosity, and creative solutions. Tackle challenges that may not have known solutions or established pathways.
Use technical knowledge and intuition to articulate and break down large problems into solvable pieces, prioritizing critical‑path tasks.
Champion shared coding standards, participate in code reviews, and provide regular updates and input into the work of colleagues.

What You Bring

MS, MPH, or PhD in health data science, biostatistics, or a related quantitative field, with 5 years of experience developing and applying ML methods, including at least 3 years working directly with real‑world patient data. Experience in a biopharmaceutical, epidemiological or biostatistical setting is a plus.
Extensive experience developing and implementing machine learning solutions in healthcare databases, including EHRs, administrative claims, and patient registries. Familiarity with medical coding ontologies and data models (ICD, ATC, LOINC, SNOMED, CPT, HCPCS, OMOP, etc.). Confidence working with highly sparse and high‑dimensional data. Experience processing and mining clinical notes is a plus.
Experience building, maintaining, and operationalizing ML pipelines, and translating model outputs into meaningful insights for diverse audiences.
Broad proficiency across core ML paradigms (supervised, unsupervised, semi‑supervised) and experience with linear and logistic regression, classification and tree‑based methods, clustering and dimensionality‑reduction techniques, and deep learning architectures. Hands‑on experience with representation learning and transformer‑based and other sequence models is a plus.
Strong grounding in key components of the ML development lifecycle, including evaluation metrics, hyperparameter tuning, model selection, feature engineering and selection, model explainability, and MLOps best practices.
Mastery of Python and modern data…