Postdoctoral Fellow in Biostatistics & Health Data Science
Listed on 2026-06-22
IT/Tech
Data Scientist, AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Data Analyst
Indiana University is an equal opportunity employer and provider of ADA services and prohibits discrimination in hiring.
Research Context & Opportunity
Modern healthcare increasingly depends on integrating data across hospitals, registries, cohorts, and public health systems. Yet semantic heterogeneity—differences in terminology, structure, and logic—remains a central barrier to reusability, interoperability, and reproducibility.
This postdoctoral position addresses a fundamental and timely research question:
How can Large Language Models (LLMs) and intelligent agents support transparent, scalable, and auditable clinical data harmonization?
We are particularly interested in:
- LLM‑driven systems for aligning real‑world health data to standards such as OMOP CDM, FHIR, and UMLS
- Agent‑based workflows that explain, refine, and adapt semantic mappings over time
- Hybrid architectures that combine knowledge‑grounded reasoning with flexible machine learning
- Tools that reduce manual burden while preserving traceability and clinical interpretability
This position offers the opportunity to publish novel methods, work with real, messy, multi‑source data, and contribute to infrastructure that supports population‑level research and health equity. The postdoctoral fellow will be based in the Department of Biostatistics and Health Data Science at Indiana University School of Medicine and will work in close collaboration with the Regenstrief Institute.
Responsibilities:
- Design and implement LLM‑based methods for clinical data harmonization, semantic normalization, and ontology alignment
- Develop multi‑agent or retrieval‑augmented generation workflows for schema matching and terminology mapping
- Collaborate with national and multi‑institutional initiatives in data integration and standardization
- Support open‑source tooling, reproducible pipelines, and standards‑based approaches (OMOP, FHIR, UMLS)
- Lead or support manuscript preparation and dissemination at top informatics and AI venues
- Contribute to grant development and proposal writing
Required Qualifications:
- Ph.D. (by start date) in Computer Science, Biomedical Informatics, Health Data Science, Biostatistics, or a closely related area.
- Strong machine‑learning/deep‑learning foundation plus expertise in at least one of: multimodal learning, time‑series modeling, or NLP.
- Demonstrated working experience with healthcare data (e.g., EHR, clinical text, imaging, omics).
- Proficiency in Python and ML tooling (PyTorch, scikit‑learn), version control (Git), and experiment tracking (e.g., Weights & Biases).
- Excellent written and oral communication skills, and ability to collaborate with multidisciplinary teams.
Preferred Qualifications:
- Experience with concept normalization, ontology mapping, or schema alignment
- Familiarity with LLM agents, tool‑augmented reasoning, or hybrid rules+LLM systems
- Record of publications in relevant domains (informatics, machine learning, AI, knowledge representation)
- Experience with multi‑site data harmonization or federated data environments
What We Offer:
- A collaborative environment at the intersection of real‑world data, applied AI, and translational science
- Opportunities to work across academic, clinical, and public health settings
- Mentorship and support toward independent research or career development in academia or industry
- Competitive salary and benefits through Indiana University
- A culture that values both scientific innovation and practical impact