Postdoctoral Research Associate, Data Readiness
Listed on 2026-02-10
The Workflows and Ecosystem Services (WES) group under the Advanced Technology Section (ATS) of the National Center for Computational Sciences (NCCS) is seeking a postdoctoral research associate to advance the state of scientific AI by addressing cross-cutting challenges in data readiness for AI to enable scalable, reproducible AI workflows on leadership-class systems. This position focuses on researching, designing, and deploying innovative data pipelines and readiness frameworks to tackle obstacles such as data heterogeneity, scalability bottlenecks, privacy compliance, reproducibility, and interoperability across scientific domains.
By improving data readiness processes, this role will amplify the potential of AI-driven discovery in areas such as high energy physics, fusion research, life sciences, and materials science. Furthermore, these efforts to enhance data readiness for AI workflows may play a significant role in contributing to the goals of the 2025 Genesis Mission, which seeks to accelerate scientific discovery through the integration of AI-enabled solutions.
NCCS operates the Frontier exascale supercomputer and world-class HPC infrastructure, giving you access to resources that enable impactful, facility-scale innovation. If you're passionate about creating solutions that empower AI at scale, we encourage you to apply and help shape the future of scientific AI.
Focus Areas:
- Cross-Domain Interoperability: Develop common readiness templates, standardized metadata models, and APIs to enable seamless integration across diverse scientific domains.
- Scalability of Preprocessing Pipelines: Design and implement automated, parallel preprocessing workflows capable of handling multi-petabyte datasets efficiently while reducing throughput bottlenecks.
- Data Scarcity and Quality Dynamics: Investigate methods for addressing sparse labels, non-standard metadata, and imbalanced datasets to improve AI training robustness across scientific domains.
- Privacy and Compliance Integration: Develop privacy-preserving preprocessing pipelines that operate under stringent regulations (such as HIPAA, CUI, ITAR) while maintaining scalability and secure sharing mechanisms such as federated learning.
- Provenance and Reproducibility Frameworks: Build systems that enable detailed provenance tracking, schema validation, and auditable workflows to ensure trustworthy and reproducible AI practices.
- Heterogeneous Data Integration: Address challenges in reconciling experimental, simulation, and observational datasets with varying resolutions, data fidelity, and sampling rates.
- Intelligent Sampling for Federated Learning: Investigate frameworks (such as SICKLE) for intelligently sampling cross-facility extreme-scale data to enhance federated learning workflows with platforms like APPFL and OmniFed.
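To make the first and fifth focus areas concrete, here is a minimal sketch of what a standardized readiness record with schema validation and a provenance checksum could look like. All names (`ReadinessRecord`, the field set, the example values) are illustrative assumptions, not an existing NCCS or WES API:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# Hypothetical readiness template: a small, standardized metadata
# record that could travel with a dataset across scientific domains.
REQUIRED_FIELDS = {"dataset_id", "domain", "schema_version", "units"}

@dataclass
class ReadinessRecord:
    dataset_id: str
    domain: str           # e.g. "fusion", "materials"
    schema_version: str
    units: dict           # variable name -> physical unit

    def validate(self) -> None:
        # Schema validation: every required field must be present and non-empty.
        data = asdict(self)
        missing = [f for f in sorted(REQUIRED_FIELDS) if not data.get(f)]
        if missing:
            raise ValueError(f"missing readiness fields: {missing}")

    def provenance_digest(self) -> str:
        # Deterministic checksum over the metadata itself, usable as an
        # auditable provenance stamp in a workflow log.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

record = ReadinessRecord(
    dataset_id="shot-42817",
    domain="fusion",
    schema_version="1.0",
    units={"te": "keV", "ne": "m^-3"},
)
record.validate()
print(record.provenance_digest()[:12])
```

A real readiness framework would, of course, carry far richer schemas and integrate with workflow provenance systems; the point of the sketch is that template validation and provenance stamping compose from simple, auditable pieces.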
Major Duties and Responsibilities:
- Conduct and publish original research focused on data readiness methodologies and frameworks for scalable AI applications across fluid dynamics, fusion, materials, life sciences, and other strategic domains.
- Investigate novel approaches for balancing efficient I/O, interoperability, and scientific validity in AI-ready datasets.
- Design, prototype, and optimize preprocessing pipelines using HPC resources, targeting scalable execution and automation.
- Collaborate with domain scientists to integrate pipelines into end-to-end AI workflows specific to scientific domains.
- Publish research outcomes in peer-reviewed journals and conference venues, setting benchmarks and proposing methodologies for cross-disciplinary readiness challenges.
- Aid in the development and adoption of open standards for scientific dataset processing, including contributing to open-source tools.
- Mentor interns, students, and peers in cross-domain data readiness approaches.
- Present findings at technical workshops, scientific meetings, and in outreach efforts to improve awareness around the importance of data readiness for scientific AI.
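In miniature, the preprocessing-pipeline duty above amounts to mapping an automated cleaning step over many independent data shards in parallel. The sketch below uses a thread pool and a toy normalization step purely for illustration; a facility-scale pipeline would use MPI ranks, a workflow engine, or process pools instead:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess_shard(shard):
    """Normalize one data shard to zero mean (an illustrative stand-in
    for a real cleaning/validation step in an HPC pipeline)."""
    mean = sum(shard) / len(shard)
    return [x - mean for x in shard]

def run_pipeline(shards, workers=4):
    # Shards are independent, so they map cleanly onto a worker pool;
    # this independence is what makes the pipeline scale.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess_shard, shards))

if __name__ == "__main__":
    shards = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    for out in run_pipeline(shards):
        print(out)  # each shard now has zero mean
```

The design choice worth noting is that parallelism falls out of shard independence: once each shard's preprocessing needs no cross-shard state, the same `map` pattern scales from a laptop pool to thousands of nodes.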
Basic Qualifications:
- Ph.D. earned in Computer Science, Data Science, Computational Science, a scientific domain relevant to AI (e.g., physics, biology, chemistry, climate), or a closely related field (within the last 5 years or near completion).
- Demonstrated expertise in data preprocessing pipelines, AI-ready dataset design, or scientific workflows in HPC environments.
- Proven experience with modern data frameworks (e.g., PyTorch, TensorFlow), scalable I/O solutions (e.g., HDF5, ADIOS2), and distributed computing tools relevant to data preparation.
- Evidence of ability to conduct independent research and publish in peer-reviewed venues.
Preferred Qualifications:
- Hands-on experience prototyping and scaling data pipelines in HPC environments (Frontier-scale or similar).
- Strong familiarity with domain-specific formats such as NetCDF, CSV/Parquet, FASTA/MMCIF, or graph-based encodings in materials and molecular AI.
- Familiarity with frameworks for…