Postdoctoral Associate | Vertebrate Genome Laboratory
Listed on 2026-01-12
Research/Development
Data Scientist, Research Scientist
Organization Overview
Our Vertebrate Genome Laboratory specializes in high-molecular-weight DNA and long-read genomic technologies. The VGL offers both library preparation and sequencing services, including library preparation for high-molecular-weight gDNA, long amplicons, and full-length transcriptome sequencing (Iso-Seq method), utilizing PacBio, Bionano, and 10x Chromium technologies.
Overview
The Vertebrate Genome Laboratory (VGL) at The Rockefeller University leads international efforts in genome sequencing, annotation, and evolutionary analysis of vertebrate and other eukaryotic genomes. We are a core lab of the Vertebrate Genomes Project (VGP), a main hub of the Earth BioGenome Project (EBP), and a hub for innovation in high-quality and telomere-to-telomere (T2T) assembly.
We seek a Postdoctoral Associate with significant software engineering and bioinformatics skills to support a new initiative to improve and streamline genome assembly and curation. This position offers the unique opportunity to work at the intersection of genomics and AI with world-class researchers and large-scale genomic datasets. The candidate will report directly to the Co-Director of the VGL, Dr. Giulio Formenti, and the VGL Director, Dr. Erich Jarvis.
The Postdoctoral Associate will collaborate closely with the VGL's bioinformatics team; the VGL wet lab team, led by VGL Co-Director Jennifer Balacco; and the University’s Data Science Platform (“DSP”) team, contributing to software development that improves genome assembly and manual curation, toward the goal of achieving T2T genomes for all species. The successful candidate will also have the unique opportunity to join a team effort that aims to leverage AI, including transformers, CNNs, and foundation models, to resolve the remaining bottlenecks for scalable genome assembly.
Responsibilities
- Implement, optimize, and evaluate efficient algorithms and software tools applied to reference genome assembly and curation, and to the analysis of genomic data more broadly
- Contribute to the development of new tools, including AI-based tools, for identifying and fixing errors in genomes
- Develop and implement efficient and scalable software pipelines for genomic sequence modeling, assembly, curation, and multimodal data integration
- Develop and contribute to software tools and maintain clean, well-documented code for long-term sustainability
- Manage and preprocess large-scale biological datasets of genomic sequences, both long and short reads
- Conduct feature engineering and model training using high-quality genome assemblies and annotations
- Collaborate with bioinformaticians on experimental design and validation
- Contribute to scientific publications and collaborative research in computer science and genomics, including leading publications, particularly on novel methods for genomics
- Additional duties and special projects as assigned
REQUIRED QUALIFICATIONS:
- Ph.D. in computational biology, bioinformatics, computer science, electrical engineering or a related field
- Proficiency in at least one high-performance programming language (e.g., C, C++, or Rust)
- Ability to read and implement complex methods from publications into production-level scalable code
- Experience with version control and code repositories (e.g., Git)
- Proficiency with HPC and cloud computing environments, including distributed training (e.g., torchrun, Slurm, DeepSpeed)
- Excellent communication and teamwork skills
PREFERRED QUALIFICATIONS:
- Familiarity with file formats for genome sequences (FASTA, FASTQ, BAM) and annotations (GFF, BED), genome browsers, and omics data
- Prior experience in genome assembly and curation projects
- Knowledge of common bioinformatics pipelines and genome assembly tools, such as PretextView
- Experience with software engineering/DevOps concepts, including containerization (Docker, Kubernetes, Argo Workflows) and CI/CD pipelines
- Knowledge of MLOps concepts, including dataset ops, training pipelines, evaluation frameworks, deployment, and monitoring
- Understanding of…