Postdoctoral Associate | Vertebrate Genome Laboratory | New York, New York, USA | Research/Development

Location: New York

Organization Overview

Our Vertebrate Genome Laboratory specializes in high-molecular-weight DNA and long-read genomic technologies. The VGL offers both library preparation and sequencing services, including library preparation for high-molecular-weight gDNA, long amplicons, and full-length transcriptome sequencing (Iso-Seq method), using PacBio, Bionano, and 10X Chromium technologies.

Overview

The Vertebrate Genome Laboratory (VGL) at The Rockefeller University leads international efforts in genome sequencing, annotation, and evolutionary analysis of vertebrate and other eukaryotic genomes. We are a core lab of the Vertebrate Genomes Project (VGP), a main hub of the Earth BioGenome Project (EBP), and a hub for innovation in high-quality and telomere-to-telomere (T2T) assembly.

We seek a Postdoctoral Associate with significant software engineering and bioinformatics skills to support a new initiative to improve and streamline genome assembly and curation. This position offers the unique opportunity to work at the intersection of genomics and AI with world-class researchers and large-scale genomic datasets. The candidate will report directly to the Co-Director of the VGL, Dr. Giulio Formenti, and the VGL Director, Dr. Erich Jarvis.

The Postdoctoral Associate will collaborate closely with the VGL bioinformatics team, the VGL wet lab team (led by VGL Co-Director Jennifer Balacco), and the University's Data Science Platform ("DSP") team, contributing to software development that improves genome assembly and manual curation, toward the goal of achieving T2T genomes for all species. The successful candidate will also have the unique opportunity to join a team effort that aims to leverage AI, including transformers, CNNs, and foundation models, to resolve the remaining bottlenecks in scalable genome assembly.

Responsibilities

Implement, optimize, and evaluate efficient algorithms and software tools for reference genome assembly and curation, and for genomic data analysis more broadly
Contribute to the development of new tools, including AI-based approaches, for identifying and fixing errors in genome assemblies
Develop and implement efficient and scalable software pipelines for genomic sequence modeling, assembly, curation, and multimodal data integration
Develop and contribute to software tools and maintain clean, well-documented code for long-term sustainability
Manage and preprocess large-scale genomic sequencing datasets, including both long and short reads
Conduct feature engineering and model training using high-quality genome assemblies and annotations
Collaborate with bioinformaticians on experimental design and validation
Contribute to scientific publications and collaborative research in computer science and genomics, including leading publications, particularly on novel methods for genomics
Additional duties and special projects as assigned

Qualifications

REQUIRED QUALIFICATIONS:

Ph.D. in computational biology, bioinformatics, computer science, electrical engineering or a related field
Proficiency in at least one high-performance programming language (e.g., C, C++, or Rust)
Ability to read and implement complex methods from publications into production-level scalable code
Experience with version control systems such as Git
Proficiency with HPC and cloud computing environments, including distributed training (e.g., torchrun, Slurm, DeepSpeed)
Excellent communication and teamwork skills

PREFERRED QUALIFICATIONS:

Familiarity with file formats for genome sequences (FASTA, FASTQ, BAM) and annotations (GFF, BED), genome browsers, and omics data
Prior experience in genome assembly and curation projects
Knowledge of common bioinformatics pipelines and genome assembly and curation tools, such as PretextView
Experience with software engineering/DevOps concepts, including containerization and orchestration (Docker, Kubernetes, Argo Workflows) and CI/CD pipelines
Knowledge of MLOps concepts, including dataset operations, training pipelines, evaluation frameworks, deployment, and monitoring
Understanding of…
