Senior/Staff Scientist, Data Science Berkeley, CA
Listed on 2025-11-28
IT/Tech
Data Scientist, Machine Learning/ML Engineer
At Glyphic Biotechnologies, we plan to create the protein revolution for which scientists and researchers have been waiting. We are developing a massively parallel, single‑molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised >$50M from venture partners and non‑dilutive grant funding to achieve our vision of next‑generation proteome sequencing.
What we are looking for in you
Glyphic is seeking a highly motivated and experienced Senior/Staff Data Scientist to help advance our cutting‑edge single‑molecule proteome sequencing platform, which has the potential to transform how we understand biology and develop new medicines.
We're looking for a Senior Data Scientist who's excited about solving complex, real‑world problems with cutting‑edge technology. You'll work directly with our CTO and a collaborative team of scientists, engineers, and bioinformaticians who are passionate about pushing the boundaries of what's possible.
This is a hybrid role, with the expectation that you will spend up to ~20% of your time (on average) on‑site with the team in Berkeley, CA, in service of a more complete understanding of Glyphic’s technology and calibration with the on‑site research team. This role will require some flexibility as additional collaboration projects require.
What you’ll do
Data Analysis and Insight Generation:
- Design and implement novel algorithms to analyze proteomics data that no one has ever seen before.
- Develop machine learning models that can extract meaningful insights from complex, noisy biological signals.
- Develop and optimize algorithms for analyzing high‑dimensional chemistry and NGS data, including single‑cell data, spatial data, and LCMS outputs.
- Build models that reveal how parameters and molecular interfaces drive outcomes, including surface interactions and molecule‑target binding.
- Design and execute biostatistical analyses using Python and/or R to uncover significant trends, model experimental outcomes, and inform data‑driven decision‑making.
- Apply machine learning to guide experiment design, identify key parameters, and optimize workflows for efficiency and reproducibility.
- Develop clear, insightful visualizations that make complex, high‑dimensional results understandable and actionable for scientists and stakeholders.
- Help define metrics and visualizations that clarify high‑dimensional relationships for scientists and stakeholders.
- Partner with wet lab, hardware, and software teams to translate experimental goals into computational strategies.
Pipelines and Automation:
- Create ETL pipelines that clean, normalize, and integrate diverse datasets (sequencing reads, LCMS spectra, metadata) into analysis‑ready formats.
- Combine off‑the‑shelf pipelines (basecalling, variant calling, deconvolution) with custom scripts to deliver end‑to‑end solutions.
- Continuously improve throughput and data quality by automating QC steps and integrating feedback from experiments.
- Establish best practices for code quality, testing, and deployment that will scale with our growing team.
What you need
Required:
- PhD in Computer Science, Bioinformatics, Computational Biology, Biostatistics or related field with 4+ (Senior) or 6+ (Staff) years of hands‑on experience.
- Proven ability to model and interpret high‑dimensional datasets with numerous interacting variables, uncovering statistically robust patterns and causal relationships.
- Competency in chemistry data science (e.g., interpreting LCMS data, utilizing deconvolution tools, understanding surface chemistry and molecule‑target interactions).
- Competency in next‑generation sequencing, including familiarity with multi‑omics, error modeling, and basecalling.
- Expertise in Python and/or R for biostatistical analysis, including data wrangling, statistical modeling, and visualization of high‑dimensional experimental results.
- Experience designing ML models for experimental data and deploying pipelines (Snakemake, Nextflow).
- Familiarity with ML frameworks (PyTorch, TensorFlow) and data science libraries (pandas, NumPy, SciPy).
- Experience building automated data pipelines…