Scientific Data Platform Architect — Antibody Discovery
Listed on 2025-12-01
IT/Tech
Data Engineer, Data Scientist
Remote
Data Science
About Prellis Biologics
At Prellis, we integrate human biology with machine learning. We aim to revolutionize drug discovery by harnessing the power of the human immune system with tightly integrated machine learning to develop next-generation antibody therapeutics with unparalleled speed, precision, and safety. We are committed to empowering our pharmaceutical partners with access to the most promising fully human antibody candidates, rapidly identified from the human immune repertoire, enabling them to bring life-changing treatments to patients faster than ever before.
Prellis Biologics is a pre-IPO biotech located in Berkeley, CA, with a team-oriented, inclusive, and family-friendly culture. Our growing pipeline targets high unmet patient needs across therapeutic areas including metabolic, inflammatory, and oncology diseases. Prellis has raised funding from top investors, including Celesta, Khosla Ventures, SOSV, and Avidity Partners.
You’ll architect and build, hands‑on, the end‑to‑end scientific data platform that powers antibody discovery and characterization. This includes a well‑structured PostgreSQL backbone on AWS, reliable ETL from lab systems (Benchling, Pipe Bio, instruments), and a scientist‑friendly app (Shiny or Python) with built‑in analytics and visualizations. You’ll design for FAIR data (Findable, Accessible, Interoperable, Reusable) and publish AI/ML‑ready datasets with clear lineage and versioning.
- Own the canonical schemas (with selective JSONB), indexing/partitioning, materialized views, and stable entity IDs (samples, sequences, assays, runs).
- Operate RDS/Aurora PostgreSQL, S3 for raw artifacts, and right‑sized IAM/VPC access; set guardrails for backups, recovery, and monitoring (CloudWatch).
- Make data Findable (catalog/registry tables, searchable metadata), Accessible (role‑based access, documented APIs/exports), Interoperable (controlled vocabularies, standard formats such as CSV/Parquet, FASTA/VDJ, FCS/SPR), and Reusable (required metadata, units/QC flags, versioned tables).
- Define and enforce data contracts, provenance, and lightweight review checkpoints.
- Build parsers/pipelines for instrument exports (CSV/TSV, FCS, ELISA/SPR/BLI), Pipe Bio repertoire/QC outputs, and Benchling entities via API/webhooks.
- Add validation, unit normalization, schema migrations, and automated checks.
- Create curated analytic views (assay roll‑ups, QC dashboards, lineage), and implement interactive visuals (dose–response fits, sensorgrams, flow summaries, repertoire plots) with Plotly/Dash, Shiny, Spotfire, Streamlit, or similar.
- Deliver drill‑downs, comparisons across runs/targets, and clean CSV/Excel exports.
- Build and maintain a small Shiny (R/Python) or Python app (FastAPI + Dash/Plotly/Streamlit) that is role‑aware, searchable, and easy for scientists to use; deploy simply (EC2/ECS/Docker).
- Publish feature‑ready Parquet/Arrow datasets (sequence features, developability metrics, assay labels like KD/EC50, clonotypes) with dataset versioning, timestamps, and lineage.
- Provide reproducible extracts/snapshots for training, and ingest model predictions/scores back into Postgres and the UI.
- Set patterns and code standards, mentor contributors, review designs, and coordinate with Biology, Analytics, and QA/Compliance.
- Keep cost/performance sane; evolve the roadmap as assays and throughput grow.
- A clear Postgres schema with stable IDs, required metadata, and provenance supporting FAIR discovery.
- Automated ETL for Benchling + Pipe Bio + instruments, with validation and unit normalization.
- A usable app delivering interactive analytics & visualizations scientists rely on daily.
- ML‑ready datasets with documented contracts; backups, monitoring, and a published data dictionary/metadata guide.
- Bachelor's degree in Computer Science or a similar field.
- 7+ years building data platforms or complex data products; expert SQL/PostgreSQL (schema design, optimization,…