AI Data Architect
Listed on 2026-06-02
IT/Tech
Data Engineer, AI Engineer
Genzeon is an AI and automation company with deep engineering and data expertise, dedicated to serving the healthcare and retail industries. Our platform solutions – including HIP One, Compliance Pro Solutions, and Patient Engagement Solutions – empower organizations to scale innovation and transform outcomes.
Genzeon is a global community of innovators and problem-solvers, with a culture built on inclusion, flexibility, and purpose-driven work. With four global delivery centers, we support providers, payers, Healthtech, and retail organizations worldwide.
Genzeon has an exciting opening for an AI Data Architect (Healthcare AI Platform) to join our dynamic team.
Exton, PA / Hybrid
0–4 years
The short version
We run a multi-model AI pipeline that processes 150K Medicare documents/year: faxed PDFs, EDI transactions, FHIR data, clinical notes. You’ll design and build the data architecture that ingests, stores, governs, and serves all of it to AI models and clinical reviewers. On-prem GPUs, hybrid cloud, HIPAA compliance. This is the real thing.
What you’ll do
- Design the end-to-end data architecture for a healthcare AI platform: ingestion, storage, processing, serving, governance
- Build pipelines for heterogeneous healthcare data: faxed PDFs, X12 EDI (835/837/278), FHIR R4, HL7v2, CMS files, unstructured clinical notes
- Architect the data lake/lakehouse layer (Apache Iceberg, MinIO, DuckDB, PostgreSQL/pgvector)
- Design the embedding and vector storage layer that powers RAG: chunking, indexing, retrieval optimization
- Build data lineage tracking from source document to AI decision
- Implement HIPAA/HITRUST data governance: encryption, access controls, audit logging, PHI handling
- Monitor data quality across the pipeline: schema drift, completeness, freshness, anomalies
- Optimize for hybrid infrastructure: on-prem GPUs (RTX 50U0, L40S), NAS, Azure Gov Cloud, Azure Commercial
What you bring
- A data pipeline you’ve built that ran in production (we’ll ask about it).
- SQL fluency and Python proficiency.
- Experience with at least one of: Spark, dbt, Airflow, Dagster, Prefect.
- Hands-on work with unstructured or semi-structured data: PDFs, images, OCR outputs, free text.
- Practical understanding of vector databases, embeddings, and how RAG systems consume data.
- Comfort with on-premises infrastructure, not just managed cloud services.
- Data quality and governance as instincts, not afterthoughts.
Nice to have
- Healthcare data formats (X12 EDI, FHIR, HL7, CCD/C-CDA).
- Apache Iceberg, Delta Lake, or modern table formats.
- pgvector, Pinecone, Weaviate, or similar vector stores.
- DuckDB or embedded analytical engines.
- HIPAA technical safeguards implementation.