AI Data Architect Job, Exton area, Pennsylvania, USA, IT/Tech

Genzeon is an AI and automation company with deep engineering and data expertise, dedicated to serving the healthcare and retail industries. Our platform solutions – including HIP One, Compliance Pro Solutions, and Patient Engagement Solutions – empower organizations to scale innovation and transform outcomes.

Genzeon is a global community of innovators and problem-solvers, with a culture built on inclusion, flexibility, and purpose-driven work. With four global delivery centers, we support providers, payers, healthtech companies, and retail organizations worldwide.

Genzeon has an exciting opening for an AI Data Architect (Healthcare AI Platform) to join our dynamic team.

Exton, PA / Hybrid

0–4 years

The short version

We run a multi-model AI pipeline that processes 150K Medicare documents/year — faxed PDFs, EDI transactions, FHIR data, clinical notes. You’ll design and build the data architecture that ingests, stores, governs, and serves all of it to AI models and clinical reviewers. On-prem GPUs, hybrid cloud, HIPAA compliance. This is the real thing.

What you’ll do

Design the end-to-end data architecture for a healthcare AI platform — ingestion, storage, processing, serving, governance
Build pipelines for heterogeneous healthcare data: faxed PDFs, X12 EDI (835/837/278), FHIR R4, HL7v2, CMS files, unstructured clinical notes
Architect the data lake/lakehouse layer (Apache Iceberg, MinIO, DuckDB, PostgreSQL/pgvector)
Design the embedding and vector storage layer that powers RAG — chunking, indexing, retrieval optimization
Build data lineage tracking from source document to AI decision
Implement HIPAA/HITRUST data governance — encryption, access controls, audit logging, PHI handling
Monitor data quality across the pipeline — schema drift, completeness, freshness, anomalies
Optimize for hybrid infrastructure: on-prem GPUs (RTX 50U0, L40S), NAS, Azure Gov Cloud, Azure Commercial

What you need

A data pipeline you’ve built that ran in production (we’ll ask about it).
SQL fluency and Python proficiency.
Experience with at least one of: Spark, dbt, Airflow, Dagster, Prefect.
Hands‑on work with unstructured or semi‑structured data — PDFs, images, OCR outputs, free text.
Practical understanding of vector databases, embeddings, and how RAG systems consume data.
Comfort with on-premises infrastructure, not just managed cloud services.
Data quality and governance as instincts, not afterthoughts.

Strong signals

Healthcare data formats (X12 EDI, FHIR, HL7, CCD/C-CDA).
Apache Iceberg, Delta Lake, or modern table formats.
pgvector, Pinecone, Weaviate, or similar vector stores.
DuckDB or embedded analytical engines.
HIPAA technical safeguards implementation.
