AI Data Engineer Job Harrisburg area, Pennsylvania USA, IT/Tech

IntegriChain is the data and application backbone for market access departments of Life Sciences manufacturers. We deliver the data, the applications, and the business process infrastructure for patient access and therapy commercialization. More than 250 manufacturers rely on our ICyte Platform to orchestrate their commercial and government payer contracting, patient services, and distribution channels. ICyte is the first and only platform that unites the financial, operational, and commercial data sets required to support therapy access in the era of specialty and precision medicine.

With ICyte, Life Sciences innovators can digitalize labor-intensive market access processes, freeing up their best talent to focus on data-driven decision support: identifying and resolving coverage and availability hurdles and managing pricing and forecasting complexity.

We are headquartered in Philadelphia, PA (USA), with offices in Ambler, PA (USA); Pune, India; and Medellín, Colombia. For more information, follow us on Twitter @IntegriChain and LinkedIn.

This role offers flexibility, but candidates must reside in Pennsylvania, New Jersey, or New York and be within a reasonable travel distance of our Philadelphia office, as regular in-person collaboration is required.

Mission

Join the Data Science team as an AI Data Engineer responsible for building the data foundations that make enterprise AI products accurate, explainable, and scalable. This role will design and implement Snowflake and dbt pipelines from raw source data to curated gold-layer datasets, create semantic models that LLM tools can use reliably, and partner with data science, product, and engineering teams to convert data dictionaries and business definitions into AI-ready data products.

The ideal candidate is a strong data engineer with deep Snowflake/dbt experience and a practical understanding of how semantic layers, ER relationships, denormalized models, and metadata quality influence LLM and agent performance.

Position Overview

Snowflake and dbt engineering: Design, build, optimize, and operate Snowflake pipelines and dbt models across raw, curated, and gold-layer datasets.
AI-ready semantic modeling: Create semantic models, relationships, metrics, dimensions, and curated views that allow LLM tools and agents to answer questions accurately.
Data dictionary-driven delivery: Translate team-defined data dictionaries, business definitions, and source mappings into tested, governed, and reusable data products.
Agent consumption focus: Design datasets for AI agents, natural-language analytics, Snowflake Cortex Analyst, and other LLM-powered tools.
Enterprise data modeling: Balance normalized source models, ER relationships, dimensional models, denormalized consumption layers, and semantic-layer needs.

Key Responsibilities

Snowflake, dbt, and Data Pipeline Development

Build reliable data pipelines from raw source data through curated silver layers and business-ready gold layers using Snowflake and dbt.
Develop modular dbt models, tests, documentation, exposures, and lineage-friendly transformation patterns.
Implement incremental processing, snapshots, audit columns, reconciliation, data quality checks, and restartable pipeline patterns.
Optimize Snowflake SQL and dbt workloads for performance, scalability, cost, and maintainability.
Work with orchestration and DevOps/SRE teams to support CI/CD, environment promotion, pipeline monitoring, and operational runbooks.

Semantic Models and AI-Ready Data Products

Create Snowflake semantic models and curated views that support accurate natural-language querying through Snowflake Cortex Analyst and related LLM tools.
Translate approved data dictionaries into semantic model dimensions, facts, metrics, synonyms, descriptions, relationships, and business rules.
Design ER relationships and join paths that are explicit, accurate, and easy for semantic-layer tools and AI agents to use.
Create denormalized or consumption-optimized models where appropriate to reduce ambiguity and improve LLM answer quality.
Partner with…