Lead Data Engineer Job, Hagerstown area, Maryland, USA, IT/Tech

We’re looking for an Identity Data Engineer who is passionate about data quality, intellectually curious about how real-world identities get resolved, and ready to get deep into the details.

You’ll work directly with PII-class data at a low level: examining records, interrogating match logic, and developing a genuine understanding of why our matching engines make the decisions they do.

Our matching engines link consumer and household identity signals across diverse data sources, combining deterministic logic with increasingly AI-assisted probabilistic resolution.

You’ll help enhance these engines by improving match rates, reducing false positives, and extending asset coverage. As our AI-augmented matching capabilities grow, so will this role. There is a real long-term track here for an engineer who wants to go deep on identity.

What You’ll Do

Identity Data Engineering

Design, build, and maintain Snowflake-based pipelines that produce and refresh our core consumer and household identity assets on a regular cadence.

Write complex SQL and Python to transform, deduplicate, and enrich identity data at scale — including direct work with PII fields such as names, addresses, emails, and phone numbers.

Investigate data anomalies and quality issues at a record level, tracing match decisions back to source signals and surfacing root causes.

Build and maintain data models that represent consumer and household identity linkage across multiple input sources.

Matching Engine Enhancement

Partner with senior engineers and data scientists to enhance our AI-assisted matching engine — contributing to feature design, scoring logic, model evaluation, and threshold tuning.

Implement and test matching algorithm improvements — both AI-driven and rule-based — and measure their real impact on precision, recall, and overall asset quality.

Build evaluation tooling: ground-truth comparisons, match quality dashboards, and regression detection across engine versions.

Help drive the evolution of our matching pipeline toward more intelligent, AI-augmented identity resolution, actively using AI tools as part of your day-to-day engineering workflow.

Work cross-functionally with Data Science, Product, and downstream engineering teams to translate identity requirements into reliable, scalable solutions.

Participate in code reviews and architectural discussions; apply engineering best practices across the full delivery lifecycle — design, implement, test, and deploy via CI/CD.

Document data models, pipeline logic, and algorithm decisions clearly for both technical and non-technical audiences.

Support QA processes and on-call responsibilities for production identity asset pipelines.

Build automated validation frameworks and quality tracking pipelines that continuously monitor asset health (including data completeness, match consistency, and anomaly detection) and surface results through clear, actionable reporting.

What You Bring

Required

4+ years of data engineering or software engineering experience, with a focus on data-intensive systems.

Strong Python skills — you write clean, well-structured code and are comfortable building data processing logic from scratch.

Deep Snowflake fluency: data modeling, complex querying, Streams and Tasks, performance tuning, and preferably Snowpark for Python-native workloads.

Strong SQL fundamentals and comfort working with large, messy, real-world datasets — you know how to interrogate data and know when not to trust it.

Some experience or genuine curiosity around identity matching, deduplication, record linkage, or data quality at scale.

Comfort working with PII-class data responsibly, with awareness of data governance and privacy best practices.

Familiarity with version control (Git), Agile delivery, and CI/CD pipelines.

Comfort applying AI tools in day-to-day engineering work — including prompt engineering, LLM-assisted data processing, and AI-augmented pipeline logic.

Nice to Have

Hands-on exposure to matching algorithms — deterministic, probabilistic, or ML/AI-based — and experience evaluating or tuning their performance.

Experience building agentic workflows and working with MCP servers.

Some Java experience; comfort with JVM-based tooling…