Principal Data Engineer – ML Platforms
Job in Arlington, Arlington County, Virginia, 22201, USA
Listed on 2026-01-07
Listing for: Palladian Partners, Inc.
Full Time position
Job specializations:
- IT/Tech: Data Engineer, AI Engineer, Data Science Manager, Cloud Computing
Job Description & How to Apply Below
Overview
Altarum | Data & AI Center of Excellence (CoE)
Altarum is building the future of data and AI infrastructure for public health. We are hiring a Principal Data Engineer – ML Platforms to design, build, and operationalize modern data and ML platform capabilities that power analytics, evaluation, AI modeling, and interoperability across all Altarum divisions.
Key focus areas:
- ML Platform Engineering: lakehouse architecture, pipelines, MLOps lifecycle
- Applied ML Enablement: risk scoring, forecasting, Medicaid analytics
- NLP/Generative AI Support: RAG, vectorization, health communications
- Causal ML Operationalization: evaluation modeling workflows
- Responsible/Trusted AI Engineering: model cards, fairness, compliance
- Platform Architecture & Delivery: design and operate modern, cloud-agnostic lakehouse using object storage, SQL/ELT engines, and dbt.
- Build CI/CD pipelines for data, dbt, and model delivery (GitHub Actions, GitLab, Azure DevOps).
- Implement MLOps systems: MLflow or equivalent, feature stores, model registry, drift detection, automated testing.
- Engineer solutions in AWS GovCloud today, with portability to Azure Gov or GCP.
- Use IaC (Terraform, CloudFormation, Bicep) to automate secure deployments.
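The drift-detection capability called for above can be illustrated with a minimal Population Stability Index (PSI) check, a common distribution-drift metric. This is an illustrative sketch only; the metric choice, bin count, and thresholds are assumptions, not the employer's actual implementation:

```python
import math

def psi(expected, actual, bins=5):
    """Population Stability Index between two numeric samples.

    PSI < 0.1 is commonly read as "no significant drift",
    0.1-0.25 as moderate drift, > 0.25 as major drift.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch live values above the training range

    def frac(sample, i):
        count = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(count / len(sample), 1e-6)  # avoid log(0)

    return sum(
        (frac(actual, i) - frac(expected, i))
        * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(psi(baseline, baseline))                      # identical samples → 0.0
print(psi(baseline, [x + 0.5 for x in baseline]))   # shifted sample → large PSI
```

In production this comparison would run on a schedule against a frozen training-time baseline, with alerts wired to the SLO/alerting stack described below.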
- Build scalable ingestion and normalization pipelines for healthcare and public health datasets, including FHIR R4 / US Core, HL7 v2, Medicaid/Medicare claims & encounters, and SDOH & geospatial data (all preferred), plus survey and qualitative data.
- Create reusable connectors, dbt packages, and data contracts for cross-division use.
- Publish clean, conformed, metrics-ready tables for Analytics Engineering and BI teams.
- Support Population Health in turning evaluation and statistical models into pipelines.
- Define SLOs and alerting; instrument lineage & metadata; ensure ≥95% of data tests pass.
- Perform performance and cost tuning (partitioning, storage tiers, autoscaling) with guardrails and dashboards.
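The FHIR normalization work above can be sketched as flattening a FHIR R4 Patient resource into a conformed analytics row. The element names (`name`, `birthDate`, `gender`, `address.postalCode`) are standard FHIR R4 fields; the output schema is a hypothetical example, not the actual data contract:

```python
def flatten_patient(resource: dict) -> dict:
    """Flatten a FHIR R4 Patient resource into a conformed analytics row.

    Illustrative only: real US Core profiles also carry extensions
    (race, ethnicity, etc.) that need their own handling.
    """
    name = (resource.get("name") or [{}])[0]
    address = (resource.get("address") or [{}])[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": name.get("family"),
        "given_name": " ".join(name.get("given", [])),
        "birth_date": resource.get("birthDate"),
        "gender": resource.get("gender"),
        "postal_code": address.get("postalCode"),
    }

row = flatten_patient({
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
    "birthDate": "1980-04-01",
    "gender": "female",
    "address": [{"postalCode": "22201"}],
})
print(row["patient_id"], row["given_name"])  # example Jane Q
```

Rows like this would land in the conformed layer and be exposed to Analytics Engineering via dbt models and data contracts.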
- Build production-grade pipelines for risk prediction, forecasting, cost/utilization models, and burden estimation.
- Develop ML-ready feature engineering workflows and support time-series/outbreak detection models.
- Integrate ML assets into standardized deployment workflows.
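The time-series/outbreak detection support mentioned above might start from a simple aberration-detection baseline such as a trailing-window z-score on daily counts. This is a toy sketch under assumed parameters, not a production surveillance algorithm:

```python
import statistics

def outbreak_flags(counts, window=7, threshold=3.0):
    """Flag days whose case count exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    flags = []
    for i, x in enumerate(counts):
        history = counts[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history yet
            continue
        mu = statistics.mean(history)
        sd = statistics.pstdev(history) or 1.0  # guard a flat history
        flags.append((x - mu) / sd > threshold)
    return flags

daily = [10, 12, 11, 9, 10, 11, 12, 10, 11, 48]  # spike on the last day
print(outbreak_flags(daily))  # only the final day is flagged
```

Real pipelines would layer seasonality adjustment and established methods (e.g., EARS/Farrington-style detectors) on top of a baseline like this.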
- Build ingestion and vectorization pipelines for surveys, interviews, and unstructured text.
- Support RAG systems for synthesis, evaluation, and public health guidance.
- Enable secure, controlled-generation environments.
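The retrieval half of a RAG system can be sketched with a toy retriever: embed documents, score them against the query, return the top matches for the generator. Here a bag-of-words cosine similarity stands in for a real embedding model; everything (corpus, scoring) is illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real
    sentence-embedding model in a RAG pipeline."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query; in a real
    system these passages are then fed to the generator as context."""
    q = embed(query)
    scored = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

corpus = [
    "measles vaccination coverage by county",
    "medicaid claims processing timelines",
    "flu forecasting model evaluation",
]
print(retrieve("county vaccination rates for measles", corpus))
```

A production version would swap in a vector store and learned embeddings, but the ingest → vectorize → retrieve → generate shape is the same.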
- Translate R/Stata/SAS evaluation code into reusable pipelines.
- Build templates for causal inference workflows (DID, AIPW, CEM, synthetic controls).
- Support operationalization of ARA’s applied research methods at scale.
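Of the causal methods listed above, difference-in-differences (DID) has the simplest canonical form: (treated post − treated pre) − (control post − control pre). A minimal sketch of that 2×2 estimator, on made-up numbers:

```python
def did_estimate(rows):
    """Canonical 2x2 difference-in-differences estimate.

    rows: iterable of (group, period, outcome) with group in
    {"treated", "control"} and period in {"pre", "post"}.
    """
    def mean(group, period):
        vals = [y for g, p, y in rows if g == group and p == period]
        return sum(vals) / len(vals)

    treated_change = mean("treated", "post") - mean("treated", "pre")
    control_change = mean("control", "post") - mean("control", "pre")
    return treated_change - control_change

data = [
    ("treated", "pre", 10), ("treated", "post", 18),
    ("control", "pre", 11), ("control", "post", 14),
]
print(did_estimate(data))  # (18-10) - (14-11) = 5.0
```

Templatizing workflows like this (plus AIPW, CEM, and synthetic controls, which need covariate adjustment and are far less trivial) is what lets evaluation code written in R/Stata/SAS run as repeatable pipelines.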
- Implement Model Card Protocol (MCP) and fairness/explainability tooling (SHAP, LIME).
- Ensure compliance with HIPAA, 42 CFR Part 2, IRB/DUA constraints, and NIST AI RMF standards.
- Enforce privacy-by-design: tokenization, encryption, least-privilege IAM, and VPC isolation.
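The tokenization requirement above typically means keyed pseudonymization: deterministic so a token can serve as a join key across datasets, but not reversible without the secret key. A minimal sketch using HMAC-SHA-256 (key handling via KMS/secrets manager is out of scope here, and the truncation length is an assumption):

```python
import hmac
import hashlib

def tokenize(identifier: str, key: bytes) -> str:
    """Keyed pseudonymization of a direct identifier (e.g. an MRN)."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

key = b"example-secret-key"  # in practice: pulled from a secrets manager, rotated
t1 = tokenize("MRN-0012345", key)
t2 = tokenize("MRN-0012345", key)
print(t1 == t2, len(t1))  # deterministic → True 16
```

Under HIPAA and 42 CFR Part 2 this is one layer among several; encryption at rest/in transit, least-privilege IAM, and network isolation (as listed above) still apply.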
- Develop runbooks, architecture diagrams, repo templates, and accelerator code.
- Pair with data scientists, analysts, and SMEs to build organizational capability.
- Provide technical guidance for proposals and client engagements.
- Platform skeleton operational: repo templates, CI/CD, dbt project, MLflow registry, tests.
- Two pipelines in production (e.g., FHIR → analytics and claims normalization).
- One end-to-end CoE lighthouse MVP delivered (ingestion → model → metrics → BI).
- Completed playbooks for GovCloud deployment, identity/secrets, rollback, and cost control.
- Pipeline reliability meeting SLA/SLO targets.
- ≥95% data tests passing across pipelines.
- MVP dataset onboarding ≤ 4 weeks.
- Reuse of platform assets across ≥3 divisions.
- Cost optimization and budget adherence.
- 7–10+ years in data engineering, ML platform engineering, or cloud data architecture.
- Expert in Python, SQL, dbt, and orchestration tools (Airflow, Glue, Step Functions).
- Deep experience with AWS…