
Principal Data Engineer – ML Platforms

Job in Arlington, Arlington County, Virginia, 22201, USA
Listing for: Palladian Partners, Inc.
Full Time position
Listed on 2026-01-07
Job specializations:
  • IT/Tech
    Data Engineer, AI Engineer, Data Science Manager, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100,000 – 125,000 USD yearly
Job Description

Overview

Altarum | Data & AI Center of Excellence (CoE)
Altarum is building the future of data and AI infrastructure for public health. We are hiring a Principal Data Engineer – ML Platforms to design, build, and operationalize modern data and ML platform capabilities that power analytics, evaluation, AI modeling, and interoperability across all Altarum divisions.

What You'll Work On
  • ML Platform Engineering: lakehouse architecture, pipelines, MLOps lifecycle
  • Applied ML Enablement: risk scoring, forecasting, Medicaid analytics
  • NLP/Generative AI Support: RAG, vectorization, health communications
  • Causal ML Operationalization: evaluation modeling workflows
  • Responsible/Trusted AI Engineering: model cards, fairness, compliance
Responsibilities
  • Platform Architecture & Delivery: design and operate modern, cloud-agnostic lakehouse using object storage, SQL/ELT engines, and dbt.
  • Build CI/CD pipelines for data, dbt, and model delivery (GitHub Actions, GitLab, Azure DevOps).
  • Implement MLOps systems: MLflow or equivalent, feature stores, model registry, drift detection, automated testing.
  • Engineer solutions in AWS GovCloud today, with portability to Azure Government or GCP.
  • Use IaC (Terraform, CloudFormation, Bicep) to automate secure deployments.
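The MLOps bullet above mentions drift detection alongside the model registry and automated testing. As a minimal, self-contained sketch (not Altarum's actual tooling), drift between a training-time feature distribution and live data is often screened with a population stability index (PSI); the bin count and the 0.1 threshold below are conventional illustrative choices:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    PSI below ~0.1 is commonly read as no significant drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0] = float("-inf")   # catch live values outside the
    edges[-1] = float("inf")   # training-time range

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        # small epsilon avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [float(i) for i in range(100)]
shifted = [x + 50.0 for x in baseline]
print(psi(baseline, baseline))  # ~0: identical distributions
print(psi(baseline, shifted))   # large: pronounced drift
```

A check like this would typically run as one automated test per monitored feature, alerting when the index crosses the agreed threshold.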
Pipelines & Interoperability
  • Build scalable ingestion and normalization pipelines for healthcare and public health datasets, including FHIR R4 / US Core (preferred), HL7 v2 (preferred), Medicaid/Medicare claims & encounters (preferred), SDOH & geospatial data (preferred), survey and qualitative data.
  • Create reusable connectors, dbt packages, and data contracts for cross-division use.
  • Publish clean, conformed, metrics-ready tables for Analytics Engineering and BI teams.
  • Support Population Health in turning evaluation and statistical models into pipelines.
Data Quality, Reliability & Cost Management
  • Define SLOs and alerting; instrument lineage & metadata; ensure ≥95% of data tests pass.
  • Perform performance and cost tuning (partitioning, storage tiers, autoscaling) with guardrails and dashboards.
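The ≥95% data-test target above implies a gate in the pipeline itself. A minimal sketch of such a gate, with hypothetical test names in the style of dbt's built-in tests (not a real project's results), might look like:

```python
def data_test_slo(results, threshold=0.95):
    """Gate a pipeline run on the fraction of passing data tests.

    `results` maps test name -> bool (pass/fail); returns (pass_rate, ok).
    The 0.95 default mirrors the >=95% target; names are illustrative.
    """
    passed = sum(1 for ok in results.values() if ok)
    rate = passed / len(results)
    return rate, rate >= threshold

run = {
    "not_null_member_id": True,
    "unique_claim_id": True,
    "accepted_values_state": True,
    "freshness_encounters": False,  # one failing test
}
rate, ok = data_test_slo(run)
print(rate, ok)  # 0.75 False -> run fails the SLO gate
```

In practice the pass/fail map would come from the test runner's artifacts, and a failed gate would page on-call per the alerting policy.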
Applied ML Enablement & Generative AI
  • Build production-grade pipelines for risk prediction, forecasting, cost/utilization models, and burden estimation.
  • Develop ML-ready feature engineering workflows and support time-series/outbreak detection models.
  • Integrate ML assets into standardized deployment workflows.
  • Build ingestion and vectorization pipelines for surveys, interviews, and unstructured text.
  • Support RAG systems for synthesis, evaluation, and public health guidance.
  • Enable secure, controlled-generation environments.
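The vectorization and RAG bullets above reduce, at their core, to embedding text and ranking documents by similarity to a query. As a toy illustration only (a real system would use a learned embedding model and a vector store, not bag-of-words counts), the retrieval half can be sketched as:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by similarity to the query: the R in RAG."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "flu vaccination guidance for adults",
    "medicaid claims normalization notes",
    "seasonal flu outbreak surveillance report",
]
print(retrieve("flu guidance", docs, k=1))
```

The retrieved passages would then be placed into the generation prompt, with access controls applied before, not after, retrieval.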
Causal ML & Evaluation Engineering
  • Translate R/Stata/SAS evaluation code into reusable pipelines.
  • Build templates for causal inference workflows (DID, AIPW, CEM, synthetic controls).
  • Support operationalization of ARA’s applied research methods at scale.
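Of the causal methods listed, difference-in-differences (DID) has the simplest canonical form; a template for it, reduced here to the textbook 2x2 case with made-up numbers (real pipelines would work from unit-level data with covariates), is just arithmetic on four group means:

```python
def did_estimate(means):
    """Canonical 2x2 difference-in-differences from four group means.

    means: {("treated"|"control", "pre"|"post"): outcome mean}
    Effect = (treated_post - treated_pre) - (control_post - control_pre),
    i.e. the treated group's change net of the control group's trend.
    """
    return ((means[("treated", "post")] - means[("treated", "pre")])
            - (means[("control", "post")] - means[("control", "pre")]))

m = {("treated", "pre"): 10.0, ("treated", "post"): 16.0,
     ("control", "pre"): 9.0, ("control", "post"): 12.0}
print(did_estimate(m))  # 3.0: six-point gain minus a three-point trend
```

Wrapping estimators like this in tested, parameterized functions is what lets R/Stata/SAS evaluation code become a reusable pipeline step.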
Responsible AI, Security & Compliance
  • Implement Model Card Protocol (MCP) and fairness/explainability tooling (SHAP, LIME).
  • Ensure compliance with HIPAA, 42 CFR Part 2, IRB/DUA constraints, and NIST AI RMF standards.
  • Enforce privacy-by-design: tokenization, encryption, least-privilege IAM, and VPC isolation.
Reuse, Shared-Services, and Enablement
  • Develop runbooks, architecture diagrams, repo templates, and accelerator code.
  • Pair with data scientists, analysts, and SMEs to build organizational capability.
  • Provide technical guidance for proposals and client engagements.
Your First 90 Days
  • Platform skeleton operational: repo templates, CI/CD, dbt project, MLflow registry, tests.
  • Two pipelines in production (e.g., FHIR → analytics and claims normalization).
  • One end-to-end CoE lighthouse MVP delivered (ingestion → model → metrics → BI).
  • Completed playbooks for GovCloud deployment, identity/secrets, rollback, and cost control.
Success Metrics (KPIs)
  • Pipeline reliability meeting SLA/SLO targets.
  • ≥95% data tests passing across pipelines.
  • MVP dataset onboarding ≤ 4 weeks.
  • Reuse of platform assets across ≥3 divisions.
  • Cost optimization and budget adherence.
What You'll Bring
  • 7–10+ years in data engineering, ML platform engineering, or cloud data architecture.
  • Expert in Python, SQL, dbt, and orchestration tools (Airflow, Glue, Step Functions).
  • Deep experience with AWS…