Senior Data Infrastructure Engineer Job, Boston area, Massachusetts, USA, IT/Tech

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or related field
2-3+ years of experience building production data pipelines and data platforms that support AI/ML models
Strong proficiency in Python, C++ and distributed data processing frameworks
Hands-on experience with AWS services including S3, EC2, SageMaker, and Glue
Experience designing data systems that support large-scale ML training and experimentation
Knowledge of data governance, access control, and lifecycle management
Experience collaborating with ML, data science, operations, and cloud teams
(Desirable) Experience building pipelines spanning edge devices and cloud systems
(Desirable) Background working with large-scale sensor, image or IoT data
(Desirable) Familiarity with data labeling tools and annotation workflows
(Desirable) Experience implementing dataset versioning, lineage, and reproducibility systems
(Desirable) Understanding of privacy, compliance, or regulated data environments
(Desirable) Experience supporting global, multi-region data platforms

What the job involves

Join Evolv as a Senior Data Infrastructure Engineer in the Machine Learning & Sensors organization, responsible for building and operating the scalable, secure, and reliable data pipelines that power our AI/ML research and production systems
In this role, you will own the end‑to‑end data lifecycle—from collection on thousands to millions of edge devices, through cloud ingestion and processing, into a centralized data factory enabling model training, evaluation, and continuous improvement
Data is the backbone of our mission to deliver best‑in‑class AI‑based weapon detection systems
You will ensure that data flows seamlessly across geographies, devices, and cloud systems while meeting strict requirements for quality, privacy, security, and scale
This role is ideal for someone who thrives at the intersection of distributed systems, cloud pipelines, and ML‑driven data needs
In the first 30 days:
Develop a deep understanding of existing edge‑to‑cloud data pipelines and deployment environments
Review current data ingestion flows, governance policies, and cloud infrastructure
Assess pain points in data reliability, quality, and operational scalability
Build relationships with AI/ML, data science, field operations, and cloud engineering teams
Design and prototype data processing pipelines (both cloud and edge)
Within the first three months:
Design and implement improvements to core ingestion, validation, and processing pipelines
Deploy scalable data pipelines with AWS‑based components (S3, EC2, Lambda, Glue, Step Functions, SageMaker integrations)
Introduce automated validation workflows to detect corruption, missing metadata, or malformed data
Design and implement automated model evaluation, training, and improvement pipelines to speed up experiments
Partner with field operations to improve data reliability, observability, and coverage across deployments
By the end of the first year:
Own the entire lifecycle of mission‑critical data pipelines supporting AI/ML research and production
Architect next‑generation edge‑to‑cloud data systems that scale across millions of devices
Define and enforce data governance frameworks including retention, access control, privacy, and lineage
Enable ML teams to rapidly experiment through high‑quality, discoverable, versioned datasets
End‑to‑End Data Pipeline Ownership:
Design, build, and maintain both research and production data pipelines spanning edge devices, cloud services, and centralized data platforms
Own the full data lifecycle: collection, ingestion, processing, obfuscation, versioning, access, retention, and retirement
Edge‑to‑Cloud Data Flow:
Develop resilient ingestion pipelines capable of handling variable connectivity and device heterogeneity
Support secure data transfer from the field to cloud storage systems
Collaborate with field ops to enhance data coverage, observability, and operational robustness
Data Quality, Governance & Compliance:
Implement privacy‑preserving transformations and obfuscation pipelines
Build automated cleaning/validation steps to remove duplicates, detect corruption,…