Senior AI Data Engineer
Job in Menlo Park, San Mateo County, California, 94029, USA
Listed on 2026-05-16
Listing for: Intelliswift - An LTTS Company
Full Time position
Job specializations:
- IT/Tech: Data Engineering, Data Scientist, Machine Learning/ML Engineer, AI Engineer (Applied/Software)
Job Description & How to Apply Below
Duration: 7 months (with potential for extension)
As a Senior AI Data Engineer, you will design and operate end‑to‑end pipelines that not only move and transform data but also enrich it using ML models such as classifiers, embedding models, and large language models. The role sits at the intersection of data engineering and ML systems and requires strong systems thinking around throughput, retries, async execution, and capacity management.
You will work closely with engineers and researchers to support image generation and evaluation workflows, contributing directly to data quality, model performance, and scalability.
Required Skills & Experience
- Strong data engineering expertise, including advanced SQL, complex query optimization, and production pipeline orchestration (e.g., Airflow or equivalent)
- Calling inference endpoints
- Managing batching and throughput
- Handling failures and retries at scale
- Experience operating large-scale production pipelines with high reliability and performance requirements.
- Proficiency using AI‑assisted coding tools to accelerate development, debugging, and code reviews.
- Strong communication skills and ability to collaborate with engineers, researchers, and cross‑functional teams.
- Experience working with embeddings and vector search, including storage, indexing, and similarity queries.
- Familiarity with content understanding models, such as image classification, OCR, and safety or quality scoring.
- Experience using LLMs for data workflows, including automated annotation, data cleaning, or evaluation tasks.
- Knowledge of generative AI systems, particularly image generation and corresponding evaluation metrics.
- Background working in data engineering, ML engineering, or hybrid roles that support model training or evaluation.
- AI‑Augmented Data Pipelines: Design and maintain large‑scale data pipelines (up to billions of records/images) that combine SQL-based transformations with ML model inference for data cleaning, labeling, and enrichment.
- Remote Inference Orchestration: Build and own systems that orchestrate remote model inference within pipelines, including batching, async execution, retries, fallback logic, and graceful degradation under load.
- Feature & Embedding Pipelines: Develop scalable pipelines to generate, store, validate, and serve vector embeddings. Manage nearest‑neighbor indexes and ensure data quality at scale.
- Data Curation at Scale: Source, filter, and curate training datasets using both structured queries and model‑derived signals (e.g., visual quality scores, content classification, safety filters). Own the end‑to‑end data lifecycle with a focus on quality, governance, and compliance.
- LLM‑Assisted Annotation: Design pipelines that use large language models and vision models for automated data annotation. Create auditing workflows to evaluate and improve annotation quality.
- Shared Tooling & Frameworks: Contribute reusable components and frameworks that simplify AI‑augmented data pipelines, such as standardized model‑invocation operators and async job orchestration patterns.
Position Requirements
10+ years of work experience