Agentic AI/ML Engineer - Multimodal Job, Irvine area, California, USA, IT/Tech

Position: 1.68 Agentic AI/ML Engineer - Multimodal

Field AI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. Based in the heart of Southern California’s robotics ecosystem, we build risk‑aware, reliable, field‑ready AI systems that solve the hardest problems in robotics and unlock the full potential of embodied intelligence. If you want your work to ship, get tested on hardware, and improve through real deployments, Irvine is the place.

We go beyond typical data‑driven approaches or pure transformer‑only architectures, combining rigorous engineering with learning systems proven in globally deployed solutions that deliver results today and get better every time our robots run in the field.

About the Job

Our Field Foundation Model (FFM) powers a global fleet of autonomous robots that capture massive streams of multimodal data across diverse, dynamic environments every day. The Insight Team’s mission is to transform this raw multimodal data into actionable insights that empower our customers and engineers to deliver value. The Field‑insight Foundation Model (FiFM) is at the core of that work.

As an AI/ML Engineer on the FiFM team, you will drive research and model development for one of Field AI’s most ambitious initiatives. Your work will span computer vision, vision‑language models (VLMs), multimodal scene understanding, and long‑memory video analysis and search, with a strong emphasis on agentic AI (tool use, memory, multimodal retrieval‑augmented generation). This is a full‑cycle ML role: you’ll curate datasets, fine‑tune and evaluate models, optimize inference, and deploy them into production. It’s a blend of applied research and engineering, requiring creativity, rapid experimentation, and rigorous problem‑solving. While FiFM is your primary focus, you’ll also contribute to broader perception and insight‑generation initiatives across Field AI.

What You’ll Get To Do

Train and fine‑tune million‑ to billion‑parameter multimodal models, focusing on computer vision, video understanding, and vision‑language integration
Track state‑of‑the‑art research, adapt novel algorithms, and integrate them into FiFM
Curate datasets and develop tools to improve model interpretability
Build scalable evaluation pipelines for vision and multimodal models
Contribute to model observability, drift detection, and error classification
Fine‑tune and optimize open‑source VLMs and multimodal embedding models for efficiency and robustness
Build and optimize Multi‑Vector RAG pipelines with vector DBs and knowledge graphs
Create embedding‑based memory and retrieval chains with token‑efficient chunking strategies

What You Have

Master’s/Ph.D. in Computer Science, AI/ML, Robotics, or equivalent industry experience
2+ years of industry experience or relevant publications in CV/ML/AI
Strong expertise in computer vision, video understanding, temporal modeling, and VLMs
Proficiency in Python and PyTorch with production‑level coding skills
Experience building pipelines for large‑scale video/image datasets
Familiarity with AWS or other cloud platforms for ML training and deployment
Understanding of MLOps best practices (CI/CD, experiment tracking)
Hands‑on experience fine‑tuning open‑source multimodal models using Hugging Face, DeepSpeed, vLLM, FSDP, LoRA/QLoRA
Knowledge of precision tradeoffs (FP16, bfloat16, quantization) and multi‑GPU optimization
Ability to design scalable evaluation pipelines for vision/VLMs and agent performance

The Extras That Set You Apart

Experience with Agentic/RAG pipelines and knowledge graphs (LangChain, LangGraph, LlamaIndex, OpenSearch, FAISS, Pinecone)
Familiarity with agent operations logging and evaluation frameworks
Background in optimization: token cost reduction, chunking strategies, reranking, and retrieval latency tuning
Experience deploying models with quantized (int4/int8) and distributed multi‑GPU inference
Exposure to open‑vocabulary detection, zero/few‑shot learning, multimodal RAG
Knowledge of temporal‑spatial modeling (event/scene graphs)
Experience deploying AI in edge or resource‑constrained environments

Our salary range is…
