Senior Software Engineer, Vision Language Models
Listed on 2025-12-09
Engineering | IT/Tech | Machine Learning/ML Engineer, AI Engineer
Mission Summary
At Motional, data play a critical role in fueling our ML-centered autonomous vehicles. Our robotaxi fleet collects petabytes of data on the road every day, and the Data Mining team mines and filters this massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks, we continuously improve the on-road performance of ML products for perception, prediction, and planning with every mile driven.
We mine for model errors, anomalies, rare objects & long-tail driving scenarios across millions of driving hours – these are used for laser-focused ML model training and continuous edge case validation. We are looking for an engineer to spearhead new mining strategies & workflows and help deliver high-quality data that improve our core ML products.
What you'll be doing:
- Spearhead the development of cutting-edge data products by adapting and extending Vision‑Language Models (VLMs) and other multimodal foundation models. This includes applying advanced techniques like fine‑tuning, RAG, in‑context learning, continual pre‑training, and knowledge distillation.
- Design and curate high‑quality multimodal datasets crucial for training and evaluating multimodal foundation models. This includes developing innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long‑tail event mining.
- Drive the in‑depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real‑world settings.
What we're looking for:
- MS/PhD in computer science or a related field, with a strong emphasis on multimodal foundation models
- Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision‑language understanding or multimodal foundation models
- Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code
- Experience in the application of Vision‑Language Models (VLMs) or other multimodal foundation models to data mining in real‑world settings
- Experience in production deployment of Vision‑Language Models (VLMs) or other multimodal foundation models for real‑world applications (e.g., image/video captioning, open‑vocabulary image/video searching)
- Experience with data from diverse sensor modalities (e.g., camera, lidar, radar)
- Experience in applied machine learning for autonomous driving
The salary range for this role is an estimate based on a wide range of compensation factors, including, but not limited to, specific skills, experience and expertise, role location, certifications, licenses, and business needs. The estimated compensation range listed in this job posting reflects base salary only. This role may include additional forms of compensation, such as a bonus or company equity.
The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.
Candidates for certain positions are eligible to participate in Motional’s benefits program. Motional’s benefits include, but are not limited to, medical, dental, vision, 401k with a company match, health savings accounts, life insurance, pet insurance, and more.
Salary Range: $175,000 to $234,000 USD
Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality. We’re driven by something more. Our journey is always people first.
We aren't just developing driverless cars; we're creating safer roadways, more equitable transportation options, and making our communities better places to live, work, and connect. Our team is made up of engineers, researchers, innovators, dreamers and doers, who are creating a technology with the potential to transform the way we move.
Higher purpose, greater impact.
We’re creating first-of‑its‑kind technology that will transform transportation. To do so successfully, we must design for everyone in our cities and on our…