Senior ML Engineer - Auto Tagger
Listed on 2026-05-31
Software Development
AI Engineer, Data Engineer
At Torc, we have always believed that autonomous vehicle technology will transform how we travel, move freight, and do business. A leader in autonomous driving since 2007, Torc has spent over a decade commercializing our solutions with experienced partners. Now a part of the Daimler family, we are focused solely on developing software for automated trucks to transform how the world moves freight.
Join us and catapult your career with the company that helped pioneer autonomous technology, and the first AV software company with the vision to partner directly with a truck manufacturer.
The Auto Tagger team is the engine behind our data flywheel, responsible for translating petabytes of raw, multi-modal vehicle data into a highly curated library of critical driving scenarios. By mining driving logs for long-tail events, we provide the foundational data required for safe autonomous trucking. Leveraging Pegasus logical layers, this team structures and catalogs findings into an observations database that directly accelerates development across autonomous perception, sensor fusion, and generative simulation testing.
What You'll Do
- Scenario Mining at Scale: Architect and optimize distributed data pipelines to process massive multi-sensor logs (camera, LiDAR, radar, kinematics), automatically extracting and cataloging safety-critical and long-tail driving events.
- Advanced Event Tagging: Develop and tune both heuristic-based and ML-assisted algorithms (including exploring Vision-Language Models or semantic vector search) to automatically classify and describe complex environmental and behavioral scenarios.
- Standardized Data Structuring: Extract and format scenario data utilizing the Pegasus layer standard (alongside open-source frameworks) to ensure semantic consistency and rigorous metadata integrity.
- Data Flywheel Integration: Manage the ingestion of tagged events into the observations database, enabling high-speed querying and retrieval for ML training, regression testing, and system validation.
- Cross-Functional Alignment: Operate with broad autonomy to drive consensus across organizational boundaries. Collaborate closely with downstream consumers in perception, simulation, and systems engineering to define what constitutes an "interesting scenario" and operationalize a continuous data loop.
- Mentorship & Team Growth: Guide, mentor, and elevate less-experienced engineers. Lead design reviews, establish coding standards, and foster a culture of technical excellence and collaborative problem-solving.
- BS or MS in Computer Science, Robotics, Engineering, or a STEM field, with 6+ years in data engineering, ML systems, or autonomous data curation.
- Core Languages: Strong Python and SQL skills, with heavy experience processing massive time-series or unstructured datasets.
- ML & Dataset Curation: Hands-on machine learning and dataset curation experience, with a demonstrated history of implementing targeted datasets that measurably improve downstream model performance.
- Data Exploration: Hands-on experience using Databricks (or similar platforms) for large-scale analytics, interactive querying, and making massive vehicle datasets searchable.
- Cloud & Compute: Expertise in distributed compute frameworks (Ray, Spark, Beam) and cloud platforms (AWS, GCP, or Azure) for executing heavy data workloads.
- AV Standards: Experience parsing complex data formats and applying scenario-description standards like Pegasus layers.
- Communication: Exceptional ability to translate complex data engineering challenges into clear strategies for cross-functional stakeholders.
- Technical Leadership: Proven track record of mentoring teams, driving system architecture, and defining engineering roadmaps.
- Auto-labeling & VLMs: Familiarity with foundational models, auto-labeling pipelines, or zero-shot classification for scenario extraction.
- Model Serving: Experience with vLLM, SGLang, or similar frameworks for highly optimized, high-throughput model serving and inference.
- Semantic Inference: Experience with semantic extraction and attribute mapping to help build out a robust semantic inference engine,…