Senior Data Engineer
Listed on 2025-12-18
IT/Tech
Data Engineer, AI Engineer
At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavioral Models, and Robotics.
The Automated Driving Advanced Development division at TRI will focus on enabling innovation and transformation at Toyota by building a bridge between TRI research and Toyota products, services, and needs. We achieve this through partnership, collaboration, and shared commitment. This new division is leading a new cross-organizational project between TRI and Woven by Toyota to conduct research and develop a fully end-to-end learned driving stack.
This cross-organizational project aligns with the efforts of TRI’s robotics divisions in Diffusion Policy and Large Behavior Models.
We are looking for a Senior Data Engineer to design and build the foundational data infrastructure and tools that power our autonomy research and development workflows. This includes large-scale ingestion pipelines, structured feature stores, labeling infrastructure, scene search and data discovery tools, and performance diagnostics for machine learning and simulation workflows.
Responsibilities
- Design and implement scalable, production-grade pipelines for data ingestion, transformation, storage, and retrieval from vehicle fleets and simulation environments.
- Build internal tools and services for data labeling, curation, indexing, and cataloging across large and diverse datasets.
- Collaborate with ML researchers, autonomy engineers, and data scientists to design schemas and APIs that power model training, evaluation, and debugging.
- Develop and maintain feature stores, metadata systems, and versioning infrastructure for structured and unstructured data.
- Support the generation and integration of synthetic datasets with real-world logs to enable hybrid training and simulation workflows.
- Optimize pipelines for cost, latency, and traceability, ensuring reproducibility and consistency across environments.
- Partner with simulation and cloud platform teams to automate workflows for closed-loop testing, scenario mining, and performance analytics.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 8+ years of experience building data-intensive software systems, ideally in robotics, autonomous driving, or large-scale ML environments.
- Proficient in Python and SQL, and familiar with C++.
- Experience designing ETL pipelines using modern frameworks (e.g., Apache Spark, Flyte, Union).
- Strong knowledge of cloud-native architectures, including AWS services (e.g., S3) or equivalents (e.g., Google Cloud Platform).
- Familiarity with sensor data types (camera, lidar, radar, GPS/IMU) and common data serialization formats (e.g., protobuf, ROS 2 bag, MCAP).
- Deep understanding of data quality, observability, and lineage in high-volume systems.
- Track record of building reliable and performant infrastructure that supports both ad-hoc exploration and repeatable production workflows.
- Experience in AD/ADAS, robotics, or autonomous systems, especially handling perception or planning datasets.
- Familiarity with ML pipeline orchestration frameworks (e.g., Kubeflow, SageMaker).
- Experience working with temporal or spatial data, including geospatial indexing and time-series alignment.
- Exposure to synthetic data generation, simulation logging, or scenario replay pipelines.
- Strong software engineering fundamentals, including CI/CD, testing, code review, and service deployment best practices.
- Experience collaborating with cross-functional, distributed teams across research and production orgs.
Please include links to any relevant open-source contributions or technical project write-ups with your application.
The pay range for this position at commencement of employment is expected to be between $180,000 and $270,000/year for California-based roles; however, base pay offered may vary depending on multiple individualized factors, including market location,…