Lead AI/ML Platform Engineer
Listed on 2026-06-02
IT/Tech
AI Engineer, Systems Engineer, Cloud Computing
Overview
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world’s most admired brands, Toyota is growing and leading the future of mobility through innovative, high‑quality solutions designed to enhance lives and delight those we serve. We’re looking for talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world‑changing company—delivering on Toyota's vision to move people beyond what’s possible. At TFS, you will help create best‑in‑class customer experience in an innovative, collaborative environment.
To save time applying, Toyota does not offer sponsorship of job applicants for employment‑based visas or any other work authorization for this position at this time.
Who we’re looking for
Toyota Financial Services’ Enterprise Platforms team is looking for a passionate and highly motivated Lead AI/ML Platform Engineer. The primary responsibility of this role is to design, build, and implement scalable platform solutions that power enterprise AI/ML and GenAI capabilities across the organization. You will help enable secure, production‑ready MLOps and LLMOps infrastructure that supports model training, inference, orchestration, and retrieval‑augmented generation. The Lead AI/ML Platform Engineer will support the Enterprise Platforms team’s objective to deliver reliable, secure, and high‑performing AI platform capabilities that drive business value at scale.
In this role, you’ll help shape the foundation for Toyota Financial Services’ next generation of AI platform capabilities, where success means building systems that are scalable, resilient, and ready for production use. A typical day may include collaborating with product, architecture, engineering, data, and cybersecurity partners to solve complex infrastructure challenges while improving the developer and model lifecycle experience.
What you’ll be doing
Design and implement cloud‑native infrastructure that enables enterprise AI/ML and GenAI workloads in production
Build and evolve MLOps and LLMOps platform capabilities, including model training, versioning, deployment, monitoring, and rollback
Create GPU‑accelerated compute environments that improve model performance while balancing scalability and cost efficiency
Standardize infrastructure patterns for vector databases, model registries, and orchestration frameworks
Develop reusable approaches for model serving, inference scaling, prompt management, and latency optimization
Design secure, multi‑tenant environments with strong access controls, auditability, and usage governance for AI models
Partner closely with engineering, platform, and data teams to ensure smooth data flow, strong observability, and operational resiliency
Own technical direction for AI infrastructure services and integrations in collaboration with the architecture team
Lead design reviews, establish engineering standards, and help guide critical technical decisions
Mentor engineers, provide thoughtful feedback, and support growth through coaching and development planning
Stay current on emerging GenAI, distributed systems, and infrastructure trends to bring fresh ideas and better solutions to the team
What you bring
10+ years of experience in software engineering, with a focus on cloud infrastructure or cloud platform engineering
3+ years of experience building cloud infrastructure that supports AI/ML workloads such as training, tuning, and inference
Deep hands‑on experience with AWS and infrastructure‑as‑code tools such as Terraform, CDK, or CloudFormation
Experience with Kubernetes, containerization, and CI/CD pipelines in a production environment
Strong understanding of GPU infrastructure, serverless compute, and scalable microservice patterns
Familiarity with model hosting, inference scaling, and observability tools such as Datadog, CloudWatch, or Prometheus
Practical experience using Git/GitHub and CI/CD tooling such as GitHub Actions or…