AI Technical Lead Job San Jose area, California USA, Engineering

JOB DESCRIPTION

About NIO

NIO is a pioneer and a leading company in the premium smart electric vehicle market. Founded in November 2014, NIO’s mission is to shape a joyful lifestyle. NIO aims to build a community starting with smart electric vehicles to share joy and grow together with users.

NIO designs, develops, jointly manufactures and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains and batteries. NIO differentiates itself through its continuous technological breakthroughs and innovations, such as its industry-leading battery swapping technologies, Battery as a Service, or BaaS, as well as its proprietary autonomous driving technologies and Autonomous Driving as a Service, or ADaaS.

NIO’s product portfolio consists of the ES8, a six-seater smart electric flagship SUV, the ES7 (or the EL7), a mid-large five-seater smart electric SUV, the ES6, a five-seater all-round smart electric SUV, the EC7, a five-seater smart electric flagship coupe SUV, the EC6, a five-seater smart electric coupe SUV, the ET7, a smart electric flagship sedan, and the ET5, a mid-size smart electric sedan.

Roles and Responsibilities

Architect the Hybrid AI Vision:
Lead the architectural design and strategic vision for hybrid inference systems, dynamically distributing Large Language Model (LLM) and Vision-Language Model (VLM) workloads across edge computing environments and cloud infrastructure.

Team Leadership & Innovation:
Lead, mentor, and inspire a team of specialized engineers working across distributed systems orchestration, inference optimization, and AI compiler engineering. While you are not expected to be a hands-on master of every domain, you will drive the overarching technical roadmap, foster a culture of cutting-edge innovation, and guide domain experts in navigating complex system tradeoffs.

Design Dynamic Orchestration & Resilience:
Oversee the architecture of high-availability orchestration engines that intelligently route inference tasks. Guide the team in developing cascading inference mechanisms, dynamic model fallback strategies, and robust telemetry to ensure continuous, steady-state inference under varying connectivity constraints.

Qualifications

Education & Experience:

Ph.D. in Computer Science, Computer Engineering, Artificial Intelligence, or a related field with 8+ years of relevant industry experience (or Master’s degree with 12+ years), including proven experience leading technical teams or driving complex architectural roadmaps.

End-to-End Systems Leadership (T-Shaped Profile):
Demonstrated capability to lead full-stack AI systems engineering. You possess deep, hands-on mastery in at least one of the following core domains, coupled with the systemic breadth required to effectively lead engineers working across the others:

Distributed Systems & Hybrid Inference:
Designing, scaling, and deploying production-grade distributed ML systems. Balancing cloud infrastructure with edge constraints using modern routing paradigms, such as cascading inference architectures and semantic routing.

Algorithmic & Inference Optimization:
Proven experience optimizing state-of-the-art LLM/VLM inference pipelines. Deep understanding of model compression (e.g., PTQ, QAT, AWQ, FP8/INT4), hardware-aware compute optimizations (e.g., Flash Attention), and advanced memory management (e.g., Paged Attention, KV cache compression/eviction).

Advanced Systems & Compiler Engineering:
C++ and production-grade Python proficiency. Deep understanding of edge/cloud model-serving frameworks (e.g., vLLM, TensorRT-LLM, ExecuTorch, MLC-LLM) and AI compilers (e.g., MLIR, Apache TVM, Triton) for compute graph optimization and custom kernel development.

Preferred Qualifications

Privacy & Security:
Deep understanding of privacy-preserving AI techniques (federated learning, differential privacy, secure enclaves) essential for processing sensitive data across edge and cloud environments.

Community Engagement & Open Source:
Publications in relevant AI, ML, or systems conferences (e.g., NeurIPS, ICML, MLSys), or active…