AI Infra Engineer Job | San Jose area, California, USA | IT/Tech

We are looking for a highly motivated HW/Algorithm Co-Design Engineer to join our NPU Hardware Architecture Team. In this role, you will operate at the critical intersection of AI model innovation and silicon architecture, working closely with both algorithm and hardware teams to ensure that state-of-the-art models are efficiently mapped, optimized, and deployed on our in-house NPU platform.

This is a high-impact role for engineers who are passionate about bridging the gap between model design and hardware execution. You will help shape model-friendly architecture practices, drive cross-functional optimization, and influence the evolution of next-generation AI computing platforms from a system-level perspective.

Key Responsibilities

Partner closely with algorithm teams to understand model architectures, operator patterns, training/inference workflows, and deployment requirements, and guide them toward NPU-friendly design choices.
Analyze AI models from a hardware architecture perspective, identifying bottlenecks in compute, memory access, data movement, bandwidth utilization, and parallelism.
Drive hardware/algorithm co-design initiatives to improve model performance, energy efficiency, and deployability on our in-house NPU.
Define and promote best practices for NPU-friendly model design, including operator selection, graph patterns, quantization readiness, memory-efficient structures, and execution-friendly network topologies.
Collaborate with hardware architects, compiler engineers, runtime/software teams, and algorithm researchers to enable end-to-end optimization across the full stack.
Evaluate emerging AI models and workload trends, and identify opportunities to improve future NPU capabilities through architecture-aware algorithm guidance.
Serve as a technical bridge between algorithm innovation and hardware realization, ensuring that advanced models can be translated into scalable and efficient production deployments.

Qualifications

Master’s degree or above in Computer Science, Electrical Engineering, Computer Engineering, Applied Mathematics, or a related field.
3+ years of relevant industry experience, preferably in AI accelerators, NPU/GPU architecture, deep learning systems, or hardware/software co-design.
Strong understanding of deep learning fundamentals and modern model architectures such as CNNs, Transformers, and other large-scale AI models.
Solid knowledge of AI accelerator architecture concepts, including compute engines, memory hierarchy, dataflow, bandwidth constraints, and parallel execution.
Proven ability to analyze model behavior and identify architecture-sensitive performance bottlenecks.
Familiarity with common AI frameworks and deployment toolchains such as PyTorch, ONNX, and model profiling/optimization tools.
Strong problem-solving skills, with the ability to reason across algorithm, software, and hardware layers.
Excellent communication and cross-functional collaboration skills, with the ability to work effectively across multiple engineering disciplines.

Preferred Qualifications

Experience with in-house NPU/ASIC development, AI compiler stacks, or performance modeling for ML workloads.
Familiarity with model optimization techniques such as quantization, sparsity, operator fusion, graph optimization, and low-precision computation.
Hands-on experience optimizing real-world workloads in areas such as computer vision, autonomous driving, multimodal AI, or large language models.
Experience in workload characterization, roofline analysis, memory bandwidth analysis, or architecture/performance tradeoff studies.
Demonstrated success influencing model design decisions based on hardware execution characteristics.

What We Value

A system-level mindset and the ability to connect model behavior with architectural implications.
Strong technical curiosity and the drive to push both algorithm efficiency and hardware capability forward.
A practical engineering approach focused not only on making models run, but on making them run efficiently, robustly, and competitively on our platform.
The ability to thrive in a highly collaborative environment where architecture, software, and algorithms evolve together.
