Performance Architect CHODC Job Milpitas area, California USA, IT/Tech

Position: Performance Architect -- CHODC5804107

The Performance Architect develops advanced AI storage solutions through innovative system architectures and complex simulation models for the Client's next-generation products. This role involves designing, programming, debugging, and modifying simulation models to evaluate architectural changes, while assessing performance, power, and endurance. The architect will collaborate with engineering teams to address complex challenges, drive innovation, and shape the future of data-centric architectures.

Key Responsibilities

Build SystemC performance models for AI storage solutions, covering end-to-end components such as GPU/TPU/NPU/xPU, host interfaces, memory hierarchies, base die controllers, and packaging technologies.
Improve AI/ML ASIC architecture performance through hardware/software co-optimization, post-silicon performance analysis, and strategic roadmap influence.
Conduct workload analysis and characterization of ASICs and competitive AI/datacenter solutions to identify performance improvement opportunities.
Collaborate with architecture teams to resolve performance issues and optimize datacenter technologies for efficiency and TCO.
Model and optimize components of AI/ML accelerator ASICs, including PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, fabrics, and xPU.
Perform performance modeling and optimization for large-scale LLM training/inference, including Dense and MoE architectures across multiple modalities.
Develop and optimize parallelization strategies across tensor, pipeline, context, expert, and data parallel dimensions.
Architect memory-efficient training and inference systems using techniques such as structured pruning, quantization, continuous batching, speculative decoding, and KV cache optimization.
Incorporate and extend state-of-the-art models (e.g., GPT-4, DeepSeek-R1) and multi-modal architectures.
Collaborate with internal and external stakeholders to disseminate results and iterate rapidly.

Required Qualifications

Bachelor’s, Master’s, or Ph.D. in Computer/Electrical Engineering.
5+ years of experience in performance modeling, simulation, and analysis using SystemC.
Strong background in computer/graphics architecture, ML, and LLMs.
Hands-on experience with SystemC/TLM simulation, behavioral modeling, and performance analysis.

Preferred Qualifications

Experience with storage systems, protocols, and NAND flash.
Deep expertise in optimizing large-scale ML systems and GPU architectures.
Proven technical leadership in GPU performance and workload analysis.
Knowledge of transformer architectures, attention mechanisms, and model parallelism techniques.
Experience with GPU/TPU microarchitecture and distributed training systems.
Proficiency in PyTorch, CUDA, TensorRT, OpenAI Triton, ONNX, and distributed frameworks (Ray, Megatron-LM).
Familiarity with performance analysis tools (Nsight Compute, nvprof, PyTorch Profiler).
Background in IO subsystem microarchitecture and protocols (NVMe, PCIe, UCIe, CXL, NVLink).
Experience with datacenter workload analysis, multi-core systems, and multi-thread interactions.

Certifications

Relevant certifications in performance engineering, AI/ML, or hardware architecture (preferred but not required).
