Performance Architect CHODC
Job in Milpitas, Santa Clara County, California, 95035, USA
Listed on 2026-06-03
Listing for: Compunnel Inc.
Full Time position
Job specializations: IT/Tech (AI Engineer, Systems Engineer)
Job Description
The Performance Architect develops advanced AI storage solutions through innovative system architectures and complex simulation models for the client's next-generation products. This role involves designing, programming, debugging, and modifying simulation models to evaluate architectural changes and to assess performance, power, and endurance. The architect will collaborate with engineering teams to address complex challenges, drive innovation, and shape the future of data-centric architectures.
Key Responsibilities
- Build SystemC performance models for AI storage solutions, covering end-to-end components such as GPU/TPU/NPU/xPU, host interfaces, memory hierarchies, base die controllers, and packaging technologies.
- Improve AI/ML ASIC architecture performance through hardware/software co-optimization, post-silicon performance analysis, and strategic roadmap influence.
- Conduct workload analysis and characterization of ASICs and competitive AI/datacenter solutions to identify performance improvement opportunities.
- Collaborate with architecture teams to resolve performance issues and optimize datacenter technologies for efficiency and TCO.
- Model and optimize components of AI/ML accelerator ASICs, including PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, fabrics, and xPU.
- Perform performance modeling and optimization for large-scale LLM training/inference, including Dense and MoE architectures across multiple modalities.
- Develop and optimize parallelization strategies across tensor, pipeline, context, expert, and data parallel dimensions.
- Architect memory-efficient training and inference systems using techniques such as structured pruning, quantization, continuous batching, speculative decoding, and KV cache optimization.
- Incorporate and extend state-of-the-art models (e.g., GPT-4, DeepSeek-R1) and multi-modal architectures.
- Collaborate with internal and external stakeholders to disseminate results and iterate rapidly.
Qualifications
- Bachelor’s, Master’s, or Ph.D. in Computer/Electrical Engineering.
- 5+ years of experience in performance modeling, simulation, and analysis using SystemC.
- Strong background in computer/graphics architecture, ML, and LLMs.
- Hands-on experience with SystemC/TLM simulation, behavioral modeling, and performance analysis.
- Experience with storage systems, protocols, and NAND flash.
- Deep expertise in optimizing large-scale ML systems and GPU architectures.
- Proven technical leadership in GPU performance and workload analysis.
- Knowledge of transformer architectures, attention mechanisms, and model parallelism techniques.
- Experience with GPU/TPU microarchitecture and distributed training systems.
- Proficiency in PyTorch, CUDA, TensorRT, OpenAI Triton, ONNX, and distributed frameworks (Ray, Megatron-LM).
- Familiarity with performance analysis tools (Nsight Compute, nvprof, PyTorch Profiler).
- Background in IO subsystem microarchitecture and protocols (NVMe, PCIe, UCIe, CXL, NVLink).
- Experience with datacenter workload analysis, multi-core systems, and multi-thread interactions.
- Relevant certifications in performance engineering, AI/ML, or hardware architecture (preferred but not required).