AI Performance Engineer Job, San Francisco area, California, USA, Software Development

Requirements

BS/MS in Computer Science, Electrical Engineering, or related field
Experience with distributed systems and communication libraries (MPI, NCCL, UCX, libfabric)
Strong programming skills in C++ and Python
Experience profiling and optimizing HPC or AI/ML workloads
Familiarity with ML benchmarks such as MLPerf
(Desirable) Experience with GPUs or accelerated computing architectures
(Desirable) Knowledge of HPC networking and interconnect technologies (InfiniBand, RoCE)
(Desirable) Familiarity with ML frameworks such as PyTorch or TensorFlow
(Desirable) Understanding of ARM architectures and toolchains
(Desirable) Strong debugging, profiling, and performance optimization skills

What the job involves

Graphcore’s AI/ML training and inference infrastructure is rapidly scaling to meet the growing demands of AI workloads across mobile, edge, and datacenter environments.
This role focuses on optimizing performance across ARM-based architectures and large-scale distributed systems, ensuring efficiency, scalability, and reliability across the full hardware-software stack.
The System Engineering Performance team architects and optimizes high-performance infrastructure for large-scale datacenter deployments. The team works across hardware, software, networking, and system architecture to deliver cutting-edge AI solutions and ensure optimal system performance at scale.
Analyze ML models’ compute and memory requirements using roofline analysis and simulations
Collaborate across hardware and software teams to optimize large-scale AI workloads
Benchmark, monitor, and troubleshoot system performance across distributed systems
Optimize communication stacks including MPI, NCCL, UCX, RDMA, and networking fabrics
Profile and optimize AI workloads, focusing on performance bottlenecks
Develop high-quality, ARM-compatible code and documentation
