AI Performance Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Graphcore
Full Time position
Listed on 2026-05-29
Job specializations:
  • Software Development
    AI Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description

Requirements

  • BS/MS in Computer Science, Electrical Engineering, or related field
  • Experience with distributed systems and communication libraries (MPI, NCCL, UCX, libfabric)
  • Strong programming skills in C++ and Python
  • Experience profiling and optimizing HPC or AI/ML workloads
  • Familiarity with ML benchmarks such as MLPerf
  • (Desirable) Experience with GPUs or accelerated computing architectures
  • (Desirable) Knowledge of HPC networking and interconnect technologies (InfiniBand, RoCE)
  • (Desirable) Familiarity with ML frameworks such as PyTorch or TensorFlow
  • (Desirable) Understanding of ARM architectures and toolchains
  • (Desirable) Strong debugging, profiling, and performance optimization skills
What the job involves
  • Graphcore’s AI/ML training and inference infrastructure is rapidly scaling to meet the growing demands of AI workloads across mobile, edge, and datacenter environments
  • This role focuses on optimizing performance across ARM-based architectures and large-scale distributed systems, ensuring efficiency, scalability, and reliability across the full hardware-software stack
  • The System Engineering Performance team architects and optimizes high-performance infrastructure for large-scale datacenter deployments. The team works across hardware, software, networking, and system architecture to deliver cutting-edge AI solutions and ensure optimal system performance at scale
  • Analyze ML models’ compute and memory requirements using roofline analysis and simulations
  • Collaborate across hardware and software teams to optimize large-scale AI workloads
  • Benchmark, monitor, and troubleshoot system performance across distributed systems
  • Optimize communication stacks including MPI, NCCL, UCX, RDMA, and networking fabrics
  • Profile and optimize AI workloads, focusing on performance bottlenecks
  • Develop high-quality, ARM-compatible code and documentation