Senior AI Systems Performance Engineer

Job in Palo Alto, Santa Clara County, California, 94306, USA
Listing for: SambaNova
Full Time position
Listed on 2026-03-15
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 60000 - 80000 USD Yearly
Job Description & How to Apply Below

SambaNova Systems employs some of the greatest minds and talent in AI and machine learning. If you’d like to help lead the next generation of AI computing, we want to hear from you.

Senior AI Systems Performance Engineer Palo Alto, California, United States

The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

About the role

We are seeking a talented and driven ML performance engineer to optimize and scale state-of-the-art foundation models on SambaNova's reconfigurable dataflow platform. You'll work hands-on with some of the most advanced models in the world, such as DeepSeek R1, GPT OSS, and other frontier architectures, to push the limits of throughput, latency, and efficiency. In this role, you'll bridge the gap between deep learning and systems performance, collaborating across compiler, runtime, and hardware layers to deliver world-record performance for large-scale AI inference.

Responsibilities
  • Bring up and optimize cutting-edge foundation models (e.g., DeepSeek, Llama, Qwen, and others) on the SambaNova platform through the SambaNova software stack.
  • Profile and enhance model performance across compiler, runtime, and hardware layers to achieve SOTA throughput and latency.
  • Collaborate with machine learning, compiler, runtime, and hardware teams to deliver co-designed, high-performance AI applications.
  • Integrate the latest advances in model architecture, quantization, scheduling, and memory optimization from both academia and industry.
  • Develop robust, scalable, and efficient end-to-end inference solutions aligned with customer needs.
  • Identify performance bottlenecks and propose dataflow or scheduling optimizations for both single-node and distributed systems.
Basic Qualifications
  • Bachelor's or higher degree in computer science, electrical engineering, or a related field (e.g., applied mathematics, physics, or statistics).
  • 3+ years of experience in one or more of the following areas:
    • Deep learning model development and performance optimization
    • Compiler, runtime, or kernel-level optimization
    • Software–hardware co-design or systems performance tuning
  • Proficiency in Python or C++, with strong foundations in algorithms, data structures, and numerical computing.
  • Experience with at least one major ML framework: PyTorch, TensorFlow, or JAX.
  • Demonstrated ability to analyze and optimize performance in real-world ML pipelines.
Preferred Qualifications
  • Hands-on experience with LLM or multimodal model training and inference.
  • Background in large-scale distributed training, continuous batching, and high-throughput inference systems.
  • Familiarity with quantization, graph optimization, kernel fusion, and model partitioning.
  • Experience with frameworks such as DeepSpeed, Megatron, vLLM, or TensorRT.
  • Strong GPU programming skills (CUDA, Triton, or OpenCL); experience with cuDNN, cuBLAS, or similar libraries is a plus.
  • Knowledge of memory hierarchy optimization, caching, and scheduling for large-scale model execution.
  • Publication record or open-source contributions in ML systems or performance optimization is a plus.

Submission Guidelines
Please note that in order to be considered an applicant for any position at SambaNova Systems, you must submit an application form for each position for which you believe you are qualified.

EEO Policy
SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive…

Position Requirements
10+ Years work experience