Principal Performance Engineer Lead, Cambridge, Massachusetts, USA (Engineering)

Job Description

Join the Akamai Inference Cloud Team! The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design and operate AI platforms that enable customers to run models with unmatched performance, compliance, and economics. The Model Intelligence & Lifecycle team owns the end-to-end model lifecycle from validation and security scanning through quantization, optimization, and monitoring, ensuring every model meets rigorous standards for quality, safety, and performance.

Responsibilities

As an ML Performance Engineer Principal Lead, you will be responsible for:

Applying and evaluating quantization, distillation, and pruning techniques to optimize model performance while preserving accuracy
Designing hardware-aware model placement and scheduling strategies to match models with optimal compute resources
Implementing and tuning speculative decoding, KV‑cache optimization, and batching strategies to improve inference throughput and latency
Building benchmarking and profiling pipelines to measure model‑layer performance across architectures, hardware, and serving configurations
Mentoring and guiding engineers on the team through code reviews, design discussions, and technical problem‑solving
Collaborating with hardware performance engineers to identify and resolve end-to-end performance bottlenecks across the inference stack

Qualifications

To be successful in this role you will:

Have 12+ years of relevant experience and a Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field
Possess hands‑on experience optimizing LLM inference performance (quantization, speculative decoding, model compression, etc.)
Have a solid understanding of transformer architectures and how design choices impact latency, throughput, and accuracy
Be experienced with inference serving frameworks such as vLLM, TensorRT-LLM, Triton, or similar systems
Be proficient in Python and C++ with experience profiling and optimizing compute-intensive workloads
Have familiarity with hardware‑aware optimization, including GPU/accelerator scheduling and memory management trade‑offs

Work Arrangement

Flex Base, Akamai's Global Flexible Working Program, offers employees the choice to work from home, the office, or a hybrid arrangement in the country advertised, enabling high flexibility and supporting remote talent worldwide.

Benefits

At Akamai, we provide benefits that support all aspects of life:

Your health
Your finances
Your family
Your time at work
Your time pursuing other endeavors

EEO Statement

Akamai Technologies is an affirmative action, equal opportunity employer that values the strength that diversity brings to the workplace. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of gender, gender identity, sexual orientation, race/ethnicity, protected veteran status, disability, or other protected group status.

Compensation

Akamai is committed to fair and equitable compensation practices. For US-based candidates only, the base salary for this position ranges from $169,300 to $304,700 per year, dependent on experience, skills, and location. Compensation may also include incentive bonuses, equity awards, and an Employee Stock Purchase Plan. Akamai provides industry-leading benefits including healthcare, a 401(k) plan, company holidays, PTO, sick time, parental leave, and an employee assistance program focusing on mental and financial wellness.
