Principal Performance Engineer Lead
Cambridge, Middlesex County, Massachusetts, 02140, USA
Listed on 2026-04-23
Engineering: Systems Engineer, Software Engineer, AI Engineer, Data Engineer
IT/Tech: Systems Engineer, AI Engineer, Data Engineer
Job Description
Join the Akamai Inference Cloud Team! The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design and operate AI platforms that enable customers to run models with unmatched performance, compliance, and economics. The Model Intelligence & Lifecycle team owns the end-to-end model lifecycle from validation and security scanning through quantization, optimization, and monitoring, ensuring every model meets rigorous standards for quality, safety, and performance.
Responsibilities
As an ML Performance Engineer Principal Lead, you will be responsible for:
- Applying and evaluating quantization, distillation, and pruning techniques to optimize model performance while preserving accuracy
- Designing hardware-aware model placement and scheduling strategies to match models with optimal compute resources
- Implementing and tuning speculative decoding, KV‑cache optimization, and batching strategies to improve inference throughput and latency
- Building benchmarking and profiling pipelines to measure model‑layer performance across architectures, hardware, and serving configurations
- Mentoring and guiding engineers on the team through code reviews, design discussions, and technical problem‑solving
- Collaborating with hardware performance engineers to identify and resolve end-to-end performance bottlenecks across the inference stack
To be successful in this role you will:
- Have 12+ years of relevant experience with a Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field
- Possess hands‑on experience optimizing LLM inference performance (quantization, speculative decoding, model compression, etc.)
- Have a solid understanding of transformer architectures and how design choices impact latency, throughput, and accuracy
- Be experienced with inference serving frameworks such as vLLM, TensorRT‑LLM, Triton, or similar systems
- Be proficient in Python and C++ with experience profiling and optimizing compute-intensive workloads
- Have familiarity with hardware‑aware optimization, including GPU/accelerator scheduling and memory management trade‑offs
Flex Base, Akamai's Global Flexible Working Program, offers employees the choice to work from home, the office, or a hybrid arrangement in the country advertised, enabling high flexibility and supporting remote talent worldwide.
Benefits
At Akamai, we provide benefits that support all aspects of life:
- Your health
- Your finances
- Your family
- Your time at work
- Your time pursuing other endeavors
Akamai Technologies is an affirmative action, equal opportunity employer that values the strength that diversity brings to the workplace. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of gender, gender identity, sexual orientation, race/ethnicity, protected veteran status, disability, or other protected group status.
Compensation
Akamai is committed to fair and equitable compensation practices. For US-based candidates only, the base salary for this position ranges from $169,300 to $304,700 per year, dependent on experience, skills, and location. Compensation may also include incentive bonuses, equity awards, and an Employee Stock Purchase Plan. Akamai provides industry-leading benefits including healthcare, a 401(k) plan, company holidays, PTO, sick time, parental leave, and an employee assistance program focusing on mental and financial wellness.