Software Inference Deployment Engineer

Job in Portland, Cumberland County, Maine, 04122, USA
Listing for: LUMAI
Full Time position
Listed on 2026-06-26
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: 90,000 - 120,000 USD per year
Job Description & How to Apply Below

The Opportunity

Lumai is redefining how the world computes. We are an ambitious, venture-backed UK startup pioneering a breakthrough AI accelerator for data centers that uses 3D optical compute. Our radical technology uses light to perform computation orders of magnitude faster and at far greater scale than ever before, all whilst consuming far less energy than traditional approaches.

Lumai is unlocking performance and efficiency gains that could transform the economics of AI and compute infrastructure and reshape how intelligence scales globally.

If you are passionate about bringing groundbreaking technology to market, and want to be part of a team pushing the boundaries of what is physically possible, Lumai is where you can make it happen.

About Lumai

Founded in 2022, Lumai is a University of Oxford spinout using optical processing to accelerate large language models (LLMs) and other transformer-based AI systems. The team combines expertise in optical computing, machine learning, and physics.

Lumai has already secured over $15 million in funding from leading deep-tech investors such as Constructor Capital, IP Group and Photon Ventures, as well as government grants, and is scaling rapidly to deploy the fastest optical compute currently available globally.

The Role

We are bringing the world's first optical AI compute platform to market. As we move from development into field deployment, we are looking for a Software Inference Deployment Engineer to own the software-side integration and customer support of Lumai Iris servers in third-party data centre environments.

You will begin by working alongside our software and engineering teams - helping integrate the Iris software stack, supporting model onboarding through the toolchain, and getting hands‑on with the disaggregated prefill/decode runtime. This is intentional: the best way to develop deep expertise in a novel platform is to build with it. As deployments go live, you will take ownership in the field - supporting customer integration into their inference stacks, troubleshooting software issues, and acting as a primary technical contact for customer ML and infrastructure engineering teams.

This is an opportunity to work at the cutting edge of efficient AI inference - deploying a genuinely novel compute platform into production for the first time, and playing a central role in how it reaches the world.

What You’ll Do
  • Work alongside Lumai's software and engineering teams to integrate, test, and harden the Iris software stack ahead of deployment
  • Support model onboarding through the Iris toolchain - loading, conversion, and framework integration
  • Develop hands‑on familiarity with the disaggregated prefill/decode runtime, including how Iris servers operate alongside decode processors
  • Support customer integration of Lumai Iris into their own frameworks
  • Own software‑side troubleshooting in the field, acting as the first line of response post‑deployment
  • Train and enable customer ML and infrastructure engineering teams on the Iris software platform
  • Feed field findings, integration issues, and customer feedback back into product and engineering

What We’re Looking For

Must‑Have

  • Hands‑on software engineering experience in AI infrastructure, inference serving, accelerator integration, or comparable deep‑tech hardware‑software environments
  • Strong Python skills and familiarity with major ML frameworks (PyTorch in particular)
  • Practical experience with model deployment workflows – loading, format conversion, quantisation, or framework integration
  • Comfortable working with inference serving stacks (for example vLLM, TensorRT‑LLM, or similar)
  • Familiarity with Linux, containerisation (Docker), and cluster environments
  • Comfortable in a customer‑facing role, able to communicate clearly with ML and infrastructure engineering teams
  • Comfortable working in a fast‑moving, early‑stage environment where the product and the deployment approach are both still being developed

Strong Preference For

  • Experience integrating accelerator hardware (GPUs, FPGAs, ASICs, NPUs, or novel architectures) into customer inference workflows
  • Familiarity with the NVIDIA inference stack – CUDA, TensorRT, Triton
  • Exposur…