Neural Network Optimization Engineer, City of London, England, UK (Software Development)

Location: City Of London

About Us

Founded in the US in 2022 and now based in London, UK, Recraft is an AI tool for professional designers, illustrators, and marketers, setting a new standard for excellence in image generation.

We designed a tool that lets creators quickly generate and iterate original images, vector art, illustrations, icons, and 3D graphics with AI. Over 3 million users across 200 countries have produced hundreds of millions of images using Recraft, and we’re just getting started.

Join a universe of professional opportunities, develop and support large‑scale projects, and shape the future of creativity. We are committed to making Recraft an essential, daily tool for every designer and setting the industry standard. Our mission is to ensure that creators can fully control their creative process with AI, providing them with innovative tools to turn ideas into reality.

If you’re passionate about pushing the boundaries of AI, we want you on board!

Job Description

We are seeking an experienced Neural Network Optimization Engineer who will specialize in enhancing the performance, latency, and throughput of neural network inference workflows. The ideal candidate will have substantial hands‑on experience optimizing inference workloads using technologies such as TensorRT, Triton language, and model quantization techniques. You will collaborate closely with ML researchers to ensure that our machine learning models run at peak efficiency and reliability in production environments.

Key Responsibilities

Optimize neural network models for inference performance and latency reduction.
Implement model quantization methods (e.g., INT8, FP8) to maximize computational efficiency.
Benchmark, analyze, and improve inference performance on targeted hardware platforms.
Collaborate with ML researchers to deploy optimized models in production environments.
Stay updated with the latest developments in model optimization, inference engines, quantization methods, and related technologies.

Requirements

Proven professional experience optimizing neural network inference workloads.
Strong expertise with TensorRT, Triton language, and CUDA programming.
Experience with neural network quantization techniques.
Proficiency in Python and PyTorch.
Deep understanding of GPU architectures and performance optimization.
Excellent problem‑solving skills and ability to analyze performance bottlenecks.

What We Offer

Competitive salary.
We’re able to offer Skilled Worker visa sponsorship in the UK for qualified candidates.
Opportunities for professional growth and development.
A collaborative and user‑focused work environment.
The chance to shape the future of AI‑powered creativity through research.
Exciting projects where your insights will directly impact product development.
