Gen AI Audio Researcher
Job in
Greater London, London, Greater London, W1B, England, UK
Listed on 2026-02-09
Listing for:
DNEG
Full Time position
Job specializations:
- IT/Tech
- AI Engineer, Artificial Intelligence, Data Scientist, Machine Learning/ML Engineer
Job Description & How to Apply Below
Overview
We are looking for a Gen AI Researcher for Audio to join our team and help develop next-generation voice synthesis models. You'll research and build deep learning systems that can generate expressive, natural-sounding speech from text or audio prompts, and collaborate with cross-functional teams to integrate your work into production-ready pipelines. We are hiring remotely across the EMEA region.
Responsibilities
- Research and develop state-of-the-art voice synthesis models (e.g., TTS, voice cloning, speech-to-speech).
- Build and fine-tune models using frameworks like PyTorch and Hugging Face.
- Design training pipelines and datasets for scalable voice model training.
- Explore techniques for emotional expressiveness, multilingual synthesis, and speaker adaptation.
- Work closely with product and creative teams to ensure models meet quality and production constraints.
- Stay on top of academic and industrial trends in speech synthesis and related fields.
Qualifications
- Strong background in machine learning and deep learning, with a focus on speech/audio.
- Hands-on experience with TTS, voice cloning, or related voice synthesis tasks.
- Proficiency with Python and PyTorch; experience with libraries like torchaudio, ESPnet, or similar.
- Experience training models at scale and working with large audio datasets.
- Familiarity with vocoders and transformer-based architectures.
- Strong problem-solving skills and the ability to work autonomously in a remote-first environment.
- PhD in Computer Science/Machine Learning and publications in top venues.
- Contributions to open-source speech research or participation in relevant benchmarks.
- Familiarity with adjacent areas like lip-syncing, audio-driven animation, or expressive speech control.
- Experience with voice datasets or proprietary pipelines.