Research Scientist - Foundation Model, Speech Understanding
Job in San Jose, Santa Clara County, California, 95111, USA
Listed on 2026-06-02
Listing for: ByteDance
Full Time position
Job specializations:
- IT/Tech
- AI Engineer (Applied/Software), Machine Learning/ML Engineer, Artificial Intelligence, Data Scientist
Job Description
To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers more than 50 application scenarios, including Doubao, Jimeng, TRAE, Dola, and Dreamnia, and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao app ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.
The mission of the Seed Speech team is to enrich interactive and creative processes through the application of multimodal speech technologies. The team focuses on the forefront of research and product development in speech and audio, music, natural language understanding, and multimodal deep learning.
Responsibilities
- Conduct research and development in speech/audio foundation models.
- Collaborate with cross-functional teams to identify key research areas and contribute to the development of innovative speech/audio models.
- Work with product development teams to integrate research findings into practical applications for ByteDance and other platforms.
- Collaborate on team-driven projects to address complex challenges and enhance the overall effectiveness of the research team.
Minimum Qualifications
- Master's or PhD in computer science, mathematics, engineering, or a related field.
- 3+ years of experience in one or more areas of machine learning and deep learning, including but not limited to: automatic speech recognition, automatic speech translation, speech/audio self-supervised learning and foundation models, speaker recognition and verification, speech emotion recognition, multimodal foundation models, and large language model pre-training and fine-tuning.
Preferred Qualifications
- Publications in accredited ML/DL venues such as NeurIPS, ICLR, ICML, and AAAI, and in speech venues such as ICASSP, ASRU, and Interspeech.
- Deep understanding of large language models.
- Familiarity with distributed computing and large-scale model training.
- Familiarity with deep learning frameworks such as TensorFlow and PyTorch.
- Familiarity with engineering principles and best practices.
- Highly competent in algorithms and programming, with strong coding skills in C/C++ and Python.
- Ability to work collaboratively in a fast-paced, multi-functional environment.