Research Scientist - Foundation Model, Speech Understanding

Job in San Jose, Santa Clara County, California, 95111, USA
Listing for: ByteDance
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Artificial Intelligence, Data Scientist
Job Description & How to Apply Below
About the Team

Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society. With a long-term vision for the AI sector, the Seed team's research spans MLLM, Gen Media, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States.

To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios - including Doubao, Jimeng, TRAE, Dola and Dreamina - and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.

The mission of the Seed Speech team is to enrich interactive and creative processes through the application of multimodal speech technologies. The team focuses on the forefront of research and product development in speech and audio, music, natural language understanding, and multimodal deep learning.

Responsibilities

- Conduct research and development in speech/audio foundation models.

- Collaborate with cross-functional teams to identify key research areas and contribute to the development of innovative speech/audio models.

- Work with product development teams to integrate research findings into practical applications for ByteDance and other platforms.

- Collaborate on team-driven projects to address complex challenges and enhance the overall effectiveness of the research team.

Minimum Qualifications

- Master's or PhD in computer science, mathematics, engineering, or a related field.

- 3+ years of experience in one or more areas of machine learning and deep learning, including but not limited to: automatic speech recognition, automatic speech translation, speech/audio self-supervised learning and foundation models, speaker recognition and verification, speech emotion recognition, multimodal foundation models, and large language model pre-training and fine-tuning.

Preferred Qualifications

- Publications in accredited ML/DL venues such as NeurIPS, ICLR, ICML, and AAAI, and in speech venues such as ICASSP, ASRU, and Interspeech.

- Deep understanding of large language models.

- Familiarity with distributed computing and large-scale model training.

- Familiarity with deep learning frameworks such as TensorFlow and PyTorch.

- Familiarity with engineering principles and best practices.

- High competence in algorithms and programming; strong coding skills in C/C++ and Python.

- Ability to work collaboratively in fast-paced, multi-functional environments.