Research Scientist - Foundation Model, Video Generation

Job in San Jose, Santa Clara County, California, 95111, USA
Listing for: ByteDance
Full Time position
Listed on 2026-02-17
Job specializations:
  • IT/Tech
    Artificial Intelligence, Machine Learning/ ML Engineer, AI Engineer, Data Scientist
Job Description:
With a long-term vision and a strong commitment to the AI field, the Team conducts research in a range of areas including natural language processing (NLP), computer vision (CV), and speech recognition and generation. It has labs and researcher roles in China, Singapore, and the US. By leveraging substantial data and computing resources and investing continuously in these domains, the team has built a proprietary general-purpose model with multimodal capabilities.

In the Chinese market, Doubao models power over 50 ByteDance apps and business lines, including Doubao, Coze, and Dreamina, and were launched to external enterprise clients through Volcano Engine. The Doubao app is the most used AIGC app in China.

About the Team

Welcome to the Doubao Vision team, where we spearhead multi-modality foundation models on visual understanding and visual generation.

Our mission is to solve the visual intelligence problem for AI. We conduct cutting-edge research in areas such as vision and language, large vision models, and generative foundation models. The team is a mix of experienced research scientists and engineers, aiming to advance the research boundaries in foundation models and apply our technologies to our rich application scenarios, creating a feedback loop that helps further improve our foundation technologies.

Join us in shaping the future of AI technologies and revolutionizing our product experience for global users.

Responsibilities

- Conduct cutting-edge research and development in foundation models and multimodal machine learning, especially in the areas of generative AI (e.g., image, video, or 3D generation). The primary objective is to research cutting-edge video generation technology through innovation.

- Develop foundation models to enhance the strategic advantages of ByteDance products.
- Explore new downstream products with artificial intelligence technology at their core.

Minimum Qualifications

- 1-3 years of research and practical experience in one or more areas of computer vision and machine learning.

- Hands-on coding experience with deep learning frameworks (e.g., PyTorch); large-scale training experience is preferred. Highly competent in algorithms and programming, with strong coding skills in Python.

- Ability to work and collaborate well with team members.

Preferred Qualifications

- Publications in top-tier venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, SIGGRAPH, or Multimedia.

- Experience in solving real-world machine learning technical bottlenecks.

- Experience in large-scale image and video processing and curation is preferred, particularly when it involves extensive work with foundation models.