Research Scientist - Foundation Model, Video Generation Job, San Jose area, California, USA, IT/Tech

With a long-term vision and a strong commitment to the AI field, the Team conducts research in a range of areas including natural language processing (NLP), computer vision (CV), and speech recognition and generation. It has labs and researcher roles in China, Singapore, and the US. Leveraging substantial data and computing resources and through continued investment in these domains, our team has built a proprietary general-purpose model with multimodal capabilities.

In the Chinese market, Doubao models power over 50 ByteDance apps and business lines, including Doubao, Coze, and Dreamina, and were launched to external enterprise clients through Volcano Engine. The Doubao app is the most used AIGC app in China.

About the Team

Welcome to the Doubao Vision team, where we spearhead multi-modality foundation models for visual understanding and visual generation.

Our mission is to solve the visual intelligence problem for AI. We conduct cutting-edge research in areas such as vision and language, large vision models, and generative foundation models. The team is a mix of experienced research scientists and engineers who aim to advance the research boundaries of foundation models and apply our technologies to our rich application scenarios, creating a feedback loop that helps further improve our foundation technologies.

Join us in shaping the future of AI technologies and revolutionizing our product experience for global users.

Responsibilities

- Conduct cutting-edge research and development in foundation models and multimodal machine learning, especially in the areas of generative AI (e.g., image, video, or 3D generation). The primary objective is to research cutting-edge video generation technology through innovation.

- Develop foundation models to strengthen the strategic advantages of ByteDance products.
- Explore new downstream products with artificial intelligence technology at their core.

Minimum Qualifications

- 1-3 years of research and practical experience in one or more areas of computer vision and machine learning.

- Hands-on coding experience in deep learning frameworks (e.g., PyTorch); large-scale training experience is preferred. Highly competent in algorithms and programming, with strong coding skills in Python.

- Ability to work and collaborate well with team members.

Preferred Qualifications

- Publications in top-tier venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, SIGGRAPH, or Multimedia.

- Experience in solving real-world machine learning technical bottlenecks.

- Experience in large-scale image and video processing and curation is preferred, particularly when it involves extensive work with foundation models.

