Research Scientist Graduate (Multimodal Interaction and World Model) - PhD - San Jose area, California, USA - Engineering

Position: Research Scientist Graduate (Multimodal Interaction and World Model) - 2026 Start (PhD)

About the Team

Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society. With a long-term vision for the AI sector, the Seed team's research spans MLLM, Gen Media, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States.

To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios - including Doubao, Jimeng, TRAE, Dola and Dreamina - and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.

The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth.

Launch your career where inspiration is infinite. Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

Responsibilities

- Drive research and engineering to advance models that enhance understanding of multimodal data and expand reasoning capabilities.

- Explore research ideas that optimize both the model's performance and efficiency.

- Establish scaling laws, and design and conduct systematic ablations that yield transferable conclusions.

Minimum Qualifications

- Individuals who are completing or have recently completed a PhD degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline.

- Publications in accredited venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML.

- Strong research background in at least one of the following: reinforcement learning, multimodal learning, video understanding, or vision-language modeling.

Preferred Qualifications

- Expertise in Transformers (dense and MoE) and familiarity with scaling Transformers on GPUs or TPUs.

- Extensive hands-on experience with PyTorch/JAX and distributed training frameworks.

- Familiarity with state-of-the-art techniques for preparing multimodal training data.