Student Researcher; Seed - Multimodal Interaction & Model - RL Focused PhD Job, San Jose area, California, USA, Engineering

Position: Student Researcher (Seed - Multimodal Interaction & World Model - RL Focused) - 2026 Start (PhD)
The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aims to advance the exploration and development of multimodal assistant products. We are looking for talented individuals to join us for an internship in 2026. PhD internships at our Company give students the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies.

Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts. Applications will be reviewed on a rolling basis - we encourage you to apply early. Please state your availability clearly in your resume (Start date, End date).

Responsibilities:

- Design and implement reinforcement learning (RL) training systems for large-scale multimodal foundation models
- Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning
- Explore RL-based approaches to bridge understanding and generation for multimodal visual reasoning
- Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation

Minimum Qualifications:

- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Publications in accredited venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
- Strong research background in at least one of the following: reinforcement learning, multimodal learning, video understanding, or vision-language modeling

Preferred Qualifications:

- Experience with reinforcement learning in multimodal or interactive environments
- Familiarity with video generation or diffusion-based generative models
- Experience with large-scale model training (e.g., distributed training, curriculum learning, or memory-augmented transformers)
- Solid programming and engineering skills, with experience building training or evaluation pipelines for ML models

As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship or any immigration-related benefits.