Student Researcher, Seed - Multimodal Interaction & World Model - Unified Model (PhD), San Jose area, California, USA, Engineering

Position: Student Researcher [Seed - Multimodal Interaction & World Model - Unified Model] - 2026 Start (PhD)
The Seed Multimodal Interaction and World Model team is dedicated to developing models that boast human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

Responsibilities:

- Develop and evaluate unified modeling architectures for multimodal foundation models across vision, audio, and language
- Contribute to building a shared representation space that supports both generation and understanding tasks
- Explore architectural and optimization strategies to improve generalization across modalities and tasks
- Collaborate with researchers working on generation, reasoning, and world modeling to scale and adapt models for real-world scenarios

Minimum Qualifications:

- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
- Strong research background in at least one of the following: generative modeling (e.g., diffusion models, transformers), multimodal learning, or representation learning
- Solid engineering and modeling skills, with experience building and training large-scale ML systems
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred Qualifications:

- Experience in building or training models for both generative and discriminative tasks
- Familiarity with joint modeling strategies (e.g., multitask learning, contrastive alignment, autoregressive decoding for understanding)
- Background in video generation, vision-language pretraining, or instruction-conditioned generation
- Interest in long-context modeling, memory architectures, or world modeling tasks

