Student Researcher [Seed - Multimodal Interaction & World Model - Unified Model] (PhD)

Job in San Jose, Santa Clara County, California, 95111, USA
Listing for: ByteDance
Apprenticeship/Internship position
Listed on 2026-02-17
Job specializations:
  • Engineering
    Research Scientist, Artificial Intelligence
Job Description & How to Apply Below
Position: Student Researcher [Seed - Multimodal Interaction & World Model - Unified Model] - 2026 Start (PhD)
The Seed Multimodal Interaction and World Model team is dedicated to developing models that boast human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

- Develop and evaluate unified modeling architectures for multimodal foundation models across vision, audio, and language
- Contribute to building a shared representation space that supports both generation and understanding tasks
- Explore architectural and optimization strategies to improve generalization across modalities and tasks
- Collaborate with researchers working on generation, reasoning, and world modeling to scale and adapt models for real-world scenarios

Minimum Qualifications:

- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
- Strong research background in at least one of the following: generative modeling (e.g., diffusion models, transformers), multimodal learning, or representation learning
- Solid engineering and modeling skills, with experience building and training large-scale ML systems
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred Qualifications:

- Experience in building or training models for both generative and discriminative tasks
- Familiarity with joint modeling strategies (e.g., multitask learning, contrastive alignment, autoregressive decoding for understanding)
- Background in video generation, vision-language pretraining, or instruction-conditioned generation
- Interest in long-context modeling, memory architectures, or world modeling tasks