Student Researcher (Seed - Multimodal Interaction & World Model - Unified Model) - PhD
Location: San Jose, Santa Clara County, California, 95111, USA
Company: ByteDance
Position type: Apprenticeship/Internship
Listed on: 2026-02-17
Job specializations: Engineering; Research Scientist, Artificial Intelligence
Job Description
The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aims to advance the exploration and development of multimodal assistant products.
Responsibilities:
- Develop and evaluate unified modeling architectures for multimodal foundation models across vision, audio, and language
- Contribute to building a shared representation space that supports both generation and understanding tasks
- Explore architectural and optimization strategies to improve generalization across modalities and tasks
- Collaborate with researchers working on generation, reasoning, and world modeling to scale and adapt models for real-world scenarios
Minimum Qualifications:
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
- Strong research background in at least one of the following: generative modeling (e.g., diffusion models, transformers), multimodal learning, or representation learning
- Solid engineering and modeling skills, with experience building and training large-scale ML systems
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Preferred Qualifications:
- Experience in building or training models for both generative and discriminative tasks
- Familiarity with joint modeling strategies (e.g., multitask learning, contrastive alignment, autoregressive decoding for understanding)
- Background in video generation, vision-language pretraining, or instruction-conditioned generation
- Interest in long-context modeling, memory architectures, or world modeling tasks