Student Researcher; Seed - Multimodal Interaction & Model - RL Focused PhD
Job in San Jose, Santa Clara County, California, 95111, USA
Listed on 2026-06-02
Listing for: ByteDance
Apprenticeship/Internship position
Job specializations:
- Engineering
- Computer Science, Artificial Intelligence
Job Description & How to Apply Below
The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team also aims to advance the exploration and development of multimodal assistant products. We are looking for talented individuals to join us for an internship in 2026.
PhD internships at our Company provide students with the opportunity to actively contribute to our products and research, and to the organization's future plans and emerging technologies. Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts. Applications will be reviewed on a rolling basis, so we encourage you to apply early. Please state your availability clearly in your resume (start date, end date).
Responsibilities:
- Design and implement reinforcement learning (RL) training systems for large-scale multimodal foundation models
- Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning
- Explore RL-based approaches to bridge understanding and generation for multimodal visual reasoning
- Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation
Minimum Qualifications:
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Publications in accredited venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
- Strong research background in at least one of the following: reinforcement learning, multimodal learning, video understanding, or vision-language modeling
Preferred Qualifications:
- Experience with reinforcement learning in multimodal or interactive environments
- Familiarity with video generation or diffusion-based generative models
- Experience with large-scale model training (e.g., distributed training, curriculum learning, or memory-augmented transformers)
- Solid programming and engineering skills, with experience building training or evaluation pipelines for ML models
As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship or any immigration-related benefits.