Student Researcher; Seed – Multimodal Interaction & World Model - RL Focused PhD

Job in San Jose, Santa Clara County, California, 95199, USA
Listing for: ByteDance
Apprenticeship/Internship position
Listed on 2026-06-02
Job specializations:
  • Engineering
    Computer Science
Salary/Wage Range or Industry Benchmark: USD 60.00 per hour
Job Description & How to Apply Below
Position: Student Researcher (Seed – Multimodal Interaction & World Model - RL Focused) – 2026 Start (PhD)

The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities and to advancing multimodal assistant products.

We are looking for talented PhD students to join us for a 2026 internship. PhD Internships aim to provide students with an opportunity to actively contribute to our products and research, and to the organization’s future plans and emerging technologies.

Responsibilities
  • Design and implement reinforcement learning (RL) training systems for large-scale multimodal foundation models.
  • Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning.
  • Explore RL-based approaches to bridge understanding and generation for multimodal visual reasoning.
  • Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation.
Minimum Qualifications
  • Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
  • Publications in accredited venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML.
  • Strong research background in at least one of the following: reinforcement learning, multimodal learning, video understanding, or vision-language modeling.
Preferred Qualifications
  • Experience with reinforcement learning in multimodal or interactive environments.
  • Familiarity with video generation or diffusion-based generative models.
  • Experience with large-scale model training (e.g., distributed training, curriculum learning, or memory-augmented transformers).
  • Solid programming and engineering skills, with experience building training or evaluation pipelines for ML models.
Eligibility

As a condition of employment, successful candidates must be able to establish authorization to work in the United States. This position does not provide sponsorship or any immigration-related benefits.

Compensation & Benefits

Hourly Rate: $60 per hour.
Interns have day-one access to health insurance, life insurance, wellbeing benefits, 10 paid holidays per year, and paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half). Interns who are not working 100% remotely may also be eligible for a housing allowance.
