Senior Research Scientist; Multimodal Language Model - PICO

Job in San Jose, Santa Clara County, California, 95111, USA
Listing for: ByteDance
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Data Scientist, Machine Learning/ ML Engineer, Artificial Intelligence
  • Engineering
    AI Engineer (Applied/Software), Artificial Intelligence
Job Description & How to Apply Below
Position: Senior Research Scientist (Multimodal Large Language Model) - PICO
About the Team
PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions cover cutting-edge fields including multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans: MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation, encompassing both fundamental research breakthroughs and industrial-grade solution deployment.

Responsibilities:
1. Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other multimodal information, including model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
2. Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to proficiently utilize spatial interaction and spatial computing-related professional tools, support tool calls for both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
3. Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
4. Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
5. Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.

Minimum Qualifications
1. Master's or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, or a related quantitative field.
2. Expertise in multimodal large model pre-training, post-training, fine-tuning, or cross-modal fusion technologies, with hands-on experience in model optimization, training workflow design, and performance tuning.
3. Proven research experience in LLM tool use, reinforcement learning, LLM agents, or interactive learning, with a deep understanding of single-turn and multi-turn interaction mechanisms.
4. Proficiency in core 2D/3D computer vision tasks, including detection, segmentation, depth estimation, image matching, and 3D scene perception.
5. Skilled in Python and C++, with solid programming capabilities and experience in developing large-scale models using mainstream deep learning frameworks (PyTorch/TensorFlow).
6. Excellent problem-solving and independent research abilities, capable of addressing complex technical challenges in the integration of MR and MLLM tool use.

Preferred Qualifications
1. Publications in AI/ML/CV conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP) focusing on multimodal large models, LLM tool use, or agent systems.
2. Hands-on experience in building large-scale MLLM training pipelines, tool-use evaluation systems, or multimodal agent platforms.
3. Familiarity with MR/AR/VR technologies, spatial computing, or 3D scene reconstruction (3DGS, NeRF, etc.) is a strong plus.
4. Experience in addressing long-horizon reasoning or asynchronous agent behavior challenges is highly valued.
5. Award winners of competitions such as ACM-ICPC, NOI/IOI, Topcoder, or AI/ML contests (e.g., Kaggle) are preferred.
6. Strong collaboration and communication skills, able to lead research initiatives and drive cross-team technical alignment.
Position Requirements
10+ years of work experience