Senior AI/ML Research Engineer; Computer Vision Job, Sunnyvale area, California, USA, Software Development

Position: Senior AI/ML Research Engineer (Computer Vision)
Company Description

It started with a simple idea: what if surgery could be less invasive and recovery less painful? Nearly 30 years later, that question still fuels everything we do. We are a global leader in robotic-assisted surgery and minimally invasive care, and our technologies, like the da Vinci surgical system and Ion, have transformed how care is delivered for millions of patients worldwide.

We're a team of engineers, clinicians, and innovators united by one purpose: to make surgery smarter, safer, and more human. Every day, our work helps care teams perform with greater precision and patients recover faster, improving outcomes around the world.

The problems we solve demand creativity, rigor, and collaboration. The work is challenging but deeply meaningful, because every improvement we make has the potential to change a life.

The Future Forward organization is Intuitive's advanced concepts group. We explore emerging technologies, prototype next-generation solutions, and build software experiences that shape the future of robotic-assisted surgery.

If you're ready to contribute to something bigger than yourself and help transform the future of healthcare, you'll find your purpose here.

Job Description

Primary Function of Position

We are building advanced augmented dexterity capabilities for next-generation robotic platforms. As a Senior AI/ML Research Engineer (Computer Vision), you will develop the perception models that let our Embodied-AI system understand the surgical scene. You will work within a hierarchical, multimodal stack, in which a high-level model interprets sensory observations into structured intent and a low-level policy turns that intent into precise, safe, real-time control. Your focus will be the vision layer: designing, training, and evaluating models that extract anatomy, instruments, actions, and surgical context from intraoperative video.

You will partner with the broader AI/ML team to define how perception feeds reasoning and control, and you will drive the research-to-deployment path for your models, taking them from offline experimentation to robust, real-time performance in the OR.

Working within Intuitive's Future Forward research organization, you will identify, build, and fine-tune the AI/ML models and algorithms that enable us to deliver safe and performant embodied AI systems. This role calls for someone who is equally comfortable getting hands-on with models and data and designing systems that scale.

Roles and Responsibilities

* Develop temporal models for activity and workflow understanding: event/state recognition and fine-grained temporal action segmentation.

* Benchmark in-house models against the state of the art and recommend the target perception architecture.

* Define the perception input/output specification and demonstrate offline feasibility on recorded data.

* Stand up a continuous-improvement loop (discrepancy flagging, active learning, human-in-the-loop relabeling) and the tooling/UI needed for offline evaluation and the path to real-time use.

* Partner with annotation and data teams to shape label taxonomies, QC, and the data pipeline that feeds the AI/ML models.

* Establish the path from offline evaluation on recorded data to real-time integration, including the continuous-improvement (human-in-the-loop) data loop.

* Partner with AI/ML researchers, robotics, data engineers, and other stakeholders to deliver a perception layer that enables rapid prototyping and learning while working toward a product solution.

Qualifications

Minimum Qualifications

* MS or PhD in CS, EE, Robotics, or a related field, with 5+ years of applied computer-vision research experience.

* Strong grasp of modern CV and deep-learning fundamentals: CNNs and vision transformers, segmentation, detection, tracking, and representation/self-supervised learning.

* Demonstrated work in video understanding, including temporal action segmentation, action/phase recognition, and video segmentation.

* Hands-on experience with modern video architectures, including video transformers and self-supervised video pretraining.

* Exposure to vision-action (VA) / vision-language-action (VLA) models and world-model / self-supervised predictive architectures (e.g., JEPA-style models, MAE, DINO) for learning visual representations and dynamics.

* Experience working with large, messy, real-world video datasets at scale.

* Strong software and experimentation skills in Python and C++, with proficiency in one or more of PyTorch/TensorFlow/JAX, and the ability to stand up clean, reproducible experiments and run the full loop (data curation, augmentation, loss design, metrics, error analysis).

* A research-and-prototyping mindset: comfortable working in ambiguity, framing open-ended problems, running rapid experiments, and reading and reproducing recent papers to pull promising techniques into practice.

* Sound judgment about the path from prototype to product: writing code others can build on, knowing when to optimize versus when to move fast, and thinking…
