VLM Data Science Expert Job, San Ramon area, California, USA, IT/Tech

Overview

At Citius Tech, we constantly strive to solve the industry’s greatest challenges with technology, creativity, and agility. With over 8,500 healthcare technology professionals worldwide, Citius Tech powers healthcare digital innovation, business transformation, and industry-wide convergence for over 140 organizations through next-generation technologies, solutions, and products. We aim to accelerate the transition to a human-first, sustainable, and digital healthcare ecosystem with the world’s leading healthcare and life sciences organizations and our partners.

Here is an opportunity for you to make a difference and collaborate with global leaders to shape the future of healthcare and positively impact human lives.

Our vision: To inspire new possibilities for the health ecosystem with technology and human ingenuity.

What is in it for you?

If you’re a Senior Data Scientist with a strong background in Vision-Language Models (VLMs), this is a chance to lead the charge in building smart, scalable multimodal AI solutions. We’re looking for someone who has worked hands-on with cutting-edge frameworks like VILA, Isaac, and VSS, and who knows how to take models from concept to production in real-world settings. If you’ve got experience in healthcare, especially with medical devices, that’s a big plus.

You’ll be diving into the latest VLM techniques and deploying them on cloud platforms like AWS, helping shape the future of AI in a meaningful, impactful way.

Key Responsibilities

Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications including image captioning, visual search, document understanding, pose understanding, and pose comparison.
Develop and manage Digital Twin frameworks using AWS IoT TwinMaker, SiteWise, and Greengrass to simulate and optimize real-world systems.
Develop Digital Avatars using AWS services integrated with 3D rendering engines, animation pipelines, and real-time data feeds.
Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference.
Implement scalable pipelines for training and testing VLMs on cloud platforms, using AWS services such as SageMaker, Bedrock, Rekognition, Comprehend, and Textract.

NVIDIA Platforms

Develop a blend of technical expertise, tool proficiency, and domain-specific knowledge on the following NVIDIA platforms:
NeMo Framework: training and scaling VLMs across thousands of GPUs.
DeepStream SDK: integrating pose models like TRTPose and OpenPose, real-time video analytics, and multi-stream processing.

Multimodal AI Solutions

Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
Leverage interleaved image-text datasets and advanced techniques (e.g., cross-attention layers) to enhance model performance.

Image Processing and Computer Vision

Develop solutions that integrate vision-based deep learning models for applications like live video stream integration and processing, object detection, image segmentation, pose estimation, object tracking, image classification, and defect detection on medical X-ray images.
Apply knowledge of real-time video analytics, multi-camera tracking, and object detection.
Train and test deep learning models on custom data.

Healthcare Domain Expertise (Nice to Have)

While it’s not a must, having experience in the healthcare space—especially with medical imaging, motion detection, or patient monitoring—can be a big advantage.
You’ll be applying Vision-Language Models to use cases like analyzing scans, detecting positioning and movement, and making precise measurements.
If you’re familiar with healthcare standards and know how to handle sensitive data responsibly, that’s a definite plus.
Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
Benchmark different VLMs (e.g., GPT-4V, Claude 3.5, Nova Lite) for…

