VLM Data Science Expert

Job in San Ramon, Contra Costa County, California, 94583, USA
Listing for: CitiusTech
Full Time position
Listed on 2025-12-01
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer, Data Analyst, Data Scientist
Job Description & How to Apply Below

Overview

At CitiusTech, we constantly strive to solve the industry’s greatest challenges with technology, creativity, and agility. With over 8,500 healthcare technology professionals worldwide, CitiusTech powers healthcare digital innovation, business transformation, and industry-wide convergence for over 140 organizations through next-generation technologies, solutions, and products. We aim to accelerate the transition to a human-first, sustainable, and digital healthcare ecosystem with the world’s leading healthcare and life sciences organizations and our partners.

Here is an opportunity for you to make a difference and collaborate with global leaders to shape the future of healthcare and positively impact human lives.

Our vision: To inspire new possibilities for the health ecosystem with technology and human ingenuity.

What is in it for you?

If you’re a Senior Data Scientist with a strong background in Vision-Language Models (VLMs), this is a chance to lead the charge in building smart, scalable multimodal AI solutions. We’re looking for someone who has worked hands-on with cutting-edge frameworks like VILA, Isaac, and VSS, and who knows how to take models from concept to production in real-world settings. If you’ve got experience in healthcare, especially with medical devices, that’s a big plus.

You’ll be diving into the latest VLM techniques and deploying them on cloud platforms like AWS, helping shape the future of AI in a meaningful, impactful way.

Key Responsibilities
  • Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications including image captioning, visual search, document understanding, pose understanding, and pose comparison.
  • Develop and manage Digital Twin frameworks using AWS IoT TwinMaker, SiteWise, and Greengrass to simulate and optimize real-world systems.
  • Develop Digital Avatars using AWS services integrated with 3D rendering engines, animation pipelines, and real-time data feeds.
  • Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference.
  • Implement scalable pipelines for training and testing VLMs on cloud platforms, using AWS services such as SageMaker, Bedrock, Rekognition, Comprehend, and Textract.
NVIDIA Platforms
  • Develop a blend of technical expertise, tool proficiency, and domain-specific knowledge on the following NVIDIA platforms:
  • NeMo Framework:
    Training and scaling VLMs across thousands of GPUs.
  • DeepStream SDK:
    Integrating pose models such as TRTPose and OpenPose; real-time video analytics and multi-stream processing.
Multimodal AI Solutions
  • Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
  • Leverage interleaved image-text datasets and advanced techniques (e.g., cross-attention layers) to enhance model performance.
Image Processing and Computer Vision
  • Develop solutions that integrate vision-based deep learning models for applications such as live video stream integration and processing, object detection, image segmentation, pose estimation, object tracking, image classification, and defect detection on medical X-ray images.
  • Knowledge of real-time video analytics, multi-camera tracking, and object detection.
  • Train and test deep learning models on custom data.
Healthcare Domain Expertise (Nice to Have)
  • While it’s not a must, having experience in the healthcare space—especially with medical imaging, motion detection, or patient monitoring—can be a big advantage.
  • You’ll be applying Vision-Language Models to use cases like analyzing scans, detecting positioning and movement, and making precise measurements.
  • If you’re familiar with healthcare standards and know how to handle sensitive data responsibly, that’s a definite plus.
  • Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
  • Benchmark different VLMs (e.g., GPT-4V, Claude 3.5, Nova Lite) for…