VLM Data Science Expert
Listed on 2025-12-01
IT/Tech
AI Engineer, Machine Learning/ML Engineer, Data Analyst, Data Scientist
Overview
At Citius Tech, we constantly strive to solve the industry’s greatest challenges with technology, creativity, and agility. With over 8,500 healthcare technology professionals worldwide, Citius Tech powers healthcare digital innovation, business transformation, and industry-wide convergence for over 140 organizations through next-generation technologies, solutions, and products. We aim to accelerate the transition to a human-first, sustainable, and digital healthcare ecosystem with the world’s leading healthcare and life sciences organizations and our partners.
Here is an opportunity for you to make a difference and collaborate with global leaders to shape the future of healthcare and positively impact human lives.
Our vision: To inspire new possibilities for the health ecosystem with technology and human ingenuity.
What is in it for you?
If you’re a Senior Data Scientist with a strong background in Vision-Language Models (VLMs), this is a chance to lead the charge in building smart, scalable multimodal AI solutions. We’re looking for someone who has worked hands-on with cutting-edge frameworks like VILA, Isaac, and VSS, and who knows how to take models from concept to production in real-world settings. If you’ve got experience in healthcare, especially with medical devices, that’s a big plus.
You’ll be diving into the latest VLM techniques and deploying them on cloud platforms like AWS, helping shape the future of AI in a meaningful, impactful way.
- Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications including image captioning, visual search, document understanding, pose understanding, and pose comparison.
- Develop and manage Digital Twin frameworks using AWS IoT TwinMaker, SiteWise, and Greengrass to simulate and optimize real-world systems.
- Develop Digital Avatars using AWS services integrated with 3D rendering engines, animation pipelines, and real-time data feeds.
- Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference.
- Implement scalable pipelines for training and testing VLMs on cloud platforms (AWS services such as SageMaker, Bedrock, Rekognition, Comprehend, and Textract).
- Develop a blend of technical expertise, tool proficiency, and domain-specific knowledge on the following NVIDIA platforms:
- NeMo Framework: training and scaling VLMs across thousands of GPUs.
- DeepStream SDK: integrating pose models like TRTPose and OpenPose, real-time video analytics, and multi-stream processing.
- Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
- Leverage interleaved image-text datasets and advanced techniques (e.g., cross-attention layers) to enhance model performance.
- Develop solutions that integrate vision-based deep learning models for applications like live video streaming integration and processing, object detection, image segmentation, pose estimation, object tracking, image classification, and defect detection on medical X-ray images.
- Knowledge of real-time video analytics, multi-camera tracking, and object detection.
- Train and test deep learning models on customized data.
- While it’s not a must, having experience in the healthcare space, especially with medical imaging, motion detection, or patient monitoring, can be a big advantage.
- You’ll be applying Vision-Language Models to use cases like analyzing scans, detecting positioning and movement, and making precise measurements.
- If you’re familiar with healthcare standards and know how to handle sensitive data responsibly, that’s a definite plus.
- Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
- Benchmark different VLMs (e.g., GPT-4V, Claude 3.5, Nova Lite) for…