Vision-Language Models (VLMs)
Job in Waukesha, Waukesha County, Wisconsin, 53188, USA
Listed on 2026-01-01
Listing for: TalentOla
Full Time
Job specializations:
- IT/Tech: AI Engineer, Machine Learning / ML Engineer
Job Description & How to Apply Below
Overview
Role: Senior Data Scientist with expertise in Vision-Language Models (VLMs)
Experience: 10+ Years
Location: San Ramon, CA or Waukesha, WI (Onsite)
Responsibilities
- Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications including image captioning, visual search, document understanding, pose understanding, and pose comparison.
- Develop and manage Digital Twin frameworks using AWS IoT TwinMaker, SiteWise, and Greengrass to simulate and optimize real-world systems.
- Develop Digital Avatars using AWS services integrated with 3D rendering engines, animation pipelines, and real-time data feeds.
- Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference.
- Implement scalable pipelines for training/testing VLMs on cloud platforms (AWS services such as SageMaker, Bedrock, Rekognition, Comprehend, and Textract).
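The LoRA fine-tuning named in the responsibilities above freezes a pretrained weight matrix and learns only a low-rank additive update, W' = W + (α/r)·B·A. A minimal NumPy sketch of the idea (the rank, scaling, and initialization choices here are illustrative assumptions, not tied to any particular VLM):

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, w, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = w.shape
        self.w = w                                 # frozen pretrained weight
        self.a = rng.normal(0, 0.01, (r, d_in))    # trainable, small random init
        self.b = np.zeros((d_out, r))              # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path plus the low-rank adapter path.
        return x @ self.w.T + self.scale * (x @ self.a.T @ self.b.T)

w = np.eye(3)
layer = LoRALinear(w, r=2)
x = np.ones((1, 3))
# With B initialized to zero, the adapter contributes nothing yet,
# so the output matches the frozen base layer exactly.
print(np.allclose(layer(x), x @ w.T))  # True
```

Because only A and B are trained, the number of trainable parameters drops from d_out x d_in to r x (d_out + d_in), which is the cost saving the posting alludes to.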
The candidate should bring a blend of technical expertise, tool proficiency, and domain-specific knowledge across the following NVIDIA platforms:
- NIM (NVIDIA Inference Microservices): Containerized VLM deployment.
- NeMo Framework: Training and scaling VLMs across thousands of GPUs.
- DeepStream SDK: Integrates pose models such as TRTPose and OpenPose; real-time video analytics and multi-stream processing.
- Multimodal AI Solutions: Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
- Image Processing and Computer Vision: Develop solutions that integrate vision-based deep learning models for applications such as live video stream integration and processing, object detection, image segmentation, pose estimation, object tracking, image classification, and defect detection on medical X-ray images.
- Knowledge of real-time video analytics, multi-camera tracking, and object detection.
- Train and test deep learning models on custom data.
- Apply VLMs to healthcare-specific use cases such as medical imaging analysis, position detection, motion detection and measurements.
- Ensure compliance with healthcare standards while handling sensitive data.
- Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
- Benchmark different VLMs (e.g., GPT-4V, Claude 3.5, Nova Lite) for accuracy, speed, and cost-effectiveness on specific tasks.
- Benchmark model performance on GPU vs. CPU.
- Collaborate with cross-functional teams including engineers and domain experts to define project requirements.
- Mentor junior team members and provide technical leadership on complex projects.
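Knowledge distillation, one of the cost-reduction techniques listed above, trains a small student model to match a large teacher's temperature-softened output distribution. A hedged sketch of the standard distillation loss (the temperature and logit values below are made-up illustrative numbers):

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax with the usual max-subtraction for stability."""
    z = np.asarray(z, dtype=float) / t
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by t^2 as in the classic Hinton-style formulation."""
    p = softmax(teacher_logits, t)   # soft teacher targets
    q = softmax(student_logits, t)   # student predictions
    return t * t * np.sum(p * (np.log(p) - np.log(q)))

teacher = [4.0, 1.0, 0.5]            # illustrative logits
print(distill_loss(teacher, teacher))            # 0.0: identical distributions
print(distill_loss(teacher, [0.1, 2.0, 1.5]) > 0)  # True: mismatch is penalized
```

Raising the temperature spreads probability mass onto the teacher's non-argmax classes, which is where the student picks up the "dark knowledge" that plain hard-label training discards.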
- Education: Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.
- Experience: Minimum of 10 years of experience in machine learning or data science roles with a focus on vision-language models.
- Proven expertise in deploying production-grade multimodal AI solutions.
- Experience in self-driving cars and self-navigating robots.
- Technical Skills: Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
- Hands-on experience with VLMs such as VILA, Isaac Sim, or VSS.
- Familiarity with cloud platforms like AWS SageMaker or Azure ML Studio for scalable AI deployment.
- CUDA, cuDNN
- Domain Knowledge: Understanding of medical datasets (e.g., imaging data) and healthcare regulations.
- Soft Skills: Strong problem-solving skills with the ability to optimize models for real-world constraints; excellent communication skills to explain technical concepts to diverse stakeholders.
- Multimodal Techniques: Cross-attention layers, interleaved image-text datasets
- MLOps Tools: Docker, MLflow
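The cross-attention layers listed under multimodal techniques are how text tokens attend over image-patch features in a VLM. A shape-level NumPy sketch of single-head cross-attention (all dimensions and weights below are arbitrary illustrative choices):

```python
import numpy as np

def cross_attention(text, image, wq, wk, wv):
    """Single-head cross-attention: queries come from text tokens,
    keys/values come from image-patch features."""
    q = text @ wq                    # (n_text, d)
    k = image @ wk                   # (n_patch, d)
    v = image @ wv                   # (n_patch, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot products
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)          # softmax over image patches
    return a @ v                     # (n_text, d): patch features mixed per token

rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))       # 5 text tokens
image = rng.normal(size=(16, d))     # 16 image patches
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(text, image, wq, wk, wv)
print(out.shape)                     # (5, 8)
```

The asymmetry is the whole point: each text token produces a query, the softmax weights run over the image patches, and the output keeps the text sequence length, which is what lets tasks like VQA condition language generation on the image.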
Industries:
- Information Technology
- Information Services