AI/Machine Learning Engineer - Vision Language Models/Multimodal AI; NGA (Herndon area, Virginia, USA)

Position: AI/Machine Learning Engineer - Vision Language Models / Multimodal AI (NGA)

Location: Springfield or Herndon, VA (onsite)

Clearance: TS/SCI (CI Poly preferred)

Position Type: Full-Time, Direct Hire

Pay: $175,000 to $250,000 for an SME

Company: The name of our partner organization will be disclosed during the interview process. This is not a direct role with Launch Code; it is a position through Launch Code, working with one of our partner companies.

Disclaimer: We are unable to provide work sponsorship for this role.

Overview:

We’re hiring an AI/Machine Learning Engineer with strong experience in multimodal AI and large-scale model training to support advanced vision-language initiatives in a secure government environment. This role will focus on fine-tuning Vision Language Models (VLMs) on domain-specific geospatial imagery, building scalable AWS training infrastructure, and developing evaluation frameworks for image understanding and spatial reasoning. Ideal candidates will have deep experience with PyTorch, Hugging Face, distributed training, and computer vision, along with the ability to optimize and deploy multimodal models in mission-critical environments.

Hands-on experience fine-tuning multimodal models such as CLIP, LLaVA, Qwen-VL, or similar Vision Language Models on classified or mission-specific imagery datasets is a major plus. The ideal candidate can build the AWS infrastructure needed to train and scale these models, evaluate performance improvements across real-world use cases, and deploy solutions into secure government or air-gapped environments.

Key Responsibilities:

Design and execute fine-tuning pipelines for Vision Language Models (VLMs) using domain-specific imagery datasets
Handle data preprocessing, training orchestration, and hyperparameter optimization for multimodal models
Build evaluation frameworks for image understanding, visual question answering, and spatial reasoning tasks
Develop scalable AWS-based ML infrastructure using SageMaker and GPU-enabled EC2 for distributed training
Create data pipelines for curating, annotating, and transforming geospatial imagery into model-ready datasets
Partner with applied scientists and architects on model architecture improvements, LoRA/QLoRA strategies, and inference optimization

Required Qualifications:

Active TS/SCI with CI Poly
5+ years of machine learning engineering experience focused on deep learning
1+ year of hands-on experience fine-tuning foundation models (LLMs or VLMs)
Experience with LoRA, QLoRA, adapters, supervised fine-tuning, instruction tuning, and RLHF/DPO
4+ years of advanced Python development for ML workloads
Strong PyTorch and Hugging Face experience (Transformers, PEFT, Datasets, Accelerate)
Experience with distributed training frameworks such as DeepSpeed, FSDP, or Megatron
3+ years working with computer vision or multimodal models
Familiarity with vision transformer and vision-language architectures (ViT, CLIP, LLaVA, etc.)
Experience processing and augmenting image datasets at scale
3+ years with AWS ML infrastructure including SageMaker, EC2 GPU environments, and S3
Experience with ML evaluation pipelines, benchmarking, metrics, and result analysis
Strong software engineering fundamentals including version control, testing, and CI/CD

Preferred Qualifications:

2+ years working with geospatial or remote sensing imagery
Experience with EO or SAR satellite imagery
Understanding of geospatial metadata, coordinate systems, and imagery preprocessing
Experience with model quantization / inference optimization (vLLM, TensorRT, ONNX)
MLOps tooling experience (MLflow, Weights & Biases, SageMaker Experiments)
Familiarity with annotation tools and active learning workflows
Containerized ML experience with Docker / ECR / ECS / EKS
Experience supporting ATO processes and NIST 800-53 compliance
Experience deploying in air-gapped/disconnected environments
Familiarity with multimodal evaluation benchmarks (MMMU, MMBench, GQA)
Publications or contributions in computer vision, multimodal AI, or VLMs
Synthetic data generation experience for training augmentation
