Description

Automated Data Labeling Using Vision-Language Models

Learn how to use vision-language models in MATLAB® to automate data labeling for aerial imagery. The models covered are CLIP for image retrieval, Grounding DINO for text-prompted object detection, and Moondream for image captioning. SAM is also used to generate pixel-level object masks from the object detections. The imagery is derived from NAIP data obtained through USGS Earth Explorer and was captured over Hanscom Air Force Base.

Published: 13 Feb 2026

Related Resources

Computer Vision Toolbox
