Object Detection

Label ground truth, detect objects using pretrained AI models such as YOLO and Grounding DINO, and create custom detectors using transfer learning

Computer Vision Toolbox™ provides a comprehensive set of tools and functions to build, train, evaluate, and deploy object detection models using both deep learning and traditional computer vision techniques. You can start by creating labeled ground truth using the Image Labeler and Video Labeler apps, which support interactive and AI-assisted annotation of bounding boxes around objects in images and video frames.

Once you have labeled data, you can choose from a wide range of pretrained deep learning object detectors, including YOLO v2, YOLO v3, YOLO v4, YOLOX, RTMDet, SSD, and Grounding DINO. The toolbox also contains specialized detectors, such as peopleDetector and faceDetector, for person and face detection tasks. You can use these models directly for inference or as a starting point for transfer learning, which lets you customize them to specific data sets and applications. For more information, see Get Started with Object Detection Using Deep Learning. For classical object detection methods, the toolbox supports the aggregate channel features (ACF) and cascade (Viola-Jones) object detectors.
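The inference workflow described above can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes that the YOLO v4 support package for Computer Vision Toolbox is installed, that the pretrained model name "tiny-yolov4-coco" is available in your release, and that the sample image visionteam.jpg is on the MATLAB path. Substitute your own detector and image as needed.

```matlab
% Load a pretrained YOLO v4 detector.
% Assumption: the YOLO v4 support package is installed.
detector = yolov4ObjectDetector("tiny-yolov4-coco");

% Read a test image. Assumption: this sample image is on the path;
% replace it with your own image file if it is not.
I = imread("visionteam.jpg");

% Run inference. bboxes is M-by-4 in [x y width height] format,
% scores is M-by-1, and labels is an M-by-1 categorical array.
[bboxes, scores, labels] = detect(detector, I, "Threshold", 0.5);

% Overlay the detections only if the detector found something.
if ~isempty(bboxes)
    I = insertObjectAnnotation(I, "rectangle", bboxes, cellstr(labels));
end
imshow(I)
```

The same `detect` call pattern applies to the other detector objects listed on this page, although the available name-value arguments differ between detectors.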

The toolbox provides functions for training object detectors using transfer learning. The toolbox also provides functions to manage and preprocess training data, as well as data augmentation tools that make model training more robust by simulating real-world variations. For more information, see Get Started with Image Preprocessing and Augmentation for Deep Learning.
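As an illustration of augmentation that keeps images and labels in sync, the sketch below applies one random reflection and scaling to an image and then applies the same transformation to its bounding boxes. The box coordinates are hypothetical values chosen for illustration, and the parameter ranges are assumptions rather than recommended settings.

```matlab
% Sample image that ships with MATLAB, plus two hypothetical
% ground truth boxes in [x y width height] format.
I = imread("peppers.png");
bboxes = [100 80 120 90; 300 200 60 60];

% Draw one random transformation: optional horizontal reflection
% and a scale factor between 1 and 1.2.
tform = randomAffine2d("XReflection", true, "Scale", [1 1.2]);

% Keep the output image the same size as the input, centered.
rout = affineOutputView(size(I), tform, "BoundsStyle", "centerOutput");

% Warp the image, then warp the boxes with the same transformation.
augmentedImage = imwarp(I, tform, "OutputView", rout);

% Boxes that overlap the output view by less than the threshold
% are dropped; keptIdx lists the boxes that survive.
[augmentedBoxes, keptIdx] = bboxwarp(bboxes, tform, rout, ...
    "OverlapThreshold", 0.25);
```

To apply this during training, you can wrap the steps in a function and pass it to `transform` on a combined image and box label datastore.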

After you generate detections using pretrained or custom models, you can use the Object Detector Analyzer app to compare the detection results against ground truth data. The app enables you to evaluate key performance metrics, such as the confusion matrix, precision, recall, F1 score, and mean average precision (mAP), across a range of intersection over union (IoU) thresholds. Alternatively, you can use the evaluateObjectDetection function to evaluate detection performance metrics. For more information, see Evaluate Object Detector Performance and Get Started with Object Detector Analyzer App.
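The metrics above all depend on the intersection over union between a detected box and a ground truth box. The sketch below computes the ratio for one hypothetical pair of boxes with `bboxOverlapRatio` and repeats the calculation by hand to show what the ratio means. The box values and the 0.5 threshold are illustrative assumptions.

```matlab
% Hypothetical ground truth and detected boxes, [x y width height].
gtBox  = [1 1 10 10];
detBox = [6 1 10 10];

% Toolbox computation. The default ratio type is "Union",
% which corresponds to intersection over union.
iou = bboxOverlapRatio(detBox, gtBox);

% Manual computation of the same quantity for comparison.
interW = min(detBox(1) + detBox(3), gtBox(1) + gtBox(3)) - max(detBox(1), gtBox(1));
interH = min(detBox(2) + detBox(4), gtBox(2) + gtBox(4)) - max(detBox(2), gtBox(2));
interArea = max(interW, 0) * max(interH, 0);              % 5 * 10 = 50
unionArea = prod(detBox(3:4)) + prod(gtBox(3:4)) - interArea;  % 150
manualIoU = interArea / unionArea;                        % 1/3

% A detection counts as a match at a given threshold only if
% its overlap ratio reaches that threshold.
overlapThreshold = 0.5;
isMatch = iou >= overlapThreshold;
```

Because the ratio for this pair is about one third, the detection would count as a match at a threshold of 0.3 but not at 0.5, which is why metrics are usually reported across several thresholds.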

Three images: the first contains labeled boats, the second a diagram of a neural network, and the third the keypoints from a person detector overlaid on the image of the people it has detected.

Apps

Image Labeler	Label images for computer vision applications
Video Labeler	Label video for computer vision applications
Object Detector Analyzer	Interactively visualize and evaluate object detection results against ground truth (Since R2026a)

Functions

Detect Objects Using Pretrained AI Models

Deep Learning Detectors

`groundingDinoObjectDetector`	Detect and localize objects using Grounding DINO object detector (Since R2026a)
`rtmdetObjectDetector`	Detect objects using RTMDet object detector (Since R2024b)
`ssdObjectDetector`	Detect objects using SSD deep learning detector
`yolov2ObjectDetector`	Detect objects using YOLO v2 object detector
`yolov3ObjectDetector`	Detect objects using YOLO v3 object detector
`yolov4ObjectDetector`	Detect objects using YOLO v4 object detector (Since R2022a)
`yoloxObjectDetector`	Detect objects using YOLOX object detector (Since R2023b)
`peopleDetector`	Detect people using pretrained deep learning object detector (Since R2024b)
`faceDetector`	Detect faces using pretrained RetinaFace face detector (Since R2025a)
`detectTextCRAFT`	Detect text in images using the CRAFT deep learning model (Since R2022a)
`imfindcirclesYOLO`	Find circles using YOLOX object detector (Since R2026a)

Feature-based Detectors

`acfObjectDetector`	Detect objects using aggregate channel features
`peopleDetectorACF`	Detect people using aggregate channel features
`vision.CascadeObjectDetector`	Detect objects using the Viola-Jones algorithm
`vision.ForegroundDetector`	Foreground detection using Gaussian mixture models
`vision.BlobAnalysis`	Properties of connected regions

Select Detected Objects

`selectStrongestBbox`	Select strongest bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
`selectStrongestBboxMulticlass`	Select strongest multiclass bounding boxes from overlapping clusters using nonmaximal suppression (NMS)
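A small sketch of nonmaximal suppression with `selectStrongestBbox` follows. The boxes and scores are hypothetical: the first two boxes overlap heavily, so only the higher-scoring one should survive, while the third box is far away and is kept.

```matlab
% Three hypothetical detections in [x y width height] format.
% Boxes 1 and 2 overlap almost completely; box 3 is separate.
bboxes = [ 10  10 50 50;
           12  12 50 50;
          200 200 40 40];
scores = [0.9; 0.6; 0.8];

% Suppress any box whose overlap ratio with a stronger box
% exceeds the threshold.
[selectedBoxes, selectedScores, idx] = selectStrongestBbox( ...
    bboxes, scores, "OverlapThreshold", 0.5);

% idx holds the row indices of the boxes that were kept.
disp(idx)
```

For detections that carry class labels, `selectStrongestBboxMulticlass` takes the labels as an additional input so that suppression happens within each class.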

Train Custom Object Detectors Using Transfer Learning

Load Training Data

`boxLabelDatastore`	Datastore for bounding box label data
`groundTruth`	Ground truth label data
`imageDatastore`	Datastore for image data
`objectDetectorTrainingData`	Create training data for an object detector
`combine`	Combine data from multiple datastores
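The functions above are typically used together to build a training datastore that returns an image with its boxes and labels on each read. The sketch below shows one way to do this for a single image. The class name `pepper` and the box coordinates are hypothetical, and real training data would normally come from a `groundTruth` object exported from a labeling app.

```matlab
% Image datastore over one sample image that ships with MATLAB.
imageFile = which("peppers.png");
imds = imageDatastore({imageFile});

% Box label table: one row per image, one column per class.
% Each cell holds an M-by-4 matrix of [x y width height] boxes.
% The class name and coordinates here are hypothetical.
pepper = {[100 80 120 90; 300 200 60 60]};
labelTable = table(pepper);
blds = boxLabelDatastore(labelTable);

% Combine the two datastores so that each read returns
% {image, boxes, labels} for one training sample.
trainingData = combine(imds, blds);
sample = read(trainingData);

image  = sample{1};
boxes  = sample{2};
labels = sample{3};
```

A combined datastore in this form is what the deep learning training functions on this page expect as training data.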

Train Deep Learning Based Object Detectors

`trainSSDObjectDetector`	Train SSD deep learning object detector
`trainYOLOv2ObjectDetector`	Train YOLO v2 object detector
`trainYOLOv3ObjectDetector`	Train YOLO v3 object detector (Since R2024a)
`trainYOLOv4ObjectDetector`	Train YOLO v4 object detector (Since R2022a)
`trainYOLOXObjectDetector`	Train YOLOX object detector (Since R2023b)

Train Feature-Based Object Detectors

`trainACFObjectDetector`	Train ACF object detector
`trainCascadeObjectDetector`	Train cascade object detector model

Augment and Preprocess Training Data for Deep Learning

`balanceBoxLabels`	Balance bounding box labels for object detection
`bboxcrop`	Crop bounding boxes
`bboxerase`	Remove bounding boxes
`bboxresize`	Resize bounding boxes
`bboxwarp`	Apply geometric transformation to bounding boxes
`bbox2points`	Convert rectangle to corner points list
`blockLocationsWithROI`	Select image block locations that contain bounding box ROIs (Since R2025a)
`imwarp`	Apply geometric transformation to image
`imcrop`	Crop image
`imresize`	Resize image
`randomAffine2d`	Create randomized 2-D affine transformation
`centerCropWindow2d`	Create rectangular center cropping window
`randomWindow2d`	Randomly select rectangular region in image
`integralImage`	Calculate 2-D integral image
`transform`	Transform datastore

Design Deep Neural Networks for Object Detection

R-CNN (Regions With Convolutional Neural Networks)

`roiAlignLayer`	Non-quantized ROI pooling layer for Mask R-CNN
`roiMaxPooling2dLayer`	Neural network layer used to output fixed-size feature maps for rectangular ROIs
`roialign`	Non-quantized ROI pooling of `dlarray` data (Since R2021b)

YOLO v2 (You Only Look Once version 2)

`yolov2TransformLayer`	Create transform layer for YOLO v2 object detection network
`spaceToDepthLayer`	Space to depth layer

Focal Loss

`focalCrossEntropy`	Compute focal cross-entropy loss

SSD (Single Shot Detector)

`ssdMergeLayer`	Create SSD merge layer for object detection

Anchor Boxes

`estimateAnchorBoxes`	Estimate anchor boxes for deep learning object detectors
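As a brief sketch of anchor box estimation, the example below clusters the box sizes in a small, hypothetical label set into two anchor boxes. The class name `vehicle`, the box values, and the choice of two anchors are all assumptions for illustration. A real data set would contain far more boxes, and the number of anchors is usually chosen by comparing the mean IoU across several candidate values.

```matlab
% Hypothetical labels for two images, [x y width height] format.
% The set mixes tall boxes and wide boxes.
vehicle = {[10 10 30 60; 50 40 32 64;  5  5 100 50];
           [20 20 96 48; 70 10 28 58; 15 15 110 55]};
blds = boxLabelDatastore(table(vehicle));

% Estimate two anchor boxes from the box sizes in the datastore.
% anchorBoxes is N-by-2 in [height width] order, and meanIoU
% measures how well the anchors fit the labeled boxes.
numAnchors = 2;
[anchorBoxes, meanIoU] = estimateAnchorBoxes(blds, numAnchors);

disp(anchorBoxes)
disp(meanIoU)
```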

Evaluate Object Detection Results

`evaluateObjectDetection`	Evaluate object detection data set against ground truth (Since R2023b)
`objectDetectionMetrics`	Object detection quality metrics (Since R2023b)
`mAPObjectDetectionMetric`	Mean average precision (mAP) metric for object detection (Since R2024a)
`bboxOverlapRatio`	Compute bounding box overlap ratio
`bboxPrecisionRecall`	Compute bounding box precision and recall against ground truth
`drise`	Explain object detection network predictions using D-RISE (Since R2024a)

Visualize Object Detection Results

`cuboid2img`	Project cuboids from 3-D world coordinates to 2-D image coordinates (Since R2022b)
`insertObjectAnnotation`	Annotate truecolor or grayscale image or video
`insertObjectMask`	Insert masks in image or video stream
`insertShape`	Insert shapes in image or video
`insertText`	Insert text in image or video
`showShape`	Display shapes on image, video, or point cloud

Blocks

Deep Learning Object Detector	Detect objects using trained deep learning object detector (Since R2021b)

Topics

Create Ground Truth and Training Data for Object Detection

Get Started with the Image Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification.
Get Started with the Video Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification in a video or image sequence.
Training Data for Object Detection and Semantic Segmentation
Create training data for object detection or semantic segmentation using the Image Labeler or Video Labeler.
Get Started with Image Preprocessing and Augmentation for Deep Learning
Preprocess data for deep learning applications with deterministic operations such as resizing, or augment training data with randomized operations such as random cropping.

Detect Objects Using Pretrained Detectors

Get Started with Object Detection Using Deep Learning
Perform object detection using deep learning neural networks such as YOLOX, YOLO v4, RTMDet, and SSD.
Choose an Object Detector
Compare object detection deep learning models, such as YOLOX, YOLO v4, RTMDet, and SSD.
Get Started with Cascade Object Detector
Train a custom classifier.
Deep Learning in MATLAB (Deep Learning Toolbox)
Discover deep learning capabilities in MATLAB® using convolutional neural networks for classification and regression, including pretrained networks and transfer learning, and training on GPUs, CPUs, clusters, and clouds.
Pretrained Deep Neural Networks (Deep Learning Toolbox)
Learn how to download and use pretrained convolutional neural networks for classification, transfer learning and feature extraction.

Evaluate Object Detection Results

Evaluate Object Detector Performance
Evaluate object detector performance using metrics such as average precision, precision, recall, and the confusion matrix.
Get Started with Object Detector Analyzer App
Use the Object Detector Analyzer app to evaluate pretrained object detectors or precomputed detection results against ground truth data, and to examine performance metrics.
Calibrate Object Detection Confidence Scores
This example shows how to calibrate the confidence scores of an object detector using Platt scaling.

Featured Examples

Automatically Search and Label Video Frames Using VLMs

Automatically search and detect objects based on natural language text queries using vision-language models (VLMs).

Since R2026a

Visualize Object Detection Results from Pretrained PyTorch Model

Detect objects using a pretrained PyTorch® model and visualize the results in Object Detector Analyzer.

Since R2026a

Automatically Label Ground Truth Using Vision-Language Model

Automatically label ground truth images for object detection using the Grounding DINO vision-language model (VLM).

Since R2026a

Detect Small Objects Using Tiled Training of YOLOX Network

Detect small objects in full-resolution images using tiled training of a you only look once version X (YOLOX) deep learning network.

Since R2024b

Object Detection in Large Satellite Imagery Using Deep Learning

Perform object detection on large satellite imagery using deep learning.

Object Detection Using YOLO v4 Deep Learning

Detect objects in images using a you only look once version 4 (YOLO v4) deep learning network.

Multiclass Object Detection Using YOLO v2 Deep Learning

Train a YOLO v2 multiclass object detector and evaluate object detector performance across selected classes and overlap thresholds.

Since R2024b

Train Object Detectors in Experiment Manager

Use the Experiment Manager app to find optimal training options for object detectors.

Find Object in Cluttered Scene Using Image Point Features

Detect a particular object in a cluttered scene, given a reference image of the object.

Detect Cars Using Gaussian Mixture Models

Detect and count cars in a video sequence using foreground detector based on Gaussian mixture models (GMMs).

Import Pretrained ONNX YOLO v2 Object Detector

Import pretrained YOLO v2 object detector from ONNX deep learning framework.

Export YOLO v2 Object Detector to ONNX

Export pretrained YOLO v2 object detector to ONNX deep learning framework.

Generate Code for Detecting Objects in Images by Using ACF Object Detector

Generate code from a MATLAB® function that detects objects in images by using an acfObjectDetector object. When you intend to generate code from your MATLAB function that uses an acfObjectDetector object, you must create the object outside of the MATLAB function. The example explains how to modify the MATLAB code in Train a Stop Sign Detector Using an ACF Object Detector to support code generation.

Code Generation for Object Detection by Using YOLO v2

Generate CUDA® code for object detection using YOLO v2.

Code Generation for Object Detection by Using Single Shot Multibox Detector

Generate CUDA code for an SSD network.

Code Generation for People Detection Using Deep Learning

Generate CUDA code for people detection.

Since R2025a