Main Content

Develop Visual SLAM Algorithm Using Unreal Engine Simulation

This example shows how to develop a visual Simultaneous Localization and Mapping (SLAM) algorithm using image data obtained from the Unreal Engine® simulation environment.

Visual SLAM refers to the process of calculating the position and orientation of a camera with respect to its surroundings while simultaneously mapping the environment. However, developing a visual SLAM algorithm and evaluating its performance in varying conditions is a challenging task. One of the biggest challenges is generating the ground truth of the camera sensor, especially in outdoor environments. The use of simulation enables testing under a variety of scenarios and camera configurations while providing precise ground truth.

This example demonstrates the use of Unreal Engine simulation to develop a monocular visual SLAM algorithm in a parking scenario. For more information about the implementation of the visual SLAM pipeline, see the Monocular Visual Simultaneous Localization and Mapping (Computer Vision Toolbox) example.

Set up Scenario in Simulation Environment

Use the Simulation 3D Scene Configuration block to set up the simulation environment. Select the built-in Large Parking Lot scene, which contains several parked vehicles. The visual SLAM algorithm matches features across consecutive images. To increase the number of potential feature matches, you can use the Parked Vehicles subsystem to add more parked vehicles to the scene. To specify the parking poses of the vehicles, use the helperAddParkedVehicle function. If you select a more natural scene, the presence of additional vehicles is not necessary. Natural scenes usually have enough texture and feature variety suitable for feature matching.

You can follow the Select Waypoints for Unreal Engine Simulation example to interactively select a sequence of parking locations. You can use the same approach to select a sequence of waypoints and generate a reference trajectory for the ego vehicle. This example uses a recorded reference trajectory and parked vehicle locations.

% Load reference path
data = load('parkingLotReferenceData.mat');

% Set reference trajectory of the ego vehicle
refPosesX = data.refPosesX;
refPosesY = data.refPosesY;
refPosesT = data.refPosesT;

% Set poses of the parked vehicles
parkedPoses = data.parkedPoses;

% Display the reference path and the parked vehicle locations
sceneName = 'LargeParkingLot';
hScene = figure;
helperShowSceneImage(sceneName);
hold on
plot(refPosesX(:,2), refPosesY(:,2), 'LineWidth', 2, 'DisplayName', 'Reference Path');
scatter(parkedPoses(:,1), parkedPoses(:,2), [], 'filled', 'DisplayName', 'Parked Vehicles');
xlim([-60 40])
ylim([10 60])
hScene.Position = [100, 100, 1000, 500]; % Resize figure
legend
hold off

Open the model and add parked vehicles.

modelName = 'GenerateImageDataOfParkingLot';
open_system(modelName);
snapnow;

helperAddParkedVehicles(modelName, parkedPoses);

Set Ego Vehicle and Camera Sensor

Set up the ego vehicle moving along the specified reference path by using the Simulation 3D Vehicle with Ground Following block. Mount a camera on the vehicle roof center by using the Simulation 3D Camera block and specify its intrinsic parameters. You can use the Camera Calibrator (Computer Vision Toolbox) app to estimate camera intrinsics of the actual camera that you want to simulate.

% Camera intrinsics
focalLength    = [700, 700];  % specified in units of pixels
principalPoint = [600, 180];  % in pixels [x, y]
imageSize      = [370, 1230]; % in pixels [mrows, ncols]
intrinsics     = cameraIntrinsics(focalLength, principalPoint, imageSize);

Visualize and Record Sensor Data

Run the simulation to visualize and record sensor data. Use the Video Viewer block to visualize the image output from the camera sensor. Use the To Workspace block to record the ground truth location and orientation of the camera sensor.

close(hScene)

if ~ispc
    error("Unreal Engine Simulation is supported only on Microsoft" + char(174) + " Windows" + char(174) + ".");
end

% Open video viewer to examine camera images
open_system([modelName, '/Video Viewer']);

% Run simulation
simOut = sim(modelName);
snapnow;

% Extract camera images as an imageDatastore
imds = helperGetCameraImages(simOut);

% Extract ground truth as an array of rigid3d objects
gTruth = helperGetSensorGroundTruth(simOut);

close_system([modelName, '/Video Viewer']);

Develop Visual SLAM Algorithm Using Recorded Data

Use the images to evaluate the visual SLAM algorithm. The function helperVisualSLAM implements the ORB-SLAM pipeline:

  • Map Initialization: ORB-SLAM starts by initializing the map of 3-D points from two images. Use relativeCameraPose (Computer Vision Toolbox) to compute the relative pose based on 2-D ORB feature correspondences and triangulate (Computer Vision Toolbox) to compute the 3-D map points. The two frames are stored in an imageviewset (Computer Vision Toolbox) object as key frames. The 3-D map points and their correspondences to the key frames are stored in a worldpointset object.

  • Tracking: Once a map is initialized, for each new image, the function helperTrackLastKeyFrame estimates the camera pose by matching features in the current frame to features in the last key frame. The function helperTrackLocalMap refines the estimated camera pose by tracking the local map.

  • Local Mapping: The current frame is used to create new 3-D map points if it is identified as a key frame. At this stage, bundleAdjustment (Computer Vision Toolbox) is used to minimize reprojection errors by adjusting the camera pose and 3-D points.

  • Loop Closure: Loops are detected for each key frame by comparing it against all previous key frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is optimized to refine the camera poses of all the key frames using the optimizePoseGraph (Navigation Toolbox) function.

For the implementation details of the algorithm, see the Monocular Visual Simultaneous Localization and Mapping (Computer Vision Toolbox) example.

[mapPlot, optimizedPoses, addedFramesIdx] = helperVisualSLAM(imds, intrinsics);
Map initialized with frame 1 and frame 3
Loop edge added between keyframe: 5 and 173
Iteration 1, residual error 0.329001
Iteration 2, residual error 0.322270
Iteration 3, residual error 0.322259
Iteration 4, residual error 0.322259
Solver stopped because change in function value was less than specified function tolerance.

Evaluate Against Ground Truth

You can evaluate the optimized camera trajectory against the ground truth obtained from the simulation. Since the images are generated from a monocular camera, the trajectory of the camera can only be recovered up to an unknown scale factor. You can approximately compute the scale factor from the ground truth, thus simulating what you would normally obtain from an external sensor.

% Plot the camera ground truth trajectory
scaledTrajectory = plotActualTrajectory(mapPlot, gTruth(addedFramesIdx), optimizedPoses);

% Show legend
showLegend(mapPlot);

You can also calculate the root mean square error (RMSE) of trajectory estimates.

helperEstimateTrajectoryError(gTruth(addedFramesIdx), scaledTrajectory);
Absolute RMSE for key frame trajectory (m): 4.0924

Close model and figures.

close_system(modelName, 0);
close all

Supporting Functions

helperGetCameraImages Get camera output

function imds = helperGetCameraImages(simOut)
% Save image data to a temporary folder
dataFolder   = fullfile(tempdir, 'parkingLotImages', filesep);
folderExists = exist(dataFolder, 'dir');
if ~folderExists
    mkdir(dataFolder);
end

files = dir(dataFolder);
if numel(files) < 3
    numFrames = numel(simOut.images.Time);
    for i = 3:numFrames % Ignore the first two frames
        img = squeeze(simOut.images.Data(:,:,:,i));
        imwrite(img, [dataFolder, sprintf('%04d', i-2), '.png'])
    end
end

% Create an imageDatastore object to store all the images
imds = imageDatastore(dataFolder);
end

helperGetSensorGroundTruth Save the sensor ground truth

function gTruth = helperGetSensorGroundTruth(simOut)
numFrames = numel(simOut.images.Time);
gTruth = repmat(rigid3d, numFrames-2, 1);
for i = 1:numFrames-2 % Ignore the first two frames
    gTruth(i).Translation = squeeze(simOut.location.Data(:, :, i+2));
    % Ignore the roll and the pitch rotations since the ground is flat
    yaw = double(simOut.orientation.Data(:, 3, i+2));
    gTruth(i).Rotation = [cos(yaw), sin(yaw), 0; ...
        -sin(yaw), cos(yaw), 0; ...
        0, 0, 1];
end
end

helperEstimateTrajectoryError Calculate the tracking error

function rmse = helperEstimateTrajectoryError(gTruth, scaledLocations)
gLocations      = vertcat(gTruth.Translation);

rmse = sqrt(mean( sum((scaledLocations - gLocations).^2, 2) ));
disp(['Absolute RMSE for key frame trajectory (m): ', num2str(rmse)]);
end

See Also

(Computer Vision Toolbox) | (Computer Vision Toolbox) | (Computer Vision Toolbox) | (Computer Vision Toolbox) | (Computer Vision Toolbox)

Related Topics