Real-Time Appearance-Based Mapping

RTAB-MAP (Real-Time Appearance-Based Mapping) is a graph-based SLAM approach using RGBD cameras, stereo cameras, or a 3D Lidar. It is based on an incremental appearance-based loop closure detector. The process uses a bag-of-words method to determine whether each newly acquired frame of image (or Lidar scan) came from a previous location or a new location. A graph optimizer is then applied to minimize the errors based on the frame. Throughout the process, a memory management scheme is applied to limit the number of locations used for the loop closure detection and graph optimization — this is to ensure acceptable real-time performance when dealing with a large-scale environment.
Loop closure detection is essential for the mapping process as it determines map completion and also allows the bot to localize itself. The underlying structure of the map essentially is a graph which consists of nodes and links. The nodes save the odometry poses for each location in the map. The nodes also contain visual information like RGB data, depth data and visual words used for loop closure detection. The links store rigid geometrical transformations between nodes. There are two types of links: neighbor and loop closure.
Neighbor links are added between the current and the previous nodes with their odometry transformation. Loop closure links are added when loop closure detection is found between the current node and one from the same or previous maps.
The various operations required for mapping are:
Loop Closure Detection: For global loop closure detection, the “Bag-of-Words” approach is used to form a visual dictionary by generating visual words using SURF (Speeded up Robust Features) and SIFT (Scale Invariant Feature Transform). In this approach, a Bayesian filter is used to evaluate loop closure hypotheses over all the previous images. When a loop closure hypothesis reaches some predefined threshold H, a loop closure is detected. Visual words are used to compute the likelihood required by the filter. Here the RGB image, from which the visual words are to be extracted, is combined with a depth image, i.e., for each 2D point in the RGB image, a corresponding 3D position can be computed using the calibration matrix and the depth information given by the depth image. When a loop closure is detected, the rigid transformation between the matching images is computed by a RANSAC (Random Sample Consensus) approach using the 3D visual word correspondences. If a minimum of inliers are found, loop closure is accepted and a link with this transformation between the current node and the loop closure hypothesis node is added to the graph. RANSAC essentially provides a yes or no answer to determine whether loop closure has been completed.
Graph Optimization: Generally in the field of robotics, the problem of minimizing non-linear errors is always present. These error functions can always be expressed as a graph. The overall goal in graph optimization is to find the configuration of parameters or state variables that maximally explain a set of measurements that are affected by Gaussian noise. When loop closures are found, the errors introduced by odometry can then be propagated to all links, thus correcting the map.
Memory Management for Mapping: For real time mapping any incoming data must be processed faster than the time required to acquire them. RTAB-Map employs a memory management approach is used to maintain a graph, which can be managed online using loop closure detection and graph optimization algorithms, thus making the metric SLAM approach independent of the size of the environment. The memory is composed of a Short-Term Memory (STM), a Working Memory (WM) and a Long-Term Memory (LTM), as shown by Figure. The STM is the point of entry for new nodes to be added to the graph whenever new data is acquired, with a fixed size S. When the STM size reaches S nodes, the oldest node is moved to WM to be considered for loop closure detection. The WM size depends on a fixed time limit T.

When the time required to process the new data reaches T, some nodes of the graph are transferred from WM to LTM, thus keeping the WM size nearly constant. Whenever loop closure is detected, neighbors in LTM of the old node can be transferred back to WM (a process called Retrieval) for further loop closure detections. In other words, when a robot revisits an area which was previously forgotten, it can remember the area if a least one node of this area is still in WM.