ORB-SLAM is based on the ORB descriptor, which is invariant to rotation and robust to large viewpoint changes. Its back-end, based on bundle adjustment (BA) with monocular and stereo observations, allows accurate trajectory estimation with metric scale.
Characteristics of ORB SLAM:
1. It computes the camera trajectory and builds a sparse 3D reconstruction.
2. It is able to detect loops and relocalize the camera in real time.
3. It can perform in real-time on standard CPUs.
4. It can be used in a wide variety of environments, from small hand-held indoor sequences to drones flying in industrial environments and cars driving around a city.
Major modules of ORB SLAM:
The algorithm runs three threads in parallel: a tracking thread, a local mapping thread and a loop closing thread.
The map is initialized by computing the relative pose between two frames. Two geometrical models are computed in parallel: a homography, which assumes a planar scene, and a fundamental matrix, which assumes a non-planar scene.
One of the two models is selected based on their relative scores. From the selected model, multiple motion hypotheses are recovered, and the system checks whether one hypothesis is significantly better than the others. If this check succeeds, a full bundle adjustment refines the initial map; otherwise the initialization starts over.
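The model selection step can be sketched as follows. This is a minimal sketch that assumes the squared symmetric transfer errors of every correspondence have already been computed for both models. The robust score and the constants follow the original ORB-SLAM paper: chi-square thresholds of 5.99 for the homography and 3.84 for the fundamental matrix, a shared gamma of 5.99, and a ratio threshold of 0.45. The function names are hypothetical.

```python
def model_score(sq_errors, threshold, gamma=5.99):
    """Robust score of a model: every correspondence whose squared error d2 is
    below the model's chi-square threshold contributes (gamma - d2); outliers add 0."""
    return sum(gamma - d2 for d2 in sq_errors if d2 < threshold)


def select_model(s_h, s_f, ratio_threshold=0.45):
    """Choose between homography (score s_h) and fundamental matrix (score s_f)."""
    total = s_h + s_f
    if total == 0:
        raise ValueError("both models have zero score; initialization should restart")
    r_h = s_h / total  # fraction of the total score explained by the homography
    return "homography" if r_h > ratio_threshold else "fundamental"


# Hypothetical usage:
# s_h = model_score(homography_sq_errors, threshold=5.99)
# s_f = model_score(fundamental_sq_errors, threshold=3.84)
# chosen = select_model(s_h, s_f)
```

Both scores share the same gamma, so an inlier with a given error counts equally toward either model. The ratio therefore reflects which model explains more of the correspondences.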
Map points and keyframes are created with a generous policy. A later, very exigent culling mechanism then detects redundant keyframes and map points that were wrongly matched or cannot be tracked.
Covisibility information between keyframes is maintained as an undirected weighted graph. Each node is a keyframe, and an edge between two keyframes exists if they share observations of the same map points (at least 15 in the original paper). The weight of the edge is the number of shared map points.
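An illustrative sketch of building the covisibility graph in one batch. It assumes each keyframe's observations are available as a set of map point IDs, and it stores the graph as a dict of dicts of edge weights. That data layout is hypothetical. The default `min_shared=15` mirrors the threshold of the original paper.

```python
from itertools import combinations


def build_covisibility_graph(observations, min_shared=15):
    """observations: dict keyframe_id -> set of observed map point IDs.
    Returns dict keyframe_id -> {neighbor_id: number of shared map points}."""
    graph = {kf: {} for kf in observations}  # every keyframe is a node
    for a, b in combinations(observations, 2):
        shared = len(observations[a] & observations[b])
        if shared >= min_shared:
            # Undirected edge: store the same weight in both directions.
            graph[a][b] = shared
            graph[b][a] = shared
    return graph
```

The real system updates the graph incrementally as keyframes arrive. The batch version only makes the edge and weight definitions concrete.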
The tracking part localizes the camera and decides when to insert a new keyframe.
The extracted features are FAST corners, which are then described with ORB. For image resolutions up to 752x480, 1000 corners are sufficient; for higher resolutions such as 1241x376, 2000 corners work well. Corners are extracted at multiple scale levels (8 levels with a scale factor of 1.2 in the original paper). Each level is divided into a grid, and the extractor tries to obtain at least 5 corners per cell so that features are well distributed over the image.
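The following sketch shows one way a fixed corner budget could be spread over the scale pyramid. The geometric distribution, where each coarser level receives 1/1.2 as many corners as the level below it, is an assumption similar in spirit to the public ORB-SLAM2 extractor. It is not stated in the text above. The function name is hypothetical.

```python
def pyramid_budget(n_features=1000, n_levels=8, scale_factor=1.2):
    """Return (scale of each level, number of corners to extract per level).

    Assumption: coarser levels get geometrically fewer corners, each level
    receiving 1/scale_factor times the budget of the level below it.
    """
    scales = [scale_factor ** level for level in range(n_levels)]
    ratio = 1.0 / scale_factor
    # First term of a geometric series whose n_levels terms sum to n_features.
    first = n_features * (1 - ratio) / (1 - ratio ** n_levels)
    budget = [round(first * ratio ** level) for level in range(n_levels - 1)]
    budget.append(max(n_features - sum(budget), 0))  # remainder goes to the top level
    return scales, budget
```

The per-cell grid step (at least 5 corners per cell) would then be applied within each level's budget. It is omitted here.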
The initial pose of the new frame is predicted with a constant velocity motion model. Features are then matched with the previous frame, and the pose is refined with motion-only bundle adjustment that minimizes the reprojection error. If tracking is lost, the place recognition module performs a global relocalization.
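A minimal sketch of the constant velocity prediction. It assumes poses are stored as 4x4 world-to-camera transforms `T_cw`. The motion between the two previous frames is simply reapplied to the last frame. The helper names are hypothetical.

```python
import numpy as np


def se3_inverse(T):
    """Invert a 4x4 rigid transform [R|t] using R^T instead of a general inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def predict_pose(T_cw_prev, T_cw_last):
    """Constant velocity model: assume the next inter-frame motion equals the last one."""
    velocity = T_cw_last @ se3_inverse(T_cw_prev)  # motion from prev frame to last frame
    return velocity @ T_cw_last
```

The predicted pose only seeds the guided search for matches. The motion-only BA then corrects it.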
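Once an estimate of the camera pose and an initial set of feature matches are available, the covisibility graph of keyframes maintained by the system is used to retrieve a local visible map, see Fig. (a) & Fig. (b). The local map points are projected into the frame to search for more matches, and the pose is optimized again.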
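In the original paper the local map is made of two keyframe sets. K1 contains the keyframes that share map points with the current frame, and K2 contains their neighbors in the covisibility graph. The sketch below assumes dict-based storage, which is hypothetical. It also uses a hypothetical cap, `max_neighbors`, on how many of the strongest neighbors each K1 keyframe contributes.

```python
def local_map(frame_points, observations, graph, max_neighbors=10):
    """frame_points: map point IDs matched in the current frame.
    observations: dict keyframe_id -> set of map point IDs.
    graph: dict keyframe_id -> {neighbor_id: covisibility weight}.
    Returns (K1, K2, local map points)."""
    # K1: keyframes observing at least one map point seen in the current frame.
    k1 = {kf for kf, pts in observations.items() if pts & frame_points}
    # K2: strongest covisibility neighbors of K1 that are not already in K1.
    k2 = set()
    for kf in k1:
        neighbors = sorted(graph.get(kf, {}).items(), key=lambda item: -item[1])
        k2.update(n for n, _ in neighbors[:max_neighbors] if n not in k1)
    # Local map points: everything observed by keyframes in K1 or K2.
    points = set().union(*(observations[kf] for kf in k1 | k2))
    return k1, k2, points
```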
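Finally, the tracking decides whether a new keyframe needs to be created. New keyframes are inserted very frequently to make tracking more robust. A new keyframe is created when all of the following conditions hold:
1. More than 20 frames have passed since the last global relocalization.
2. Local mapping is idle, or more than 20 frames have passed since the last keyframe insertion.
3. The current frame tracks at least 50 points.
4. The current frame tracks fewer than 90% of the points tracked by the reference keyframe.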
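The original ORB-SLAM paper requires all four of the following for keyframe insertion: more than 20 frames since the last global relocalization; local mapping idle or more than 20 frames since the last keyframe; at least 50 tracked points; and fewer than 90% of the points tracked by the reference keyframe. A minimal predicate over those quantities, with hypothetical argument names:

```python
def need_new_keyframe(frames_since_relocalization, frames_since_last_keyframe,
                      local_mapping_idle, tracked_points, ref_keyframe_points):
    """Return True only if all four keyframe insertion conditions hold."""
    enough_since_reloc = frames_since_relocalization > 20
    mapping_ready = local_mapping_idle or frames_since_last_keyframe > 20
    enough_points = tracked_points >= 50
    # Tracking fewer than 90% of the reference keyframe's points means the view has changed enough.
    enough_novelty = tracked_points < 0.9 * ref_keyframe_points
    return enough_since_reloc and mapping_ready and enough_points and enough_novelty
```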
The local mapping processes new keyframes and performs local BA to achieve an optimal reconstruction in the surroundings of the camera pose.
During the first keyframes after their creation, map points are subjected to an exigent culling policy based on the information gathered during tracking, so that only high-quality points are retained.
Keyframe Insertion: First, the new keyframe Ki is inserted as a node in the covisibility graph, and its edges to the keyframes sharing map points with it are updated. The spanning tree is then updated by linking Ki to the keyframe with the most points in common. Finally, the bag-of-words representation of the keyframe is computed; it is used for data association when triangulating new points.
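An illustrative sketch of the covisibility and spanning tree update on keyframe insertion. It assumes observations are stored as a dict from keyframe ID to a set of map point IDs, the graph as a dict of dicts of shared-point counts, and the spanning tree as a dict from keyframe to parent. The threshold `min_shared=15` is taken from the original paper. The function and data layout are hypothetical, and the bag-of-words step is omitted.

```python
def insert_keyframe(graph, parent, observations, new_kf, new_points, min_shared=15):
    """Add new_kf to the covisibility graph and attach it to the spanning tree."""
    observations[new_kf] = set(new_points)
    graph[new_kf] = {}
    for kf, pts in observations.items():
        if kf == new_kf:
            continue
        shared = len(pts & observations[new_kf])
        if shared >= min_shared:
            graph[new_kf][kf] = shared
            graph.setdefault(kf, {})[new_kf] = shared  # keep the graph undirected
    # Spanning tree parent: the keyframe sharing the most map points (None if isolated).
    edges = graph[new_kf]
    parent[new_kf] = max(edges, key=edges.get) if edges else None
```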
Recent Map Points Culling: To be retained in the map, a new map point must pass a restrictive test during the first three keyframes after its creation. The test ensures that the point is trackable and was not wrongly triangulated (e.g. due to spurious data association).
A point must fulfill these two conditions:
1) The tracking must find the point in more than 25% of the frames in which it is predicted to be visible.
2) If more than one keyframe has passed since the point was created, it must be observed from at least three keyframes. Once a point has passed this test, it can only be removed if at some later time it is observed from fewer than three keyframes (which can happen when keyframes are culled).
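The two conditions can be written as a small predicate applied while a point is still recent. This is a sketch with hypothetical argument names. It assumes the tracker keeps, for each point, how often it was found and how often it was predicted to be visible.

```python
def keep_recent_map_point(n_found, n_predicted_visible, n_observing_keyframes,
                          keyframes_since_creation):
    """Culling test for a map point during its first keyframes after creation."""
    # Condition 1: found in more than 25% of the frames where it was predicted visible.
    found_ratio = n_found / n_predicted_visible if n_predicted_visible else 0.0
    if found_ratio <= 0.25:
        return False
    # Condition 2: once more than one keyframe has passed, require >= 3 observing keyframes.
    if keyframes_since_creation > 1 and n_observing_keyframes < 3:
        return False
    return True
```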
New Map Point Creation: New map points are created by triangulating ORB features from connected keyframes in the covisibility graph. Each unmatched ORB feature in the new keyframe is compared with unmatched ORB features in the connected keyframes. Only matches that fulfill the epipolar constraint are kept.
Each surviving pair is then triangulated. The new point is accepted only if it has positive depth in both cameras and passes the parallax, reprojection error and scale consistency checks. Finally, the point is projected into the other connected keyframes to search for further correspondences.
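A sketch of the triangulation step and two of the acceptance checks, positive depth and sufficient parallax. It assumes normalized image coordinates and 3x4 projection matrices of the form [R|t]. Linear (DLT) triangulation is used here for illustration, and the parallax cosine limit of 0.9998 is an assumed value. The reprojection error and scale consistency checks are omitted.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 projection matrices [R|t] acting on normalized coordinates.
    x1, x2: matched (x, y) normalized image coordinates.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]  # right singular vector of the smallest singular value
    return X_h[:3] / X_h[3]


def camera_center(P):
    R, t = P[:, :3], P[:, 3]
    return -R.T @ t


def accept_new_point(P1, P2, x1, x2, max_cos_parallax=0.9998):
    """Triangulate, then require positive depth in both views and enough parallax."""
    X = triangulate(P1, P2, x1, x2)
    X_h = np.append(X, 1.0)
    # Depth check: the z coordinate of the point in each camera frame must be positive.
    if P1[2] @ X_h <= 0 or P2[2] @ X_h <= 0:
        return False, X
    # Parallax check: the viewing rays from both camera centers must not be near-parallel.
    r1 = X - camera_center(P1)
    r2 = X - camera_center(P2)
    cos_parallax = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return bool(cos_parallax < max_cos_parallax), X
```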
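Local Bundle Adjustment: The local BA optimizes the currently processed keyframe, all the keyframes connected to it in the covisibility graph, and all the map points seen by those keyframes. Other keyframes that also see those points but are not connected to the current keyframe are included in the optimization but kept fixed.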
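Before any optimization runs, the variables of the local BA have to be selected. A sketch of that selection, assuming dict-based storage (hypothetical): the optimized keyframes, the optimized map points, and the keyframes that observe those points but stay fixed, as described in the original paper. The optimization itself would be handed to a nonlinear least-squares solver and is not shown.

```python
def local_ba_sets(current, graph, observations):
    """graph: dict keyframe_id -> {neighbor_id: weight}.
    observations: dict keyframe_id -> set of map point IDs.
    Returns (optimized keyframes, optimized map points, fixed keyframes)."""
    # Optimized keyframes: the current keyframe and its covisibility neighbors.
    local_kfs = {current} | set(graph.get(current, {}))
    # Optimized points: every map point seen by those keyframes.
    points = set().union(*(observations[kf] for kf in local_kfs))
    # Fixed keyframes: they constrain the points but their poses are not updated.
    fixed_kfs = {kf for kf, pts in observations.items()
                 if kf not in local_kfs and pts & points}
    return local_kfs, points, fixed_kfs
```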
Local Keyframe Culling: To maintain a compact reconstruction, the local mapping tries to detect redundant keyframes and delete them. This matters because the complexity of bundle adjustment grows with the number of keyframes. A keyframe is discarded if 90% of its map points have been seen in at least three other keyframes at the same or a finer scale.
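A sketch of the redundancy test. It assumes each keyframe stores the pyramid level at which it observes each map point, with a lower level meaning a finer scale. The data layout and function name are hypothetical. For simplicity the sketch scans every keyframe and only flags candidates; the real system checks the keyframes local to the current one and removes them one at a time.

```python
def redundant_keyframes(obs, min_others=3, ratio=0.9):
    """obs: dict keyframe_id -> {map_point_id: pyramid level of the observation}.
    Returns keyframes whose map points are mostly seen elsewhere at the same or finer scale."""
    redundant = []
    for kf, pts in obs.items():
        if not pts:
            continue
        n_redundant = 0
        for point, level in pts.items():
            # Count other keyframes seeing this point at the same or a finer (lower) level.
            others = sum(1 for other, other_pts in obs.items()
                         if other != kf and point in other_pts and other_pts[point] <= level)
            if others >= min_others:
                n_redundant += 1
        if n_redundant >= ratio * len(pts):
            redundant.append(kf)
    return redundant
```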
The loop closing thread takes Ki, the last keyframe processed by the local mapping, and tries to detect and close loops.
Loop Candidate Detection: To detect possible loops, the similarity score between the bag-of-words vector of Ki and those of all its neighbors in the covisibility graph is computed, and the lowest score s_min is retained.
The recognition database is then queried, and all keyframes whose score is lower than s_min are discarded. Additionally, all keyframes already directly connected to Ki are removed from the results. A loop candidate is accepted only if three consistent candidates (i.e. candidates connected in the covisibility graph) are detected consecutively.
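A sketch of candidate detection and the consistency check. It assumes bag-of-words vectors are sparse dicts from word ID to weight, and it scores them with a DBoW2-style L1 similarity, 1 - 0.5 * ||v1/|v1| - v2/|v2|||_1. A linear scan over all keyframes stands in for the recognition database query. The consistency logic is a simplified interpretation: a candidate's group is itself plus its covisibility neighbors, and a group counts as consistent if it shares a keyframe with a group from the previous detection. All names are hypothetical.

```python
def l1_score(v1, v2):
    """L1 similarity of two bag-of-words vectors, in [0, 1]."""
    n1, n2 = sum(map(abs, v1.values())), sum(map(abs, v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    words = set(v1) | set(v2)
    diff = sum(abs(v1.get(w, 0) / n1 - v2.get(w, 0) / n2) for w in words)
    return 1.0 - 0.5 * diff


def loop_candidates(current, bow, graph):
    """Keyframes scoring at least s_min that are not directly connected to current."""
    neighbors = graph.get(current, {})
    if not neighbors:
        return []
    s_min = min(l1_score(bow[current], bow[n]) for n in neighbors)
    return [kf for kf in bow
            if kf != current and kf not in neighbors
            and l1_score(bow[current], bow[kf]) >= s_min]


class ConsistencyChecker:
    """Accepts a candidate once its group has been consistent `required` times in a row."""

    def __init__(self, required=3):
        self.required = required
        self.groups = []  # (set of keyframes, consecutive detection count) from the last call

    def update(self, candidates, graph):
        new_groups, accepted = [], []
        for c in candidates:
            group = {c} | set(graph.get(c, {}))
            count = 1
            for prev_group, prev_count in self.groups:
                if group & prev_group:  # shares a keyframe with a previous group
                    count = max(count, prev_count + 1)
            new_groups.append((group, count))
            if count >= self.required:
                accepted.append(c)
        self.groups = new_groups
        return accepted
```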
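Similarity Transformation: In monocular SLAM the map can drift in seven degrees of freedom: three translations, three rotations and a scale factor. Therefore, to close a loop, a similarity transformation from the current keyframe Ki to the loop keyframe must be computed. This transformation quantifies the error accumulated along the loop and also serves as a geometric validation of the loop.
To compute it, 3D-to-3D correspondences are found between the map points of Ki and those of each loop candidate. RANSAC iterations are then performed with each candidate to estimate the similarity transformation. The candidate for which a transformation with enough inliers is found is accepted as the loop keyframe.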
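The original paper estimates the similarity with Horn's closed-form method inside RANSAC. The sketch below instead uses Umeyama's closed-form solution, which solves the same least-squares similarity problem. It assumes the 3D-3D correspondences are given as two aligned N x 3 arrays. The inlier threshold and iteration count are hypothetical values. The guided search for more matches and the nonlinear refinement that follow in the real system are omitted.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form s, R, t minimizing sum ||dst_i - (s R src_i + t)||^2."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def ransac_sim3(src, dst, iters=200, thresh=0.05, seed=0):
    """Fit a similarity on random minimal samples of 3 correspondences, keep the
    hypothesis with most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        s, R, t = umeyama_sim3(src[idx], dst[idx])
        err = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        return None  # no usable transformation: reject this loop candidate
    s, R, t = umeyama_sim3(src[best_inliers], dst[best_inliers])
    return s, R, t, best_inliers
```

Because the scale s is estimated together with R and t, the recovered transformation exposes the scale drift accumulated around the loop, which a rigid (6 DoF) fit could not represent.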
Loop Fusion & Optimization: To correct the accumulated error in the loop, duplicated map points are fused, and new edges that attach the loop closure are inserted in the covisibility graph.
The current keyframe pose is then corrected with the similarity transformation, and this correction is propagated to its neighbors. Finally, a pose graph optimization is performed over the essential graph to distribute the loop closing error along the graph. This also corrects the scale drift.
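In the original paper, the essential graph keeps all the keyframes but only a sparse subset of edges. These are the spanning tree edges, the covisibility edges with weight of at least 100, and the loop closure edges. That sparsity keeps the pose graph optimization fast. A sketch of the edge selection, assuming hypothetical dict and list layouts:

```python
def essential_graph_edges(spanning_parent, covisibility, loop_edges, theta_min=100):
    """spanning_parent: dict keyframe_id -> parent id (None for the root).
    covisibility: dict keyframe_id -> {neighbor_id: shared map points}.
    loop_edges: iterable of (keyframe_a, keyframe_b) loop closure pairs.
    Returns the undirected edge set as frozensets of two keyframe IDs."""
    edges = set()
    for child, parent in spanning_parent.items():
        if parent is not None:
            edges.add(frozenset((child, parent)))  # spanning tree edge
    for a, neighbors in covisibility.items():
        for b, weight in neighbors.items():
            if weight >= theta_min:
                edges.add(frozenset((a, b)))  # strong covisibility edge
    for a, b in loop_edges:
        edges.add(frozenset((a, b)))  # loop closure edge
    return edges
```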