SLAM

Simultaneous Localization and Mapping, or SLAM for short, is a relatively well-studied problem in robotics with a two-fold aim:
- Mapping: building a representation of the environment, also called the “map”, and
- Localization: finding where the robot is with respect to the map.
In this technique, we estimate not only our own position but also the positions of landmarks.
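To make the joint estimation concrete, here is a minimal sketch of a SLAM state vector that holds the robot pose and the landmarks together, in the style of EKF-SLAM; the layout and all values are illustrative:

```python
import numpy as np

# Joint SLAM state: the robot pose and every landmark are estimated
# together in one vector (EKF-SLAM-style layout, illustrative):
#   [x, y, theta, l1_x, l1_y, l2_x, l2_y, ...]
robot_pose = np.array([0.0, 0.0, 0.0])           # x, y, heading (origin)
landmarks = np.array([[2.0, 1.0], [4.0, -0.5]])  # two observed landmarks
state = np.concatenate([robot_pose, landmarks.ravel()])
print(state)  # [ 0.   0.   0.   2.   1.   4.  -0.5]
```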
Characteristics of a Good Landmark:
- It should be re-observable.
- It should be distinct from other landmarks and from the environment.
- It should be stationary.
Another term used in SLAM is odometry: estimating the change in the mobile agent's position from its motion sensors. Starting from a known initial position and integrating wheel-displacement calculations, odometry estimates the agent's pose at a time t (see the sketch below).
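As a minimal sketch of such a wheel-displacement calculation, here is a differential-drive odometry update; the function and parameter names are illustrative, and the encoder values are made up:

```python
import math

def odometry_update(x, y, theta, d_left, d_right, track_width):
    """One differential-drive odometry step.

    d_left / d_right: wheel displacements since the last update
    (encoder ticks already converted to meters); track_width:
    distance between the two wheels.
    """
    d_center = (d_left + d_right) / 2.0          # forward displacement
    d_theta = (d_right - d_left) / track_width   # change in heading
    # Integrate along the arc, using the mid-point heading.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta) % (2.0 * math.pi)
    return x, y, theta

# Starting at the origin, integrate two encoder readings.
pose = (0.0, 0.0, 0.0)
for d_l, d_r in [(0.10, 0.10), (0.08, 0.12)]:
    pose = odometry_update(*pose, d_l, d_r, track_width=0.3)
print(pose)
```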
Why simultaneous?
“Which came first, the chicken or the egg?” is an age-old question that also explains the “simultaneous” bit of SLAM. If we already have a map, it is relatively easy to localize the robot with respect to it, since we know what to localize against. Similarly, if for all times we know where the robot has been (position as well as orientation), it is easy to construct a map by superimposing sensor measurements in a common frame of reference. Starting with neither a map nor a location (it is customary to consider the first measurement location as the origin of the map), both localization and mapping need to be done simultaneously to get an updated location of the robot as well as an updated estimate of the map.
In SLAM we use both the prediction from the agent's control input and odometry, and the measurements we receive from sensors. There can be error in both the prediction and the measurement phase, and it is impossible to rule out all kinds of noise. Therefore it is imperative to find an optimal estimate of the agent's position that fuses the prediction with the measurement.
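A Kalman filter is one standard way to perform this fusion. Below is a minimal 1-D sketch; all noise values and measurements are illustrative assumptions:

```python
def predict(mu, var, u, motion_noise):
    # Prediction: apply the control/odometry input; uncertainty grows.
    return mu + u, var + motion_noise

def update(mu, var, z, meas_noise):
    # Correction: weight prediction and measurement by their variances.
    k = var / (var + meas_noise)              # Kalman gain
    return mu + k * (z - mu), (1.0 - k) * var

mu, var = 0.0, 1.0                                   # initial belief
mu, var = predict(mu, var, u=1.0, motion_noise=0.5)  # odometry: "moved 1 m"
mu, var = update(mu, var, z=1.2, meas_noise=0.3)     # sensor: "at 1.2 m"
print(mu, var)  # fused estimate lies between prediction and measurement
```

The fused estimate (about 1.17 m here) sits between the prediction and the measurement, weighted by how much we trust each.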
Types of SLAM
Semantic SLAM focuses on both the geometry and the semantic meaning of the surfaces in the scene.
Monocular semantic online mapping (monoSOM) is a trending topic in which a neural network fuses temporal sequences of monocular images from multiple cameras into a semantic bird's-eye-view map.
Problems with monocular SLAM
- Scale ambiguity: the absolute scale of the scene cannot be recovered from a single camera.
- Scale drift: the estimated scale changes gradually along the trajectory.
Pure rotation: Monocular SLAM dies under pure rotation; it’s that bad. With a single camera, the baseline (the translation between two camera positions) is used to estimate the depth of the scene being observed, a process called triangulation. If the camera only rotates, the baseline is zero and no new points can be triangulated. What makes things worse is that the apparent motion on the image plane is greater under rotation than under translation. Effectively, the points for which we knew the depth whizz out of the field of view, and no new points can be estimated since there is no baseline (see the sketch below).
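A back-of-the-envelope sketch of why the baseline matters, using the rectified two-view depth relation z = f * b / d; all parameter values are illustrative:

```python
def triangulated_depth(focal_px, baseline_m, disparity_px):
    """Depth of a point from two views: z = f * b / d.

    Rectified, fronto-parallel case. Under pure rotation the baseline b
    is zero, so the translational disparity d is zero too and the depth
    is undefined (0/0): triangulation degenerates.
    """
    if disparity_px == 0:
        raise ValueError("zero baseline/disparity: depth unobservable")
    return focal_px * baseline_m / disparity_px

print(triangulated_depth(focal_px=700, baseline_m=0.1, disparity_px=14))  # 5.0 m
```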
In the Graph SLAM formulation, the vertices in the graph are entities that we want to estimate (outputs): robot positions, locations of points in the world, etc. Edges represent constraints between these entities, which are derived from the raw sensor measurements (inputs).
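A toy illustration of this structure, assuming a 1-D world, unit-weight constraints, and made-up measurements:

```python
import numpy as np

# Toy 1-D pose graph: three poses x0, x1, x2 are the vertices (outputs);
# each edge (i, j, z) is a constraint saying x_j - x_i should equal the
# measurement z (inputs derived from sensors).
edges = [(0, 1, 1.0),   # odometry between poses 0 and 1
         (1, 2, 1.0),   # odometry between poses 1 and 2
         (0, 2, 1.9)]   # loop closure: poses 0 and 2 measured directly

n = 3
A = np.zeros((len(edges) + 1, n))
z = np.zeros(len(edges) + 1)
A[0, 0] = 1.0  # anchor x0 = 0 to remove the gauge freedom
for row, (i, j, meas) in enumerate(edges, start=1):
    A[row, i], A[row, j] = -1.0, 1.0
    z[row] = meas

# Solve all constraints jointly in the least-squares sense (the 1-D case
# is linear, so a single solve suffices; real SLAM iterates).
x, *_ = np.linalg.lstsq(A, z, rcond=None)
print(x)  # ~[0.00, 0.97, 1.93]: the loop closure reconciles the odometry
```

Real systems use nonlinear 2-D/3-D constraints and iterative solvers (e.g. g2o or Ceres), but the structure is the same: vertices as unknowns, edges as measurement-derived constraints.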
Challenges in SLAM
The ultimate goal is to make SLAM handle environments with too many or very few salient features. Further challenges include:
- Dimensionality of the configuration space.
- Continuous nature of configuration space.
- Dynamic environments (e.g. with lots of moving objects or people).
- Camera issues: non-smooth camera motion, occlusions, transparency, and reflections.
- Constraints on computation time, memory, sensing devices, and motion.