Signed Distance Field
Euclidean Signed Distance Fields (ESDFs) and Truncated Signed Distance Fields (TSDFs) are increasingly used for collision checking and collision costs in planning, especially in combination with visual SLAM.
Euclidean Signed Distance Fields (ESDFs) have long been used in the planning literature for collision checking (especially of complex shapes), inferring distances and gradients to objects for planning, and finding large free areas. Meanwhile, with the rise of RGB-D cameras, Kinect Fusion has popularized Truncated Signed Distance Fields (TSDFs) as a fast, flexible map representation that implicitly encodes the position of the surface as the zero crossing of the field.
ESDF VS TSDF
Though both of these representations are signed distance fields (SDFs), the way that the distance of a voxel is computed differs.
In an ESDF, a free voxel’s distance is the Euclidean distance to the nearest occupied cell (or, if the voxel is inside an object, the negated distance to the nearest free cell). The ESDF is computed for every voxel in the map.
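The definition above can be sketched as a brute-force computation on a tiny 2D occupancy grid. This is only an illustration of the definition, not how a real mapping system computes an ESDF (those use incremental or wavefront methods); the function name `brute_force_esdf` and the sign convention (positive in free space, negative inside obstacles) are assumptions made for this example.

```python
import math

def brute_force_esdf(occupied, voxel_size=1.0):
    """Signed Euclidean distance for every cell of a small 2D grid.

    occupied: list of rows of booleans (True = occupied).
    Free cells get +distance to the nearest occupied cell,
    occupied cells get -distance to the nearest free cell.
    """
    rows, cols = len(occupied), len(occupied[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    esdf = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        # Distance (in cells) to the closest cell of the opposite state.
        nearest = min(
            (math.hypot(r - rr, c - cc)
             for rr, cc in cells
             if occupied[rr][cc] != occupied[r][c]),
            default=math.inf,  # grid is entirely free or entirely occupied
        )
        distance = nearest * voxel_size
        esdf[r][c] = -distance if occupied[r][c] else distance
    return esdf

if __name__ == "__main__":
    grid = [[False, False, False],
            [False, True,  False],
            [False, False, False]]
    for row in brute_force_esdf(grid, voxel_size=0.5):
        print(["%+.3f" % d for d in row])
```

The cost is quadratic in the number of cells, which is why this form is only useful for checking the output of a faster method on small maps.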
In a TSDF, on the other hand, a voxel's distance is measured to the surface along the ray from the sensor center. In other words, it is the distance to the nearest occupied cell not in Euclidean space, but along this one-dimensional ray. The value is also truncated so that only voxels very near the surface carry distances, which allows greater compression and reduces the errors introduced by this approximate distance metric.
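A small sketch of the projective distance may help show why it is only an approximation. It assumes a 2D world, a voxel that lies exactly on the sensor ray, and hypothetical helper names (`projective_distance`, `truncate`); the truncation distance of 0.2 m is an arbitrary illustrative value.

```python
import math

def projective_distance(sensor, voxel, hit):
    """Signed distance from a voxel to the measured surface point,
    taken along the sensor ray (assumes the voxel lies on that ray).
    Positive in front of the surface, negative behind it."""
    return math.dist(sensor, hit) - math.dist(sensor, voxel)

def truncate(distance, truncation):
    """Clamp the distance to the band [-truncation, +truncation]."""
    return max(-truncation, min(truncation, distance))

if __name__ == "__main__":
    # A wall along the line y = 1, seen from the origin at 45 degrees.
    sensor, hit = (0.0, 0.0), (1.0, 1.0)
    voxel = (0.5, 0.5)
    along_ray = projective_distance(sensor, voxel, hit)
    euclidean = 1.0 - voxel[1]  # true distance to the wall y = 1
    print("projective:", along_ray)
    print("euclidean: ", euclidean)
    print("truncated: ", truncate(along_ray, 0.2))
```

For this oblique view the projective value is larger than the true Euclidean distance, and the gap grows with distance from the surface. Truncation keeps only values close to the surface, where the two measures agree best.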
Employing occupancy grids to map 3D scenes leads to huge memory requirements, as well as slow ray-casts and look-ups, for any space larger than a room. An occupancy grid provides only binary information about whether a cell is free or occupied, and it represents space with fixed-size cells.
The solution most commonly used in 3D contexts when building a map online is Octomap. It assigns probabilities to each raycast and merges multiple observations of the same scene using flexibly sized voxels.
This approach uses an octree-based representation of occupancy probabilities of cells in 3D space. The octree structure allows large blocks of space with the same probability to be represented by a single large cell, therefore vastly decreasing the amount of memory needed to represent areas of unknown or free space.
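The memory saving from merging homogeneous regions can be sketched with a toy octree built from a dense grid. This is not the Octomap implementation or its API; `build_octree` and `count_leaves` are hypothetical helpers, and the grid side length is assumed to be a power of two.

```python
def build_octree(grid, x, y, z, size):
    """Return a single value if the cube is homogeneous,
    otherwise a list of 8 child nodes."""
    if size == 1:
        return grid[x][y][z]
    h = size // 2
    children = [
        build_octree(grid, x + dx, y + dy, z + dz, h)
        for dx in (0, h) for dy in (0, h) for dz in (0, h)
    ]
    first = children[0]
    # Collapse eight identical leaves into one larger cell.
    if all(not isinstance(c, list) and c == first for c in children):
        return first
    return children

def count_leaves(node):
    """Number of cells actually stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(c) for c in node)
    return 1

if __name__ == "__main__":
    n = 4
    # Occupancy probability 0.5 (unknown) everywhere except one cell.
    grid = [[[0.5] * n for _ in range(n)] for _ in range(n)]
    grid[0][0][0] = 0.9
    tree = build_octree(grid, 0, 0, 0, n)
    print(count_leaves(tree), "leaves instead of", n ** 3, "cells")
```

In this example a single observed cell only forces the subdivision of the octant that contains it; the remaining seven octants stay as single large cells.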
Since its advent, Octomap has been very widely used for 3D robotics applications, most notably for UAVs due to a number of factors:
1. The open-source implementation and associated ROS wrappers have made it a very easy off-the-shelf solution for many applications.
2. The probabilistic nature of the representation makes it a good fit for noisy sensor data, such as stereo matching or RGB-D sensors where ‘speckles’ are common. This adds a level of low-pass filtering even to sensors exhibiting non-Gaussian error models.
3. Memory efficiency and high speed: the flexible voxel size allows large areas to be represented compactly, and with some straightforward optimizations, collision checking in this space is also very fast.
However, this representation also has downsides:
1. The probability model used does not accurately represent the error model of vision-based depth sensing. Octomap was originally designed for use with laser measurements, whose accuracy does not degrade with distance to the sensor.
2. The Octomap sensor model has a single probability of occupancy for one voxel at the end of the ray-cast. However, this is not an accurate model for stereo or other vision-based sensing, where it is possible to have an expected error of over a meter at large distances, depending on the camera setup.
3. It does not provide additional information, such as the gradient and distance to obstacles, that trajectory-optimization-based planners such as CHOMP and TrajOpt require.
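To give a feel for the second downside, the growth of stereo depth error with range can be estimated by first-order error propagation through the standard relation depth = focal length × baseline / disparity. The camera parameters below (400 px focal length, 10 cm baseline, half-pixel matching error) are hypothetical values chosen for illustration, not figures from any particular sensor.

```python
def stereo_depth_error(depth_m, focal_px, baseline_m, disparity_error_px=0.5):
    """Approximate depth uncertainty of a stereo pair.

    From z = f * b / d, a small disparity error dd gives
    dz ~= z**2 / (f * b) * dd, so the error grows quadratically with depth.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

if __name__ == "__main__":
    focal_px, baseline_m = 400.0, 0.10  # assumed camera setup
    for depth in (1.0, 2.0, 5.0, 10.0):
        err = stereo_depth_error(depth, focal_px, baseline_m)
        print("depth %5.1f m -> error about %.3f m" % (depth, err))
```

With these assumed parameters the error is a few centimeters at 2 m but exceeds a meter at 10 m, so a single occupied voxel at the end of the ray is a poor description of where the surface might be.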
ESDF VS OCTOMAP
Trajectory-optimization-based planners such as CHOMP and TrajOpt require an ESDF representation that is not truncated and contains distance values over the entire voxel space.
Having a distance map also speeds up collision checking of complex shapes. For example, many-jointed robot arms are commonly represented as a set of overlapping spheres, so a collision check only needs to look up the distance field at the center of each sphere and compare it with that sphere's radius.
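The sphere-based check can be sketched as follows. The distance field is passed in as a plain function so the example stays self-contained; in a real system it would be an ESDF look-up. The names `in_collision` and `wall_distance`, and the optional safety `margin`, are assumptions for this example.

```python
def in_collision(spheres, distance_at, margin=0.0):
    """True if any sphere overlaps an obstacle.

    spheres: iterable of (center, radius) pairs.
    distance_at: function mapping a point to its distance to the
    nearest obstacle (for example, an ESDF look-up).
    """
    return any(distance_at(center) < radius + margin
               for center, radius in spheres)

def wall_distance(point):
    """Toy distance field: everything at x >= 2 is occupied."""
    return 2.0 - point[0]

if __name__ == "__main__":
    arm = [((0.0, 0.0, 0.0), 0.5), ((1.0, 0.0, 0.0), 0.5)]
    print(in_collision(arm, wall_distance))
    arm.append(((1.7, 0.0, 0.0), 0.5))  # last link reaches into the wall
    print(in_collision(arm, wall_distance))
```

Each sphere costs a single look-up regardless of how complicated the environment is, which is where the speed-up over geometric intersection tests comes from.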
For gradient-based trajectory optimization methods, the collision cost (which is necessary to produce collision-free trajectories) also needs a gradient. For these methods, the ESDF gives a natural cost (a function of the distance, such as a hinge loss or a smoothed hinge loss), and comparing the distance values of neighboring voxels gives the gradient at a given point. This allows CHOMP and other such methods to follow the upward gradient of the distance to push points on the trajectory out of collision.
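A minimal sketch of such a cost and gradient is given here, assuming a 2D ESDF stored as a list of rows and an interior query cell. The quadratic smoothing in `smoothed_hinge_cost` is one common choice, in the style of the CHOMP obstacle cost, and is an assumption rather than something specified in this text; `epsilon` is the hypothetical clearance below which the cost becomes active.

```python
def hinge_cost(distance, epsilon):
    """Zero when farther than epsilon from obstacles, linear otherwise."""
    return max(0.0, epsilon - distance)

def smoothed_hinge_cost(distance, epsilon):
    """Hinge with a quadratic blend so the cost has a continuous slope."""
    if distance < 0.0:
        return -distance + 0.5 * epsilon
    if distance <= epsilon:
        return (distance - epsilon) ** 2 / (2.0 * epsilon)
    return 0.0

def esdf_gradient(esdf, i, j, voxel_size):
    """Central-difference gradient of the distance at interior cell (i, j)."""
    d_row = (esdf[i + 1][j] - esdf[i - 1][j]) / (2.0 * voxel_size)
    d_col = (esdf[i][j + 1] - esdf[i][j - 1]) / (2.0 * voxel_size)
    return d_row, d_col

if __name__ == "__main__":
    voxel_size = 0.1
    # Toy ESDF: a wall along row 0, so distance grows with the row index.
    esdf = [[i * voxel_size for _ in range(5)] for i in range(5)]
    print("cost:", smoothed_hinge_cost(esdf[2][2], epsilon=0.5))
    print("gradient:", esdf_gradient(esdf, 2, 2, voxel_size))
```

In the toy field the gradient points away from the wall, so moving a trajectory point along it increases the distance and lowers the cost.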
Table: Comparison of map building strategies from a single sensor scan of a line. (a) Shows an occupancy representation, where each cell is either labelled as occupied or free. (b) Shows a TSDF, which stores projective (along the sensor ray) distance information close to the object boundary. (c) Shows the ground truth ESDF, which represents the true Euclidean distance to the surface at each cell.
TSDF VS OCTOMAP
One advantage of the TSDF is that, even when discretized, it models a continuous function. It is therefore possible to recover the position of the surface at a resolution finer than the voxel size, which allows the use of larger voxels and therefore smaller maps in memory.
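Sub-voxel surface recovery can be sketched in one dimension: given the signed distances stored at two neighboring voxel centers, the surface is placed where the linearly interpolated distance crosses zero. The function name `zero_crossing` and the sample values are illustrative assumptions.

```python
def zero_crossing(x0, d0, x1, d1):
    """Position between two voxel centers where the linearly
    interpolated signed distance is zero, or None if there is no
    sign change (or the segment is flat)."""
    if d0 * d1 > 0.0 or d0 == d1:
        return None
    t = d0 / (d0 - d1)  # fraction of the way from x0 to x1
    return x0 + t * (x1 - x0)

if __name__ == "__main__":
    # Voxel centers 0.2 m apart; the surface lies between them.
    surface = zero_crossing(0.0, 0.05, 0.2, -0.15)
    print("surface at about %.3f m" % surface)
```

Here the voxels are 0.2 m apart, yet the surface is located a quarter of the way between them, which is the effect that lets a TSDF use coarser voxels than an occupancy map for the same surface accuracy.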
The other advantage over Octomap is that a TSDF stores two values for each voxel: the distance to the surface (along the ray from the camera) and the weight/probability of this measurement. This allows the actual error of vision-based depth estimates to be modeled more accurately and, when multiple measurements are merged, leads to a maximum-likelihood estimate of the surface, since the surface is found as a zero crossing.
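Merging measurements into a voxel is commonly done with a weighted running average of the distance, which the following sketch illustrates. The weight model (inverse squared depth, so distant and therefore noisier vision measurements count less) and the weight cap `max_weight` are assumptions for this example, not values taken from this text.

```python
def measurement_weight(depth_m):
    """Assumed weight model: confidence falls off with squared depth."""
    return 1.0 / (depth_m ** 2)

def integrate(voxel, distance, weight, max_weight=100.0):
    """Fuse one measurement into a (distance, weight) voxel.

    The stored distance becomes the weighted mean of all measurements;
    weight must be positive. The accumulated weight is capped so the
    voxel can still adapt to later changes.
    """
    old_distance, old_weight = voxel
    total = old_weight + weight
    fused = (old_weight * old_distance + weight * distance) / total
    return fused, min(total, max_weight)

if __name__ == "__main__":
    voxel = (0.0, 0.0)  # empty voxel: no distance information yet
    voxel = integrate(voxel, 0.10, measurement_weight(1.0))  # close-up view
    voxel = integrate(voxel, 0.02, measurement_weight(2.0))  # distant view
    print("fused distance %.3f, weight %.2f" % voxel)
```

The close-up measurement dominates the fused value because it carries four times the weight of the distant one.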