Homography

4 min read · Jun 28, 2021

A homography is a projective transformation that maps points on one plane to points on another. When it is induced by a camera viewing a planar surface, the homography encodes the position and orientation of the camera relative to that plane, and these can be retrieved by decomposing the homography matrix.

Homogeneous Coordinate

Since taking a picture converts 3D to 2D, depth information is lost. An infinite number of 3D points therefore project to the same 2D point. Homogeneous coordinates are well suited to describing this ray of possibilities, because all vectors that differ only by a non-zero scale factor represent the same point. Homogeneous coordinates simply take the normal Cartesian coordinates and append one extra element (usually 1) to the end.

Given a homogeneous coordinate, divide all elements by the last element of the vector (the scale factor); the Cartesian coordinate is then the vector composed of all the elements except the last.
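The conversion in both directions can be sketched in a few lines of Python. The function names `to_homogeneous` and `to_cartesian` are made up for this illustration.

```python
def to_homogeneous(point):
    """Append a scale factor of 1 to a Cartesian point."""
    return list(point) + [1.0]


def to_cartesian(h_point):
    """Divide by the last element (the scale factor) and drop it."""
    w = h_point[-1]
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return [c / w for c in h_point[:-1]]


# (2, 4, 2) and (1, 2, 1) differ only by scale, so they name the same 2D point.
print(to_cartesian([2.0, 4.0, 2.0]))  # [1.0, 2.0]
print(to_cartesian([1.0, 2.0, 1.0]))  # [1.0, 2.0]
print(to_homogeneous([1.0, 2.0]))     # [1.0, 2.0, 1.0]
```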

Projection Matrix

The projection matrix (also called the camera matrix) is the product of two other matrices that describe the camera: the intrinsic and extrinsic camera matrices, which store the intrinsic and extrinsic parameters of the camera respectively. It should not be confused with the fundamental matrix, which relates two views and is discussed later.

The extrinsic matrix maps 3D real-world points to 3D points in the camera's coordinate system. This information is stored in a rotation matrix and a translation vector. The rotation matrix R stores the camera’s 3D orientation, while the translation vector T encodes its position in 3D space (the camera centre C satisfies T = -RC).

The orientation is commonly described by three rotations: roll, pitch and yaw.

Rotation around the front-to-back axis is called roll.

Rotation around the side-to-side axis is called pitch.

Rotation around the vertical axis is called yaw.
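As a rough sketch, the three angles can be combined into a single rotation matrix. The axis assignment (x front-to-back, y side-to-side, z vertical) and the composition order R = Rz(yaw) Ry(pitch) Rx(roll) are assumptions made here for illustration; other conventions are in common use, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rotation_from_rpy(roll, pitch, yaw):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]   # roll about x
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]   # pitch about y
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]   # yaw about z
    return matmul(rz, matmul(ry, rx))


R = rotation_from_rpy(0.0, 0.0, math.pi / 2)
# The first column of R is where the x axis ends up; a 90 degree yaw
# should turn it into the y axis (up to floating-point rounding).
x_axis_rotated = [row[0] for row in R]
print([round(v, 6) for v in x_axis_rotated])
```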

The rotation matrix and the translation vector are then concatenated to create the extrinsic matrix [R|T]. Functionally, the extrinsic matrix transforms 3D homogeneous coordinates from the global coordinate system to the camera coordinate system. The extrinsic matrix is one of the two factors of the projection matrix. It is distinct from the essential and fundamental matrices, which relate corresponding points across two views rather than mapping world points into a single camera.

The intrinsic matrix K stores the camera intrinsics, such as the focal length and the principal point.

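P = K [R|T]

Given the calibration parameters, a 3D world point (X, Y, Z) maps to a pixel (u, v) through the following equation, where s is the scale factor (the depth of the point in the camera frame):

s [u, v, 1]^T = P [X, Y, Z, 1]^T = K [R|T] [X, Y, Z, 1]^T

Going the other way, a 2D pixel only determines a ray of possible 3D points, so additional information (such as a known plane) is needed to recover a unique 3D position.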
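A minimal sketch of this projection, applying the extrinsic parameters first and the intrinsic matrix second. The camera values (800 px focal length, principal point at (320, 240), camera 5 units from the world origin) are invented for the example, and `project` is a hypothetical helper name.

```python
def matvec(m, v):
    """Multiply a matrix (nested lists) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def project(K, R, T, world_point):
    """Project a 3D world point to pixel coordinates."""
    # Extrinsic step: world frame -> camera frame.
    cam = [c + t for c, t in zip(matvec(R, world_point), T)]
    # Intrinsic step: camera frame -> homogeneous pixel coordinates.
    u, v, s = matvec(K, cam)
    if s == 0:
        raise ValueError("point lies in the camera's focal plane")
    return [u / s, v / s]


# Made-up calibration values for illustration.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 5.0]

print(project(K, R, T, [1.0, 0.5, 0.0]))  # [480.0, 320.0]
# A second point on the same viewing ray lands on the same pixel.
print(project(K, R, T, [2.0, 1.0, 5.0]))  # [480.0, 320.0]
```

The two calls illustrate the loss of depth: both world points lie on one ray through the camera centre, so they are indistinguishable in the image.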

Homography is a special case of the pinhole camera model in which all the real-world points lie on a plane, chosen so that the Z coordinate is 0. Here is a derivation for Homography.

Derivation of Homography

H is the Homography matrix, a 3 by 3 matrix that transforms points from one plane to another.
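The step from the pinhole model to H can be written out as follows. This is a sketch using assumed notation: r1, r2, r3 are the columns of the rotation matrix R, T is the translation vector, K is the intrinsic matrix, and s is an arbitrary scale factor. Setting Z = 0 removes the third column of R from the product:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & r_3 & T \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & T \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
= H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix},
\qquad
H = K \begin{bmatrix} r_1 & r_2 & T \end{bmatrix}
```

Because both sides are homogeneous, H is only defined up to scale, which leaves it with 8 degrees of freedom.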

Here, the transformation is between the plane where Z = 0 and the image plane that points get projected onto. The Homography matrix is usually solved through the 4 point algorithm. Essentially, it uses at least 4 point correspondences between the 2 planes, with no 3 of the points collinear, to solve for the 8 unknowns of the Homography matrix (each correspondence contributes 2 equations).
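One way to sketch the 4 point algorithm is to fix the bottom-right entry of H to 1 and solve the resulting 8 by 8 linear system. That normalization is an assumption of this example (it fails when the true entry is 0), and production code typically uses an SVD-based solver with more points. The function names and the sample coordinates are invented for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (with H[2][2] fixed to 1) from 4 correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in H)
    return [u / w, v / w]


# Made-up correspondences: a unit square seen as a general quadrilateral.
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(10.0, 20.0), (110.0, 30.0), (120.0, 140.0), (5.0, 130.0)]
H = homography_from_4_points(src, dst)
print(apply_homography(H, (0.5, 0.5)))  # image of the square's centre
```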

Once we have the Homography matrix and know the camera's intrinsic matrix, we can decompose the Homography into the translation and rotation of the camera relative to the plane.
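A minimal sketch of such a decomposition, assuming the homography has the form H ~ K [r1 r2 T] for the plane Z = 0, that K has zero skew, and that the plane lies in front of the camera (positive Z component of T). With a noisy H the recovered R is only approximately a rotation and would need to be re-orthogonalized; that step is omitted. All names and numbers are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def decompose_homography(H, fx, fy, cx, cy):
    """Recover (R, T) from H ~ K [r1 r2 T], assuming zero skew in K."""
    def k_inv(col):
        # K^-1 in closed form for K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
        return [(col[0] - cx * col[2]) / fx, (col[1] - cy * col[2]) / fy, col[2]]

    m1, m2, m3 = (k_inv([H[r][c] for r in range(3)]) for c in range(3))
    # H is only known up to scale; r1 must be a unit vector, which fixes it.
    scale = 1.0 / math.sqrt(sum(v * v for v in m1))
    # Assumption: the plane is in front of the camera, so T_z > 0.
    if m3[2] < 0:
        scale = -scale
    r1 = [scale * v for v in m1]
    r2 = [scale * v for v in m2]
    T = [scale * v for v in m3]
    r3 = cross(r1, r2)  # third column completes the right-handed frame
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, T


# Round trip with made-up values: build H from a known pose, then recover it.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
a = math.radians(30)
R_true = [[math.cos(a), -math.sin(a), 0.0],
          [math.sin(a), math.cos(a), 0.0],
          [0.0, 0.0, 1.0]]
T_true = [0.1, -0.2, 5.0]
cols = [[R_true[i][0] for i in range(3)], [R_true[i][1] for i in range(3)], T_true]
k_cols = [[fx * c[0] + cx * c[2], fy * c[1] + cy * c[2], c[2]] for c in cols]
H = [[-3.0 * k_cols[c][r] for c in range(3)] for r in range(3)]  # arbitrary scale

R, T = decompose_homography(H, fx, fy, cx, cy)
print(R)
print(T)
```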

Difference between Homography and Fundamental Matrix

The fundamental matrix also relates two views of the same scene, but instead of mapping a point to a point, it maps the image of a point in one view to an epipolar line in the second view¹. The corresponding point in the second view can lie anywhere on that epipolar line. Because the point is constrained only to a line, the fundamental matrix is rank deficient (rank 2), whereas a Homography maps points to points one-to-one and is a full-rank (rank 3) matrix.

The fundamental matrix is independent of scene structure, unlike Homography, which requires all the world points to lie on a plane, i.e. Z = 0 in a coordinate frame attached to that plane. Homography is thus the point-to-point mapping available in the special case of a planar scene, not a special case of the fundamental matrix itself. Homography is generally used to map one plane to another, while the fundamental matrix is used for scenes with objects at varying depths, where it constrains the correspondences from which depth is then recovered.
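The point-to-line behaviour and the rank deficiency can be checked on a toy setup. This sketch assumes two cameras with identity intrinsics and no relative rotation, where the second camera's coordinates are X2 = X1 + t; in that case the fundamental matrix reduces to the cross-product matrix of t. The numbers are invented.

```python
def skew(t):
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matvec(m, v):
    """Multiply a matrix (nested lists) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Assumed toy geometry: K = I for both cameras, R = I, sideways shift t.
t = [1.0, 0.0, 0.0]
F = skew(t)

X1 = [0.5, 0.2, 2.0]                      # 3D point in the first camera frame
X2 = [p + q for p, q in zip(X1, t)]       # same point in the second frame
x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]  # image of the point in view 1
x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]  # image of the point in view 2

line = matvec(F, x1)  # epipolar line (a, b, c) with a*u + b*v + c = 0 in view 2
residual = sum(l * x for l, x in zip(line, x2))

print(line)      # coefficients of the epipolar line
print(residual)  # close to 0: x2 lies on the line
print(det3(F))   # 0: F is not full rank
```

Changing the depth of `X1` moves `x2` along the same line, which is why F alone cannot say where on the line the match is.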

Homography for Visual Localization

Advantages

Using Homography is simpler than the alternatives: it is a single 3 by 3 matrix that maps points directly to points and can be estimated from as few as 4 point correspondences. Approaches that use the Fundamental or Essential matrix need more correspondences, give only a point-to-line constraint, and generally take more effort to implement.

Disadvantages

Since a Homography is only valid when the points lie on a plane (Z = 0 in that plane's coordinate frame), it only works in scenarios where the desired target is planar. Otherwise, other approaches are necessary for localization, such as Epipolar Geometry, which does not have this constraint.

This article gave a conceptual overview of the pinhole camera model, the projection matrix and its decomposition into the camera extrinsic and intrinsic matrices, the Homography matrix, its difference from the fundamental matrix, and its advantages and disadvantages for visual localization.

Written by Dibyendu Biswas
