Fourier transform (FT) is a mathematical transform that decomposes a function (often a function of time or a signal) into its constituent frequencies, and takes them from the time domain into the frequency domain.

The output of Fourier transform is a complex number, which is represented as

Here, xn is the value of the signal at time n. Xk is the Fourier coefficient for each frequency k. N is the total number of samples of the signal (i.e. the number of discrete time steps over which the signal was recorded). The FFT algorithm returns the Xk values for each frequency. The complex exponential can be decomposed into sines and cosines using the Euler’s formula. This provides a sanity check to the intuitions developed so far.

There is also the inverse of Fourier Transform (IFT), which takes a frequency domain image as input and then restores the original image. While displaying any image transformed from a spectrum using inverse Fourier transform, we usually take the absolute values of the output.

Time complexity of Fourier Transform is O (N²). Another popular variant of Fourier transform is Fast Fourier transform that minimizes this complexity by a strategy called divide and conquer to O (NlogN)

From the Fourier Transform Representation, we can see a central white speck in the image. This central speck is the DC component of the image, which gives the information of the color intensity of the image. If this central speck is altered, it can drastically change the image altogether. Note that there can be smaller white specks in the Fourier Transform Representation on some images — this pertains to the noise in the image.

In general, suppressing a vertical line in the Fourier Transform Representation will remove horizontal patterns in the original image, and vice versa.

Complex Fourier Series (CFS) radically changes its property when converted into different domains. On the other hand, Fourier Transform (FT) and Discrete Fourier Transform (DFT) stays in one place no matter which domain they are in.

Let us plot a signal with 3 frequencies and a DC component after Fourier Transform

FFT sequences are complex sequences. Because they contain information about amplitude and phase for a given frequency component and complex numbers are the best way to describe it. Let us take the absolute value (magnitude) of those sequences then.

This do not make much sense at all. First of all, there are 7 peaks (including the one at zero). But we were expecting 4 peaks, (3 for frequencies f1, f2, f3 and 1 for the DC component). Secondly, what has happened to the amplitude? It looks too high! The peaks are in wrong place as well.

Let us break the first confusion: why there are more peaks than we were expecting? This is where Single Side Band (SSB) spectrum comes to play. If you look closely, one side of the plot above is the mirror reflection of another side, dropping the peak at 0. This is because, for real signals, the coefficients of positive and negative frequencies become complex conjugates. That means, we do not need both sides of spectrum to represent the signal, a single side will do. This is known as Single Side Band (SSB) spectrum.

It turns out, if you take the spectrum associated with positive frequencies only and multiply it by 2, you will get the SSB spectrum

Next, we need to fix an appropriate x-axis for the FFT spectrum. In our case, we need to put the physical frequencies in the x-axis so that we know we are getting peaks at the correct frequencies. This is an extremely crucial step. To do that, we need to understand how FFT creates “bins”. For N point FFT, the number of bins created is N/2.

FFT is just an implementation of Discrete Fourier Transform (DFT). To discretize the continuum of frequencies, the frequency axis is evenly segmented into finite number of parts which are known as bins. Bins can be considered as spectrum samples. In our example, the sampling frequency Fs = 1000 samples/second. According to Nyquist law, the highest physical frequency which can be represented by its samples without aliasing is simply Fs/2 = 500 Hz. So, essentially the frequency spectrum is being segmented into strips of (Fs/2)/(N/2) or Fs/N bins.

To bring the peaks in the correct place, we need to use a normalization factor. To be specific, a normalization factor of signal length. That’s what we will do, we will normalize the spectrum with signal length. Now, let us put all of these understandings together and let us see what happens.

We have successfully retrieved our frequencies from deep beneath the noise.

You can see that the result image of the example above has brightest pixels (larger values) at the center and outer size pixels relatively less bright (smaller values).

The output frequency domain image tells us how much each frequency component is included in the original image. Pixels near the center represent lower frequency components, and outer side pixels represent higher frequency component. Thus, if pixels near the center are brighter than others outer side, this means that the original image is composed with lower frequency components more than higher frequency components.

Here, a lower frequency component of an image appears in the original image as rough shape of large objects. Conversely, a higher frequency component can add some details into the image, such as the texture of the ground in the example image above. Remember that small objects and edges of objects are mainly made of higher frequencies. The high-frequency features are sharply visible as narrow peaks. These include the eyes, the outlines of the facial structure, and the outline of the human hand. The low-frequency features, which are usually huge stretches of uniform color, are observed in the background. These include large regions of black colored fur and the background.

Minimizing computational cost: Convolution of two signals in spatial domain is equal to the multiplication of their Fourier transforms in the frequency domain. So instead of multiplying throughout the image with the kernel we could take the Fourier transform of it and just get a bit wise multiplication. Before the convolutional, transform the input and kernel to frequency domain then multiply then convert back. Even though it deals with transforming and reverse transforming still it is computationally less expensive.

Image Denoising: We can remove higher frequency components from the frequency domain image and then apply Inverse Fourier Transform on it, we can get a denoised image. This is the principle of Image Low Pass Filter. So in low pass filter only the center portion has high values which diminishes going beyond center. As we have already seen the center contains low frequency components. Thus it removes high frequency component when we multiply and keep low frequency.

Robotics Enthusiast. Well versed with computer vision, path planning algorithms, SLAM and ROS