Fourier Transform Laboratory Exercise

Audio Signal Processing and Shazam Fingerprinting with MATLAB

For Undergraduate Electrical Engineering Students

Duration: 3 hours
Level: Undergraduate
Tools: MATLAB R2020a+
Includes MATLAB Code

1 Laboratory Overview

Learning Objectives

  • Understand the mathematical foundation of Fourier Transform and its applications
  • Implement Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT) in MATLAB
  • Analyze audio signals in both time and frequency domains
  • Simulate Shazam's audio fingerprinting algorithm using spectrogram analysis
  • Apply windowing techniques to reduce spectral leakage
  • Compare different Fourier transform implementations for efficiency

Background

Shazam is a popular music identification service that uses audio fingerprinting technology. At the core of this technology lies the Fourier Transform, which converts time-domain audio signals into frequency-domain representations. This transformation allows Shazam to create unique "fingerprints" of songs by identifying prominent frequency peaks and their temporal relationships.

In this laboratory exercise, you will implement key components of audio fingerprinting using MATLAB, exploring how Fourier Transform enables music recognition systems like Shazam to work efficiently.

2 Fourier Transform Fundamentals

Continuous Fourier Transform: \(F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} dt\)
Discrete Fourier Transform: \(X[k] = \sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi}{N}kn}\)
Inverse DFT: \(x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j\frac{2\pi}{N}kn}\)
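As a quick sanity check on this transform pair, the short sketch below transforms a test vector with MATLAB's built-in fft and reconstructs it with ifft; the reconstruction error should be at the level of floating-point round-off. (You will write your own DFT in Part A.)

% Sanity check: the DFT/IDFT pair via MATLAB's built-in fft/ifft
x = [1 2 3 4 2 1 0 -1];          % short test vector
X = fft(x);                      % forward DFT, X[k]
x_rec = ifft(X);                 % inverse DFT, x[n]
fprintf('Max reconstruction error: %e\n', max(abs(x - x_rec)));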

Shazam's Audio Fingerprinting Process

  1. Audio Capture: Record an audio sample (10-15 seconds)
  2. Preprocessing: Bandpass filtering and downsampling (a preprocessing sketch follows this list)
  3. STFT: Short-Time Fourier Transform of the preprocessed signal
  4. Peak Finding: Identify prominent frequency peaks in the spectrogram
  5. Fingerprinting: Create hashes from peak pairs
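The preprocessing step (step 2) is not implemented elsewhere in this lab. The sketch below shows one plausible way to do it in MATLAB; the 300-3500 Hz passband and 8 kHz target rate are illustrative assumptions, not Shazam's actual parameters.

% Preprocessing sketch (assumed passband and target rate, not Shazam's actual values)
Fs_in  = 44100;                                   % original sampling rate (Hz)
Fs_out = 8000;                                    % assumed target rate after downsampling (Hz)
audio_in = randn(Fs_in, 1);                       % placeholder 1-second signal (use audioread for real audio)
audio_bp = bandpass(audio_in, [300 3500], Fs_in); % keep the band carrying most musical energy
audio_ds = resample(audio_bp, Fs_out, Fs_in);     % anti-aliased downsampling to Fs_out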

3 MATLAB Implementation

Part A: Basic Fourier Transform Implementation

DFT Implementation (dft_manual.m)
%% Discrete Fourier Transform Manual Implementation
function X = dft_manual(x)
% DFT_MANUAL - Manual implementation of Discrete Fourier Transform
%   X = DFT_MANUAL(x) computes the DFT of input signal x
%   
%   Input:
%       x - input signal (vector of length N)
%   Output:
%       X - DFT coefficients (complex vector of length N)

N = length(x);
X = zeros(1, N);

% Base twiddle factor W_N = e^(-j*2*pi/N); W^(k*n) gives each DFT exponential
W = exp(-1j * 2 * pi / N);

% Compute DFT using the definition
for k = 0:N-1
    sum_val = 0;
    for n = 0:N-1
        sum_val = sum_val + x(n+1) * W^(k*n);
    end
    X(k+1) = sum_val;
end
end

%% Test the DFT implementation
% Generate test signal
Fs = 1000;            % Sampling frequency (Hz)
t = 0:1/Fs:1-1/Fs;    % Time vector (1 second)
f1 = 50;              % First frequency component (Hz)
f2 = 120;             % Second frequency component (Hz)

% Create signal with two frequency components
x = 0.7*sin(2*pi*f1*t) + sin(2*pi*f2*t);

% Compute DFT using manual implementation
X_manual = dft_manual(x);
N = length(x);
f = Fs*(0:(N/2))/N;   % Frequency vector for plotting

% Compute magnitude spectrum
P2_manual = abs(X_manual/N);
P1_manual = P2_manual(1:N/2+1);
P1_manual(2:end-1) = 2*P1_manual(2:end-1);

% Plot results
figure('Position', [100, 100, 1200, 400]);

subplot(1,3,1);
plot(t, x);
title('Time Domain Signal');
xlabel('Time (s)');
ylabel('Amplitude');
grid on;

subplot(1,3,2);
plot(f, P1_manual);
title('Magnitude Spectrum (Manual DFT)');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Compare with MATLAB's built-in FFT
X_fft = fft(x);
P2_fft = abs(X_fft/N);
P1_fft = P2_fft(1:N/2+1);
P1_fft(2:end-1) = 2*P1_fft(2:end-1);

subplot(1,3,3);
plot(f, P1_fft);
title('Magnitude Spectrum (MATLAB FFT)');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Calculate and display error
error = norm(P1_manual - P1_fft);
fprintf('Error between manual DFT and MATLAB FFT: %e\n', error);
Figure 1: Time-domain signal and frequency spectra comparison
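The nested loops in dft_manual follow the textbook definition but are slow in MATLAB. As an optional refinement, the same result (up to round-off) can be obtained by building the full DFT matrix and using a single matrix-vector product; this is a useful stepping stone before timing the FFT in Step 8 of the procedure.

%% Optional: vectorized DFT using the DFT matrix (dft_matrix.m)
function X = dft_matrix(x)
% DFT_MATRIX - Vectorized DFT via the N-by-N DFT matrix
%   Memory grows as O(N^2), so keep N modest (e.g. N <= 4096).
x = x(:);                                 % force column vector
N = length(x);
n = 0:N-1;
W = exp(-1j * 2 * pi / N * (n' * n));     % W(k+1,n+1) = e^(-j*2*pi*k*n/N)
X = (W * x).';                            % row vector of DFT coefficients
end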

Part B: Audio Signal Analysis

Audio Analysis (audio_analysis.m)
%% Audio Signal Analysis for Shazam-like Fingerprinting
clear; close all; clc;

% Load an audio file (ensure you have an audio file in the working directory)
% For this example, we'll create a synthetic audio signal
Fs = 44100;                 % Sampling frequency (standard audio rate)
duration = 3;               % Duration in seconds
t = 0:1/Fs:duration-1/Fs;  % Time vector

% Create a musical chord (C major: C4, E4, G4)
f_C4 = 261.63;              % C4 frequency (Hz)
f_E4 = 329.63;              % E4 frequency (Hz)
f_G4 = 392.00;              % G4 frequency (Hz)

% Generate the chord with different amplitudes
audio_signal = 0.5*sin(2*pi*f_C4*t) + 0.3*sin(2*pi*f_E4*t) + 0.2*sin(2*pi*f_G4*t);

% Add some noise to simulate real recording
noise = 0.05 * randn(size(audio_signal));
audio_signal = audio_signal + noise;

% Normalize the signal
audio_signal = audio_signal / max(abs(audio_signal));

%% Part 1: Time-Frequency Analysis using Spectrogram
figure('Position', [100, 100, 1000, 800]);

% Compute and display spectrogram (STFT)
subplot(3,2,1);
spectrogram(audio_signal, 1024, 512, 1024, Fs, 'yaxis');
title('Spectrogram of C Major Chord');
colorbar;

% Alternative: obtain the STFT matrix from spectrogram and plot it manually
window_size = 1024;
overlap = 512;
nfft = 1024;

[S, F, T] = spectrogram(audio_signal, window_size, overlap, nfft, Fs);

subplot(3,2,2);
imagesc(T, F, 20*log10(abs(S) + eps));   % magnitude in dB (eps avoids log of zero)
axis xy;
xlabel('Time (s)');
ylabel('Frequency (Hz)');
title('Spectrogram (STFT Matrix Plotted Manually)');
colorbar;

%% Part 2: Peak Detection in Frequency Domain
% Compute FFT of entire signal
N = length(audio_signal);
X = fft(audio_signal);
f = Fs*(0:(N/2))/N;

% Compute magnitude spectrum
P2 = abs(X/N);
P1 = P2(1:N/2+1);
P1(2:end-1) = 2*P1(2:end-1);

subplot(3,2,3);
plot(f, P1);
title('Full Signal Frequency Spectrum');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Find peaks in the magnitude spectrum
[peaks, peak_locs] = findpeaks(P1, 'MinPeakHeight', 0.1, 'MinPeakDistance', 50);

% Annotate peaks on the plot
hold on;
plot(f(peak_locs), peaks, 'ro', 'MarkerSize', 8, 'LineWidth', 2);
hold off;

% Display detected frequencies
fprintf('Detected Frequency Peaks:\n');
for i = 1:min(length(peak_locs), 5)
    fprintf('Peak %d: %.2f Hz (Magnitude: %.4f)\n', i, f(peak_locs(i)), peaks(i));
end

%% Part 3: Windowed Analysis (Reducing Spectral Leakage)
window_types = {'rectwin', 'hamming', 'hann', 'blackman'};
window_names = {'Rectangular', 'Hamming', 'Hann', 'Blackman'};

for w = 1:length(window_types)
    % Create window (feval calls the function by name without resorting to eval)
    win = feval(window_types{w}, 256);
    
    % Apply window to a segment of the signal
    segment = audio_signal(10000:10000+255) .* win';
    
    % Compute FFT of windowed segment
    N_seg = length(segment);
    X_seg = fft(segment, nfft);
    f_seg = Fs*(0:(nfft/2))/nfft;
    
    P2_seg = abs(X_seg/N_seg);
    P1_seg = P2_seg(1:nfft/2+1);
    P1_seg(2:end-1) = 2*P1_seg(2:end-1);
    
    subplot(3,2,3+w);
    plot(f_seg, P1_seg);
    title(sprintf('Windowed Spectrum (%s)', window_names{w}));
    xlabel('Frequency (Hz)');
    ylabel('|X(f)|');
    grid on;
    xlim([200, 500]);
end

% Add tight layout
sgtitle('Audio Signal Analysis for Shazam-like Fingerprinting', 'FontSize', 14);

Note

In the real Shazam implementation, the algorithm creates "fingerprints" by pairing frequency peaks in the spectrogram. Each fingerprint is a triple (f1, f2, Δt), where f1 and f2 are the frequencies of a pair of peaks and Δt is the time difference between them; a sketch of how such a triple can be packed into a single hash value is given below.
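As an illustration of the hashing step, the sketch below packs one (f1, f2, Δt) triple into a single 32-bit integer. The bit widths (9 bits per frequency bin, 14 bits for the time offset) are illustrative assumptions, not Shazam's actual format, and packHash is a hypothetical helper name.

% Hash-packing sketch (bit widths are illustrative assumptions)
function h = packHash(f1_bin, f2_bin, dt_frames)
% PACKHASH - pack (f1, f2, dt) into one uint32 key
%   f1_bin, f2_bin : frequency-bin indices of the paired peaks (assumed 0..511)
%   dt_frames      : frame offset between the peaks (assumed 0..16383)
h = uint32(bitand(f1_bin, 511));                               % 9 bits for f1
h = bitor(bitshift(h, 9),  uint32(bitand(f2_bin, 511)));       % 9 bits for f2
h = bitor(bitshift(h, 14), uint32(bitand(dt_frames, 16383)));  % 14 bits for dt
end

The resulting integer can serve as a key into a lookup table of known songs (for example, a containers.Map).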

4 Laboratory Procedure

Step-by-Step Instructions

  1. Set up your MATLAB environment: Ensure you have MATLAB R2020a or later with Signal Processing Toolbox installed.
  2. Implement the DFT function: Create a MATLAB function dft_manual.m that implements the Discrete Fourier Transform formula. Test it with simple signals.
  3. Compare with built-in FFT: Generate a test signal with multiple frequency components and compare the results of your DFT implementation with MATLAB's built-in fft function.
  4. Analyze audio signals: Load or generate an audio signal containing musical notes. Use the provided code to create spectrograms and identify frequency peaks.
  5. Implement peak detection: Modify the peak detection algorithm to find prominent frequency peaks in different time windows of the audio signal.
  6. Simulate Shazam fingerprinting: Create a simple fingerprinting algorithm that pairs detected frequency peaks and creates unique hashes.
  7. Experiment with windowing: Compare the effects of different window functions (Hamming, Hann, Blackman) on frequency resolution and spectral leakage.
  8. Optimize the implementation: Measure the computational time of your DFT implementation versus MATLAB's FFT for different signal lengths (N = 64, 256, 1024, 4096); a timing sketch follows this list.
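A minimal timing sketch for step 8 is shown below; it uses timeit, which runs each call several times and returns a representative execution time, so the result is more stable than a single tic/toc measurement.

% Timing sketch for step 8: manual DFT vs built-in FFT
N_values = [64, 256, 1024, 4096];
for N = N_values
    x = randn(1, N);                       % random test signal of length N
    t_dft = timeit(@() dft_manual(x));     % your O(N^2) implementation
    t_fft = timeit(@() fft(x));            % MATLAB's O(N log N) FFT
    fprintf('N = %4d: DFT %.4f s, FFT %.6f s, speed-up %.0fx\n', ...
        N, t_dft, t_fft, t_dft / t_fft);
end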

Important Notes

  • Save your MATLAB scripts with appropriate names and in a dedicated folder for this lab.
  • Use comments in your code to explain each section.
  • Capture figures and results for your lab report.
  • The manual DFT implementation has O(N²) complexity, so use small N values for testing.

Post-Exercise Questions

Question 1: DFT Fundamentals

Explain the mathematical relationship between the Discrete Fourier Transform (DFT) and the Continuous Fourier Transform (CFT). What assumptions are made when applying DFT to real-world signals, and how does the sampling frequency affect the frequency resolution?

Answer: The DFT is the discrete version of the CFT, obtained by sampling both the time and frequency domains. The key relationship is:

\(X[k] = \sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi}{N}kn}\) (DFT)
\(X(f) = \int_{-\infty}^{\infty} x(t) e^{-j2\pi ft} dt\) (CFT)

Assumptions made when applying DFT:

  1. The signal is periodic with period N (implicit periodicity assumption)
  2. The signal is bandlimited (Nyquist theorem)
  3. The signal is time-limited to the observation window

Sampling frequency effect: The frequency resolution is Δf = Fs/N, where Fs is the sampling frequency and N is the number of samples. Raising Fs increases the Nyquist frequency (Fs/2) but does not by itself improve resolution; to improve resolution, increase N, i.e., observe the signal for longer. For example, at Fs = 44.1 kHz a 4096-point DFT gives Δf = 44100/4096 ≈ 10.8 Hz.

Question 2: FFT Efficiency

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT. Compare the computational complexity of a direct DFT implementation versus the FFT algorithm. If a DFT of length N=4096 takes 5 seconds using a direct implementation, approximately how long would it take using an FFT algorithm (assuming the time is proportional to the number of operations)?

Answer:

Computational Complexity:

  • Direct DFT: O(N²) operations - requires N² complex multiplications and additions
  • FFT (Cooley-Tukey): O(N log₂N) operations - requires approximately N log₂N operations

Calculation:

For N = 4096:

  • Direct DFT operations ≈ N² = 4096² = 16,777,216 operations
  • FFT operations ≈ N log₂N = 4096 × log₂(4096) = 4096 × 12 = 49,152 operations
  • Ratio = 16,777,216 / 49,152 ≈ 341 times fewer operations

Time estimation: If direct DFT takes 5 seconds, FFT would take approximately 5 / 341 ≈ 0.0147 seconds (14.7 ms).

This dramatic speed improvement is why FFT is used in real-time applications like Shazam, where rapid audio analysis is essential.

Question 3: Spectral Leakage

What is spectral leakage in the context of Fourier analysis, and how does it affect frequency domain representation? Describe two methods to reduce spectral leakage, and explain how windowing functions help mitigate this problem.

Answer:

Spectral Leakage: Spectral leakage occurs when a non-integer number of signal cycles are captured in the analysis window. This causes the signal energy to "leak" into adjacent frequency bins, creating artifacts in the frequency domain representation. It results from the implicit assumption in DFT that the signal is periodic within the observation window.

Effects:

  1. Broadening of spectral peaks
  2. Reduced frequency resolution
  3. False frequency components appearing in the spectrum
  4. Reduced dynamic range for detecting small signals near large ones

Methods to reduce spectral leakage:

  1. Windowing: Multiply the time-domain signal by a window function (Hamming, Hann, Blackman) that tapers the signal at the edges, reducing discontinuities at window boundaries.
  2. Increasing window length: Longer observation windows narrow the main lobe and concentrate the leaked energy closer to the true frequency, at the cost of more computation and poorer time resolution.
  3. Synchronous sampling: Synchronize the sampling so that an integer number of signal cycles falls within the window; the DFT's implicit periodic extension is then seamless and no leakage occurs.

How windowing helps: Window functions gradually reduce signal amplitude at the edges, making the signal appear more periodic within the window. Different windows offer trade-offs between main lobe width (frequency resolution) and side lobe suppression (leakage reduction):

  • Rectangular window: Best resolution, worst leakage
  • Hamming window: Good compromise between resolution and leakage
  • Blackman window: Excellent leakage suppression, poorer resolution
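To make these trade-offs concrete, the short sketch below plots the magnitude spectra (in dB) of a rectangular, Hamming, and Blackman window of equal length; the widening main lobe and falling side-lobe level across the three curves illustrate the resolution-versus-leakage trade-off described above.

% Sketch: main-lobe width vs side-lobe level for three windows
L = 64; Nfft = 4096;
wins  = {rectwin(L), hamming(L), blackman(L)};
names = {'Rectangular', 'Hamming', 'Blackman'};
figure; hold on;
for k = 1:3
    W = abs(fft(wins{k}, Nfft));
    W = W / max(W);                                      % normalize peak to 0 dB
    plot((0:Nfft/2-1)/Nfft, 20*log10(W(1:Nfft/2) + eps)); % eps avoids log of exact zeros
end
hold off; grid on; legend(names);
xlabel('Normalized frequency (cycles/sample)');
ylabel('Magnitude (dB)');
ylim([-120 5]);
title('Window spectra: main lobe vs side lobes');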
Question 4: Shazam Algorithm

Describe how Shazam uses the Fourier Transform for audio fingerprinting. Explain the concept of a "constellation map" in Shazam's algorithm and how it contributes to the robustness of the fingerprinting system against noise and distortions.

Answer:

Shazam's use of Fourier Transform:

  1. Spectrogram creation: Shazam applies Short-Time Fourier Transform (STFT) to create a time-frequency representation (spectrogram) of the audio signal.
  2. Peak detection: The algorithm identifies local energy maxima (peaks) in the spectrogram. These peaks correspond to prominent frequency components at specific times.
  3. Fingerprint generation: Pairs of peaks are combined into "hashes" of the form (f1, f2, Δt), where f1 and f2 are frequencies and Δt is their time difference.
  4. Database matching: These hashes are compared against a precomputed database of song fingerprints.

Constellation Map: A constellation map is a sparse representation of the spectrogram containing only the most prominent peaks (like stars in a constellation). Each peak is represented as a point in the time-frequency plane.

Figure 2: Shazam constellation map (spectrogram peaks plotted as points in the time-frequency plane)

Robustness features:

  1. Invariance to amplitude: Only peak locations matter, not their amplitudes, making the system robust to volume changes.
  2. Time-shift invariance: Using time differences (Δt) rather than absolute times makes fingerprints robust to when the recording starts.
  3. Frequency bin tolerance: Matching allows for small frequency variations (±1-2 bins) to handle pitch shifts and imperfections.
  4. Sparsity: Using only prominent peaks makes the system robust to noise (less prominent peaks affected by noise are ignored).
  5. Combination hashing: Using pairs of peaks creates combinatorial diversity, making fingerprints highly specific even with relatively few peaks.

The constellation map approach allows Shazam to work effectively even with background noise, poor recording quality, or partial song fragments.
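The database-matching step is only mentioned above; a minimal sketch of the usual offset-histogram idea follows. It assumes each hash is stored alongside the absolute time of its anchor peak, and it declares a strong match when many hash collisions line up at one consistent time offset between the query and a database track. The function name matchScore and the two-column hash layout are illustrative assumptions, not part of any published Shazam interface.

% Sketch: offset-histogram matching (illustrative layout, not Shazam's actual code)
% query_hashes : M-by-2 array [hash_value, t_query] from the recorded clip
% db_hashes    : K-by-2 array [hash_value, t_db] for one candidate song
function score = matchScore(query_hashes, db_hashes)
offsets = [];                                   % growing array is acceptable for a small sketch
for i = 1:size(query_hashes, 1)
    hits = db_hashes(db_hashes(:,1) == query_hashes(i,1), 2);   % matching hashes in the DB
    offsets = [offsets; hits - query_hashes(i,2)];              % candidate time offsets
end
if isempty(offsets)
    score = 0;
else
    % A true match piles many offsets into a single histogram bin
    score = max(histcounts(offsets, 'BinWidth', 1));
end
end

A much higher score for one song than for all others indicates the likely match; a real system would also normalize by the number of query hashes and apply a decision threshold.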

Question 5: MATLAB Implementation

Analyze the following MATLAB code snippet that simulates part of the Shazam fingerprinting process. Identify any potential issues or inefficiencies in the implementation and suggest improvements.

function hashes = createFingerprint(audio, Fs)
    % Create audio fingerprints using DFT
    N = length(audio);
    window_size = 1024;
    hop_size = 512;
    
    hashes = [];
    
    for i = 1:hop_size:N-window_size
        % Extract window
        window = audio(i:i+window_size-1);
        
        % Apply DFT
        spectrum = zeros(1, window_size);
        for k = 1:window_size
            sum_val = 0;
            for n = 1:window_size
                sum_val = sum_val + window(n) * exp(-1j*2*pi*(k-1)*(n-1)/window_size);
            end
            spectrum(k) = sum_val;
        end
        
        % Find peaks (simple threshold)
        peaks = find(abs(spectrum(1:window_size/2)) > 0.5*max(abs(spectrum)));
        
        % Create hashes from peak pairs
        for p1 = 1:length(peaks)
            for p2 = p1+1:min(p1+5, length(peaks))
                f1 = peaks(p1) * Fs / window_size;
                f2 = peaks(p2) * Fs / window_size;
                dt = i / Fs;  % Absolute time
                
                % Create hash (simplified)
                hash = [f1, f2, dt];
                hashes = [hashes; hash];
            end
        end
    end
end

Answer:

Issues and inefficiencies:

  1. Inefficient DFT implementation: The nested loops for DFT computation have O(N²) complexity. Should use FFT (fft function) which is O(N log N).
  2. No windowing function: The code uses a rectangular window which causes spectral leakage. Should apply a window function (Hamming, Hann) before DFT.
  3. Peak detection is too simplistic: Using a fixed threshold (0.5*max) may miss important peaks or include noise. Should use adaptive thresholding or prominence-based peak detection.
  4. Inefficient hash storage: Growing array inside loop (hashes = [hashes; hash]) is inefficient in MATLAB. Preallocate array or use cell array.
  5. Absolute time used: Using absolute time (dt = i / Fs) makes fingerprints sensitive to when recording starts. Should use relative time differences between peaks.
  6. No frequency bin quantization: Frequencies are stored as continuous values. Should quantize to frequency bins for more robust matching.
  7. Fixed pairing distance: Pairs peaks within fixed range (p1+5). Could miss important pairs or create too many irrelevant pairs.
  8. No overlap handling: The hop size might be too large, missing temporal details.

Improved implementation:

function hashes = createFingerprintImproved(audio, Fs)
    % Improved fingerprint creation
    audio = audio(:);            % force column vector so it matches the column window below
    window_size = 1024;
    hop_size = 512;
    num_windows = floor((length(audio)-window_size)/hop_size) + 1;
    
    % Preallocate hash array (estimate size)
    max_hashes_per_window = 20;
    hashes = zeros(num_windows * max_hashes_per_window, 3);
    hash_count = 0;
    
    % Create window function
    win = hamming(window_size);
    
    for w = 1:num_windows
        % Extract and window signal
        start_idx = (w-1)*hop_size + 1;
        window_signal = audio(start_idx:start_idx+window_size-1) .* win;
        
        % Compute FFT (much more efficient)
        spectrum = fft(window_signal, window_size);
        mag_spectrum = abs(spectrum(1:window_size/2));
        
        % Better peak detection
        [peaks, peak_locs] = findpeaks(mag_spectrum, ...
            'MinPeakHeight', 0.2*max(mag_spectrum), ...
            'MinPeakProminence', 0.1*max(mag_spectrum), ...
            'MinPeakDistance', 10);
        
        % Convert to frequencies (quantized to bins)
        freq_bins = peak_locs;  % Already in bins
        
        % Create hashes from peak pairs (within time-frequency neighborhood)
        for i = 1:length(freq_bins)
            for j = i+1:min(i+5, length(freq_bins))
                f1 = freq_bins(i);
                f2 = freq_bins(j);
                delta_t = j - i;  % peak-index offset (a fuller version would pair peaks across later frames and store their time offset)
                
                % Store hash
                hash_count = hash_count + 1;
                hashes(hash_count, :) = [f1, f2, delta_t];
            end
        end
    end
    
    % Trim unused preallocated space
    hashes = hashes(1:hash_count, :);
end
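
A short usage example, assuming the synthetic chord signal audio_signal and sampling rate Fs from Part B are still in the workspace:

% Example call using the synthetic chord from Part B
hashes = createFingerprintImproved(audio_signal, Fs);
fprintf('Generated %d hashes of the form [f1_bin, f2_bin, offset].\n', size(hashes, 1));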