Fourier Transform Laboratory Exercise

Audio Signal Processing and Shazam Fingerprinting with MATLAB

For Undergraduate Electrical Engineering Students

Duration: 3 hours
Level: Undergraduate
Tools: MATLAB R2020a+
Includes MATLAB Code

1 Laboratory Overview

Learning Objectives

  • Understand the mathematical foundation of Fourier Transform and its applications
  • Implement Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT) in MATLAB
  • Analyze audio signals in both time and frequency domains
  • Simulate Shazam's audio fingerprinting algorithm using spectrogram analysis
  • Apply windowing techniques to reduce spectral leakage
  • Compare different Fourier transform implementations for efficiency

Background

Shazam is a popular music identification service that uses audio fingerprinting technology. At the core of this technology lies the Fourier Transform, which converts time-domain audio signals into frequency-domain representations. This transformation allows Shazam to create unique "fingerprints" of songs by identifying prominent frequency peaks and their temporal relationships.

In this laboratory exercise, you will implement key components of audio fingerprinting using MATLAB, exploring how Fourier Transform enables music recognition systems like Shazam to work efficiently.

2 Fourier Transform Fundamentals

Continuous Fourier Transform: \(F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-j\omega t} dt\)
Discrete Fourier Transform: \(X[k] = \sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi}{N}kn}\)
Inverse DFT: \(x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j\frac{2\pi}{N}kn}\)
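As a quick sanity check on this transform pair, the short sketch below transforms a test vector with MATLAB's built-in fft and reconstructs it with ifft; the reconstruction error should be at the level of floating-point round-off. (You will write your own DFT in Part A.)

% Sanity check: the DFT/IDFT pair via MATLAB's built-in fft/ifft
x = [1 2 3 4 2 1 0 -1];          % short test vector
X = fft(x);                      % forward DFT, X[k]
x_rec = ifft(X);                 % inverse DFT, x[n]
fprintf('Max reconstruction error: %e\n', max(abs(x - x_rec)));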

Shazam's Audio Fingerprinting Process

  1. Audio Capture: Record an audio sample (10-15 seconds)
  2. Preprocessing: Bandpass filtering and downsampling (a preprocessing sketch follows this list)
  3. STFT: Short-Time Fourier Transform of the preprocessed signal
  4. Peak Finding: Identify prominent frequency peaks in the spectrogram
  5. Fingerprinting: Create hashes from peak pairs
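The preprocessing step (step 2) is not implemented elsewhere in this lab. The sketch below shows one plausible way to do it in MATLAB; the 300-3500 Hz passband and 8 kHz target rate are illustrative assumptions, not Shazam's actual parameters.

% Preprocessing sketch (assumed passband and target rate, not Shazam's actual values)
Fs_in  = 44100;                                   % original sampling rate (Hz)
Fs_out = 8000;                                    % assumed target rate after downsampling (Hz)
audio_in = randn(Fs_in, 1);                       % placeholder 1-second signal (use audioread for real audio)
audio_bp = bandpass(audio_in, [300 3500], Fs_in); % keep the band carrying most musical energy
audio_ds = resample(audio_bp, Fs_out, Fs_in);     % anti-aliased downsampling to Fs_out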

3 MATLAB Implementation

Part A: Basic Fourier Transform Implementation

DFT Implementation (dft_manual.m)
%% Discrete Fourier Transform Manual Implementation
function X = dft_manual(x)
% DFT_MANUAL - Manual implementation of Discrete Fourier Transform
%   X = DFT_MANUAL(x) computes the DFT of input signal x
%   
%   Input:
%       x - input signal (vector of length N)
%   Output:
%       X - DFT coefficients (complex vector of length N)

N = length(x);
X = zeros(1, N);

% Base twiddle factor W_N = e^(-j*2*pi/N); W^(k*n) gives each DFT exponential
W = exp(-1j * 2 * pi / N);

% Compute DFT using the definition
for k = 0:N-1
    sum_val = 0;
    for n = 0:N-1
        sum_val = sum_val + x(n+1) * W^(k*n);
    end
    X(k+1) = sum_val;
end
end

%% Test the DFT implementation
% Generate test signal
Fs = 1000;            % Sampling frequency (Hz)
t = 0:1/Fs:1-1/Fs;    % Time vector (1 second)
f1 = 50;              % First frequency component (Hz)
f2 = 120;             % Second frequency component (Hz)

% Create signal with two frequency components
x = 0.7*sin(2*pi*f1*t) + sin(2*pi*f2*t);

% Compute DFT using manual implementation
X_manual = dft_manual(x);
N = length(x);
f = Fs*(0:(N/2))/N;   % Frequency vector for plotting

% Compute magnitude spectrum
P2_manual = abs(X_manual/N);
P1_manual = P2_manual(1:N/2+1);
P1_manual(2:end-1) = 2*P1_manual(2:end-1);

% Plot results
figure('Position', [100, 100, 1200, 400]);

subplot(1,3,1);
plot(t, x);
title('Time Domain Signal');
xlabel('Time (s)');
ylabel('Amplitude');
grid on;

subplot(1,3,2);
plot(f, P1_manual);
title('Magnitude Spectrum (Manual DFT)');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Compare with MATLAB's built-in FFT
X_fft = fft(x);
P2_fft = abs(X_fft/N);
P1_fft = P2_fft(1:N/2+1);
P1_fft(2:end-1) = 2*P1_fft(2:end-1);

subplot(1,3,3);
plot(f, P1_fft);
title('Magnitude Spectrum (MATLAB FFT)');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Calculate and display error
error = norm(P1_manual - P1_fft);
fprintf('Error between manual DFT and MATLAB FFT: %e\n', error);
Figure 1: Time-domain signal and frequency spectra comparison
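The nested loops in dft_manual follow the textbook definition but are slow in MATLAB. As an optional refinement, the same result (up to round-off) can be obtained by building the full DFT matrix and using a single matrix-vector product; this is a useful stepping stone before timing the FFT in Step 8 of the procedure.

%% Optional: vectorized DFT using the DFT matrix (dft_matrix.m)
function X = dft_matrix(x)
% DFT_MATRIX - Vectorized DFT via the N-by-N DFT matrix
%   Memory grows as O(N^2), so keep N modest (e.g. N <= 4096).
x = x(:);                                 % force column vector
N = length(x);
n = 0:N-1;
W = exp(-1j * 2 * pi / N * (n' * n));     % W(k+1,n+1) = e^(-j*2*pi*k*n/N)
X = (W * x).';                            % row vector of DFT coefficients
end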

Part B: Audio Signal Analysis

Audio Analysis (audio_analysis.m)
%% Audio Signal Analysis for Shazam-like Fingerprinting
clear; close all; clc;

% Load an audio file (ensure you have an audio file in the working directory)
% For this example, we'll create a synthetic audio signal
Fs = 44100;                 % Sampling frequency (standard audio rate)
duration = 3;               % Duration in seconds
t = 0:1/Fs:duration-1/Fs;  % Time vector

% Create a musical chord (C major: C4, E4, G4)
f_C4 = 261.63;              % C4 frequency (Hz)
f_E4 = 329.63;              % E4 frequency (Hz)
f_G4 = 392.00;              % G4 frequency (Hz)

% Generate the chord with different amplitudes
audio_signal = 0.5*sin(2*pi*f_C4*t) + 0.3*sin(2*pi*f_E4*t) + 0.2*sin(2*pi*f_G4*t);

% Add some noise to simulate real recording
noise = 0.05 * randn(size(audio_signal));
audio_signal = audio_signal + noise;

% Normalize the signal
audio_signal = audio_signal / max(abs(audio_signal));

%% Part 1: Time-Frequency Analysis using Spectrogram
figure('Position', [100, 100, 1000, 800]);

% Compute and display spectrogram (STFT)
subplot(3,2,1);
spectrogram(audio_signal, 1024, 512, 1024, Fs, 'yaxis');
title('Spectrogram of C Major Chord');
colorbar;

% Alternative: obtain the STFT matrix from spectrogram and plot it manually
window_size = 1024;
overlap = 512;
nfft = 1024;

[S, F, T] = spectrogram(audio_signal, window_size, overlap, nfft, Fs);

subplot(3,2,2);
imagesc(T, F, 20*log10(abs(S) + eps));   % magnitude in dB (eps avoids log of zero)
axis xy;
xlabel('Time (s)');
ylabel('Frequency (Hz)');
title('Spectrogram (STFT Matrix Plotted Manually)');
colorbar;

%% Part 2: Peak Detection in Frequency Domain
% Compute FFT of entire signal
N = length(audio_signal);
X = fft(audio_signal);
f = Fs*(0:(N/2))/N;

% Compute magnitude spectrum
P2 = abs(X/N);
P1 = P2(1:N/2+1);
P1(2:end-1) = 2*P1(2:end-1);

subplot(3,2,3);
plot(f, P1);
title('Full Signal Frequency Spectrum');
xlabel('Frequency (Hz)');
ylabel('|X(f)|');
grid on;

% Find peaks in the magnitude spectrum
[peaks, peak_locs] = findpeaks(P1, 'MinPeakHeight', 0.1, 'MinPeakDistance', 50);

% Annotate peaks on the plot
hold on;
plot(f(peak_locs), peaks, 'ro', 'MarkerSize', 8, 'LineWidth', 2);
hold off;

% Display detected frequencies
fprintf('Detected Frequency Peaks:\n');
for i = 1:min(length(peak_locs), 5)
    fprintf('Peak %d: %.2f Hz (Magnitude: %.4f)\n', i, f(peak_locs(i)), peaks(i));
end

%% Part 3: Windowed Analysis (Reducing Spectral Leakage)
window_types = {'rectwin', 'hamming', 'hann', 'blackman'};
window_names = {'Rectangular', 'Hamming', 'Hann', 'Blackman'};

for w = 1:length(window_types)
    % Create window (feval calls the function by name without resorting to eval)
    win = feval(window_types{w}, 256);
    
    % Apply window to a segment of the signal
    segment = audio_signal(10000:10000+255) .* win';
    
    % Compute FFT of windowed segment
    N_seg = length(segment);
    X_seg = fft(segment, nfft);
    f_seg = Fs*(0:(nfft/2))/nfft;
    
    P2_seg = abs(X_seg/N_seg);
    P1_seg = P2_seg(1:nfft/2+1);
    P1_seg(2:end-1) = 2*P1_seg(2:end-1);
    
    subplot(3,2,3+w);
    plot(f_seg, P1_seg);
    title(sprintf('Windowed Spectrum (%s)', window_names{w}));
    xlabel('Frequency (Hz)');
    ylabel('|X(f)|');
    grid on;
    xlim([200, 500]);
end

% Add tight layout
sgtitle('Audio Signal Analysis for Shazam-like Fingerprinting', 'FontSize', 14);

Note

In the real Shazam implementation, the algorithm creates "fingerprints" by pairing frequency peaks in the spectrogram. Each fingerprint is a triple (f1, f2, Δt), where f1 and f2 are the frequencies of a pair of peaks and Δt is the time difference between them; a sketch of how such a triple can be packed into a single hash value is given below.
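As an illustration of the hashing step, the sketch below packs one (f1, f2, Δt) triple into a single 32-bit integer. The bit widths (9 bits per frequency bin, 14 bits for the time offset) are illustrative assumptions, not Shazam's actual format, and packHash is a hypothetical helper name.

% Hash-packing sketch (bit widths are illustrative assumptions)
function h = packHash(f1_bin, f2_bin, dt_frames)
% PACKHASH - pack (f1, f2, dt) into one uint32 key
%   f1_bin, f2_bin : frequency-bin indices of the paired peaks (assumed 0..511)
%   dt_frames      : frame offset between the peaks (assumed 0..16383)
h = uint32(bitand(f1_bin, 511));                               % 9 bits for f1
h = bitor(bitshift(h, 9),  uint32(bitand(f2_bin, 511)));       % 9 bits for f2
h = bitor(bitshift(h, 14), uint32(bitand(dt_frames, 16383)));  % 14 bits for dt
end

The resulting integer can serve as a key into a lookup table of known songs (for example, a containers.Map).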

4 Laboratory Procedure

Step-by-Step Instructions

  1. Set up your MATLAB environment: Ensure you have MATLAB R2020a or later with Signal Processing Toolbox installed.
  2. Implement the DFT function: Create a MATLAB function dft_manual.m that implements the Discrete Fourier Transform formula. Test it with simple signals.
  3. Compare with built-in FFT: Generate a test signal with multiple frequency components and compare the results of your DFT implementation with MATLAB's built-in fft function.
  4. Analyze audio signals: Load or generate an audio signal containing musical notes. Use the provided code to create spectrograms and identify frequency peaks.
  5. Implement peak detection: Modify the peak detection algorithm to find prominent frequency peaks in different time windows of the audio signal.
  6. Simulate Shazam fingerprinting: Create a simple fingerprinting algorithm that pairs detected frequency peaks and creates unique hashes.
  7. Experiment with windowing: Compare the effects of different window functions (Hamming, Hann, Blackman) on frequency resolution and spectral leakage.
  8. Optimize the implementation: Measure the computational time of your DFT implementation versus MATLAB's FFT for different signal lengths (N = 64, 256, 1024, 4096); a timing sketch follows this list.
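A minimal timing sketch for step 8 is shown below; it uses timeit, which runs each call several times and returns a representative execution time, so the result is more stable than a single tic/toc measurement.

% Timing sketch for step 8: manual DFT vs built-in FFT
N_values = [64, 256, 1024, 4096];
for N = N_values
    x = randn(1, N);                       % random test signal of length N
    t_dft = timeit(@() dft_manual(x));     % your O(N^2) implementation
    t_fft = timeit(@() fft(x));            % MATLAB's O(N log N) FFT
    fprintf('N = %4d: DFT %.4f s, FFT %.6f s, speed-up %.0fx\n', ...
        N, t_dft, t_fft, t_dft / t_fft);
end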

Important Notes

  • Save your MATLAB scripts with appropriate names and in a dedicated folder for this lab.
  • Use comments in your code to explain each section.
  • Capture figures and results for your lab report.
  • The manual DFT implementation has O(N²) complexity, so use small N values for testing.

Post-Exercise Questions

Question 1: DFT Fundamentals

Explain the mathematical relationship between the Discrete Fourier Transform (DFT) and the Continuous Fourier Transform (CFT). What assumptions are made when applying DFT to real-world signals, and how does the sampling frequency affect the frequency resolution?

Answer: The DFT is the discrete version of the CFT, obtained by sampling both the time and frequency domains. The key relationship is:

\(X[k] = \sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi}{N}kn}\) (DFT)
\(X(f) = \int_{-\infty}^{\infty} x(t) e^{-j2\pi ft} dt\) (CFT)

Assumptions made when applying DFT:

  1. The signal is periodic with period N (implicit periodicity assumption)
  2. The signal is bandlimited (Nyquist theorem)
  3. The signal is time-limited to the observation window

Sampling frequency effect: The frequency resolution is Δf = Fs/N, where Fs is the sampling frequency and N is the number of samples. Raising Fs increases the Nyquist frequency (Fs/2) but does not by itself improve resolution; to improve resolution, increase N, i.e., observe the signal for longer. For example, at Fs = 44.1 kHz a 4096-point DFT gives Δf = 44100/4096 ≈ 10.8 Hz.

Question 2: FFT Efficiency

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT. Compare the computational complexity of a direct DFT implementation versus the FFT algorithm. If a DFT of length N=4096 takes 5 seconds using a direct implementation, approximately how long would it take using an FFT algorithm (assuming the time is proportional to the number of operations)?

Answer:

Computational Complexity:

  • Direct DFT: O(N²) operations - requires N² complex multiplications and additions
  • FFT (Cooley-Tukey): O(N log₂N) operations - requires approximately N log₂N operations

Calculation:

For N = 4096:

  • Direct DFT operations ≈ N² = 4096² = 16,777,216 operations
  • FFT operations ≈ N log₂N = 4096 × log₂(4096) = 4096 × 12 = 49,152 operations
  • Ratio = 16,777,216 / 49,152 ≈ 341 times fewer operations

Time estimation: If direct DFT takes 5 seconds, FFT would take approximately 5 / 341 ≈ 0.0147 seconds (14.7 ms).

This dramatic speed improvement is why FFT is used in real-time applications like Shazam, where rapid audio analysis is essential.

Question 3: Spectral Leakage

What is spectral leakage in the context of Fourier analysis, and how does it affect frequency domain representation? Describe two methods to reduce spectral leakage, and explain how windowing functions help mitigate this problem.

Answer:

Spectral Leakage: Spectral leakage occurs when a non-integer number of signal cycles are captured in the analysis window. This causes the signal energy to "leak" into adjacent frequency bins, creating artifacts in the frequency domain representation. It results from the implicit assumption in DFT that the signal is periodic within the observation window.

Effects:

  1. Broadening of spectral peaks
  2. Reduced frequency resolution
  3. False frequency components appearing in the spectrum
  4. Reduced dynamic range for detecting small signals near large ones

Methods to reduce spectral leakage:

  1. Windowing: Multiply the time-domain signal by a window function (Hamming, Hann, Blackman) that tapers the signal at the edges, reducing discontinuities at window boundaries.
  2. Increasing window length: Longer observation windows narrow the main lobe and concentrate the leaked energy closer to the true frequency, at the cost of more computation and poorer time resolution.
  3. Synchronous sampling: Synchronize the sampling so that an integer number of signal cycles falls within the window; the DFT's implicit periodic extension is then seamless and no leakage occurs.

How windowing helps: Window functions gradually reduce signal amplitude at the edges, making the signal appear more periodic within the window. Different windows offer trade-offs between main lobe width (frequency resolution) and side lobe suppression (leakage reduction):

  • Rectangular window: Best resolution, worst leakage
  • Hamming window: Good compromise between resolution and leakage
  • Blackman window: Excellent leakage suppression, poorer resolution
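To make these trade-offs concrete, the short sketch below plots the magnitude spectra (in dB) of a rectangular, Hamming, and Blackman window of equal length; the widening main lobe and falling side-lobe level across the three curves illustrate the resolution-versus-leakage trade-off described above.

% Sketch: main-lobe width vs side-lobe level for three windows
L = 64; Nfft = 4096;
wins  = {rectwin(L), hamming(L), blackman(L)};
names = {'Rectangular', 'Hamming', 'Blackman'};
figure; hold on;
for k = 1:3
    W = abs(fft(wins{k}, Nfft));
    W = W / max(W);                                      % normalize peak to 0 dB
    plot((0:Nfft/2-1)/Nfft, 20*log10(W(1:Nfft/2) + eps)); % eps avoids log of exact zeros
end
hold off; grid on; legend(names);
xlabel('Normalized frequency (cycles/sample)');
ylabel('Magnitude (dB)');
ylim([-120 5]);
title('Window spectra: main lobe vs side lobes');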
Question 4: Shazam Algorithm

Describe how Shazam uses the Fourier Transform for audio fingerprinting. Explain the concept of a "constellation map" in Shazam's algorithm and how it contributes to the robustness of the fingerprinting system against noise and distortions.

Answer:

Shazam's use of Fourier Transform:

  1. Spectrogram creation: Shazam applies Short-Time Fourier Transform (STFT) to create a time-frequency representation (spectrogram) of the audio signal.
  2. Peak detection: The algorithm identifies local energy maxima (peaks) in the spectrogram. These peaks correspond to prominent frequency components at specific times.
  3. Fingerprint generation: Pairs of peaks are combined into "hashes" of the form (f1, f2, Δt), where f1 and f2 are frequencies and Δt is their time difference.
  4. Database matching: These hashes are compared against a precomputed database of song fingerprints.

Constellation Map: A constellation map is a sparse representation of the spectrogram containing only the most prominent peaks (like stars in a constellation). Each peak is represented as a point in the time-frequency plane.

Figure 2: Shazam constellation map (spectrogram peaks plotted as points in the time-frequency plane)

Robustness features:

  1. Invariance to amplitude: Only peak locations matter, not their amplitudes, making the system robust to volume changes.
  2. Time-shift invariance: Using time differences (Δt) rather than absolute times makes fingerprints robust to when the recording starts.
  3. Frequency bin tolerance: Matching allows for small frequency variations (±1-2 bins) to handle pitch shifts and imperfections.
  4. Sparsity: Using only prominent peaks makes the system robust to noise (less prominent peaks affected by noise are ignored).
  5. Combination hashing: Using pairs of peaks creates combinatorial diversity, making fingerprints highly specific even with relatively few peaks.

The constellation map approach allows Shazam to work effectively even with background noise, poor recording quality, or partial song fragments.
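The database-matching step is only mentioned above; a minimal sketch of the usual offset-histogram idea follows. It assumes each hash is stored alongside the absolute time of its anchor peak, and it declares a strong match when many hash collisions line up at one consistent time offset between the query and a database track. The function name matchScore and the two-column hash layout are illustrative assumptions, not part of any published Shazam interface.

% Sketch: offset-histogram matching (illustrative layout, not Shazam's actual code)
% query_hashes : M-by-2 array [hash_value, t_query] from the recorded clip
% db_hashes    : K-by-2 array [hash_value, t_db] for one candidate song
function score = matchScore(query_hashes, db_hashes)
offsets = [];                                   % growing array is acceptable for a small sketch
for i = 1:size(query_hashes, 1)
    hits = db_hashes(db_hashes(:,1) == query_hashes(i,1), 2);   % matching hashes in the DB
    offsets = [offsets; hits - query_hashes(i,2)];              % candidate time offsets
end
if isempty(offsets)
    score = 0;
else
    % A true match piles many offsets into a single histogram bin
    score = max(histcounts(offsets, 'BinWidth', 1));
end
end

A much higher score for one song than for all others indicates the likely match; a real system would also normalize by the number of query hashes and apply a decision threshold.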

Question 5: MATLAB Implementation

Analyze the following MATLAB code snippet that simulates part of the Shazam fingerprinting process. Identify any potential issues or inefficiencies in the implementation and suggest improvements.

function hashes = createFingerprint(audio, Fs)
    % Create audio fingerprints using DFT
    N = length(audio);
    window_size = 1024;
    hop_size = 512;
    
    hashes = [];
    
    for i = 1:hop_size:N-window_size
        % Extract window
        window = audio(i:i+window_size-1);
        
        % Apply DFT
        spectrum = zeros(1, window_size);
        for k = 1:window_size
            sum_val = 0;
            for n = 1:window_size
                sum_val = sum_val + window(n) * exp(-1j*2*pi*(k-1)*(n-1)/window_size);
            end
            spectrum(k) = sum_val;
        end
        
        % Find peaks (simple threshold)
        peaks = find(abs(spectrum(1:window_size/2)) > 0.5*max(abs(spectrum)));
        
        % Create hashes from peak pairs
        for p1 = 1:length(peaks)
            for p2 = p1+1:min(p1+5, length(peaks))
                f1 = peaks(p1) * Fs / window_size;
                f2 = peaks(p2) * Fs / window_size;
                dt = i / Fs;  % Absolute time
                
                % Create hash (simplified)
                hash = [f1, f2, dt];
                hashes = [hashes; hash];
            end
        end
    end
end

Answer:

Issues and inefficiencies:

  1. Inefficient DFT implementation: The nested loops for DFT computation have O(N²) complexity. Should use FFT (fft function) which is O(N log N).
  2. No windowing function: The code uses a rectangular window which causes spectral leakage. Should apply a window function (Hamming, Hann) before DFT.
  3. Peak detection is too simplistic: Using a fixed threshold (0.5*max) may miss important peaks or include noise. Should use adaptive thresholding or prominence-based peak detection.
  4. Inefficient hash storage: Growing array inside loop (hashes = [hashes; hash]) is inefficient in MATLAB. Preallocate array or use cell array.
  5. Absolute time used: Using absolute time (dt = i / Fs) makes fingerprints sensitive to when recording starts. Should use relative time differences between peaks.
  6. No frequency bin quantization: Frequencies are stored as continuous values. Should quantize to frequency bins for more robust matching.
  7. Fixed pairing distance: Pairs peaks within fixed range (p1+5). Could miss important pairs or create too many irrelevant pairs.
  8. No overlap handling: The hop size might be too large, missing temporal details.

Improved implementation:

function hashes = createFingerprintImproved(audio, Fs)
    % Improved fingerprint creation
    audio = audio(:);            % force column vector so it matches the column window below
    window_size = 1024;
    hop_size = 512;
    num_windows = floor((length(audio)-window_size)/hop_size) + 1;
    
    % Preallocate hash array (estimate size)
    max_hashes_per_window = 20;
    hashes = zeros(num_windows * max_hashes_per_window, 3);
    hash_count = 0;
    
    % Create window function
    win = hamming(window_size);
    
    for w = 1:num_windows
        % Extract and window signal
        start_idx = (w-1)*hop_size + 1;
        window_signal = audio(start_idx:start_idx+window_size-1) .* win;
        
        % Compute FFT (much more efficient)
        spectrum = fft(window_signal, window_size);
        mag_spectrum = abs(spectrum(1:window_size/2));
        
        % Better peak detection
        [peaks, peak_locs] = findpeaks(mag_spectrum, ...
            'MinPeakHeight', 0.2*max(mag_spectrum), ...
            'MinPeakProminence', 0.1*max(mag_spectrum), ...
            'MinPeakDistance', 10);
        
        % Convert to frequencies (quantized to bins)
        freq_bins = peak_locs;  % Already in bins
        
        % Create hashes from peak pairs (within time-frequency neighborhood)
        for i = 1:length(freq_bins)
            for j = i+1:min(i+5, length(freq_bins))
                f1 = freq_bins(i);
                f2 = freq_bins(j);
                delta_t = j - i;  % peak-index offset (a fuller version would pair peaks across later frames and store their time offset)
                
                % Store hash
                hash_count = hash_count + 1;
                hashes(hash_count, :) = [f1, f2, delta_t];
            end
        end
    end
    
    % Trim unused preallocated space
    hashes = hashes(1:hash_count, :);
end
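
A short usage example, assuming the synthetic chord signal audio_signal and sampling rate Fs from Part B are still in the workspace:

% Example call using the synthetic chord from Part B
hashes = createFingerprintImproved(audio_signal, Fs);
fprintf('Generated %d hashes of the form [f1_bin, f2_bin, offset].\n', size(hashes, 1));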