6.808 - Lab 4


Lab 4: Detecting Human Gestures via Sound Signals

Assigned: 2022-03-11
Due: 2022-03-30


The goal of this lab is to implement code in a Jupyter Notebook that detects human gestures via sound signals. We have provided all of the code that displays the processed output (e.g., spectrograms and plots), but you will need to implement some parts of the FMCW background subtraction code to be able to visualize human gestures.

This lab is based on a 6.808 project in Spring 2021 by Cooper Jones, Willie Zhu and Jan Wojcik.

Start by downloading the iPython code for this lab.

Known problem: Do not use Google Colab to run the .ipynb file. The PortAudio library cannot be used in Colab.


Next, download and install the Anaconda software. After installing, open the Anaconda Navigator and click Launch for Jupyter Notebook. It should open in Chrome. Create a new directory called “Lab_4_FMCW” and upload the “6.808 Lab 4 – Gesture Recognition via FMCW.ipynb” notebook file to that directory. Open the file and you should be able to run the different code blocks in it. Specifically, click on the box under “Initialization”, then click “Run”. Do the same for the box under “Filter Functions”. Now that you have initialized the functions, you are ready to implement the code for the lab.

For those unfamiliar with .ipynb files: the order in which you run the cells matters, so be careful when going back through different parts of this notebook (as in Section 2).

Section 1 — Recording and processing FMCW Signals

The goal of this section is to successfully obtain spectrograms from the recorded FMCW signals. In the next section you will extract gestures using these spectrograms.

The basic sequence is as follows:

  1. Simultaneously transmit and receive FMCW (i.e., chirp) signals from your laptop
  2. Multiply the received FMCW chirp with the transmitted FMCW chirp; for each chirp you should see a peak in the baseband
  3. Pass the output of the multiplication through a low pass filter
  4. Extract the peak and convert it to distance using the slope of the FMCW chirp and the speed of sound
  5. Record distance estimates from different chirps and plot them as a function of time
  6. Observe how the plot changes with different gestures.
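
The sequence above can be tried out on a simulated signal before touching the speaker or microphone. The following is a minimal sketch, not the notebook's code: the sampling rate, sweep range, chirp duration, and reflector distance are all illustrative assumptions, the echo is an idealised delayed copy of the transmitted chirp, and the low-pass filter is replaced by simply ignoring FFT bins above 5 kHz. It also assumes a round-trip reflection, so the beat frequency is slope × (2 × distance / v).

```python
import numpy as np

# All numeric values below are illustrative assumptions, not the lab's settings.
fs = 48_000                  # sampling rate (Hz)
f0, f1 = 10_000.0, 15_000.0  # chirp start and end frequencies (Hz)
T = 0.02                     # chirp duration (s)
v = 343.0                    # approximate speed of sound in air (m/s)
slope = (f1 - f0) / T        # chirp slope (Hz per second)


def linear_chirp(t):
    """Linear chirp whose instantaneous frequency is f0 + slope * t."""
    return np.cos(2 * np.pi * (f0 * t + 0.5 * slope * t**2))


t = np.arange(int(round(fs * T))) / fs
true_distance = 0.5              # simulated reflector distance (m)
delay = 2 * true_distance / v    # round-trip time of flight (s)

tx = linear_chirp(t)             # step 1: transmitted chirp
rx = linear_chirp(t - delay)     # step 1: idealised echo (a delayed copy)
mixed = tx * rx                  # step 2: multiply received and transmitted

# Steps 3 and 4: zero-padded FFT, keep only bins below 5 kHz, take the peak.
n_fft = 8 * len(mixed)
spectrum = np.abs(np.fft.rfft(mixed, n=n_fft))
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
low = freqs < 5_000.0
peak_hz = freqs[low][np.argmax(spectrum[low])]

# Convert the beat frequency to distance using the slope and speed of sound.
estimated_distance = v * peak_hz / (2 * slope)
print(f"beat frequency {peak_hz:.1f} Hz, distance {estimated_distance:.3f} m")
```

With real recordings the direct speaker-to-microphone path and room reflections add further peaks, which is why Section 2 introduces background subtraction.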

Section 1.1: Transmit and receive a single tone

Before you can start transmitting and receiving FMCW chirp signals, you need to know how to transmit a single frequency, what it sounds like, what it looks like in the time and frequency domains, and what its spectrogram looks like.

To do this you will need to implement the “play_and_record” function. Given a frequency, sampling rate, and duration as input, this function should play the corresponding sound (using the speaker) and record it (using the microphone); it should also return the recorded sound.

You should write code to do the following:

  1. Create the time-domain signal x at the given frequency. It is a single tone (or frequency), i.e., a single sine or cosine wave.
  2. Play that sound and record it. You can use "y = sd.playrec(x)" to play the signal x, which will be recorded by the microphone and stored in array y.
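
As a rough illustration of these two steps, the sketch below builds the tone with NumPy and wraps the "sd.playrec" call in a separate function. The names "make_tone" and "play_and_record_sketch", the 48 kHz sampling rate, and the 0.5 amplitude are assumptions made for this example, and the notebook's own "play_and_record" signature may differ. The sounddevice import is kept inside the function so that the tone generation can be checked on a machine without audio hardware.

```python
import numpy as np


def make_tone(frequency, fs, duration, amplitude=0.5):
    """Return a single sine tone as a float32 array (amplitude is an assumption)."""
    t = np.arange(int(round(fs * duration))) / fs
    return (amplitude * np.sin(2 * np.pi * frequency * t)).astype(np.float32)


def play_and_record_sketch(frequency, fs, duration):
    """Play the tone on the speaker and return what the microphone recorded."""
    import sounddevice as sd  # imported here so make_tone works without PortAudio

    x = make_tone(frequency, fs, duration)
    # blocking=True waits until playback and recording have both finished.
    y = sd.playrec(x, samplerate=fs, channels=1, blocking=True)
    return y[:, 0]  # one input channel, flattened to a 1-D array


# Check the generated tone without playing it: its FFT peak should be at 10 kHz.
fs = 48_000
x = make_tone(10_000, fs, 2.0)
peak_hz = np.argmax(np.abs(np.fft.rfft(x))) * fs / len(x)
print(peak_hz)
```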

Once you have successfully implemented this function, execute the code that comes after the function definition. You should hear a loud 10 kHz tone for 2 seconds. When you run the following code blocks, you should be able to see what the signal looks like in time and frequency, and what its spectrogram looks like.

Known issue for Mac laptops: When you run the “sd.playrec()” command, Google Chrome might ask for permission to access the microphone. Please grant access when you see this message; otherwise, the command will play the sound but will not record it. If you did not grant access, go to the microphone settings on your Mac and give Chrome microphone access there.

Once everything is working fine, you should run subsequent blocks to see the received signal in the time domain, frequency domain, and the spectrogram.

Section 1.2: Transmit and receive an FMCW chirp

In this task, you will transmit and record an FMCW chirp signal. To do this you will need to implement the "play_and_record_chirp" function. Specifically, you need to do the following:

  1. Create a chirp signal using the "chirp(t,f0=x,f1=x,t1=x,method='linear').astype(np.float32)" command where t is the time vector for your chirp, f0 and f1 are the starting and ending frequencies of your chirp, t1 is the chirp duration, and "method" tells you how the frequency should change as a function of time.
  2. Create a variable which has multiple repetitions of this chirp so that the total duration of the signal matches “total_duration”.
  3. Play that sound and record it. You can use "rx = sd.playrec(tx)" to play the signal tx, which will be recorded by the microphone and stored in array rx.

Note: If the microphone is not able to record a good signal, try increasing the amplitude of your transmitted signal (tx) by scaling the entire signal by a constant factor, e.g., 100 or 500.
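
Steps 1 and 2 can be sketched as follows. This is only an illustration under assumed parameters (48 kHz sampling, a 10 to 15 kHz sweep, 20 ms chirps); the helper name "make_chirp_train" and its "amplitude" argument are hypothetical. The single chirp is computed directly with NumPy, using the same linear-sweep cosine that scipy.signal.chirp produces with method='linear', so the block runs without SciPy.

```python
import numpy as np


def make_chirp_train(f0, f1, chirp_duration, total_duration, fs, amplitude=0.5):
    """Repeat one linear chirp back to back until total_duration is filled."""
    n = int(round(fs * chirp_duration))
    t = np.arange(n) / fs
    slope = (f1 - f0) / chirp_duration
    # Same waveform as
    # scipy.signal.chirp(t, f0=f0, f1=f1, t1=chirp_duration, method='linear')
    one_chirp = np.cos(2 * np.pi * (f0 * t + 0.5 * slope * t**2))
    n_chirps = int(round(total_duration / chirp_duration))
    tx = np.tile(one_chirp, n_chirps)
    return (amplitude * tx).astype(np.float32)


# Assumed example values: 100 chirps of 20 ms each, 2 s in total.
fs = 48_000
tx = make_chirp_train(10_000.0, 15_000.0, 0.02, 2.0, fs)
print(tx.shape, tx.dtype)
```

The resulting array can then be passed to "sd.playrec" as described in step 3, with the sampling rate given explicitly.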

After you have successfully implemented this function, run the subsequent block. You should hear a periodic sweep from your speakers.

Subsequent code blocks do the following:

  1. Segment the transmitted and received FMCW signals into small chunks so that each chunk contains one FMCW chirp signal; the segmented data is stored as arrays in “rx_data” and “tx_data”
  2. Plot the FFT (frequency domain representation) of the first segment, i.e., “rx_data_sample”
  3. Mix the first segment with the transmitted FMCW segment and plot the FFT of the mixed signal, i.e., “multiplied_fft” (you should see a peak here below 5000 Hz)
  4. Filter the mixed signal using a low pass filter, plot the FFT again, and obtain the peak location (stored in “peak_location”)
  5. Repeat the same steps for all segments (“all_multiplied” and “all_multiplied_ffts”) and show what the spectrogram looks like for different consecutive segments.
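
For orientation, the segmentation and mixing steps might look roughly like the sketch below. It is not the provided notebook code: the helper names "segment" and "mix_and_find_peaks" are hypothetical, the chirp parameters are assumed, the received signal is simulated by circularly shifting the transmitted one, and the low-pass filter is approximated by zeroing FFT bins at or above 5000 Hz.

```python
import numpy as np

fs = 48_000                   # assumed sampling rate (Hz)
samples_per_chirp = 960       # assumed: one 20 ms chirp at 48 kHz
f0, f1 = 10_000.0, 15_000.0   # assumed sweep range (Hz)


def segment(signal, samples_per_chirp):
    """Reshape a 1-D signal into rows that each hold exactly one chirp."""
    n_chirps = len(signal) // samples_per_chirp
    return signal[: n_chirps * samples_per_chirp].reshape(n_chirps, samples_per_chirp)


def mix_and_find_peaks(rx_segments, tx_segments, fs, cutoff_hz=5_000.0):
    """Multiply each rx chirp by its tx chirp, FFT, drop high bins, find peaks."""
    mixed = rx_segments * tx_segments
    ffts = np.fft.rfft(mixed, axis=1)
    freqs = np.fft.rfftfreq(mixed.shape[1], d=1 / fs)
    ffts[:, freqs >= cutoff_hz] = 0          # crude stand-in for a low-pass filter
    peaks = np.argmax(np.abs(ffts), axis=1)  # one peak index per chirp
    return ffts, peaks


# Synthetic check: five chirps, "received" with a 144-sample (3 ms) delay.
t = np.arange(samples_per_chirp) / fs
slope = (f1 - f0) / (samples_per_chirp / fs)
one_chirp = np.cos(2 * np.pi * (f0 * t + 0.5 * slope * t**2))
tx = np.tile(one_chirp, 5)
rx = np.roll(tx, 144)

tx_data = segment(tx, samples_per_chirp)
rx_data = segment(rx, samples_per_chirp)
all_multiplied_ffts, peak_locations = mix_and_find_peaks(rx_data, tx_data, fs)
print(peak_locations * fs / samples_per_chirp)  # peak frequencies in Hz
```

In this simulation the expected beat frequency is slope × delay = 250000 Hz/s × 0.003 s = 750 Hz, well below the 5000 Hz cutoff.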

Section 2 — Extracting gestures from FMCW signals

Notice that the spectrograms look relatively similar. In this section, we will see how performing background subtraction allows us to track movements in the environment.

Section 2.1: Implement background_subtract

In this task you will perform background subtraction from the received FMCW chirp signals. To do this you need to implement the background_subtract function that takes in a series of mixed chirp segment FFTs (i.e. all_multiplied_ffts) as input. Specifically you need to do the following:

  1. Subtract consecutive mixed chirp signal segments in the frequency domain (i.e., subtract consecutive segments in “all_multiplied_ffts”)
  2. Return the segmented array after subtraction
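
One possible reading of these two steps is a row-wise difference between each segment's FFT and the previous one, sketched below. The function name is suffixed with "_sketch" to mark it as an illustration. Whether the notebook expects N-1 rows back (as here) or N rows is an assumption you should check against the code that consumes the result.

```python
import numpy as np


def background_subtract_sketch(all_multiplied_ffts):
    """Subtract each segment's FFT from the next one (row i+1 minus row i)."""
    ffts = np.asarray(all_multiplied_ffts)
    return ffts[1:] - ffts[:-1]


# Toy example: the first two columns are static background, the last one changes.
toy_ffts = np.array([[1.0, 5.0, 0.0],
                     [1.0, 5.0, 2.0],
                     [1.0, 5.0, 2.0]])
subtracted = background_subtract_sketch(toy_ffts)
print(subtracted)
```

Reflections that stay the same from chirp to chirp cancel out, so only the bins that change between consecutive chirps remain non-zero.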

Subsequent code blocks do the following:

  1. Get the peak locations for all of the segments and obtain the median peak location
  2. Use the median peak location to zoom in on the subtracted FFTs (output of “background_subtract()”) and store the result in “subtracted_filtered”
  3. Plot the spectrogram of the zoomed-in subtracted FFT
  4. Get the peak location in each subtracted FFT segment and plot them (stored in “argmaxes”)
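
The zooming and peak-tracking steps could be sketched as follows. The helper name "zoom_and_track" and the window half-width of 20 bins are assumptions for illustration only; the provided notebook code may choose its window differently.

```python
import numpy as np


def zoom_and_track(subtracted, peak_locations, half_width=20):
    """Crop the subtracted FFTs around the median peak and find per-row maxima."""
    center = int(np.median(peak_locations))
    lo = max(center - half_width, 0)
    hi = min(center + half_width, subtracted.shape[1])
    zoomed = np.abs(subtracted[:, lo:hi])   # magnitude inside the window
    argmaxes = np.argmax(zoomed, axis=1)    # indices relative to the window start
    return zoomed, argmaxes, lo


# Toy data: a single moving bin that shifts by one position per segment.
toy_subtracted = np.zeros((4, 100))
for i in range(4):
    toy_subtracted[i, 50 + i] = 1.0
toy_peaks = np.array([48, 48, 49, 48])

subtracted_filtered, argmaxes, window_start = zoom_and_track(toy_subtracted, toy_peaks)
print(argmaxes, window_start)
```

Note that indices obtained this way are relative to the start of the zoomed window, not to the full FFT.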

Section 2.2: Implement idx_to_distance

In this task you will estimate the distance using the peak location. To do this you need to implement the “idx_to_distance” function which takes the peak location as input. Specifically you need to do the following:

  1. Implement the following equation:

         distance = (v × Δ𝑓) / (2 × slope)

    where "v" is the speed of sound in air, "slope" is the slope of your FMCW chirp, and "Δ𝑓" corresponds to the peak location.

  2. Store the result in the “distance” variable and return it as an output of the function.

Hint: The index of the peak is not equal to Δ𝑓 because it is not in Hz, how can you convert it to Hz?

Once you have successfully implemented this function, run the next code block. It should plot the distance variation as a function of time.
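
The index-to-distance conversion can be sketched as below, assuming the standard round-trip FMCW relation distance = v × Δ𝑓 / (2 × slope) and an FFT of length n_fft taken over samples recorded at rate fs, so that one bin spans fs / n_fft Hz. The extra parameters (n_fft, fs, slope, v) are hypothetical; the notebook's “idx_to_distance” takes only the peak location and presumably reads the rest from variables defined earlier.

```python
def idx_to_distance_sketch(idx, n_fft, fs, slope, v=343.0):
    """Convert an FFT bin index to a distance in metres (assumed relation)."""
    delta_f = idx * fs / n_fft           # bin index -> frequency in Hz
    return v * delta_f / (2.0 * slope)   # beat frequency -> round-trip distance


# Assumed example: bin 15 of a 960-point FFT at 48 kHz, with a 250 kHz/s slope.
distance = idx_to_distance_sketch(15, n_fft=960, fs=48_000, slope=250_000.0)
print(distance)
```

Here bin 15 corresponds to 15 × 48000 / 960 = 750 Hz, and 343 × 750 / (2 × 250000) = 0.5145 m.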

Section 2.3: Recording Gestures

In this task, you will record different hand gestures. To do this, run the code for Section 1.2 again, but this time, while the chirp sound is playing, do the following separately:

  1. Move your hand up, bring it down, and then move it back up
  2. Slowly move your hand up and down 4-5 times

For each case your hand movement should be aligned with the direction of the sound.

Submission and Checkoff Instructions

Write up your answers to the following items in a single PDF file and name it lab4_kerberos.pdf or lab4_kerberos1+kerberos2.pdf (e.g. lab4_bnagda.pdf or lab4_bnagda+fadel.pdf). Upload the file to your private channel in Slack by Mar 30, 11:59 PM. If you work with a partner, you only have to submit once. You do not need to submit your code, but we may ask to look at your code during the checkoff.

  1. Names and MIT emails (including your lab partner's, if you have one).
  2. In the plot entitled “Frequency Domain - Recieved FMCW signal”, why does the output look the way it does?
  3. In the plot entitled “Frequency Domain - Downconverted FMCW signal”, why is there a single lower-frequency peak? What does this peak correspond to?
  4. Attach a screenshot of:
    1. The spectrogram
    2. The distance vs time plot

    Do this for both movement patterns (i.e., 4 plots in total)

  5. Estimated number of hours you spent on this lab per person.
  6. Any comments/suggestions for the lab? Any questions for the checkoff? (Optional)

Bonus questions (Optional for extra credit)

  1. Why do we need to send a whole sweep (with a time interval of T) to estimate the gesture? Why can't we just send two frequencies that are spaced by T seconds?
  2. Why do we need to subtract the first strongest peak from the remaining peaks to get the distance variation?

A checkoff will require successfully showing the two plots for the two different hand gestures, as well as explaining how each of the functions was implemented.