The objective of the hackathon was to build an audio classification algorithm that predicts a person's emotion from their speech. The audio data was converted into spectrograms and MFCCs (Mel-frequency cepstral coefficients), which were used to train a CNN (convolutional neural network) model.
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams.
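As a rough illustration of the definition above, the sketch below computes a magnitude spectrogram by slicing a signal into overlapping frames and taking a discrete Fourier transform of each one. It is not the code used in the hackathon. The function names, the frame length, the hop size, and the synthetic 1 kHz test tone are all assumptions chosen for the example, and the naive DFT is written with the standard library only for readability; a real pipeline would more likely use an FFT from a library such as NumPy or librosa.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (periodic form), used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins.

    Only bins 0..frame_len//2 are kept, since the input is real-valued.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one windowed frame at frequency bin k
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)

# The strongest bin in the first frame should sit at the tone frequency
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Plotting `spec` as an image, with time on one axis and frequency on the other, gives the familiar spectrogram picture that a CNN can take as input.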
In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. MFCCs are the coefficients that together make up an MFC.
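The steps in that definition can be sketched for a single frame of audio: compute the power spectrum, pool it with triangular filters spaced evenly on the mel scale, take the logarithm, and apply a discrete cosine transform (DCT-II). This is a minimal sketch, not the project's implementation. The helper names, the 20 filters, the 13 coefficients, the small log floor, and the 440 Hz test frame are assumptions, and the mel formula used is one common convention among several.

```python
import cmath
import math


def hz_to_mel(hz):
    """One common mel-scale formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power in DFT bins 0..N/2 of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)
    ]
    out = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        out.append(abs(acc) ** 2 / n)
    return out


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge frequencies: evenly spaced in mel, converted back to Hz
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_out)
    ]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel filters -> log -> DCT."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    # Small floor avoids log(0) for filters that capture no energy
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct2(log_energies, n_coeffs)


# Hypothetical test frame: 256 samples of a 440 Hz sine at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs), "coefficients")
```

Repeating `mfcc_frame` over successive overlapping frames yields a two-dimensional array of coefficients over time, which can be fed to a CNN in the same way as a spectrogram.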