Digital Signal and Image Management project

Overview

The project consists of developing an application that recognizes one-dimensional signals (audio) and two-dimensional signals (images). Specifically, we tackled three different tasks:

Processing-1D: Recognize the identity of a group member from a two-second audio clip using ML and DL models. To solve this task we tried different models and different feature configurations (zero crossing rate, standard deviation, MFCC, spectrogram, etc.).
Processing-2D: Recognize the identity of a group member from an image using DL models. In this case we tried different pretrained architectures, with weights trained either on a general-purpose task (ImageNet) or on a face recognition task (VGGFace).
Retrieval: For each member of the group, find the ten most similar faces in a dataset of VIP (celebrity) faces.
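As a rough illustration of the hand-crafted audio features listed above, the NumPy sketch below computes a frame-level zero crossing rate, the signal standard deviation and a magnitude spectrogram. It is not the code from our notebooks: the frame length (1024), hop size (512), Hann window and the helper names `frame_signal`, `zero_crossing_rate`, `magnitude_spectrogram` and `feature_vector` are assumptions made for illustration. MFCCs are left out because they are usually computed with a library such as librosa (`librosa.feature.mfcc`).

```python
import numpy as np


def frame_signal(y: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one frame per row.

    Assumes len(y) >= frame_len; the incomplete tail is dropped.
    """
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]


def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Fraction of consecutive sample pairs whose sign changes, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def magnitude_spectrogram(frames: np.ndarray) -> np.ndarray:
    """|STFT| of each Hann-windowed frame; shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))


def feature_vector(y: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Hypothetical summary features: mean ZCR, std of ZCR, std of the signal."""
    frames = frame_signal(y, frame_len, hop)
    zcr = zero_crossing_rate(frames)
    return np.array([zcr.mean(), zcr.std(), y.std()])
```

The spectrogram matrix can be saved as an image and fed to a CNN, while the summary vector suits classical models such as an SVM or a random forest.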

Data

All the data used for this project was collected directly by us, as follows:

Processing-1D: recording 100 five-second audio clips for each person. These clips were then cut into two-second segments, and data augmentation was applied by modifying their pitch and speed to increase the amount of available data.
Processing-2D: taking 100 photos with variations in lighting and facial expression.
Retrieval: three photos taken from the Processing-2D data were used.
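The splitting and augmentation steps described above can be sketched as follows. This is a minimal NumPy version under stated assumptions: a 16 kHz sample rate (not given in this README), non-overlapping two-second segments with the incomplete tail dropped (so a five-second clip yields two segments), and speed factors of 0.9 and 1.1. The helpers `split_segments`, `change_speed`, `fix_length` and `augment` are hypothetical. Note that naive resampling changes speed and pitch together; to change them independently one could use `librosa.effects.time_stretch` and `librosa.effects.pitch_shift` instead.

```python
import numpy as np

SR = 16_000       # assumed sample rate, not stated in the README
SEGMENT_S = 2.0   # segment length in seconds, as in the README


def split_segments(y: np.ndarray, sr: int = SR, seg_s: float = SEGMENT_S) -> list:
    """Cut a recording into non-overlapping segments of seg_s seconds,
    dropping the incomplete tail (e.g. a 5 s clip -> two 2 s segments)."""
    seg_len = int(round(seg_s * sr))
    n = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def change_speed(y: np.ndarray, factor: float) -> np.ndarray:
    """Naive speed change by linear-interpolation resampling.

    factor > 1 plays faster (shorter output). This also shifts the pitch.
    """
    n_out = int(round(len(y) / factor))
    old_t = np.arange(len(y))
    new_t = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_t, old_t, y)


def fix_length(y: np.ndarray, length: int) -> np.ndarray:
    """Zero-pad or trim so every augmented segment keeps the same length."""
    if len(y) >= length:
        return y[:length]
    return np.pad(y, (0, length - len(y)))


def augment(segment: np.ndarray, factors=(0.9, 1.1)) -> list:
    """Return one speed-modified copy of the segment per factor (assumed values)."""
    return [fix_length(change_speed(segment, f), len(segment)) for f in factors]
```

Each original segment plus its augmented copies can then be added to the training set, keeping all inputs at the same length.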

If you have questions about the data or would like access to it, please write to me!

Notebook

Processing-1D. For this task we developed three notebooks:
- 1_AudioAcquisition: This notebook must be run locally. It uses the default microphone to automatically record all the audio clips needed for the project.
- 2_AudioRecognition: This notebook contains all the ML and DL models developed to solve this task. It also contains the code used to split and augment the data starting from the original five-second recordings.
- 3_DemoLive: This notebook must be run locally; it uses the microphone and the camera to create a live demo showing the effectiveness of the models developed for the voice recognition task.
Processing-2D. Again, we developed three notebooks with the same purposes, adapted for image processing:
- 1_ImageAcquisition: This notebook must be run locally; as before, it automatically takes all the images needed using the default camera.
- 2_FaceRecognition: This notebook contains all the ML and DL models developed to solve this second task. In this folder you can also find a link to download the weights used for the VGGFace model.
- 3_DemoLive: This notebook must be run locally; it uses the camera to create a live demo showing the effectiveness of the models developed for the face recognition task.
Retrieval. This folder contains a single notebook that implements all the code needed to solve the retrieval task. The VIP face dataset used can be downloaded here.
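Assuming each face has already been mapped to a fixed-length embedding (for example the penultimate-layer output of a face recognition network such as VGGFace; this README does not say which representation the retrieval notebook uses), retrieval reduces to a nearest-neighbour search. The sketch below ranks a gallery of VIP embeddings by cosine similarity. `top_k_similar` and `member_query` are hypothetical helpers, and averaging a member's three photos into a single query is our assumption, not necessarily what the notebook does.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def member_query(photo_embs: np.ndarray) -> np.ndarray:
    """Combine several embeddings of the same person into one unit-length query."""
    return l2_normalize(l2_normalize(photo_embs).mean(axis=0))


def top_k_similar(query_emb: np.ndarray, gallery_embs: np.ndarray, k: int = 10):
    """Indices and cosine similarities of the k gallery faces closest to the
    query, sorted from most to least similar."""
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_embs)
    sims = g @ q                      # cosine similarity with every gallery face
    k = min(k, len(sims))
    idx = np.argpartition(-sims, k - 1)[:k]   # unordered top-k in O(n)
    idx = idx[np.argsort(-sims[idx])]         # sort only the k winners
    return idx, sims[idx]
```

Using `argpartition` avoids sorting the whole gallery, which matters only when the VIP dataset is large; for a few thousand faces a full `argsort` would work just as well.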

The repository also contains the report and the presentation prepared for the exam, both in Italian.
The weights of the trained models are not included because they exceed GitHub's maximum file size; if you need them, feel free to write to me.

How to run code

Unless otherwise specified in the Notebook section, all the code can be run on Google Colaboratory. All notebooks are already set up to import the necessary packages, and Colab also makes it easy to use a GPU!

Unfortunately, the notebooks that perform the live demo and the automatic acquisition must be run in a local environment, because they require a camera and a microphone. For these notebooks you need to install all the packages listed in the requirements file found in each folder.
Anyway, if you have any problems, just contact me for further information!

Results Table

Comparative results of the models on a test set created by subsampling the original dataset:

Processing-1D: the first three models use MFCC features, while the last CNN model uses spectrogram images.

Architecture         Accuracy   Precision   Recall   F1-score
SVM                  0.83       0.83        0.83     0.83
RandomForest         0.81       0.86        0.81     0.82
CNN                  0.97       0.97        0.97     0.97
CNN on spectrogram   0.88       0.88        0.88     0.88

Processing-2D:

Architecture   Accuracy   Precision   Recall   F1-score
VGG16          0.94       0.94        0.95     0.94
MobileNet-V2   0.98       0.98        0.98     0.98
VGGFace        1.00       1.00        1.00     1.00
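For reference, the four columns in the tables above can be computed from a confusion matrix as in the sketch below. The README does not say how precision, recall and F1 were averaged over the group members; this sketch assumes macro averaging (the unweighted mean over classes), which scikit-learn also offers via `precision_recall_fscore_support(..., average="macro")`. The function name `classification_scores` is hypothetical.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes: int) -> dict:
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)      # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    pred_pos = cm.sum(axis=0)               # how often each class was predicted
    true_pos = cm.sum(axis=1)               # how often each class truly occurs
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, pred_pos, out=zeros.copy(), where=pred_pos > 0)
    recall = np.divide(tp, true_pos, out=zeros.copy(), where=true_pos > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }
```

Classes that are never predicted get a precision of 0 instead of raising a division error, which mirrors scikit-learn's default `zero_division` behaviour apart from the warning it emits.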

About us

Riccardo Confalonieri - Data Science Student @ University of Milano-Bicocca

Lorenzo Mora - Data Science Student @ University of Milano-Bicocca

Ginevra Mariani - Data Science Student @ University of Milano-Bicocca