rconfa / digital-signal-and-image-management-project

The project consists of the development of an application for the recognition of one-dimensional signals (audio) and two-dimensional signals (images), and for the retrieval of the 10 images most similar to a given query.

License: GNU General Public License v3.0

face-recognition voice-recognition vggface keras neural-networks svm-classifier unimib digital-signal-processing image-classification audio-classification

Digital Signal and Image Management project

Overview

The project consists of the development of an application for the recognition of one-dimensional signals (audio) and two-dimensional signals (images). Specifically, we developed three different tasks:

  • Processing-1D: Recognize the identity of a group member from a two-second audio clip using ML and DL models. To solve this task, we tried different models and different feature configurations (zero-crossing rate, standard deviation, MFCC, spectrogram, etc.).
  • Processing-2D: Recognize the identity of a group member from an image using DL models. In this case, we tried different pretrained architectures, with weights trained on a general task (ImageNet) and on a face recognition task (VGGFace).
  • Retrieval: Find the ten most similar VIP faces for each member of the group.
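The features listed for Processing-1D can be illustrated with a short NumPy sketch. It is not the notebook's code: the frame length (1024), hop length (512) and 16 kHz sampling rate are assumptions, and `frame_signal`, `zero_crossing_rate`, `magnitude_spectrogram` and `extract_features` are hypothetical helpers. The sketch computes a per-frame zero-crossing rate, the standard deviation of the signal and a magnitude spectrogram. MFCCs would be obtained by applying a mel filterbank, a logarithm and a DCT to the power spectrum, which libraries such as librosa provide (`librosa.feature.mfcc`).

```python
import numpy as np


def frame_signal(y: np.ndarray, frame_length: int, hop_length: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_length)."""
    if len(y) < frame_length:
        # Zero-pad short signals so that at least one frame exists
        y = np.pad(y, (0, frame_length - len(y)))
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Row i holds samples [i * hop_length, i * hop_length + frame_length)
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    negative = np.signbit(frames)
    return np.mean(negative[:, 1:] != negative[:, :-1], axis=1)


def magnitude_spectrogram(frames: np.ndarray) -> np.ndarray:
    """Magnitude of the Hann-windowed FFT of each frame, shape (n_bins, n_frames)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1)).T


def extract_features(y: np.ndarray, frame_length: int = 1024, hop_length: int = 512):
    """Return a fixed-length summary vector and the full spectrogram."""
    frames = frame_signal(y, frame_length, hop_length)
    zcr = zero_crossing_rate(frames)
    spec = magnitude_spectrogram(frames)
    # [mean ZCR, std of ZCR, std of the raw signal]
    summary = np.array([zcr.mean(), zcr.std(), y.std()])
    return summary, spec


# Usage: a 2-second, 440 Hz test tone at an assumed 16 kHz sampling rate
sr = 16000
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 440 * t)
summary, spec = extract_features(y)
print(summary.round(3), spec.shape)
```

The spectrogram array is the kind of 2-D input a CNN can treat as an image, while the short summary vector suits classical models such as an SVM or a random forest.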

Data

All the data used for this project were collected directly in the following ways:

  • Processing-1D: recording 100 five-second audio clips for each person. These clips were then cut into 2-second segments, and data augmentation was applied by modifying their pitch and speed to increase the amount of available data.
  • Processing-2D: taking 100 photos with variations in lighting and expression.
  • Retrieval: three photos taken from the previous task were used.
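A minimal NumPy sketch of the splitting and augmentation step, under stated assumptions: a 16 kHz sampling rate, non-overlapping 2-second windows with the incomplete tail discarded, and speed factors of 0.9 and 1.1 chosen only for illustration. The helper names are hypothetical. Speed is changed here by linear-interpolation resampling, which also shifts the pitch; changing pitch alone while preserving duration requires a dedicated routine such as `librosa.effects.pitch_shift`.

```python
import numpy as np


def split_into_segments(y: np.ndarray, sr: int, segment_seconds: float = 2.0) -> list:
    """Cut a recording into consecutive, non-overlapping segments.

    Assumption: the incomplete tail (1 s for a 5 s clip) is discarded.
    """
    seg_len = int(segment_seconds * sr)
    n_segments = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def change_speed(y: np.ndarray, rate: float) -> np.ndarray:
    """Resample by linear interpolation; rate > 1 plays faster and gives a shorter clip.

    Like speeding up a tape, this also raises the pitch by the same factor.
    """
    n_out = int(round(len(y) / rate))
    old_positions = np.arange(len(y))
    new_positions = np.linspace(0, len(y) - 1, n_out)
    return np.interp(new_positions, old_positions, y)


def fit_length(y: np.ndarray, length: int) -> np.ndarray:
    """Zero-pad or truncate so that every augmented segment has the same length."""
    if len(y) >= length:
        return y[:length]
    return np.pad(y, (0, length - len(y)))


# Usage with assumed values (the README does not specify them)
sr = 16000
speed_factors = (0.9, 1.1)
clip = np.random.default_rng(0).standard_normal(5 * sr)  # stand-in for one 5-second recording
segments = split_into_segments(clip, sr)
augmented = [fit_length(change_speed(s, r), len(s)) for s in segments for r in speed_factors]
print(len(segments), len(augmented))
```

Padding or truncating back to 2 seconds keeps every example the same length, which fixed-size model inputs usually require.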

If you have questions about the data or need access to it, please write to me!

Notebook

  • Processing-1D. For this task we developed three different notebooks:
    • 1_AudioAcquisition: This notebook must be executed locally. It uses the default microphone to automatically record all the audio needed for the project.
    • 2_AudioRecognition: This notebook contains all the ML and DL models developed to solve this task. It also contains the code used to split and augment the data, starting from the original five-second recordings.
    • 3_DemoLive: This notebook must be run locally; it uses the microphone and the camera to create a live demo that shows the effectiveness of the models developed for the voice recognition task.
  • Processing-2D. Again, we developed three different notebooks with the same purposes, adapted for image processing:
    • 1_ImageAcquisition: This notebook must be executed locally; as before, it automatically captures all the images needed using the default camera.
    • 2_FaceRecognition: This notebook contains all the ML and DL models developed to solve this second task. In this folder you can also find a link to download the weights used for the VGGFace model.
    • 3_DemoLive: This notebook must be run locally; it uses the camera to create a live demo that shows the effectiveness of the models developed for the face recognition task.
  • Retrieval. This folder contains only one notebook, which implements all the code necessary to solve the retrieval task. The dataset of VIP faces can be downloaded here.
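The retrieval step can be sketched as a nearest-neighbour search. This assumes each face has already been mapped to a fixed-length embedding (for example, the activations of a pretrained CNN such as VGGFace with its classification layer removed); the README does not say which features the retrieval notebook uses. `top_k_similar` is a hypothetical helper that ranks the gallery by cosine similarity and returns the ten best matches, and the random vectors stand in for real embeddings.

```python
import numpy as np


def top_k_similar(query: np.ndarray, gallery: np.ndarray, k: int = 10):
    """Indices and cosine similarities of the k gallery rows closest to the query.

    query: embedding of shape (d,); gallery: embeddings of shape (n, d).
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q  # cosine similarity of every gallery item with the query
    k = min(k, len(scores))
    # argpartition selects the k best in linear time; only those k are then sorted
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]


# Usage: random vectors stand in for real face embeddings (an assumption)
rng = np.random.default_rng(42)
gallery = rng.standard_normal((500, 128))
query = gallery[7] + 0.01 * rng.standard_normal(128)  # near-duplicate of item 7
idx, sims = top_k_similar(query, gallery)
print(idx[0], sims.round(3))
```

`np.argpartition` avoids fully sorting the gallery; for ten results over a few thousand images, a plain `np.argsort` would work just as well.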

You can also find the report and the presentation prepared for the exam, both in Italian.
If you need the trained models, feel free to write to me: their weights exceed the maximum file size allowed by GitHub.

How to run code

Unless otherwise specified in the Notebook section, all the code can be run on the Google Colaboratory platform. All notebooks are already set up to import the necessary packages, and running them on Colab also lets you easily use a GPU!

Unfortunately, the notebooks that perform the live demos and the automatic acquisition must be run in a local environment, because they require a camera and a microphone. For these notebooks, you need to install all the packages listed in the requirements file that you can find in each folder.
Anyway, if you have any problems, just contact me for further information!

Results Table

Comparative results of the models on a test set created by subsampling the original dataset:

  • Processing-1D: The first three models use MFCC features, while the last CNN model uses spectrogram images.

    Architecture        Accuracy  Precision  Recall  F1-score
    SVM                 0.83      0.83       0.83    0.83
    RandomForest        0.81      0.86       0.81    0.82
    CNN                 0.97      0.97       0.97    0.97
    CNN on spectrogram  0.88      0.88       0.88    0.88
  • Processing-2D:

    Architecture  Accuracy  Precision  Recall  F1-score
    VGG16         0.94      0.94       0.95    0.94
    MobileNet-V2  0.98      0.98       0.98    0.98
    VGGFace       1.00      1.00       1.00    1.00
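For reference, the four metrics in the tables can be computed from a confusion matrix as below. The README does not state how precision, recall and F1 were averaged across classes; this sketch assumes macro averaging (the unweighted mean over classes), and `summarize_metrics` is a hypothetical helper. The labels are made up for three classes, one per group member.

```python
import numpy as np


def summarize_metrics(y_true, y_pred, n_classes: int) -> dict:
    """Accuracy and macro-averaged precision, recall and F1 from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how many samples each class has
    zeros = np.zeros_like(tp)
    # Classes never predicted (or absent) get a score of 0 instead of a division by zero
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Usage: made-up labels for three classes
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0]
metrics = summarize_metrics(y_true, y_pred, n_classes=3)
print({name: round(value, 2) for name, value in metrics.items()})
```

With scikit-learn, `precision_recall_fscore_support(y_true, y_pred, average="macro")` gives the same macro averages, while `average="weighted"` weights each class by its number of samples.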

References

[1] S. Bianco, “Dispense e slide del corso Digital Signal and Image Management” [lecture notes and slides for the Digital Signal and Image Management course], 2021.
[2] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” 2015.

About us

Riccardo Confalonieri - Data Science Student @ University of Milano-Bicocca

Lorenzo Mora - Data Science Student @ University of Milano-Bicocca

Ginevra Mariani - Data Science Student @ University of Milano-Bicocca
