va_project's Introduction

1. Target of This Repo

This repo is meant to help you complete the final project for the class "Introduction to Visual-Auditory Information System". It consists of three parts: the auditory feature extractor (afeat_extractor), the visual feature extractor (vfeat_extractor), and a simple project demo (proj_demo) that predicts the similarity between an audio clip and a silent video.

2. Code Description

2.1 Feature Extractors

afeat_extractor and vfeat_extractor extract the auditory and visual features of a video, respectively. In our project, we extract one 128-d auditory feature and one 1024-d visual feature per second, for a total of 120 seconds. Every video therefore corresponds to a 120×128 auditory feature and a 120×1024 visual feature, each saved as a NumPy array file (*.npy).

  • The auditory feature is extracted by a VGG-like CNN model (implemented in TensorFlow).

  • The visual feature is extracted by the Inception v3 model (implemented in PyTorch).
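To make the feature layout concrete, here is a minimal sketch of writing and reading the two arrays with NumPy. The random values only stand in for real extractor output, and the float32 dtype and the helper name save_and_reload are assumptions made for illustration, not part of the repo.

```python
import os
import tempfile

import numpy as np

SECONDS = 120      # seconds of features kept per video
AUDIO_DIM = 128    # auditory feature size per second
VISUAL_DIM = 1024  # visual feature size per second


def save_and_reload(folder):
    """Write placeholder features to `folder` and read them back."""
    # Random values stand in for real extractor output (assumption).
    afeat = np.random.rand(SECONDS, AUDIO_DIM).astype(np.float32)
    vfeat = np.random.rand(SECONDS, VISUAL_DIM).astype(np.float32)
    np.save(os.path.join(folder, "afeat.npy"), afeat)
    np.save(os.path.join(folder, "vfeat.npy"), vfeat)
    return (np.load(os.path.join(folder, "afeat.npy")),
            np.load(os.path.join(folder, "vfeat.npy")))


with tempfile.TemporaryDirectory() as tmp:
    afeat, vfeat = save_and_reload(tmp)
    print(afeat.shape, vfeat.shape)  # (120, 128) (120, 1024)
```

Each row of an array corresponds to one second of the video, so row i of afeat.npy and row i of vfeat.npy describe the same second.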

2.2 How to use the feature extractors

Before extracting features, first download the pretrained VGGish models and the pretrained Inception models, then put them under the folders "afeat_extractor/" and "vfeat_extractor/pretrained/", respectively.

You also need to install the required dependencies, such as PyTorch and TensorFlow. The detailed requirements are listed in the subfolders "afeat_extractor" and "vfeat_extractor".
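As a quick sanity check before running the extractors, a small script along these lines could confirm that the two folders exist and are not empty. The helper name missing_model_dirs is hypothetical. The exact model file names are not given here, so the sketch does not look for specific files and cannot tell whether the right models are present.

```python
import os

# Folders named in the instructions above, relative to the repo root.
REQUIRED_DIRS = [
    "afeat_extractor",
    os.path.join("vfeat_extractor", "pretrained"),
]


def missing_model_dirs(repo_root="."):
    """Return the required folders that are absent or contain no entries."""
    missing = []
    for rel in REQUIRED_DIRS:
        path = os.path.join(repo_root, rel)
        if not os.path.isdir(path) or not os.listdir(path):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_model_dirs():
        print("Nothing found under:", rel)
```

Run it from the repo root; no output means both folders exist and contain at least one entry.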

2.3 Project Demo

proj_demo provides a simple example of learning a similarity metric between the 120×1024 visual feature and the 120×128 auditory feature. Note: the provided demo is implemented in PyTorch.
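To illustrate what a similarity between the two feature types can look like, here is a minimal NumPy sketch. It is not the model used in proj_demo: averaging over time, the 64-d shared space, and the random projection matrices w_v and w_a are all assumptions chosen for illustration. In a trained model, those projections would be learned from data.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def av_similarity(vfeat, afeat, w_v, w_a):
    """Score one (visual, auditory) pair; higher means more similar."""
    # Average over the 120 seconds, then map both into a shared space.
    v = vfeat.mean(axis=0) @ w_v  # (1024,) -> (embed_dim,)
    a = afeat.mean(axis=0) @ w_a  # (128,)  -> (embed_dim,)
    return cosine_similarity(v, a)


rng = np.random.default_rng(0)
EMBED_DIM = 64  # size of the shared space (assumption)

# Untrained, random projections; placeholders for learned weights.
w_v = rng.standard_normal((1024, EMBED_DIM)) * 0.01
w_a = rng.standard_normal((128, EMBED_DIM)) * 0.01

# Random inputs with the shapes described above.
vfeat = rng.standard_normal((120, 1024))
afeat = rng.standard_normal((120, 128))

score = av_similarity(vfeat, afeat, w_v, w_a)
print("similarity:", score)
```

With random weights the score carries no meaning. The sketch only shows how two features of different widths can be compared once both are mapped to the same dimension.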

3. Dataset

The provided training dataset includes 1300 video folders, each of which contains five parts:

  • frames: 125 video frames, sampled at a rate of 1 frame per second
  • *.mp4: the 125-second video file without audio
  • *.wav: the 125-second audio file
  • afeat.npy: the auditory feature saved as a NumPy array (120×128)
  • vfeat.npy: the visual feature saved as a NumPy array (120×1024)

Note: we extract 125 seconds of video and audio only to ensure that 120 seconds of features can be obtained.
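Given the folder layout above, the feature pairs could be loaded with a short loop such as the one below. This is a sketch: the function names load_pair and iter_dataset and the folder names video_0000 and video_0001 are hypothetical, and the demo builds a tiny placeholder dataset in a temporary directory instead of reading the real one.

```python
import os
import tempfile

import numpy as np

AFEAT_SHAPE = (120, 128)
VFEAT_SHAPE = (120, 1024)


def load_pair(video_dir):
    """Load afeat.npy and vfeat.npy from one video folder and check shapes."""
    afeat = np.load(os.path.join(video_dir, "afeat.npy"))
    vfeat = np.load(os.path.join(video_dir, "vfeat.npy"))
    if afeat.shape != AFEAT_SHAPE or vfeat.shape != VFEAT_SHAPE:
        raise ValueError("unexpected feature shape in " + video_dir)
    return afeat, vfeat


def iter_dataset(root):
    """Yield (folder_name, afeat, vfeat) for every video folder under root."""
    for name in sorted(os.listdir(root)):
        video_dir = os.path.join(root, name)
        if os.path.isdir(video_dir):
            afeat, vfeat = load_pair(video_dir)
            yield name, afeat, vfeat


# Demo on a placeholder dataset with two video folders (made-up names).
with tempfile.TemporaryDirectory() as root:
    for name in ("video_0000", "video_0001"):
        os.makedirs(os.path.join(root, name))
        np.save(os.path.join(root, name, "afeat.npy"), np.zeros(AFEAT_SHAPE))
        np.save(os.path.join(root, name, "vfeat.npy"), np.zeros(VFEAT_SHAPE))
    loaded = list(iter_dataset(root))

print(len(loaded), "video folders loaded")
```

For the real dataset, pass the path of the downloaded dataset folder to iter_dataset instead of the temporary directory.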

The following picture shows one example:

The full dataset, containing all five parts, takes about 60 GB of storage and can be downloaded through the campus network. If you only need the extracted auditory and visual features, you can download the feature-only dataset (about 150 MB) from Baidu Yun.

4. Acknowledgements

  • The original implementation of the visual feature extractor can be found at this link.

  • The original implementation of the auditory feature extractor can be found at this link.

5. Q&A

If you have any questions, contact us by e-mail or open a new issue in this repo!

va_project's People

Contributors

uzeful
