The plap from eryk-urbanski

plap's Issues

Preprocessor class

Basic Time-Domain Features Extraction

Create pipelines for extracting basic time-domain features. Possible parameterizations to implement:

Amplitude Envelope (AE)
Root Mean Square (RMS)
Zero-crossing Rate (ZCR)
Temporal Centroid (TC)

Steps:

Function for simple preprocessing using plap (read audio, framing, windowing)
Functions for calculating features (they should return the result)
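The two steps above could be sketched roughly as follows, using only NumPy and SciPy. This is a minimal illustration, not the plap implementation: all function names, the default frame and hop sizes, and the exact feature definitions (in particular the temporal centroid, which different sources define slightly differently) are assumptions made for this example. Reading the audio is left out here.

```python
import numpy as np
from scipy.signal import get_window


def frame_signal(signal, frame_length=1024, hop_length=512, window="hann"):
    """Split a 1-D signal into overlapping frames and window each frame.

    Hypothetical helper: returns an array of shape (n_frames, frame_length).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_length:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_length - len(signal)))
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_length)[None, :]
    return signal[idx] * get_window(window, frame_length)


def amplitude_envelope(frames):
    """AE: maximum absolute amplitude in each frame."""
    return np.max(np.abs(frames), axis=1)


def root_mean_square(frames):
    """RMS: square root of the mean squared amplitude in each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def zero_crossing_rate(frames):
    """ZCR: fraction of consecutive sample pairs whose sign differs."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def temporal_centroid(envelope, hop_length, sample_rate):
    """TC (one common definition): envelope-weighted mean time in seconds."""
    envelope = np.asarray(envelope, dtype=float)
    times = np.arange(len(envelope)) * hop_length / sample_rate
    total = np.sum(envelope)
    return float(np.sum(times * envelope) / total) if total > 0 else 0.0
```

Each feature function returns its result instead of printing it, which keeps the functions easy to unit-test and to wrap later when they are integrated with plap.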

In the next iteration, these functions will be rewritten so that they can be integrated with the existing plap code.

Guidelines:
These calculations must not use external libraries such as Librosa; only SciPy and NumPy may be used. Try prompting GPT-4 with something like 'how to calculate <feature name> with an algorithm, given an input signal that has already gone through block processing (framing) with overlap and windowing'. Experiment with the prompts, because GPT may be helpful.
The results should be validated, so unit tests have to be written.
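As an illustration of what such a validation test might look like, here is a small sketch using the standard unittest module. The function root_mean_square is a hypothetical stand-in for whichever feature function is under test; the expected values come from closed-form results (a sine of amplitude A sampled over whole periods has an RMS of A/sqrt(2)).

```python
import unittest

import numpy as np


def root_mean_square(frame):
    """Hypothetical function under test: RMS of a single frame."""
    return float(np.sqrt(np.mean(np.square(frame))))


class TestRootMeanSquare(unittest.TestCase):
    def test_sine_matches_closed_form(self):
        # 5 whole periods of a sine with amplitude 2 -> RMS = 2 / sqrt(2).
        t = np.arange(1000) / 1000.0
        frame = 2.0 * np.sin(2.0 * np.pi * 5.0 * t)
        self.assertAlmostEqual(root_mean_square(frame), 2.0 / np.sqrt(2.0), places=6)

    def test_silence_is_zero(self):
        self.assertEqual(root_mean_square(np.zeros(256)), 0.0)
```

Saved in a test file, this can be run with `python -m unittest`.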

SPRINT 20.05

Code task - Eryk

Parameterizer class

Presentation task - Natalia

Task

MFCC Extraction Pipeline

The MFCC extraction pipeline aims to provide a method for extracting Mel-Frequency Cepstral Coefficients from audio signals. It should be implemented as a modular component that allows other cepstral features to be integrated easily in the future. Unit tests should be written. The output should be validated by comparing it with values obtained from existing solutions (the module for obtaining those values is covered by another issue).
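For orientation, the commonly described MFCC chain (power spectrum, triangular mel filterbank, logarithm, DCT) can be sketched with NumPy and SciPy as below. This is only a sketch under stated assumptions: the function names, the 2595 * log10(1 + f / 700) mel formula, the 26 filters, the 13 coefficients and the orthonormal DCT-II are choices made for this example, and other libraries use different defaults, so the values will not match them exactly.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(hz):
    """Hz to mel, using the 2595 * log10(1 + f / 700) variant."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc_from_frames(frames, sample_rate, n_mfcc=13, n_filters=26):
    """MFCCs for already framed and windowed input of shape (n_frames, frame_length)."""
    frames = np.asarray(frames, dtype=float)
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
```

Keeping the filterbank and the final transform as separate functions is one way to leave room for other cepstral features later, since they could reuse the same front end with a different filterbank or transform.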

MPEG-7 Low-Level Descriptors

Preprocessing SPRINT 16.04

Code tasks - Eryk

Research tasks - Natalia + Antek

What does a feature vector usually look like?
Ask the chat questions (preferably in English):
- What does a feature vector look like in ASR (Automatic Speech Recognition) systems, Music Genre Classification systems, etc.
- How are the values resulting from the process called audio feature extraction passed to the next block in ASR systems...
  So in general it is practically the same question, but phrased differently, to pull as many different answers from the chat as possible. Change the system names; other options are e.g. Acoustic Scene Classification or Emotion in Speech Recognition. If you add that these systems are to be built specifically in Python, I expect answers along the lines of simply a Python list with the obtained values, etc. You can also compose one more question: that you have a theoretical description of parameterization selection and want to carry it over into Python code, and then see what it spits out. I am attaching a paper by our supervisor; copy the content of section 3.2 (Parameter selection), which is that 'theoretical description'.
  ranking-speech-features-for-their-usage-in-singing-emotion-classification_50094.pdf

Documentation tasks - Michał

Block diagram
The preprocessing part comes first, because it goes into the stage documentation, so export it as an image and insert it into the template. The whole diagram will in turn go into the technical documentation of the entire project, but I will deal with the template for that some other time, so for now keep the full block diagram in an editable form in Canva/Lucidchart (I would prefer Canva; it somehow seems to me that it has more capabilities, despite being a more general-purpose tool).

Baseline Functionality: Add Core Functionalities

Add fundamental functionalities:

reading .wav files
block processing
fft
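A rough sketch of these three baseline pieces, using scipy.io.wavfile and numpy.fft, might look like the following. The function names, the normalization to floats in [-1, 1) and the mono mixdown are assumptions made for this example, not decisions recorded in the issue. The demo at the bottom writes a synthetic tone to an in-memory WAV so that the sketch runs without any audio file on disk.

```python
import io

import numpy as np
from scipy.io import wavfile


def read_wav(source):
    """Read a .wav file (path or file-like object) as mono float samples."""
    sample_rate, data = wavfile.read(source)
    if data.dtype == np.uint8:
        # 8-bit PCM WAV data is unsigned, centered on 128.
        data = (data.astype(np.float64) - 128.0) / 128.0
    elif np.issubdtype(data.dtype, np.integer):
        data = data.astype(np.float64) / (np.iinfo(data.dtype).max + 1.0)
    else:
        data = data.astype(np.float64)
    if data.ndim == 2:
        data = data.mean(axis=1)  # average the channels into mono
    return sample_rate, data


def split_into_blocks(signal, block_size, hop_size):
    """Block processing: overlapping blocks of shape (n_blocks, block_size)."""
    if len(signal) < block_size:
        raise ValueError("signal is shorter than one block")
    n_blocks = 1 + (len(signal) - block_size) // hop_size
    return np.stack([signal[i * hop_size:i * hop_size + block_size] for i in range(n_blocks)])


def magnitude_spectrum(blocks):
    """FFT magnitude of each block (real-input FFT, non-negative frequencies only)."""
    return np.abs(np.fft.rfft(blocks, axis=1))


# Demo: a 1 kHz tone at 8 kHz sampling rate, written to and read from memory.
sr = 8000
tone = (0.5 * 32767 * np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)).astype(np.int16)
buffer = io.BytesIO()
wavfile.write(buffer, sr, tone)
buffer.seek(0)
rate, signal = read_wav(buffer)
blocks = split_into_blocks(signal, block_size=256, hop_size=128)
spectra = magnitude_spectrum(blocks)
```

In the demo the bin spacing is 8000 / 256 = 31.25 Hz, so the 1 kHz tone falls exactly on bin 32 of every block.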

Integration of all window functions from scipy
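One possible way to expose all SciPy windows at once is to pass the window specification straight through to scipy.signal.get_window, which accepts a name such as "hamming" or a tuple such as ("kaiser", 14.0) for windows that need a parameter. The wrapper name apply_window below is a hypothetical example, not existing plap code.

```python
import numpy as np
from scipy.signal import get_window


def apply_window(frames, window="hann"):
    """Multiply each frame by a SciPy window.

    `window` is any specification accepted by scipy.signal.get_window,
    e.g. "hamming", "blackman" or ("kaiser", 14.0).
    """
    frames = np.asarray(frames, dtype=float)
    return frames * get_window(window, frames.shape[-1])
```

Note that get_window returns a periodic window by default (fftbins=True), which is the usual choice for spectral analysis; a symmetric window would require fftbins=False.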

MFCC From Existing Libraries

Create a module that calculates Mel-Frequency Cepstral Coefficients using existing open-source libraries. The libraries to use, in order of importance: Librosa, Spafe, pyAudioAnalysis. The goal of this module is to allow the results of our own implementation to be compared with values obtained from other libraries. It should be written so that the values from the different calculations are easily accessible; for example, a function called mfcc_librosa returns a NumPy array containing MFCC values. The module will be expanded as we implement more features, so it should be built as a modular and extensible component.

Steps:

Simple version: audio file loaded using e.g. librosa, MFCC results saved to variables, different libraries used for the calculations.
Functional enhancement: a separate function for audio input, and separate functions for MFCC calculation for each library used.

Important!
Different libraries may perform the MFCC calculation differently. Some may take only a single frame (a part of the whole signal) as input, while others take the whole signal. These details should be checked and documented in some way (e.g. as comments in the code), so that it is clear how the results should be compared with our own implementation.
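A possible layout for the "functional enhancement" step is sketched below for Librosa only. The names load_audio, MFCC_BACKENDS and compute_mfcc are hypothetical; only mfcc_librosa comes from the issue text. Librosa is imported inside the functions so that the module can be imported even when the library is not installed. Spafe and pyAudioAnalysis wrappers are deliberately left out here, because their call signatures would need to be checked against their documentation first.

```python
import numpy as np


def load_audio(path, sample_rate=None):
    """Load an audio file as a mono float signal (sample_rate=None keeps the native rate)."""
    import librosa  # lazy import: optional dependency

    signal, sr = librosa.load(path, sr=sample_rate, mono=True)
    return signal, sr


def mfcc_librosa(signal, sample_rate, n_mfcc=13):
    """MFCCs computed by Librosa.

    Input: the WHOLE signal; Librosa does the framing and windowing itself.
    Output: NumPy array of shape (n_mfcc, n_frames).
    """
    import librosa  # lazy import: optional dependency

    return np.asarray(librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc))


# Registry of available backends; new libraries are added here as they are wrapped.
MFCC_BACKENDS = {"librosa": mfcc_librosa}


def compute_mfcc(backend, signal, sample_rate, **kwargs):
    """Dispatch to one of the registered library wrappers by name."""
    if backend not in MFCC_BACKENDS:
        raise ValueError(f"unknown MFCC backend: {backend!r}")
    return MFCC_BACKENDS[backend](signal, sample_rate, **kwargs)
```

The docstring of each wrapper is a natural place to record whether the library expects a single frame or the whole signal, as the note above asks.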

LPC From Existing Libraries

Create a module that calculates LPC coefficients using existing open-source libraries. The libraries to use, in order of importance: Librosa, Spafe. The goal of this module is to allow the results of our own implementation to be compared with values obtained from other libraries. It should be written so that the values from the different calculations are easily accessible; for example, a function called lpc_librosa returns a NumPy array containing LPC coefficients. The module will be expanded as we implement more features, so it should be built as a modular and extensible component.

Steps:

Simple version: audio file loaded using e.g. librosa, LPC results saved to variables, different libraries used for the calculations.
Functional enhancement: a separate function for audio input, and separate functions for LPC calculation for each library used.

Important!
Different libraries may perform the LPC calculation differently. Some may take only a single frame (a part of the whole signal) as input, while others take the whole signal or even something else. These details should be checked and documented in some way (e.g. as comments in the code), so that it is clear how the results should be compared with our own implementation.
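To make the point about differing methods concrete, the sketch below pairs a Librosa wrapper with a plain NumPy/SciPy reference. Both function names are hypothetical. The reference uses the autocorrelation method as a stand-in for "our own implementation", which is an assumption of this example; Librosa documents its lpc function as using Burg's method, so the two are expected to agree only approximately on real signals.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lpc_librosa(frame, order):
    """LPC via Librosa (Burg's method).

    Input: the samples to analyze (typically one frame); no framing is done here.
    Output: array of length order + 1 whose first element is 1.
    """
    import librosa  # lazy import: optional dependency

    return np.asarray(librosa.lpc(np.asarray(frame, dtype=float), order=order))


def lpc_autocorrelation(frame, order):
    """Reference LPC via the autocorrelation method (assumed stand-in, see lead-in).

    Solves the Toeplitz normal equations R a = -r and returns [1, a_1, ..., a_order],
    the same coefficient layout as lpc_librosa.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    predictor = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], predictor))
```

When comparing outputs, it helps to feed both functions exactly the same frame and order, and to compare with a tolerance rather than for exact equality.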

Parameterization

Code tasks - Eryk

FeatureVector class

Presentation tasks - Natalia

Changes relative to the first-semester presentation:

Assumptions and goals

replace the 'Collaboration' item with 'intuitive to use', but as a single word that sounds better XD

Distinguishing features

under diversity, keep only 'Many different types of audio signal parameterization'
in the one on the right, write that the library's organized structure allows functionality to be extended conveniently, or something like that, and change its title accordingly

Before Project Stages, insert a slide showing the plap block diagram
Split Project Stages into two slides, because there will probably be 7 stages (for now use the names that are on spg, but I may still change them a bit); it would actually be nice to call them sprints on the slides, so that during the presentation we can mention Scrum to score points with the committee
Keep the plan for next semester; hold on to the remaining slides for now

Kaiser window tasks (preprocessing extension) - Michał

to be discussed at a dedicated meeting
