emkor / audiopyle

Audio feature extraction engine based on VAMP plugins

License: GNU General Public License v3.0

Python 84.33% Shell 2.91% Makefile 1.05% HTML 9.17% JavaScript 2.53% CSS 0.02%

python python3 python-3 vamp-plugins mysql docker docker-compose celery rabbitmq audio-analysis

audiopyle's Issues

Coordinator - config storing in Redis

The Coordinator may store its config in Redis; an example of such config is the Backblaze bucket access parameters. It should also read any config stored in Redis at startup.

Implement redis config client

Implement a client for storing config in Redis.
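
A minimal sketch of what such a config client could look like. The class name, the key name and the JSON encoding are assumptions made for illustration, not part of the project; the wrapped client only needs get(name) and set(name, value), which a redis.Redis instance from redis-py provides.

```python
import json
from typing import Any, Dict, Optional


class RedisConfigClient:
    """Stores and reads a JSON-encoded config dict under a single Redis key."""

    def __init__(self, redis_client: Any, key: str = "coordinator:config") -> None:
        # redis_client: anything exposing get(name) and set(name, value),
        # e.g. redis.Redis from redis-py. The key name is hypothetical.
        self._redis = redis_client
        self._key = key

    def store(self, config: Dict[str, Any]) -> None:
        self._redis.set(self._key, json.dumps(config))

    def load(self) -> Optional[Dict[str, Any]]:
        raw = self._redis.get(self._key)
        if raw is None:
            return None  # nothing stored yet, e.g. on first startup
        if isinstance(raw, bytes):  # redis-py returns bytes by default
            raw = raw.decode("utf-8")
        return json.loads(raw)


class InMemoryRedis:
    """Tiny stand-in used only to try the client without a Redis server."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, name: str, value: str) -> None:
        self._data[name] = value.encode("utf-8")

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)
```

Injecting the Redis client keeps the class testable without a running server.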

Implement basic audio converter as service, phase 1

The next component is the converter. The service should convert a given mp3 file on the local filesystem to a mono wav file (16-bit, 44.1 kHz) without splitting the file. It should be configurable (directory with input files, directory for output files) and should take the filename as an argument. During conversion, it should also indicate that the output file is still being written, for example by appending _conversion to its filename and removing the suffix once the process ends.
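
One possible shape for this, sketched under the assumption that the ffmpeg command-line tool does the actual decoding (the issue does not name a tool). The function names are hypothetical. The output is written under a temporary name ending in _conversion and renamed only after ffmpeg finishes.

```python
import os
import subprocess
from typing import List, Tuple

IN_PROGRESS_SUFFIX = "_conversion"


def build_conversion(input_dir: str, output_dir: str,
                     filename: str) -> Tuple[List[str], str, str]:
    """Return (ffmpeg command, in-progress path, final path) for one file."""
    base, _ = os.path.splitext(filename)
    final_path = os.path.join(output_dir, base + ".wav")
    temp_path = final_path + IN_PROGRESS_SUFFIX
    command = [
        "ffmpeg", "-y",
        "-i", os.path.join(input_dir, filename),
        "-ac", "1",            # mono
        "-ar", "44100",        # 44.1 kHz sample rate
        "-sample_fmt", "s16",  # 16-bit samples
        "-f", "wav",           # explicit format: temp name has no .wav extension
        temp_path,
    ]
    return command, temp_path, final_path


def convert(input_dir: str, output_dir: str, filename: str) -> str:
    """Run the conversion and drop the in-progress suffix on success."""
    command, temp_path, final_path = build_conversion(input_dir, output_dir, filename)
    subprocess.run(command, check=True)  # raises if ffmpeg fails
    os.replace(temp_path, final_path)    # rename marks the file as complete
    return final_path
```

Because the rename happens only after a successful run, a reader of the output directory never sees a half-written file under its final name.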

Xtracter - write tests for AudioConverter

Add some tests to verify that the AudioConverter from #7 actually works.

Coordinator - passing config

Add an option to pass config (at container creation) to the Coordinator, with access parameters for multiple Backblaze buckets.

PostFeature: take a look at usability of each feature

Some of the features may not be as usable as others. Research which ones are time-consuming to analyze and may not be worth computing.

Find a way to serialize internal project models

Since we have some complex objects (containing other, nested objects) in our project and we need to serialize them in order to store object instances in Redis etc., we need to find a way to do it effectively. One way is to implement to_json() in each of the model classes; another is to create a service that does it in one place; a third is to create a method that maps objects to JSON recursively.
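
The third option (recursive mapping) could be sketched roughly as below. The model classes AudioMeta and AnalysisTask are hypothetical stand-ins, not the project's real models, and the function only handles plain attribute-based objects and built-in containers.

```python
import json
from typing import Any


def to_serializable(obj: Any) -> Any:
    """Recursively map an object graph to JSON-compatible values."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        return {str(key): to_serializable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_serializable(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Plain model object: serialize its public attributes.
        return {name: to_serializable(value)
                for name, value in vars(obj).items()
                if not name.startswith("_")}
    raise TypeError("Cannot serialize object of type " + type(obj).__name__)


class AudioMeta:  # hypothetical nested model
    def __init__(self, channels: int, sample_rate: int) -> None:
        self.channels = channels
        self.sample_rate = sample_rate


class AnalysisTask:  # hypothetical top-level model
    def __init__(self, filename: str, meta: AudioMeta) -> None:
        self.filename = filename
        self.meta = meta


task = AnalysisTask("song.mp3", AudioMeta(1, 44100))
task_json = json.dumps(to_serializable(task))
```

This keeps serialization in one place, at the cost of needing a separate step to rebuild typed objects when reading the JSON back.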

Pull constants out of classes

Most constants currently live inside classes (AudiopyleConst, for example). It is time to get rid of those classes, define the constants at the top level of a file, and fix the references to them.

Research which monitoring metrics system we could use

Do some googling and find out which system we could use for monitoring the performance and metrics of our system. We are aiming for something free that supports Docker.

Replace Docker ADD with COPY

Check the Docker docs to confirm my assumption that COPY is more appropriate than ADD, make the changes accordingly, and verify the result by building the docker images.

Xtracter - config passing

Xtracter should read its config on container startup. The config may contain blacklisted VAMP plugins, available Redis binds, etc.

Xtracter: expand converter class to handle more codecs

Add some methods for handling files other than just mp3, just as in this comment:
#60 (comment)

Xtracter #2 - connect with task queue

Xtracter containers should watch the given Redis task queue, pop tasks from it, and run the actual analysis.

Persister #2: create basic persister docker image

Create basic persister docker image

Coordinator - state storing in Redis

The Coordinator may store its state in Redis as a timestamp indicating which files have already been added to the task queue.

Implement Redis client

Implement a client for Redis queues.
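
A rough sketch of a FIFO queue client on top of a Redis list. The class and method names are hypothetical; the wrapped client is assumed to expose rpush, lpop and llen, as redis.Redis from redis-py does.

```python
from typing import Any, Dict, List, Optional


class RedisQueueClient:
    """FIFO queue backed by a Redis list: push on the right, pop on the left."""

    def __init__(self, redis_client: Any, queue_name: str) -> None:
        self._redis = redis_client
        self._queue_name = queue_name

    def add(self, item: str) -> None:
        self._redis.rpush(self._queue_name, item)

    def take(self) -> Optional[str]:
        raw = self._redis.lpop(self._queue_name)
        if raw is None:
            return None  # queue is empty
        return raw.decode("utf-8") if isinstance(raw, bytes) else raw

    def length(self) -> int:
        return self._redis.llen(self._queue_name)


class InMemoryLists:
    """Stand-in for a Redis server, used only to exercise the client."""

    def __init__(self) -> None:
        self._lists: Dict[str, List[bytes]] = {}

    def rpush(self, name: str, *values: str) -> int:
        items = self._lists.setdefault(name, [])
        items.extend(value.encode("utf-8") for value in values)
        return len(items)

    def lpop(self, name: str) -> Optional[bytes]:
        items = self._lists.get(name, [])
        return items.pop(0) if items else None

    def llen(self, name: str) -> int:
        return len(self._lists.get(name, []))
```

A worker that should wait for new tasks instead of polling could use the blocking blpop call in place of lpop.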

Lack of GLIBC_2.14 required by some VAMP Plugins

While building on Travis, a few warnings come up about missing GLIBC_2.14 and GLIBC_2.15. Although the tests pass, some of the plugins may not work properly. Try to find out how to update GLIBC or find a more suitable docker image.

Introduce proper logging

For now we are using only simple print() calls to log messages and events, but this is highly unprofessional and we should introduce proper logging (Python has this built in). So the task is to replace print with proper logging and add more logs where necessary.
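
As a starting point, a small helper built on the standard logging module might look like this. The helper name, the logger name and the message format are assumptions for illustration.

```python
import logging
import sys


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped messages to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# Instead of: print("Starting analysis of example.wav")
logger = get_logger("audiopyle.xtracter")  # hypothetical logger name
logger.info("Starting analysis of %s", "example.wav")
```

Writing to stdout keeps the messages visible through docker logs without extra configuration.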

Check if commons-docker is actually needed

Our docker build looks like this:
-- docker-base-image
-----commons
-----------coordinator
-----------xtracter
-----------persister

So, the coordinator, xtracter and persister are all based on commons image, which is based on docker-base-image. Building commons image is time-consuming, so maybe it would be better to actually just install commons python package while building coordinator, xtracter and persister. It would look like this:
-- docker-base-image
-----commons+coordinator
-----commons+xtracter
-----commons+persister

Scraper: create specification, research potential sources

Scraper is the next app for the audiopyle system.
The basic idea is to download song information (album, artist etc., maybe tags?) from internet sources such as MusicBrainz, Last.fm or Spotify and store it in a specified format. It would be useful for ML and for feature analysis.

Persister #1: research python frameworks to access SQL

Find out which Python framework would be suitable for implementing access to the DB (MySQL).

Visualizer: select track, draw waveform

Add ability to select track from DB and draw waveform for it

Add type annotations

Since our project is growing quite rapidly, soon we will feel the pain of Python's dynamic types. The fix would be to add type annotations in each public method and function - this way, IntelliJ can infer types of arguments and outputs.
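
For illustration, an annotated public function could look like the following. The AudioFeature class and the function are hypothetical examples, not taken from the codebase.

```python
from typing import List, Optional


class AudioFeature:  # hypothetical model, for illustration only
    def __init__(self, name: str, values: List[float]) -> None:
        self.name = name
        self.values = values


def find_feature(features: List[AudioFeature], name: str) -> Optional[AudioFeature]:
    """Return the first feature with the given name, or None if absent."""
    for feature in features:
        if feature.name == name:
            return feature
    return None
```

With the Optional return type spelled out, an IDE or a checker such as mypy can flag callers that forget to handle the None case.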

PostFeature: select granularity for each audio feature

With the Visualizer it should be easy to determine how precise feature extraction needs to be. By default, most features are extracted from blocks of 1024 frames (about 1/40 of a second of a track), which gives us ~12000 feature values per 5-minute track, and it's obvious we do not need that many for most of the features.
Select the right precision for each of the features.
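
The arithmetic behind those figures can be reproduced with a few lines. This sketch assumes a 44.1 kHz sample rate (the converter's target format) and non-overlapping blocks, neither of which is stated in this issue; the function names are hypothetical. Under those assumptions the exact results land close to the rounded numbers quoted above.

```python
SAMPLE_RATE_HZ = 44100    # assumption: the converter's 44.1 kHz output
BLOCK_SIZE_FRAMES = 1024  # default block size mentioned above


def block_duration_seconds(block_size: int = BLOCK_SIZE_FRAMES,
                           sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of one analysis block in seconds."""
    return block_size / sample_rate


def values_per_track(track_seconds: float,
                     block_size: int = BLOCK_SIZE_FRAMES,
                     sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of full, non-overlapping blocks (one feature value each)."""
    return int(track_seconds * sample_rate // block_size)


five_minutes = 5 * 60
default_count = values_per_track(five_minutes)
coarser_count = values_per_track(five_minutes, block_size=8192)
```

Changing block_size shows directly how much a coarser granularity would shrink the stored data for a given feature.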

Finish implementing xtracter

Prepare xtracter: a complete service which reads a local .wav file, extracts features using VAMP, and stores them in a JSON file in a preconfigured directory.

Persistence #3: Make use of liquibase to track DB schema changes

Liquibase is a system for managing database schemas. It would be nice to integrate it into the project, since schemas may evolve quickly in the early development phases.

Xtracter #1: change models

Change models in Xtracter module so they better reflect actual results from VAMP plugins extraction

Optimize build & test process on Travis

Current build/test process is highly inefficient. It goes like this:

build commons docker image locally, based on the pulled base debian 8 image
build xtracter docker image from DOWNLOADED commons docker image

We could make it more efficient by reusing locally built commons image.

The second thing is: let's assume that in one push someone made changes to commons AND xtracter. During the build process, the commons docker image would include the just-pushed changes, but the xtracter image would be based on the commons image downloaded from Docker Hub, so it would be outdated.

Xtracter: make use of converter

For now, xtracter can analyze only wav files, even though the converter is ready. Time to make use of it.

Persistence #1: MySQL DB

Prepare a script that runs a MySQL DB instance in a local docker container, accessible from outside the container.

Protip:
docker run -p 3306:3306 --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:latest

Xtracter #4 - connect with results queue

Xtracter containers should send analysis results to results queue (Redis) immediately after results are available

Visualizer #1: create specification

We should have a GUI tool for visualizing extracted features, since there will be a lot of work with them. The tool does not have to be perfect, since it is for internal use only, but it should have some basic features:

choosing track from db
show its waveform on timeline (load this as feature from DB and draw with html5)
allow user to select which feature to show for track and draw them accordingly on top of the waveform
maybe allow to play the track somehow(?)

Change B2 resources bucket structure

Create more logical resources bucket structure (having test resources and vamp plugins)

Implement audio file selector

This is a temporary service (it will be used only on a non-scalable installation) which should select the next audio file to analyze, based on the list of bucket files and the list of already analyzed files in MySQL.
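
The selection rule itself could be as simple as the sketch below. It assumes both lists are plain filename strings and picks files in alphabetical order; the function name and the ordering are illustrative choices, not requirements from the issue.

```python
from typing import Iterable, Optional


def select_next_file(bucket_files: Iterable[str],
                     analyzed_files: Iterable[str]) -> Optional[str]:
    """Return the first bucket file (alphabetically) not yet analyzed."""
    analyzed = set(analyzed_files)  # set gives constant-time membership checks
    for name in sorted(bucket_files):
        if name not in analyzed:
            return name
    return None  # everything in the bucket has been analyzed
```

Sorting makes the choice deterministic, which helps when debugging a single-node installation.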

Add tests to Redis client

Write some integration-level tests for Redis client #15

Research docker linking mechanism

The first concept assumes that whenever we run a docker container, we map its port to a host port and then pass that port to the next docker container to make communication possible. For example, we map the Redis Task Queue port to XXXX, the Redis Results Queue port to YYYY and the MySQL port to ZZZZ, and pass those ports to the coordinator (XXXX), the xtracters (XXXX to get tasks, YYYY to submit results) and the persister (YYYY to read results from the cache, and ZZZZ to store results in MySQL).

But maybe there's a simpler way using docker linking mechanisms?
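
Whichever mechanism is chosen, each service needs some way to learn the addresses of the others. One option, sketched here under the assumption that containers can reach each other by service name (for example on a shared Docker network), is to read host and port from environment variables with sensible defaults. The variable names and service names below are hypothetical.

```python
import os
from typing import Mapping, NamedTuple


class ServiceAddress(NamedTuple):
    host: str
    port: int


def read_address(prefix: str, default_host: str, default_port: int,
                 env: Mapping[str, str] = os.environ) -> ServiceAddress:
    """Read <PREFIX>_HOST and <PREFIX>_PORT, falling back to the defaults."""
    host = env.get(prefix + "_HOST", default_host)
    port = int(env.get(prefix + "_PORT", default_port))
    return ServiceAddress(host, port)


# Hypothetical variable and service names; 6379 and 3306 are the standard
# default ports of Redis and MySQL.
task_queue = read_address("TASK_QUEUE", "task-queue", 6379)
results_queue = read_address("RESULTS_QUEUE", "results-queue", 6379)
database = read_address("MYSQL", "mysql", 3306)
```

With this approach, no host port mapping is needed for traffic between containers; mapping would only be required for access from outside.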