
medleydb's People

Contributors

bmcfee, ejhumphrey, elanatee, faroit, nils-werner, rabitt

medleydb's Issues

Issue executing on python3.5.2

Installs fine but when I run import medleydb....

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nick/Downloads/medleydb-1.1/medleydb/__init__.py", line 20
    the top level Audio folder for MedeleyDB."""
                                               ^
SyntaxError: Missing parentheses in call to 'print'

Docstring completeness

In trying to parse the data in the pitch annotations, I found read_csv_file... it seems like what I want, but the docstring doesn't really give me an idea of what I'm going to get back.

In [18]: M.multitrack.read_csv_file?
Type:       function
String Form:<function read_csv_file at 0x108e8c1b8>
File:       /Library/Python/2.7/site-packages/medleydb/medleydb/multitrack.py
Definition: M.multitrack.read_csv_file(fpath, maxcols=None)
Docstring:
Read a csv file.
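For illustration, a more complete docstring might look like the sketch below. This is a hypothetical reimplementation, not the package's code: the signature is copied from the output above, but the return type and the meaning of maxcols (truncating each row) are assumptions.

```python
import csv


def read_csv_file(fpath, maxcols=None):
    """Read a CSV file into a list of rows.

    Parameters
    ----------
    fpath : str
        Path to the CSV file.
    maxcols : int or None
        ASSUMPTION: if given, keep only the first `maxcols` columns of each row.

    Returns
    -------
    rows : list of list of str
        One inner list per line of the file; values are left as strings.
    """
    with open(fpath, "r", newline="") as handle:
        rows = [row for row in csv.reader(handle)]
    if maxcols is not None:
        rows = [row[:maxcols] for row in rows]
    return rows
```

With a docstring of this shape, the caller knows up front that pitch values come back as strings and need converting.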

can not download the dataset

I want to download this dataset in order to run some code based on it, but the website cannot be reached.
Has the website moved?

Code has incorrect taxonomy.yaml path

Acknowledging that this is probably a result of how I've "installed" the code, I'm having an issue with the auto-derived taxonomy path.

To install, I cloned the repository locally and moved the whole dealy into my site-packages folder so I could access it system-wide. After configuring where the dataset lives, I tried to get files for an instrument voice. This results in an error, reporting that it's looking for the taxonomy here:

/blah/blah/site-packages/medleydb/medleydb/taxonomy.yaml

while the file is actually in ...

/blah/blah/site-packages/medleydb/taxonomy.yaml

I made a symlink into the subdirectory where it's looking because I'm a hack (and proud of it), but I'd up-vote either more specific install instructions so that this is avoided, or a setup installer that makes this moot.

Duplicate raw track

MusicDelta_Country2_RAW_04_01 seems to be a (not exact) duplicate of
MusicDelta_Country2_RAW_03_01

Add duration to metadata

Add the mix duration to the top level of the metadata files to make it possible to work with the annotations without the audio.
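As a minimal sketch of how such a duration value could be computed, assuming the mix is an uncompressed PCM WAV file readable by Python's standard wave module. The function name is illustrative and not part of the package.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as handle:
        # Duration is the number of sample frames divided by the sample rate.
        return handle.getnframes() / float(handle.getframerate())
```

The result could then be written once into each metadata file, so that later annotation work never needs to open the audio.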

Expose/document interface to track list

There doesn't seem to be an easy (documented) way to get a list of the tracks without calling load_all_multitracks, which is expensive.

There is a TRACK_LIST array in the package, but it's not documented.

It would be useful to have access to this if I want to load an arbitrary track, but don't have the names/indices pre-computed somewhere.

Converting instrument activations to segments?

I re-read the medleydb paper, and it doesn't actually say how the instrument segment annotations were computed from the activation functions. Is it just threshold at 0.5 and then run-length encode samples to intervals? Or is there some smoothing involved?
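To make the question concrete, here is a sketch of the simplest procedure described above: threshold at 0.5, then run-length encode the active frames into intervals, with no smoothing. This is only the hypothesis from the question, not a confirmed description of how the published annotations were made.

```python
def activations_to_segments(times, confidences, threshold=0.5):
    """Convert frame-wise confidences into (start, end) intervals.

    A frame counts as active when its confidence is >= threshold.
    Consecutive active frames are merged into a single interval.
    """
    segments = []
    start = None
    for t, c in zip(times, confidences):
        active = c >= threshold
        if active and start is None:
            start = t  # a new active run begins here
        elif not active and start is not None:
            segments.append((start, t))  # the run ends at the first inactive frame
            start = None
    if start is not None:
        # The signal was still active at the last frame.
        segments.append((start, times[-1]))
    return segments
```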

Update ShuffleLabelsOut to use sklearn.model_selection

Currently, medleydb.utils.artist_conditional_split relies on the ShuffleLabelsOut class, which is using the soon to be deprecated sklearn.cross_validation. It needs to be updated to use sklearn.model_selection but things have changed enough that a simple switch breaks the code. @bmcfee I took a stab at this myself but don't understand what you did in the original version well enough to troubleshoot why my changes weren't working. Can you give it a go?
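For reference, the constraint an artist-conditional split has to satisfy can be sketched without any scikit-learn dependency. This is not the ShuffleLabelsOut logic; it assumes the artist is the part of the track ID before the first underscore, and the function body is a simplified stand-in. scikit-learn's model_selection module also offers group-aware splitters such as GroupShuffleSplit that enforce the same constraint.

```python
import random


def artist_conditional_split(track_ids, test_size=0.2, seed=None):
    """Split track IDs so that no artist appears in both train and test.

    ASSUMPTION: the artist name is everything before the first underscore.
    """
    artists = sorted({t.split("_")[0] for t in track_ids})
    rng = random.Random(seed)
    rng.shuffle(artists)
    # Hold out a fraction of the artists (at least one), not of the tracks.
    n_test = max(1, int(round(test_size * len(artists))))
    test_artists = set(artists[:n_test])
    train = [t for t in track_ids if t.split("_")[0] not in test_artists]
    test = [t for t in track_ids if t.split("_")[0] in test_artists]
    return {"train": train, "test": test}
```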

Missing Annotations for Allegria_MendelssohnMovement1

I am iterating over all of the tracks to get their annotations and I get a FileNotFoundError. I double- and triple-checked, and there are no annotation files for this particular track. Do the files exist somewhere?

File loading error

#14 seems to break the track loading

a simple

mtrack_list = mdb.load_all_multitracks()

for track in mtrack_list:
    print track

now results in:

(.env)$ py test.py
Traceback (most recent call last):
  File "medleydb/utils.py", line 62, in load_multitracks
    yield M.MultiTrack(multitrack)
  File "medleydb/multitrack.py", line 75, in __init__
    self.title = _path_basedir(mtrack_path).split('_')[1]
IndexError: list index out of range
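The traceback suggests the path's base name contained no underscore, for example because of a trailing slash or an unrelated folder. A defensive version of the parsing step might look like the sketch below; parse_track_id is a hypothetical helper, not the package's _path_basedir.

```python
import os


def parse_track_id(mtrack_path):
    """Return (artist, title) from a path ending in 'Artist_Title'."""
    # normpath drops any trailing separator so basename is never empty.
    track_id = os.path.basename(os.path.normpath(mtrack_path))
    parts = track_id.split("_")
    if len(parts) < 2:
        raise ValueError(
            "Expected a folder named 'Artist_Title', got %r" % track_id
        )
    return parts[0], parts[1]
```

Raising a ValueError that names the offending folder makes the failure much easier to diagnose than a bare IndexError.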

Add has_bleed annotations to stems

After discussing with @lostanlen, it makes sense to have has_bleed annotations at the stem level.

Open questions

  • Do we keep the multitrack-level has_bleed annotations, and if so, what is the criterion (at least one stem with bleed, a majority of stems with bleed, etc.)?
  • Can we make a reliable semi-automatic method to estimate for the database which stems have bleed? (@lostanlen, any ideas here?)

Environment variable name inconsistency

In medleydb/medleydb/__init__.py, there's a tiny typo in the assertion, and it's not clear which it should be (L11):

AssertionError: The environment variable MEDLEYDB_PATH
is not set. Set the value of MEDLEYDB_DIR to your local path to MedeleyDB.
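One way to rule out this kind of mismatch is to build the message from the same constant that is used for the lookup. The sketch below assumes MEDLEYDB_PATH is the intended name; the helper function is illustrative only.

```python
import os

# ASSUMPTION: MEDLEYDB_PATH is the name the package is meant to read.
ENV_VAR = "MEDLEYDB_PATH"


def get_medleydb_path(environ=None):
    """Return the dataset path, or raise with a message naming the variable."""
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        # The variable name is interpolated, so lookup and message cannot drift apart.
        raise EnvironmentError(
            "The environment variable {0} is not set. "
            "Set the value of {0} to your local path to MedleyDB.".format(ENV_VAR)
        )
    return path
```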

Add TablaBreakbeatScience_WhoIsIt RAW files to Errata

TablaBreakbeatScience_WhoIsIt_RAW_03_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_01.wav
TablaBreakbeatScience_WhoIsIt_RAW_04_02.wav
all have a lot of bleed.

They should probably be added to ERRATA.md

Not sure if the correct error code would be

1 Stem/Raw contains bleed, track not tagged as has_bleed
or
4 Raw does not match Stem

both seem to apply.

Non-alphanumerics in ACTIVATION_CONF file names

Hi,
Some instrument activation confidence file names (in Annotations/Instrument Activations/ACTIVATION_CONF) have discrepancies with the audio files they refer to. I think this is due to the presence of non-alphanumeric characters: parentheses, hyphens, and apostrophes.
Here is the list of before / after names.

CroqueMadame_Pilot(Lakelot)
CroqueMadame_Pilot

JoelHelander_IntheAtticBedroom(SuitePartThree)
JoelHelander_IntheAtticBedroom

Phoenix_BrokenPledge-ChicagoReel
Phoenix_BrokendPledge

Phoenix_Elzic'sFarewell
Phoenix_ElzicsFarewell

Phoenix_LarkOnTheStrand-DrummondCastle
Phoenix_LarkOnTheStrandDrummodCastle

Phoenix_SeanCaughlin's-TheScartaglan
Phoenix_SeanCaughlinsTheScartaglen

Stem Activations

Currently this package does not support returning the track activations. Is this of interest to you? I can implement this and send a PR.

Annotations in data dir

Recently the annotations have been added to the data dir of this repo

  • Could you elaborate (in the readme) why this has been done?
  • What about the annotations in the original medleydb folder? I guess the ones within the data folder are more up to date. So they will be discarded, correct?
  • If there are different versions of the dataset around, it would make sense to version the tar.gz files (and add MD5 hashes) as well, and to provide more information on the medleydb website.
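On the MD5 point, computing a checksum for a downloaded archive only needs the standard library. A minimal sketch, with a hypothetical helper name, that reads the file in chunks so large tar.gz files do not have to fit in memory:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Publishing these digests next to each versioned archive would let users confirm which release they have.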

activation_conf_from_stem returns time twice, no confidence

... or maybe the reverse. I'm not sure.

Quick example:

>>> t = next(medleydb.load_all_multitracks())
>>> t.activation_conf_from_stem(0)[:10]
[[0.0, 0.0],
 [0.0464, 0.0464],
 [0.0929, 0.0929],
 [0.1393, 0.1393],
 [0.1858, 0.1858],
 [0.2322, 0.2322],
 [0.2786, 0.2786],
 [0.3251, 0.3251],
 [0.3715, 0.3715],
 [0.418, 0.418]]

This seems to come from an error at this line, or possibly in the parsing of the confidence file.

Support python 3

In [1]: import medleydb
  File "/home/bmcfee/data/medleydb/medleydb/__init__.py", line 20
    the top level Audio folder for MedeleyDB."""
                                               ^
SyntaxError: Missing parentheses in call to 'print'

... or at least do a from __future__ import print_function. But I suspect there are other, less obvious gotchas kicking around.

(also Medley is misspelled :))

Download links to MedleyDB 2.0, where?

The MedleyDB website makes no mention of the ISMIR 2016 paper yet, and the link in the paper isn't working. I'm wondering where I can find the new MedleyDB 2.0 dataset?

I get that everyone has a lot on their plate, which is great for MIR, but I'd really like to hear the new tracks. 😄

Access to the EXTRA dataset audio

I noticed that there are plenty more songs in the EXTRA dataset. I also noticed that there is a download script pointing to a private Google Drive. Is it possible to gain access to this extra data?

Publish Python tools to PyPI?

Even though pip can be used with GitHub it would be neat to have the Python tools available on PyPI for easier installation with conda for example.

Partial loading of multitracks for speed

Loading multitracks (e.g. via the load_multitracks generator) is quite slow: in a random timing of 20 tracks, loading took 0.7 s per track on average, with one track taking 1.6 s. When you only need a single bit of information about each track, this becomes quite penalizing (especially during a dev phase where you iterate over your code).

It would be helpful if either the loading time was improved across the board somehow (if possible?) or, alternatively, there was the option to only load partial information about each multitrack (e.g. via an optional parameter that takes a list of the things you want to load) so that loading can be made more agile when not all the multitrack info is needed.
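One possible shape for the partial-loading option is lazy loading: each piece of information is read only on first access and cached afterwards. The class below is a generic sketch with hypothetical names and says nothing about how MultiTrack is actually structured.

```python
class LazyMultitrack(object):
    """Hold named loader callables and run each one only when first requested."""

    def __init__(self, track_id, loaders):
        self.track_id = track_id
        self._loaders = loaders  # maps a field name to a zero-argument callable
        self._cache = {}

    def get(self, name):
        """Return the named field, loading it on first access."""
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]
```

With this pattern, a script that only needs one metadata field never pays for parsing annotations or probing audio files.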

How much do the new automatic annotations differ from what a human listening test would conclude?

The initial release of MedleyDB contained human-generated melody annotations using the Tony tool [4]. However, the process was difficult to sustain in the long term, thus for this iteration of the dataset we rely primarily on automatic annotations. The automatic annotations include instrument activations and synthetic melody, multi-f0 and bass annotations.

(source)

I'm looking forward to MedleyDB 2.0 (when will it be available?) and intend to use it for various MIR tasks. However, I read that the new annotations are automatically generated and worry that this causes a chicken and egg problem. My hope was to use the annotations for training multi-f0 estimation models, but surely an upper bound on f-measure will be introduced by the fact that the annotations have been automatically generated themselves.

Could you expand a little on how the new annotations have been developed? How much do they differ compared to what human listeners would annotate? Particularly multi-f0 annotations are difficult to get right but I'm also concerned about onset annotations (and even melody annotations to some extent).

stem_activations needs documentation

Two points:

  • mtrack.stem_activations is a list of lists, but the docs say it's an ndarray
  • The shape does not line up to the number of stems; it's off by one. It looks like the first column is reserved for a time index, not activations.

The first point is an easy fix.

The second point is confusing, and it's generally not good style to mix indexing/addressing (ie timestamps) with observation data. (If you do so, it should definitely be documented.) I recommend refactoring this so that the time index is stored separately.
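The suggested refactoring amounts to something like the sketch below, which assumes each row has the form [time, activation for stem 1, activation for stem 2, ...]. The helper name is hypothetical.

```python
def split_time_and_activations(rows):
    """Separate the leading time column from the per-stem activation columns.

    ASSUMPTION: each row is [time, act_1, act_2, ...].
    """
    times = [row[0] for row in rows]
    activations = [row[1:] for row in rows]
    return times, activations
```

After the split, the width of each activation row matches the number of annotated stems, which removes the off-by-one.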

incorrect RANKINGS file - TheDistricts_Vermont

(Thank you to Yukara Ikemiya for reporting this mistake.)

The rankings for the two melody stems in TheDistricts_Vermont are both 1, and the resulting melody annotations are incorrect as a result.

Fix:
stem 05 should have rank 1, stem 07 should have rank 2
rerun generate melody annotations script

stem track list might better be a track dict

With a dict, one could easily sort the stems or load specific stem tracks. Special stems like the predominant one could also just be pointers into the stems dict, with no need to create a new object.

Proposal

mytrack.stems['1'] should output stem_idx 1 as a track object.

using a list is prone to errors since mytrack.stems[1] does not necessarily exist.
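A small sketch of the idea, keyed by the integer parsed from the file name rather than by list position. The _STEM_NN.wav naming follows the file names quoted elsewhere in these issues; the helper itself is hypothetical, and it uses integer keys where the proposal above uses strings.

```python
import re

# Matches names such as 'LizNelson_ImComingHome_STEM_02.wav'.
_STEM_PATTERN = re.compile(r"_STEM_(\d+)\.wav$")


def stems_by_index(stem_files):
    """Map stem index (int) to file name, skipping files that do not match."""
    stems = {}
    for fname in stem_files:
        match = _STEM_PATTERN.search(fname)
        if match:
            stems[int(match.group(1))] = fname
    return stems
```

A missing stem then shows up as an absent key that can be tested with `in`, instead of silently shifting every later position in a list.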

Some raw tracks still have clearly audible effects processing applied

AClassicEducation_NightOwl_RAW_13_01.wav and AClassicEducation_NightOwl_RAW_13_02.wav have some kind of modulation univibe-like effect applied to the vocals, and should not be considered raw.

I'm also guessing many raw tracks are dynamic range compressed and frequency equalized etc. (guessing artists tend to use channel-strips in their recording chain maybe) but that might be less of an issue in most applications. Are there any clear definitions on what constitutes a raw track in MedleyDB?

For reference, compare with AClassicEducation_NightOwl_RAW_13_04.wav which doesn't have the modulation effect.

Missing activation for Wolf_DieBekherte

There's no activation lab file for Wolf_DieBekherte here:
https://github.com/marl/medleydb/tree/master/medleydb/data/Annotations/Activation_Confidence

There's a file here though:
https://github.com/marl/medleydb/blob/master/medleydb/data/Annotations/Activation_Confidence/original_annotations/Wolf_DieBekherte_ACTIVATION_CONF.lab

This is breaking some code of mine that iterates over every mdb mix and does stuff with the activations, since track.stem_activations for this track is empty.

Why is the file missing?

pip failed to build medleydb

When running pip install . inside the root medleydb directory I get this error

Installing collected packages: medleydb
  Running setup.py install for medleydb ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/mix.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/version.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/multitrack.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/__init__.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/utils.py -> build/lib.linux-x86_64-2.7/medleydb
    copying medleydb/download.py -> build/lib.linux-x86_64-2.7/medleydb
    creating build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_bach10.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/taxonomy.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/instrument_f0_type.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/mixing_coefficients.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_v1.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/client_secrets.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_extra.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/artist_index.json -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/tracklist_v2.txt -> build/lib.linux-x86_64-2.7/medleydb/resources
    copying medleydb/resources/pyin.n3 -> build/lib.linux-x86_64-2.7/medleydb/resources
    creating build/lib.linux-x86_64-2.7/medleydb/data
    error: can't copy 'medleydb/data/Metadata': doesn't exist or not a regular file
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-3m5VSF/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-jCG1KO/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-3m5VSF/

This looks similar to the error you get when you attempt to cp a directory without specifying --recursive. I recommend a fix according to this Stack Overflow thread

--- Relevant Specifications ---
Ubuntu 16.04
Python 2.7.12
setuptools 20.7.0
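One common way to avoid handing a directory to the installer is to expand it into an explicit list of files first. The following is a sketch under that assumption; list_package_files is a hypothetical helper, and the actual setup.py may need a different fix.

```python
import os


def list_package_files(package_dir, data_subdir):
    """Return every file under package_dir/data_subdir, relative to package_dir."""
    paths = []
    root = os.path.join(package_dir, data_subdir)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            # package_data entries are interpreted relative to the package folder.
            paths.append(os.path.relpath(full, package_dir))
    return sorted(paths)
```

In a setup.py this could be used as `package_data={"medleydb": list_package_files("medleydb", "data")}`, so only regular files are ever copied.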

ImportError: No module named 'medleydb.sql'

Also, the command

pip install -e .[sql]

gives the following:

medleydb 1.2.9 does not provide the extra 'sql'

although medleydb itself gets installed fine by this command and can be imported without any error. Furthermore, the command

medleydb-export

gives the following error:

bash: medleydb-export: command not found

My OS is Centos 6.4 and Python version is 3.5.2.

warning: no files found matching 'medleydb/taxonomy.yaml'

Installing a Python package with setuptools, that has medleydb as a dependency, results in the following issue:

warning: no files found matching 'medleydb/taxonomy.yaml'

Here's the setup.py for reference:

setup(
    ...
    install_requires=["medleydb==1.2"],
    dependency_links=["git+git://github.com/marl/medleydb.git@medleydb_v1.2#egg=medleydb-1.2"]
)

Installing directly from git with pip places metadata and annotations directories incorrectly

The init script expects metadata to be placed in a repository root, but if building with pip directly from git such as

pip install git+git://github.com/marl/medleydb.git

the directory structure becomes different and the MedleyDB Python tools won't work.

Since using pip with git is a fairly common way of installing from source it would be nice if the directory structure wasn't assumed. A neat and transparent fix would be to look for all resources (taxonomy.yaml, tracklist_v1.txt, Annotations/, Metadata/, etc.) in the MEDLEYDB_PATH as well, in the init script.
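The proposed lookup could be as simple as trying a list of candidate directories in order. A minimal sketch, with find_resource as a hypothetical helper; which directories to search, and in what order, is left to the caller.

```python
import os


def find_resource(name, search_dirs):
    """Return the first existing path for `name` among the given directories."""
    for directory in search_dirs:
        if not directory:
            continue  # skip unset entries, e.g. a missing environment variable
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    raise IOError("Could not find %r in any of: %r" % (name, list(search_dirs)))
```

For example, the init script could call `find_resource("taxonomy.yaml", [package_dir, os.environ.get("MEDLEYDB_PATH")])`, where package_dir stands for wherever the package was installed.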

Raw track has bleed

The metadata entry for Creepoid_OldTree_RAW says it has no bleed, but
Creepoid_OldTree_RAW_02_01.wav
contains bass and drums instead of bass only.

Bleed label issue

(Thanks @lostanlen for noting this)

The following stems have bleed, but are not labeled as having bleed:

BrandonWebster_DontHearAThing_STEM_02.wav
ClaraBerryAndWooldog_Boys_STEM_05.wav
LizNelson_ImComingHome_STEM_02.wav

activation_conf_from_stem errors if not all stems have annotation

There are (rare) cases where the activation confidence annotations have only a subset of the stems annotated. Specifically, any stem labeled as instrument='Main System' is not annotated with stem activations. Moreover, the subset might not be ordered numerically, breaking the assumption in activation_conf_from_stem that all stems are listed and in order.

An example where this occurs:

>> import medleydb as mdb
>> mtrack = mdb.MultiTrack("Phoenix_ScotchMorris")
>> mtrack.stems.keys()
    [1, 2, 3, 4]
>> mtrack.activation_conf_from_stem(4)
    IndexError: list index out of range
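A lookup by column label rather than by position would sidestep both problems. The sketch below assumes the annotation file has a header row of the form time, S01, S02, ... naming the annotated stems; that header format and the function name are assumptions, not a description of the current implementation.

```python
def activation_conf_for_stem(header, rows, stem_idx):
    """Return [[time, confidence], ...] for one stem, or None if not annotated.

    ASSUMPTION: header looks like ['time', 'S01', 'S03', ...] and each row
    has one value per header entry.
    """
    label = "S{:02d}".format(stem_idx)
    if label not in header:
        return None  # this stem has no activation annotation
    col = header.index(label)
    return [[row[0], row[col]] for row in rows]
```

Returning None for an unannotated stem lets callers decide how to handle it instead of hitting an IndexError.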

Problem installing latest commit

Setting up a new machine with this and it seems there are some issues with the latest installer. It crashes on copying over the information:

Installing collected packages: medleydb
  Running setup.py install for medleydb: started
    Running setup.py install for medleydb: finished with status 'error'
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-1gmlwvfa-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0tsnxo8i-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib
    creating build/lib/medleydb
    copying medleydb/version.py -> build/lib/medleydb
    copying medleydb/mix.py -> build/lib/medleydb
    copying medleydb/download.py -> build/lib/medleydb
    copying medleydb/multitrack.py -> build/lib/medleydb
    copying medleydb/__init__.py -> build/lib/medleydb
    copying medleydb/utils.py -> build/lib/medleydb
    creating build/lib/medleydb/resources
    copying medleydb/resources/taxonomy.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_bach10.txt -> build/lib/medleydb/resources
    copying medleydb/resources/mixing_coefficients_version2.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_v1.txt -> build/lib/medleydb/resources
    copying medleydb/resources/instrument_f0_type.json -> build/lib/medleydb/resources
    copying medleydb/resources/pyin.n3 -> build/lib/medleydb/resources
    copying medleydb/resources/artist_index.json -> build/lib/medleydb/resources
    copying medleydb/resources/client_secrets.json -> build/lib/medleydb/resources
    copying medleydb/resources/mixing_coefficients.yaml -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_extra.txt -> build/lib/medleydb/resources
    copying medleydb/resources/tracklist_v2.txt -> build/lib/medleydb/resources
    creating build/lib/medleydb/data
    error: can't copy 'medleydb/data/Annotations': doesn't exist or not a regular file

My guess is it is trying to copy the Annotations directory as a file, rather than the contents.

The latest release version installed fine though.
