
xeno-canto-py's Introduction

xeno-canto API Wrapper

xeno-canto-py is an API wrapper designed to help users download xeno-canto.org recordings and associated information in an efficient manner. Download requests are processed concurrently using the asyncio, aiohttp and aiofiles libraries to optimize retrieval time. The wrapper also offers delete and metadata generation functions for recording library management.

Created to aid in data collection and filtering for the training of machine learning models.

Installation

xeno-canto-py is available on PyPI and can be installed with the package manager pip:

pip install xeno-canto

The package can then be used straight from the command line:

xeno-canto -dl Bearded Bellbird

Or imported into an existing Python project:

import xenocanto

For users who want more control over the wrapper, navigate to your desired file location in a terminal window and then clone the repository with the following command:

git clone https://github.com/ntivirikin/xeno-canto-py

The only file required for operation is xenocanto.py, so feel free to remove the others or move xenocanto.py to another working directory.

WARNING: Please exercise caution with test.py. Running the tests via unittest or any other test harness will delete any dataset/ folder in the working directory once the tests complete.

Usage

The xeno-canto-py wrapper supports retrieving metadata and audio from the xeno-canto database, as well as library management: deleting recordings that match input tags, removing folders that contain too few audio recordings, and generating a single JSON metadata file for a given path containing xeno-canto audio recordings. Example commands are given below.


Metadata Download

xeno-canto -m [parameters]

Downloads metadata as a series of JSON files and returns the path to the metadata folder.

Example: Metadata retrieval for Bearded Bellbird recordings of quality A

xeno-canto -m Bearded Bellbird q:A
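Metadata for a request can span several pages, and every page is saved. The sketch below shows one way to walk them all, assuming each page is a JSON object with the page, numPages and recordings fields seen in the API responses quoted in the issues further down. fetch_all_pages and the injected fetch_page callable are hypothetical names, not part of xeno-canto-py; the demo uses an offline stand-in instead of the real API.

```python
from typing import Callable, Dict, List


# Hypothetical helper: collect every page of a paginated xeno-canto response.
# fetch_page(n) is assumed to return the decoded JSON for page n as a dict
# with "numPages" and "recordings" keys.
def fetch_all_pages(fetch_page: Callable[[int], Dict]) -> List[Dict]:
    pages = []
    page, num_pages = 1, 1
    while page <= num_pages:
        data = fetch_page(page)
        # Each response reports the total number of pages for the query.
        num_pages = int(data["numPages"])
        pages.append(data)
        page += 1
    return pages


# Offline stand-in for the API: three pages with two recordings each.
def fake_fetch(n: int) -> Dict:
    return {"page": n, "numPages": 3,
            "recordings": [{"id": str(2 * n - 1)}, {"id": str(2 * n)}]}


pages = fetch_all_pages(fake_fetch)
print(len(pages))  # 3
```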


Audio Recording Download

xeno-canto -dl [parameters]

Retrieves the metadata for the request and uses it to download audio recordings as MP3s from the database.

Example: Download Bearded Bellbird recordings from the country of Brazil

xeno-canto -dl Bearded Bellbird cnt:Brazil


Delete Recordings

xeno-canto -del [parameters]

Deletes recordings that match ANY of the parameters given as input.

Example: Delete ALL quality D recordings and ALL recordings from Brazil

xeno-canto -del q:D cnt:Brazil
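Matching on ANY parameter means a recording is removed as soon as one of the given tags applies to it. A minimal sketch of that rule, assuming each recording is described by an API metadata dict whose keys are named after the tags (q, cnt, en, ...) and that comparisons are case-insensitive; matches_any is a hypothetical name and the sample recordings are made up.

```python
from typing import Dict, List


# Hypothetical sketch of the "delete if ANY tag matches" rule.
# For English names the value uses underscores instead of spaces
# (e.g. en:Bearded_Bellbird), so they are converted back before comparing.
def matches_any(recording: Dict[str, str], tags: List[str]) -> bool:
    for tag in tags:
        key, _, value = tag.partition(":")
        if key == "en":
            value = value.replace("_", " ")
        # Assumption: metadata keys are named after the tags.
        if recording.get(key, "").lower() == value.lower():
            return True
    return False


recordings = [
    {"id": "1", "q": "A", "cnt": "Brazil", "en": "Bearded Bellbird"},
    {"id": "2", "q": "D", "cnt": "Peru", "en": "Bearded Bellbird"},
    {"id": "3", "q": "B", "cnt": "Peru", "en": "Bearded Bellbird"},
]
to_delete = [r["id"] for r in recordings if matches_any(r, ["q:D", "cnt:Brazil"])]
print(to_delete)  # ['1', '2']
```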


Purge Folders

Removes any folders within the dataset/audio/ directory that contain fewer recordings than the input value num.

xeno-canto -p [num]

Example: Remove recording folders with fewer than 10 recordings (folders with exactly 10 are kept)

xeno-canto -p 10
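The purge rule can be sketched as follows. This illustrates the documented behaviour rather than reproducing the wrapper's own code; purge is a hypothetical name, and counting every regular file in a species folder as one recording is an assumption. The demo runs on a throwaway temporary directory.

```python
import os
import shutil
import tempfile


# Sketch of the purge rule: delete every species folder under audio_dir
# that holds fewer than num files (a folder with exactly num is kept).
def purge(audio_dir: str, num: int) -> list:
    removed = []
    for name in sorted(os.listdir(audio_dir)):
        folder = os.path.join(audio_dir, name)
        if not os.path.isdir(folder):
            continue
        count = sum(
            1 for f in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, f))
        )
        if count < num:
            shutil.rmtree(folder)
            removed.append(name)
    return removed


# Demo: "Few" holds 2 recordings, "Many" holds exactly 10.
with tempfile.TemporaryDirectory() as root:
    for species, n in [("Few", 2), ("Many", 10)]:
        os.makedirs(os.path.join(root, species))
        for i in range(n):
            open(os.path.join(root, species, f"{i}.mp3"), "w").close()
    removed = purge(root, 10)
    remaining = sorted(os.listdir(root))

print(removed, remaining)  # ['Few'] ['Many']
```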


Generate Metadata

Generates metadata for the xeno-canto database recordings at the input path, defaulting to dataset/audio/ within the working directory if none is given.

xeno-canto -g [path]

Example: Generate metadata for the recordings located in bird_rec/audio/ within the working directory

xeno-canto -g bird_rec/audio/
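In the directory structure below, recording files are named after their xeno-canto catalogue number (e.g. 14325.mp3), so a natural first step in generating metadata for a folder is to collect those numbers per species. The sketch assumes that naming scheme and a path/species/id.mp3 layout; collect_ids is hypothetical, and the structure of the real library.json is not reproduced here.

```python
import os
import tempfile


# Hypothetical helper: map each species folder under path to the sorted
# catalogue numbers taken from its file names (14325.mp3 -> 14325).
def collect_ids(path):
    library = {}
    for species in sorted(os.listdir(path)):
        folder = os.path.join(path, species)
        if not os.path.isdir(folder):
            continue
        ids = []
        for filename in os.listdir(folder):
            stem, _ = os.path.splitext(filename)
            if stem.isdigit():  # skip files not named after a recording ID
                ids.append(int(stem))
        library[species] = sorted(ids)
    return library


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Indigo Bunting"))
    os.makedirs(os.path.join(root, "Northern Cardinal"))
    open(os.path.join(root, "Indigo Bunting", "14325.mp3"), "w").close()
    open(os.path.join(root, "Northern Cardinal", "8273.mp3"), "w").close()
    library = collect_ids(root)

print(library)  # {'Indigo Bunting': [14325], 'Northern Cardinal': [8273]}
```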


Parameters are given in tag:value form in accordance with the API search guidelines. For help in building search terms, consult the xeno-canto API guide and this article. The only exception is English bird names passed to the delete function: they must be prefixed with en: and have all spaces replaced with underscores (e.g. en:Bearded_Bellbird).
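As a rough illustration of how tag:value parameters end up in an API request, the sketch below percent-encodes the terms and joins them with spaces into the query argument of the API v2 recordings endpoint that appears elsewhere on this page. It also applies the en: rule for the delete function. Both helper names are hypothetical, and whether the wrapper encodes terms exactly this way is an assumption.

```python
from urllib.parse import quote

API_URL = "https://xeno-canto.org/api/2/recordings"


# Hypothetical helper: build a search URL from command-line style terms.
# quote(..., safe="") percent-encodes spaces, ':' and '>' so that no raw
# space ends up in the URL.
def build_query_url(terms, page=1):
    query = quote(" ".join(terms), safe="")
    return f"{API_URL}?query={query}&page={page}"


# Hypothetical helper: English name -> tag accepted by the delete function.
def english_delete_tag(name):
    return "en:" + name.replace(" ", "_")


print(build_query_url(["Bearded", "Bellbird", "q:A"]))
# https://xeno-canto.org/api/2/recordings?query=Bearded%20Bellbird%20q%3AA&page=1
print(english_delete_tag("Bearded Bellbird"))  # en:Bearded_Bellbird
```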

Directory Structure

Files are saved in the working directory under the folder dataset/. Metadata is stored under metadata/, in one folder per request, and audio recordings under audio/, in one folder per bird species. For example:

dataset/
    - audio/
        - Indigo Bunting/
            - 14325.mp3
        - Northern Cardinal/
            - 8273.mp3
    - metadata/
        - library.json
        - IndigoBuntingcnt_Canada/
            - page1.json
        - NorthernCardinalq_A/
            - page1.json

Metadata is retrieved as JSON files, one per page of results, containing information on each audio recording that matches the request parameters, including the download links used to retrieve the audio recordings. The library.json file is generated by the metadata generation command -g.
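The example folder names suggest how a request maps onto the metadata directory (spaces dropped, : replaced by _), and the audio layout files each recording under its English name and catalogue number. A sketch of both mappings, inferred from the tree above rather than taken from the source; it assumes each entry in a page's recordings list carries id, en and file (the download link) keys. The function names are hypothetical and page1 is a made-up example page.

```python
import os


# Inferred from the example tree: "Indigo Bunting cnt:Canada" -> "IndigoBuntingcnt_Canada".
def metadata_folder(terms):
    return "".join(t.replace(" ", "").replace(":", "_") for t in terms)


# Pair each recording in one metadata page with its target audio path.
def audio_targets(page, root="dataset"):
    targets = []
    for rec in page["recordings"]:
        path = os.path.join(root, "audio", rec["en"], rec["id"] + ".mp3")
        targets.append((path, rec["file"]))
    return targets


page1 = {"recordings": [{"id": "8273", "en": "Northern Cardinal",
                         "file": "https://xeno-canto.org/8273/download"}]}
print(metadata_folder(["Indigo", "Bunting", "cnt:Canada"]))  # IndigoBuntingcnt_Canada
print(audio_targets(page1))
```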

Error 503

If an HTTP 503 error is returned when downloading recordings, try passing a value lower than 4 as the num_chunks argument of download(filt, num_chunks). You can either change the default value in the function definition of download(filt, num_chunks), or pass a value to download(params) in the body of main(), as shown below.

# Running with default 4 locks on semaphore
asyncio.run(download(params))

# Running with 3 locks rather than default
asyncio.run(download(params, 3))

Conversely, if you are not running into 503 errors, you can experiment with higher num_chunks values, which may speed up downloads.
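num_chunks sets the number of locks on the semaphore that limits how many downloads run at once. The sketch below shows that pattern in isolation: downloads are simulated with asyncio.sleep, so it needs no network access, and it records the peak number of concurrent tasks to show that the cap holds. fake_download and download_all are hypothetical names, not the wrapper's functions.

```python
import asyncio


async def fake_download(rec_id, sem, state):
    # At most num_chunks coroutines can be inside this block at once.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request and file write
        state["active"] -= 1
        return rec_id


async def download_all(ids, num_chunks=4):
    sem = asyncio.Semaphore(num_chunks)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fake_download(i, sem, state) for i in ids))
    return results, state["peak"]


results, peak = asyncio.run(download_all(range(20), num_chunks=3))
print(len(results), peak)  # 20 3
```

Lowering num_chunks reduces the number of simultaneous requests hitting the server, which is why it can help with 503 responses; raising it trades that for more parallelism.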

Contributing

All pull requests are welcome! If any issues are found, please do not hesitate to bring them to my attention.

Acknowledgements

Thank you to the team at xeno-canto.org and all its contributors for putting together such an amazing database.

License

MIT

xeno-canto-py's People

Contributors

bilzard, dandavison, ntivirikin, rehroman

xeno-canto-py's Issues

correct syntax to pass a location to xenocanto.download?

Hi,
when a location has more than one word, I can't manage to pass it as 'params' to xenocanto.download.

For example... something like

import asyncio
import xenocanto
params = ["loc:Manhattan"]
asyncio.run(xenocanto.download(filt=params))

works well.

But something like:

params = ["loc:New", "York"]
asyncio.run(xenocanto.download(filt=params))

does not. I have tried changing params to a single string "loc:New York", to ["loc:New York"] and to ["loc:New\ York"], but all of those options fail...

Any clues?
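One thing worth checking is how the terms are split: ["loc:New", "York"] is two separate terms, so York most likely ends up as a separate search word rather than as part of the location. Assuming the xeno-canto search syntax accepts double-quoted multi-word values (as in loc:"New York"), the value has to travel as a single, quoted term. The hypothetical helper below builds such a term and shows how it looks once percent-encoded; it does not confirm how xenocanto.download handles quotes internally.

```python
from urllib.parse import quote


# Hypothetical helper: wrap multi-word values in double quotes so they stay
# one search term (assumes the search syntax accepts tag:"a b").
def tag_term(tag, value):
    if " " in value:
        value = f'"{value}"'
    return f"{tag}:{value}"


term = tag_term("loc", "New York")
params = [term]  # one list element, not ["loc:New", "York"]
print(term)                  # loc:"New York"
print(quote(term, safe=""))  # loc%3A%22New%20York%22
```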

Empty download

Hi,

I'm trying to use your nice tool from the command line, but I always get an empty dataset:

  • On my browser this query (https://xeno-canto.org/api/2/recordings?query=pinson ) returns 8083 records
    {"numRecordings":"8083","numSpecies":"4","page":1,...}
  • With xeno-canto -m pinso I got an empty dataset
    {'numRecordings': '0', 'numSpecies': '0', 'page': 1, 'numPages': 1, 'recordings': []}
  • Calling the API directly, with your code or with another script using requests (see below):
    {'numRecordings': '0', 'numSpecies': '0', 'page': 1, 'numPages': 1, 'recordings': []}
import json
from urllib import request, error
url = 'https://www.xeno-canto.org/api/2/recordings?query=pinson'
try:
    r = request.urlopen(url)
except error.HTTPError as e:
    print("An error has occurred: " + str(e))
    exit()

data = json.loads(r.read().decode('UTF-8'))
print(data)

got

  {'numRecordings': '0', 'numSpecies': '0', 'page': 1, 'numPages': 1, 'recordings': []}

Same with

import requests
url = 'https://xeno-canto.org/api/2/recordings'
headers = {"User-Agent": "Mozilla/5.0" }
params = {"query" : "pinson", "page" : 1 }
r = requests.get(url, params=params)
r.json()

I suspect the server has some other protection that blocks requests made outside a browser, but I cannot find what it is.

Do you have any idea?

Thanks for helping
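One difference between the browser and both scripts is the User-Agent: the urllib version sends urllib's default (Python-urllib/3.x), and in the requests version the headers dict is defined but not passed to requests.get. The sketch below builds a urllib request that does send a browser-like User-Agent, so the comparison with the browser is fair. Whether the server treats it differently is not established here; build_request and fetch are hypothetical names, and the endpoint is the one used in the scripts above.

```python
import json
from urllib import request
from urllib.parse import quote

API_URL = "https://xeno-canto.org/api/2/recordings"


# Build a request that presents a browser-like User-Agent instead of
# urllib's default "Python-urllib/3.x".
def build_request(query, page=1):
    url = f"{API_URL}?query={quote(query, safe='')}&page={page}"
    return request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


# Performs the network call; not executed in the demo below.
def fetch(query, page=1):
    with request.urlopen(build_request(query, page)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("pinson")
print(req.full_url)                  # https://xeno-canto.org/api/2/recordings?query=pinson&page=1
print(req.get_header("User-agent"))  # Mozilla/5.0
```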

How would this be used with a "Basic Query"?

Hi, thanks for this code. I'm wondering how it would be used with a basic query for the species together with the advanced queries, e.g.:

xenocanto.get_rec(['Oxyura jamaicensis','cnt:Colombia','q>:D'])

This throws an error:

---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
in
18 # Country: Colombia, quality > class D (A-C)
19 print(str(i))
---> 20 xenocanto.get_rec(['Oxyura jamaicensis','cnt:Colombia','q>:D'])

~/anaconda3/lib/python3.7/site-packages/xenocanto.py in get_rec(search)
123 # Combines get_json and get_mp3 for convenience
124 def get_rec(search):
--> 125 json_list = get_json(search)
126 mp3_list = get_mp3(json_list)
127

~/anaconda3/lib/python3.7/site-packages/xenocanto.py in get_json(search)
47 url = url_builder(search)
48 try:
---> 49 r = request.urlopen(url)
50 except error.HTTPError as e:
51 err_log(e)

~/anaconda3/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):

~/anaconda3/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532
533 return response

~/anaconda3/lib/python3.7/urllib/request.py in http_response(self, request, response)
639 if not (200 <= code < 300):
640 response = self.parent.error(
--> 641 'http', request, response, code, msg, hdrs)
642
643 return response

~/anaconda3/lib/python3.7/urllib/request.py in error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570
571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 for handler in handlers:
502 func = getattr(handler, meth_name)
--> 503 result = func(*args)
504 if result is not None:
505 return result

~/anaconda3/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 400: Bad Request

Support for multithreaded downloads.

Would it be possible to get support for downloading multiple audio files concurrently?

I would be happy to write this myself and submit a pull request! Without this it takes a lot of time to download the files.

Cannot download audio file

Minimum Reproducible Example

$ xeno-canto -dl Erckel\'s Francolin
Downloading all recordings for query...
Retrieving metadata...
Downloading metadate page 1...
Found 11 recordings for given query, downloading...
Downloading 410277.mp3...
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/bin/xeno-canto", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/site-packages/xenocanto.py", line 284, in main
    download(params)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/site-packages/xenocanto.py", line 130, in download
    request.urlretrieve(url, audio_path + audio_file)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/urllib/request.py", line 522, in open
    req = meth(req)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/pytorch_x86/lib/python3.8/urllib/request.py", line 1278, in do_request_
    raise URLError('no host given')
urllib.error.URLError: <urlopen error no host given>
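URLError: no host given means urllib was handed a URL without a host part. The traceback does not show the value of url, so the cause here is not certain. One defensive option is to resolve the download link from the metadata against the site root before calling urlretrieve, and to fail early if there is still no host. absolute_download_url and the https://xeno-canto.org/ base are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://xeno-canto.org/"  # assumed site root


# Turn absolute, protocol-relative ("//host/...") or path-only links into
# a full URL, and refuse anything that still lacks a host.
def absolute_download_url(link):
    url = urljoin(BASE_URL, link)
    if not urlsplit(url).netloc:
        raise ValueError(f"download link has no host: {link!r}")
    return url


links = ["https://xeno-canto.org/410277/download",
         "//xeno-canto.org/410277/download",
         "/410277/download"]
print({absolute_download_url(link) for link in links})
# {'https://xeno-canto.org/410277/download'}
```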

downloads WAV files with MP3 extension

All downloaded files get a .mp3 filename regardless of their actual format. I've checked the media file headers, and this example is definitely a WAV.

I download datasets for Southern Hairy-legged Myotis using:

xeno-canto -dl Southern Hairy-legged Myotis

which among others downloads the file:
819608.mp3

If I use the web api for the same file:
https://xeno-canto.org/838712/download

I get a file with the correct WAV extension.
XC819608 - Southern Hairy-legged Myotis - Myotis keaysi.wav
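Choosing the extension from the file's first bytes rather than hard-coding .mp3 would avoid this mismatch. A minimal sketch, assuming only WAV and MP3 need to be told apart: a WAV file starts with RIFF followed by WAVE at byte offset 8, and an MP3 starts either with an ID3 tag or with an MPEG frame sync (the first 11 bits set). sniff_extension is a hypothetical helper, not part of xeno-canto-py.

```python
from typing import Optional


# Guess an audio file extension from its leading bytes.
def sniff_extension(header: bytes) -> Optional[str]:
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    if header[:3] == b"ID3":
        return ".mp3"
    # MPEG frame sync: byte 0 is 0xFF and the top 3 bits of byte 1 are set.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return ".mp3"
    return None  # unknown format: leave the decision to the caller


print(sniff_extension(b"RIFF\x24\x08\x00\x00WAVEfmt "))     # .wav
print(sniff_extension(b"ID3\x03\x00\x00\x00\x00\x00\x00"))  # .mp3
print(sniff_extension(b"\xff\xfb\x90\x64"))                 # .mp3
```

In a downloader, the first 12 bytes of each saved file (or of the response body) would be enough to pick the extension before naming the file.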
