realzza / xenopy

XenoPy: Python wrapper for Xeno-canto API 2.0. Supports multiprocessing.

Home Page: https://ziangzhou.com/project/BirdData-Multithread-Python-Wrapper-for-Xeno-canto-api-2-0/

License: GNU General Public License v3.0

Python 99.48% Shell 0.52%
xeno-canto bird-detection birdsong python ebird bird-detection-dataset bird bird-species birding xenocanto

xenopy's Introduction

XenoPy

XenoPy is a Python library that builds upon xeno-canto API 2.0.

Install

Install from pip.

pip install xenopy

Check out the birdData branch to install XenoPy from source. (Note: birdData is the former name of XenoPy.)

Usage Snippet

You can search directly for bird data for a specific species. For instance, the following retrieves data for the African Silverbill with quality better than C since 2020-01-01.

from xenopy import Query

q = Query(name="African silverbill", q_gt="C", since="2020-01-01")

Retrieve Metafiles

# retrieve metadata
metafiles = q.retrieve_meta(verbose=True)

Retrieve Recordings

# retrieve recordings
q.retrieve_recordings(multiprocess=True, nproc=10, attempts=10, outdir="datasets/")

The retrieved recordings will be located in datasets/, organized by bird species names.

The default download mode is single-threaded. The multiprocess flag enables multiple download processes, and nproc applies only when multiprocess is on. The output directory is set with outdir.

While retrieve_recordings runs, two files are generated: kill_multiprocess.sh and failed.txt. To interrupt multiprocess data retrieval, run bash kill_multiprocess.sh in the terminal. failed.txt lists the recordings that failed to download, if any. Both files are removed automatically once downloading finishes, except that failed.txt is kept when it is not empty so that you can check the failed recordings.
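
After a download finishes, it can be handy to check what actually landed on disk. The sketch below counts files per species folder. It assumes each species gets its own subdirectory directly under outdir, which the description above suggests but does not spell out; count_recordings is a hypothetical helper and not part of XenoPy.

```python
from pathlib import Path


def count_recordings(outdir):
    """Return {subdirectory name: number of files directly inside it}.

    Assumption: recordings are stored as outdir/<species name>/<file>.
    Files sitting directly in outdir (e.g. failed.txt) are ignored.
    """
    root = Path(outdir)
    counts = {}
    for species_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[species_dir.name] = sum(
            1 for f in species_dir.iterdir() if f.is_file()
        )
    return counts


# Example (after running retrieve_recordings with outdir="datasets/"):
# print(count_recordings("datasets/"))
```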

Define a Query

As the Usage Snippet shows, defining a query is the most important step in communicating with the API. The interface below for forming a query is based on the xeno-canto search tips.

name: Species Name. Specify the name of bird you intend to retrieve data from. Both English names and Latin names are acceptable.
gen: Genus. Genus is part of a species' Latin name, so it is searched by default when performing a basic search (as mentioned above).
ssp: subspecies
rec: recordist. Search for all recordings from a particular recordist.
cnt: country. Search for all recordings from a particular country.
loc: location. Search for all recordings from a specific location.
rmk: remarks. Many recordists leave remarks about the recording, and this field can be searched using the rmk tag. For example, rmk:playback will return a list of recordings for which the recordist left a comment about the use of playback. This field accepts a 'matches' operator.
lat: latitude.
lon: longitude.
box: search for recordings that occur within a given rectangle. The general format of the box tag is as follows: box:LAT_MIN,LON_MIN,LAT_MAX,LON_MAX. Note that there must not be any spaces between the coordinates.
also: To search for recordings that have a given species in the background.
type: Search for recordings of a particular sound type, e.g., type='song'
nr: number. To search for a known recording number, use the nr tag: for example nr:76967. You can also search for a range of numbers as nr:88888-88890.
lc: license.
q: quality ratings. 
q_lt: quality ratings less than
q_gt: quality ratings better than
    Usage Examples:
        Recordings are rated by quality. Quality ratings range from A (highest quality) to E (lowest quality). To search for recordings that match a certain quality rating, use the q, q_lt, and q_gt tags. For example:
        - q:A will return recordings with a quality rating of A.
        - q:0 will search explicitly for unrated recordings.
        - q_lt:C will return recordings with a quality rating of D or E.
        - q_gt:C will return recordings with a quality rating of B or A.
len: recording length control parameter.
len_lt: recording length less than
len_gt: recording length greater than
    Usage Examples:
        len:10 will return recordings with a duration of 10 seconds (with a margin of 1%, so actually between 9.9 and 10.1 seconds)
        len:10-15 will return recordings lasting between 10 and 15 seconds.
        len_lt:30 will return recordings half a minute or shorter in length.
        len_gt:120 will return recordings longer than two minutes in length.
area: continents. Valid values for this tag: africa, america, asia, australia, europe.
since: upload date. Search for recordings uploaded since a given date or within the past N days.
    Usage Examples:
        - since=3, since the past three days
        - since=YYYY-MM-DD, since the particular date
year: year
month: month. year and month tags allow you to search for recordings that were recorded on a certain date. 
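
To make the tag notation used in the examples above (such as q_gt:C or box:LAT_MIN,LON_MIN,LAT_MAX,LON_MAX) more concrete, here is a minimal sketch that joins a species name and tag/value pairs into a single search string. Both build_search_string and box_value are hypothetical helpers written for illustration; they are not XenoPy functions and do not claim to mirror how XenoPy builds its requests. Values containing spaces are not handled.

```python
def box_value(lat_min, lon_min, lat_max, lon_max):
    """Format a bounding box as LAT_MIN,LON_MIN,LAT_MAX,LON_MAX (no spaces)."""
    return f"{lat_min},{lon_min},{lat_max},{lon_max}"


def build_search_string(name=None, **tags):
    """Join a species name and tag/value pairs into 'name tag:value ...'.

    Tags whose value is None are skipped. Tag order follows the order in
    which the keyword arguments were passed.
    """
    parts = [name] if name else []
    for tag, value in tags.items():
        if value is None:
            continue
        parts.append(f"{tag}:{value}")
    return " ".join(parts)


print(build_search_string("African silverbill", q_gt="C", since="2020-01-01"))
print(build_search_string(box=box_value(-12.234, -69.98, -12.1, -69.8), len_lt=30))
```

The first call mirrors the query from the Usage Snippet; the second combines a bounding box with a length limit.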

Citation

If XenoPy is helpful in your project or research in any form, you can cite this software as follows:

@software{ziang_zhou_2022_6545294,
  author       = {Ziang Zhou},
  title        = {realzza/xenopy: XenoPy v0.0.4},
  month        = may,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {v0.0.4},
  doi          = {10.5281/zenodo.6545294},
  url          = {https://doi.org/10.5281/zenodo.6545294}
}

Update History

🎉 v0.0.4

  • Support Query by bird name.
  • Cut inessential processes in query traffic.
  • Optimized query assignment strategy in recording retrieval.

todo

  • create query object for single species, containing features like
    • retrieve metadata
    • retrieve bird songs
  • add multiprocessing downloading feature

Open Source

The first generation of the xenocanto package is hard to use and inefficient. I therefore wrapped the 2.0 API version in a more straightforward and efficient interface. Feel free to file an issue if you encounter any bugs, or open a PR to XenoPy to join me in maintenance and optimization.

xenopy's People

Contributors

jinchengheryan, realzza

xenopy's Issues

MERCI!!!

Many thanks for providing such a great tool; it really saved me a lot of time!
Wishing you success in your studies and your work!

ZeroDivisionError

Hello, when I tried a batch download, I found that a query returning 0 results raises ZeroDivisionError: integer division or modulo by zero. According to the traceback, the error originates at line 4 of utils.py, at portion = len(lst) // n.

In addition, I found that the same error occurs when querying the following species names. My query time range starts from 1971-01-01:

['Maleo', 'Moluccan Megapode', 'Nicobar Megapode', 'Long-billed Partridge', 'Black Partridge', 'Udzungwa Forest Partridge', 'Rubeho Forest Partridge', "Roll's Partridge", 'Sumatran Partridge', 'Grey-breasted Partridge', 'Red-billed Partridge', 'Chestnut-necklaced Partridge', 'Crestless Fireback', 'Crested Fireback', 'Siamese Fireback', 'Great Argus', 'Madagascan Pochard']

The following list gives the number of query results for each of the common names above:

[7, 6, 2, 18, 3, 6, 34, 10, 1, 31, 8, 35, 1, 17, 24, 83, 5]

Looking forward to your reply, thanks!
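
For illustration, here is one way to split a list of work items across processes without dividing by zero. This is a minimal sketch under stated assumptions: only the single line portion = len(lst) // n from utils.py is known from the report above, so split_into_chunks is a hypothetical stand-in and not the library's actual code or fix. It returns no chunks for an empty list and never creates more chunks than there are items.

```python
def split_into_chunks(items, n):
    """Split items into at most n contiguous chunks of near-equal size."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not items:
        return []  # nothing to download, so no work to hand out
    n = min(n, len(items))  # avoid empty chunks when items are scarce
    portion, remainder = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `remainder` chunks take one extra item each.
        end = start + portion + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


print(split_into_chunks([], 10))
print(split_into_chunks(list(range(10)), 3))
```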

Something about Download Timeout

Hello, while downloading I found that some audio files occasionally take far too long (possibly a network problem). The progress bar stalls, sometimes for ten to twenty minutes, and at times I could only interrupt it manually to move on to the next iteration.

Later I found a very simple solution on Stack Overflow: call socket.setdefaulttimeout(30) before the code that issues the requests, which limits how long a socket operation may block. Here I set it to 30 seconds.

import socket

socket.setdefaulttimeout(30)  # timeout in seconds
q.retrieve_recordings(multiprocess=True, nproc=10, attempts=10, outdir="/mnt/database/xcdata/")

I also added some extra handling so that when a socket.timeout error is raised, the corresponding request parameters are recorded and execution continues with the next iteration, which avoids stalled downloads. After all iterations finish, I increase the timeout (for example, to 120 seconds) and rerun the recorded request parameters, repeating this until every request has completed.

https://stackoverflow.com/questions/32763720/timeout-a-file-download-with-python-urllib

I hope this helps everyone.
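
The record-and-retry loop described in this issue could look roughly like the sketch below. The reporter's actual code was not posted, so this is an assumption-laden illustration: retry_with_longer_timeouts and the fetch callable are hypothetical names, and fetch stands for whatever function performs one download for one set of request parameters.

```python
import socket


def retry_with_longer_timeouts(params_list, fetch, timeouts=(30, 120)):
    """Run fetch(params) for every item, retrying timeouts with longer limits.

    Each pass uses the next value in `timeouts` as the default socket
    timeout and retries only the items that timed out in the previous pass.
    Returns the items that still failed after the last pass.
    """
    pending = list(params_list)
    previous = socket.getdefaulttimeout()
    try:
        for timeout in timeouts:
            socket.setdefaulttimeout(timeout)
            failed = []
            for params in pending:
                try:
                    fetch(params)
                except socket.timeout:
                    failed.append(params)  # remember it and keep going
            pending = failed
            if not pending:
                break
    finally:
        # Restore whatever default timeout was in effect before.
        socket.setdefaulttimeout(previous)
    return pending
```

Passing the download function in as an argument keeps the retry logic independent of how each request is made, and the returned list plays a role similar to the recorded request parameters mentioned above.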
