joeweiss / birdnetlib


39.0 4.0 13.0 384.68 MB

A python api for BirdNET-Lite and BirdNET-Analyzer

Home Page: https://joeweiss.github.io/birdnetlib/

License: Apache License 2.0

Languages: Dockerfile 0.10%, Python 99.90%

Topics: birdnet, birds, birdsong, tensorflow, python3

birdnetlib's People

Contributors

Stargazers

Watchers

Forkers

ethan-nelson limitlessgreen jurriaan mjweldy elementechemlyn mrichar1 floreencia spohlenz giojacuzzi ttopholm rosulucian dsgt-kaggle-clef bbennett80

birdnetlib's Issues

Add ability to pass a custom/defined species list as a python list

It is currently possible to pass a custom species file path to Analyzer, similarly to how BirdNET-Analyzer handles species lists.

This feature would make it easier to pass the list dynamically from Python, or through an API like AudioSpotter.
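Until such an option exists, one workaround is to write the Python list to a temporary file and pass that file's path to Analyzer. A minimal sketch, assuming species list files use the same `Scientific name_Common name` line format as BirdNET label files. The helper `write_species_list` is hypothetical, and the `custom_species_list_path` argument mentioned below should be checked against the installed birdnetlib version.

```python
import os
import tempfile


def write_species_list(species):
    """Write (scientific_name, common_name) pairs to a temporary species list file.

    Each line uses the "Scientific name_Common name" label format (an assumption
    about the expected file format). Returns the file path; the caller is
    responsible for deleting the file when done.
    """
    lines = [f"{scientific}_{common}" for scientific, common in species]
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path
```

The resulting path could then be passed as, for example, `Analyzer(custom_species_list_path=path)`.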

All the helper classes should use BirdNET-Analyzer by default

Currently, only DirectoryMultiProcessingAnalyzer uses BirdNET-Analyzer by default. All the other helpers use the older (deprecated) BirdNET-Lite as their default.

[Question] How do I use `birdnetlib` with BirdNet's models and labels?

I am trying to use birdnetlib as a library to identify birds instead of using the scripts included in BirdNET. I am having trouble pointing birdnetlib to the models and labels included in BirdNET.

Installation steps performed

I am on Ubuntu 22.04.

sudo apt-get update
sudo apt-get upgrade
<Python 3.10 was already installed>
pip3 install tensorflow
pip3 install librosa resampy
pip3 install birdnetlib
sudo apt-get install ffmpeg
git clone https://github.com/kahst/BirdNET-Analyzer.git

I tried following BirdNET's recommendation of installing tflite-runtime, but got:

>>> from birdnetlib.analyzer import Analyzer
Traceback (most recent call last):
  File "/home/<user>/.local/lib/python3.10/site-packages/birdnetlib/analyzer.py", line 8, in <module>
    import tflite_runtime.interpreter as tflite
ModuleNotFoundError: No module named 'tflite_runtime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/<user>/.local/lib/python3.10/site-packages/birdnetlib/analyzer.py", line 10, in <module>
    from tensorflow import lite as tflite
ModuleNotFoundError: No module named 'tensorflow'

That is why I installed TensorFlow instead.

Code

from birdnetlib import Recording
from birdnetlib.analyzer import Analyzer
from datetime import datetime

model="<path>/BirdNET-Analyzer/checkpoints/V2.4/BirdNET_GLOBAL_6K_V2.4_Model_FP32.tflite"
labels="<path>/BirdNET-Analyzer/checkpoints/V2.4/BirdNET_GLOBAL_6K_V2.4_Labels.txt"

analyzer=Analyzer(
    classifier_labels_path=labels,
    classifier_model_path=model
)

recording = Recording(
    analyzer,
    "2023-07-03 19_27.wav",
    lat=42,
    lon=71,
    date=datetime(year=2023, month=7, day=4),
    min_conf=0
)

But the analysis fails:

>>> recording.analyze()
read_audio_data
read_audio_data: complete, read  16 chunks.
analyze_recording 2023-07-03 19_27.wav
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/<user>/.local/lib/python3.10/site-packages/birdnetlib/main.py", line 61, in analyze
    self.analyzer.analyze_recording(self)
  File "/home/<user>/.local/lib/python3.10/site-packages/birdnetlib/analyzer.py", line 206, in analyze_recording
    pred = self.predict_with_custom_classifier(c)[0]
  File "/home/<user>/.local/lib/python3.10/site-packages/birdnetlib/analyzer.py", line 316, in predict_with_custom_classifier
    C_INTERPRETER.invoke()
  File "/home/<user>/.local/lib/python3.10/site-packages/tensorflow/lite/python/interpreter.py", line 917, in invoke
    self._interpreter.Invoke()
RuntimeError: tensorflow/lite/kernels/concatenation.cc:159 t->dims->data[d] != t0->dims->data[d] (1 != 0)Node number 92 (CONCATENATION) failed to prepare.

This worked fine with the built-in model and labels in birdnetlib (using just analyzer=Analyzer()), but those apparently aren't right for my area, as the results were not accurate.

How do I use birdnetlib with BirdNet's models and labels? Have I done something wrong here?

Thanks for any help! Excited to try and get this system going.

Overlap parameter

I'd like to be able to change the BirdNET overlap parameter in addition to parameters like min_conf. I am using the DirectoryAnalyzer class defined in batch.py, but overlap doesn't appear to be an option. Is there a way of doing this, and can you provide an example, please?
Thanks!
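For reference, BirdNET scores audio in 3-second segments, and BirdNET-Analyzer's overlap setting is the number of seconds by which consecutive segments overlap. The sketch below only computes the resulting segment boundaries to show what the parameter changes. The function name, the `min_len` cutoff, and the defaults are illustrative assumptions, not birdnetlib's API.

```python
def segment_bounds(duration, window=3.0, overlap=0.0, min_len=1.0):
    """Return (start, end) times in seconds of the analysis segments.

    Consecutive segments start (window - overlap) seconds apart. A trailing
    segment shorter than min_len seconds is dropped unless it is the only one.
    """
    if not 0.0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    bounds = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * step < duration:
        start = i * step
        end = min(start + window, duration)
        if end - start < min_len and bounds:
            break
        bounds.append((start, end))
        i += 1
    return bounds
```

With no overlap, a 9-second file yields three segments; with 1.5 seconds of overlap it yields six, roughly doubling the inference work.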

Update BirdNET-Analyzer model to 2.2

Refactor analyzers to clarify what/when/how

Analyzer
LiteAnalyzer

Some of the code uses terse naming and isn't commented. Readability needs some improvement.

Update to BirdNET-Analyzer 2.4 model

2.4 is out. kahst/BirdNET-Analyzer@b32cdc5

Also, I'm considering an eventual separation of the library from the models. Perhaps something like birdnet-analyzer-model==2.5.0 would be better as a dependency of birdnetlib rather than including the models files in the library itself.

That would allow users to easily pin their model during a long-term analysis project rather than pinning the library itself. I've already had questions offline about that, and I've pointed users to the new "custom classifier" options.

I would have to mirror the models (under BirdNET's current NC license) in a new package.

Any thoughts?

RuntimeError with birdnetlib and Custom Model Analysis

Issue Description:

I am encountering a runtime error when attempting to use a custom model trained with birdnet-analyzer and performing analysis with birdnetlib. The specific error message I am receiving is:

RuntimeError: tensorflow/lite/kernels/concatenation.cc:162 t->dims->data[d] != t0->dims->data[d] (1 != 0)Node number 92 (CONCATENATION) failed to prepare.

Python Code:

from birdnetlib.analyzer import Analyzer
from birdnetlib import Recording

model_path = '/path_to_file/test_2.tflite'
labels_path = '/path_to_file/test_2_Labels.txt'

analyzer = Analyzer(classifier_labels_path=labels_path, classifier_model_path=model_path)

recording = Recording(
    analyzer,
    'path_to_audio_file',
    min_conf=0.1
)

recording.analyze()

Environment:

Operating System: Ubuntu 23.10
Python Version: 3.11.5
Birdnetlib Version: 0.12.3

Thank you for your attention to this matter. Let me know if you need further details or clarification.

Add example that uses a predefined species list

Add support for BirdNET's embeddings output

Rename to birdnet

The most obvious name is still available on PyPI:

https://pypi.org/search/?q=birdnet

Add a polling option to DirectoryWatcher for when watchdog fails

Mostly due to this documented limitation with Docker:
https://docs.docker.com/desktop/mac/apple-silicon/

In addition, filesystem change notification APIs (inotify) do not work under qemu emulation. Even when the containers do run correctly under emulation, they will be slower and use more memory than the native equivalent.
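watchdog already ships a `PollingObserver` (in `watchdog.observers.polling`) that periodically compares directory snapshots instead of relying on inotify, so it may be the simplest fix. For illustration, a minimal stdlib-only polling loop looks like the sketch below. The `DirectoryPoller` class and its interface are hypothetical.

```python
import os
import time


class DirectoryPoller:
    """Detect new or modified audio files by comparing modification times."""

    def __init__(self, path, extensions=(".wav", ".mp3", ".flac")):
        self.path = path
        self.extensions = tuple(ext.lower() for ext in extensions)
        self._seen = {}  # file path -> last observed modification time

    def poll(self):
        """Return sorted paths that are new or changed since the last poll."""
        changed = []
        for entry in os.scandir(self.path):
            if not entry.is_file() or not entry.name.lower().endswith(self.extensions):
                continue
            mtime = entry.stat().st_mtime
            if self._seen.get(entry.path) != mtime:
                self._seen[entry.path] = mtime
                changed.append(entry.path)
        return sorted(changed)

    def watch(self, callback, interval=5.0):
        """Call callback(path) for each new or changed file; runs forever."""
        while True:
            for path in self.poll():
                callback(path)
            time.sleep(interval)
```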

Watcher needs a method for parsing the file's datetime from the filename or file path
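A minimal sketch, assuming filenames shaped like the test file `2022-08-15-birdnet-21:05:51.wav` mentioned further down this page, with `:`, `-`, or `_` between the time fields. The regex and the helper name are illustrative; other recorders name files differently, so a real implementation would likely need a configurable pattern.

```python
import os
import re
from datetime import datetime

# Matches e.g. "2022-08-15-birdnet-21:05:51.wav" or "2022-08-15-birdnet-21-05-51.wav".
FILENAME_PATTERN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})-birdnet-(\d{2})[:_-](\d{2})[:_-](\d{2})"
)


def datetime_from_path(path):
    """Return the datetime encoded in the file name, or None if it doesn't match."""
    match = FILENAME_PATTERN.search(os.path.basename(path))
    if match is None:
        return None
    return datetime(*(int(part) for part in match.groups()))
```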

Add error handling for unreadable audio files

Add code contribution standards with Python linting preferences

As part of this, implement a lint check with Black in GitHub Actions. It should run on all pull requests.

Sudden code issue with GUI

Hello,
I've been using BirdNET to run classifiers for months now as I work through building some custom models. My computer restarted, and now when I try to open the GUI (using python3 gui.py as usual), it throws the following error:

Traceback (most recent call last):
  File "/Users/me/BirdNET-Analyzer/gui.py", line 969, in <module>
    build_single_analysis_tab()
  File "/Users/me/BirdNET-Analyzer/gui.py", line 717, in build_single_analysis_tab
    ) = species_lists(False)
  File "/Users/me/BirdNET-Analyzer/gui.py", line 655, in species_lists
    species_file_input = gr.File(file_types=[".txt"], info="Path to species list file or folder.", visible=False)
  File "/Users/me/anaconda3/envs/birdnet-analyzer/lib/python3.10/site-packages/gradio/component_meta.py", line 157, in wrapper
    return fn(self, **kwargs)
TypeError: File.__init__() got an unexpected keyword argument 'info'

Have there been updates? Why would this all of a sudden have issues?

Thanks in advance.

Raise file size limit with PyPI

The BirdNET-Analyzer 2.4 model is 13MB larger than the 2.3 model file, which pushed birdnetlib over PyPI's 100MB total limit.

I've requested that PyPI raise the limit to 200MB.
pypi/support#2912

The 0.7.0 release is blocked until this is resolved.

For the moment, you can install 0.7.0 with:
pip install https://github.com/joeweiss/birdnetlib/archive/main.zip

... or by including the following in your requirements.txt file:
birdnetlib @ https://github.com/joeweiss/birdnetlib/archive/main.zip

DirectoryAnalyzer and DirectoryWatcher should support multiprocessing (or GPU)

I love the idea of DirectoryAnalyzer and DirectoryWatcher. But as long as they only run on one CPU core, it's not usable for me.
I worked around this with multiprocessing, but that should only be a temporary solution until it is built directly into birdnetlib.

import os
from datetime import datetime
from multiprocessing import Pool

from birdnetlib import Recording
from birdnetlib.analyzer import Analyzer

# One Analyzer per worker process: loading the model once per process
# is much cheaper than loading it again for every file.
_analyzer = None


def init_worker():
    global _analyzer
    _analyzer = Analyzer()


def analyze_file(file_path):
    try:
        recording = Recording(
            _analyzer,
            file_path,
            lat=35.4244,
            lon=-120.7463,
            date=datetime(year=2022, month=5, day=10),
            min_conf=0.25,
        )
        recording.analyze()
        print(f"Processed {file_path}")
        # Return plain data rather than the Recording itself, which holds a
        # reference to the Analyzer and may not pickle cleanly across processes.
        return file_path, recording.detections
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return file_path, None


def analyze_directory(dir_path):
    file_paths = [
        os.path.join(subdir, file)
        for subdir, _dirs, files in os.walk(dir_path)
        for file in files
        if file.lower().endswith(".wav")
    ]
    with Pool(initializer=init_worker) as pool:
        # map() blocks until every file is done and returns results in input order.
        return pool.map(analyze_file, file_paths)


if __name__ == "__main__":
    dir_path = "./workdir/"
    results = analyze_directory(dir_path)
    print(f"Finished analyzing {len(results)} files")

Add an example that saves to SQLite
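A minimal sketch using the standard library's `sqlite3` module. It assumes each detection is a dict with the keys shown in the detection example elsewhere on this page (`common_name`, `scientific_name`, `start_time`, `end_time`, `confidence`). The table layout and the `save_detections` name are hypothetical, not a schema birdnetlib prescribes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS detections (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    common_name TEXT,
    scientific_name TEXT,
    start_time REAL,
    end_time REAL,
    confidence REAL
)
"""


def save_detections(conn, file_path, detections):
    """Insert one row per detection; float() also converts numpy float32 values."""
    conn.execute(SCHEMA)
    rows = [
        (
            file_path,
            d["common_name"],
            d["scientific_name"],
            float(d["start_time"]),
            float(d["end_time"]),
            float(d["confidence"]),
        )
        for d in detections
    ]
    with conn:  # Commits on success, rolls back on error.
        conn.executemany(
            "INSERT INTO detections (file_path, common_name, scientific_name,"
            " start_time, end_time, confidence) VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
```

Usage might look like `save_detections(sqlite3.connect("detections.db"), recording.path, recording.detections)` after `recording.analyze()`.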

Add an example of processing the results of multiple files

I'd like an example of a Python script that scans a directory (or a series of directories), analyzes each file, and saves the results.

This issue does not include a command-line interface.

A fully functional CLI is outside the scope of this project.
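As a starting point for the "save the results" half, the sketch below flattens per-file detections into a single CSV file using the standard library. It assumes results arrive as `(file_path, detections)` pairs, for example `(recording.path, recording.detections)`, and that a failed file has `None` instead of a list. `write_results_csv` and the column set are hypothetical.

```python
import csv

FIELDS = ["file_path", "start_time", "end_time", "scientific_name", "common_name", "confidence"]


def write_results_csv(csv_path, results):
    """Write (file_path, detections) pairs to one CSV file; return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" skips detection keys not listed in FIELDS (e.g. "label").
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for file_path, detections in results:
            for detection in detections or []:
                writer.writerow({"file_path": file_path, **detection})
                count += 1
    return count
```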

Very large audio files should be loaded in sections rather than loading entirely into memory

Currently (as of 0.12.3), splitting large audio files (> 1h) into more manageable segments is left to the user. If a user were to attempt to process a large audio file, the entire file would be loaded into memory before analysis. This can lead to OOM killer events or process crashes.

The library needs to have a method for processing very large audio files.
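One possible approach is to read and yield fixed-length chunks instead of the whole file. The sketch below uses the standard library's `wave` module, so it only handles uncompressed PCM WAV files. The generator name and the 10-minute default are assumptions, and each chunk would still need to be decoded, resampled, and split into BirdNET's 3-second segments before analysis.

```python
import wave


def iter_wav_chunks(path, chunk_seconds=600.0):
    """Yield (start_seconds, pcm_bytes) for consecutive chunks of a PCM WAV file.

    Only one chunk of raw PCM data is held in memory at a time.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frame_size = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, int(chunk_seconds * rate))
        frames_read = 0
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield frames_read / rate, data
            frames_read += len(data) // frame_size
```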

Relicense the birdnetlib project with a more permissive open source software license

Related to #72, I'm proposing to relicense this repo and project as Apache 2.0. This would encourage the inclusion and usage of birdnetlib in other projects without imposing the copyleft conditions of the GPL-3.0 license.

Note: This would not apply to the models included in this repo, as they are redistributed under BirdNET-Analyzer and BirdNET-Lite's CC BY-NC-SA 4.0 license.

The plan is to change the license in the 0.9.0 release in August. I'll leave this issue open for discussion until then.

Add action for publishing package on new release

Action should run tests before publishing.

Add a clean tuple-based species list option for SpeciesList

from datetime import datetime
from birdnetlib.species import SpeciesList

species = SpeciesList()
species_list = species.return_list(
    lon=-120.7463, lat=35.4244, date=datetime(year=2022, month=5, day=10)
)
print(species_list)  # [('Haemorhous mexicanus', 'House Finch'), ('Aphelocoma californica', 'California Scrub-Jay') ...]
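Since BirdNET labels combine both names as `Scientific name_Common name` (for example `Haemorhous mexicanus_House Finch`), the tuple form could be derived by splitting each label on its first underscore. A minimal sketch; the helper name is hypothetical.

```python
def labels_to_tuples(labels):
    """Convert "Scientific name_Common name" labels to (scientific, common) tuples."""
    species = []
    for label in labels:
        scientific, sep, common = label.strip().partition("_")
        # Fall back to the scientific name if a label has no common-name part.
        species.append((scientific, common if sep else scientific))
    return species
```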

Fix issue with loading custom classifiers

Add support for processing wavs in memory without passing a file path string.

I have a number of scripts that need to process in-memory WAVs rather than WAVs saved to disk, so it would be nice to add an argument for passing the audio vector itself.
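For WAV data that is already in memory (for example, bytes received over a socket), the standard library can decode it without touching disk by wrapping the bytes in `io.BytesIO`. The sketch assumes 16-bit PCM and returns mono samples scaled to [-1.0, 1.0). The function name is hypothetical, and how such an array would be handed to birdnetlib is exactly what this issue proposes.

```python
import array
import io
import sys
import wave


def decode_wav_bytes(wav_bytes):
    """Decode 16-bit PCM WAV bytes to (samples, sample_rate), averaging channels to mono."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian.
    samples = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]
    return samples, rate
```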

Relicense as CC BY-NC-SA 4.0

I would like to merge this repository with the main BirdNET repository for easier maintenance of both parts. However, the GPL v3 license does not allow that without relicensing BirdNET-Analyzer under GPL v3, too. Therefore, please relicense this code under the term of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Clarify the license in a README in the models directory

https://github.com/joeweiss/birdnetlib/tree/main/src/birdnetlib/models

The README should appear at the above directory.

The BirdNET models here are Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.

Can't pass only custom classifier labels path to Analyzer

I'm trying to pass only a custom classifier labels path to the Analyzer constructor, without a custom model, and it fails. I found this in the Analyzer __init__ function in analyzer.py:

self.classifier_model_path = classifier_model_path
self.classifier_labels_path = classifier_labels_path
self.use_custom_classifier = (
    self.classifier_labels_path and self.classifier_labels_path
)

self.classifier_labels_path is checked twice; the condition should check self.classifier_model_path and self.classifier_labels_path.

Add ability to use a custom classifier

Now that training is documented as part of BirdNET-Analyzer, birdnetlib should have an option to use a custom classifier.

Repo fails to clone on Windows due to invalid filename paths in test files

Thank you for this great project :)
Sadly, I can't clone this repo under Windows, because Windows does not allow colons in file names.

error: invalid path 'tests/test_files/2022-08-15-birdnet-21:05:51.wav'
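Renaming the affected test files (and updating the tests that reference them) is one fix. A sketch of a one-off rename script, assuming `-` is an acceptable replacement for `:`. It has to run on a system that allows colons in file names, such as Linux or macOS, and the function name is hypothetical.

```python
import os


def rename_colon_files(root, replacement="-"):
    """Rename files under root whose names contain ':'; return (old, new) path pairs."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if ":" not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(":", replacement))
            if os.path.exists(new):
                # Refuse to overwrite an existing file with the sanitized name.
                raise FileExistsError(new)
            os.rename(old, new)
            renamed.append((old, new))
    return renamed
```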

Add README documentation for the 0.10.0 features

RecordingFileObject
Analyzer version argument

Fix bug with DirectoryWatcher

DirectoryWatcher should process new files only when they close.
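On Linux, watchdog can report close events (an `on_closed` handler receiving a `FileClosedEvent`), but that event may not be available on every platform or observer. A portable, if slower, heuristic is to wait until a file's size stops changing before analyzing it. A minimal sketch; the helper name, the polling interval, and the number of stable checks are assumptions.

```python
import os
import time


def wait_until_stable(path, interval=1.0, stable_checks=2, timeout=60.0):
    """Return True once the file size is unchanged for stable_checks polls in a row.

    Returns False if the file disappears or the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            return False
        if size == last_size:
            stable += 1
            if stable >= stable_checks:
                return True
        else:
            # Size changed (or first check): restart the stability count.
            stable = 0
            last_size = size
        time.sleep(interval)
    return False
```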

class MultiProcessRecording drops various recording methods

When using DirectoryMultiProcessingAnalyzer, it's not clear why it internally uses the MultiProcessRecording class. This seems to just return a subset of methods/attrs of a passed-in Recording instance (config, detections, path etc), meaning that access to other Recording methods is lost, e.g. extract_detections_as_audio.

Rather than iterating over the list of Recording instances to generate a list of MultiProcessRecording instances, wouldn't it be better to return the original list, as DirectoryAnalyzer does?

I'm happy to work out a PR for this, if you think the above suggestion is a good idea.

Drop support for Python 3.7 as it reaches end-of-life

Python 3.7 will reach end of life on 2023-06-27.
The plan is to drop support for 3.7 on the next release after that date.

Locate a custom classifier and corresponding audio file from the community for using with testing

I have a working test case for the custom classifier functions, but I don't have a custom classifier itself for testing.

If you have a classifier, a suitable .wav or .flac file, and you're willing for your work to be used in birdnetlib's test cases, please let me know.

As per the project license, any shared files will be released as part of this project under the open source GPL-3 license.

Add an example that watches a directory and handles results via a handler

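from birdnetlib.analyzer_lite import LiteAnalyzer
from birdnetlib.watcher import DirectoryWatcher


def on_analyze_complete(recording):
    # Called by the watcher each time a new file has been analyzed.
    print(recording.path)
    print(recording.detections)


analyzer = LiteAnalyzer()

watch_dir = "."  # Directory to watch for new audio files.

watcher = DirectoryWatcher(
    watch_dir, analyzer=analyzer, lon=-120.7463, lat=35.4244, week=18, min_conf=0.4
)
watcher.on_analyze_complete = on_analyze_complete
watcher.watch()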

recording.detections contains non-JSON serializable numpy float32s

recording.detections are intended to be saved in databases and for template renders. It makes sense to convert those values to floats rather than leave that task to the user. Django's JSONField in particular can't save the detections as-is without pre-conversion of the numpy float32s to floats.

Note: this is a fix, so it wouldn't be a breaking change.
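Until the library converts these values itself, a small recursive converter can prepare detections for JSON. The sketch relies on NumPy registering its floating types with the standard `numbers.Real` ABC, so it needs no NumPy import. `to_builtin` is a hypothetical helper name.

```python
import numbers


def to_builtin(value):
    """Recursively convert numeric scalars (e.g. numpy.float32) to plain int/float."""
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    if isinstance(value, bool):
        return value  # bool is an Integral; keep it as True/False.
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    return value
```

For example, `json.dumps(to_builtin(recording.detections))` would then serialize without a custom encoder.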

Spectrogram extraction is slower than expected

Confirmed on Ubuntu and MacOS. Spectrogram extraction gets slower as python iterates through the detection list.

Watchers need a method for always using the current date

Watchers can and will be used in long-running processes. Therefore, we need a method for using the current date with each new recording.

Streaming WAV data

Hi. Thanks for the work on this library - it's fantastic.

I have a Raspberry Pi Zero out in the garden sending WAV data back to a Pi 4 running the library. Rather than copying files around, I made a few updates to the library so it can handle in-memory buffers of WAV data. The changes let me run it in a simple TCP server that receives a constant stream of WAVs.

There is an example of how it works here:

https://github.com/elementechemlyn/birdnetlib/blob/wavbuffer/examples/simple_tcp_server.py

The changes are fairly rudimentary, but they are non-breaking and have been working well for me over the last few months. If you think they might be useful for others and would like a pull request, let me know.

Cheers,
E.

Add a utility class for generating species list based on lat/long/date using BirdNET-Analyzer

This is partially implemented in the Analyzer.return_predicted_species_list method.

The current implementation only uses the uncommon week_48 date format. It also initializes the entire Analyzer model, which runs startup steps that species list generation doesn't need.

Add Python 3.11 to the GitHub Actions

This needs to be added to the test and release actions.

Add the ability to use specific BirdNET-Analyzer models

This would mostly help with reproducibility, but would also aid in benchmarking the various models' performance for specific species.

analyzer22 = Analyzer(version=2.2)
analyzer23 = Analyzer(version=2.3)

# continue with processing audio ...

This is a work-in-progress. Expect a PR soon.

Add formal documentation using mkdocs

Add ability to annotate detections if species is expected

When given the lat/long and datetime, Analyzer filters results by a BirdNET-determined expected species list. This filters out all non-bird sounds, such as engines, coyotes, and frogs.

It would be helpful for each detection to include an "in_species_list" boolean or something similar, rather than filtering out non-avian sounds entirely.

Perhaps it should also include the occurrence frequency value that's output by BirdNET's species.py function.

[{'common_name': 'House Finch',
  'confidence': 0.5744,
  'end_time': 12.0,
  'in_species_list': True,
  'in_species_list_freq': 0.10,
  'scientific_name': 'Haemorhous mexicanus',
  'start_time': 9.0,
  'label': 'Haemorhous mexicanus_House Finch'},
 {'common_name': 'Coyote',
  'confidence': 0.4496,
  'end_time': 9.0,
  'in_species_list': False,
  'in_species_list_freq': None,
  'scientific_name': 'Canis latrans',
  'start_time': 6.0,
  'label': 'Canis latrans_Coyote'},
 {'common_name': 'House Finch',
  'confidence': 0.4496,
  'end_time': 15.0,
  'in_species_list': True,
  'in_species_list_freq': 0.40,
  'scientific_name': 'Haemorhous mexicanus',
  'start_time': 12.0,
  'label': 'Haemorhous mexicanus_House Finch'}]
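A sketch of the annotation step, assuming the expected species are available as a mapping from label to occurrence frequency (the kind of output BirdNET's species-list step could provide). `annotate_detections` and the shape of that mapping are hypothetical.

```python
def annotate_detections(detections, expected_species):
    """Return copies of detections with in_species_list and in_species_list_freq added.

    expected_species maps a label such as "Haemorhous mexicanus_House Finch"
    to its occurrence frequency. Nothing is filtered out.
    """
    annotated = []
    for detection in detections:
        freq = expected_species.get(detection["label"])
        annotated.append({
            **detection,
            "in_species_list": freq is not None,
            "in_species_list_freq": freq,
        })
    return annotated
```

Returning copies leaves the original detections untouched, so the filtered and unfiltered views can coexist.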

Move handler from the library to the examples and tests

There is concern that the handler should be an example rather than part of the library. It's probably more important to provide working code as an example rather than prescribe a preferred schema for the database.

Add a multi analyzer directory watcher

I would like the ability to watch a folder and automatically have more than one analyzer process the files and return results. The watcher should provide a method for returning recording detections from each analyzer, and for signaling when all analyzers are complete.

Make week more clear (BirdNET Analyzers are based on 48 'weeks' in a year)

Context: The BirdNET analyzers use a week value that's ranged 1-48.

From the BirdNET-Analyzer documentation: "Week of the year when the recording was made. Values in [1, 48] (4 weeks per month). Set -1 for year-round species list."

Some users find this ambiguous (is there a preferred python function for converting a date to week-48 format?).

Also, since it's a non-standard interpretation of "weeks", I'd like to standardize with an eye towards the API staying the same for future-proofing. I'd hate to lock in a 48 week standard that has to be reversed if another analyzer ever becomes available.

Options are:

1. Change week to accept a 52-week value, and convert it to BirdNET's required 48-week value in the background.
2. Add a date argument, and use it to compute BirdNET's required 48-week value in the background.
3. Either of the above, but still accept a week_48 parameter to provide consistency for BirdNET users.

I'm leaning toward option 2, with option 3 implemented as a fallback for users who adhere to the legacy BirdNET usage.
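For illustration, one reading of "4 weeks per month" assigns days 1-7, 8-14, 15-21, and 22 onward to weeks 1 to 4 of each month. This mapping is an assumption for the sketch; it is not necessarily the conversion BirdNET or birdnetlib uses internally.

```python
from datetime import date


def week_48(day: date) -> int:
    """Map a date to a week number in [1, 48], assuming exactly 4 weeks per month."""
    # Days 22-31 all fall into the month's 4th week.
    week_in_month = min((day.day - 1) // 7 + 1, 4)
    return (day.month - 1) * 4 + week_in_month
```

Under this mapping, `week_48(date(2022, 5, 10))` returns 18, and December 31 maps to 48.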

ModuleError: tensorflow.python.distribute.load_context

After installing birdnetlib with TensorFlow, librosa, and all other needed dependencies, I've run into the error below. It happens when I try to import birdnetlib into my Python file. I'm using TensorFlow version 2.15.0. Does anyone know a workaround for this? Am I using the wrong version? When I look at the TensorFlow documentation, I can't even find a module called distribute.load_context:

ModuleNotFoundError: No module named 'tensorflow.python.distribute.load_context'