addictedcs / soundfingerprinting

Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.

License: MIT License

C# 99.95% Batchfile 0.03% Shell 0.02%

audio fingerprints algorithm acoustic-fingerprints recognition locality-sensitive-hashing nearest-neighbor-search shazam c-sharp audio-processing

soundfingerprinting's Introduction

Audio/Video fingerprinting and recognition in .NET

soundfingerprinting is a C# framework designed for companies, enthusiasts, and researchers in the fields of digital signal processing, data mining, and audio/video recognition. It implements an efficient algorithm that provides fast insertion and retrieval of acoustic and video fingerprints with high precision and recall.

Documentation

Full documentation is available on the Wiki page.

The code snippet below shows how to extract acoustic fingerprints from an audio file and later use them as identifiers to recognize an unknown audio query. These fingerprints are stored in a configurable datastore.

private readonly IModelService modelService = new InMemoryModelService(); // store fingerprints in RAM
private readonly IAudioService audioService = new SoundFingerprintingAudioService(); // default audio library

public async Task StoreForLaterRetrieval(string file)
{
    var track = new TrackInfo("GBBKS1200164", "Skyfall", "Adele");

    // create fingerprints
    var avHashes = await FingerprintCommandBuilder.Instance
                                .BuildFingerprintCommand()
                                .From(file)
                                .UsingServices(audioService)
                                .Hash();
								
    // store hashes in the database for later retrieval
    modelService.Insert(track, avHashes);
}

Querying

Once you've inserted the fingerprints into the datastore, you can query the storage to recognize the song that your samples come from. The origin of the query samples may vary: file, URL, microphone, radio tuner, etc. Where the samples come from is up to your application.

public async Task<TrackData> GetBestMatchForSong(string file)
{
    int secondsToAnalyze = 10; // number of seconds to analyze from query file
    int startAtSecond = 0; // start at the beginning
	
    // query the underlying database for similar audio sub-fingerprints
    var queryResult = await QueryCommandBuilder.Instance.BuildQueryCommand()
                                         .From(file, secondsToAnalyze, startAtSecond)
                                         .UsingServices(modelService, audioService)
                                         .Query();
    
    return queryResult.BestMatch.Track;
}

Fingerprints Storage

The default storage, which comes bundled with the soundfingerprinting NuGet package, is a plain in-memory storage, available via the InMemoryModelService class. If you plan to use an external persistent storage for fingerprints, Emy is the preferred choice. Emy provides a community version which is free for non-commercial use. More about Emy can be found on the wiki page.

Supported audio/video formats

Read the Supported Media Formats page for details about processing different file formats or realtime streams.

Video fingerprinting support since version 8.0.0

Video fingerprinting has been supported since v8.0.0. As with audio fingerprinting, video fingerprints are generated from video frames and used to insert into and later query the datastore for exact and similar matches. You can use SoundFingerprinting to fingerprint audio content, video content, or both at the same time. More details about video fingerprinting are available here.

Version Matrix

If you are using FFmpegAudioService as described in the wiki, follow the version matrix below.

SoundFingerprinting	SoundFingerprinting.Emy	FFmpeg
8.x	8.x	4.x
9.x	9.x	5.x
10.x	10.x	6.x

FAQ

Can I apply this algorithm for speech recognition purposes?

No. The granularity of one fingerprint is roughly 1.46 seconds.

Can the algorithm detect the exact query position in the resulting track?

Yes.

Can I use SoundFingerprinting to detect ads in radio streams?

Yes. This is in fact the most frequent use case in which SoundFingerprinting has been successfully used.

How many tracks can I store in InMemoryModelService?

100 hours of content with DefaultFingerprintingConfiguration will consume ~5GB of RAM.
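
As a rough back-of-the-envelope sketch, the figure above works out to about 50 MB of RAM per hour of content. The helper below assumes that memory grows linearly with content length, which is an assumption rather than something the library guarantees. `MemoryEstimator` and its members are hypothetical names and are not part of SoundFingerprinting.

```csharp
using System;

// Hypothetical helper, not part of SoundFingerprinting.
public static class MemoryEstimator
{
    // From the FAQ: ~5 GB of RAM per 100 hours with DefaultFingerprintingConfiguration.
    private const double GigabytesPerHour = 5.0 / 100.0;

    // Assumes linear scaling with content length (an assumption, not a guarantee).
    public static double EstimateGigabytes(double hoursOfContent)
    {
        if (hoursOfContent < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(hoursOfContent));
        }

        return hoursOfContent * GigabytesPerHour;
    }
}
```

For example, `MemoryEstimator.EstimateGigabytes(250)` gives an estimate for 250 hours of content under the same linear assumption; actual usage will depend on the configuration and the content itself.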

Get it on NuGet

Install-Package SoundFingerprinting

How it works

Audio Fingerprinting.

Video Fingerprinting.

Demo

My description of the algorithm, alongside the demo project, can be found on CodeProject. The article is from 2011 and may be outdated. The demo project is an Audio File Duplicates Detector. Its latest source code can be found here. It's a WPF MVVM project that uses the algorithm to detect which files are perceptually very similar.

Contribute

If you want to contribute, you are welcome to open issues or join the discussion on the issues page. Feel free to contact me with any remarks, ideas, bug reports, etc.

License

The framework is provided under the MIT license.

soundfingerprinting's Issues

Detecting a song within a stream of songs

With the current version 2.0, is it possible to detect a particular song within a single stream or a long audio file containing various songs, and to get the start and end time at which the song was matched within that stream?

Remove theoretical documents from the repository

Remove theoretical documents from the repository and create links to them on http://addictedcs.github.io/soundfingerprinting resource.

I'm concerned about their legal status, so it's better to keep the potential heat away from the project.

Apply semantic versioning for new releases

http://semver.org/

Add FFTW libraries for x64 platform.

Adjust the Interop classes in order to load both x64 and x86 native libraries.
Within the FFTWService make the call to the appropriate library.

Build a skeleton for NuGet package release of sound fingerprinting library

This will allow others to install and use the library much more efficiently and easily.

Create a description wiki page with the details of how to use the library

Currently it is not clear how one should use the library in a custom application.
The description page should contain details about database initialization, fingerprinting, and querying mechanisms.

Querying Song From Android

Sound fingerprints are obtained for a master list of songs and stored in the database using the Fill database functionality.

To recognize songs from Android, is it possible to match them through an API? How can audio samples be computed on Android and matched against the existing fingerprints in the database?

Fix read from URL by NAudio service.

The NAudio service currently reads from the specified URL in a sample format different from the requested one.

The issue has been described here: http://stackoverflow.com/questions/22385783/aac-stream-resampled-incorrectly.

Waiting for any reliable resolution on that.

Introduce a dependency injection resolver (container) on framework level thus making Soundfingerprinting a completely separate and ready-to-use library

The dependency resolver should have a default implementation that wires up all the necessary dependencies, so that the client is not asked for interfaces they are not aware of.

Split IAudioService into different interfaces that will handle different work

IAudioService defines a contract that covers too much work. It has to be split into a couple of different interfaces: IWaveFileUtility, IMicrophoneRecordingService, IURLStreamingService.

Provide a seamless way of setting Bass license key when using BassAudioService

Read the Bass license key from the configuration file. If it is not present, use the defaults.

NAudioService incorrectly seeks to specified start element.

The startAt parameter does not work well for NAudioService. It seeks only to half of the position specified by startAt.

Generation of log frequencies range should be cached

Every time fingerprints are created, the log frequency ranges are generated again. Even though it is not an expensive operation, there is no need to repeat the work.

Affected classes: LogUtility
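
One possible way to cache the ranges is sketched below. It memoizes the generated indexes per parameter set in a `ConcurrentDictionary`. The class name `CachedLogFrequencies`, its parameters, and the log-spacing formula are assumptions made for illustration; the actual LogUtility may compute its ranges differently.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache; not the library's LogUtility.
public static class CachedLogFrequencies
{
    // One cached array per distinct parameter combination.
    private static readonly ConcurrentDictionary<(int, int, int, int, int), int[]> Cache =
        new ConcurrentDictionary<(int, int, int, int, int), int[]>();

    public static int[] GetIndexes(int sampleRate, int minFrequency, int maxFrequency, int logBins, int fftSize)
    {
        if (sampleRate <= 0 || minFrequency <= 0 || maxFrequency <= minFrequency || logBins <= 0 || fftSize <= 0)
        {
            throw new ArgumentException("Parameters must be positive and maxFrequency must exceed minFrequency.");
        }

        var key = (sampleRate, minFrequency, maxFrequency, logBins, fftSize);
        // Generated on first use, then served from the cache.
        return Cache.GetOrAdd(key, _ => Generate(sampleRate, minFrequency, maxFrequency, logBins, fftSize));
    }

    // Assumed formula: logBins + 1 boundaries spaced logarithmically between
    // minFrequency and maxFrequency, mapped to FFT bin indexes.
    private static int[] Generate(int sampleRate, int minFrequency, int maxFrequency, int logBins, int fftSize)
    {
        var indexes = new int[logBins + 1];
        double ratio = (double)maxFrequency / minFrequency;
        for (int i = 0; i <= logBins; i++)
        {
            double frequency = minFrequency * Math.Pow(ratio, (double)i / logBins);
            indexes[i] = (int)Math.Round(frequency * fftSize / sampleRate);
        }

        return indexes;
    }
}
```

Repeated calls with the same arguments return the same array instance, so callers should treat the result as read-only.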

Implement NAudio ReadMonoFromURL and RecordFromMicrophoneToFile methods.

The following 2 methods from NAudioService have to be implemented

public float[] ReadMonoFromURL(string urlToResource, int sampleRate, int secondsToDownload)
{
    throw new NotImplementedException("Use Bass.NET");
}

public float[] RecordFromMicrophoneToFile(string pathToFile, int sampleRate, int secondsToRecord)
{
    throw new NotImplementedException("Use Bass.NET");
}

Add support for a NoSQL data storage

The current backend, MSSQL, has its limits in terms of performance and scalability. Storing a large number of songs requires a data storage that scales horizontally with ease and performs well.

After studying a large number of NoSQL solutions, I would recommend MongoDB as the winner. Redis would fit perfectly as a caching layer, but it seems to perform worse than other NoSQL database systems when the stored set exceeds the amount of RAM available on a dedicated machine, and the number of songs and fingerprints will definitely exceed RAM capacity. CouchDB seems very similar to Mongo. Because we don't really need master-master replication, as the data will mostly be inserted (with no frequent updates), I would go for Mongo here as well. Other database systems have also been analyzed, though mainly these three were considered good candidates.

Use aggregation capabilities to perform the main query on HashTable - HashBin
Add a compound index on HashTable, HashBin in order to boost performance
Map data models to the same structure as in MSSQL (connect entities via references)
Store the album title as part of Track to minimize lookups for Album. If the album is not known, do not store "untitled"
Sharding (if required) will be performed over the HashTables (25 elements fit perfectly)

Query One Song with Hash Signature

Your code is awesome.

When I try to find a song, your algorithm uses a Hamming distance calculation, which in turn uses the fingerprints of the original song (binaries). Is there a way to find the song accurately using the hashes (int[])? If so, kindly provide code for it.

Thanks

Show similarities of 2 audio files

If we have 2 audio files A1 and A2 which could be drawn as follows:
A1 : XXXXXYYYYZZZZZZZZZZZZZZ (where XXXXXXXXX, YYYYY, ZZZZZZZZZZZ indicate 3 audio regions of different lengths)
A2 : SSSSSSSSSYYYYYTTTTTTTTT (where SSSSSSSSSS, YYYYY, TTTT indicate 3 audio regions of different lengths)

then we could call a similarity function which will tell us that A1 and A2 have a "similar" region of length N seconds which starts at time t1 in A1 and t2 in A2.

For the moment I'm working myself on this, but feel free to help/contribute if you wish.

Thanks,
Valeriu

Fix the associated problem with SubFingerprint vs Fingerprint.

The Fingerprint Service method should return a strongly typed object (e.g. Fingerprint) instead of bool[]. I don't know yet whether this is critical.

public interface IFingerprintService
{
   List<bool[]> CreateFingerprints(float[] samples, IFingerprintingConfiguration fingerprintingConfiguration);
}

For FingerprintUnit

RunAlgorithm - return Fingerprints
RunAlgorithmWithHashing - return SubFingerprint

new version versus the one published in CodeProject article

Hi,

When I download the exe published in the CodeProject article (http://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting?msg=4829666#xx4829666xx) and evaluate it on 250 songs, it gives accurate results. But when I download the source code from here and test it on the same songs, it does not give the same results. Can I get the source code for the version published on CodeProject? I want to work on it...

Thanks,
Sateesh

Build perf test to profile the algorithm

Perf tests are required to profile how fast and how accurately the algorithm performs, and to compare results across versions of the application.

Out of Memory Exception

The following code was throwing an out-of-memory exception in my application:

[File Name : BassAudioService.cs]
// get re-sampled/mono data
int bytesRead = Bass.BASS_ChannelGetData(mixerStream, buffer, buffer.Length * 4);
if (bytesRead == 0)
{
    break;
}

bytesRead was returning -1 when the limit was reached, but the while loop kept running for a long time and chunks.Add(chunk) eventually caused the memory exception. Do I need to change the check to:

if (bytesRead == -1)
{
    break;
}

Am I missing something?
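
A minimal sketch of a read loop that stops on both return values is shown below. It assumes, as the BASS documentation states, that BASS_ChannelGetData returns -1 on error, which includes reaching the end of the channel. To stay self-contained, the decoder call is replaced by a `read` delegate; `ChunkReader` and `ReadAll` are hypothetical names, not the code in BassAudioService.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; 'read' stands in for the BASS decoder call.
public static class ChunkReader
{
    // 'read' fills the buffer and returns the number of BYTES read,
    // 0 when no data is available, or -1 on error / end of stream.
    public static List<float[]> ReadAll(Func<float[], int> read, int bufferLength)
    {
        var chunks = new List<float[]>();
        var buffer = new float[bufferLength];
        while (true)
        {
            int bytesRead = read(buffer);
            if (bytesRead <= 0)
            {
                break; // covers both 0 and -1, so the loop always terminates
            }

            // Convert bytes to samples and never copy more than the buffer holds.
            int samplesRead = Math.Min(bytesRead / sizeof(float), buffer.Length);
            var chunk = new float[samplesRead];
            Array.Copy(buffer, chunk, samplesRead);
            chunks.Add(chunk);
        }

        return chunks;
    }
}
```

Checking `bytesRead <= 0` rather than only `== 0` or only `== -1` handles both cases with a single condition.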

Upgrade to NAudio 1.7 stable

Split soundfingerprinting repository into multiple git repos

The soundfingerprinting repository has grown immensely over the last few years. It's time to split it up according to responsibilities:

soundfingerprinting
- soundfingerprinting.audio
- soundfingerprinting.audio.naudio // as the default audio engine
- soundfingerprinting.infrastructure
- soundfingerprinting.math
- soundfingerprinting.minhash.permutations
- soundfingerprinting.fft.exocortex

List of dependent libraries on soundfingerprinting core (with their own release planning etc.):

soundfingerprinting.audio.bass
soundfingerprinting.sql
soundfingerprinting.mongodb
soundfingerprinting.neuralhasher
soundfingerprinting.soundtools
soundfingerprinting.duplicatesdetector

Dependent libraries should contain their own build cmdlets, as well as unit test suites.

Combine AudioService, StreamingUrlReader and SoundCaptureService under the same interface

AudioService, StreamingUrlReader, SoundCaptureService provide similar functionality which can be abstracted under the same interface, or at least combined under the same naming convention: AudioFileReader, StreamingUrlReader, AudioDeviceReader. This will help develop commands that will fingerprint/query using different underlying readers (source).

Remove bindings for public interfaces

Public interfaces should not contain any default bindings, since those could be used inadvertently without the caller knowing about it.

Only internal interfaces should have them.

Implement Timsort in FingerprintDescriptor class

Two adjacent fingerprints from the same audio snippet are usually very similar in terms of their top 200 wavelet locations. To take advantage of this characteristic, we can keep track of the indexes of the previously sorted array and reuse them when FingerprintDescriptor.ExtractTopWavelets is called.

A good implementation of Timsort algorithm can be found here.

The method below has to be modified:

public bool[] ExtractTopWavelets(float[][] frames, int topWavelets)
{
    int rows = frames.GetLength(0); /*128*/
    int cols = frames[0].Length; /*32*/
    float[] concatenated = new float[rows * cols]; /* 128 * 32 */
    for (int row = 0; row < rows; row++)
    {
        Buffer.BlockCopy(frames[row], 0, concatenated, row * frames[row].Length * 4, frames[row].Length * 4);
    }

    int[] indexes = Enumerable.Range(0, concatenated.Length).ToArray();
    Array.Sort(concatenated, indexes, absComparator); // TODO Use Timsort
    bool[] result = EncodeFingerprint(concatenated, indexes, topWavelets);
    return result;
}

Error running build.cmd

Hi.
I'm trying to run build.cmd, but it ends with the following error (the Portuguese message reads: the command "..." exited with code -532462766):

D:\Documentos\soundfingerprinting-master\build\Build.proj(33,2): error MSB3073: O comand
o "..\tools\MSTest.exe /testcontainer:Release\UnitTests\SoundFingerprinting.Tests.dll" concluiu com o código -532462766.

What can be the reason?
My system is Windows 8.1 x64.

Thanks, and sorry for the poor English.

Substitute DirectSound with NAudio

The DirectSound service has to be deleted from the project. It is of little use, since it reads only .wav files.
Substitute DirectSound with NAudio in the unit test project and delete DirectSound (amen).

Add SupportedFormats property on IAudioService

SupportedFormats has to return a list of strings naming the formats that the AudioService supports.

Add new methods in FingerprintCommand which will allow generating fingerprints from StreamingUrl

Extend FingerprintCommand so that fingerprints can be generated from StreamingUrlReader. This will allow fingerprint generation directly from a URL resource.

Denormalize Album class and embed it to Track entity

The Album model can be safely embedded into Track, as it mainly represents read-only information (it will be updated only if an error was introduced, which is unlikely). Since it is a typical read-only model, I would suggest denormalizing it into Track:

Album name
Album release year (is it really useful?)

Mark those fields as nullable (for the case when album information is not known).

Audio matching from smartphone records

Hi,

I tried to use your library to build a "Shazam like" app for myself. For testing purposes only, I recorded an audio sample with my smartphone, imported it to my computer and queried the database with it.

Unfortunately, with the default configuration, I never get a correct match for the song. The recording is not too bad; there's no significant noise, since it was recorded at home in a quiet room from my main speaker system.

Still, the default configuration works great for finding song duplicates, so I'm just wondering if there's any configuration parameter that can be specifically tuned to reach my goal.

Thanks

Xavier Q.

Remove option of storing permutations into the database

Permutations don't have to be stored in the database. They rarely (actually never) get changed.

Remove database permutations related code from the SoundFingerprinting project. Create a patch to delete them from the database (as well as their associated stored procedures).

Add new methods to QueryCommand which will allow querying the datasource directly from StreamingUrl

Add an option to query the underlying datasource with fingerprints generated directly from a streaming URL.

Seconds to query has to adjust according to audio snippet length

If the seconds to query exceed the audio snippet length, the algorithm has to query only the amount from which it can actually generate fingerprints.
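
The adjustment described above can be sketched as a simple clamp. `QueryLength.Clamp` and its parameters are hypothetical names used for illustration; this is not the library's implementation, and it assumes the snippet length is already known in seconds.

```csharp
using System;

// Hypothetical helper, not part of SoundFingerprinting.
public static class QueryLength
{
    // Returns how many whole seconds can actually be queried, given the
    // requested length, the start offset, and the snippet length.
    public static int Clamp(int secondsToQuery, int startAtSecond, double snippetLengthInSeconds)
    {
        if (secondsToQuery < 0 || startAtSecond < 0 || snippetLengthInSeconds < 0)
        {
            throw new ArgumentException("Arguments must be non-negative.");
        }

        // Audio available after the start offset (zero if the offset is past the end).
        double available = Math.Max(0, snippetLengthInSeconds - startAtSecond);
        return (int)Math.Min(secondsToQuery, Math.Floor(available));
    }
}
```

With this sketch, requesting 10 seconds from a 6.5-second snippet starting at second 0 is reduced to the 6 whole seconds that are actually available.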

Get rid of Bass library dependency because of associated legal issues

Consider any other freely accessible library that is able to read and down-sample audio files of different formats (.mp3 support is mandatory).

NAudio seems a good candidate - http://naudio.codeplex.com

New service should implement methods defined in IAudioService and IExtendedAudioService.

Hide methods from IModelService which are database specific

IModelService contains methods which expose the underlying database structure. In order to integrate NoSQL data support seamlessly, the underlying model structure has to be hidden.
IModelService has to contain only the following methods:

{
    IModelReference InsertTrack(TrackData track);
    void InsertHashDataForTrack(IEnumerable<HashData> hashes, IModelReference trackReference);
    IList<SubFingerprintData> ReadSubFingerprintDataByHashBucketsWithThreshold(long[] buckets, int threshold);
}

Rename ReadSubFingerprintDataByHashBucketsWithThreshold method to better depict underlying functionality.

Include permutations for MinHash algorithm in a compiled class

Currently, permutations are read either from the database or from a local file. To make the library more resilient, include a class with predefined permutations declared in a static array. This will allow deploying the DLLs without accompanying files.
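
A minimal sketch of such a class follows. The interface `IPermutationsSource`, the class `EmbeddedPermutations`, and the permutation values are all hypothetical placeholders; the real MinHash permutations are much larger and are not reproduced here.

```csharp
// Hypothetical contract for anything that can supply MinHash permutations.
public interface IPermutationsSource
{
    int[][] GetPermutations();
}

// Permutations compiled into the assembly, so no database or file is needed.
public sealed class EmbeddedPermutations : IPermutationsSource
{
    // Placeholder values for illustration only, not the library's permutations.
    private static readonly int[][] Permutations =
    {
        new[] { 3, 0, 2, 1 },
        new[] { 1, 3, 0, 2 },
    };

    // Returns the shared static data; callers should treat it as read-only.
    public int[][] GetPermutations()
    {
        return Permutations;
    }
}
```

Keeping the data behind an interface would still allow a database-backed or file-backed source to be swapped in where needed.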

Attempting to test audio fingerprinting using v2

Unsure how to test v2 of this library. I attempted to test using the NuGet version, but that seems to be 1.0.1.

How do I use version 2.0?

-- Just realised that the github code trunk is currently at version 2...

Remove usage of window functions from the Soundfingerprinting framework

Window functions (Hann, Hamming, etc.) did not bring any benefit when applied, meaning that they are a useless component of the system.

Removing them will boost the performance of all of the FFT services.

NormalizeSignal flag used in a wrong service.

Normalizing the audio signal has to be performed transparently, outside of SpectrumService. Move it to a separate step within the transformation cycle.

suggestion : replace 3 doors down (Kryptonite) with Open Source track

A variety of songs can be downloaded from http://incompetech.com/. The songs are available under a creative commons licence : http://incompetech.com/music/royalty-free/licenses/

I would suggest that adding a note to the readme.md file would be sufficient to attribute the songs.

I'd suggest going to http://incompetech.com/music/royalty-free/ and using the search tools to pick a song to include.

Rename Soundfingerprinting assembly to SoundFingerprinting so that it follows upper camel case

Rename the duplicates detector and sound tools namespaces as well.
Do not forget the Readme file and the NuGet package description.

Create a description with all algorithms and third party libraries used in the project

Create a description page on the http://addictedcs.github.io/soundfingerprinting resource with details about all algorithms that have been used in the implementation. Also list all third party libraries used by the project, with links to their pages.
