Hi, I'm interested in finding near-duplicate audio files. My dataset is about 3000


Comparing short audio files about neural-audio-fp HOT 1 CLOSED

mimbres commented on May 29, 2024

@galarlo
Thanks for your interest in our work.

Yes, this is probably feasible. I would start with the default setup and train the model on the train set of this repo. Reducing the segment length or increasing the dimension can improve performance, but at the scale of your dataset it doesn't seem necessary. The default segment length of 1 s can handle 0.5 s inputs by simple zero-padding.
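As an illustration of the zero-padding idea, here is a minimal sketch. The function name `pad_to_segment` and the 8 kHz sample rate are assumptions made for this example, not part of the repo's API; adjust them to your own configuration.

```python
import numpy as np

def pad_to_segment(audio: np.ndarray, sample_rate: int = 8000,
                   segment_sec: float = 1.0) -> np.ndarray:
    """Right-pad a mono waveform with zeros up to one full segment.

    sample_rate and segment_sec are assumed values for illustration.
    Inputs that are already long enough are returned unchanged.
    """
    target_len = int(round(sample_rate * segment_sec))
    if audio.shape[0] >= target_len:
        return audio
    return np.pad(audio, (0, target_len - audio.shape[0]))

# A 0.5 s clip at 8 kHz becomes a 1 s segment whose second half is silence.
half_second = np.ones(4000, dtype=np.float32)
padded = pad_to_segment(half_second)
```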

This repo performs segment-level search, whereas your scenario is file-level search, so the post-processing side needs to be modified. The current search method first outputs a Top@K list of matching segments for each input, and then produces a match-ranked list from the Top@C candidates. Since it does not store the 'segment ID'-to-'file ID' mapping, you may need to build that mapping yourself to produce a file-level match ranking.
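One possible way to build the segment-to-file mapping and turn segment matches into a file ranking is sketched below. The helpers `build_segment_to_file` and `rank_files` are hypothetical and not part of the repo; the sketch assumes you know how many fingerprint segments each file contributed, in the same order as they were added to the search index.

```python
from collections import Counter
import numpy as np

def build_segment_to_file(segment_counts):
    """Map each segment index in the DB to the index of its source file.

    segment_counts[i] is the number of segments stored for file i,
    in the order the fingerprints were added (an assumption here).
    """
    return np.repeat(np.arange(len(segment_counts)), segment_counts)

def rank_files(segment_ids, seg2file, top_n=5):
    """Rank files by how many of the matched segments belong to them."""
    votes = Counter(seg2file[np.asarray(segment_ids)].tolist())
    return votes.most_common(top_n)

# Three files with 3, 2 and 4 segments respectively.
seg2file = build_segment_to_file([3, 2, 4])
# Segment IDs returned by a segment-level search for one query.
ranking = rank_files([0, 1, 4, 8, 7, 6], seg2file)
```

Simple vote counting ignores the similarity scores; weighting each vote by its score would be a natural variation.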

We don't have any threshold parameter that directly controls false positives (FP) and false negatives (FN). However, the number of segment-search outputs K and the number of candidates C are somewhat related to FP and FN.

neural-audio-fp/eval/eval_faiss.py

Line 88 in 058d812

@click.option('--k_probe', '-k', default=20, type=click.INT,

neural-audio-fp/eval/eval_faiss.py

Line 232 in 058d812

pred_ids = candidates[np.argsort(-_scores)[:10]]

K=20 and C=10 by default; increasing K and C will yield fewer FN. Another issue is that your scenario allows inputs of various lengths, while the current method uses a fixed length for each search. You may need a way to combine the results from inputs of different lengths into a final estimate.
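One way to handle variable-length inputs, offered only as a sketch, is to cut each input into fixed-length windows, search each window separately, and score every file by the fraction of windows that matched it. The names `split_into_segments` and `file_scores`, the 8000-sample window, and the 4000-sample hop are all assumptions for illustration and do not come from the repo.

```python
from collections import Counter
import numpy as np

def split_into_segments(audio, segment_len=8000, hop=4000):
    """Cut a waveform into fixed-length windows, zero-padding the tail."""
    if len(audio) <= segment_len:
        starts = [0]
    else:
        starts = range(0, len(audio) - segment_len + hop, hop)
    windows = []
    for start in starts:
        chunk = audio[start:start + segment_len]
        windows.append(np.pad(chunk, (0, segment_len - len(chunk))))
    return np.stack(windows)

def file_scores(per_window_file_hits):
    """Score each file by the fraction of query windows that matched it.

    per_window_file_hits holds, for each query window, the file IDs
    found among its search results. Dividing by the number of windows
    keeps scores comparable across inputs of different lengths.
    """
    n_windows = len(per_window_file_hits)
    counts = Counter(f for hits in per_window_file_hits for f in set(hits))
    return {f: c / n_windows for f, c in counts.items()}

windows = split_into_segments(np.ones(10000, dtype=np.float32))
scores = file_scores([{1, 2}, {1}, {1, 3}])
```

A score close to 1.0 means nearly every window of the input matched that file; a cutoff on this score could then serve as the threshold that trades off FP against FN.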
