TSSDNet is a time-domain synthetic speech detection network for end-to-end synthetic speech detection. It comes in two variants modeled on the classic ResNet and Inception network structures: Res-TSSDNet and Inc-TSSDNet. Both achieve state-of-the-art EER on the ASVspoof 2019 challenge and show promising generalization when tested on ASVspoof 2015.
I use two PCs; they have different specs but use the exact same Python libraries.
On one of them, reading some audio files with the soundfile library (sf.read) causes a floating-point issue: a sample component ends up with an extremely large value.
I recommend using torchaudio.load instead of sf.read; this fixes the issue.
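To illustrate the workaround, here is a minimal sketch that loads a file with torchaudio.load and rejects waveforms containing non-finite or out-of-range samples. The helper names count_bad_samples and load_checked, and the limit of 1.0 (which assumes torchaudio's default normalized float output), are my own assumptions and not part of the repository.

```python
import math


def count_bad_samples(samples, limit=1.0):
    """Count samples that are NaN, infinite, or larger in magnitude than `limit`.

    `samples` is any iterable of floats. The default limit of 1.0 assumes
    audio normalized to [-1, 1]; adjust it for other sample formats.
    """
    return sum(1 for s in samples if not math.isfinite(s) or abs(s) > limit)


def load_checked(path, limit=1.0):
    """Hypothetical helper: load audio with torchaudio and sanity-check it."""
    import torchaudio  # imported lazily so the checker works without it

    # torchaudio.load returns (waveform, sample_rate); the waveform is a
    # float tensor of shape [channels, time], normalized by default.
    waveform, sample_rate = torchaudio.load(path)
    bad = count_bad_samples(waveform.flatten().tolist(), limit)
    if bad:
        raise ValueError(f"{path}: {bad} sample(s) are non-finite or exceed {limit}")
    return waveform, sample_rate
```

Running count_bad_samples on the output of sf.read for the affected files (for example via data.ravel().tolist()) should show which files trigger the problem on the affected PC.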
Thanks for providing the pre-trained model. I tested it on the FoR dataset by aptly lab, and it does not work at all!
EER = 44% on the eval set
Accuracy = 48.64%
Could you please explain why the model works fine on the ASVspoof 2019 dataset but does not perform as expected on a new, unseen dataset?
Also, please share your email address so that I can contact you further.
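For anyone trying to reproduce the numbers above, here is a minimal sketch of how an EER can be computed from detection scores. It assumes that a higher score means "more likely bona fide" and uses a simple threshold sweep; the function name compute_eer is hypothetical, and this is not necessarily the evaluation script used for the reported results.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate with a threshold sweep.

    Assumes higher scores indicate bona fide speech. A trial is accepted
    when its score is >= the threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above all scores.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(float("inf"))

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

An EER of 50% corresponds to chance-level detection, so it is worth checking that the score polarity and label mapping match what the evaluation code expects before comparing results across datasets.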