css_with_conformer's Introduction

Hi, I'm Sanyuan Chen 👋

css_with_conformer's People

Contributors

sanyuan-chen

css_with_conformer's Issues

Problem when testing the model on audio

Your model raises an error on my test audio:
raise RuntimeError("Got 2D (single channel) input and can "
RuntimeError: Got 2D (single channel) input and can not extract spatial features
How should I resolve this?
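
The message indicates that the feature extractor received a 2D array, which it treats as single-channel audio, while spatial features need more than one channel. A minimal sketch of a shape check to run before inference follows. The `(batch, samples)` and `(batch, channels, samples)` layouts, the helper name `count_channels`, and the 7-channel example are assumptions for illustration, not taken from the repository.

```python
import numpy as np


def count_channels(batch):
    """Return the channel count of a waveform batch.

    Assumed layouts (hypothetical, not confirmed by the repository):
      2D -> (batch, samples), treated as single-channel
      3D -> (batch, channels, samples)
    """
    batch = np.asarray(batch)
    if batch.ndim == 2:
        return 1
    if batch.ndim == 3:
        return batch.shape[1]
    raise ValueError(f"expected a 2D or 3D array, got {batch.ndim}D")


mono = np.zeros((1, 16000))       # one single-channel utterance
multi = np.zeros((1, 7, 16000))   # one multi-channel utterance (7 is illustrative)

for name, batch in [("mono", mono), ("multi", multi)]:
    channels = count_channels(batch)
    usable = "yes" if channels > 1 else "no"
    print(f"{name}: {channels} channel(s), spatial features possible: {usable}")
```

If the recording really is single-channel, a model or configuration that does not rely on spatial features would presumably be needed; that choice depends on the repository's options, which are not described here.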

Model Training (dropout, batch size, STFT?)

Thanks for sharing the code. I have some questions about model training.

  1. What is the batch size during training? Is it 1, with gradients accumulated every 4 samples?
  2. Is dropout deactivated during training? According to "Investigation of Practical Aspects of Single Channel Speech Separation for ASR", dropout is not used.
  3. How long does it take to train the model?
  4. What is the STFT configuration? I think the pre-trained model uses a 512-point STFT with half overlap, which differs slightly from the setup quoted below. Is it also the case that "log" is not applied to the spectrogram?

"The 25 ms frame size with the frame shift of 10 ms is used for feature generation. A 512-point FFT size and hamming window are used in (i)STFT, forming the 257-dimensional masks and spectrum. The log spectrogram with utterance-wise mean variance normalization is extracted as the input feature for all the separation models."

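To make the two setups in question 4 concrete, here is a minimal NumPy sketch of the feature described in the quoted passage: Hamming-windowed frames, a 512-point FFT giving 257 bins, a log spectrogram, and utterance-wise mean variance normalization. It assumes a 16 kHz sample rate (so 25 ms is 400 samples and 10 ms is 160 samples), no padding at the utterance edges, and normalization per frequency bin over time; none of these details are stated in the issue, and the function name is hypothetical.

```python
import numpy as np


def log_spectrogram_mvn(wave, frame_len=400, frame_shift=160, n_fft=512, eps=1e-8):
    """Log-magnitude spectrogram with utterance-wise mean/variance normalization.

    Defaults assume 16 kHz audio: 400 samples = 25 ms, 160 samples = 10 ms.
    """
    window = np.hamming(frame_len)
    n_frames = 1 + (len(wave) - frame_len) // frame_shift
    # Slice the waveform into overlapping frames and apply the window.
    frames = np.stack([
        wave[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each frame to n_fft and returns n_fft // 2 + 1 bins.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    log_spec = np.log(magnitude + eps)
    # Normalize each frequency bin over the whole utterance (assumed convention).
    mean = log_spec.mean(axis=0)
    std = log_spec.std(axis=0)
    return (log_spec - mean) / (std + eps)


rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)  # 1 s of noise at the assumed 16 kHz

paper_feats = log_spectrogram_mvn(wave)                                  # 25 ms / 10 ms
half_feats = log_spectrogram_mvn(wave, frame_len=512, frame_shift=256)   # half overlap
print(paper_feats.shape, half_feats.shape)
```

Both setups yield 257 frequency bins; they differ in the window length and in the number of frames per second, which is why a checkpoint trained with one is unlikely to behave identically with the other.
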
Long-audio separation

Hello, how should the num_speaker parameter be set when separating long recordings?

What should be the loss function when training?

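Hello, how is the loss function set up for model training? The related Microsoft papers suggest using an MSE loss on the energy spectrum (shown in a screenshot attached to the original issue), but when I train with this loss alone, the training loss does not go down. Did you add other losses or modify this one? Thank you.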

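How should the model be fine-tuned on a custom dataset?

Thank you very much for your work!
Your model achieves very good separation on the data from our scenario.
However, it still produces some noise, and we would like to fine-tune it to address this.
Could you explain how to fine-tune it on a custom dataset?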

Unable to download from Drive and Azure

Hello!

I am unable to download the model weights from either Drive or Azure. Is there any possibility of making them available again? (Preferably uploaded to the repository for stable long-term storage.)

Best regards, Ludvig J.

Loss function (MSE or RMSE) & the scale of the loss

When training the conformer, did you use PIT MSE or RMSE?

To compute the MSE, is it correct to use nn.MSELoss in PyTorch? By default it divides the loss by the total number of elements. Should I set the reduction to "sum"?

I think the scale of the loss may influence the training process (https://stats.stackexchange.com/questions/346299/whats-the-effect-of-scaling-a-loss-function-in-deep-learning), so could you please provide details of how MSE is computed?

Looking forward to your reply!
Thanks for your help @Sanyuan-Chen
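
For reference, a minimal sketch of how the two reductions relate under permutation invariant training (PIT). It uses NumPy rather than PyTorch to stay dependency-light; `reduction="mean"` and `reduction="sum"` mirror the semantics of the same-named `nn.MSELoss` options. The function names, the `(num_speakers, frames, bins)` layout, and the toy data are assumptions for illustration and do not describe how the released model was trained.

```python
from itertools import permutations

import numpy as np


def mse(estimate, target, reduction="mean"):
    """Squared error reduced like nn.MSELoss: mean over all elements, or sum."""
    squared_error = (estimate - target) ** 2
    if reduction == "mean":
        return squared_error.mean()
    if reduction == "sum":
        return squared_error.sum()
    raise ValueError(f"unknown reduction: {reduction}")


def pit_mse(estimates, targets, reduction="mean"):
    """Return (loss, permutation) for the speaker assignment with the lowest loss.

    estimates, targets: arrays of shape (num_speakers, frames, bins).
    permutation[i] is the target index matched to estimate i.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = None, None
    for perm in permutations(range(num_speakers)):
        loss = sum(mse(estimates[i], targets[p], reduction)
                   for i, p in enumerate(perm))
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 100, 257))
# Estimates are the targets in swapped speaker order plus a little noise.
estimates = targets[::-1] + 0.01 * rng.standard_normal((2, 100, 257))

loss_mean, perm_mean = pit_mse(estimates, targets, reduction="mean")
loss_sum, perm_sum = pit_mse(estimates, targets, reduction="sum")
print(perm_mean, perm_sum, loss_sum / loss_mean)
```

For inputs of a fixed size the two reductions differ only by a constant factor (the number of elements per speaker, here 100 × 257), so they select the same permutation. With plain SGD that factor acts like a rescaled learning rate; with variable-length utterances the factor changes per sample, so the choice then also changes how utterances are weighted.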
