Some questions about pad_with_border (yongxuUSTC/sednn)

Nickkk1124 commented on August 27, 2024

Sorry, one more question: if I want to use this speech enhancement system as a front end for ASR, how do I do this?

Many thanks,
Nick

from sednn.

qiuqiangkong commented on August 27, 2024

Hi Nick,

The picture you show is correct. pad_with_border simply extends the left and right borders. You can obtain the enhanced speech by running this code, and then apply ASR afterwards.

Best wishes,
Qiuqiang
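A minimal NumPy sketch of the border padding described above, assuming the input is a 2D spectrogram of shape (n_frames, n_freq). The function body is written for illustration and is not copied from the sednn repository. Repeating the first and last frames means that, with n_concat = 2 * n_pad + 1 frames of context and a hop of 1, every original frame, including the edge frames, can sit at the center of a full segment.

```python
import numpy as np

def pad_with_border(x, n_pad):
    """Repeat the first and last frames n_pad times at each end.

    x: 2D array of shape (n_frames, n_freq).
    Returns an array of shape (n_frames + 2 * n_pad, n_freq).
    """
    head = np.repeat(x[:1], n_pad, axis=0)   # copies of the first frame
    tail = np.repeat(x[-1:], n_pad, axis=0)  # copies of the last frame
    return np.concatenate([head, x, tail], axis=0)

# Example: 4 frames with 2 frequency bins, padded by 1 frame on each side.
x = np.arange(8, dtype=float).reshape(4, 2)
x_pad = pad_with_border(x, 1)
print(x_pad.shape)  # (6, 2)
```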

Nickkk1124 commented on August 27, 2024

Hello Qiuqiang,

The function mat_2d_to_3d converts features to shape (n_segs, n_concat, n_freq).

The center frame of the first stacked segment is t=1, so shouldn't the center frame of the second segment be t=2?

But as shown in the following figure, the center frame of the second segment is t=4. Why is that?

Many thanks,

Nick
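Whether the second center frame is t=2 or t=4 depends on the hop between segments. The sketch below assumes 3-frame segments (n_concat = 3) and 0-based frame indices. mat_2d_to_3d here is a simplified re-implementation that only mirrors the output shape described above, not the repository's exact code, and center_frames is a hypothetical helper that lists the center index of each segment. With hop = 1 consecutive segments overlap and the centers are 1, 2, 3, ...; with hop = 3 they do not overlap and the centers are 1, 4, 7.

```python
import numpy as np

def mat_2d_to_3d(x, n_concat, hop):
    """Cut a (n_frames, n_freq) array into segments of n_concat frames.

    Segment i covers frames [i * hop, i * hop + n_concat).
    Returns an array of shape (n_segs, n_concat, n_freq).
    """
    n_frames, n_freq = x.shape
    starts = range(0, n_frames - n_concat + 1, hop)
    segs = [x[s:s + n_concat] for s in starts]
    return np.array(segs).reshape(-1, n_concat, n_freq)

def center_frames(n_frames, n_concat, hop):
    """Center frame index of each segment (n_concat assumed odd)."""
    return [s + n_concat // 2 for s in range(0, n_frames - n_concat + 1, hop)]

print(center_frames(9, 3, 3))  # [1, 4, 7]
print(center_frames(9, 3, 1))  # [1, 2, 3, 4, 5, 6, 7]
```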

yongxuUSTC commented on August 27, 2024

Hi Nick,

Yes, you can use the enhanced features for ASR. However, you may need to retrain your back-end acoustic model on them, or train it jointly with the enhancement model.

Good luck.

Best regards,
yong

Nickkk1124 commented on August 27, 2024

Hi Yong,

Thank you for your reply! I have some further questions:

1. By "enhanced features for ASR", do you mean the log power spectrogram magnitudes?
2. Is it feasible to use the recovered enhanced wav as ASR input?
3. What would you recommend for applying the enhancement system to environmental noise?

Many thanks,
Nick

qiuqiangkong commented on August 27, 2024

Hi Nick,

The picture you drew is correct: center frame = 1 and center frame = 4 in your drawing. It also depends on the hop.

On "enhanced features for ASR": it means either the enhanced spectrogram or the log power spectrogram.

On using the recovered enhanced wav as ASR input: it is feasible if the dataset is small. However, bear in mind that any speech denoising method will lose some information. Some work did joint enhancement and recognition.

On applying the enhancement system to environmental noise: I think it should be fine, as long as the noise used for training covers most environmental noise.

Best wishes,
Qiuqiang
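A common way to recover an enhanced wav is to combine the enhanced magnitude with the phase of the noisy input and invert the STFT. The following is a minimal NumPy sketch under stated assumptions: the enhancement output is taken to be a natural-log magnitude spectrogram, the STFT uses a Hann window with weighted overlap-add, and stft, istft and recover_wav are hypothetical helpers. sednn's own recovery code may differ in window, scaling and padding.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Hann-windowed STFT; returns (n_frames, n_fft // 2 + 1) complex bins."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft, hop):
    """Inverse STFT by weighted overlap-add, normalised by the summed squared window."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def recover_wav(enh_log_mag, noisy_spec, n_fft, hop):
    """Combine an enhanced log-magnitude with the noisy phase and invert."""
    enh_mag = np.exp(enh_log_mag)        # undo the assumed natural-log compression
    noisy_phase = np.angle(noisy_spec)   # phase is not enhanced; reuse the noisy one
    return istft(enh_mag * np.exp(1j * noisy_phase), n_fft, hop)

# Sanity check: feeding back the noisy log-magnitude should reproduce the input
# away from the first and last frame, where the window tapers to zero.
rng = np.random.default_rng(0)
n_fft, hop = 256, 64
noisy = rng.standard_normal(n_fft + hop * 60)
spec = stft(noisy, n_fft, hop)
wav = recover_wav(np.log(np.abs(spec) + 1e-12), spec, n_fft, hop)
print(np.allclose(wav[n_fft:-n_fft], noisy[n_fft:-n_fft]))
```

Reusing the noisy phase keeps the model simple (it only predicts magnitudes), which is also one source of the information loss mentioned in the reply above.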

akshayaCap commented on August 27, 2024

Hello Qiuqiang,

This is great work. It would be very helpful if you could elaborate on the following point from your reply above:

"...method will lose some information. Some work did joint enhancement and recognition."

I get the point about information loss. Could you tell me more about joint enhancement and recognition?

Is it two interlinked DNN models, or preprocessing followed by ASR?

Thank you.

qiuqiangkong commented on August 27, 2024

Hi Nick,

If speech enhancement and ASR are done separately, ASR performance might be reduced, because speech enhancement sometimes also removes useful information from the speech. However, combining them into a single neural network might help. For example, speech enhancement could form the lower layers of the network and ASR the higher layers, with a loss function that combines the ASR and speech enhancement losses. This is just my conjecture, and I am not aware whether such work exists.

Best wishes,
Qiuqiang
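A toy NumPy sketch of the combined loss described in the reply, for illustration only. It assumes a single linear layer stands in for the enhancement network (lower layers) and a softmax layer over frame-level labels stands in for the ASR network (higher layers). The weights W_se and W_asr, the frame labels and the mixing weight alpha are all hypothetical. A real system would use deeper networks, a proper ASR objective, and backpropagate the combined loss through both parts.

```python
import numpy as np

def se_layers(noisy, W_se):
    """Lower layers (enhancement): noisy features -> enhanced features."""
    return noisy @ W_se

def asr_layers(enhanced, W_asr):
    """Higher layers (ASR): enhanced features -> per-frame class posteriors."""
    logits = enhanced @ W_asr
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def joint_loss(noisy, clean, labels, W_se, W_asr, alpha=0.5):
    """alpha * enhancement MSE + (1 - alpha) * frame-level cross-entropy."""
    enhanced = se_layers(noisy, W_se)
    se_loss = np.mean((enhanced - clean) ** 2)
    probs = asr_layers(enhanced, W_asr)
    asr_loss = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return alpha * se_loss + (1 - alpha) * asr_loss, se_loss, asr_loss

# Toy shapes: 100 frames, 64 frequency bins, 10 output classes.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((100, 64))
clean = rng.standard_normal((100, 64))
labels = rng.integers(0, 10, size=100)
W_se = rng.standard_normal((64, 64)) * 0.1
W_asr = rng.standard_normal((64, 10)) * 0.1
total, se_loss, asr_loss = joint_loss(noisy, clean, labels, W_se, W_asr, alpha=0.3)
print(f"total={total:.3f} se={se_loss:.3f} asr={asr_loss:.3f}")
```

With alpha = 1 only the enhancement target is optimised; with alpha < 1 the recognition loss also shapes the enhancement layers, since its gradient flows back through them. That coupling is what distinguishes joint training from running the two stages separately.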

yongxuUSTC commented on August 27, 2024

Hi Nick,

Yes, there are joint SE & ASR training papers:
https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html
https://ieeexplore.ieee.org/abstract/document/7178797/

Best regards,
yong

akshayaCap commented on August 27, 2024

Dear Yong,
"
Yes, there are joint SE & ASR training papers:
https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html
https://ieeexplore.ieee.org/abstract/document/7178797/
"
It was an informative read. It would be great if you could post a link to their implementation (source code).

Thank you,
Akshaya
