
ctcnet's Introduction

Hey 👋🏽, I'm Kai Li!

My name is Kai Li (Chinese name: 李凯). I'm a second-year master's student in the Department of Computer Science and Technology, Tsinghua University, supervised by Prof. Xiaolin Hu (胡晓林). I am also a member of the TSAIL Group directed by Prof. Bo Zhang (张钹) and Prof. Jun Zhu (朱军). I am an intern at Tencent AI Lab, mainly doing research on causal speech separation, supervised by Yi Luo (罗艺).

🤗   I open-source my work to the best of my ability.

🤗   I am currently doing research on multimodal speech separation, and I am also interested in related topics (e.g., pre-trained models and neuroscience). If you would like to collaborate, please contact me. Many thanks.

🔖 Homepages

Kai Li   ·   Jusper Lee   ·   cslikai.cn

📅 News

  • 2023.07: 🎲 One paper is accepted by ECAI 2023.
  • 2023.05: 🧩 Two papers are accepted by Interspeech 2023.
  • 2023.05: 🎉 We won first prize 🥇 in the Cinematic Sound Demixing Track 2023 on both Leaderboard A and Leaderboard B.
  • 2023.05: 🎉 We won first prize 🥇 at ASC23, along with the Best Application Award.
  • 2023.04: 🎲 One paper appeared on arXiv.
  • 2023.02: 🧩 One paper is accepted by ICASSP 2023.
  • 2023.01: 🧩 One paper is accepted by ICLR 2023.

📰 Selected Publications:

See Google Scholar for a full list of publications.

Speech Separation

Neuroscience

Cloud Removal

Super Resolution

ctcnet's People

Contributors

JusperLee


ctcnet's Issues

A probable typo in the code?

Hello! When I try to train the model, I get the error AttributeError: 'VideoBlock' object has no attribute 'get_block_block', but there is no get_block_block in the VideoBlock class in videosubnetwork.py. Can you help fix this? Thanks in advance.

AVSpeech Dataset

Hi, I have downloaded the videos from the AVSpeech dataset. How can I preprocess them to train this model?
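If it helps as a starting point, below is a rough preprocessing sketch (not the authors' official pipeline): it resamples each clip's audio to 16 kHz mono and dumps frames at 25 fps with ffmpeg. The sampling rate, frame rate, and directory layout are assumptions, and mouth-region cropping for the visual stream would still be a separate step.

```python
# Rough sketch, not the authors' pipeline: extract 16 kHz mono audio and
# 25 fps frames from downloaded AVSpeech clips with ffmpeg. The rates and
# directory layout are assumptions.
import subprocess
from pathlib import Path

def preprocess_clip(mp4_path: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Audio: mono WAV resampled to 16 kHz
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mp4_path), "-ac", "1", "-ar", "16000",
         str(out_dir / f"{mp4_path.stem}.wav")],
        check=True,
    )
    # Video: dump frames at 25 fps (mouth-region cropping would follow separately)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mp4_path), "-r", "25",
         str(out_dir / f"{mp4_path.stem}_%05d.png")],
        check=True,
    )

if __name__ == "__main__":
    for clip in Path("avspeech/raw").glob("*.mp4"):  # hypothetical layout
        preprocess_clip(clip, Path("avspeech/processed"))
```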

about frcnn_128_512.backbone.pth.tar

Thanks for sharing your great work.

I was trying to run the model; however, I did not find the pretrained frcnn_128_512.backbone.pth.tar for videonet. Could you please share it? Thanks.

Misspelling in model code

The file ctcnet.py contains a few self.video_block.get_block_block calls. These should evidently be self.video_block.get_video_block instead.
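Until the repository is patched, one hedged workaround, assuming (as this issue suggests) that VideoBlock defines get_video_block, is to alias the misspelled name at runtime; the import path below is an assumption and may need adjusting to the actual repository layout.

```python
# Hypothetical workaround: alias the misspelled method name so the existing
# call sites in ctcnet.py keep working. The import path is an assumption.
from src.models.videosubnetwork import VideoBlock

if not hasattr(VideoBlock, "get_block_block"):
    VideoBlock.get_block_block = VideoBlock.get_video_block
```

Editing the get_block_block call sites in ctcnet.py to get_video_block directly would be the cleaner permanent fix.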

mix.json

Hello, when I try to train the model, I get an error saying that mix.json is missing. How can I create the mix.json file?
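One way to bootstrap the file, assuming an Asteroid-style [wav_path, num_samples] schema (common in separation recipes, but not confirmed as this repository's exact format), is sketched below; the directory layout is hypothetical.

```python
# Hedged sketch: build mix.json as a list of [wav_path, num_samples] pairs.
# The schema follows common Asteroid-style recipes and is an assumption, not
# the repository's documented format; the directory layout is hypothetical.
import json
from pathlib import Path

import soundfile as sf

mix_dir = Path("data/tr/mix")  # hypothetical location of the training mixtures
entries = [[str(wav), sf.info(str(wav)).frames]
           for wav in sorted(mix_dir.glob("*.wav"))]

with open(mix_dir.parent / "mix.json", "w") as f:
    json.dump(entries, f, indent=2)
```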

def fuse in ctcnet.py

'VideoBlock' object has no attribute 'get_block_block'. Did you mean: 'get_concat_block'?

About downloading the datasets

Could you share links to the LRS2-2mix, LRS3-2mix, and VoxCeleb2-mix datasets? I saw they were removed because of permission issues. Thanks.

Error for pytorch-lightning

Thanks for sharing your excellent work! I ran into an error and would like to ask for help. When I run trainer.fit(system) in train_ctc.py, I get the error: The LightningModule.on_epoch_end hook was removed in v1.8. Please use LightningModule.on_<train/validation/test>_epoch_end instead.

But I can't find any code in train_ctc.py that would fix it. I think this is caused by a wrong version of pytorch-lightning, but my version is exactly the one listed in the README. Can you help me fix this problem?
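Two hedged workarounds, neither confirmed by the authors: pin the framework to an older release (pip install "pytorch-lightning<1.8"), or rename the removed hook to its stage-specific replacement in whichever LightningModule subclass defines it, roughly as follows.

```python
# Minimal sketch of the hook rename. The class name and body are placeholders;
# only the change from on_epoch_end to on_train_epoch_end reflects the
# Lightning >= 1.8 API.
import pytorch_lightning as pl

class System(pl.LightningModule):  # hypothetical stand-in for the training System class
    # def on_epoch_end(self):        # removed in Lightning 1.8
    #     ...
    def on_train_epoch_end(self):    # stage-specific replacement
        ...  # same epoch-end logic as before
```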

Alternative to Baidu Drive

Hi,

Can you please provide an alternative to the Baidu Drive link? Or share the code used to generate the test sets (LRS2/LRS3/Vox2)?

Thanks
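While waiting for an official link, here is a rough sketch of how 2-speaker mixtures are commonly created, by summing two utterances at a random SNR; the SNR range and file paths are assumptions, not the paper's recipe.

```python
# Rough sketch, not the official recipe: mix two utterances at a random SNR
# in [-5, 5] dB. File paths and the SNR range are assumptions.
import numpy as np
import soundfile as sf

def mix_pair(wav_a: str, wav_b: str, out_path: str,
             rng=np.random.default_rng(0)) -> None:
    a, sr = sf.read(wav_a)
    b, sr_b = sf.read(wav_b)
    assert sr == sr_b, "sampling rates must match"
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    # Scale speaker B so that A is snr_db louder than B in the mixture
    snr_db = rng.uniform(-5.0, 5.0)
    scale = np.sqrt(np.mean(a ** 2) / (np.mean(b ** 2) * 10 ** (snr_db / 10) + 1e-8))
    sf.write(out_path, a + scale * b, sr)

# Example usage (hypothetical paths):
# mix_pair("lrs2/s1/0001.wav", "lrs2/s2/0001.wav", "lrs2/mix/0001.wav")
```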

LightningModule.on_epoch_end was removed in v1.8

Hello, when I try to train the model, I get the error: The LightningModule.on_epoch_end hook was removed in v1.8. How can I use LightningModule.on_<train/validation/test>_epoch_end instead?

train.py missing & custom data training

Hi @JusperLee, thank you for your amazing work!

After taking a look at README.md and the files in this repository, I could not find the train.py file. I am also wondering whether it is possible to train the model on audio data only (mixed & separated), without videos.

Multi-channel Data

Hello,

I am interested in using your model for a project focused on multi-channel source separation. I was wondering if you could provide any guidance, best practices, or documentation that would help me get started effectively? Thanks!
