
s3prl's Introduction




What's New

  • April 2021: Support SUPERB: Speech processing Universal PERformance Benchmark, submitted to Interspeech 2021
  • Jan 2021: Readme updated with detailed instructions on how to use our latest version!
  • Dec 2020: We are migrating to a newer version for more general, flexible, and scalable code. See the introduction below for more information! The legacy version can be accessed by checking out the tag v0.1.0: git checkout v0.1.0.

Introduction and Usages

This is an open source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.

The toolkit has three major usages:

Pretrain

  • Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
  • Document: pretrain/README.md

Upstream

  • Easily load most of the existing upstream models with pretrained weights in a unified I/O interface.
  • Pretrained models are registered through torch.hub, which means you can use these models in your own project with a one-line, plug-and-play call, without depending on this toolkit's coding style.
  • Document: upstream/README.md
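As a rough illustration of the torch.hub route described above, the sketch below wraps the standard torch.hub.load call in a small helper. The helper name load_upstream is hypothetical, and the entry name "tera" in the usage note is an assumption based on the models named in this README; the authoritative list of registered names is in upstream/README.md.

```python
def load_upstream(name, repo="s3prl/s3prl"):
    """Fetch a registered upstream model through torch.hub.

    `name` must be an entry point registered by the repository; the first
    call downloads the repository and the pretrained weights.
    """
    # Imported lazily so that this module can be imported without PyTorch.
    import torch

    # torch.hub.load(repo_or_dir, model) is the standard PyTorch call.
    return torch.hub.load(repo, name)
```

Under these assumptions, model = load_upstream("tera") would return the pretrained model, and torch.hub.list("s3prl/s3prl") should print the entry names the repository actually registers. Both calls need network access on first use.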

Downstream

Below is an intuitive illustration on how this toolkit may help you:



Feel free to use or modify our toolkit in your research. A list of papers using our toolkit appears in the Used by section. Questions, bug reports, and improvement suggestions are welcome: please open a new issue.

If you find this toolkit helpful to your research, please consider citing our papers. Thanks!

Installation

  • Python >= 3.6
  • Install sox on your OS
  • Install generally used packages for pretrain, upstream and downstream:
    git clone https://github.com/s3prl/s3prl.git
    cd s3prl/
    pip install -r requirements.txt
    cd ../

    git clone https://github.com/pytorch/fairseq.git
    cd fairseq/
    pip install -e ./
    cd ../
  • Some upstream models require special dependencies. If you encounter an error with a specific upstream model, look into the README.md under that upstream's folder, e.g. upstream/pase/README.md
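Before running the install commands, it can help to confirm the two system-level prerequisites listed above. The following is a minimal sketch, not part of the toolkit: the function name check_prerequisites is hypothetical, and it assumes that sox is exposed as an executable named sox on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum version from the requirement list above


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python >= 3.6 is required, found %d.%d" % major_minor)
    # Assumption: the sox installation provides an executable called "sox".
    if which("sox") is None:
        problems.append("the sox executable was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("missing prerequisite:", problem)
```

The two parameters exist only so the check can be exercised with substitute values; called without arguments, it inspects the running interpreter and the current PATH.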

Development pattern for contributors

  1. Create a personal fork of the main S3PRL repository in GitHub.
  2. Make your changes in a named branch different from master, e.g. a branch called new-awesome-feature.
  3. Contact us if you have any questions during development.
  4. Generate a pull request through the Web interface of GitHub.
  5. Please verify that your code is free of basic mistakes. We appreciate any contribution!

Reference Repositories

License

The majority of the S3PRL Toolkit is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: S3PRL is licensed under the MIT license.

Used by

List of papers that used our toolkit (Feel free to add your own paper by making a pull request)

Self-Supervised Pretraining

Explainability

Adversarial Attack

Voice Conversion

Benchmark and Evaluation

  • SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)

    @misc{yang2021superb,
          title={SUPERB: Speech processing Universal PERformance Benchmark}, 
          author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
          year={2021},
          eprint={2105.01051},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
    
  • Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)

    @misc{tseng2021utilizing,
        title={Utilizing Self-supervised Representations for MOS Prediction}, 
        author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
        year={2021},
        eprint={2104.03017},
        archivePrefix={arXiv},
        primaryClass={eess.AS}
    }
    


Citation

If you find our repository useful, please consider citing the following papers.

@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
@misc{superb,
  title={SUPERB: Speech processing Universal PERformance Benchmark}, 
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  year={2021},
  eprint={2105.01051},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

