Giter VIP home page Giter VIP logo

npda-knn-st's Introduction

NPDA-kNN-ST

Official implementation of EMNLP'2022 paper "Non-Parametric Domain Adaptation for End-to-end Speech Translation".

This codebase is currently a nightly version and is undergoing refactoring, and we will release the refactored code in the future.

Citation

Please cite our paper if you find this repository helpful in your research:

@article{Du2022NonParametricDA,
  title={Non-Parametric Domain Adaptation for End-to-End Speech Translation},
  author={Yichao Du and Weizhi Wang and Zhirui Zhang and Boxing Chen and Tong Xu and Jun Xie and Enhong Chen},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.11211}
}

Instructions

Requirements and Insallation

  • python = 3.6
  • pytorch = 1.8.1
  • torchaudio = 0.8.1
  • SoundFile = 0.10.3.post1
  • numpy = 1.19.5
  • omegaconf = 2.0.6
  • PyYAML = 5.4.1
  • sentencepiece = 0.1.96
  • sacrebleu = 1.5.1
  • faiss-gpu = 1.7.1.post1
  • torch-scatter = 2.0.8

Preparations and Configurations

Pre-trained Model and Data

We use the vocab file and pre-trained ST model provided by Fairseq S2T MuST-C Example.

TSV Data

The TSV manifests we used are different from Fairseq S2T MuST-C Example, as follows:

id	audio	n_frames	speaker	src_lang	src_text	tgt_lang	tgt_text
ted_767_0	/data/mustc/en-fr/fbank80.zip:55688475685:274688	858	spk.767	en	These breakthroughs, we need to move those at full speed, and we can measure that in terms of companies, pilot projects, regulatory things that have been changed.	fr	Ces progrès, il faut que nous les réalisions à toute vitesse, et nous pouvons quantifier cela en termes de sociétés, de projets pilotes, de modifications des dispositions réglementaires.

You can get them by:

cd ./myscripts/prepare_data
bash ./prepare_data/prep_mustc_data.sh
bash ./prep_europarst_data.sh
bash ./prep_europarmt_data.sh

Config File

If the source and target dictionaries are different, we need to declare src_bpe_tokenizer, src_vocab_filename, tgt_bpe_tokenizer, and vocab_filename in config file.

input_channels: 1
input_feat_per_channel: 80
sampling_alpha: 1.0
specaugment:
  freq_mask_F: 27
  freq_mask_N: 1
  time_mask_N: 1
  time_mask_T: 100
  time_mask_p: 1.0
  time_wrap_W: 0
transforms:
  '*':
  - utterance_cmvn
  _train:
  - utterance_cmvn
  - specaugment
src_bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_5000.model
src_vocab_filename: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_5000.txt
tgt_bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_8000.model
vocab_filename: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_8000.txt

For multilingual experiments, you need to add a new parameter prepend_tgt_lang_tag: True to the configuration yaml file.

Unifying Text and Speech Representation

We input parallel $\langle speech, translation\rangle$ and $\langle speech, transcription\rangle$ into the model to unify text adn speech representation on the decoder side. For convenience, we also provide the well trained model.

Trianing Bilingual Model on MuST-C Corpus:

cd ./myscripts/mustc2europarl
bash ./unify_representation_bilingual.sh

Trianing Multilingual Model on MuST-C Corpus:

cd ./myscripts/mustc2europarl
bash ./unify_representation_multilingual.sh

Inference with $k$NN Retrieval

Create Datastore

When the model with unified text and speech representation is tuned well, we could load the model for creating a cached datastore with the script as follow:

cd ./myscripts/mustc2europarlst
bash ./build_datastore.sh

Build Faiss Index

The FAISS index requires a training stage where it learns a set of clusters for the keys. Once this is completed, the keys must all be added to the index. The speed of adding keys to the index depends on the hardware, particularly the amount of RAM available. When the knn_index is build, we can remove keys.npy and vals.key to save the hard disk space.

cd ./myscripts/mustc2europarlst
bash ./train_datastore.sh

Inference via $k$NN-ST

cd ./myscripts/mustc2europarlst
bash ./eval_via_knn.sh

npda-knn-st's People

Contributors

duyichao avatar

Stargazers

 avatar Zhiwei He avatar  avatar Jorge Iranzo avatar Ruize Gao avatar Weizhi Wang avatar Shudong Liu avatar Jean-Louis Queguiner avatar Nickolay V. Shmyrev avatar  avatar  avatar

Watchers

Nickolay V. Shmyrev avatar  avatar

npda-knn-st's Issues

Pretrain ST model

Would you mind sharing pretrain ST model to debug save_datastore.py, it don's work when I use another fairseq ST model. Thanks a lot!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.