
LyriKOR

LyriKOR: English to Korean Song Translation with Syllabic Alignment

🔀 Term project for Natural Language Processing (COSE461), first semester of 2023, Korea University
✅ Selected as an Outstanding Project (top 10 of 50 teams)
✅ The Korean version of this paper was accepted at HCLT2023 (Annual Conference on Human and Language Technology)
[report]

β€» μ €μžλ“€μ˜ μ‹œν—˜ κΈ°κ°„ 이슈둜 일단은 λ―Έμ™„μ„± λ ˆν¬μž…λ‹ˆλ‹€.

TODO:

  • commit evaluation.py
  • debugging (partially done)
  • inference code for CSV files

Approach

Figure: Model structure overview

Figure: Model structure and how the syllabic adjustment model is trained

Figure: Example of inference

Results

Original | Translation
I will always remember | 항상 기억할께요
The day you kissed my lips | 입맞춤 해주던
A hopeless romantic all my life | 내 평생의 희망 없는 낭만
Surrounded by couples all the time | 늘 커플에 둘러싸여 늘
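Here syllabic alignment is taken to mean that each Korean line should have about as many sung syllables as the English line it replaces. As a rough way to check a translation (a sketch, not the project's evaluation code, which has not been committed yet), each precomposed Hangul block in the range U+AC00 to U+D7A3 can be counted as one syllable. The function name is hypothetical, and English syllable counting is not covered.

```python
def count_hangul_syllables(text: str) -> int:
    """Count precomposed Hangul syllable blocks (U+AC00..U+D7A3) in text.

    Assumption: one block is sung as one syllable. Spaces, punctuation,
    Latin letters and standalone jamo are ignored.
    """
    return sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")


# Korean outputs from the results table above.
translations = [
    "항상 기억할께요",
    "입맞춤 해주던",
    "내 평생의 희망 없는 낭만",
    "늘 커플에 둘러싸여 늘",
]
for line in translations:
    print(count_hangul_syllables(line), line)
```

For the four translations in the table, this counts 7, 6, 10 and 9 syllables.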

Environment Setup

  1. Install KoBART
    Colabμ—μ„œ μœ„ 링크에 μ†Œκ°œλœ μ„€μΉ˜ 방법(pip install ~)λŒ€λ‘œ KoBARTλ₯Ό μ„€μΉ˜ν•  경우 버전 좩돌 λ•Œλ¬Έμ— μ œλŒ€λ‘œ μ„€μΉ˜λ˜μ§€ μ•ŠλŠ” 문제 λ°œμƒ.
    κ·ΈλŸ¬λ―€λ‘œ pip을 μ΄μš©ν•΄μ„œ μ„€μΉ˜ν•˜λŠ” λŒ€μ‹ ,
    git clone https://github.com/SKT-AI/KoBART
    
    둜 μ½”λ“œλ₯Ό 내렀받은 ν›„ kobart 디렉토리λ₯Ό λ‹€μŒμ˜ κ²½λ‘œμ— μœ„μΉ˜μ‹œμΌœ μ£Όμ„Έμš”.
    Use the command above (git clone~ ) and put the kobart directory to the following path.
    LyriKOR
    └─ kobart
    └─ ...
    
  2. Install the other required modules
    ! TODO: make requirements.txt file !
    (the modules preinstalled on Colab + transformers + boto3)
    pip install -r requirements.txt
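Before training, it can help to confirm the setup above. The sketch below is hypothetical (it is not part of the repository): it checks that the cloned kobart directory sits directly under the project root and that the extra modules can be found by the interpreter. The module list (transformers, boto3) is an assumption taken from the note above until requirements.txt exists.

```python
import importlib.util
from pathlib import Path


def check_environment(project_root=".", modules=("transformers", "boto3")):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    root = Path(project_root)
    # The cloned kobart directory is expected directly under the project root.
    if not (root / "kobart").is_dir():
        problems.append(f"missing directory: {root / 'kobart'}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not installed: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Run it from the LyriKOR directory; no output means both checks passed.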
    

How to Train the Syllabic Adjustment model

  1. Prepare a CSV file of Korean song lyrics.
    It must have a column named lyrics, for example:

    lyrics
    이 λ°€ κ·Έλ‚ μ˜ λ°˜λ”§λΆˆμ„ λ‹Ήμ‹ μ˜ ...
  2. Create the training dataset file with the command below.

    python preprocessing_for_train_data.py --lyrics_dataset_path=lyrics_file_name.csv \
    				       --save_dataset_path=train_dataset_file_name.csv
    
  3. If preprocessing finishes without errors, the training dataset file is created in the dataset directory. If it was saved elsewhere, move it to the following location:

    LyriKOR
    └─ dataset
    	└─ train_dataset_file_name.csv
    └─ ...
    
  4. Train the model with the following command:

    cd Syllabic_adjustment
    python train.py --train_csv_file=train_dataset_file_name.csv
    
  5. To resume training from a checkpoint of our model, use the --checkpoint_path option:

    python train.py --checkpoint_path=path/to/load/model
    
  6. To tune the hyperparameters, use the following options:

    python train.py ...
    		--batch_size (default=512)
    		--epochs (default=15)
    		--warmup_ratio (default=3e-5)
    		--learning_rate (default=1.0)
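The steps above can also be scripted. This is a minimal sketch under stated assumptions, not code from the repository: count_lyrics_rows (hypothetical) checks that a lyrics CSV has the required lyrics column before preprocessing, and train_command (hypothetical) builds the train.py argument list from the options documented above. Reading the CSV as utf-8-sig (which also accepts a BOM) and combining --train_csv_file with --checkpoint_path in one call are assumptions.

```python
import csv
import shlex


def count_lyrics_rows(csv_path):
    """Return the number of non-empty rows in the CSV's required `lyrics` column."""
    # utf-8-sig also strips a BOM, e.g. from files saved by Excel (assumption).
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if "lyrics" not in (reader.fieldnames or []):
            raise ValueError(f"{csv_path}: no 'lyrics' column, found {reader.fieldnames}")
        return sum(1 for row in reader if (row["lyrics"] or "").strip())


# Options documented in this README; anything else is rejected.
TRAIN_OPTIONS = {"batch_size", "epochs", "warmup_ratio", "learning_rate"}


def train_command(train_csv, checkpoint_path=None, **hparams):
    """Build the argument list for train.py (run it from Syllabic_adjustment)."""
    unknown = set(hparams) - TRAIN_OPTIONS
    if unknown:
        raise ValueError(f"undocumented options: {sorted(unknown)}")
    cmd = ["python", "train.py", f"--train_csv_file={train_csv}"]
    if checkpoint_path is not None:
        cmd.append(f"--checkpoint_path={checkpoint_path}")
    # Sorted so the command is the same regardless of keyword order.
    cmd += [f"--{name}={value}" for name, value in sorted(hparams.items())]
    return cmd


if __name__ == "__main__":
    print(shlex.join(train_command("train_dataset_file_name.csv", batch_size=256, epochs=20)))
```

The returned list can be passed to subprocess.run(cmd, cwd="Syllabic_adjustment", check=True), since train.py is run from that directory.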
    

How to Run Inference

  1. On a single line of text
    python inference.py --input=input_text \
           --checkpoint_path=path/to/load/model
    
  2. On a CSV file (multiple lines)
    ! TODO !
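Until CSV inference is implemented, one workaround is to call the single-line command once per row. In this sketch (hypothetical, not the repository's code) the input CSV is assumed to hold one English line per row in a lyrics column, and inference.py is assumed to print its translation to stdout; only the --input and --checkpoint_path options shown above are used. Passing an argument list rather than a shell string avoids quoting problems with spaces and commas in lyrics.

```python
import csv
import subprocess
import sys


def inference_commands(csv_path, checkpoint_path, column="lyrics", python=sys.executable):
    """Yield one inference.py argument list per non-empty row of `column`."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            line = (row.get(column) or "").strip()
            if line:
                yield [python, "inference.py", f"--input={line}",
                       f"--checkpoint_path={checkpoint_path}"]


def translate_csv(csv_path, checkpoint_path, column="lyrics"):
    """Run inference.py once per line and collect its stdout (assumed to be the translation)."""
    outputs = []
    for cmd in inference_commands(csv_path, checkpoint_path, column):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout.strip())
    return outputs
```

Call translate_csv("input_lyrics.csv", "path/to/load/model") from the directory that contains inference.py (both file names are placeholders). This reloads the model for every line, so it is slow for long files; a proper CSV mode would load the checkpoint once.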

Reference

KoBART
KoBART Question Generation
