🤖 2023-1 Natural Language Processing (COSE461) term project at Korea University.
Achieved an Outstanding Project award (top 10 of 50 teams).
The Korean version of this paper was accepted to HCLT 2023 (Annual Conference on Human and Language Technology).
[report]
※ Due to the authors' exam period, this repo is incomplete for now.
TODO:
- commit evaluation.py
- debugging (partially done)
- inference code for csv file
Model structure overview diagram
Model structure & how to train the syllabic adjustment model
| Original | Translation |
|---|---|
| I will always remember | 항상 기억할께요 |
| The day you kissed my lips | 입맞춤 해주던 |
| A hopeless romantic all my life | 내 평생을 희망 없는 나만 |
| Surrounded by couples all the time | 난 커플에 둘러싸여 늘 |
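The translations above are matched to the syllable count of the English lines so they stay singable. As a rough illustration of that idea (not code from this repo; the English side is a crude vowel-group heuristic), syllables on each side might be counted like this:

```python
import re

def count_korean_syllables(text: str) -> int:
    # Each precomposed Hangul block (U+AC00-U+D7A3) is one spoken syllable.
    return sum(1 for ch in text if '\uac00' <= ch <= '\ud7a3')

def estimate_english_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, dropping a trailing silent 'e'.
    w = word.lower()
    groups = re.findall(r'[aeiouy]+', w)
    n = len(groups)
    if w.endswith('e') and n > 1 and not w.endswith(('le', 'ee')):
        n -= 1
    return max(n, 1)

korean = count_korean_syllables("항상 기억할께요")                      # 7
english = sum(estimate_english_syllables(w)
              for w in "I will always remember".split())               # 7
```

Here both sides come out to 7 syllables, which is exactly the alignment the first table row illustrates.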
- Install KoBART
Installing KoBART on Colab with the pip method introduced at the link above (pip install ~) fails due to version conflicts.
So instead of installing via pip, clone the code:

```
git clone https://github.com/SKT-AI/KoBART
```

Then put the `kobart` directory at the following path:

```
LyriKOR
├── kobart
└── ...
```
- Install the other required modules
! TODO: make requirements.txt file !
(modules preinstalled on Colab + transformers + boto3)

```
pip install -r requirements.txt
```
- Prepare the lyrics csv file of Korean songs. It must have a column named `lyrics`. For example:

  ```
  lyrics
  이 밤 그날의 반딧불을 당신의 ...
  ```

- Make the train dataset file with the command below:

  ```
  python preprocessing_for_train_data.py --lyrics_dataset_path=lyrics_file_name.csv --save_dataset_path=train_dataset_file_name.csv
  ```
- If there were no abnormalities, the train dataset file will have been created in the `dataset` directory. If you created the file in a different location, please move it to the following location:

  ```
  LyriKOR
  ├── dataset
  │   └── train_dataset_file_name.csv
  └── ...
  ```
- Use the command below to train the model:

  ```
  cd Syllabic_adjustment
  python train.py --train_csv_file=train_dataset_file_name.csv
  ```
- If you want to load a checkpoint of our model and continue training, use the `--checkpoint_path` option:

  ```
  python train.py --checkpoint_path=path/to/load/model
  ```
- If you want to tune the hyperparameters, use these options:

  ```
  python train.py ... --batch_size (default=512) --epochs (default=15) --warmup_ratio (default=3e-5) --learning_rate (default=1.0)
  ```
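As a quick illustration of the input file expected by the first step, a `lyrics` csv can be written with the standard library (the file name matches the command above; the sample rows are just the translations from the table and only the `lyrics` column name matters):

```python
import csv

# The preprocessing script expects a column named "lyrics" (one row per lyric line here).
rows = [
    {"lyrics": "항상 기억할께요 입맞춤 해주던"},
    {"lyrics": "내 평생을 희망 없는 나만"},
]
with open("lyrics_file_name.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["lyrics"])
    writer.writeheader()
    writer.writerows(rows)
```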
- By a single text line:

  ```
  python inference.py --input=input_text --checkpoint_path=path/to/load/model
  ```
- By a csv file (multiple lines)
! TODO !
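Until the csv inference code is committed, a hypothetical wrapper around any single-line translation function could process a file row by row (the function and output column names here are illustrative assumptions, not the repo's API):

```python
import csv
from typing import Callable

def translate_csv(in_path: str, out_path: str,
                  translate: Callable[[str], str]) -> int:
    """Read a csv with a `lyrics` column, translate each row with the given
    function, and write a csv with `lyrics` and `translation` columns.
    Returns the number of rows processed."""
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=["lyrics", "translation"])
        writer.writeheader()
        count = 0
        for row in reader:
            writer.writerow({"lyrics": row["lyrics"],
                             "translation": translate(row["lyrics"])})
            count += 1
    return count
```

The `translate` argument would wrap whatever the single-line inference entry point turns out to be, keeping the file handling independent of the model code.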