- $ git clone https://github.com/songys/Chatbot_data.git
  (Chatbot_data by songys, MIT License)
- para_kqc_sim_data.txt
  (paraKQC by warnikchow, Creative Commons Attribution Share Alike 4.0 International License)
- python get_CLS.py
- python train_model.py
  (trains a paraphrase detection model on para_kqc_sim_data.txt)
Before running main.py, you must first run get_CLS.py and train_model.py.
- python main.py
- enter an input sentence
- get the input sentence's CLS token using the pretrained BERT model
- calculate the cosine similarity between the input sentence's CLS token and the CLS tokens of the questions in Chatbot_data
- get the indices of the Chatbot_data entries with the top n similarity scores
- check whether the input sentence and each of the top n questions are paraphrases (using the model trained by train_model.py)
- if at least one question is similar, the answer is the answer paired with the question that has the highest confidence
- if none of the questions are similar, the answer is "잘 모르겠어요." ("I'm not sure.")
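
The retrieval and answer-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: plain Python lists stand in for the BERT CLS vectors, and the names `top_n_indices`, `choose_answer`, `paraphrase_confidence`, `n`, and `threshold` are hypothetical. The 0.5 threshold in particular is an assumption, since the steps above do not say how "similar" is decided.

```python
import math
from typing import Callable, List, Optional, Sequence

FALLBACK_ANSWER = "잘 모르겠어요."  # "I'm not sure."


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_indices(
    query_vec: Sequence[float],
    corpus_vecs: Sequence[Sequence[float]],
    n: int,
) -> List[int]:
    """Indices of the n corpus vectors most similar to query_vec, best first."""
    scores = [cosine_similarity(query_vec, vec) for vec in corpus_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def choose_answer(
    query: str,
    query_vec: Sequence[float],
    questions: Sequence[str],
    answers: Sequence[str],
    corpus_vecs: Sequence[Sequence[float]],
    paraphrase_confidence: Callable[[str, str], float],
    n: int = 3,
    threshold: float = 0.5,  # assumed cut-off for "similar"
) -> str:
    """Return the answer of the most confident paraphrase match among the
    top-n candidates, or the fallback answer if no candidate is similar."""
    best_idx: Optional[int] = None
    best_conf = -1.0
    for i in top_n_indices(query_vec, corpus_vecs, n):
        # paraphrase_confidence is a stand-in for the trained paraphrase model.
        conf = paraphrase_confidence(query, questions[i])
        if conf >= threshold and conf > best_conf:
            best_idx, best_conf = i, conf
    return answers[best_idx] if best_idx is not None else FALLBACK_ANSWER
```

Running the cheap cosine-similarity pass first and the paraphrase model only on the top n candidates keeps the number of model calls per input at n, regardless of how many question/answer pairs Chatbot_data contains.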