
level1-semantictextsimilarity-nlp-01

Team Members and Roles

전현욱 곽수연 김가영 김신우 안윤주
  • 전현욱
    • Team leader, ensemble implementation, single-model training
  • 곽수연
    • Weighted Sampler implementation, single-model training
  • 김가영
    • Loss function experiments, single-model training
  • 김신우
    • Composite model experiments, K-Fold implementation, single-model training
  • 안윤주
    • Data preprocessing and augmentation, single-model training

Project Period

2023.12.11 10:00 ~ 2023.12.21 19:00

๐ŸŒ ํ”„๋กœ์ ํŠธ ์†Œ๊ฐœ

  • STS(Semantic Text Similarity)๋ž€ ๋‘ ํ…์ŠคํŠธ๊ฐ€ ์–ผ๋งˆ๋‚˜ ์œ ์‚ฌํ•œ์ง€ ํŒ๋‹จํ•˜๋Š” NLP Task๋กœ, ์ผ๋ฐ˜์ ์œผ๋กœ ๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ์„ ์ž…๋ ฅํ•˜๊ณ  ์ด๋Ÿฌํ•œ ๋ฌธ์žฅ ์Œ์ด ์–ผ๋งˆ๋‚˜ ์˜๋ฏธ์ ์œผ๋กœ ์„œ๋กœ ์–ผ๋งˆ๋‚˜ ์œ ์‚ฌํ•œ์ง€๋ฅผ ํŒ๋‹จํ•˜๋Š” ๊ณผ์ œ์ด๋‹ค.
  • ๋ณธ ํ”„๋กœ์ ํŠธ๋Š” ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์…‹์„ ๋ฐ”ํƒ•์œผ๋กœ 0๊ณผ 5์‚ฌ์ด์˜ ์œ ์‚ฌ๋„ ์ ์ˆ˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์— ๋ชฉ์ ์„ ๋‘”๋‹ค.

Project Structure

  • Train Data: 9,324 examples
  • Test Data: 1,100 examples
  • Dev Data: 550 examples

Dataset Structure

| Column | Description |
| --- | --- |
| id | Unique sentence ID: dataset name, version, train/dev/test |
| source | Origin of the sentences: petition (national petitions), NSMC (Naver Movie), slack (Upstage) |
| sentence1 | First sentence of the pair |
| sentence2 | Second sentence of the pair |
| label | Similarity of the sentence pair (0 to 5, shown to one decimal place) |
| binary-label | 0 if label is 2.5 or below, 1 otherwise |
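The `binary-label` rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's preprocessing code: the function name `to_binary_label` and the sample rows (including their `id` values) are hypothetical.

```python
def to_binary_label(label: float, threshold: float = 2.5) -> int:
    """Map a 0-5 similarity score to 0 (label <= threshold) or 1 (otherwise)."""
    if not 0.0 <= label <= 5.0:
        raise ValueError(f"label must be within [0, 5], got {label}")
    return 0 if label <= threshold else 1


# Hypothetical rows; only the column names come from the dataset description.
rows = [
    {"id": "sample-0", "label": 0.4},
    {"id": "sample-1", "label": 2.5},
    {"id": "sample-2", "label": 2.6},
    {"id": "sample-3", "label": 5.0},
]

for row in rows:
    row["binary-label"] = to_binary_label(row["label"])

binary_labels = [row["binary-label"] for row in rows]
print(binary_labels)
```

Note that a score of exactly 2.5 falls on the 0 side, since the rule is "2.5 or below".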

Label Scoring Criteria

| label | Description |
| --- | --- |
| 5 | The core content of the two sentences is identical, and the supplementary details are identical as well |
| 4 | The core content of the two sentences is equivalent, with minor differences in supplementary details |
| 3 | The core content of the two sentences is roughly equivalent, but the supplementary details differ in ways that are hard to ignore |
| 2 | The core content of the two sentences is not equivalent, but they share some supplementary details |
| 1 | The core content of the two sentences is not equivalent, but they deal with a similar topic |
| 0 | The core content of the two sentences is not equivalent, and the supplementary details have nothing in common |

Evaluation Metric

  • Pearson Correlation Coefficient (PCC): a value that quantifies the linear correlation between two variables X and Y
  • Capturing the overall trend matters more than predicting each answer exactly: high values should be predicted clearly high and low values clearly low
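To illustrate why the overall trend matters more than exact values, the sketch below computes PCC in plain Python. The helper `pearson` and the sample scores are assumptions made for illustration, not the project's evaluation code. Predictions that are scaled and shifted away from the labels still reach a PCC of about 1, because the linear relationship is preserved.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, up to the shared 1/n factor that cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative gold scores on the 0-5 scale.
labels = [0.0, 1.0, 2.5, 4.0, 5.0]
# Predictions that are numerically off but follow the same linear trend.
shifted = [0.5 * y + 1.0 for y in labels]
# Predictions with the trend reversed.
reversed_trend = [5.0 - y for y in labels]

pcc_shifted = pearson(labels, shifted)
pcc_reversed = pearson(labels, reversed_trend)
mse_shifted = sum((p - y) ** 2 for p, y in zip(shifted, labels)) / len(labels)
print(pcc_shifted, pcc_reversed, mse_shifted)
```

A squared-error measure penalizes `shifted` because its values differ from the labels, while PCC does not, which is the behavior the metric description points at.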

Models Used

  • klue/roberta-small
  • klue/roberta-large
  • rurupang/roberta-base-finetuned-sts
  • monologg/koelectra-base-v3-discriminator
  • BM-K/KoDiffCSE-RoBERTa
  • snunlp/KR-ELECTRA-discriminator

Folder Structure

.
├── Readme.md
├── wrapup-report.pdf
└── code
    ├── KSW
    │   └── train_kfold.py
    ├── KSY
    │   ├── train
    │   │   ├── train_kfold_WRS.py
    │   │   ├── train_koelectra.py
    │   │   ├── train_test_aug.py
    │   │   ├── train_test_label.py
    │   │   ├── train_test_WeightedMSE.py
    │   │   └── train_test_WRS.py
    │   └── utils
    │       ├── data_augmentation.py
    │       ├── ensemble.py
    │       └── inference_koelectra.py
    ├── KGY
    │   ├── loss_functions.py
    │   ├── source_tagging.py
    │   └── trainMSE.py
    ├── AYJ
    │   ├── model_test_fin.py
    │   ├── model_test_fin2.py
    │   ├── inference.py
    │   ├── <soon update>
    │   └── <soon update>
    ├── JHW
    │   ├── back_translate.py
    │   ├── ensemble.py
    │   └── make_train_uniform.py
    └── final
        ├── data
        ├── fine-tuned
        ├── output
        ├── back_translate.py
        ├── ensemble.py
        ├── make_train_uniform.py
        ├── inference.py
        └── train.py

Leaderboard

| | pearson |
| --- | --- |
| Public | 0.9218 |
| Private | 0.9311 |
