job_classification's People

Contributors

kazunonda, unagi2

job_classification's Issues

How the evaluation changes with the data used for fine-tuning

  • Dataset generated after fine-tuning on the merged Train and Test data
    CV = 0.952277695
    LB = 0.4653222317277211

  • Dataset generated after fine-tuning on the Train data only
    CV = 0.951656414
    LB = 0.4669281

Fine-tuning on the Train data only scores about 0.0016 higher on the LB (its CV is about 0.0006 lower), but no large difference was observed.
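
The comparison above is simple arithmetic on the four reported scores. A minimal sketch that recomputes the deltas follows; the dictionary keys and the `delta` helper are names introduced here for illustration only.

```python
# Scores copied from the two bullets above.
scores = {
    "train_and_test": {"cv": 0.952277695, "lb": 0.4653222317277211},
    "train_only": {"cv": 0.951656414, "lb": 0.4669281},
}


def delta(metric: str) -> float:
    """Train-only score minus merged Train+Test score for one metric."""
    return scores["train_only"][metric] - scores["train_and_test"][metric]


print(f"CV delta: {delta('cv'):+.4f}")  # CV delta: -0.0006
print(f"LB delta: {delta('lb'):+.4f}")  # LB delta: +0.0016
```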

Originally posted by @Unagi2 in #3 (comment)

Generative model: generation [Resolved]

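Currently, BERT trained on the dataset augmented with generated data reaches a CV of 0.94, but the LB is only 0.44. The likely cause is that the quality of the generated data is poor.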

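We need to review the constraints placed on the generated text, and whether feeding the first three words of a sentence as the generation prompt is appropriate.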
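
The prompt construction mentioned above (the first three words of a sentence) could look roughly like this. It is a minimal sketch: `make_prompt` and its `n_words` parameter are hypothetical names, not taken from the repository, and the actual generation call is left out.

```python
def make_prompt(sentence: str, n_words: int = 3) -> str:
    """Return the first n_words whitespace-separated words of a sentence.

    Hypothetical helper: the result would be passed to the generative
    model as the text to continue.
    """
    words = sentence.split()
    return " ".join(words[:n_words])


print(make_prompt("Perform ad-hoc analysis for internal and external stakeholders."))
# Perform ad-hoc analysis
```

Making `n_words` a parameter keeps it easy to compare prompt lengths when checking whether three words give the model enough context.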

Generative model: fine-tuning

The originally planned gpt2-medium turned out to be too large: on the compute server (susanoo: rtx3080 12GB), fine-tuning runs out of memory and is not possible.
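
A rough back-of-the-envelope estimate helps to see why memory is tight. The sketch below assumes about 355 million parameters for gpt2-medium (an approximate figure), full fp32 training, and the Adam optimizer, giving 16 bytes per parameter; none of these settings are confirmed by the issue. It covers only weights, gradients, and optimizer state, so activations, which grow with batch size and sequence length, come on top.

```python
GPT2_MEDIUM_PARAMS = 355_000_000  # assumption: approximate parameter count

# Assumed fp32 training with Adam:
# 4 bytes weights + 4 bytes gradients + 8 bytes for Adam's two moment buffers.
BYTES_PER_PARAM = 4 + 4 + 8


def training_state_gib(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for weights, gradients and optimizer state, in GiB."""
    return n_params * bytes_per_param / 2**30


print(f"{training_state_gib(GPT2_MEDIUM_PARAMS):.1f} GiB before activations")
# 5.3 GiB before activations
```

Under these assumptions nearly half of the 12 GB is taken before a single batch is processed. This does not by itself prove the out-of-memory failure, but it is consistent with it.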

Preprocessing problems and improvements

Preprocessing improvements

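The data shared previously (train_generated_cwea_repeat1_7words_gramarcheck.csv) also contains the problems described below.
These problems have now been resolved. In addition, when fine-tuning the generative model, the loss score improved (reported as -0.0245), and the vocabulary and sentence construction of the output became more accurate.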

Changes in the Train data

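  • Sentences were not split correctly, so words were fused together and the text did not read as English. After the fix, items are separated by periods, so words and sentences are kept apart. This should reduce the negative effect on both the classifier and the generative model.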

[Before]
SegmentationMeasurementOmnichannelResource Allocation and OptimizationExperimental design (test/control) analysesCollaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.

[After]
Segmentation.Measurement.Omnichannel.Resource Allocation and Optimization.Experimental design (test/control) analyses.Collaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.
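
One way to get from the [Before] text to the [After] text is to insert a period wherever a lowercase letter is immediately followed by an uppercase letter. This is a minimal sketch of that idea, not necessarily the rule the repository uses; `split_fused_items` is a name introduced here.

```python
import re

# Zero-width position between a lowercase letter and an uppercase letter,
# e.g. between "n" and "M" in "SegmentationMeasurement".
FUSED_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_fused_items(text: str) -> str:
    """Insert a period at each lowercase-to-uppercase boundary."""
    return FUSED_BOUNDARY.sub(".", text)


print(split_fused_items("SegmentationMeasurementOmnichannelResource Allocation"))
# Segmentation.Measurement.Omnichannel.Resource Allocation
```

Note the trade-off: this rule also splits legitimate camel-case terms such as "TypeScript" or "JavaScript", so a real pipeline would likely need an allow-list for such words.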

Changes in the generated data

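  • The source data used as input was not preprocessed correctly and still contained HTML tags. This leaked into the generated data, so the model could not generate sentences similar to the originals.
  • Sentence-final periods were missing and semicolons were present. After the fix, sentences end with periods and unnecessary symbols are removed.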

[Input Data] Source data fed to the generative model
<span style="font-family:"Calibri",sans-serif">Conceptualize and design innovative models to assess viability of new ongoing initiatives and program offerings.<span style="font-family:"Calibri",sans-serif">Build out enterprise-wide framework for reporting.<span style="font-family:"Calibri",sans-serif">Perform ad-hoc analysis for internal and external stakeholders.<span style="font-family:"Calibri",sans-serif">Other duties as assigned.

[Before] Generated data
Large lt span style sans font family calibri lt sans Lt span style font family Calibri sans-serif Typescript, Helvetica Mockingbird (Scattletoad, HTML, PDF, TypeScript) Translate web features into HTML Change control functionality with CSS / CSS, CSS Interfaces (HTML, XML) for enhanced layouts Interpreting complex user interactions, and providing a clear and concise and effective interface Effectively manage the deployment of web applications Generate and maintain user-facing notifications, notifications and notifications to alert the user

[After] Generated data
Conceptualize alternatives and eventually design innovative simulation models to assess the effectiveness of existing technology solutions. Provide support and mentorship to internal groups on business performance analysis, design tools, and solutions using industry trends, trends in research and development, business practices, industry practices and practices. Work with and communicate effectively with cross-functional partners. Participate in the development and delivery of research initiatives that are aimed at solving business problems for the company. Responsible for delivering outstanding research products to audiences around the world.
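
The cleanup described in the two bullets above (remove HTML tags, replace semicolons, make sure sentences end with a period) can be sketched as follows. This is an illustration under assumptions, not the repository's code: `clean_text` is a hypothetical name, and treating every semicolon as a sentence break is a simplification.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # anything from "<" to the next ">"


def clean_text(raw: str) -> str:
    """Strip HTML remnants and normalize sentence punctuation."""
    text = html.unescape(raw)      # decode entities such as &lt; first
    text = TAG_RE.sub(" ", text)   # replace each tag with a space
    text = text.replace(";", ".")  # assumption: semicolons act as sentence breaks
    text = re.sub(r"\s+", " ", text).strip()
    if text and text[-1] not in ".!?":
        text += "."                # make sure the text ends with a period
    return text


raw = (
    '<span style="font-family:"Calibri",sans-serif">Build out enterprise-wide '
    'framework for reporting.<span style="font-family:"Calibri",sans-serif">'
    "Other duties as assigned"
)
print(clean_text(raw))
# Build out enterprise-wide framework for reporting. Other duties as assigned.
```

Entities are decoded before tags are stripped so that escaped markup such as `&lt;span&gt;`, which likely produced the "lt span style" fragments in the [Before] sample, is removed as well.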
