unagi2 / job_classification
A project for the job-type classification competition hosted by Signate.
Home Page: https://signate.jp/competitions/724
Implement models other than BERT-based ones.
How the data used for fine-tuning affects the evaluation scores
Dataset generated after fine-tuning on the merged Train and Test data
CV = 0.952277695
LB = 0.4653222317277211
Dataset generated after fine-tuning on the Train data only
CV = 0.951656414
LB = 0.4669281
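Fine-tuning on the Train data only scored +0.0016 higher on LB (while CV was about 0.0006 lower), but there was no large difference between the two.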
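The comparison above comes down to two subtractions. A minimal sketch that recomputes them from the reported scores; the `RUNS` layout and the `score_deltas` helper are illustrative only and not part of the project:

```python
# Scores reported for the two fine-tuning setups (copied from the notes above).
RUNS = {
    "train_plus_test": {"cv": 0.952277695, "lb": 0.4653222317277211},
    "train_only": {"cv": 0.951656414, "lb": 0.4669281},
}


def score_deltas(base: dict, other: dict) -> dict:
    """Return other - base for every metric in base (hypothetical helper)."""
    return {metric: other[metric] - base[metric] for metric in base}


deltas = score_deltas(RUNS["train_plus_test"], RUNS["train_only"])
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.4f}")
```

This prints `cv: -0.0006` and `lb: +0.0016`: the Train-only setup is ahead on LB and slightly behind on CV, and both gaps are small.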
Originally posted by @Unagi2 in #3 (comment)
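At present, BERT trained on the dataset augmented with generated data reaches a CV of 0.94, but its LB score is only 0.44. The likely cause is the low quality of the generated data.
We need to revisit the constraints placed on the generated text, and whether seeding generation with the first three words of each sentence is appropriate.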
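The seeding step described above (the first three words of each sentence used as the generation prompt) could be sketched as follows. This is an assumption about how prompts are built, not the project's code: `make_prompts` is hypothetical, sentences are split naively on `.`, `!` and `?`, and sentences that are too short to leave anything to generate are skipped.

```python
import re


def make_prompts(text: str, n_words: int = 3) -> list[str]:
    """Return the first n_words words of each sentence (hypothetical helper).

    Sentences are split naively on '.', '!' and '?', so abbreviations such
    as "e.g." are split too. Sentences with n_words words or fewer are
    skipped, because the prompt would already be the whole sentence.
    """
    prompts = []
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        if len(words) > n_words:
            prompts.append(" ".join(words[:n_words]))
    return prompts


print(make_prompts("Perform ad-hoc analysis for internal and external stakeholders. Other duties as assigned."))
```

Changing `n_words` is one cheap way to test whether a three-word seed constrains the generator too little or too much.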
The originally planned gpt2-medium has very large layers, and fine-tuning it on our compute server (susanoo: RTX 3080, 12 GB) turned out to be impossible because GPU memory overflows.
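A rough back-of-the-envelope estimate shows why 12 GB is tight. It assumes about 355M parameters for gpt2-medium and about 124M for gpt2 (approximate published sizes), fp32 training with an Adam-style optimizer (4 bytes each for weights and gradients plus 8 bytes for the two optimizer moments, i.e. 16 bytes per parameter), and it ignores activations, which grow with batch size and sequence length and can take up the remaining memory.

```python
def static_training_memory_gib(n_params: float, bytes_per_param: int = 16) -> float:
    """Weights + gradients + Adam moments in GiB, activations excluded.

    16 bytes/param assumes fp32 weights (4) + fp32 gradients (4) + two fp32
    Adam moment buffers (8). This is an assumption, not a measured value.
    """
    return n_params * bytes_per_param / 2**30


for name, n_params in [("gpt2 (~124M)", 124e6), ("gpt2-medium (~355M)", 355e6)]:
    print(f"{name}: {static_training_memory_gib(n_params):.1f} GiB before activations")
```

Under these assumptions gpt2-medium needs about 5.3 GiB before storing any activations, versus about 1.8 GiB for gpt2, and with 24 layers (vs 12) and a hidden size of 1024 (vs 768) it also stores more activations per token.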
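The data shared earlier (train_generated_cwea_repeat1_7words_gramarcheck.csv) also contains the problems shown below.
These problems have now been fixed. In addition, fine-tuning of the generation model improved the Loss Score (-0.0245), and the vocabulary and sentence construction of the generated text are more accurate.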
[Before]
SegmentationMeasurementOmnichannelResource Allocation and OptimizationExperimental design (test/control) analysesCollaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.
[After]
Segmentation.Measurement.Omnichannel.Resource Allocation and Optimization.Experimental design (test/control) analyses.Collaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.
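The Before/After pair above shows list items that were concatenated without separators. The notes do not say how this was fixed; one possible heuristic, sketched here as an assumption, is to insert a period wherever a lowercase letter is immediately followed by an uppercase one. On this example it reproduces the After text exactly, but it also splits camelCase terms such as "JavaScript", so in practice adding the separator when HTML list items are flattened to text would likely be more robust. `split_glued_items` is a hypothetical helper.

```python
import re

# Zero-width match between a lowercase and an uppercase letter,
# e.g. "Segmentation|Measurement". Assumed heuristic, not the project's fix.
_GLUED_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_glued_items(text: str, sep: str = ".") -> str:
    """Insert sep at every lowercase->uppercase boundary (hypothetical helper)."""
    # A function replacement avoids backslash escapes being interpreted in sep.
    return _GLUED_BOUNDARY.sub(lambda _match: sep, text)


before = "SegmentationMeasurementOmnichannelResource Allocation and OptimizationExperimental design"
print(split_glued_items(before))
```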
[Input Data] Source data fed to the generation model
<span style="font-family:"Calibri",sans-serif">Conceptualize and design innovative models to assess viability of new ongoing initiatives and program offerings.<span style="font-family:"Calibri",sans-serif">Build out enterprise-wide framework for reporting.<span style="font-family:"Calibri",sans-serif">Perform ad-hoc analysis for internal and external stakeholders.<span style="font-family:"Calibri",sans-serif">Other duties as assigned.
[Before] Generated data
Large lt span style sans font family calibri lt sans Lt span style font family Calibri sans-serif Typescript, Helvetica Mockingbird (Scattletoad, HTML, PDF, TypeScript) Translate web features into HTML Change control functionality with CSS / CSS, CSS Interfaces (HTML, XML) for enhanced layouts Interpreting complex user interactions, and providing a clear and concise and effective interface Effectively manage the deployment of web applications Generate and maintain user-facing notifications, notifications and notifications to alert the user
[After] Generated data
Conceptualize alternatives and eventually design innovative simulation models to assess the effectiveness of existing technology solutions. Provide support and mentorship to internal groups on business performance analysis, design tools, and solutions using industry trends, trends in research and development, business practices, industry practices and practices. Work with and communicate effectively with cross-functional partners. Participate in the development and delivery of research initiatives that are aimed at solving business problems for the company. Responsible for delivering outstanding research products to audiences around the world.
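In the Before sample the model reproduces tag fragments ("lt span style ... font family Calibri"), apparently learned from the HTML markup in the input text. A minimal cleanup sketch, assuming the markup appears either as raw tags or as escaped entities such as `&lt;span&gt;`: unescape entities with the standard-library `html.unescape`, replace anything that looks like a tag with a space, then collapse whitespace. `clean_html` is hypothetical and not necessarily how the project removed the markup; a regex also misses cases a real HTML parser would handle (double-escaped `&amp;lt;`, or a literal `<` in ordinary text).

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")  # anything from '<' up to the next '>'
_SPACES = re.compile(r"\s+")


def clean_html(text: str) -> str:
    """Strip HTML markup from a job-description string (hypothetical helper)."""
    text = html.unescape(text)  # "&lt;span&gt;" -> "<span>"
    text = _TAG.sub(" ", text)  # drop tags but keep a word boundary
    return _SPACES.sub(" ", text).strip()


raw = '<span style="font-family:"Calibri",sans-serif">Build out enterprise-wide framework for reporting.<span style="font-family:"Calibri",sans-serif">Other duties as assigned.'
print(clean_html(raw))
```

Replacing each tag with a space rather than an empty string keeps sentences that were separated only by markup from being glued together, which is the same failure shown in the earlier Before/After pair.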