Business_Analytics_ch4

Ensemble Learning

📂 Contents


  • Background
  • Steps
  • Dataset
  • Hyper-parameter search
  • Performance change by number of stump trees
  • Result

📌 Background


  • AdaBoost
  • Combines many weak classifiers to improve the accuracy of the overall classifier
  • weak classifier (learner): a model that performs only slightly better than a random model
  • A representative boosting algorithm that assigns larger weights to misclassified data
  • AdaBoost relies on multiple rounds of weighted training
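The reweighting idea above can be sketched in plain Python. This is a minimal illustration, not the implementation used in this repository: the helper names (`fit_stump`, `adaboost`, `predict`, `accuracy`) and the ten-point 1-D toy dataset are assumptions made for the example, and the update rule is the textbook binary AdaBoost with labels in {-1, +1}.

```python
import math

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.

    A stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    """Train n_rounds stumps; return a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # the stump's vote weight
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        preds = [polarity if x <= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (pol if x <= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy data (assumed for illustration): no single threshold separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    print(rounds, accuracy(adaboost(xs, ys, rounds), xs, ys))
```

On this toy set a single stump cannot separate the labels, while three reweighted stumps, each with its own vote weight, classify every point.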

📌 Steps


  • Step 1 : Create or load the dataset
  • Step 2 : Model fit (create an AdaBoostClassifier)
  • Step 3 : Predict the results
  • (Step 4 : Visualize the results)
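Steps 1 to 3 might look roughly like the following with scikit-learn. The dataset size, the 70/30 split, and the `random_state` values are assumptions chosen for the example rather than settings taken from this repository; the optional visualization step is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 1: create a synthetic binary dataset (sizes are illustrative).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 2: create and fit the model. The default weak learner is a
# depth-1 decision tree, i.e. a stump.
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X_train, y_train)

# Step 3: predict on held-out data and measure accuracy.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
```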

📌 Dataset


  • A random dataset generated with the make_classification function
  • Email Spam Classification Dataset (csv) download
  • breast_cancer
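As a rough sketch, two of the datasets above can be obtained directly from scikit-learn. The `make_classification` arguments are assumptions for illustration. The Email Spam CSV is not loaded here because its file path and column layout are not given in this README.

```python
from sklearn.datasets import load_breast_cancer, make_classification

# Synthetic dataset: the sizes below are illustrative assumptions.
X_syn, y_syn = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)

# Built-in breast cancer dataset (binary labels).
X_bc, y_bc = load_breast_cancer(return_X_y=True)

print("synthetic:", X_syn.shape, "breast_cancer:", X_bc.shape)
```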

🔎 [Advanced] Hyperparameter Search


  1. base_estimator : the model to be ensembled, i.e. the algorithm used as the weak learner (renamed to estimator in scikit-learn 1.2)
  2. n_estimators : the number of weak learners to create (default = 50)
  3. learning_rate : the learning rate (0~1) applied at each boosting round; the coefficient applied as the weak learners sequentially correct the errors (default = 1.0)
  4. random_state : fixes the random seed so that every run produces the same random values
  • max_features : the number of features each base estimator draws on (a parameter of the base estimator, e.g. DecisionTreeClassifier, not of AdaBoostClassifier itself)
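A search over `n_estimators` and `learning_rate` could be sketched with `GridSearchCV` as below. The grid values, the 3-fold cross-validation, and the choice of the breast cancer data are assumptions made to keep the example small; they are not the grid used for the results reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical grid: a real search would likely cover more values.
param_grid = {
    "n_estimators": [10, 50],
    "learning_rate": [0.1, 0.5, 1.0],
}

search = GridSearchCV(
    AdaBoostClassifier(random_state=0),  # default weak learner is a stump
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

`search.cv_results_` holds the mean score for every combination, which is what a learning-rate versus n_estimators comparison would be built from.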

🌳 [Advanced] Performance change by number of stump trees


  • Here we look at how performance changes as the number of stump trees varies.

What is a stump tree?

A stump is a decision tree with one decision node and two leaves. Because it has a single node, a stump uses only one variable, which makes it a weak learner.

A distinguishing feature of AdaBoost is that the trees differ in importance. In the figure below, the stumps are drawn at different sizes, and, as in any boosting method, each model is built sequentially and depends on the information from the previous stump.
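The sketch below first confirms what a stump looks like in scikit-learn, then fits AdaBoost with several stump counts. The dataset (breast cancer), the split, and the list of counts are assumptions for illustration, so the printed accuracies are not expected to reproduce the numbers in the Result section.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A stump is a depth-1 tree: one decision node plus two leaves.
# scikit-learn's node_count includes the leaves, so it reports 3.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
print("nodes:", stump.tree_.node_count, "leaves:", stump.get_n_leaves())

# Fit AdaBoost with an increasing number of stumps (the default weak learner).
stump_counts = [1, 5, 10, 100]  # illustrative choice
scores = {}
for n in stump_counts:
    model = AdaBoostClassifier(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    scores[n] = model.score(X_test, y_test)

for n, acc in scores.items():
    print(f"{n:>4} stumps -> test accuracy {acc:.3f}")
```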

[Result]


  • make_classification

In this result, every sample with label 1 was classified correctly, but some samples with label 0 were not separated well.
  • Email Spam Classification

  • Hyper-parameter search

In three of the cases, all except n_estimators = 100, performance improved as the learning rate increased. The change in performance was most pronounced for n_estimators = 256. One possible interpretation is that lowering learning_rate shrinks the size of the weight updates, so the decision boundaries of the individual learners differ less from one another and performance drops.

With learning_rate = 0.5, performance improved as n_estimators increased. In the other cases, by contrast, performance was highest at an intermediate value of n_estimators. This is consistent with the fact that raising n_estimators increases the number of weak learners, which produces a more complex decision boundary and a more complex model.
  • Performance change by number of stump trees
| Stump | 1 | 5 | 10 | 100 | 1000 |
|---|---|---|---|---|---|
| Accuracy | 0.895 | 0.959 | 0.971 | 0.982 | 0.977 |

🌲 Pros and cons of AdaBoost


  • Pros
    • Relatively less prone to overfitting
    • Helps reduce both bias and variance
    • Can improve the accuracy of weak classifiers
    • Relatively easy to use
  • Cons
    • Requires a high-quality dataset
    • Sensitive to outliers and noise
    • Slower than XGBoost
