
AdaBoost -- an implementation of "A Short Introduction to Boosting"

This is an implementation of the research paper "A Short Introduction to Boosting" written by Yoav Freund and Robert E. Schapire.

Inspiration

Machine learning algorithms, especially those used for classification and regression, can perform poorly on large or difficult datasets. To overcome this, a number of boosting and optimisation techniques were developed that can improve a model's performance significantly. AdaBoost is one such boosting technique, and we implement it here to analyse the improvement in the performance of our classification model.


Introduction

Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. The AdaBoost algorithm, introduced in 1995 by Freund and Schapire, solved many of the practical difficulties of earlier boosting algorithms. The algorithm takes as input a training set (x1, y1), ..., (xm, ym), where each xi belongs to some domain or instance space and each label yi is in a label set Y, here assumed to be Y = {-1, +1}. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, ..., T; in each round the weak learner's job is to find a weak hypothesis ht, and the final output is a hypothesis H that is a weighted majority vote of the T weak hypotheses.
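
For reference, the key quantities in Freund and Schapire's formulation are listed below, where D_t is the distribution of example weights in round t, eps_t is the weighted error of h_t, and Z_t is a normalisation factor chosen so that D_{t+1} is again a distribution:

      \epsilon_t = \Pr_{i \sim D_t}\left[ h_t(x_i) \neq y_i \right]

      \alpha_t = \tfrac{1}{2} \ln\!\left( \frac{1 - \epsilon_t}{\epsilon_t} \right)

      D_{t+1}(i) = \frac{D_t(i)\, \exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}

      H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t\, h_t(x) \right)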


Dataset

The dataset used here is a toy dataset generated using make_gaussian_quantiles from the sklearn.datasets module. This produces our input variable X in a two-dimensional plane (i.e. with two features) and our target variable y, which takes the value -1 or +1.
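
A minimal sketch of how such data can be generated with scikit-learn follows; the sample size, random seed and label mapping are illustrative assumptions, not necessarily the exact settings used in the repo's datagenerate module:

      # Illustrative toy-data generation (parameter values are assumptions)
      from sklearn.datasets import make_gaussian_quantiles

      # Two-dimensional input X and integer labels in {0, 1}
      X, y = make_gaussian_quantiles(n_samples=500, n_features=2,
                                     n_classes=2, random_state=1)

      # Map the labels from {0, 1} to {-1, +1}, as assumed by AdaBoost
      y = y * 2 - 1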


Model Components

Our model architecture consists of the following components:

  • The weak learner is a Decision Tree Classifier with two leaf nodes, i.e. a decision stump (see the sketch after this list).
  • The outputs of the weak learners are combined into a weighted sum that represents the final boosted output.
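
A two-leaf decision tree makes a single split, which is why it is often called a decision stump. A minimal sketch of such a weak learner in scikit-learn, assuming the depth restriction is expressed via max_depth (the repo may use an equivalent setting such as max_leaf_nodes=2):

      # A decision stump: one split, two leaf nodes (assumed weak learner)
      from sklearn.tree import DecisionTreeClassifier

      stump = DecisionTreeClassifier(max_depth=1)
      # In boosting, stump.fit(X, y, sample_weight=weights) is called once per round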

AdaBoost Algorithm

Implementation Details

  • The custom-defined datagenerate module is used to generate the input and target variables.
  • The generated data is plotted in a two-dimensional plane through another custom-defined module named plot, to visualise the input.
  • The weak learners are defined as Decision Tree Classifiers with two leaf nodes and are used to make predictions for fifty iterations in our case (a sketch of this loop follows the list).
  • The weighted sum of all the weak learners is computed as the final boosting output to study the performance enhancement.
  • Lastly, the decision boundaries are visualised for every iteration using the visualize module.
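
As a rough sketch of the boosting loop described above, written directly against scikit-learn rather than the repo's datagenerate, plot and visualize modules (variable names and parameter values are illustrative assumptions):

      # Hedged sketch of an AdaBoost training loop with decision stumps.
      # It illustrates the weight update and weighted-majority combination,
      # not the repo's exact code.
      import numpy as np
      from sklearn.datasets import make_gaussian_quantiles
      from sklearn.tree import DecisionTreeClassifier

      # Toy data with labels mapped to {-1, +1}
      X, y = make_gaussian_quantiles(n_samples=500, n_features=2,
                                     n_classes=2, random_state=1)
      y = y * 2 - 1

      T = 50                                  # number of boosting rounds
      n = len(y)
      weights = np.full(n, 1.0 / n)           # D_1: uniform distribution
      stumps, alphas = [], []
      F = np.zeros(n)                         # running weighted sum of weak outputs

      for t in range(T):
          stump = DecisionTreeClassifier(max_depth=1)   # two-leaf weak learner
          stump.fit(X, y, sample_weight=weights)
          pred = stump.predict(X)

          # Weighted training error eps_t and hypothesis weight alpha_t
          eps = np.sum(weights[pred != y])
          alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-10))

          # Update and renormalise the example weights (distribution D_{t+1})
          weights *= np.exp(-alpha * y * pred)
          weights /= weights.sum()

          stumps.append(stump)
          alphas.append(alpha)

          # Final (boosted) hypothesis so far: sign of the weighted sum
          F += alpha * pred
          H = np.sign(F)
          print(f"round {t + 1}: weak error = {np.mean(pred != y):.3f}, "
                f"boosted error = {np.mean(H != y):.3f}")

The repo's actual implementation wraps these steps in its own modules; the sketch only shows how the per-round errors and the weighted-majority output can be tracked.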

Results

  • After the 1st iteration -

  • After 50 iterations -

  • Training error of the Weak Hypothesis vs Training error of the Final Hypothesis -


Requirements

scikit-learn==0.24.1
numpy==1.19.2
matplotlib==3.3.4
typing==3.7.4.3


To use the repo and obtain the graphs, please follow the steps below:

  • Cloning the Repository:

      git clone https://github.com/srijarkoroy/adaboost

  • Entering the directory:

      cd adaboost

  • Setting up the Python Environment with dependencies:

      pip install -r requirements.txt

  • Running the file:

      python3 test.py


Contributors


  • srijarkoroy
  • indiradutta
