

qboost's Issues

Error from pip's dependency resolver

When running this example with Ocean 4 in the Leap IDE, pip reports the following message:

Installing collected packages: tabulate, scikit-learn, matplotlib
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imbalanced-learn 0.8.1 requires scikit-learn>=0.24, but you have scikit-learn 0.23.1 which is incompatible.
Successfully installed matplotlib-3.3.4 scikit-learn-0.23.1 tabulate-0.8.9

Inaccurate labeling of WeakClassifiers/DecisionTree methods

In qboost.py, the WeakClassifiers class is misleadingly named: its fit and predict methods actually implement the AdaBoost algorithm. In demo.py, this method is then compared against AdaBoost from sklearn, and in the demo.py output, "Adaboost" refers to the sklearn implementation while "Decision tree" refers to the AdaBoost implementation from the WeakClassifiers class. As far as I can tell, the only real difference between the two is the tree depth: the demo runs the WeakClassifiers AdaBoost method with a depth of 3, whereas the call to sklearn's AdaBoost model uses the default, which is a depth of 1.

Need to review the best way to address this. Some points to consider:

  • The screen output from demo.py should be updated to use a more accurate description than "Decision tree"
  • Perhaps remove one of the two AdaBoost implementations from demo.py
  • Or, if both are kept, consider using the same tree depth for consistency
  • Use a better class name than WeakClassifiers
  • Update comments/docstrings to describe what is actually being done by the different methods

predict_proba method for classifiers

Application

It is not obvious how to compute ROC AUC and PR AUC curves from the WeakClassifiers, QBoostClassifier, and QboostPlus classifiers.

Proposed Solution

We could implement a predict_proba method for each classifier, analogous to the predict_proba method from scikit-learn.
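A minimal sketch of what such a method could compute, assuming the classifier exposes its weak learners and weights and that labels are in {-1, +1} (the standalone predict_proba helper and all names here are hypothetical, not part of the current qboost API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def predict_proba(estimators, weights, X):
    """Hypothetical predict_proba for a weighted ensemble of weak
    classifiers whose labels are in {-1, +1}: map the weighted vote
    margin from [-1, 1] to [0, 1] and return sklearn-style columns
    [P(class = -1), P(class = +1)]."""
    votes = np.array([est.predict(X) for est in estimators])
    margin = np.average(votes, axis=0, weights=weights)  # in [-1, 1]
    p_pos = (margin + 1.0) / 2.0
    return np.column_stack([1.0 - p_pos, p_pos])

X, y = make_classification(n_samples=100, random_state=1)
y_pm = 2 * y - 1  # map {0, 1} -> {-1, +1}
estimators = [DecisionTreeClassifier(max_depth=1, random_state=i).fit(X, y_pm)
              for i in range(5)]
proba = predict_proba(estimators, np.ones(5), X)
```

The second column could then be fed to sklearn.metrics.roc_auc_score or precision_recall_curve to produce the curves mentioned above.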

Current QBoost implementation is just a diminished AdaBoost

As applied to the given demonstration problems, the QBoost algorithm does essentially nothing. AdaBoost is actually run first to pre-select a set of weak classifiers, which are then provided to the QBoost algorithm. With the current settings and demonstration problems, QBoost simply includes all of the classifiers that AdaBoost provides to it (as confirmed by the list of "1" weights in the QBoost output). In other words, for these problems, QBoost could be replaced by the following anti-algorithm:

  1. Run AdaBoost
  2. Change all of the weights that AdaBoost came up with to 1
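The two steps above can be sketched directly with sklearn (a toy dataset and hypothetical names; this is the anti-algorithm, not the actual demo code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Step 1: run AdaBoost, which selects weak classifiers and learns weights.
ada = AdaBoostClassifier(n_estimators=11, random_state=0).fit(X, y)

# Step 2: throw away the learned weights and set them all to 1, which is
# effectively what the demo's QBoost step ends up doing.
uniform_weights = np.ones(len(ada.estimators_))

# Majority vote of the pre-selected weak classifiers, each with weight 1.
votes = 2 * np.array([est.predict(X) for est in ada.estimators_]) - 1  # {0,1} -> {-1,+1}
y_pred = np.sign(uniform_weights @ votes)
```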

Note that the screen output is misleading in terms of the method comparisons. For example, runs with the "mnist" data set often show that "QBoost" is performing better than Adaboost. This is misleading because what is labeled Adaboost is actually sklearn Adaboost running with weak decision tree classifiers at depth 1, whereas QBoost is pulling from a custom Adaboost implementation that uses weak decision tree classifiers with depth 3 (confusingly, this custom Adaboost implementation is labeled "DecisionTree" in the output; see Issue #13).

The key questions are what weak classifiers should be considered in the QBoost algorithm, and what actually is the definition of the QBoost algorithm? The README refers to the earlier 2008 paper (https://arxiv.org/pdf/0811.0416.pdf). Aside from not actually using the QBoost terminology, this paper presents the algorithm as drawing from all possible "degree 1 and 2 decision stumps" (basically decision rules using either a single variable or a product of two variables). As described, this produces a large number of variables if doing a one-shot global optimization: 930 variables for the 30-feature case and many more for the 784-feature case. Compare these numbers to the 35 variables being currently used because QBoost is being fed weak classifiers pre-selected by AdaBoost.

A different version of the method is described in the more recent 2012 paper (http://proceedings.mlr.press/v25/neven12/neven12.pdf), which introduces the name QBoost. This paper reduces the problem size by using "inner" and "outer" loops that pre-select the weak classifiers as detailed by Algorithms 1 and 2. Note that neither paper discusses using AdaBoost to pre-select the weak classifiers, and if one is going to do that, it is unclear what the motivation for QBoost is (the 2008 paper contrasts QBoost as a "global optimization" vs the "greedy" AdaBoost method, but this is nullified if we simply use AdaBoost first).

In conclusion, my suggestions are the following:

  • The code should be reworked to use an implementation of QBoost that does not simply re-use the classifiers selected by AdaBoost. Probably the implementation should draw from the 2012 paper, otherwise it is unclear how to select a set of weak classifiers and achieve a reasonable problem size.
  • Any comparisons against other boosting methods such as AdaBoost should use the same pool of weak classifiers (as done in the papers mentioned above). Otherwise, it is not an apples-to-apples comparison.

Both the 2008 and 2012 papers show improved performance relative to AdaBoost when using the same pool of weak classifiers. It would be nice to illustrate that through this demo as well.
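For reference, a minimal sketch of the one-shot QUBO implied by the 2008 paper's squared-loss objective, solved here by brute force in place of a quantum annealer (the function names and toy data are illustrative only):

```python
import itertools
import numpy as np

def qboost_qubo(H, y, lam):
    """QUBO matrix for binary weak-classifier selection, following the
    squared-loss objective of the 2008 paper:
        minimize ||(1/N) H.T @ w - y||^2 + lam * sum(w),  w in {0,1}^N
    (the constant y.T @ y term is dropped)."""
    N = H.shape[0]
    Q = (H @ H.T) / N**2                                     # quadratic couplings
    np.fill_diagonal(Q, np.diag(Q) - 2 * (H @ y) / N + lam)  # linear terms
    return Q

def brute_force_solve(Q):
    """Exhaustively minimize w.T @ Q @ w over binary vectors (small N only);
    a quantum annealer or simulated annealing would replace this step."""
    N = Q.shape[0]
    return np.array(min(itertools.product([0, 1], repeat=N),
                        key=lambda w: np.array(w) @ Q @ np.array(w)))

# Toy data: 4 weak classifiers voting on 6 samples, labels in {-1, +1}.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=6)
H = np.vstack([y, -y,                              # a perfect and an inverted classifier
               rng.choice([-1, 1], size=(2, 6))])  # two random classifiers
w = brute_force_solve(qboost_qubo(H, y, lam=0.1))
# The perfect classifier (index 0) is selected; the inverted one (index 1) is not.
```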

D-Wave results are being used as weight coefficients and are all 1's

What is happening?

TL;DR: as written, this codebase does not demonstrate any value from D-Wave quantum hardware for AI applications.

After running the examples and exploring the output weights, what is really happening?

It appears that nothing logically substantial is coming from D-Wave.

After running the examples, the D-Wave output weights are all 1's ([1 1 1 ...]), which means nothing is really happening here. Repeated executions produce varied outcomes (due to RNG initialization), but the final D-Wave weights remain the same. This adds nothing to the classically trained model.

The readme.md file states that:

This code demonstrates the use of the D-Wave system to solve a binary classification problem using the Qboost algorithm.

However, the program output does not clearly bear this out.

python demo.py --wisc
python demo.py --mnist

Possible Considerations

There does not appear to be a mistake here; however, the value of D-Wave is unclear. We can consider the following:

  1. This demo.py example isn't worthwhile and can't demonstrate the value of D-Wave/quantum computing for AI/ML advancements. Would another example work better?
  2. The code isn't demonstrating anything useful: the training models are 100% classically trained, and the data exchanged with D-Wave returns all 1's ([1 1 1 ...]), which adds no value to the model.
  3. Is the intent here to demonstrate that the D-Wave QPU can verify the classically trained model? If so, what value does this example bring to ML/AI advancements?

Output Results Screenshot

[screenshot of demo output]

Update to work with latest scikit-learn (0.22.1)

When run with the latest version (scikit-learn==0.22.1), the demo fails with:

$ python demo.py --mnist

======================================
Train#: 3333, Test: 1667
Num weak classifiers: 35
Tree depth: 3
Traceback (most recent call last):
  File "demo.py", line 196, in <module>
    clfs = train_model(X_train, y_train, X_test, y_test, 1.0)
  File "demo.py", line 80, in train_model
    X_train = centerer.fit_transform(X_train)
  File "/usr/local/lib/python3.7/site-packages/sklearn/base.py", line 571, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 2033, in fit
    .format(K.shape[0], K.shape[1]))
ValueError: Kernel matrix must be a square matrix. Input is a 3333x784 matrix.
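The failing call appears to be scikit-learn's KernelCenterer, which (as of 0.22) rejects input that is not a square kernel (Gram) matrix; raw feature matrices should be centered with StandardScaler instead. A minimal reproduction and fix sketch, with synthetic data:

```python
import numpy as np
from sklearn.preprocessing import KernelCenterer, StandardScaler

X_train = np.random.default_rng(0).normal(size=(30, 7))  # features, not a kernel

# KernelCenterer expects a square kernel (Gram) matrix K, not raw features;
# recent scikit-learn releases raise ValueError on non-square input.
try:
    KernelCenterer().fit_transform(X_train)
    raised = False
except ValueError:
    raised = True  # "Kernel matrix must be a square matrix. ..."

# Centering raw features is done with StandardScaler instead.
X_centered = StandardScaler(with_std=False).fit_transform(X_train)
```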

Preprocessing should be removed

There are several issues with the current preprocessing done in demo.py. The code calls both sklearn's StandardScaler and Normalizer, although as currently written only the Normalizer takes effect (it overwrites the preceding StandardScaler calls). However, Normalizer is not appropriate in this context: it operates by row, re-scaling each individual sample separately, independently of all the others. Generally speaking, StandardScaler would be more appropriate here, as it scales by column (feature), but in demo.py all of the code ultimately uses decision tree classifiers, so re-scaling the features has no effect on the results. (Incidentally, the current usage of StandardScaler, even though it is overwritten by the Normalizer, is also incorrect: the test data should be transformed via scaler.transform, not scaler.fit_transform, which re-computes the transformation from the test data.)

The main takeaway is that both the normalizer and standard scaler preprocessing should be removed from demo.py, leaving a comment that preprocessing is not necessary because all of the weak classifiers are based on decision trees.
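If any scaling were nevertheless kept, the fit_transform-vs-transform mistake would be fixed as follows (a sketch with synthetic data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(100, 5)), rng.normal(size=(40, 5))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean/std on training data only
X_test_s = scaler.transform(X_test)        # reuse them; never re-fit on test data
```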
