ahhaque / echo


License: GNU General Public License v3.0



ECHO

Efficient Semi-Supervised Adaptive Classification and Novel Class Detection over Data Stream

Synopsis

ECHO is a semi-supervised framework for classifying evolving data streams, built on our previous approach, SAND. The most expensive module of SAND is its change detection module, which has cubic time complexity. ECHO uses dynamic programming to reduce this time complexity. In addition, ECHO enforces a maximum allowable sliding window size: if no concept drift is detected within this limit, ECHO updates the classifiers and resets the sliding window. Experimental results show that ECHO achieves a significant speedup over SAND while maintaining similar accuracy. Please refer to the paper listed in the Reference section for further details.
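This README does not describe how ECHO's dynamic program is structured, so the following is only a generic illustration of the idea, not ECHO's change detector. It assumes a window of classifier confidence scores, scans every split point, and compares the mean confidence before and after the split. Precomputing prefix sums (a simple form of dynamic programming) turns each sub-window mean into an O(1) lookup, so one scan costs O(n) instead of the O(n^2) needed to recompute every mean from scratch. The class CappedWindowSketch, the mean-drop statistic, the threshold, and the minSide parameter are hypothetical, and emptying the window after a detected change is a simplification. The window cap mirrors the maximum allowable window size described above (the -W option).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not ECHO's actual change detector: a capped sliding
 * window of classifier confidence scores plus a split-point scan that uses
 * prefix sums so that every sub-window mean costs O(1).
 */
final class CappedWindowSketch {
    private final int maxWindowSize; // plays the role of the -W option
    private final int minSide;       // smallest sub-window on either side of a split (>= 1)
    private final double threshold;  // hypothetical minimum drop in mean confidence
    private final List<Double> window = new ArrayList<Double>();

    CappedWindowSketch(int maxWindowSize, int minSide, double threshold) {
        this.maxWindowSize = maxWindowSize;
        this.minSide = minSide;
        this.threshold = threshold;
    }

    int size() {
        return window.size();
    }

    /**
     * Adds one confidence score. Returns true when the caller should update its
     * classifiers: either a change point was found or the window reached its cap.
     * In both cases this simplified sketch empties the window.
     */
    boolean add(double confidence) {
        window.add(confidence);
        boolean change = findChangePoint(window, minSide, threshold) >= 0;
        if (change || window.size() >= maxWindowSize) {
            window.clear();
            return true;
        }
        return false;
    }

    /** Returns the split index with the largest mean drop above threshold, or -1. */
    static int findChangePoint(List<Double> conf, int minSide, double threshold) {
        int n = conf.size();
        double[] prefix = new double[n + 1]; // prefix[i] = conf[0] + ... + conf[i-1]
        for (int i = 0; i < n; i++) {
            prefix[i + 1] = prefix[i] + conf.get(i);
        }
        int best = -1;
        double bestDrop = threshold;
        for (int k = minSide; k <= n - minSide; k++) {
            double before = prefix[k] / k;                    // mean of conf[0..k-1]
            double after = (prefix[n] - prefix[k]) / (n - k); // mean of conf[k..n-1]
            if (before - after > bestDrop) {
                bestDrop = before - after;
                best = k;
            }
        }
        return best;
    }
}
```

For example, feeding twelve scores of 0.95 followed by scores of 0.60 into new CappedWindowSketch(100, 3, 0.2) returns false for the first thirteen calls and true on the fourteenth, once the drop in mean confidence exceeds the threshold. Rebuilding the prefix array on every arrival keeps the sketch short; an incremental version would extend it by one entry per instance.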

Requirements

ECHO requires that:

  • The input file is provided in ARFF format.
  • All features are numeric. If there are non-numeric features, they can be converted to numeric features using standard techniques (e.g., one-hot encoding).
  • Features are normalized, for better performance.
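As a minimal sketch of the two preprocessing steps mentioned above, assuming the whole dataset is available offline before it is written to ARFF, the helpers below one-hot encode a nominal feature and min-max normalize numeric columns to [0, 1]. PreprocessSketch and its methods are hypothetical and not part of ECHO; if you already build the ARFF file with Weka, its unsupervised attribute filters (e.g., NominalToBinary and Normalize) are an alternative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical preprocessing helpers (not part of ECHO): numeric encoding
 * of nominal values and min-max normalization of numeric columns.
 */
final class PreprocessSketch {

    /** One-hot encodes a nominal feature; categories are numbered by first appearance. */
    static double[][] oneHot(String[] values) {
        Map<String, Integer> index = new LinkedHashMap<String, Integer>();
        for (String v : values) {
            if (!index.containsKey(v)) {
                index.put(v, index.size());
            }
        }
        double[][] out = new double[values.length][index.size()];
        for (int i = 0; i < values.length; i++) {
            out[i][index.get(values[i])] = 1.0;
        }
        return out;
    }

    /** Rescales every column to [0, 1]; a constant column is mapped to all zeros. */
    static double[][] minMaxNormalize(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (int i = 0; i < rows; i++) {
                out[i][j] = range == 0.0 ? 0.0 : (data[i][j] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
        System.out.println(Arrays.deepToString(oneHot(new String[] {"red", "blue", "red"})));
        // prints [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
        System.out.println(Arrays.deepToString(minMaxNormalize(new double[][] {{2, 10}, {4, 10}, {6, 10}})));
    }
}
```

Min-max scaling here uses statistics of the full file; in a true streaming setting, those statistics would have to come from an initial sample or be updated online.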

Environment

  • Java SDK v1.7+
  • Weka 3.6+
  • Common Math library v2.2
  • Apache Logging Services v1.2.15

All of the above except the Java SDK are included in the SRC_ECHO_v_0_1 and DIST_ECHO_v_0_1 folders.

Execution

To execute the program on a Windows operating system:

  1. Open a command prompt inside the DIST_ECHO_v_0_1 folder.
  2. Run the command "java -jar ECHO_v_0_1.jar [OPTION(S)]"

Required option(s):

  • -F
    • Input file path. Do not include the .arff file extension in the path.

Optional option(s):

  • -S
    • Chunk size during the warm-up period. Default value is 2000 instances.
  • -L
    • Maximum number of models in the ensemble. Default value is 6.
  • -U
    • Confidence threshold. Please refer to the paper for a description of the confidence threshold. Default value is 0.90.
  • -D
    • 1 = ECHO-D; 0 = ECHO-F. Please refer to the paper for a description of ECHO-D and ECHO-F. Default value is 1.
  • -T
    • Labeling delay, in number of instances. The default value for classification only is 1; use an appropriate value for novel class detection.
  • -C
    • Classification delay, in number of instances. The default value for classification only is 0; use an appropriate value for novel class detection.
  • -W
    • Maximum allowable window size. Default value is 3000.
  • -A
    • Sensitivity (denoted by alpha). Default value is 0.001.
  • -G
    • Value of gamma, which is used to calculate the cushion period. Default value is 0.5.
  • -R
    • Relaxation parameter, used in the change detection procedure. Defaults to the value of Sensitivity (-A).
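To make the defaults above easy to check at a glance, here is a hypothetical option holder that applies them. It is not ECHO's actual argument parser: the class name, field names, and the assumption that each flag is followed by its value as a separate, space-separated argument are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical option holder (not ECHO's actual parser). It reads
 * space-separated "-X value" pairs and applies the defaults documented
 * in this README; -R falls back to the value of -A when it is absent.
 */
final class EchoOptionsSketch {
    String inputPath;           // -F: input path without the .arff extension (required)
    int warmupChunkSize;        // -S: default 2000 instances
    int maxEnsembleSize;        // -L: default 6
    double confidenceThreshold; // -U: default 0.90
    int variant;                // -D: 1 = ECHO-D, 0 = ECHO-F; default 1
    int labelingDelay;          // -T: default 1 (classification only)
    int classificationDelay;    // -C: default 0 (classification only)
    int maxWindowSize;          // -W: default 3000
    double alpha;               // -A: sensitivity, default 0.001
    double gamma;               // -G: used for the cushion period, default 0.5
    double relaxation;          // -R: defaults to alpha

    static EchoOptionsSketch parse(String[] args) {
        // Collect flag/value pairs; a trailing flag without a value is ignored.
        Map<String, String> opts = new HashMap<String, String>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]);
        }
        EchoOptionsSketch o = new EchoOptionsSketch();
        o.inputPath = opts.get("-F");
        if (o.inputPath == null) {
            throw new IllegalArgumentException("-F (input file path) is required");
        }
        o.warmupChunkSize = Integer.parseInt(valueOr(opts, "-S", "2000"));
        o.maxEnsembleSize = Integer.parseInt(valueOr(opts, "-L", "6"));
        o.confidenceThreshold = Double.parseDouble(valueOr(opts, "-U", "0.90"));
        o.variant = Integer.parseInt(valueOr(opts, "-D", "1"));
        o.labelingDelay = Integer.parseInt(valueOr(opts, "-T", "1"));
        o.classificationDelay = Integer.parseInt(valueOr(opts, "-C", "0"));
        o.maxWindowSize = Integer.parseInt(valueOr(opts, "-W", "3000"));
        o.alpha = Double.parseDouble(valueOr(opts, "-A", "0.001"));
        o.gamma = Double.parseDouble(valueOr(opts, "-G", "0.5"));
        String r = opts.get("-R");
        o.relaxation = (r == null) ? o.alpha : Double.parseDouble(r);
        return o;
    }

    private static String valueOr(Map<String, String> opts, String key, String fallback) {
        String v = opts.get(key);
        return (v == null) ? fallback : v;
    }
}
```

With this sketch, parse(new String[] {"-F", "data/mystream", "-A", "0.01"}) keeps every other default and sets both alpha and the relaxation parameter to 0.01; the path data/mystream is a made-up example.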

Output

Console output

  • The program prints its progress and any detected change points to the console.
  • At the end, it reports the percentage of labeled data used.
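The labeled-data percentage is simple bookkeeping. The sketch below assumes, as is typical for uncertainty sampling, that a true label is requested only when the classifier's confidence falls below a threshold such as the -U value; this README does not state ECHO's exact labeling policy, so both the rule and the class LabelBudgetSketch are hypothetical.

```java
/**
 * Hypothetical sketch (an assumption, not ECHO's documented policy):
 * uncertainty sampling that requests a true label only when the
 * classifier's confidence is below a threshold such as the -U value,
 * and reports the percentage of instances that were labeled.
 */
final class LabelBudgetSketch {
    private final double threshold;
    private long seen;
    private long labeled;

    LabelBudgetSketch(double threshold) {
        this.threshold = threshold;
    }

    /** Records one prediction; returns true if its true label would be requested. */
    boolean observe(double confidence) {
        seen++;
        boolean request = confidence < threshold;
        if (request) {
            labeled++;
        }
        return request;
    }

    /** Percentage of observed instances whose label was requested (0 if none seen). */
    double labeledPercent() {
        return seen == 0 ? 0.0 : 100.0 * labeled / seen;
    }
}
```

With a threshold of 0.90, observing confidences 0.95, 0.50, 0.89, and 0.99 requests two labels, so labeledPercent() returns 50.0.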

File output

  1. The .log file contains important debug information.
  2. The .tmpres file contains the error rates for each chunk, in six columns:
    • Chunk # = The current chunk number. Each chunk contains 1000 instances.
    • FP = Number of existing class instances misclassified as novel class in this chunk.
    • FN = Number of novel class instances misclassified as existing class in this chunk.
    • NC = Number of novel class instances actually present in this chunk.
    • Err = Number of misclassified instances (including FP and FN) in this chunk.
    • GlobErr = Cumulative % error (Err) up to the current chunk.
  3. The .res file contains the summary result, i.e., the following error rates:
    • FP% = % of existing class instances misclassified as novel class instances.
    • FN% = % of novel class instances misclassified as existing class instances.
    • NC (total) = Total number of (actual) novel class instances.
    • ERR% = % classification error (including FP, FN, and misclassification within existing classes).
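The column definitions above imply specific denominators: FP% is taken over existing class instances, FN% over actual novel class instances, and ERR% and GlobErr over all instances seen so far. The following hypothetical sketch (not ECHO's code) recomputes these rates from per-chunk counts under that reading. It takes the chunk size as a parameter on the assumption that a final chunk may hold fewer than 1000 instances, and returning 0.0 for an empty denominator is a choice of this sketch.

```java
/**
 * Hypothetical sketch (not ECHO's code) that recomputes the error rates of
 * the .tmpres and .res files from per-chunk counts, following the column
 * definitions above. Zero denominators yield 0.0 by choice of this sketch.
 */
final class ErrorRatesSketch {
    private long instances; // instances seen so far
    private long fp;        // existing-class instances predicted as novel
    private long fn;        // novel-class instances predicted as existing
    private long novel;     // actual novel-class instances (NC)
    private long err;       // all misclassifications, including FP and FN

    /** Adds one chunk's counts and returns GlobErr, the cumulative % error so far. */
    double addChunk(int chunkSize, int chunkFp, int chunkFn, int chunkNc, int chunkErr) {
        instances += chunkSize;
        fp += chunkFp;
        fn += chunkFn;
        novel += chunkNc;
        err += chunkErr;
        return 100.0 * err / instances;
    }

    /** FP%: share of existing-class instances misclassified as novel. */
    double fpPercent() {
        long existing = instances - novel;
        return existing == 0 ? 0.0 : 100.0 * fp / existing;
    }

    /** FN%: share of novel-class instances misclassified as existing. */
    double fnPercent() {
        return novel == 0 ? 0.0 : 100.0 * fn / novel;
    }

    /** ERR%: overall classification error. */
    double errPercent() {
        return instances == 0 ? 0.0 : 100.0 * err / instances;
    }

    /** NC (total): number of actual novel-class instances. */
    long novelTotal() {
        return novel;
    }
}
```

For instance, a first chunk with 30 errors gives GlobErr = 3.0; adding a second chunk of 1000 instances with 70 errors, 50 of them novel class instances, raises GlobErr to 5.0.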

Reference

Efficient Handling of Concept Drift and Concept Evolution over Stream Data

