CBA

This is my implementation of the CBA (Classification Based on Associations) algorithm. It was my final project for a data mining course taught by Prof. Chen Lin. It is written in Python 3.6. For details, see the paper Integrating Classification and Association Rule Mining by Bing Liu et al. [1].

What's in datasets?

The datasets come from the UCI Machine Learning Repository and were lightly preprocessed. We downloaded 30 datasets, including classic ones such as iris.

Each dataset has 2 files:

  • *.data contains the instances, one sample per line. Attribute values are separated by commas (,) with no spaces. The last value is the class label, which may be either a number or a string.
  • *.names contains two lines. The first is the title line, giving the name of each attribute; its last word must be class, which stands for the class label. The second line gives the type of each attribute; only numerical and categorical are accepted, and its last word must be label. Words on both lines are separated by commas (,) with no spaces.

Moreover, the two files must share the same base name.

You can open iris.data and iris.names in the datasets directory to see these rules in practice.
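
The format rules above can be illustrated with a minimal parsing sketch. This is not the loader used in this project: the function names read_scheme and read_data are hypothetical, and the sample lines are illustrative rather than copied from the files in datasets. It also assumes there are no missing values.

```python
def read_scheme(lines):
    """Parse the two lines of a *.names file into (names, types)."""
    names = lines[0].strip().split(",")
    types = lines[1].strip().split(",")
    if len(names) != len(types):
        raise ValueError("title line and type line differ in length")
    if names[-1] != "class" or types[-1] != "label":
        raise ValueError("last column must be 'class' with type 'label'")
    for t in types[:-1]:
        if t not in ("numerical", "categorical"):
            raise ValueError("unsupported attribute type: " + t)
    return names, types


def read_data(lines, types):
    """Parse *.data lines; numerical attributes become floats."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = line.split(",")
        if len(values) != len(types):
            raise ValueError("wrong number of values: " + line)
        # Assumption: no missing-value markers; float() would fail on them.
        rows.append([float(v) if t == "numerical" else v
                     for v, t in zip(values, types)])
    return rows


# Illustrative contents only, not the actual files shipped in datasets.
scheme_lines = [
    "sepal_length,sepal_width,petal_length,petal_width,class",
    "numerical,numerical,numerical,numerical,label",
]
data_lines = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
]

names, types = read_scheme(scheme_lines)
rows = read_data(data_lines, types)
print(names)
print(rows[0])
```

To try it on real files, the lines could be read with open(path).readlines() for a pair such as datasets/iris.names and datasets/iris.data.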

How to run it?

The entry point is validation.py. You can modify test_data_path and test_scheme_path at the end of that file. All datasets are in the datasets directory. Four running modes are provided: CBA-CB M1 or M2, each with or without pruning. Choose the mode you want to run, keep its line uncommented, and comment out the other three.

For example, to use the iris dataset as test data, set test_data_path to datasets/iris.data and test_scheme_path to datasets/iris.names. To test CBA-CB M1 without pruning, prefix the last three lines with a hash mark (#):

# choose one mode to run by leaving exactly one line uncommented
cross_validate_m1_without_prune(test_data_path, test_scheme_path)
# cross_validate_m1_with_prune(test_data_path, test_scheme_path)
# cross_validate_m2_without_prune(test_data_path, test_scheme_path)
# cross_validate_m2_with_prune(test_data_path, test_scheme_path)

Then you can run the program.

Acknowledgment

Dasong Chen and Lujing Xiao helped me complete this project. Thanks for their efforts.

Reference

[1] Liu, B., Hsu, W., and Ma, Y. "Integrating Classification and Association Rule Mining." Proceedings of KDD (1998): 80-86.
