gsm1011 / data-mining-algorithms
Automatically exported from code.google.com/p/data-mining-algorithms
- HOWTO Compile and Run this program.

Compile this project by typing "make" in the current working dir.
Run the program by typing:

    ./AssoRuleMiner p2eqbindata.txt 0.8 0.3 150 3

Which means:

    ./AssoRuleMiner datafile minSup minConf g k

datafile is provided in the current dir, minSup should be within (0,1], minConf should also be within (0,1], g is the total number of genes (columns) to process, and k is the number of top association rules to print, ranked by sup*conf.

WARNING: Please don't run this program with a very low minSup. It may consume a large amount of resources and, in the worst case, crash your system.

- Format of p2ItemMap.txt.

This file contains the mapping of the original discretized data to unique ids. Each row represents a transaction of the original data. Columns are separated by ",", and each item is composed of the original data and the mapped unique id, which are separated by a space. Please see the sample below:

    c 0,b 4,b 8,b 12,n 16
    c 0,c 5,b 8,b 12,p 17
    a 1,c 5,c 9,b 12,n 16
    b 2,c 5,c 9,b 12,p 17
    a 1,b 4,b 8,a 13,n 16

- Format of p2FreqItemsets.txt.

This file contains all the frequent itemsets generated by the APRIORI algorithm. Each line is a frequent itemset, with the format freqset:support. The output is the level order traversal of the hash tree. Example content of this file is as follows:

    4,8:0.33871
    4,13:0.306452
    4,8,16:0.209677
    6,10,13:0.209677

- Format of the top k Association rule output.

The top k association rules are selected according to sup*conf. They are printed at the end of the program execution. The format of these association rules is:

    [anteset]-->[conset] [supxy] [supx] [conf] [supxy*conf]

Below are some examples of such output:

    222-->232 0.903226 0.903226 1 0.903226
    122-->222,232 0.887097 0.887097 1 0.887097
    122,222-->232 0.887097 0.887097 1 0.887097
    122,228-->222,230 0.870968 0.870968 1 0.870968
    76,228,230-->232 0.870968 0.870968 1 0.870968

- Files.

README - This file.
Makefile - The project organization file.
defs.h - Defines commonly used MACROS and/or functions, such as the hash function and the itoa function used for transforming an integer to a string.
Item.h - Definition and implementation of the Item class. The Item class represents a single item of the gene data, and it is also the element that is combined to form itemsets.
Itemset.h, Itemset.cpp - Definition and implementation of the Itemset class. The Itemset class represents the frequent itemsets that we need to generate with the APRIORI algorithm. Generally, it is a composition of a group of items. A join method is provided for this class. It also provides a method that generates association rules from frequent itemsets.
HashTree.h, HashTree.cpp (deprecated) - Definition and implementation of the HashNode and HashTree classes. The HashNode class is a container of frequent itemsets, which are generated by joining, scanning and pruning. The frequent itemsets are stored as a hash map within the HashNode class. The HashTree class iteratively produces frequent itemsets level by level, where the level is the itemset length. HashNodes are organized into a HashTree, and a map data structure is used to facilitate the search for a specific node and for an itemset within a node.
DataSet.h, DataSet.cpp - Definition and implementation of the DataSet class. This class is responsible for loading data from file, mapping Items from discretized data to unique integer IDs, running the APRIORI algorithm over the mapped gene data sets, and finally saving all the results to files or printing them to the screen.
AssoRule.h, AssoRule.cpp - Definition and implementation of the AssoRule class. This class represents the association rules we are supposed to generate from the frequent itemsets. The output format is: [anteset]-->[conset] [supxy] [supx] [conf] [supxy*conf].
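The rule output format described above can be illustrated with a small C++ sketch. This is not the project's AssoRule class: the Rule struct and the confidence, score, formatRule and topK functions are hypothetical names made up for this example. It only assumes what the README states, namely that conf is supxy divided by supx and that rules are ranked by supxy*conf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical rule record for illustration; the real AssoRule class may differ.
struct Rule {
    std::vector<int> antecedent;  // item ids on the left of "-->"  (anteset)
    std::vector<int> consequent;  // item ids on the right of "-->" (conset)
    double supxy;                 // support of antecedent and consequent together
    double supx;                  // support of the antecedent alone
};

// conf = supxy / supx
double confidence(const Rule& r) {
    assert(r.supx > 0.0);  // a rule with an unsupported antecedent is meaningless
    return r.supxy / r.supx;
}

// Ranking criterion named in the README: supxy * conf.
double score(const Rule& r) { return r.supxy * confidence(r); }

// Joins item ids with commas, e.g. {222, 232} becomes "222,232".
std::string joinIds(const std::vector<int>& ids) {
    std::ostringstream out;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) out << ',';
        out << ids[i];
    }
    return out.str();
}

// Produces "[anteset]-->[conset] [supxy] [supx] [conf] [supxy*conf]".
std::string formatRule(const Rule& r) {
    std::ostringstream out;
    out << joinIds(r.antecedent) << "-->" << joinIds(r.consequent) << ' '
        << r.supxy << ' ' << r.supx << ' ' << confidence(r) << ' ' << score(r);
    return out.str();
}

// Returns the k best rules by score, best first (fewer if less than k exist).
std::vector<Rule> topK(std::vector<Rule> rules, std::size_t k) {
    k = std::min(k, rules.size());
    auto middle = rules.begin() + static_cast<std::ptrdiff_t>(k);
    std::partial_sort(rules.begin(), middle, rules.end(),
                      [](const Rule& a, const Rule& b) { return score(a) > score(b); });
    rules.erase(middle, rules.end());
    return rules;
}
```

To use it, fill a std::vector<Rule> with the rules generated from the frequent itemsets, call topK(rules, k) with the k given on the command line, and print formatRule for each returned rule. std::partial_sort is used here because only the first k positions need to be ordered; the original program may rank its rules differently.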

- Abbreviations of HashTree output (deprecated).

To output the content of the hash tree (level order tree traversal), you need to enable a switch in the Makefile, which is "MACROS += -DDEBUG_APRIORI_TRAVERSAL".

    CN  - Create Node.
    ->  - Parent of node.
    :   - Separator.
    II  - Insert Itemset.
    NN  - New Node.
    ND  - NoDe.
    VN  - Visit Node.
    NC  - Number of Children.
    NFI - Number of Frequent Itemsets.
    FIS - Frequent ItemSets.

- Documentation.

You can use doxygen to generate the API reference for this project. You need to install doxygen and dot in order to generate the documentation. To generate it, use "doxygen Doxygen".

- About Debugging of project.

In this project, I used a lot of conditional compilation macros for the purpose of debugging. You can enable a debugging feature by removing the "#" in front of the corresponding line in the Makefile. I hope it will be useful.

- Proof of Correctness.

This APRIORI program has been verified by setting the support to the smallest value close to 0 (with small input), so that the hash tree generates all the transactions within the dataset. Please do not try this with a large amount of data, as it will consume all the memory and may even halt your system.

- Copyright Notice.

This is free software, so you can change and redistribute it. But please keep the headlines in the file when doing so, or contact me through [email protected]. Or you can check it out online at:

    svn co http://fall-2010.googlecode.com/svn/fall-2010/data_mining/proj2 proj2