vivrass / hackreduce Goto Github PK
View Code? Open in Web Editor NEWThis project forked from hackreduce/hackathon
Home Page: http://www.hackreduce.org
This project forked from hackreduce/hackathon
Home Page: http://www.hackreduce.org
HackReduce http://www.hackreduce.org Prerequisites ------------- - Java 1.6+ - Ant - Git Run an example job locally -------------------------- 1) git clone git://github.com/hoppertravel/HackReduce.git - Note: You should periodically run "git pull" from within the project directory to update your code. 2) cd HackReduce 3) ant 4) Try running an example from the list below Examples -------- Run any of the following commands in your CLI, and after the job's completed, check the /tmp/* folder for the output. Bixi: > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.bixi.RecordCounter datasets/bixi /tmp/bixi_recordcounts NASDAQ: > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.HighestDividend datasets/nasdaq/dividends /tmp/nasdaq_dividends > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.MarketCapitalization datasets/nasdaq/daily_prices /tmp/nasdaq_marketcaps > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.RecordCounter datasets/nasdaq/daily_prices /tmp/nasdaq_recordcounts NYSE: > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.HighestDividend datasets/nyse/dividends /tmp/nyse_dividends > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.MarketCapitalization datasets/nyse/daily_prices /tmp/nyse_marketcaps > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.stockexchange.RecordCounter datasets/nyse/daily_prices /tmp/nyse_recordcounts Flights: > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.flights.RecordCounter datasets/flights /tmp/flights_recordcounts Wikipedia: > java -classpath ".:dist/hackreduce-0.1.jar:lib/*" org.hackreduce.examples.wikipedia.RecordCounter datasets/wikipedia /tmp/wikipedia_recordcounts Note: The jobs are made for the specific datasets, so pairing them up properly is important. The second argument (/tmp/*) is just a made up output path for the results of the job, and can be modified to anything you want. Datasets -------- - Bixi (Courtesy of Fabrice) - Flights (Courtesy of Hopper) - NASDAQ daily prices and dividends (http://www.infochimps.com/datasets/daily-1970-2010-open-close-hi-low-and-volume-nasdaq-exchange) - NYSE daily prices and dividends (http://www.infochimps.com/datasets/daily-1970-2010-open-close-hi-low-and-volume-nyse-exchange) - Wikipedia XML (http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia) Take a look at the datasets/ folder to see samples subsets of these datasets. Running on a Hadoop cluster --------------------------- If you'd like to set up an actual Hadoop cluster (single or multi node), then follow the instructions on the Hadoop wiki: http://wiki.apache.org/hadoop/QuickStart If you're running the HackReduce virtual image, or you're trying to run the example job against an actual install of Hadoop, the process should be similar. However, you'll need to upload the datasets folder into HDFS (if you're running it), or just make a note of the path. Example: > bin/hadoop jar <path_to_hackreduce_jar>/hackreduce-0.1.jar org.hackreduce.examples.stockexchange.RecordCounter <path to>/datasets/nyse/daily_prices /tmp/nyse_recordcounts
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.