#KassandraMRHelper

##Short Summary

The KassandraMRHelper library provides the Record Readers, InputFormats and Mapper classes needed to read data directly from Cassandra SSTables. By using this library you avoid the sstable2json step. This library does not require a live Cassandra cluster.

KassandraMRHelper is compatible with Cassandra versions 2.x.

Relevant blog post

##Building the Project

To build the example you can just run

    mvn clean package

However, there are three Maven profiles that may be of interest to you. The default one is set for EMR, but there is also a HadoopMapReduce profile (for running the example on a regular, non-EMR Hadoop cluster) and a local profile for running it in Eclipse or on a local Hadoop cluster.

Choosing a profile writes a property named "com.knewton.mapreduce.environment" to knewton-site.xml. You can then read it from the configuration and interpret it using the MREnvironment enum type.
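As a rough illustration of reading that property back, the sketch below parses a site file with the JDK's XML parser and maps the value onto an enum. It assumes knewton-site.xml follows the usual Hadoop `<configuration>/<property>` layout. The `SitePropertySketch` class, the `Environment` enum constants, the sample value and the fallback to EMR are all hypothetical stand-ins, not the library's actual MREnvironment API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SitePropertySketch {
    // Hypothetical stand-in for the library's MREnvironment enum; the real constants may differ.
    enum Environment { EMR, HADOOP_MAP_REDUCE, LOCAL }

    // Property name as given in this README.
    static final String ENV_PROPERTY = "com.knewton.mapreduce.environment";

    // Returns the value of the named property in a Hadoop-style site XML, or null if absent.
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(text(prop, "name"))) {
                    return text(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse site XML", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if there is none.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Assumption: the stored value matches the Maven profile name, and EMR is the fallback.
    static Environment parseEnvironment(String value) {
        if (value == null) {
            return Environment.EMR;
        }
        switch (value.toLowerCase(Locale.ROOT)) {
            case "hadoopmapreduce": return Environment.HADOOP_MAP_REDUCE;
            case "local": return Environment.LOCAL;
            default: return Environment.EMR;
        }
    }

    public static void main(String[] args) {
        // Sample content only; the value actually written by the build may look different.
        String xml = "<configuration><property><name>" + ENV_PROPERTY
                + "</name><value>HadoopMapReduce</value></property></configuration>";
        System.out.println(parseEnvironment(readProperty(xml, ENV_PROPERTY)));
    }
}
```

In a real job the value would more likely come from the Hadoop configuration object that loaded knewton-site.xml; the manual XML parsing here only keeps the sketch free of external dependencies.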

To select a profile you can do:

    mvn clean package -P HadoopMapReduce

##Usage

You can use com.knewton.mapreduce.cassandra.WriteSampleSSTable in the test source packages to generate a sample SSTable with student events to use as input. To run it from the command line you can use:

    java -cp ./KassandraMRHelper-0.1.jar:./KassandraMRHelper-0.1-tests.jar \
        com.knewton.mapreduce.cassandra.WriteSampleSSTable
    usage: WriteSampleSSTable [OPTIONS] <output_dir>
     -e,--studentEvents <arg>   The number of student events per student to be
                                generated. Default value is 10
     -h,--help                  Prints this help message.
     -s,--students <arg>        The number of students (rows) to be generated.
                                Default value is 100.

##Things To Watch Out For

  1. There's a property in knewton-site.xml named com.knewton.cassandra.backup.compression. Set it to true only if you are reading from a backup SSTable location that has extra Snappy compression on top of any Cassandra compression scheme. If you're using Priam, for example, and have enabled compression in Cassandra, then your tables are probably double compressed. You DO NOT need to set this property to true if you are using only Cassandra compression, since the library detects that automatically from the presence of the CompressionInfo.db file.
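The presence check described above can be pictured with a small sketch. It only mimics the idea of looking for a CompressionInfo.db component next to a Data.db file and is not the library's own detection code. The `CompressionCheckSketch` class, the `-Data.db` / `-CompressionInfo.db` suffix handling and the sample file names are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CompressionCheckSketch {
    // Assumed component suffixes; adjust them if your SSTable file names differ.
    static final String DATA_SUFFIX = "-Data.db";
    static final String COMPRESSION_SUFFIX = "-CompressionInfo.db";

    // True if a CompressionInfo.db component sits next to the given Data.db file.
    static boolean hasCompressionInfo(Path dataFile) {
        String name = dataFile.getFileName().toString();
        if (!name.endsWith(DATA_SUFFIX)) {
            throw new IllegalArgumentException("Not a Data.db component: " + name);
        }
        String prefix = name.substring(0, name.length() - DATA_SUFFIX.length());
        return Files.exists(dataFile.resolveSibling(prefix + COMPRESSION_SUFFIX));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names in a scratch directory, used only to exercise the check.
        Path dir = Files.createTempDirectory("sstables");
        Path data = Files.createFile(dir.resolve("demo-students-jb-1-Data.db"));
        System.out.println("Before adding CompressionInfo.db: " + hasCompressionInfo(data));
        Files.createFile(dir.resolve("demo-students-jb-1-CompressionInfo.db"));
        System.out.println("After adding CompressionInfo.db: " + hasCompressionInfo(data));
    }
}
```

A check like this says nothing about the extra Snappy layer added by a backup tool, which is why that case needs the explicit property.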

##Contributing

Contributions are always welcome and encouraged!

If you would like to contribute to this project, please contact the current project maintainer or use the GitHub pull request feature.

The project maintainer is Giannis Neokleous.

You can find the style files here

###Future Work

  1. Add SSTable RecordWriters and OutputFormats. These will write SSTables directly, without the need for a live cluster.
    • Create cluster partitioners for partitioning keys based on the Cassandra ring topology.

##Author

Giannis Neokleous

www.giann.is

##License

Licensed under the Apache License 2.0. See the LICENSE file for more details.

##Contributors

jbooth, jordanlewis, psastras

kassandramrhelper's Issues

Question: Help debugging CorruptSSTableException

I have attempted to repurpose some of the example code to run on my own SSTable files. I added the Data.db, Index.db, and CompressionInfo.db files to a resources directory and changed some of the mappers and reducers to work with our own data format. However, when the SSTableRowRecordReader calls the tablescanner.next() method, I keep seeing a CorruptSSTableException caused by a CorruptBlockException. Any insights into why this could be happening? Or hints on how to debug exactly why a corrupt block is being detected?
