sequencefile's Introduction

Sequencefile

This is a native Go implementation of Hadoop's SequenceFile format.

Usage

sf, err := sequencefile.Open("foo.sequencefile")
if err != nil {
  log.Fatal(err)
}

// Iterate through the file.
for sf.Scan() {
  // Do something with sf.Key() and sf.Value()
}

// Check for an error that stopped the scan early.
if err := sf.Err(); err != nil {
  log.Fatal(err)
}

Reading files written by Hadoop

Hadoop adds another layer of serialization to individual keys and values, depending on the Writable class used (for example, BytesWritable). By default, this library returns the raw key and value bytes, still serialized. You can use the following functions to unwrap them:

func BytesWritable(b []byte) []byte
func Text(b []byte) string
func IntWritable(b []byte) int32
func LongWritable(b []byte) int64
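
To make "unwrapping" concrete, here is a minimal sketch of what the extra layer looks like on the wire, assuming Hadoop's usual framing: BytesWritable is a 4-byte big-endian length followed by the payload, and IntWritable and LongWritable are big-endian 4-byte and 8-byte integers. The helper names below (unwrapBytesWritable and friends) are hypothetical and only illustrate the idea; they are not the library's implementation, and unlike the functions listed above they return an error for short input.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShort is returned when the input is smaller than the framing requires.
var errShort = errors.New("serialized value too short")

// unwrapBytesWritable (hypothetical) strips the 4-byte big-endian length
// prefix that BytesWritable puts in front of the payload.
func unwrapBytesWritable(b []byte) ([]byte, error) {
	if len(b) < 4 {
		return nil, errShort
	}
	n := binary.BigEndian.Uint32(b)
	if uint64(n) > uint64(len(b)-4) {
		return nil, errShort
	}
	return b[4 : 4+int(n)], nil
}

// unwrapIntWritable (hypothetical) decodes a 4-byte big-endian signed integer.
func unwrapIntWritable(b []byte) (int32, error) {
	if len(b) < 4 {
		return 0, errShort
	}
	return int32(binary.BigEndian.Uint32(b)), nil
}

// unwrapLongWritable (hypothetical) decodes an 8-byte big-endian signed integer.
func unwrapLongWritable(b []byte) (int64, error) {
	if len(b) < 8 {
		return 0, errShort
	}
	return int64(binary.BigEndian.Uint64(b)), nil
}

func main() {
	// A BytesWritable holding the three bytes "foo".
	raw := []byte{0, 0, 0, 3, 'f', 'o', 'o'}
	payload, err := unwrapBytesWritable(raw)
	fmt.Printf("%q %v\n", payload, err)

	n, err := unwrapIntWritable([]byte{0, 0, 0, 42})
	fmt.Println(n, err)
}
```

With the library itself, the equivalent step would be to pass sf.Key() or sf.Value() to whichever of the functions above matches the class the file was written with.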

sequencefile's People

Contributors

colinmarc, dvrkps, kitchen, morzaria-stripe, praboud-stripe, vasi-stripe

sequencefile's Issues

License for this project?

Hi,
the source tree doesn't contain a LICENSE or COPYING file, and none of the files have a copyright header. Who owns the copyright on this code and under what license is it being published?

Thanks :)

reader chokes on uncompressed value when compression set?

While working on compression for the writer, I wrote a test which, when green, would write out a sequence file with RecordCompression. Of course, I hadn't actually implemented that, so I was expecting a red test. But I got this instead:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xe93d2]

goroutine 19 [running]:
panic(0x25aba0, 0xc4200140f0)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
testing.tRunner.func1(0xc420096180)
        /usr/local/go/src/testing/testing.go:579 +0x25d
panic(0x25aba0, 0xc4200140f0)
        /usr/local/go/src/runtime/panic.go:458 +0x243
compress/gzip.(*Reader).Close(0x0, 0x245e80, 0xc420073848)
        /usr/local/go/src/compress/gzip/gunzip.go:287 +0x22
github.com/colinmarc/sequencefile.(*Reader).close(0xc4200b0b40, 0x3b0ac0, 0xc420072030)
        /Users/jeremy/golang/src/github.com/colinmarc/sequencefile/reader.go:260 +0x84
github.com/colinmarc/sequencefile.(*Reader).scanRecord(0xc4200b0b40, 0xc420036d50)
        /Users/jeremy/golang/src/github.com/colinmarc/sequencefile/reader.go:167 +0x44a
github.com/colinmarc/sequencefile.(*Reader).Scan(0xc4200b0b40, 0xc420096180)
        /Users/jeremy/golang/src/github.com/colinmarc/sequencefile/reader.go:83 +0x54
github.com/colinmarc/sequencefile.TestWriteRecordCompressed(0xc420096180)
        /Users/jeremy/golang/src/github.com/colinmarc/sequencefile/writer_test.go:94 +0xb60
testing.tRunner(0xc420096180, 0x2c5c38)
        /usr/local/go/src/testing/testing.go:610 +0x81
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:646 +0x2ec
exit status 2
FAIL    github.com/colinmarc/sequencefile       0.015s

I would like to work up a proper test case for this to reproduce, but it seems to me that gzip is panicking because the "compressed data" is invalid, and that's bubbling all the way back up. If this is expected behavior from gzip when we pass malformed compressed data to it, we should capture that. If not, we should probably figure out what went wrong here :)
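
One plausible reading of the trace, offered as a guess rather than a confirmed diagnosis: the receiver passed to (*gzip.Reader).Close is 0x0, which is what you would get if gzip.NewReader rejected the value, returned a nil reader along with an error, and the reader was closed anyway. The sketch below uses only the standard library to show that pattern being guarded against; decompressValue is a hypothetical helper, not code from reader.go.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// decompressValue (hypothetical) gunzips a single record value and reports
// malformed input as an error instead of panicking.
func decompressValue(raw []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		// zr is nil here, so calling zr.Close() would dereference a nil
		// pointer. Return the error and never touch the reader.
		return nil, fmt.Errorf("value is not valid gzip data: %w", err)
	}
	defer zr.Close()

	return io.ReadAll(zr)
}

func main() {
	// An uncompressed value in a file whose header claims record compression.
	_, err := decompressValue([]byte("plain, uncompressed value"))
	fmt.Println(err)
}
```

If that reading is right, the fix in the reader would be to surface the gzip error through Err() and skip the close when no reader was created.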
