
apacheorcdotnet's Introduction

ApacheOrcDotNet

C# Port of the Apache ORC File Format


apacheorcdotnet's People

Contributors

ddrinka


apacheorcdotnet's Issues

Support Enum serialization

Serialization of enums is currently not supported.
It should be added, and it should be configurable to serialize them either as int or as string.
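A minimal sketch of the requested behavior follows. It assumes a hypothetical `EnumSerializationMode` setting and a hypothetical `EnumColumnConverter` helper, neither of which is part of ApacheOrcDotNet. The enum value is converted either to its underlying integer (for an ORC integer column) or to its member name (for an ORC string column) before it would reach a column writer.

```csharp
using System;

// Hypothetical setting; ApacheOrcDotNet does not define this type.
public enum EnumSerializationMode { AsInteger, AsString }

// Hypothetical helper, not the library's API.
public static class EnumColumnConverter
{
    // Converts an enum into the value an ORC column writer would receive:
    // a long for an integer column, or the member name for a string column.
    public static object ToColumnValue(Enum value, EnumSerializationMode mode)
    {
        if (value is null) return null; // nulls stay null so they can be marked in the PRESENT stream

        return mode switch
        {
            EnumSerializationMode.AsInteger => (object)System.Convert.ToInt64(value),
            EnumSerializationMode.AsString => value.ToString(),
            _ => throw new ArgumentOutOfRangeException(nameof(mode))
        };
    }
}
```

For example, `DayOfWeek.Tuesday` becomes `2L` in integer mode and `"Tuesday"` in string mode. The string form survives reordering of enum members, while the integer form is more compact.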

Use for reading .ORC files without knowing their schema

Hi, thanks for writing this lib!
I am trying to understand whether I can use this lib to read .orc files.
My service receives many .orc files from customers, and I want to read the first n columns and the first k rows from each file without knowing the file's schema.
Can I do that with this lib?
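ORC files are self-describing. Per the Apache ORC specification, a file starts with the magic bytes "ORC", and its final byte holds the length of the PostScript. The PostScript, a protobuf message, gives the footer length and compression, and the footer lists the column types (the schema). Below is a minimal sketch of the first step, locating the PostScript. `OrcTailSketch` is a hypothetical name, not ApacheOrcDotNet's reader API. Decoding the PostScript and footer would additionally require the ORC protobuf definitions.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper illustrating the ORC file tail layout; not the library's API.
public static class OrcTailSketch
{
    // Checks the "ORC" magic at the start of the file and returns the PostScript
    // length stored in the file's last byte.
    public static int ReadPostScriptLength(Stream file)
    {
        if (file.Length < 4) throw new InvalidDataException("Too short to be an ORC file.");

        var magic = new byte[3];
        file.Seek(0, SeekOrigin.Begin);
        if (file.Read(magic, 0, 3) != 3 || Encoding.ASCII.GetString(magic) != "ORC")
            throw new InvalidDataException("Missing ORC magic.");

        // The PostScript occupies the bytes just before this final length byte.
        file.Seek(-1, SeekOrigin.End);
        int length = file.ReadByte();
        if (length <= 0 || length >= file.Length)
            throw new InvalidDataException("Implausible PostScript length.");
        return length;
    }
}
```

From there, the PostScript bytes are `[file.Length - 1 - length, file.Length - 1)`. Its footer length points to the footer, whose type list would reveal the columns without any schema known in advance.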

OrcReader?

Thanks for creating this library. It will be very useful for the .NET community.

Is there an OrcReader in the works?

Decimal serialization is not working for null values

Hi again,

I serialized a class with nullable decimal fields, and some rows had null values for these fields. When reading the serialized file with Hive, it threw the following error:

Caused by: java.io.EOFException: Read past end of bit field from bit reader current: 0 bits left: 0 bit size: 1 from byte rle literal used: 0/0 from compressed stream Stream for column 8 kind PRESENT position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.BitFieldReader.readByte(BitFieldReader.java:52)
at org.apache.orc.impl.BitFieldReader.next(BitFieldReader.java:63)
at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:180)
at org.apache.orc.impl.TreeReaderFactory$DecimalTreeReader.nextVector(TreeReaderFactory.java:1111)
at org.apache.orc.impl.ConvertTreeReaderFactory$DecimalFromDecimalTreeReader.nextVector(ConvertTreeReaderFactory.java:1432)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1776)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1071)

This even prevents Hive from reading data from other rows that do have non-null values.
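The Hive trace fails while reading the PRESENT stream of column 8, which it reports with length 0. This suggests the stream was declared but left empty even though the column contained nulls. Per the Apache ORC specification, PRESENT is a boolean stream with one bit per row (1 = value present, 0 = null), packed most significant bit first and then byte-RLE encoded. When the stream is absent, readers treat every value as present.

A minimal sketch of producing those bytes follows, assuming literal-only byte RLE for simplicity. `PresentStreamSketch` is a hypothetical name, not ApacheOrcDotNet's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of ORC PRESENT stream bytes; not the library's code.
public static class PresentStreamSketch
{
    // Packs one bit per row (1 = present, 0 = null), most significant bit first.
    public static byte[] PackPresentBits(IReadOnlyList<bool> present)
    {
        var packed = new byte[(present.Count + 7) / 8];
        for (int i = 0; i < present.Count; i++)
        {
            if (present[i])
                packed[i / 8] |= (byte)(0x80 >> (i % 8));
        }
        return packed; // unused low bits of the last byte remain 0
    }

    // Byte RLE using literal groups only: a control byte of -n (two's complement)
    // followed by n literal bytes, with n <= 128. Valid but not compact; a real
    // writer would also emit runs (control 0..127 = control + 3 repeats).
    public static byte[] ByteRleLiterals(byte[] data)
    {
        var output = new List<byte>();
        for (int offset = 0; offset < data.Length; offset += 128)
        {
            int count = Math.Min(128, data.Length - offset);
            output.Add(unchecked((byte)(-count)));
            for (int j = 0; j < count; j++) output.Add(data[offset + j]);
        }
        return output.ToArray();
    }
}
```

For three rows `[value, null, value]`, the packed byte is `0xA0` and the encoded stream is `0xFF 0xA0`.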

Error: A code error has led to negative space remaining

I am inserting multiple rows into a file from a database, but sometimes I get an error like this. I would like to know what it means:

ArithmeticException: A code error has led to negative space remaining
at ApacheOrcDotNet.Compression.OrcCompressedBuffer.Write(Byte[] buffer, Int32 offset, Int32 count)
at ApacheOrcDotNet.ColumnTypes.StringWriter.AddBlock(IList`1 values)
at ApacheOrcDotNet.Stripes.StripeWriter.<>c__DisplayClass33_0`1.b__1()
at ApacheOrcDotNet.Stripes.StripeWriter.CompleteStride()
at ApacheOrcDotNet.Stripes.StripeWriter.RowAddingCompleted()
at ApacheOrcDotNet.OrcWriter`1.Dispose()
at Workers.ArchivingWorker.Jobs.ArchivingEndpointRequestFull.ExecuteAsync(IServiceProvider serviceProvider) in /src/Workers.ArchivingWorker/Jobs/ArchivingEndpointRequestFull.cs:line 187

StringWriter throwing error while serializing null if encoding is DictionaryV2

Hi,

Let me take a moment to thank you for your work; it has helped us a lot.

I found an issue while serializing a string column that has enough duplicates for the DictionaryV2 encoding to be selected and that also contains a null value. In that case, the StringWriter.WriteDictionaryEncodedData() method throws an error on the line below, because value is null:
var stringValue = sortedDictionary[value.Id];

I tried replacing the above code with
var stringValue = value != null ? sortedDictionary[value.Id] : null;

It did serialize the data, but Hive showed an empty string when I queried it.

Thanks again, and let me know if I can be of any help in fixing this issue.
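In ORC, nulls are recorded only in the PRESENT stream. The DATA stream of a dictionary-encoded column carries one dictionary index per non-null row, so a null should be skipped rather than written as an empty string. The empty string seen in Hive is consistent with the workaround above writing a value for the null row.

Below is a minimal sketch of that bookkeeping. It assumes a hypothetical `DictionaryColumnSketch` type (not ApacheOrcDotNet's `StringWriter`) and uses ordinal string order as an approximation of the sorted dictionary the issue's `sortedDictionary` implies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of dictionary encoding with nulls; not the library's code.
public sealed class DictionaryColumnSketch
{
    public bool[] Present { get; }   // PRESENT stream contents before bit packing
    public string[] Entries { get; } // sorted distinct non-null values (dictionary)
    public int[] Indices { get; }    // DATA stream contents: one index per NON-NULL row

    private DictionaryColumnSketch(bool[] present, string[] entries, int[] indices)
    {
        Present = present;
        Entries = entries;
        Indices = indices;
    }

    public static DictionaryColumnSketch Build(IReadOnlyList<string> values)
    {
        // Nulls never enter the dictionary.
        string[] entries = values.Where(v => v != null)
                                 .Distinct()
                                 .OrderBy(v => v, StringComparer.Ordinal)
                                 .ToArray();
        var lookup = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < entries.Length; i++) lookup[entries[i]] = i;

        var present = new bool[values.Count];
        var indices = new List<int>();
        for (int row = 0; row < values.Count; row++)
        {
            if (values[row] == null) continue; // PRESENT bit stays false; nothing goes to DATA
            present[row] = true;
            indices.Add(lookup[values[row]]);
        }
        return new DictionaryColumnSketch(present, entries, indices.ToArray());
    }
}
```

For `["b", null, "a", "b"]` this yields the dictionary `["a", "b"]`, PRESENT bits `[1, 0, 1, 1]`, and DATA indices `[1, 0, 1]`. The null row has no index at all.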

Could not load file or assembly protobuf-net

I cloned the repo and tried to run either WriterTest or ReaderTest, but I got the following exception:

Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'protobuf-net, Version=3.0.0.0, Culture=neutral, PublicKeyToken=257b51d87d2e4d67' or one of its dependencies. The system cannot find the file specified.

I am using VS 2019. I tried restoring the NuGet packages, uninstalling and reinstalling them all, and cleaning/rebuilding/building the solution, but nothing seems to help. I also tried creating a new project that references the ApacheOrcDotNet base project, just to open an ORC file, with the same result. I honestly have no idea what might be wrong. Could it be the VS version? Which version do you use that works?

I've tried using the library on 3 different machines (actually two physical machines, one of which runs two Windows 10 instances).

Log file from the Assembly Binding Log Viewer (fuslogvw) attached:
fuslogvw_log_protobuf.txt

Make SerializationTypeConfiguration<T>.AddConfiguration public

Please make the AddConfiguration method of the SerializationTypeConfiguration<T> class public.
I need to serialize models based on reflection (basically including all properties that can be serialized, such as int, string, and so on).
To do that, I get the properties through reflection and then pass each PropertyInfo instance to AddConfiguration.
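The reflection step described above can be sketched as follows. `SerializablePropertyFinder` is hypothetical, and its set of "simple" types is an assumption; the types ApacheOrcDotNet actually supports may differ. The sketch collects public, readable, non-indexer instance properties whose type (or nullable underlying type) is in that set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper; the supported-type set below is an assumption.
public static class SerializablePropertyFinder
{
    private static readonly HashSet<Type> SimpleTypes = new HashSet<Type>
    {
        typeof(bool), typeof(byte), typeof(short), typeof(int), typeof(long),
        typeof(float), typeof(double), typeof(decimal), typeof(string),
        typeof(DateTime), typeof(byte[])
    };

    // Public, readable instance properties (indexers excluded) whose type,
    // or the underlying type of a Nullable<T>, is in SimpleTypes.
    public static IReadOnlyList<PropertyInfo> Find(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Where(p => SimpleTypes.Contains(Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType))
            .ToList();
}
```

Each returned `PropertyInfo` is what the request would pass to `AddConfiguration` if it were public. That method's exact signature is not shown here. Note that `GetProperties` does not guarantee declaration order, so column order may need to be fixed explicitly.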
