C# Port of the Apache ORC File Format
ddrinka / apacheorcdotnet
License: MIT License
Serialization of enums is currently not supported.
It should be added, and it should be configurable to serialize them either as int or as string.
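A minimal sketch of what such an option could look like, written in plain C# and independent of ApacheOrcDotNet. EnumSerializationMode and EnumValueConverter are hypothetical names, not part of the library's API. The converter maps an enum value either to its underlying integral value or to its member name before the value would reach a column writer.

```csharp
using System;

// Hypothetical option: how an enum property would be written to its ORC column.
public enum EnumSerializationMode
{
    AsInt,    // store the underlying integral value (e.g. in a long column)
    AsString  // store the member name (e.g. in a string column)
}

public static class EnumValueConverter
{
    // Converts an enum value into the value a writer would store:
    // a boxed long for AsInt, a string for AsString.
    public static object ToColumnValue(Enum value, EnumSerializationMode mode)
    {
        if (value is null) throw new ArgumentNullException(nameof(value));

        return mode switch
        {
            // Works for every integral underlying type; a ulong-backed enum
            // value above long.MaxValue would throw OverflowException.
            EnumSerializationMode.AsInt => Convert.ToInt64(value),
            // Undefined values fall back to their numeric text; [Flags]
            // combinations become comma-separated names.
            EnumSerializationMode.AsString => value.ToString(),
            _ => throw new ArgumentOutOfRangeException(nameof(mode))
        };
    }
}
```

For example, `ToColumnValue(DayOfWeek.Tuesday, EnumSerializationMode.AsInt)` yields `2L`, and the `AsString` mode yields `"Tuesday"`. Storing names is more robust if enum members are later renumbered; storing integers is more compact.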
Hi, thanks for writing this lib!
I am trying to understand whether I can use this lib to read .orc files.
My service receives many .orc files from customers, and I want to read the first n columns and the first k rows from each file without knowing the file schema.
Can I do that with this lib?
Thanks for creating this library. It will be very useful for the .NET community.
Is there an OrcReader in the works?
Hi again,
I serialized a class with nullable decimal fields, and some rows had null values for these fields. When reading the serialized file with Hive, it threw the following error.
Caused by: java.io.EOFException: Read past end of bit field from bit reader current: 0 bits left: 0 bit size: 1 from byte rle literal used: 0/0 from compressed stream Stream for column 8 kind PRESENT position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.BitFieldReader.readByte(BitFieldReader.java:52)
at org.apache.orc.impl.BitFieldReader.next(BitFieldReader.java:63)
at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:180)
at org.apache.orc.impl.TreeReaderFactory$DecimalTreeReader.nextVector(TreeReaderFactory.java:1111)
at org.apache.orc.impl.ConvertTreeReaderFactory$DecimalFromDecimalTreeReader.nextVector(ConvertTreeReaderFactory.java:1432)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1776)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1071)
This even prevents Hive from reading data from other rows that do have non-null values.
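For context: the ORC specification describes the PRESENT stream of a nullable column as a boolean stream with one bit per row (1 = value present, 0 = null), packed most significant bit first and then Byte-RLE encoded. Hive's error above reports a PRESENT stream of length 0 for column 8, i.e. no bits at all. The following is a minimal sketch of the bit-packing step only, not ApacheOrcDotNet's writer; PresentStreamSketch and PackPresentBits are hypothetical names, and the Byte-RLE pass is left out.

```csharp
using System;
using System.Collections.Generic;

public static class PresentStreamSketch
{
    // Packs one presence bit per row (1 = non-null, 0 = null) into bytes,
    // most significant bit first. The result would still need Byte-RLE
    // encoding before being written as the PRESENT stream.
    public static byte[] PackPresentBits<T>(IReadOnlyList<T?> values) where T : struct
    {
        var packed = new byte[(values.Count + 7) / 8];
        for (int i = 0; i < values.Count; i++)
        {
            if (values[i].HasValue)
            {
                // Row i goes into byte i / 8, at bit 7 - (i % 8).
                packed[i / 8] |= (byte)(0x80 >> (i % 8));
            }
        }
        // Unused low bits of the last byte remain zero.
        return packed;
    }
}
```

For the nine rows `{ 1.5m, null, 2m, null, null, null, null, null, 3m }` this produces the bytes `0xA0, 0x80`. A column that contains at least one null needs these bits for every row in the stripe, which is why an empty PRESENT stream can make a reader fail even on rows with values.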
I am inserting multiple rows into a file from a DB, but sometimes I get an error like the one below. I would like to know what it means.
ArithmeticException: A code error has led to negative space remaining
at ApacheOrcDotNet.Compression.OrcCompressedBuffer.Write(Byte[] buffer, Int32 offset, Int32 count)
at ApacheOrcDotNet.ColumnTypes.StringWriter.AddBlock(IList`1 values)
at ApacheOrcDotNet.Stripes.StripeWriter.<>c__DisplayClass33_0`1.b__1()
at ApacheOrcDotNet.Stripes.StripeWriter.CompleteStride()
at ApacheOrcDotNet.Stripes.StripeWriter.RowAddingCompleted()
at ApacheOrcDotNet.OrcWriter`1.Dispose()
at Workers.ArchivingWorker.Jobs.ArchivingEndpointRequestFull.ExecuteAsync(IServiceProvider serviceProvider) in /src/Workers.ArchivingWorker/Jobs/ArchivingEndpointRequestFull.cs:line 187
Hi,
Let me take a moment to thank you for your work, it has helped us a lot.
I found an issue while serializing a string column that has enough duplicates for DictionaryV2 encoding to be selected and that also contains a null value. In that case, the StringWriter.WriteDictionaryEncodedData() method throws an error on the line below, because value is null.
var stringValue = sortedDictionary[value.Id];
I tried replacing the above code with
var stringValue = value != null ? sortedDictionary[value.Id] : null;
With that change the data was serialized, but the null value shows up as an empty string when I query it through Hive.
Thanks again; let me know if I can help fix this issue.
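For comparison: in the ORC format, null rows are recorded only in the PRESENT stream, and a column's DATA stream carries entries only for non-null rows. For a dictionary-encoded string column that means nulls get neither a dictionary entry nor an index. The sketch below shows that split in plain C#; it is not ApacheOrcDotNet code, DictionaryEncodingSketch and Encode are hypothetical names, and the ordinal sort is a simplification that may not match the library's sort order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryEncodingSketch
{
    // present:    one flag per row (false = null), i.e. what PRESENT records
    // dictionary: distinct non-null values, sorted ordinally for simplicity
    // indexes:    one dictionary index per NON-null row, i.e. what DATA records
    public static (bool[] present, string[] dictionary, int[] indexes) Encode(IReadOnlyList<string> values)
    {
        // Nulls never enter the dictionary.
        string[] dictionary = values.Where(v => v != null)
                                    .Distinct(StringComparer.Ordinal)
                                    .OrderBy(v => v, StringComparer.Ordinal)
                                    .ToArray();

        var positions = new Dictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < dictionary.Length; i++)
            positions[dictionary[i]] = i;

        var present = new bool[values.Count];
        var indexes = new List<int>();
        for (int row = 0; row < values.Count; row++)
        {
            string v = values[row];
            present[row] = v != null;
            // A null row is recorded only in 'present'; it gets no index,
            // so a null value is never looked up in the dictionary.
            if (v != null)
                indexes.Add(positions[v]);
        }
        return (present, dictionary, indexes.ToArray());
    }
}
```

For the input `{ "b", null, "a", "b" }` this gives `present = [true, false, true, true]`, `dictionary = ["a", "b"]` and `indexes = [1, 0, 1]`: the index list is one entry shorter than the row count.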
Hello, is there an official NuGet package?
I've found this one: https://www.nuget.org/packages/ApacheOrcDotNet
Is it the official package for this repo?
If so, could you please update the NuGet package properties to include the license and a link to this repo as the source site?
I cloned the repo and tried to run WriterTest and ReaderTest, but I got the following exception:
Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'protobuf-net, Version=3.0.0.0, Culture=neutral, PublicKeyToken=257b51d87d2e4d67' or one of its dependencies. The system cannot find the file specified.
I am using VS 2019. I tried restoring the NuGet packages, uninstalling and reinstalling all of them, and cleaning/rebuilding the solution; nothing seems to help. I also tried creating a new project that references the ApacheOrcDotNet base project just to open an ORC file, with the same result. I honestly have no idea what might be wrong. Could it be the VS version? Which version do you use?
I've tried the library on three different machines (actually two physical machines, one of them with two Windows 10 instances).
Log file from the Assembly Binding Log Viewer (fuslogvw) attached:
fuslogvw_log_protobuf.txt
Please make the AddConfiguration method of the SerializationTypeConfiguration<T> class public.
I need to serialize models based on reflection (basically, include every property that can be serialized, such as int, string, and so on).
To do that, I get the properties through reflection and then pass each PropertyInfo instance to AddConfiguration.
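A sketch of the reflection step described in this request, in plain C#. SerializablePropertySelector is a hypothetical helper, and its set of "simple" types is an assumption, not the list of types ApacheOrcDotNet actually supports. The resulting PropertyInfo list is what would be passed to AddConfiguration if that method were public; the AddConfiguration call itself is not shown because its signature is not given here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class SerializablePropertySelector
{
    // Assumed set of simple column types; enums are deliberately not included.
    private static readonly HashSet<Type> SimpleTypes = new HashSet<Type>
    {
        typeof(bool), typeof(byte), typeof(short), typeof(int), typeof(long),
        typeof(float), typeof(double), typeof(decimal), typeof(string),
        typeof(DateTime), typeof(byte[])
    };

    // Returns readable, non-indexer public instance properties whose type
    // (after unwrapping Nullable<T>) is in the simple-type set.
    public static IReadOnlyList<PropertyInfo> SelectProperties(Type modelType)
    {
        return modelType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Where(p => IsSimple(p.PropertyType))
            .ToList();
    }

    private static bool IsSimple(Type type)
    {
        // int? and decimal? count as simple via their underlying type.
        Type underlying = Nullable.GetUnderlyingType(type) ?? type;
        return SimpleTypes.Contains(underlying);
    }
}
```

For example, `SelectProperties(typeof(string))` returns only `Length` (the `Chars` indexer is skipped), and `KeyValuePair<int?, string>` yields both `Key` and `Value`. Note that `GetProperties` does not guarantee any particular order, so sort the result (e.g. by name) if column order must be stable.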
In addition to ZLib, please also add Snappy compression.