Comments (10)
For a list of elements like this:
{
  mylist: [{"foo":"abc", "bar":"abc"}, {"foo":"abc", "bar":"abc"}]
}
parquet.js generates the following schema:
message root {
  repeated group mylist {
    required binary foo (UTF8);
    required binary bar (UTF8);
  }
}
The schema that PrestoDB/Hive expects is:
message root {
  required group mylist (LIST) {
    repeated group list {
      required group element {
        required binary foo (UTF8);
        required binary bar (UTF8);
      }
    }
  }
}
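To make the difference between the two layouts concrete, here is a minimal sketch that reshapes a row from the flat form into the nested mylist -> list -> element form that the expected schema describes. The helper name `toThreeLevelList` is hypothetical and is not part of parquetjs. The sketch only reshapes plain JavaScript data. It does not add the (LIST) annotation, and nothing in this thread confirms that reshaping alone is enough for Athena.

```javascript
// Hypothetical helper (not part of parquetjs): wraps a plain array in the
// three-level { list: [ { element: ... } ] } shape of the expected schema.
function toThreeLevelList(items) {
  return { list: items.map((element) => ({ element })) };
}

// A row shaped like the example data at the top of this thread.
const flatRow = {
  mylist: [
    { foo: 'abc', bar: 'abc' },
    { foo: 'abc', bar: 'abc' },
  ],
};

// The same row, reshaped to mirror mylist -> list -> element.
const nestedRow = { mylist: toThreeLevelList(flatRow.mylist) };

console.log(JSON.stringify(nestedRow, null, 2));
```

Each original item ends up under an `element` key inside the repeated `list` group, so the LIST group `mylist` has exactly one child field.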
from parquetjs.
Hey @dg3feiko,
have you found a working solution for that problem?
from parquetjs.
@shyim @dg3feiko Did you check out #67? It might be related.
from parquetjs.
I installed your version as mentioned in the comment, with
npm install zjonsson/parquetjs#07fb2fd8fc03bf2b57243531eaf91f2d60f5e460
I generated new files and copied them to the S3 bucket, but I still have problems with the Athena query.
from parquetjs.
There is also #43. You could try installing a fork that has all my outstanding PRs here merged to master (including #43):
npm install zjonsson/parquetjs
from parquetjs.
With your latest fork I can select simple first-level fields, but when I select a struct, Athena crashes with the message: HIVE_CURSOR_ERROR: Can not read value at 0 in block 0
from parquetjs.
I used 0.8.0 to convert a flat JSON file to Parquet and verified that I'm able to write it and read it back. I uploaded it to S3 and used Glue to create the Athena table. For some reason I'm unable to query the data, though; I get GENERIC_INTERNAL_ERROR: 0
Is anybody else using this converter for Athena?
from parquetjs.
I gave this a try recently in AWS with Athena + Presto using the latest from zjonsson/parquetjs.
Root level primitives worked but nested lists failed:
Expected LIST column column to only have one field, but has x fields
from parquetjs.
I gave this a try recently in AWS with Athena + Presto using the latest from zjonsson/parquetjs.
Root level primitives worked but nested lists failed:
Expected LIST column column to only have one field, but has x fields
+1
Does anyone have an answer?
from parquetjs.
I encountered the same issue and spent some time getting it to work. Here is a solution that seems to work, at least for my case of lists with structs: ZJONSSON#34
Test case from parquetjs to Athena can be found here: https://github.com/ZJONSSON/parquetjs/blob/9cee1592ce41e8dbca088fa2330b48ceb2d1de1a/test/list.js
from parquetjs.