Comments (2)
The geo metadata in the Parquet footers may differ between the written GeoParquet files, especially the bbox field. This makes the default Parquet footer metadata merging process fail with the following exception:
java.lang.RuntimeException: could not merge metadata: key geo has conflicting values: [{"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[1.0,1.0,9998.0,9998.0],"crs":null}}}, {"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[0.0,0.0,10000.0,10000.0],"crs":null}}}]
at org.apache.parquet.hadoop.metadata.StrictKeyValueMetadataMergeStrategy.merge(StrictKeyValueMetadataMergeStrategy.java:36)
at org.apache.parquet.hadoop.metadata.GlobalMetaData.merge(GlobalMetaData.java:106)
at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:1451)
at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:1422)
at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:1383)
at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:84)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:50)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:192)
We have to implement an output committer for GeoParquet to merge geo metadata properly. If your use case does not need to read the geo metadata from the _common_metadata or _metadata file, we can simply ignore geo metadata when generating such files.
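To illustrate what merging geo metadata "properly" could mean, here is a minimal Python sketch that combines the two conflicting values from the exception above. It is not Sedona's or Parquet's implementation; the function name merge_geo_metadata is hypothetical, and it assumes every column carries a 2D bbox in [xmin, ymin, xmax, ymax] order and that version, encoding and crs agree across files.

```python
import json


def merge_geo_metadata(values):
    """Merge several 'geo' metadata JSON strings into one (hypothetical helper).

    Assumes each column has a 2D bbox [xmin, ymin, xmax, ymax] and that
    the remaining fields (version, encoding, crs) are identical across files.
    """
    docs = [json.loads(v) for v in values]
    merged = docs[0]
    for doc in docs[1:]:
        for name, col in doc["columns"].items():
            if name not in merged["columns"]:
                merged["columns"][name] = col
                continue
            target = merged["columns"][name]
            a, b = target["bbox"], col["bbox"]
            # The merged bbox is the smallest box covering both inputs.
            target["bbox"] = [
                min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]),
            ]
            # Geometry types are combined as a sorted set union.
            target["geometry_types"] = sorted(
                set(target["geometry_types"]) | set(col["geometry_types"])
            )
    return json.dumps(merged)


# The two conflicting values reported in the exception message.
a = ('{"version":"1.0.0","primary_column":"geom","columns":{"geom":'
     '{"encoding":"WKB","geometry_types":["Polygon"],'
     '"bbox":[1.0,1.0,9998.0,9998.0],"crs":null}}}')
b = ('{"version":"1.0.0","primary_column":"geom","columns":{"geom":'
     '{"encoding":"WKB","geometry_types":["Polygon"],'
     '"bbox":[0.0,0.0,10000.0,10000.0],"crs":null}}}')

merged = json.loads(merge_geo_metadata([a, b]))
print(merged["columns"]["geom"]["bbox"])
```

An actual fix would have to live on the JVM side, in the committer path shown in the stack trace; the sketch only shows the merge rule itself.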
@Kontinuation Thanks. I think it would be totally fine to leave off the geo metadata in the combined _metadata and/or _common_metadata files - as long as it is still present in the individual geoparquet files.
Since GeoParquet doesn't define these single _metadata summary files, I don't think it would be any issue at all - of course in the future it may standardize on a definition but I think for now it'll only be used for row group filtering and the geo metadata is not needed.