Comments (15)
+1 on adding such an optional field.
We briefly discussed a potential optional "geometry_type"
field in the initial version (geoarrow/geoarrow#2), but in the end didn't add it in the first version and left it as a follow-up, as we didn't have a direct need for it (geopandas itself doesn't track this, or doesn't have any optimization depending on knowing that). But I agree it can be useful information in general.
from geoparquet.
would it be strictly required to use the Z suffix if you have 3D data? (for example, specifying
type="Point"
would be invalid metadata if you actually have all 3D points?)
I'd say yes. For example PostGIS implements very strict dimensionality checks, and if using ogr2ogr to ingest a Parquet file into PostGIS, we could get into issues if the Parquet geometry type is not correctly declared.
example:
$ psql -d autotest
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1))
Type "help" for help.
autotest=# create table test(my_geom geometry(POINTZ));
CREATE TABLE
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2 3)') );
INSERT 0 1
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2)') );
ERROR: Column has Z dimension but geometry does not
autotest=# drop table test;
DROP TABLE
autotest=# create table test(my_geom geometry(POINT));
CREATE TABLE
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2 3)') );
ERROR: Geometry has Z dimension but column does not
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2)') );
INSERT 0 1
and you would have mixed 2D and 3D points, would that need to be specified as "mixed"? (or actually you could also use ["Point", "Point Z"] I suppose, if we allow an array of types)
yes
from geoparquet.
Seems like a good idea to me. I could even see this being a required field - ensure people at least try to fill it out instead of giving an option to be lazy and leave it blank even though clients would like to know that the data is all one type.
So I lean towards options two or three. And option 3 does seem like it'd be useful for the shapefile use case.
Can you explain how GeometryCollection is different than 'mixed'? I have a vague idea but would be interested to hear how it works.
from geoparquet.
Can you explain how GeometryCollection is different than 'mixed'
'mixed' would mean you have for example record 0 is a Polygon and record 1 is a MultiPolygon
'GeometryCollection' would mean record 0 is a GEOMETRYCOLLECTION(POINT ..., LINESTRING ...) and record 1 is a GEOMETRYCOLLECTION(LINESTRING ...., POLYGON ....)
from geoparquet.
Values for type could be the ones accepted by GeoJSON
Following the "type" values from GeoJSON sounds good.
Personally I would be hesitant with allowing additional ones (Curve, Surface, etc), even if we allow ISO WKB, given that (I think?) those are not that widely supported. But it's of course also easy to check this value and raise a nice error message if you are a reader that doesn't support those extended types.
Also commented about this on #18
and with Z, M or ZM suffixes for other dimensionalities
For clarification: would it be strictly required to use the Z suffix if you have 3D data? (for example, specifying type="Point"
would be invalid metadata if you actually have all 3D points?)
And if so, and you would have mixed 2D and 3D points, would that need to be specified as "mixed"? (or actually you could also use ["Point", "Point Z"] I suppose, if we allow an array of types)
from geoparquet.
For information, the initial OGR Parquet driver writes a provisional "gdal:geometry_type" property in the "geo" metadata for now (on reading it will try both "geometry_type" and "gdal:geometry_type"). For now, it only writes for the 2D case: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection" or "mixed" (the later matches OGR wkbUnknown status). For the Z / M cases, do we want to append the dimensionality just after, or with a separator space: ie "PointZ" or "Point Z" ?
from geoparquet.
or "mixed" (the later matches OGR wkbUnknown status)
Do we want to distinguish "mixed" from "unknown"? (or not set)
I suppose that in the case of "mixed", GDAL would not do the effort of computing the geometry type, while if it is "unknown" it would still do that (by default / depending on the env flag)?
For the Z / M cases, do we cant to append the dimensionality just after, or with a separator space: ie "PointZ" or "Point Z" ?
I would maybe go with a space. Or are you aware of any projects that we could follow here?
Checking with GDAL, it seems that ogrinfo
uses "3D Point" instead.
from geoparquet.
Do we want to distinguish "mixed" from "unknown"? (or not set)
I suppose that in the case of "mixed", GDAL would not do the effort of computing the geometry type, while if it is "unknown" it would still do that (by default / depending on the env flag)?
The wkbUnknown state is typically used by GDAL to report geometry columns that may contain mixed geometry types, e.g a GEOMETRY PostGIS column. The OGR PostGIS driver will not try to check if there's a mix of geometry types in the table content or if it is uniform, because that could be too lengthy.
So for GeoParquet, if a "geometry_type" attribute is there and says "unknown" (or "mixed" if that would be what would be used), the OGR driver wouldn't make any effort to guess more. It only does that in the current state of the spec since there's no official way to report a geometry type.
Maybe "mixed" is not so useful. If you know the composition of the mix, then use "geometry_type": [ "Polygon", "MultiPolygon" ], and if you don't (or want a generic schema that can apply to other files), use "geometry_type": "unknown" ?
I would maybe go with a space
would be fine for me. With a space, this is consistent with the beginning of a ISO WKT (if ignoring case), e.g "POINT Z (1 2 3)"
Or are you aware of any projects that we could follow here?
No, can't think of other references.
Checking with GDAL, it seems that ogrinfo uses "3D Point" instead.
This is a human facing representation (and that predates ISO WKT/WKB support). I wouldn't use that for JSON encoding.
from geoparquet.
Maybe "mixed" is not so useful. If you know the composition of the mix, then use "geometry_type": [ "Polygon", "MultiPolygon" ], and if you don't (or want a generic schema that can apply to other files), use "geometry_type": "unknown" ?
Sounds good. It's still an optional field, so you also have the option of not including it to signal that it is unknown .. ?
from geoparquet.
It's still an optional field, so you also have the option of not including it to signal that it is unknown .. ?
yes, probably for a file that declares a version of the spec that supports geometry_type, the absence of geometry_type should be understood as the same as a "geometry_type": "unknown", and would not trigger file content analysis.
from geoparquet.
'unknown' makes sense to me.
I do think it'd be nice to have some nudge on implementors to at least try to add it if possible, since it is clearly useful for a set of tools. Perhaps make it required, but allow 'unknown', so that implementors need to at least think about if they can provide something useful. Versus thinking 'oh, it's an optional field, I don't need to worry about it'.
Looks like Flatgeobuf uses 'Unknown' for similar use case: https://github.com/flatgeobuf/flatgeobuf/blob/6884cb05438686947f7bf3776c7ae5c9febfbf5a/src/fbs/header.fbs#L70
And I think with their header stuff the field is required?
from geoparquet.
I just opened a draft PR adding some initial text: #51
Perhaps make it required, but allow 'unknown', so that implementors need to at least think about if they can provide something useful.
It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.
(on the other hand, making it optional is more backwards compatible, so even if we make it "required" I would hope that implementations are a bit flexible)
from geoparquet.
It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.
Ah right, yeah - two ways to say it's not known seems less good. I continue to lean towards explicitly saying Unknown. And if implementations choose to support 0.1 then they obviously need to have that flexibility. Right now it's important that implementations have that flexibility, but I'm hoping that when we go 1.0 we'll have a lot more implementations, and that for those it'll be reasonable to just say 'I only support version 1.0' (while there will also be lots of great libraries that support all the early versions).
Note I will shift at some point relatively soon to caring a lot about backward compatibility and do appreciate your push on it @jorisvandenbossche. But I feel in the early 0.1 / 0.2 / 0.3 times we should give ourselves some flexibility to change as we discuss and get wider feedback.
from geoparquet.
IMO, if the field is optional there is no need for the "Unknown" geometry type. Just not including the field should be enough. Otherwise, we need to explain what "Unknown" means, and the users of the spec should parse the "Unknown" type.
from geoparquet.
Yes, @Jesus89, I think we're all on the same page.
@jorisvandenbossche comment:
It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.
And @cholmes commit suggestion: #51 (review)
from geoparquet.
Related Issues (20)
- Schema version invalid HOT 11
- Simplify or remove script dependencies HOT 3
- PROJJSON schema version HOT 4
- Metadata encoding options for GeoArrow-encoded columns in GeoParquet metadata HOT 2
- Is it possible to define a transform alongside a CRS, similar to geotiff? HOT 3
- Recommendation on the Arrow specific type for the WKB geometry column ? HOT 5
- Antimeridian Crossings and bbox HOT 5
- Update example files for 1.1 HOT 4
- The releases on the repository can be misleading regarding the status of GeoParquet as an OGC Standard HOT 1
- Clarify projection of bounding box columns HOT 4
- Mixed concerns: Encoding + Geometry Type HOT 14
- Covering Schema
- Clarify recommended file extension HOT 9
- List of Submitting Organisations HOT 3
- Enforce pull requests and approvals for all repository updates HOT 4
- Require status checks to pass before merging HOT 4
- Synchronise requirements in the metanorma asciidoc files with those in the gpq validator HOT 1
- add support wkt or wkt2 formats for crs HOT 26
- Thoughts about a first-class GEOMETRY data type in Parquet? HOT 17
- Start a 'best practices' document
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from geoparquet.