Giter VIP home page Giter VIP logo

Comments (15)

jorisvandenbossche avatar jorisvandenbossche commented on June 4, 2024 1

+1 on adding such an optional field.

We briefly discussed a potential optional "geometry_type" field in the initial version (geoarrow/geoarrow#2), but in the end didn't add it in the first version and left it as a follow-up, as we didn't have a direct need for it (geopandas itself doesn't track this, or doesn't have any optimization depending on knowing that). But I agree it can be useful information in general.

from geoparquet.

rouault avatar rouault commented on June 4, 2024 1

would it be strictly required to use the Z suffix if you have 3D data? (for example, specifying type="Point" would be invalid metadata if you actually have all 3D points?)

I'd say yes. For example PostGIS implements very strict dimensionality checks, and if using ogr2ogr to ingest a Parquet file into PostGIS, we could get into issues if the Parquet geometry type is not correctly declared.

example:

$ psql -d autotest
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1))
Type "help" for help.

autotest=# create table test(my_geom geometry(POINTZ));
CREATE TABLE
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2 3)') );
INSERT 0 1
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2)') );
ERROR:  Column has Z dimension but geometry does not

autotest=# drop table test;
DROP TABLE
autotest=# create table test(my_geom geometry(POINT));
CREATE TABLE
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2 3)') );
ERROR:  Geometry has Z dimension but column does not
autotest=# insert into test values ( ST_GeomFromText('POINT(1 2)') );
INSERT 0 1

and you would have mixed 2D and 3D points, would that need to be specified as "mixed"? (or actually you could also use ["Point", "Point Z"] I suppose, if we allow an array of types)

yes

from geoparquet.

cholmes avatar cholmes commented on June 4, 2024

Seems like a good idea to me. I could even see this being a required field - ensure people at least try to fill it out instead of giving an option to be lazy and leave it blank even though clients would like to know that the data is all one type.

So I lean towards options two or three. And option 3 does seem like it'd be useful for the shapefile use case.

Can you explain how GeometryCollection is different than 'mixed'? I have a vague idea but would be interested to hear how it works.

from geoparquet.

rouault avatar rouault commented on June 4, 2024

Can you explain how GeometryCollection is different than 'mixed'

'mixed' would mean you have for example record 0 is a Polygon and record 1 is a MultiPolygon
'GeometryCollection' would mean record 0 is a GEOMETRYCOLLECTION(POINT ..., LINESTRING ...) and record 1 is a GEOMETRYCOLLECTION(LINESTRING ...., POLYGON ....)

from geoparquet.

jorisvandenbossche avatar jorisvandenbossche commented on June 4, 2024

Values for type could be the ones accepted by GeoJSON

Following the "type" values from GeoJSON sounds good.
Personally I would be hesitant with allowing additional ones (Curve, Surface, etc), even if we allow ISO WKB, given that (I think?) those are not that widely supported. But it's of course also easy to check this value and raise a nice error message if you are a reader that doesn't support those extended types.
Also commented about this on #18

and with Z, M or ZM suffixes for other dimensionalities

For clarification: would it be strictly required to use the Z suffix if you have 3D data? (for example, specifying type="Point" would be invalid metadata if you actually have all 3D points?)
And if so, and you would have mixed 2D and 3D points, would that need to be specified as "mixed"? (or actually you could also use ["Point", "Point Z"] I suppose, if we allow an array of types)

from geoparquet.

rouault avatar rouault commented on June 4, 2024

For information, the initial OGR Parquet driver writes a provisional "gdal:geometry_type" property in the "geo" metadata for now (on reading it will try both "geometry_type" and "gdal:geometry_type"). For now, it only writes for the 2D case: "Point", "LineString", "Polygon", "MultiPoint", "MultiLineString", "MultiPolygon", "GeometryCollection" or "mixed" (the later matches OGR wkbUnknown status). For the Z / M cases, do we want to append the dimensionality just after, or with a separator space: ie "PointZ" or "Point Z" ?

from geoparquet.

jorisvandenbossche avatar jorisvandenbossche commented on June 4, 2024

or "mixed" (the later matches OGR wkbUnknown status)

Do we want to distinguish "mixed" from "unknown"? (or not set)
I suppose that in the case of "mixed", GDAL would not do the effort of computing the geometry type, while if it is "unknown" it would still do that (by default / depending on the env flag)?

For the Z / M cases, do we cant to append the dimensionality just after, or with a separator space: ie "PointZ" or "Point Z" ?

I would maybe go with a space. Or are you aware of any projects that we could follow here?
Checking with GDAL, it seems that ogrinfo uses "3D Point" instead.

from geoparquet.

rouault avatar rouault commented on June 4, 2024

Do we want to distinguish "mixed" from "unknown"? (or not set)
I suppose that in the case of "mixed", GDAL would not do the effort of computing the geometry type, while if it is "unknown" it would still do that (by default / depending on the env flag)?

The wkbUnknown state is typically used by GDAL to report geometry columns that may contain mixed geometry types, e.g a GEOMETRY PostGIS column. The OGR PostGIS driver will not try to check if there's a mix of geometry types in the table content or if it is uniform, because that could be too lengthy.
So for GeoParquet, if a "geometry_type" attribute is there and says "unknown" (or "mixed" if that would be what would be used), the OGR driver wouldn't make any effort to guess more. It only does that in the current state of the spec since there's no official way to report a geometry type.
Maybe "mixed" is not so useful. If you know the composition of the mix, then use "geometry_type": [ "Polygon", "MultiPolygon" ], and if you don't (or want a generic schema that can apply to other files), use "geometry_type": "unknown" ?

I would maybe go with a space

would be fine for me. With a space, this is consistent with the beginning of a ISO WKT (if ignoring case), e.g "POINT Z (1 2 3)"

Or are you aware of any projects that we could follow here?

No, can't think of other references.

Checking with GDAL, it seems that ogrinfo uses "3D Point" instead.

This is a human facing representation (and that predates ISO WKT/WKB support). I wouldn't use that for JSON encoding.

from geoparquet.

jorisvandenbossche avatar jorisvandenbossche commented on June 4, 2024

Maybe "mixed" is not so useful. If you know the composition of the mix, then use "geometry_type": [ "Polygon", "MultiPolygon" ], and if you don't (or want a generic schema that can apply to other files), use "geometry_type": "unknown" ?

Sounds good. It's still an optional field, so you also have the option of not including it to signal that it is unknown .. ?

from geoparquet.

rouault avatar rouault commented on June 4, 2024

It's still an optional field, so you also have the option of not including it to signal that it is unknown .. ?

yes, probably for a file that declares a version of the spec that supports geometry_type, the absence of geometry_type should be understood as the same as a "geometry_type": "unknown", and would not trigger file content analysis.

from geoparquet.

cholmes avatar cholmes commented on June 4, 2024

'unknown' makes sense to me.

I do think it'd be nice to have some nudge on implementors to at least try to add it if possible, since it is clearly useful for a set of tools. Perhaps make it required, but allow 'unknown', so that implementors need to at least think about if they can provide something useful. Versus thinking 'oh, it's an optional field, I don't need to worry about it'.

Looks like Flatgeobuf uses 'Unknown' for similar use case: https://github.com/flatgeobuf/flatgeobuf/blob/6884cb05438686947f7bf3776c7ae5c9febfbf5a/src/fbs/header.fbs#L70

And I think with their header stuff the field is required?

from geoparquet.

jorisvandenbossche avatar jorisvandenbossche commented on June 4, 2024

I just opened a draft PR adding some initial text: #51

Perhaps make it required, but allow 'unknown', so that implementors need to at least think about if they can provide something useful.

It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.

(on the other hand, making it optional is more backwards compatible, so even if we make it "required" I would hope that implementations are a bit flexible)

from geoparquet.

cholmes avatar cholmes commented on June 4, 2024

It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.

Ah right, yeah - two ways to say it's not known seems less good. I continue to lean towards explicitly saying Unknown. And if implementations choose to support 0.1 then they obviously need to have that flexibility. Right now it's important that implementations have that flexibility, but I'm hoping that when we go 1.0 we'll have a lot more implementations, and that for those it'll be reasonable to just say 'I only support version 1.0' (while there will also be lots of great libraries that support all the early versions).

Note I will shift at some point relatively soon to caring a lot about backward compatibility and do appreciate your push on it @jorisvandenbossche. But I feel in the early 0.1 / 0.2 / 0.3 times we should give ourselves some flexibility to change as we discuss and get wider feedback.

from geoparquet.

Jesus89 avatar Jesus89 commented on June 4, 2024

IMO, if the field is optional there is no need for the "Unknown" geometry type. Just not including the field should be enough. Otherwise, we need to explain what "Unknown" means, and the users of the spec should parse the "Unknown" type.

from geoparquet.

alasarr avatar alasarr commented on June 4, 2024

Yes, @Jesus89, I think we're all on the same page.

@jorisvandenbossche comment:

It's indeed maybe good to make a choice between either keeping it optional (and leaving out "Unknown") or making it required but allow "Unknown". Because otherwise we just give people two ways to say that it is not known.

And @cholmes commit suggestion: #51 (review)

from geoparquet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.