
gpkg's Introduction

Hey now 👋

My name is Andrew Brown. I am a soil scientist with an interest in open-source software.

📫 How to find me: ...


Most of my software development work centers around interoperability, R-based tools, National Cooperative Soil Survey data sources, and spatial data analysis. I am fortunate to be able to do some of this work as part of my job as a soil scientist.


I maintain a handful of R packages and contribute to several more; some of these are available on CRAN (https://cran.r-project.org/) and the ncss-tech or brownag r-universe repositories.

I also have a blog that I rarely add posts to these days: http://humus.rocks/


gpkg's People

Contributors

brownag


gpkg's Issues

replace core GDAL functionality with {gdalraster}

I recently added some updates to use {vapour} for things {terra} could not directly provide.

It may be worth going even closer to the source with {gdalraster}. This change may be pending the addition of the OGR vector-related bindings: https://usdaforestservice.github.io/gdalraster/articles/gdalvector-draft.html. Since I have several things to address between now and the next release, there is some time to see how this develops, and perhaps make suggestions for anything missing from {gdalraster}.

dynamic GeoPackage file writing

Creation of a geopackage object should collect or create all the information required to take any associated R data objects and user input and create a GeoPackage file. There is currently no mechanism to do the whole process in one step; the user has to build a new file incrementally with gpkg_write(<object>, ...), possibly appending to an existing file.

Currently you can also create geopackage S3 objects that are not "backed by" (or derived from) any file. This is not very useful at the moment: the object is effectively read-only by the time the R user sees it, and no changes made to the R object can be applied to the GeoPackage without going through the SQLiteConnection.

I would like to have a gpkg_write(<geopackage>) method that will essentially figure out the sequence of layer writes, table updates, etc. that are necessary to create a (possibly complex) GeoPackage directly from an R object (i.e., a list of layers).

  • This could potentially be invoked automatically, creating a geopackage that exists only as a temporary file.
    • The user should not necessarily have to be concerned with how the file is put together or where it lives, only that the R objects they supplied can be retrieved from it.
  • It also should be possible (at least in theory) to "roundtrip" arbitrary geopackages into the S3 object representation and back out as a new geopackage file with identical contents. There are surely some possible wrinkles here, but it should be achievable for simple cases.
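A rough sketch of the sequencing logic such a method might apply, in base R. This is entirely hypothetical (the function name and step labels are illustrative, not part of the package); the real registration steps would be considerably more involved:

```r
# Hypothetical planner: given a named list of R objects, emit the ordered
# steps a single-call write method would need to perform. Required system
# tables come first, then each layer is written and registered in turn.
plan_gpkg_write <- function(layers) {
  steps <- c("create_required_tables")            # gpkg_contents, gpkg_spatial_ref_sys, ...
  for (nm in names(layers)) {
    steps <- c(steps,
               paste0("write_layer:", nm),        # layer data first
               paste0("register_contents:", nm))  # then registration/metadata
  }
  steps
}
```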

GeoPackage validation routine

  • check for required tables
  • check for consistency between tables (foreign key relationships)
  • check for duplicate entries within tables (violated primary key constraints)
  • ?

    gpkg/R/gpkg-validate.R

    Lines 1 to 12 in c813561

    #' Validate a GeoPackage
    #'
    #' Checks for presence of required tables, valid values and other constraints.
    #'
    #' @param x Path to GeoPackages
    #' @param diagnostics Return a list containing diagnostics (missing table names, invalid values, other errors)
    #'
    #' @return `TRUE` if valid. `FALSE` if one or more problems are found. For full diagnostics run with `diagnostics = TRUE` to return a list containing results for each input GeoPackage.
    #' @export
    gpkg_validate <- function(x, diagnostics = FALSE) {
      stop("This is not implemented yet")
    }
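A minimal sketch of what the required-tables check could look like, using only {DBI} and {RSQLite}. The required-table list and the diagnostics structure here are assumptions for illustration, not the package's final design:

```r
# Sketch: check a GeoPackage (an SQLite file) for required system tables.
# The required-table list and return structure are illustrative assumptions.
gpkg_validate_sketch <- function(x, diagnostics = FALSE) {
  required <- c("gpkg_contents", "gpkg_spatial_ref_sys")
  con <- DBI::dbConnect(RSQLite::SQLite(), x)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  missing_tables <- setdiff(required, DBI::dbListTables(con))
  ok <- length(missing_tables) == 0
  if (diagnostics) {
    return(list(valid = ok, missing_tables = missing_tables))
  }
  ok
}
```

Foreign-key and primary-key consistency checks would layer on top of this with targeted queries against the gpkg_* tables.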

`gpkg_create_dummy_features()` upgrades and deprecation

  • Use new gpkg_create_spatial_ref_sys() function internally
  • Abstract out gpkg_geometry_columns table create/insert code as new functions
  • Deprecate gpkg_create_dummy_features() function name and replace with e.g. gpkg_create_empty_features()
  • Deprecate gpkg_contents(template=) argument, provide documented arguments for each data element (srs_id, bounding box)
  • Add a gridded analog e.g. gpkg_create_empty_grid()
  • Add an option to add empty tables to gpkg_contents (default to TRUE, changing existing behavior)

`.gpkg_process_sources()` should be more robust to various OGR sources

Currently, only a small number of file extensions for raster and vector data are used to identify paths to possible spatial data sources.

Nobody has complained, and I guess it has worked fine for me, but I should have generalized this a long time ago...

I am now thinking that the functionality to support and classify arbitrary file paths in the same call to gpkg_write() might be too general. The main reason file extensions are needed is that either terra::rast(), terra::vect(), or a non-spatial table function must be called depending on whether a file input is determined to be a raster, vector, or attribute source.

  • Can I create a lookup table of file extensions from file-based sources found in terra::gdal(drivers=TRUE)?

  • Could I pass an argument so that the user specifies whether they are writing raster, vector, or attribute data? That would mean one type per call, or a vector of equal length to the input list denoting each data type. Both of these alternatives seem rather cumbersome compared to how it works now.
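A base-R sketch of the extension-lookup idea. The extension sets below are a small illustrative subset, not a definitive list; a fuller table could plausibly be derived from terra::gdal(drivers = TRUE):

```r
# Sketch: classify input paths as raster/vector/attribute by file extension.
# The extension sets are illustrative assumptions, not an exhaustive list
# (e.g. .gpkg itself can hold raster tiles, which this naive lookup ignores).
classify_source <- function(x) {
  raster_ext <- c("tif", "tiff", "vrt", "img", "grd")
  vector_ext <- c("shp", "gpkg", "geojson", "json", "gdb")
  attr_ext   <- c("csv", "dbf")
  ext <- tolower(tools::file_ext(x))
  ifelse(ext %in% raster_ext, "raster",
         ifelse(ext %in% vector_ext, "vector",
                ifelse(ext %in% attr_ext, "attribute", "unknown")))
}
```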

connection with DBI

I work with SQL files in RStudio, where you need to specify the type of connection in the first row as a comment, like so:
-- !preview conn=DBI::dbConnect(RSQLite::SQLite(),"some_file_name.gpkg")
Using RSQLite doesn't allow for the spatial functions that gpkg allows, and I can't figure out the right SQLiteConnection to pass in that row. In any case, this could be a nice feature as a standalone function.
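For plain (non-spatial) SQL, a standard {RSQLite} connection to the .gpkg file does work, since a GeoPackage is an SQLite database; what it lacks are GDAL's spatial SQL functions. A minimal illustration (using a temporary file rather than a real GeoPackage):

```r
# A GeoPackage is an SQLite database, so RSQLite can open it directly;
# spatial SQL functions (ST_*) are NOT available through this connection.
f <- tempfile(fileext = ".gpkg")
con <- DBI::dbConnect(RSQLite::SQLite(), f)
DBI::dbExecute(con,
  "CREATE TABLE IF NOT EXISTS gpkg_contents (table_name TEXT, data_type TEXT)")
tables <- DBI::dbListTables(con)
DBI::dbDisconnect(con)
```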

add spatial views

Add a convenience method for the creation of spatial views: https://gdal.org/drivers/vector/gpkg.html#spatial-views

For example:

CREATE VIEW my_view AS SELECT foo.fid AS OGC_FID, foo.geom, ... FROM foo JOIN another_table ON foo.some_id = another_table.other_id;
INSERT INTO gpkg_contents (table_name, identifier, data_type, srs_id) VALUES ('my_view', 'my_view', 'features', 4326);
INSERT INTO gpkg_geometry_columns (table_name, column_name, geometry_type_name, srs_id, z, m) VALUES ('my_view', 'my_geom', 'GEOMETRY', 4326, 0, 0);
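A sketch of what such a convenience wrapper might look like with plain {DBI} calls. The function name and arguments are hypothetical, and it assumes the gpkg_contents and gpkg_geometry_columns tables already exist:

```r
# Hypothetical helper: register a SQL view as a spatial layer in a GeoPackage.
# Assumes gpkg_contents and gpkg_geometry_columns already exist in `con`.
gpkg_create_spatial_view_sketch <- function(con, view_name, select_sql,
                                            geom_column = "geom",
                                            geom_type = "GEOMETRY",
                                            srs_id = 4326) {
  # 1. create the view itself
  DBI::dbExecute(con, sprintf("CREATE VIEW %s AS %s", view_name, select_sql))
  # 2. register it as a 'features' layer
  DBI::dbExecute(con,
    "INSERT INTO gpkg_contents (table_name, identifier, data_type, srs_id)
     VALUES (?, ?, 'features', ?)",
    params = list(view_name, view_name, srs_id))
  # 3. declare its geometry column
  DBI::dbExecute(con,
    "INSERT INTO gpkg_geometry_columns
       (table_name, column_name, geometry_type_name, srs_id, z, m)
     VALUES (?, ?, ?, ?, 0, 0)",
    params = list(view_name, geom_column, geom_type, srs_id))
  invisible(view_name)
}
```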

refactor `dplyr.frame()` and `lazy.frame()` methods

I would like to avoid the "lazy.frame" terminology, and "dplyr.frame" is not useful shorthand for tbl(). I was toying with these ideas, which now work fine, and I would now like an API more consistent with the rest of the package.

  • lazy.frame() will no longer be exported and will be renamed gpkg_table_pragma() -- this is useful information but not a substitute for table contents.
  • dplyr.frame() will no longer be exported; it will be renamed internally and used inside gpkg_table() and gpkg_tables() unless the new argument pragma is TRUE.
  • gpkg_table() and gpkg_tables(): the new argument pragma=FALSE will require the suggested package {dbplyr}; pragma=TRUE will use gpkg_table_pragma() instead of gpkg_table().
    • This requires the user to opt in to avoid the {dbplyr} dependency, rather than relying on which namespaces happen to be loaded to determine behavior (yuck).
    • gpkg_table()'s current behavior is to use dbGetQuery() to materialize the full table in memory, which was not the behavior of gpkg_tables(); "table" and "tables" will now be consistent, and the old in-memory gpkg_table() code will be available as gpkg_get_table() (or gpkg_collect_table()?, or a collect=TRUE argument to gpkg_table()).

The lazy.frame/dplyr.frame functions will be removed from the namespace as I prepare for an initial release of gpkg v0.1.0.
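The proposed opt-in dispatch could look roughly like this. This is a sketch against a bare DBI connection (the real methods operate on geopackage objects, and the argument names follow the proposal above, not released code):

```r
# Sketch of the proposed dispatch: collect materializes via dbGetQuery(),
# otherwise a lazy {dbplyr} table is returned, with an explicit opt-in error
# when the suggested package is unavailable.
gpkg_table_sketch <- function(con, table_name, collect = FALSE) {
  if (collect) {
    # materialize the full table in memory (old gpkg_table() behavior)
    return(DBI::dbGetQuery(con, sprintf("SELECT * FROM %s", table_name)))
  }
  if (!requireNamespace("dbplyr", quietly = TRUE)) {
    stop("package 'dbplyr' is required for lazy tables; use collect = TRUE")
  }
  dplyr::tbl(con, table_name)  # lazy table backed by the connection
}
```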

consider use of R6 for geopackage

Early on I decided I didn't want to use the R6 system for geopackage, but my attempt at in-place modification (#9) for connecting existing objects (since reverted) has renewed that consideration.

I might need to make a draft PR to test this out to be sure it is not what I want.

R6 could allow for better maintenance of state within existing objects, enabling proper in-place modification via e.g. definition of a <geopackage>$connect() method.

in-place modification for `gpkg_connect()`

Often there isn't a need to manually connect a geopackage object because connections are created and destroyed on the fly. However, when performing many database operations it may be beneficial to make use of a persistent connection to avoid the extra overhead of repeatedly closing and re-opening.

Currently, you need to overwrite the object to connect to the database and have that connection "stick".

library(gpkg)

g <- geopackage()
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------

# doesn't work
gpkg_connect(g)
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------
#> <SQLiteConnection>
#>   Path: /tmp/RtmpWPANIS/Rgpkg277b54d212bf9.gpkg
#>   Extensions: TRUE
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------

# works
g <- gpkg_connect(g)
g
#> <geopackage>
#> --------------------------------------------------------------------------------
#> # of Tables: 0
#>  
#>  
#> --------------------------------------------------------------------------------
#> <SQLiteConnection>
#>   Path: /tmp/RtmpWPANIS/Rgpkg277b54d212bf9.gpkg
#>   Extensions: TRUE

# works
gpkg_disconnect(g)
g
#> <geopackage>
#> Error: Invalid or closed connection

Connecting to a database and not saving the result prevents you from referencing the connection pointer later to disconnect it (which causes warnings: Warning message: call dbDisconnect() when finished working with a connection).

It would be nice to come up with a way that in-place modification of a geopackage object could be used to update the connection field.
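One base-R way to get reference semantics without R6 is to store the mutable state in an environment inside the object, so connect/disconnect can update the stored connection without the caller reassigning. A minimal sketch (all names hypothetical):

```r
# Sketch: an environment inside the object has reference semantics, so a
# connect function can update the stored connection in place and the caller's
# copy of the object sees the change without reassignment.
new_gpkg_sketch <- function(path) {
  obj <- list(path = path, state = new.env(parent = emptyenv()))
  obj$state$con <- NULL
  class(obj) <- "gpkg_sketch"
  obj
}
connect_sketch <- function(x) {
  x$state$con <- DBI::dbConnect(RSQLite::SQLite(), x$path)
  invisible(x)
}
disconnect_sketch <- function(x) {
  if (!is.null(x$state$con)) DBI::dbDisconnect(x$state$con)
  x$state$con <- NULL
  invisible(x)
}
```

With this layout, `connect_sketch(g)` "sticks" without `g <- connect_sketch(g)`, which is the behavior the examples above show is currently missing.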

beautify `print()` method

  1. Simple way of listing registered gpkg_contents tables
  2. Identify presence of standard/core tables (gpkg_contents, gpkg_spatial_ref_sys, gpkg_geometry_columns, etc.) as a separate list
  3. Add symbols or some other notation for optional associated features (e.g. presence/absence of rtrees associated with specific features, metadata tables, table relationships (?))

Item 3 is intended as markup to enhance item 1, rather than listing all (long, possibly numerous) names.
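A tiny sketch of the markup idea from items 1–3, in base R (the core-table list and marker symbol are illustrative choices only):

```r
# Sketch: annotate a table listing, marking core GeoPackage system tables
# with "*" so user tables stand out. Marker choice is an assumption.
format_table_list <- function(tables) {
  core <- c("gpkg_contents", "gpkg_spatial_ref_sys", "gpkg_geometry_columns")
  marker <- ifelse(tables %in% core, "*", " ")
  paste0(marker, " ", tables)
}
```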

refactor `gpkg_write()`

  • vector write not working properly anymore
  • better handling of list input for naming feature / tile sets / data_null
  • basic post-processing/validating of result
  • tests
