scidbst's Introduction

scidbst

"scidbst" is a package for R that bundles functions to manipulate spatio-temporal arrays in the database system SciDB (by Paradigm4). The package relies on the basic SciDB operations provided by the "scidb" package, also by Paradigm4, and extends them with the ability to maintain the spatial and/or temporal references that the SciDB plugin scidb4geo annotates on arrays. It also enables users to pass spatial or temporal constructs as parameters to scidb functions in R.

Dependencies

The package has several prerequisites. First, there are specific system setup requirements for the SciDB database system that operates in the background. Second, the package depends on some other R packages.

System

In the background the package needs a fully operating SciDB database that runs the scidb4geo plugin by Marius Appel, the SHIM client and the r_exec plugin.

To quickly set up a working environment, check out the docker image provided by Marius Appel.

Packages in R

The package mainly depends on the following packages:

  • scidb
  • raster

If you are going to use some of the coerce functions, then the following packages are also of interest:

  • spacetime
  • xts
  • sp

Make sure to install these packages before installing scidbst.

Getting started

The following sections give simple examples of how to install and use the package. For more details about the available functions and their use, please check the documentation files and the vignettes.

Install

devtools::install_github("flahn/scidbst")

Quickstart

The first and foremost thing to do is to establish a connection to SciDB using the function 'scidbconnect' of the scidb package.

scidbconnect(host=host,port=port,user=user,password=password,protocol="https",auth_type = "digest")

After that we can start exploring the database with either scidbls() or scidbst.ls(). The latter shows all arrays in SciDB that have a spatial, temporal or spatio-temporal reference attached. If you want to add arrays to the database, please consider using our [GDAL driver for scidb](https://github.com/appelmar/scidb4gdal).

To load a referenced array into R we use scidbst("some_st_array"), where "some_st_array" is the name of a spatio-temporal array that was listed by scidbst.ls().

As in the raster package, you can use well-known functions like extent or crs to access information about the references. The following code shows some basic functionality:

starr = scidbst("some_st_array")
extent(starr)
crs(starr)
textent(starr)
trs(starr)

A typical basic task in the geoscience domain is to create subsets (a particular scene on the earth) and slices (e.g. a scene at a particular time).

subset.extent = extent(35,36.5,6,8.5) # in lon/lat using WGS84
ethiopia.subset = crop(starr,subset.extent)

ethiopia.subset = slice(ethiopia.subset,"t","2003-07-21")

If you need to calculate values based on attributes, you can use the transform() method. Here is an example of how an NDVI could be calculated from a Landsat 7 dataset.

ls7 = scidbst("some_ls7_array")
ls7_calc = transform(ls7, ndvi = "(band4_avg - band3_avg) / (band4_avg + band3_avg)", mdvi = "(band8_avg - band3_avg) / (band8_avg + band3_avg)")
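
The expression strings above follow the normalized difference pattern (a - b) / (a + b). As a minimal sketch in plain R, independent of SciDB, the same arithmetic can be checked on ordinary numeric vectors. The helper norm_diff and the reflectance values are hypothetical and only serve as illustration.

```r
# Hypothetical reflectance values for three pixels (not from a real scene)
band3 <- c(0.10, 0.20, 0.30)  # red
band4 <- c(0.30, 0.20, 0.10)  # near infrared

# Normalized difference of two bands: (a - b) / (a + b)
# norm_diff is an illustrative helper, not part of scidbst
norm_diff <- function(a, b) (a - b) / (a + b)

ndvi <- norm_diff(band4, band3)
print(ndvi)
```

Values close to 1 indicate a much stronger near-infrared than red signal, values close to -1 the opposite.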

To store changes to a spatial, temporal or spatio-temporal scidbst object in SciDB, we use the function scidbsteval, which evaluates the cascaded operations in the SciDB cluster and stores the result under a given name.

ls7_calc = scidbsteval(ls7_calc, "ls7_ndvi_calc")

The functions mentioned above give just a glimpse of the functionality of this package. For more elaborate analyses, we recommend looking at the function r.apply, which executes custom R scripts on data chunks. Examples can be found in the provided R vignettes.

Vignettes

  1. Introduction to scidbst
  2. Introduction to r.apply
  3. Spatial r.apply Use Case
  4. Spatio-temporal r.apply Use Case

or check the vignettes in R using browseVignettes(package="scidbst")

Authors & Contributors

  • Florian Lahn
  • Marius Appel

License

The package is released under the GNU Affero General Public License (AGPL-3), since it depends heavily on code written by Paradigm4 in the context of SciDB that is also released under the AGPL-3 license. See LICENSE.md for details.

scidbst's Issues

Memory Swap with larger images

Plotting larger images is problematic: the operation runs for quite a while and the hard drive is used heavily. Probable cause: the array information is held in memory, and after a while the memory is swapped to the hard disk, which slows down the operation.

Note: internally a SpatialGridDataFrame is created which creates a full grid (even if the array is sparse)
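
To get a feeling for when swapping becomes likely, the size of a dense grid can be estimated up front. This is a rough sketch that assumes every cell of every attribute is stored as an R double (8 bytes). The helper dense_grid_bytes and the image dimensions are hypothetical, and the real overhead of a SpatialGridDataFrame will be higher.

```r
# Lower bound for the memory of a dense grid of doubles (8 bytes per cell)
# dense_grid_bytes is an illustrative helper, not part of scidbst
dense_grid_bytes <- function(nrow, ncol, nattr = 1) {
  nrow * ncol * nattr * 8
}

# Hypothetical image: 10000 x 10000 cells with 3 attributes
bytes <- dense_grid_bytes(10000, 10000, nattr = 3)
gib <- bytes / 1024^3
print(round(gib, 2))
```

Because the grid is dense, the estimate does not shrink when the underlying array is sparse.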

Join Operation

I have started to implement a join operation for spatio-temporal arrays with differing resolutions, but I ran into complications when performing a downward join (joining an image with lower resolution into an array with higher resolution, in combination with differing temporal resolutions). This case still needs debugging.

The multiple 'iquery' calls and the storing of temporary arrays are the main reason it is difficult to debug. The basic functionality is provided, but the more complex use cases have not been debugged yet.

Streaming Interface

In the long term the r_exec operation in SciDB will be discontinued and replaced by the stream operation, which allows the execution of arbitrary scripts using command line calls. In this project I started to implement a means of performing those R calls using the stream interface. However, there were some complications:

  1. I tried to pass a script using the expression parameter of the R call. Because these scripts are complex, file upload should be used instead. Otherwise the script, passed as an expression, needs to be readable by R on a single line, which means every statement needs to end with a ";" since escaped newline characters are not supported. With the script-as-file approach it remains unclear whether the script is redistributed to every worker instance.
  2. Debugging with a distributed parallel system is hard, since there is no central place to log and errors are not transferred "as is" to the client. Errors remain as some sort of notification that "something failed to execute".
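
The single-line constraint from point 1 can be sketched in plain R. The example assumes a script with exactly one complete statement per line and no comments; a script with multi-line statements or "#" comments would break under this naive join. How the stream operation actually invokes R is not shown here, and the script content is hypothetical.

```r
# Hypothetical script, one complete statement per line
script_lines <- c(
  "x <- c(1, 2, 3)",
  "y <- x * 2",
  "sum(y)"
)

# Join the statements with ";" so the script fits on a single line
one_line <- paste(script_lines, collapse = "; ")
print(one_line)

# Check that R can still parse and evaluate the single-line form;
# eval() on the parsed expressions returns the value of the last one
result <- eval(parse(text = one_line))
print(result)
```

The fragility of this approach is one more argument for uploading the script as a file.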

Enable subsetting with rotated bounding box

Subset requires a fully set raster to work properly. Also enable creating a subset based on a rotated bounding box. Have a look at the merge command in the "scidb4gdal" repository, where non-existing values are replaced by zero (or an NA value).

Installation failed - no existing definition for function ‘regrid’

I have tried installing, but this error message appears:

library(devtools)
install_github("flahn/scidbst")
Downloading GitHub repo flahn/scidbst@master
from URL https://api.github.com/repos/flahn/scidbst/zipball/master
Installing scidbst
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
'/tmp/Rtmp2XjXRv/devtools22c27012db/flahn-scidbst-85a9e3d' --library='/home/scidb/R/x86_64-pc-linux-gnu-library/3.4' --install-tests

* installing *source* package ‘scidbst’ ...
** R
** inst
** preparing package for lazy loading
Error in setMethod("regrid", signature(x = "scidbst"), .regrid.scidbst) :
  no existing definition for function ‘regrid’
Error : unable to load R code in package ‘scidbst’
ERROR: lazy loading failed for package ‘scidbst’
* removing ‘/home/scidb/R/x86_64-pc-linux-gnu-library/3.4/scidbst’
Installation failed: Command failed (1)

Reference Handling after Aggregation

It is currently not completely clear what to do with the dimensions one has aggregated by. For SciDB it is easy: they are simply dropped. For spatial or temporal dimensions, however, the reference might still remain.
For example, if you aggregate over space, the spatial dimensions are dropped afterwards, but the spatial extent of the object remains unchanged (the spatial resolution is set to cover the whole extent). Similarly, if you aggregate over time, the spatial coordinates remain, but the temporal resolution changes to span the interval from start time to end time.
-> Discuss if we:

  1. also drop associated references
  2. reintroduce those dimensions and set the value to 0
  3. drop it in scidb, but keep it in R
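
To make the current behaviour concrete, here is a minimal sketch in plain R of what "resolution covers the whole extent" means after aggregating over space. The extent values are hypothetical and the code does not call any scidbst function.

```r
# Hypothetical spatial extent before aggregation (lon/lat)
ext <- c(xmin = 35, xmax = 36.5, ymin = 6, ymax = 8.5)

# After aggregating over space a single cell represents the whole extent,
# so the cell size equals the extent width and height
res_after <- c(
  x = ext[["xmax"]] - ext[["xmin"]],
  y = ext[["ymax"]] - ext[["ymin"]]
)
print(res_after)
```

Option 3 in the list above would keep exactly this kind of information on the R side while SciDB drops the dimensions.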

1/2 Pixel shift to upper left

After comparison of the scidb return with the original image, there is a shift of a half pixel to the upper left side.

Probably during "materialize" the raster point grid is assumed to be at the center of the pixels.
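
The suspected cause can be illustrated with a small sketch in plain R. It assumes a regular grid with a hypothetical origin and cell size, and it only shows the size of the offset between the two conventions; it is not a confirmed diagnosis of the bug.

```r
# Hypothetical grid: left edge at 35 degrees, cell size 0.25 degrees
xmin <- 35
res <- 0.25
col <- 0:3  # zero-based column indices

# Coordinate of each cell's left edge versus its center
x_edge <- xmin + col * res
x_center <- xmin + (col + 0.5) * res

# Mixing up the two conventions moves every cell by half a cell size
shift <- x_center - x_edge
print(shift)
```

The same offset applies along the y axis, which would explain a diagonal shift towards the upper left.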
