Giter VIP home page Giter VIP logo

quadtree's Introduction

quadtree: An R package for region quadtrees

quadtree provides functionality for working with raster-like quadtrees (called “region quadtrees”), which allow for variable-sized cells.

library(quadtree)

habitat <- terra::rast(system.file("extdata", "habitat.tif", package="quadtree")) # load sample data
qt <- quadtree(habitat, .03, "sd") # create a quadtree

Installation

The package can be installed from CRAN using install.packages():

install.packages("quadtree")

The development version can be installed from GitHub using devtools::install_github():

# install.packages("devtools")
devtools::install_github("dfriend21/quadtree")

Documentation

Visit the package website for more information.

Learning how to use the quadtree package

The best way to learn about the package is by reading the vignettes, which are available through R and through the package website. I’d suggest reading the vignettes in this order:

  1. Creating Quadtrees
  2. Using Quadtrees
  3. Finding LCPs

quadtree's People

Contributors

dfriend21 avatar brownag avatar

Stargazers

Insang Song avatar WhuLife avatar  avatar Abd. Malik A. Madinu avatar Josh Erickson avatar  avatar Stephen Roecker avatar Timothée Giraud avatar Majid Hojati avatar Jakub Nowosad avatar Christoph Stepper avatar Antony Barja  avatar Dylan Beaudette avatar Hugh Graham avatar Michael Sumner avatar Arthur Gailes avatar

Watchers

 avatar

Forkers

brownag

quadtree's Issues

LCP finder finds wrong path

I already solved this issue, but I want documentation of it, so I'll write a brief summary of the problem and the solution.

PROBLEM:

Here's a picture of a "least-cost path" found between two points:

ablihgpkmkdgceji

It's failing for one specific raster file. I tried using some other rasters, and they all worked fine. In addition, this path is correctly found using an older version of the same raster. The only difference that I know of is that the old one was output with raster while the new one was output with terra.

SOLUTION:

The issue had to do with checking equality between doubles. In particular, in the C++ code, I compare the boundaries of two nodes to determine which side two nodes are adjacent on. Here's the old code from LcpFinder.cpp, in doNextIteration():

double mid{0};
bool isX = true; // tells us whether mid coordinate is an x-coordinate or a y-coordinate
if(node->xMin == nodeNb->xMax){ // left side
    mid = node->xMin;
} else if(node->xMax == nodeNb->xMin) { // right side
    mid = node->xMax;
} else if(node->yMin == nodeNb->yMax) { // bottom
    mid = node->yMin;
    isX = false;
} else if(node->yMax == nodeNb->yMin) { // top
    mid = node->yMax;
    isX = false;
}

In the above example there were cases where none of the if statements returned true. Even though the cells were adjacent, the values for the max and min values were slightly different. That meant that mid was 0, which messed up the later calculations. It allowed for the cost to be negative, and that messed everything up, since it looks for the lowest possible cost, so it was always picking the negative cost edges.

To fix this, I switched to finding the side that has the smallest absolute difference:

std::vector<double> difs{
    std::abs(node->xMin - nodeNb->xMax), // left side
    std::abs(node->xMax - nodeNb->xMin), // right side
    std::abs(node->yMin - nodeNb->yMax), // bottom
    std::abs(node->yMax - nodeNb->yMin) // top
}; 
int minIndex = 0;
for(int i = 1; i < 4; ++i){
    if(difs[i] < difs[minIndex]){
        minIndex = i;
    }
}

double mid{0};
bool isX = true; // tells us whether mid coordinate is an x-coordinate or a y-coordinate

if(minIndex == 0){ // left side
    mid = node->xMin;
} else if(minIndex == 1) { // right side
    mid = node->xMax;
} else if(minIndex == 2) { // bottom
    mid = node->yMin;
    isX = false;
} else if(minIndex == 3) { // top
    mid = node->yMax;
    isX = false;
}

Since I already know that the cells are adjacent, this is safe to do.

I'm unsure as to why this problem emerged once terra was used to save the raster file.

Make Quadtree more space-efficient

A Quadtree is currently not as space-efficient as it could be, partially because it stores information on the cell bounds, the cell's "level", and the cell's neighbors. I think the cell bounds and cell level may be unnecessary - that might be able to be inferred by the "path" taken to reach the cell.

Including "neighbors" allows for fast computation of neighbors, at the cost of space efficiency - it's a trade-off. It might be worth reconsidering this at some point - perhaps its better to go with better space efficiency rather than fast neighbor calculations.

plot(<Quadtree>): legend doesn't show when alpha is small

Example:

data(habitat)
qt <- quadtree(habitat, .1)
plot(qt, crop = TRUE, alpha = .05, border_col = "transparent")

image

The following error is shown:

Error in graphics::rasterImage(rast, bar_x0, bar_y0, bar_x1, bar_y1, xpd = TRUE) : 
  invalid RGB specification 

6. graphics::rasterImage(rast, bar_x0, bar_y0, bar_x1, bar_y1, xpd = TRUE) at plot_Quadtree.R#416
5. (function (zlim, col, alpha = 1, lgd_box_col = NULL, lgd_x_pct = 0.5, 
    lgd_y_pct = 0.5, lgd_wd_pct = 0.5, lgd_ht_pct = 0.5, bar_box_col = "black", 
    bar_wd_pct = 0.2, bar_ht_pct = 1, text_cex = 1, text_col = NULL, 
    text_font = NULL, text_x_pct = 1, ticks = NULL, ticks_n = 5)  ... 
4. do.call(add_legend, legend_args) at plot_Quadtree.R#239
3. .local(x, ...) 
2. plot(qt, crop = TRUE, alpha = 0.05, border_col = "transparent") 
1. plot(qt, crop = TRUE, alpha = 0.05, border_col = "transparent") 

quadtree from multi-band grid

Hi,

Is it possible to extend quadree() to work with multiband grids? For example, splits based on total variance (across bands) and in space? A possible work-around might be a single band quadtree applied to the first principal component of a multi-band grid.

Thanks!

Migrate issues from GitLab to GitHub

When developing the package I used GitLab, but now I'd like to switch to GitHub being the primary source of the code (the DESCRIPTION file now points to GitHub rather than GitLab). This GitHub repository is mirrored from the GitLab repository. I have a number of issues in the GitLab repository, and it'd be nice if I could import them to GitHub.

Here's a link to the GitLab issues:

https://gitlab.com/dafriend/quadtree/-/issues

A (very) quick Google search brought up these links, which may be helpful when I get around to doing this:

https://github.community/t/import-issues-from-gitlab/707/12
https://gist.github.com/jonmagic/5282384165e0f86ef105
https://github.com/piceaTech/node-gitlab-2-github

More support for binary quadtrees (in particular, set operations)

Much of the literature I've come across on quadtrees makes the assumption that the values are binary. This article:

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.299.334&rep=rep1&type=pdf

is a great overview of quadtrees. But it basically assumes binary values.

The author talks about operations like union and intersection. That'd be cool to implement, but it makes much more sense for binary values than it does for continuous values.

There's a lot of functionality in that paper like this that would be nice to include, but a lot of it makes way more sense with binary values, and that hasn't been my focus. So maybe I should try to provide more support for binary-valued quadtrees?

errors with raster-dev

When checking quadtree with the dev version of "raster" (https://github.com/rspatial/raster) I now get

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Failure ('test_minor_functions.R:151:3'): projection() runs without errors and returns correct value ──
  `qt_proj` (`actual`) not equal to raster::projection(habitat) (`expected`).
  
  `actual`:   ""
  `expected`: NA

This is because of changes in "raster" related to the changes in the "sp family" packages. sp::identicalCRS does not work well with Raster* objects anymore.

You now need to use CRS(Raster*) or you could use terra::same.crs(x, y). But the best approach might be to use "terra" instead of "raster".

Can you please fix this so that your package won't break when I submit "raster" to CRAN?

Efficient conversion to sf objects

Thanks for this awesome package! I've been wanting to do this in R for a very long time.

Tinkering with the CRAN version I could not figure out how to efficiently convert a quadree object to sf object. I may have overlooked something obvious. Here is a FAST solution that I came up with in case you find it useful. The {wk} and {data.table} packages improved the collation + flattening of cells into a single {sf} object.

# https://www.sciencedirect.com/science/article/pii/S0098300406001828?via%3Dihub
# https://www.cabdirect.org/cabdirect/abstract/20013118404
# https://link.springer.com/content/pdf/10.1007/BF02083652.pdf

# https://dfriend21.github.io/quadtree/index.html

library(soilDB)
library(raster)
library(rasterVis)
library(sf)
library(wk)
library(data.table)
library(quadtree)

# make a bounding box and assign a CRS (4326: GCS, WGS84)
a.CA <- st_bbox(
  c(xmin = -2280334, xmax = -2065433, ymin = 1755361, ymax = 1970262), 
  crs = st_crs(5070)
)

# convert bbox to sf geometry
a.CA <- st_as_sfc(a.CA)


pH_3060cm <- ISSR800.wcs(aoi = a.CA, var = 'ph_3060cm')
plot(pH_3060cm, axes = FALSE, xlab = '', ylab = '')


qt <- quadtree(pH_3060cm, split_threshold = 0.25, split_method = "sd", adj_type = "resample",  resample_n_side = 128)


par(mar = c(0, 1, 0, 0), bg = 'black', mfcol = c(1, 2))
plot(pH_3060cm, axes = FALSE, xlab = '', ylab = '', legend = FALSE, col = viridis::viridis(100))
plot(qt, axes = FALSE, xlab = '', ylab = '', legend = FALSE, col = viridis::viridis(100), border_col = 'white', na_col = 'black', crop = TRUE)

r <- as_raster(qt, pH_3060cm)

par(mar = c(0, 1, 0, 0), bg = 'black', mfcol = c(1, 2))
plot(pH_3060cm, axes = FALSE, xlab = '', ylab = '', legend = FALSE, col = viridis::viridis(100))
plot(r, axes = FALSE, xlab = '', ylab = '', legend = FALSE, col = viridis::viridis(100))


df <- as_data_frame(qt, FALSE)

# iterate over boxes -> single sf object
# ~ 5 seconds with wk
v <- lapply(1:nrow(df), function(i) {
  
  .row <- df[i, ]
  
  ## this is very slow
  # st_bbox -> st_as_sfc -> st_as_sf
  
  # fast BBOX -> SF via wk
  
  # BBOX -> WK
  .bbox <- rct(
    xmin = .row$xmin, xmax = .row$xmax, ymin = .row$ymin, ymax = .row$ymax, 
    crs = st_crs(5070)
  )
  
  # WK -> SF
  return(st_as_sf(.bbox))
})

# rbind.sf is slow for > 1,000 objects
# v <- do.call('rbind', v)library(wk)

# ! 3 seconds
# https://github.com/r-spatial/sf/issues/798
v <- st_as_sf(data.table::rbindlist(v))


dev.off()

par(mar = c(0, 0, 0, 0) + 0.125, bg = 'black')
plot(pH_3060cm, axes = FALSE, xlab = '', ylab = '', legend = FALSE, col = viridis::viridis(100))
plot(v, border = 'white', add = TRUE)

LCP sometimes returns inconsistent results

I have already solved this problem, but I want documentation of it.

PROBLEM:

I noticed that sometimes different paths were returned when using the exact same input data. After a lot of investigation, I discovered that it was happening in cases where there are two shortest paths - that is, two paths between two points that have the same overall cost. I'm using a std::multiset to store the potential edges, and I was sorting it on the total cost of the path. However, std::multiset doesn't guarantee a consistent ordering in cases where two objects are equivalent. This was causing problems when running simulations - even though I was setting the random seed, I was getting slightly different results between runs.

SOLUTION:

Here is the function I was previously using to determine the order of objects in the multiset:

bool operator() (std::tuple<int,int,double,double> a, std::tuple<int,int,double,double> b) const {
    return std::get<2>(a) < std::get<2>(b);
}

To avoid the problem I added code so that if the cost is equal it will sort on the other values in the tuple. The order of sorting is as follows: cost, distance, ID of the first node, ID of the second node. This guarantees a consistent (but arbitrary) ordering.

bool operator() (std::tuple<int,int,double,double> a, std::tuple<int,int,double,double> b) const {
    bool is_cost_eq = std::get<2>(a) == std::get<2>(b);
    if(is_cost_eq){
        bool is_dist_eq = std::get<3>(a) == std::get<3>(b);
        if(is_dist_eq){
            bool is_node1_eq = std::get<0>(a) == std::get<0>(b);
            if(is_node1_eq){
                return std::get<1>(a) < std::get<1>(b);   
            } else {
                return std::get<0>(a) < std::get<0>(b);    
            }
        } else {
            return std::get<3>(a) < std::get<3>(b);
        }
    } else {
        return std::get<2>(a) < std::get<2>(b);
    }
}

Arithmetic operations between quadtrees

It'd be cool if quadtrees could be subtracted, added, etc.

This would likely involve some of quadtree "union" or something - basically, if I wanted to preserve all the possible info, I'd need to have the resulting quadtree take on the smallest resolution of any cell. Basically, if one quadtree has a single cell in one area, and the other has that same area broken into multiple levels, the resulting quadtree would need to take on the structure of the more detailed quadtree in order to preserve the data in the smaller cells.

as_raster() doesn't work for single cell quadtrees

library(quadtree)
qt <- quadtree(matrix(1), 1)
as_raster(qt)

This code produces this error:

Error in .local(x, y, ...) : 'y' must be a matrix or a data frame
7. stop("'y' must be a matrix or a data frame") at extract.R#47
6. .local(x, y, ...)
5. extract(x, pts[, 1:2])
4. extract(x, pts[, 1:2]) at as_raster.R#52
3. .local(x, ...)
2. as_raster(qt) at generics.R#4
1. as_raster(qt)

Manual splitting/merging of quadtree cells

Currently the user has no way to alter the quadtree structure - they can change values, but not the structure. It'd be nice if users could split/merge cells after a quadtree has been created.

I think the tricky thing about this would be deciding how the user interface would work. Do they specify a point, and the cell at that point is split? What about merging? I suppose they could specify a point, and the cell at that point could be merged with its siblings.

But I'm not sure I'm crazy about this interface - it feels a bit awkward. Is there a better way to do this?

Create quadtrees from lines/polygons/points

It'd be nice to be able to create a quadtree from vector data. User could specify the minimum cell size, and then any quadrant that had a segment running through it would split until the minimum cell size is met.

For points, there could be some different ways to do it. You could split a quadrant whenever there's more than one (or some user-defined threshold) point in the quadrant. You could make the value binary, or, if more than one point is allowed per quadrant, the value could be the number of points.

Another interesting idea is to use the value of points rather than the point location. This would actually be very similar to what I'm doing now with rasters - at each step it'd check the values of the points and use that to decide whether or not to split. Not entirely sure how useful this would be, but it sounds cool.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.