Comments (5)

raymondben commented on September 23, 2024

That's a bit of an edge case because the data files live on a different server to the web pages. The wget --span-hosts option helps a bit, by allowing the recursion to cross onto a different domain, but it won't solve everything in this case. I think there is another solution though, stand by ...

from bowerbird.

raymondben commented on September 23, 2024
  1. For the zip file, what about:
library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)

and use source_url=src_url in your existing bb_source def. I don't think there's a "pure" bowerbird solution to that one.

  2. What's the issue with the RDS levels? Do you want *_adm0.rds, *_adm1.rds, etc?
    Maybe:
x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
## str_match_all comes from stringr; namespaced here because only rvest is attached
do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))

will give you all the countries and how many levels they have, then you construct the appropriate URLs from that?
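
As a rough sketch of that last step, assuming the trailing digit is a count of levels and that levels are numbered from zero: the two-row input and the base URL below are hypothetical placeholders, not the real GADM values.

```r
## Hypothetical input: ISO3 country code and number of levels per country
country_levels <- data.frame(code = c("AFG", "ALB"),
                             n_levels = c(3L, 4L),
                             stringsAsFactors = FALSE)
## Placeholder base URL (an assumption, not the real download location)
base_url <- "http://example.org/gadm/rds"

## repeat each country code once per level
code <- rep(country_levels$code, times = country_levels$n_levels)
## zero-based level index within each country: 0 .. n_levels - 1
level <- sequence(country_levels$n_levels) - 1L

rds_urls <- sprintf("%s/%s_adm%d.rds", base_url, code, level)
```

The resulting character vector could then be passed as source_url, one entry per file.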

mdsumner commented on September 23, 2024

Ah, ok - that's fine - thanks!

mdsumner commented on September 23, 2024

Just for the record, here's the final setup to get all the country files for all levels, as well as the master GDB (a bit over 2 GB in total):

library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
gadm_src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)
x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
gadm_rds0 <- do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))

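## levels comes back from str_match_all as character, so convert it before using it as a repeat count
gadm_rds <- tibble::tibble(name = gadm_rds0[,1], levels = as.integer(gadm_rds0[,2])) %>%
  ## repeat each country row once per level (row_number is namespaced because dplyr is not attached)
  dplyr::slice(rep(dplyr::row_number(), levels)) %>% dplyr::group_by(name) %>%
  ## zero-based
  dplyr::mutate(level = dplyr::row_number() - 1) %>% dplyr::ungroup() %>% dplyr::select(name, level) %>% as.matrix()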

template <- file.path(dirname(gadm_src_url), "rds/%s_adm%s.rds")
gadm_rds_src_url <- apply(gadm_rds, 1, function(ab) sprintf(template, ab[1], ab[2]))

library(bowerbird)
gadm.rds <- bb_source(
  name="GADM maps and data in RDS format",
  id="gadm-maps-rdb",
  description="GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url="http://www.gadm.org",
  citation="http://gadm.org/about.html",
  source_url= gadm_rds_src_url,
  license="http://gadm.org/license.html",
  method=list("bb_handler_wget",level=1, robots_off=TRUE),
  collection_size= 0.1,
  access_function = "base::readRDS",
  data_group="Administrative")

gadm <- bb_source(
  name="GADM maps and data in ESRI Geodatabase",
  id="gadm-maps-gdb",
  description="GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url="http://www.gadm.org",
  citation="http://gadm.org/about.html",
  source_url=gadm_src_url,
  license="http://gadm.org/license.html",
  method=list("bb_handler_wget",recursive=TRUE,level=1, robots_off=TRUE),
  postprocess=list("bb_unzip"),
  collection_size= 1,
  access_function = "sf::read_sf",
  data_group="Administrative")


my_directory <- "~/bowerbird"
cf <- bb_config(local_file_root=my_directory)

cf <- bb_add(cf, gadm) %>% bb_add(gadm.rds)
status <- bb_sync(cf,verbose=TRUE)
  

raymondben commented on September 23, 2024

The GADM site and data format have changed, so for anyone visiting this issue now, an updated version of this might look like:

library(bowerbird)
gadm <- bb_source(
  name = "GADM maps and data in ESRI Geodatabase",
  id = "gadm-maps-gdb",
  description = "GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url = "http://www.gadm.org",
  citation = "http://gadm.org/about.html",
  source_url = "https://gadm.org/download_world.html",
  license = "http://gadm.org/license.html",
  method = list("bb_handler_rget", level = 1, accept_download = "gpkg\\.zip$"),
  comment = "This will download the data as a single database as well as a version with six separate layers (one for each level of subdivision/aggregation). Adjust the 'accept_download' parameter if you only want one of these",
  postprocess = list("bb_unzip"),
  collection_size = 7.5,
  access_function = "sf::read_sf",
  data_group = "Administrative")


cf <- bb_config("~/temp/data/bbtest") %>% bb_add(gadm)

## don't use dry_run = TRUE if you are doing this for real!
bb_sync(cf, dry_run = TRUE, verbose = TRUE)

Which gives:

Thu Jul 26 17:10:50 2018
Synchronizing dataset: GADM maps and data in ESRI Geodatabase
Source URL https://gadm.org/download_world.html
--------------------------------------------------------------------------------------------

 this dataset path is: ~/temp/data/bbtest/gadm.org
 building file list ... done.
 visiting https://gadm.org/download_world.html ... 
  |====================================================================================================================================| 100%
No encoding supplied: defaulting to UTF-8.

 done.
 dry_run is TRUE, bb_rget is not downloading the following files:
 https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip
 https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip

Thu Jul 26 17:10:52 2018 dataset synchronization complete: GADM maps and data in ESRI Geodatabase
# A tibble: 1 x 5
  name                                   id            source_url                           status files           
  <chr>                                  <chr>         <chr>                                <lgl>  <list>          
1 GADM maps and data in ESRI Geodatabase gadm-maps-gdb https://gadm.org/download_world.html TRUE   <tibble [2 × 3]>

And the files will be in ~/temp/data/bbtest/biogeo.ucdavis.edu/data/.
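
To pick just one of the two files, the accept_download pattern can be narrowed. A minimal sketch, assuming accept_download is applied as a regular expression to each candidate URL (as its use above suggests): the two patterns are illustrative guesses, checked here with grepl against the URLs from the dry run.

```r
## the two URLs reported by the dry run above
files <- c("https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip",
           "https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip")

## candidate patterns (assumptions, not taken from the bowerbird docs)
single_db <- "gadm[[:digit:]]+_gpkg\\.zip$"  ## the single-database file only
by_level <- "_levels_gpkg\\.zip$"            ## the separate-layers file only

matches_single <- grepl(single_db, files)
matches_levels <- grepl(by_level, files)
```

Either pattern could be substituted for "gpkg\\.zip$" in the method list; a dry run is a cheap way to confirm what it selects before downloading.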
