I have these sources for the GADM data behind raster::getDat

bb and GADM about bowerbird HOT 5 CLOSED

ropensci commented on September 23, 2024

bb and GADM

from bowerbird.

Comments (5)

raymondben commented on September 23, 2024

That's a bit of an edge case because the data files live on a different server to the web pages. The wget --span-hosts option helps a bit, by allowing the recursion to cross onto a different domain, but it won't solve everything in this case. I think there is another solution though, stand by ...

raymondben commented on September 23, 2024

For the zip file, what about:

library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)

and use source_url=src_url in your existing bb_source def. I don't think there's a "pure" bowerbird solution to that one.
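
The selection step can be checked offline. Here is a minimal sketch, assuming a made-up vector of hrefs (the hosts and file names below are hypothetical, not taken from the GADM page), showing how the filter-then-sort picks the highest-numbered archive:

```r
## hypothetical hrefs standing in for the links scraped from the download page
hrefs <- c("index.html",
           "http://example.org/data/gadm27.gdb.zip",
           "http://example.org/data/gadm28.gdb.zip",
           "about.html")

## keep only links pointing to gadmNNN.gdb.zip files
gdb_links <- Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), hrefs)

## character sort, descending: the first element is the highest version,
## provided all version numbers have the same number of digits
src_url <- head(sort(gdb_links, decreasing = TRUE), 1)
print(src_url)
```

The sort is lexicographic, so a hypothetical "gadm9" would rank above "gadm28". If the number of digits could vary, comparing the extracted numbers would be more robust.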

What's the issue with RDS levels? You want *_adm0.rds, *_adm1.rds, etc?
Maybe:

x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))

will give you all the countries and how many levels they have, then you construct the appropriate URLs from that?
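
To see what the regular expression extracts, here is a sketch on made-up option values, using base R rather than stringr. The values "AFG_Afghanistan_3" and so on are assumptions for illustration; the real values on the GADM page may look different.

```r
## hypothetical option values; the real page's values may differ
opts <- c("AFG_Afghanistan_3", "AUS_Australia_3", "MCO_Monaco_1")

pattern <- "^([[:alpha:]]{3})_.*([[:digit:]])$"

## regexec() + regmatches() give, per input, the full match followed by the capture groups
m <- regmatches(opts, regexec(pattern, opts))

## keep the two capture groups: three-letter code and trailing digit
parsed <- do.call(rbind, lapply(m, function(z) z[2:3]))
colnames(parsed) <- c("code", "levels")
print(parsed)
```

The result is a two-column character matrix, so the digit column needs as.integer() before it is used as a count.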

mdsumner commented on September 23, 2024

Ah, ok - that's fine - thanks!

mdsumner commented on September 23, 2024

Just for the record, here's the final setup to get all the country files for all levels, as well as the master GDB (a bit over 2 GB in total):

library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
gadm_src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)
x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
gadm_rds0 <- do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))

## levels come out of the regex as character, so convert before using them as repeat counts
gadm_rds <- tibble::tibble(name = gadm_rds0[,1], levels = as.integer(gadm_rds0[,2])) %>%
  dplyr::slice(rep(dplyr::row_number(), levels)) %>% dplyr::group_by(name) %>%
  ## zero-based
  dplyr::mutate(level = dplyr::row_number() - 1) %>% dplyr::ungroup() %>% dplyr::select(name, level) %>% as.matrix()

template <- file.path(dirname(gadm_src_url), "rds/%s_adm%s.rds")
gadm_rds_src_url <- apply(gadm_rds, 1, function(ab) sprintf(template, ab[1], ab[2]))

library(bowerbird)
gadm.rds <- bb_source(
  name="GADM maps and data in RDS format",
  id="gadm-maps-rdb",
  description="GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url="http://www.gadm.org",
  citation="http://gadm.org/about.html",
  source_url= gadm_rds_src_url,
  license="http://gadm.org/license.html",
  method=list("bb_handler_wget",level=1, robots_off=TRUE),
  collection_size= 0.1,
  access_function = "base::readRDS",
  data_group="Administrative")

gadm <- bb_source(
  name="GADM maps and data in ESRI Geodatabase",
  id="gadm-maps-gdb",
  description="GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url="http://www.gadm.org",
  citation="http://gadm.org/about.html",
  source_url=gadm_src_url,
  license="http://gadm.org/license.html",
  method=list("bb_handler_wget",recursive=TRUE,level=1, robots_off=TRUE),
  postprocess=list("bb_unzip"),
  collection_size= 1,
  access_function = "sf::read_sf",
  data_group="Administrative")


my_directory <- "~/bowerbird"
cf <- bb_config(local_file_root=my_directory)

cf <- bb_add(cf, gadm) %>% bb_add(gadm.rds)
status <- bb_sync(cf,verbose=TRUE)
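
The level expansion and URL construction in the setup above can also be written in base R. This is a sketch only, assuming a small made-up table of country codes and level counts and a placeholder base URL (all hypothetical):

```r
## hypothetical inputs: country codes and the number of levels for each
codes <- c("AFG", "MCO")
n_levels <- c(3L, 1L)

## repeat each code once per level, and number the levels from zero within each country
name <- rep(codes, times = n_levels)
level <- sequence(n_levels) - 1L

## placeholder base URL, not the real GADM location
template <- file.path("http://example.org/data", "rds/%s_adm%s.rds")
urls <- sprintf(template, name, level)
print(urls)
```

Because sprintf() is vectorised over its arguments, no apply() over matrix rows is needed here.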

raymondben commented on September 23, 2024

The GADM site and data format have changed, so for anyone visiting this issue now, an updated version of this might look like:

library(bowerbird)
gadm <- bb_source(
  name = "GADM maps and data in ESRI Geodatabase",
  id = "gadm-maps-gdb",
  description = "GADM provides maps and spatial data for all countries and their sub-divisions.",
  doc_url = "http://www.gadm.org",
  citation = "http://gadm.org/about.html",
  source_url = "https://gadm.org/download_world.html",
  license = "http://gadm.org/license.html",
  method = list("bb_handler_rget", level = 1, accept_download = "gpkg\\.zip$"),
  comment = "This will download the data as a single database as well as a version with six separate layers (one for each level of subdivision/aggregation). Adjust the 'accept_download' parameter if you only want one of these",
  postprocess = list("bb_unzip"),
  collection_size = 7.5,
  access_function = "sf::read_sf",
  data_group = "Administrative")


cf <- bb_config("~/temp/data/bbtest") %>% bb_add(gadm)

## don't use dry_run = TRUE if you are doing this for real!
bb_sync(cf, dry_run = TRUE, verbose = TRUE)

Which gives:

Thu Jul 26 17:10:50 2018
Synchronizing dataset: GADM maps and data in ESRI Geodatabase
Source URL https://gadm.org/download_world.html
--------------------------------------------------------------------------------------------

 this dataset path is: ~/temp/data/bbtest/gadm.org
 building file list ... done.
 visiting https://gadm.org/download_world.html ... 
  |====================================================================================================================================| 100%
No encoding supplied: defaulting to UTF-8.

 done.
 dry_run is TRUE, bb_rget is not downloading the following files:
 https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip
 https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip

Thu Jul 26 17:10:52 2018 dataset synchronization complete: GADM maps and data in ESRI Geodatabase
# A tibble: 1 x 5
  name                                   id            source_url                           status files           
  <chr>                                  <chr>         <chr>                                <lgl>  <list>          
1 GADM maps and data in ESRI Geodatabase gadm-maps-gdb https://gadm.org/download_world.html TRUE   <tibble [2 × 3]>

And the files will be in ~/temp/data/bbtest/biogeo.ucdavis.edu/data/.
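
The comment in the source definition suggests adjusting accept_download to fetch only one of the two archives. One way to try candidate patterns is to test them against the two URLs from the dry-run output. This sketch assumes accept_download is applied as a regular expression against the full URL, which the "gpkg\\.zip$" usage suggests. The narrower patterns are illustrative and not taken from the bowerbird documentation.

```r
## the two URLs reported by the dry run
urls <- c("https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip",
          "https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip")

## the pattern from the source definition matches both files
both <- grepl("gpkg\\.zip$", urls)

## candidate narrower patterns (illustrative): levels-only and single-database-only
levels_only <- grepl("levels_gpkg\\.zip$", urls)
single_only <- grepl("gadm[[:digit:]]+_gpkg\\.zip$", urls)

print(rbind(both, levels_only, single_only))
```

Running bb_sync with dry_run = TRUE after changing the pattern is a cheap way to confirm which files would actually be fetched.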
