Comments (5)
That's a bit of an edge case because the data files live on a different server from the web pages. The wget --span-hosts option helps a bit by allowing the recursion to cross onto a different domain, but it won't solve everything in this case. I think there is another solution though; stand by ...
from bowerbird.
- For the zip file, what about:
library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)
and use source_url=src_url in your existing bb_source def. I don't think there's a "pure" bowerbird solution to that one.
- What's the issue with RDS levels? You want *_adm0.rds, *_adm1.rds, etc.? Maybe:
x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
## str_match_all comes from stringr, which is not attached here, so qualify it
do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))
will give you all the countries and how many levels they have, then you construct the appropriate URLs from that?
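One caveat on the zip-file suggestion in the first bullet: sort() on URL strings compares them as text, so a hypothetical gadm10 would sort below gadm9. A minimal sketch of a numeric comparison in base R follows; pick_latest_gdb and the example.org URLs are made up for illustration and are not part of bowerbird or rvest.

```r
## Hypothetical helper: choose the gadmNNN.gdb.zip link with the largest NNN,
## comparing the version numbers as integers rather than as strings.
pick_latest_gdb <- function(hrefs) {
  pattern <- "gadm([[:digit:]]+)\\.gdb\\.zip$"
  ## keep only links that look like gadmNNN.gdb.zip (NA hrefs are dropped)
  hits <- grep(pattern, hrefs, value = TRUE)
  if (length(hits) == 0) return(NA_character_)
  ## extract NNN from each matching link and convert it to an integer
  version <- as.integer(sub(paste0("^.*", pattern), "\\1", hits))
  hits[which.max(version)]
}

## Made-up links, used only to exercise the helper
example_hrefs <- c("http://example.org/data/gadm9.gdb.zip",
                   "http://example.org/data/gadm10.gdb.zip",
                   "http://example.org/about.html")
latest <- pick_latest_gdb(example_hrefs)
print(latest)
```

In the snippet above, the input would be the result of sapply(links, html_attr, "href").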
Ah, ok - that's fine - thanks!
Just for the record, here's the final setup to get all the country files for all levels, as well as the master GDB (a bit over 2 GB in total):
library(rvest)
links <- read_html("http://gadm.org/download_world.html") %>% html_nodes("a")
## find links pointing to gadmNNN.gdb.zip files, take the highest number
gadm_src_url <- head(sort(Filter(function(z) grepl("gadm[[:digit:]]+\\.gdb\\.zip", z), sapply(links, html_attr, "href")), decreasing=TRUE), 1)
x <- read_html("http://gadm.org/download_country.html")
## find all non-empty options that are part of the countrySelect element
links <- Filter(nzchar, sapply(x %>% html_node("#countrySelect") %>% html_nodes("option"), html_attr, "value"))
gadm_rds0 <- do.call(rbind, lapply(stringr::str_match_all(links, "^([[:alpha:]]{3})_.*([[:digit:]])$"), function(z) z[2:3]))
gadm_rds <- tibble::tibble(name = gadm_rds0[,1], levels = as.integer(gadm_rds0[,2])) %>%
  ## one row per country per level; row_number() is qualified because dplyr is not attached
  dplyr::slice(rep(dplyr::row_number(), levels)) %>% dplyr::group_by(name) %>%
  ## zero-based
  dplyr::mutate(level = dplyr::row_number() - 1) %>% dplyr::ungroup() %>% dplyr::select(name, level) %>% as.matrix()
template <- file.path(dirname(gadm_src_url), "rds/%s_adm%s.rds")
gadm_rds_src_url <- apply(gadm_rds, 1, function(ab) sprintf(template, ab[1], ab[2]))
library(bowerbird)
gadm.rds <- bb_source(
name="GADM maps and data in RDS format",
id="gadm-maps-rdb",
description="GADM provides maps and spatial data for all countries and their sub-divisions.",
doc_url="http://www.gadm.org",
citation="http://gadm.org/about.html",
source_url= gadm_rds_src_url,
license="http://gadm.org/license.html",
method=list("bb_handler_wget",level=1, robots_off=TRUE),
collection_size= 0.1,
access_function = "base::readRDS",
data_group="Administrative")
gadm <- bb_source(
name="GADM maps and data in ESRI Geodatabase",
id="gadm-maps-gdb",
description="GADM provides maps and spatial data for all countries and their sub-divisions.",
doc_url="http://www.gadm.org",
citation="http://gadm.org/about.html",
source_url=gadm_src_url,
license="http://gadm.org/license.html",
method=list("bb_handler_wget",recursive=TRUE,level=1, robots_off=TRUE),
postprocess=list("bb_unzip"),
collection_size= 1,
access_function = "sf::read_sf",
data_group="Administrative")
my_directory <- "~/bowerbird"
cf <- bb_config(local_file_root=my_directory)
cf <- bb_add(cf, gadm) %>% bb_add(gadm.rds)
status <- bb_sync(cf,verbose=TRUE)
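The step that expands each country into one URL per level can also be sketched in base R, which may be easier to check in isolation. This is only an illustration under stated assumptions: build_rds_urls is a hypothetical helper, the country codes and level counts are invented, and the base URL is a placeholder rather than the real GADM data location.

```r
## Hypothetical helper: expand country codes and level counts into one URL per
## country per zero-based level, following the rds/<code>_adm<level>.rds template.
build_rds_urls <- function(codes, n_levels, base_url) {
  stopifnot(length(codes) == length(n_levels))
  ## level counts scraped from the page arrive as strings, so convert them
  n_levels <- as.integer(n_levels)
  ## repeat each code once per level it has
  country <- rep(codes, times = n_levels)
  ## 0, 1, ..., n - 1 within each country
  level <- as.integer(sequence(n_levels)) - 1L
  sprintf("%s/rds/%s_adm%d.rds", base_url, country, level)
}

## Invented inputs: "AUS" with 3 levels and "NZL" with 2
urls <- build_rds_urls(c("AUS", "NZL"), c("3", "2"), "http://example.org/data")
print(urls)
```

The resulting character vector has the same shape as gadm_rds_src_url above and could be passed as source_url in the same way.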
The GADM site and data format have changed, so for anyone visiting this issue now, an updated version of this might look like:
library(bowerbird)
gadm <- bb_source(
name = "GADM maps and data in ESRI Geodatabase",
id = "gadm-maps-gdb",
description = "GADM provides maps and spatial data for all countries and their sub-divisions.",
doc_url = "http://www.gadm.org",
citation = "http://gadm.org/about.html",
source_url = "https://gadm.org/download_world.html",
license = "http://gadm.org/license.html",
method = list("bb_handler_rget", level = 1, accept_download = "gpkg\\.zip$"),
comment = "This will download the data as a single database as well as a version with six separate layers (one for each level of subdivision/aggregation). Adjust the 'accept_download' parameter if you only want one of these",
postprocess = list("bb_unzip"),
collection_size = 7.5,
access_function = "sf::read_sf",
data_group = "Administrative")
cf <- bb_config("~/temp/data/bbtest") %>% bb_add(gadm)
## don't use dry_run = TRUE if you are doing this for real!
bb_sync(cf, dry_run = TRUE, verbose = TRUE)
Which gives:
Thu Jul 26 17:10:50 2018
Synchronizing dataset: GADM maps and data in ESRI Geodatabase
Source URL https://gadm.org/download_world.html
--------------------------------------------------------------------------------------------
this dataset path is: ~/temp/data/bbtest/gadm.org
building file list ... done.
visiting https://gadm.org/download_world.html ...
|====================================================================================================================================| 100%
No encoding supplied: defaulting to UTF-8.
done.
dry_run is TRUE, bb_rget is not downloading the following files:
https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip
https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip
Thu Jul 26 17:10:52 2018 dataset synchronization complete: GADM maps and data in ESRI Geodatabase
# A tibble: 1 x 5
name id source_url status files
<chr> <chr> <chr> <lgl> <list>
1 GADM maps and data in ESRI Geodatabase gadm-maps-gdb https://gadm.org/download_world.html TRUE <tibble [2 × 3]>
And the files will be in ~/temp/data/bbtest/biogeo.ucdavis.edu/data/.
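If only one of the two files is wanted, a candidate accept_download pattern can be tried against the URLs from the dry run before syncing. The sketch below only exercises the regular expressions with base R's grepl; that bb_handler_rget applies the pattern to URLs in the same way is an assumption here, and the two narrower patterns are illustrative suggestions, not taken from the bowerbird documentation.

```r
## The two URLs reported by the dry run above
urls <- c("https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_gpkg.zip",
          "https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_gpkg.zip")

## Pattern used in the source definition above: matches both files
both <- "gpkg\\.zip$"
## Illustrative narrower patterns (assumptions, not from the bowerbird docs)
single_only <- "gadm[[:digit:]]+_gpkg\\.zip$"  # single-database version only
levels_only <- "_levels_gpkg\\.zip$"           # separate-layers version only

match_both <- grepl(both, urls)
match_single <- grepl(single_only, urls)
match_levels <- grepl(levels_only, urls)
print(rbind(match_both, match_single, match_levels))
```

Running a dry_run = TRUE sync with the chosen pattern, as above, is still the more reliable check of what will actually be downloaded.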