Comments (13)
You're not crazy. I'm sorry, it did. When NCEI changed the formats of the data they serve, I had to update GSODR a few months ago, and reformat_GSOD() uses the same code behind the scenes as get_GSOD(). So it no longer supports the .op.gz format, because NCEI now serves .gz files of .csv files.
from gsodr.
My case is exactly in between. I pull 10 years' worth of data for every station within a 25 km radius of every major city in the world.
What I do instead is download full-year archives, delete the files of the stations that I don't need, and then process the remaining station files with reformat_GSOD().
Here is the function that does it all for me:
get_weather <- function(yrs = seq(lubridate::year(lubridate::today()) - 11,
                                  lubridate::year(lubridate::today())),
                        stns = stations_v) {
  # Unpack into a dedicated scratch directory rather than tempdir() itself,
  # so that the cleanup at the end does not delete the session's temp directory.
  scratch <- file.path(tempdir(), "gsod")
  dir.create("data/gsod", recursive = TRUE, showWarnings = FALSE)
  for (yr in yrs) {
    file <- paste0(yr, ".tar.gz")
    destfile <- file.path("data/gsod", file)
    # Download each annual archive only once
    if (!file.exists(destfile)) {
      link <- paste0("https://www.ncei.noaa.gov/data/global-summary-of-the-day/archive/", file)
      curl::curl_download(link, destfile)
    }
    # One subdirectory per year keeps same-named station files apart
    untar(destfile, exdir = file.path(scratch, yr))
  }
  # Go through all unpacked files, decide what to remove and what to keep
  # based on the stations of interest
  files_all <- list.files(path = scratch, pattern = "\\.csv$", recursive = TRUE, full.names = FALSE)
  # Get a cartesian join of all stations of interest and all years.
  files_stations <- as.vector(outer(paste0(yrs, "/"), paste0(stns, ".csv"), paste0))
  files_keep <- files_all[files_all %in% files_stations]
  # Transform weather data ----------------------------------------------------
  out <- GSODR::reformat_GSOD(file_list = file.path(scratch, files_keep))
  unlink(scratch, force = TRUE, recursive = TRUE)
  out
}
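The stns argument of the function above expects station IDs that have already been narrowed down to the 25 km radius. A minimal base-R sketch of one way such a list could be built is shown here; it is not necessarily how the list in this thread was produced. The haversine_km() helper, the station table, its column names, and the city coordinates are all made up for illustration.

```r
# Hypothetical helper: great-circle distance in km between points given in
# decimal degrees (haversine formula, mean Earth radius of 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Made-up station table and city; IDs and coordinates are illustrative only.
stations <- data.frame(
  id  = c("AAA", "BBB", "CCC"),
  lat = c(50.45, 50.60, 48.00),
  lon = c(30.52, 30.60, 35.00),
  stringsAsFactors = FALSE
)
city <- list(lat = 50.45, lon = 30.52)

# Keep only the stations within 25 km of the city
dist_km <- haversine_km(city$lat, city$lon, stations$lat, stations$lon)
stations_v <- stations$id[dist_km <= 25]
```

With several cities, the same distance check would be repeated per city and the resulting IDs combined with unique().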
Thank you for letting me know about the bottleneck. You OK if I add you as a contributor to the package for the ideas/input?
I figured as much after poking around this repo. Thanks. I repointed my code to load data from the other source instead of FTP, but ran into a different issue, so now I'm simply trying to use get_GSOD() for consistency (my previous workflow avoided get_GSOD() and downloaded full-year archives instead, because get_GSOD() was erroring out and I couldn't figure out why).
I remember my problem with get_GSOD() now. It takes a very long time to run for many stations (global scale) over many years (e.g. 10). I found that using curl_download() to fetch the entire year archive of all the data is a much faster route.
In the old process, when I downloaded data from FTP, each .op.gz file was named stationid-year.op.gz. In the current source that I assume you point to, the .csv files are named after the station only, with no year. I now just need to figure out the renaming process so that files from different years don't overwrite each other.
get_GSOD() would be a much more straightforward way to pull the data; it's a shame it doesn't work well for me.
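One way to avoid the overwriting problem described above is to put the year back into each file name when collecting files into a shared directory. The sketch below is only an illustration under that assumption: add_year_to_names() is a hypothetical helper, the station ID is a placeholder, and empty files stand in for real GSOD data. Whether downstream tools accept the renamed files is not covered here.

```r
# Hypothetical helper: copy one year's unpacked station files into a shared
# directory, appending the year so files from different years cannot collide.
add_year_to_names <- function(src_dir, year, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(src_dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    return(invisible(character(0)))
  }
  # "01001099999.csv" becomes "01001099999-2019.csv"
  new_names <- paste0(
    tools::file_path_sans_ext(basename(files)), "-", year, ".csv"
  )
  file.copy(files, file.path(out_dir, new_names))
  invisible(new_names)
}

# Demonstration with empty placeholder files instead of real GSOD data.
root <- file.path(tempdir(), "gsod_rename_demo")
unlink(root, recursive = TRUE)
for (yr in c("2018", "2019")) {
  raw_dir <- file.path(root, "raw", yr)
  dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(raw_dir, "01001099999.csv"))
  add_year_to_names(raw_dir, yr, file.path(root, "all"))
}
renamed <- sort(list.files(file.path(root, "all")))
```

An alternative that needs no renaming at all is to unpack each annual archive into its own per-year subdirectory, so that the year lives in the path rather than in the file name.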
Ah, interesting. It's faster to download the entire .gz and sort it out afterwards? How many stations are you fetching at once?
I hadn't considered this case. I thought users would either download everything or just a few selected stations, not many selected stations.
Here is the repo; the code snippet above is from functions.R:
https://github.com/taraskaduk/weather
Awesome, thanks! I'll have a look and see if I can improve the package.
Sweet! Let me know if I can contribute in any way.
I've updated the internal functionality to check how many requests are being made. If the number of stations is greater than 10, GSODR will download the entire global annual file and sort out the needed files locally. If there are fewer than 10 stations, it will download each requested station individually.
I ran a few tests to find the point at which individual requests stop being faster than downloading the whole archive. The number is not exact because of things I can't control (Internet speed), but this should make most requests faster when they cover a large number of stations that is neither ALL stations nor just a few.
Line 155 in b1b6a5d
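The switch described above can be sketched roughly as follows. This is an illustration only, not GSODR's actual internal code: choose_strategy() and the strategy labels are hypothetical names, and because the comment does not say what happens at exactly 10 stations, the sketch assumes that case is still downloaded station by station.

```r
# Hypothetical helper, not part of GSODR: pick a download strategy from the
# number of requested stations.
choose_strategy <- function(n_stations, threshold = 10L) {
  stopifnot(is.numeric(n_stations), length(n_stations) == 1L, n_stations >= 0)
  # Assumption: exactly `threshold` stations are still fetched individually.
  if (n_stations > threshold) "annual_archive" else "per_station"
}

choose_strategy(3)    # "per_station"
choose_strategy(250)  # "annual_archive"
```

Keeping the threshold as a parameter makes it easy to retune, since the break-even point depends on connection speed.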
Sweet! I'll make sure to chuck my extra piece of code to pull full archives for my analysis on my next update.
I would be honored! To be honest, I'd do a pull request on this item (and a couple of others), but I've never done a PR before, and was afraid to mess things up 🤷♂️