simonpcouch / anyflights
An R package to generate `nycflights13`-like air travel data
Home Page: https://simonpcouch.github.io/anyflights/
anyflights(station = "PDX", year = 2015, dir = tempdir())
Error in FUN(X[[i]], ...) :
Can't access flight data for supplied year. Check date of 'Latest Available Data' for 'Airline On-Time Performance Data' on
https://www.transtats.bts.gov/releaseinfo.asp
When I run most of the `get_*` functions, I get errors similar to this one.
Love that this is back on CRAN! Congrats on getting it up there again, Simon!
Given that it takes some time to download the data, could some messages be added to let users know where they are in the process? Something like:
Downloading flights data for SEA
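A minimal sketch of what such progress messages could look like. The helper `run_with_messages()` and its `steps` argument are hypothetical and not part of anyflights; each step is assumed to be a zero-argument function that performs one download.

```r
# Hypothetical helper (not part of anyflights): announce each download
# step with message() before running it.
run_with_messages <- function(station, steps) {
  results <- list()
  for (name in names(steps)) {
    # message() writes to stderr and can be silenced with suppressMessages()
    message("Downloading ", name, " data for ", station)
    results[[name]] <- steps[[name]]()
  }
  results
}

# Usage with stand-in steps; real steps would wrap the actual downloads.
demo <- run_with_messages(
  "SEA",
  list(flights = function() "flights done", weather = function() "weather done")
)
```

Using `message()` rather than `print()` or `cat()` leaves it to the user to silence the output with `suppressMessages()`.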
Hi, I have tried to run the example code from this package's main GitHub page, using both the CRAN and development versions of 'anyflights', and I keep running into an issue where this code:
library(anyflights)
pdxflights19<-get_flights("PDX", 2019,6)
Results in this:
pdxflights19 <- anyflights("PDX", 2019, 6)
Total Time Elapsed
Finished Processing Arguments 1s 0s
Downloaded Flights Data for June 35s
Finished Downloading Flights Data 36s
Downloading Airlines... Error in open.connection(3L, "rb") : HTTP error 500.
In addition: Warning message:
One or more parsing issues, see problems() for details
It seems like the flight data downloads fine but the airline data doesn't. I also ran "get_flights" and then "get_airlines", which resulted in this:
pdxflights19 <- anyflights("PDX", 2019)
pdxflights19<-get_flights("PDX", 2019,6)
Total Time Elapsed
Finished Processing Arguments 6s 0s
Downloaded Flights Data for June 44s
Finished Processing Flights Data 46s
All Done! Warning messages:
1: In for (i in seq_along(specs)) { :
closing unused connection 4 (http://www.transtats.bts.gov/Download_Lookup.asp?Lookup=L_UNIQUE_CARRIERS)
2: In mget(objectNames, envir = ns, inherits = TRUE) :
restarting interrupted promise evaluation
3: One or more parsing issues, see problems() for details
pdxairlines19<-get_airlines(flights_data =pdxflights19)
Error in open.connection(4L, "rb") : HTTP error 500.
I'm not sure whether this is a problem on the user end or the server end. I have tried this on other Wi-Fi connections as well.
From Hannes Becker on Twitter:
Another problem I ran into with my scrape case (I realize I'm not using the package as it's supposed to be used, so it's more of an FYI): the raw weather data is quite big (>100k obs per airport), so if I first download all airports (weather_raw) and then do the munging, it breaks. The solution might simply be to move the '# tidy the data' part (line 76 ff. in get_weather) to the get_weather_for_station function. But once again, this is just an FYI. I should probably make my own branch and be more helpful, but my brain's not awake enough for git rn. :)
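The suggestion above can be sketched as follows. This is an illustration only: `tidy_station()` and `weather_by_station()` are hypothetical names, the column names are placeholders, and `fetch` stands in for whatever function downloads one station's raw data. None of it is the package's actual code.

```r
# Hypothetical stand-in for the tidying step: keep a few columns and
# drop rows without a temperature reading.
tidy_station <- function(raw) {
  raw[!is.na(raw$temp), c("station", "temp"), drop = FALSE]
}

# Fetch and tidy one station at a time, so only the reduced data for
# each station is kept until the final bind.
weather_by_station <- function(stations, fetch) {
  pieces <- lapply(stations, function(s) tidy_station(fetch(s)))
  do.call(rbind, pieces)
}

# Usage with a fake fetch function in place of a real download.
fake_fetch <- function(s) {
  data.frame(station = s, temp = c(50, NA), extra = 1, stringsAsFactors = FALSE)
}
weather <- weather_by_station(c("PDX", "SEA"), fake_fetch)
```

Tidying inside the per-station step means the full raw data for all airports never has to sit in memory at once.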
pdxflights19 <- anyflights("PDX", 2019, 6)
Total Time Elapsed
Finished Processing Arguments 1s
Downloaded Flights Data for June 44s
Finished Downloading Flights Data 52s
Finished Downloading Airlines Data 53s
Downloading Planes... Error in utils::unzip(planes_tmp, exdir = planes_lcl, junkpaths = TRUE) :
cannot open file 'C:/Users/tbats/AppData/Local/Temp/RtmpGs7bI0/planes/MASTER.txt': Invalid argument
Master.txt is in that directory, but I don't know what the problem is.
- Session info ------------------------------------------------------------------------------------------
setting value
version R version 3.6.2 (2019-12-12)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2020-08-29
- Packages ----------------------------------------------------------------------------------------------
package * version date lib source
anyflights * 0.3.0 2020-08-10 [1] CRAN (R 3.6.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.3)
bit 1.1-15.2 2020-02-10 [1] CRAN (R 3.6.2)
bit64 0.9-7 2017-05-08 [1] CRAN (R 3.6.0)
broom 0.5.5 2020-02-29 [1] CRAN (R 3.6.3)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.1)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.2)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.1)
dplyr * 1.0.1 2020-07-31 [1] CRAN (R 3.6.3)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.3)
fs 1.4.1 2020-04-04 [1] CRAN (R 3.6.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
ggplot2 * 3.3.0.9000 2020-04-04 [1] Github (tidyverse/ggplot2@bca6105)
glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.3)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.2)
hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 3.6.3)
lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
nlme 3.1-145 2020-03-04 [1] CRAN (R 3.6.3)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
progress 1.2.2 2019-05-16 [1] CRAN (R 3.6.1)
purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 3.6.2)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.1)
reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.1)
rlang 0.4.7 2020-07-09 [1] CRAN (R 3.6.3)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.3)
rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.2)
scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
tibble * 3.0.0 2020-03-30 [1] CRAN (R 3.6.2)
tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.2)
vctrs 0.3.2 2020-07-15 [1] CRAN (R 3.6.3)
vroom 1.3.1 2020-08-27 [1] CRAN (R 3.6.2)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
xml2 1.2.5 2020-03-11 [1] CRAN (R 3.6.3)
[1] C:/Users/tbats/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.2/library
The package currently generates the following with devtools::check():
Duration: 11.7s
> checking Rd cross-references ... WARNING
Missing link or links in documentation object 'flights.Rd':
‘get_airlines’ ‘get_airports’
See section 'Cross-references' in the 'Writing R Extensions' manual.
> checking for code/documentation mismatches ... WARNING
Data codoc mismatches from documentation object 'weather':
Variables in data frame 'weather'
Code: day dewp hour humid month origin precip pressure temp time_hour
visib wind_dir wind_gust wind_speed year
Docs: humid origin precip pressure temp, dewp time_hour visib
wind_dir, wind_speed, wind_gust year, month, day, hour
As well as these messages:
Warning: The existing 'airlines.Rd' file was not generated by roxygen2, and will not be overwritten.
Warning: The existing 'airports.Rd' file was not generated by roxygen2, and will not be overwritten.
Warning: The existing 'flights.Rd' file was not generated by roxygen2, and will not be overwritten.
Warning: The existing 'planes.Rd' file was not generated by roxygen2, and will not be overwritten.
Warning: The existing 'weather.Rd' file was not generated by roxygen2, and will not be overwritten.
Ideally, the .Rd files should be generated from roxygen2 documentation in .R files rather than the other way around. This might fix the WARNINGs as a side effect.
Hello,
When I try to download all of the flights for January 2020 originating from LAX, I get a timeout error:
library(anyflights)
# get all flights in January 2020 with origin LAX
la2020 <- anyflights("LAX", 2020, 1)
#> Total Time Elapsed
#> Finished Processing Arguments 1s
#> Warning in utils::download.file(fl_url, flight_temp, quiet = TRUE): downloaded
#> length 24250200 != reported length 30636034
#> Warning in utils::download.file(fl_url, flight_temp,
#> quiet = TRUE): URL 'https://transtats.bts.gov/PREZIP/
#> On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2020_1.zip': Timeout
#> of 60 seconds was reached
#> Error in utils::download.file(fl_url, flight_temp, quiet = TRUE): download from 'https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2020_1.zip' failed
Created on 2021-02-12 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Los_Angeles
#> date 2021-02-12
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> anyflights * 0.3.1 2021-02-12 [1] Github (simonpcouch/anyflights@17c581f)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> cachem 1.0.1 2021-01-21 [1] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
#> cli 2.3.0 2021-01-31 [1] CRAN (R 4.0.2)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> curl 4.3 2019-12-02 [1] CRAN (R 4.0.1)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> dplyr 1.0.4 2021-02-02 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1 2021-01-12 [1] CRAN (R 4.0.2)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> memoise 2.0.0 2021-01-28 [1] Github (hadley/memoise@a2187e6)
#> pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.2)
#> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.2)
#> progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.2)
#> ps 1.5.0 2020-12-05 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> readr 1.4.0 2020-10-05 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.2)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.1 2020-12-17 [1] CRAN (R 4.0.2)
#> tibble 3.0.6 2021-01-29 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
#> usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
#> vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.2)
#> withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.3)
#> xfun 0.21 2021-02-10 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
I was able to track the issue down to the default R option timeout, which is set to 60 seconds. I found this Stack Overflow answer that shows how to change this option. I propose two possible ways to help users avoid this issue in the future:
1. Add a note for users: "If you are repeatedly getting a timeout error, try extending the timeout period for your R session using:"
options(timeout = timeout_value_in_seconds)
2. Set the option in the package's .onAttach() function. This might be less ideal because it changes a user's session, but if it comes with a package start-up message, as is done in the rpushbullet package, I think this is another viable option.
Thanks for making this package, and if you'd like, I can submit a PR with one of the two changes.
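As a minimal sketch of the first suggestion, the timeout can be raised for a single call and restored afterwards. The helper name `with_timeout()` is hypothetical and not part of anyflights; it relies only on base R's `options()`, `getOption()`, and `on.exit()`.

```r
# Hypothetical helper: evaluate `expr` with the "timeout" option raised to
# at least `seconds`, then restore the previous value on exit.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = max(seconds, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  # `expr` is a lazily evaluated promise, so it only runs here,
  # after the option has been changed.
  force(expr)
}

# Example call (commented out because it needs network access):
# la2020 <- with_timeout(600, anyflights("LAX", 2020, 1))
```

Because the old value is restored by `on.exit()`, the user's session is left unchanged even if the download fails.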
I'm able to download the FAA's releasable aircraft data in a couple of seconds from my browser, but I see timeout errors whenever I try to download via download.file():
out_file <- tempfile()
out <-
utils::download.file(
"https://registry.faa.gov/database/yearly/ReleasableAircraft.2022.zip",
out_file
)
#> Warning in
#> utils::download.file("https://registry.faa.gov/database/yearly/ReleasableAircraft.2022.zip",
#> : URL 'https://registry.faa.gov/database/yearly/ReleasableAircraft.2022.zip':
#> Timeout of 60 seconds was reached
#> Error in utils::download.file("https://registry.faa.gov/database/yearly/ReleasableAircraft.2022.zip", : cannot open URL 'https://registry.faa.gov/database/yearly/ReleasableAircraft.2022.zip'
Created on 2023-09-07 with reprex v2.0.2
I admit I haven't looked into the code of this package much, but I was curious if there is a way to download just a few rows of the data instead of all of the data. As a use case, I'm trying to explore all flights leaving usually commercially designated airports in Oregon, but I don't really want to try out lots of options here and devote the time needed for all the downloads. I'd like just a head() of what's available.
Whoa, what a slick package! It's come a long ways since I last checked in.
As I was creating some data for a new assignment, I ran into this issue.
library(anyflights)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mry_flights <- anyflights("MRY", 2019, 12)
#> Total Time Elapsed
#> Finished Processing Arguments 0s
#> Downloaded Flights Data for December 52s
#> Finished Downloading Flights Data 54s
#> Finished Downloading Airlines Data 56s
#> Finished Downloading Planes Data 73s
#> Finished Downloading Airports Data 74s
#> Finished Downloading Weather Data 76s
mry_flights %>%
as_flights_package("mryflights")
#> Error in loadNamespace(name): there is no package called 'nycflights13'
Created on 2020-10-09 by the reprex package (v0.3.0)
The issue is easily solved by install.packages("nycflights13"), but I figure we should try to make that a bit smoother. What is the current best practice for dealing with this sort of dependence?
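One common convention for optional dependencies is to check for the package with `requireNamespace()` and fail with an actionable message. The sketch below assumes that approach; `check_installed()` is a hypothetical helper name and is not anyflights' actual code.

```r
# Hypothetical helper: stop with an install hint if `pkg` is missing.
check_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this function. ",
      "Install it with install.packages(\"", pkg, "\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A function that needs nycflights13 could begin with:
# check_installed("nycflights13")
```

The alternative is to move the package from Suggests to Imports in DESCRIPTION, which makes it install alongside the package at the cost of a hard dependency.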