
Repository for R package yf

Home Page: https://docs.ropensci.org/yfR

License: Other


yfr's Introduction

Motivation

yfR facilitates importing stock prices from Yahoo Finance, organizing the data in a tidy format and speeding up the process with a cache system and parallel computing. yfR is the second and backwards-incompatible version of BatchGetSymbols, released in 2016 (see vignette yfR and BatchGetSymbols for details).

In a nutshell, Yahoo Finance (YF) provides a vast repository of stock price data around the globe. It covers a significant number of markets and assets, being used extensively in academic research and teaching. In order to import the financial data from YF, all you need is a ticker (id of a stock, e.g. “GM” for General Motors) and a time period – first and last date.

The Data

The main function of the package, yfR::yf_get, returns a dataframe with the financial data. All price data is measured in the currency of the financial exchange. For example, price data for GM (NYSE/US) is measured in dollars, while price data for PETR3.SA (B3/BR) is measured in Reais (Brazilian currency).

The returned data contains the following columns:

ticker: The requested tickers (ids of stocks);

ref_date: The reference day (this can also be year/month/week when using argument freq_data);

price_open: The opening price of the day/period;

price_high: The highest price of the day/period;

price_low: The lowest price of the day/period;

price_close: The close/last price of the day/period;

volume: The financial volume of the day/period, in the unit of the exchange;

price_adjusted: The stock price adjusted for corporate events such as splits, dividends and others – this is usually what you want/need for studying stocks as it represents the real financial performance of stockholders;

ret_adjusted_prices: The arithmetic or log return (see input type_return) for the adjusted stock prices;

ret_closing_prices: The arithmetic or log return (see input type_return) for the closing stock prices;

cumret_adjusted_prices: The accumulated arithmetic/log return for the period (starts at 100%).

Finding tickers

The easiest way to find the ticker of a company's stock is to search for it on the Yahoo Finance website, using the search bar at the top of the page.

A company can have many different stocks traded in different markets. For example, Petrobras is traded at NYQ (New York Exchange), SAO (Sao Paulo/Brazil - B3 exchange) and BUE (Buenos Aires/Argentina Exchange), all with different symbols (tickers). For market indices, a list of tickers is available here.

Features of `yfR`

Fetches daily/weekly/monthly/annual stock prices/returns from yahoo finance and outputs a dataframe (tibble) in the long format (stacked data);
A new feature called collections facilitates download of multiple tickers from a particular market/index. You can, for example, download data for all stocks in the SP500 index with a simple call to yf_collection_get("SP500");
A session-persistent smart cache system is available by default. This means that the data is saved locally and only missing portions are downloaded, if needed.
All dates are compared to a benchmark ticker such as SP500 and, whenever an individual asset does not have a sufficient number of dates, the software drops it from the output. This means you can choose to ignore tickers with a high proportion of missing dates.
A customized function called yf_convert_to_wide() can transform the long dataframe into a wide format (tickers as columns), much used in portfolio optimization. The output is a list where each element is a different target variable (prices, returns, volumes).
Parallel computing with package furrr is available, speeding up the data importation process.
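
To make the long versus wide distinction concrete, here is a minimal base R sketch. It does not call yf_convert_to_wide(); it uses stats::reshape() on a made-up toy data frame (tickers AAA and BBB and their prices are invented for illustration) to show what "tickers as columns" means for a single target variable.

```r
# Toy long-format data (invented values), mimicking the ticker/ref_date layout.
long <- data.frame(
  ticker = rep(c("AAA", "BBB"), each = 3),
  ref_date = rep(as.Date("2023-01-02") + 0:2, times = 2),
  price_adjusted = c(10, 11, 12, 20, 19, 21),
  stringsAsFactors = FALSE
)

# One row per date, one price column per ticker.
wide <- reshape(long, idvar = "ref_date", timevar = "ticker", direction = "wide")
wide
```

The package function returns a list with one such wide table per target variable, so this sketch corresponds to a single element of that list.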

Warnings

Yahoo Finance data is far from perfect or reliable, especially for individual stocks. In my experience, using it in research code with stock indices is fine, and I can match it with other data sources. However, adjusted stock prices for individual assets are messy, as stock events such as splits or dividends are not properly registered. I was never able to match them with other data sources, especially for long time periods with lots of corporate events. My advice is to never use the Yahoo Finance data of individual stocks in production (research papers or academic documents such as theses and dissertations). If adjusted price data of individual stocks is important for your research, use other data sources such as EOD, SimFin or Economática.

Installation

# CRAN (stable)
install.packages('yfR')

# Github (dev version)
devtools::install_github('ropensci/yfR')

# ropensci
install.packages("yfR", repos = "https://ropensci.r-universe.dev")

A simple example of usage

library(yfR)

# set options for algorithm
my_ticker <- 'META'
first_date <- Sys.Date() - 30
last_date <- Sys.Date()

# fetch data
df_yf <- yf_get(tickers = my_ticker, 
                     first_date = first_date,
                     last_date = last_date)
#> 
#> ── Running yfR for 1 stocks | 2023-01-17 --> 2023-02-16 (30 days) ──
#> 
#> ℹ Downloading data for benchmark ticker ^GSPC
#> ℹ (1/1) Fetching data for META
#> !    - not cached
#> ✔    - cache saved successfully
#> ✔    - got 22 valid rows (2023-01-17 --> 2023-02-15)
#> ✔    - got 100% of valid prices -- Time for some tea?
#> ℹ Binding price data
#> 
#> ── Diagnostics ─────────────────────────────────────────────────────────────────
#> ✔ Returned dataframe with 22 rows -- Youre doing good!
#> ℹ Using 6.3 kB at /tmp/RtmpvCnCwr/yf_cache for 2 cache files
#> ℹ Out of 1 requested tickers, you got 1 (100%)

# output is a tibble with data
head(df_yf)
#> # A tibble: 6 × 11
#>   ticker ref_date   price_open price_h…¹ price…² price…³ volume price…⁴ ret_ad…⁵
#>   <chr>  <date>          <dbl>     <dbl>   <dbl>   <dbl>  <dbl>   <dbl>    <dbl>
#> 1 META   2023-01-17       136.      137.    134.    135. 2.11e7    135. NA      
#> 2 META   2023-01-18       136.      137.    133.    133. 2.02e7    133. -1.73e-2
#> 3 META   2023-01-19       132.      137.    132.    136. 2.86e7    136.  2.35e-2
#> 4 META   2023-01-20       136.      140.    135.    139. 2.86e7    139.  2.37e-2
#> 5 META   2023-01-23       139.      144.    139.    143. 2.75e7    143.  2.80e-2
#> 6 META   2023-01-24       142.      145     141.    143. 2.20e7    143. -9.07e-4
#> # … with 2 more variables: ret_closing_prices <dbl>,
#> #   cumret_adjusted_prices <dbl>, and abbreviated variable names ¹price_high,
#> #   ²price_low, ³price_close, ⁴price_adjusted, ⁵ret_adjusted_prices

Acknowledgements

Package yfR is based on quantmod (@joshuaulrich) and uses one of its functions (quantmod::getSymbols) for fetching raw data from Yahoo Finance. As with any API, there is significant work in maintaining the code. Joshua was always fast and open-minded in implementing the required changes, and I’m very grateful for it.


yfr's Issues

Can't find SP500 tickers with "-"

BF.B is BF-B in yahoo finance

yfR::yf_get("BF.B")
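
A possible workaround, sketched under the assumption that the only difference is the separator: replace the dot with a hyphen before calling yf_get. The helper name to_yahoo_symbol is hypothetical and not part of yfR.

```r
# Hypothetical helper: convert share-class tickers such as "BF.B" to the
# hyphenated form used by Yahoo Finance ("BF-B").
to_yahoo_symbol <- function(tickers) {
  gsub(".", "-", tickers, fixed = TRUE)
}

to_yahoo_symbol(c("BF.B", "AAPL"))
```

This should only be applied to tickers where the dot marks a share class. Tickers such as PETR3.SA use the dot as an exchange suffix and must be left unchanged.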

Price Data Discrepancies

Hi msperlin,

I have been running multiple tests using individual, multiple and collection "SP500" tickers with yfR and found random discrepancies when comparing price data to Yahoo Finance Website (Historical Price Data) and BatchGetSymbols price data.

This example was conducted on AAPL for Weekly price data for the last 3 weeks.

Yahoo Finance Website
AAPL Weekly Historical Price Data

Date          Open    High    Low     Close   Adj Close  Volume
May 09, 2022  154.93  155.83  151.49  152.06  152.06     131,419,200
May 02, 2022  156.71  166.48  153.27  157.28  157.05     566,859,300
Apr 25, 2022  161.12  166.20  155.38  157.65  157.42     541,536,700

BatchGetSymbols
AAPL Weekly Price Data

Date          Open    High    Low     Close   Adj Close  Volume
May 09, 2022  154.93  155.83  151.49  152.06  152.06     131,419,200
May 02, 2022  156.71  166.48  153.27  157.28  157.28     566,859,300
Apr 25, 2022  161.12  166.20  155.38  157.65  157.42     541,536,700

yfR
AAPL Weekly Price Data

Date          Open        High    Low     Close   Adj Close  Volume
May 09, 2022  154.93      155.83  151.49  152.06  152.06     131,419,200
May 02, 2022  **156.01**  166.48  153.27  157.28  157.28     566,859,300
Apr 25, 2022  **161.84**  166.20  155.38  157.65  157.42     541,536,700

These errors also appear when using "freq_data" daily, weekly and monthly across multiple tickers. In this example alone, the issue appears to be with the Open price. When I ran this test on all SP500 tickers, approximately 100 tickers had issues with the price data.

Please note that in this example I have cross-checked the AAPL weekly price data with stock charting software. The Yahoo Finance website and BatchGetSymbols are correct. yfR is incorrect.

Here is the R script I used for yfR:

library(yfR)

tickers <- c("AAPL")

first_date <- Sys.Date() - 70
last_date <- Sys.Date()

df_yf <- yf_get(
  tickers = tickers,
  first_date = first_date,
  last_date = last_date,
  freq_data = "weekly",
  bench_ticker = "^GSPC",
  how_to_aggregate = "last",
  do_complete_data = FALSE,
  thresh_bad_data = 0.75,
  do_cache = TRUE)

Kind regards,
Ron
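
A comparison like the one in this issue can be automated. The sketch below hard-codes the Open prices from the tables above and flags the dates where the two sources disagree. The tolerance of half a cent and the object names are assumptions made for illustration.

```r
# Open prices copied from the tables in this issue.
website <- data.frame(
  ref_date = as.Date(c("2022-05-09", "2022-05-02", "2022-04-25")),
  price_open = c(154.93, 156.71, 161.12)
)
yfr <- data.frame(
  ref_date = as.Date(c("2022-05-09", "2022-05-02", "2022-04-25")),
  price_open = c(154.93, 156.01, 161.84)
)

# Align both sources by date and compute the absolute difference.
both <- merge(website, yfr, by = "ref_date", suffixes = c("_website", "_yfr"))
both$abs_diff <- abs(both$price_open_website - both$price_open_yfr)

# Assumed tolerance: anything above half a cent counts as a mismatch.
mismatches <- both[both$abs_diff > 0.005, ]
mismatches
```

With real data, the second data frame would come from yf_get and the first from whichever reference source is trusted.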

`'all_of' is not an exported object` if tidyr version < 1.2.0

Per this SO thread, yf_convert_to_wide() throws an error for users running older versions of tidyr. I suspect this is because the function source calls tidyr::all_of(), but all_of() originates in the tidyselect package and wasn't re-exported by tidyr until v1.2.0. I would consider either changing tidyr::all_of() to tidyselect::all_of(), or adding tidyr (>= 1.2.0) to the DESCRIPTION file.
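
Until one of those changes is made, affected users could check their installed version themselves. The following is a minimal sketch; the helper meets_minimum is hypothetical, and it compares version strings with package_version() so that, for example, 1.10.0 is correctly treated as newer than 1.2.0.

```r
# Hypothetical helper: TRUE when `installed` satisfies the `minimum` version.
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

meets_minimum("1.1.4", "1.2.0")
meets_minimum("1.10.0", "1.2.0")

# In a real session the installed version would come from
# as.character(utils::packageVersion("tidyr")).
```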

error when downloading 1 trading day

Reproducible example with error:

stocks <- yfR::yf_get(
  "META",
  first_date = Sys.Date(),
  last_date = Sys.Date() + 1
)

Error in download

I'm getting an error in downloads when running the sample code.

What am I doing wrong?

Backward compatibility with R 4.0.2

@msperlin

Thank you for creating this package! I've read your blog post about it and I have used your other package BatchGetSymbols for a while. When I tried to install yfR via GitHub, I got an error message saying that yfR requires R 4.1 or greater. Is there any chance that this package can be made backward compatible with R 4.0.2? That is the most recent version used in Microsoft R Open, which I suspect is still a reasonably popular R distribution that others use for its MKL acceleration capabilities.

Thanks for considering the request!

Alert users of yf limits

Make sure users know when their download failed due to YF limits.

Suggestion: remove/change curl::has_internet() from yfR

Hi! Thanks for this awesome package. We use it a lot in our classes at Insper.

We are facing some issues running yfR::yf_get() because curl::has_internet() returns FALSE, even though we have a stable internet connection here. Probably it is some proxy issue that we do not control.

A suggestion would be to remove this check (if the user does not have internet connection, the download will fail anyway) or add a check_internet= parameter having TRUE as default.

What do you think? Thanks!
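
A rough sketch of the second option follows. The wrapper run_download and its arguments are hypothetical and are not part of yfR; only curl::has_internet() is an existing function. The check is skipped entirely when check_internet = FALSE.

```r
# Hypothetical wrapper (not part of yfR). `check_internet` is the parameter
# proposed above; `has_internet` is injectable so the check can be replaced.
run_download <- function(download_fun,
                         check_internet = TRUE,
                         has_internet = curl::has_internet) {
  if (check_internet && !has_internet()) {
    stop("No internet connection detected; set check_internet = FALSE to skip this check.")
  }
  download_fun()
}

# Behind a proxy where the check gives a false negative, skip it:
result <- run_download(function() "downloaded", check_internet = FALSE)
result
```

Because R evaluates default arguments lazily, the curl check is never touched when check_internet is FALSE.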

fatal error when one of the symbols returns only one row

Hello, I recently changed my code to use this package instead of BatchGetSymbols, but the behavior changed. With BatchGetSymbols, if one of the symbols doesn't return enough data, it is NA in the result table. With yf_get I get the fatal error below, and none of the other symbols' data is returned. Is there any way to use yf_get and still get the results for the other symbols?
I tried setting thresh_bad_data=0 and do_complete_data=TRUE, but that still results in a fatal error.

Error in `.f()`:
! Returned dataframe for DNAI11.SA between 2022-09-30 and 2022-12-02 has only ONE row.  Perhaps you should check
  your inputs dates?

and this is the function call:

dd = yf_get(tickers = paste0(x$ticker, ".SA"), first_date = d0, last_date = d + 1,
            do_cache = TRUE, bench_ticker = "^BVSP", thresh_bad_data = 0,
            do_complete_data = TRUE, freq_data = "daily")

thanks.
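
One possible workaround, independent of any change in the package, is to fetch each ticker separately and catch errors per ticker. The sketch below shows only the control flow: fetch_each_safely and fake_fetch are hypothetical names, and fake_fetch stands in for a real call such as function(tk) yfR::yf_get(tickers = tk, first_date = d0, last_date = d + 1).

```r
# Hypothetical helper (not part of yfR): call `fetch_fun` once per ticker and
# keep going when one of them fails.
fetch_each_safely <- function(tickers, fetch_fun) {
  results <- lapply(tickers, function(tk) {
    tryCatch(fetch_fun(tk), error = function(e) {
      message("Skipping ", tk, ": ", conditionMessage(e))
      NULL
    })
  })
  ok <- !vapply(results, is.null, logical(1))
  list(
    data = do.call(rbind, results[ok]),  # rows from the tickers that worked
    failed = tickers[!ok]                # tickers that raised an error
  )
}

# Stand-in for a real download, used only to demonstrate the control flow.
fake_fetch <- function(tk) {
  if (tk == "DNAI11.SA") stop("has only ONE row")
  data.frame(ticker = tk, price_close = 1, stringsAsFactors = FALSE)
}

out <- fetch_each_safely(c("PETR4.SA", "DNAI11.SA", "VALE3.SA"), fake_fetch)
out$failed
```

The trade-off is one request per ticker, so any batching done by a single multi-ticker call is not used.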

Market Cap + Sector?

Hi, kudos to the great work.

Is it possible for the yf_get() function to also return market cap and sector?

yf api limits?

YF is now restricting access to the data. Set up a way to let users know about the restriction?

Coming back to the data download issue

Hello, first I want to say that I really appreciate your work.
I have already seen all the topics about this issue.
But is there no way to extend the download limit,
as was possible before with BatchGetSymbols?
It was so convenient for working with big data, but now it is impossible to work.
Waiting for your answer, thanks in advance.

Unstated dependency version?

I installed the package and tried to run the examples in yf-vignette.Rmd but got some errors:

> library(yfR)
> # set options for algorithm
> my_ticker <- 'FB'
> first_date <- Sys.Date() - 30
> last_date <- Sys.Date()
> # fetch data
> df_yf <- yf_get(tickers = my_ticker, 
+                      first_date = first_date,
+                      last_date = last_date)

── Running yfR for 1 stocks | 2022-05-05 --> 2022-06-04 (30 days) ──

ℹ Downloading data for benchmark ticker ^GSPC
ℹ (1/1) Fetching data for FB
! 	- not cached
✖ 	- error in download..
ℹ Binding price data
Error in `vec_slice()`:
! `x` must be a vector, not NULL.
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: `ticker`. 
2: Unknown or uninitialised column: `threshold_decision`. 
3: Unknown or uninitialised column: `ticker`. 
4: Unknown or uninitialised column: `price_adjusted`. 
> rlang::last_error()
<error/vctrs_error_scalar_type>
Error in `vec_slice()`:
! `x` must be a vector, not NULL.
---
Backtrace:
 1. yfR::yf_get(tickers = my_ticker, first_date = first_date, last_date = last_date)
 2. yfR:::calc_ret(...)
 3. dplyr::lag(P)
 5. vctrs::vec_slice(inputs$x, seq_len(xlen - n))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/vctrs_error_scalar_type>
Error in `vec_slice()`:
! `x` must be a vector, not NULL.
---
Backtrace:
    ▆
 1. ├─yfR::yf_get(tickers = my_ticker, first_date = first_date, last_date = last_date)
 2. │ └─yfR:::calc_ret(...)
 3. │   └─dplyr::lag(P)
 4. │     ├─vctrs::vec_c(...)
 5. │     └─vctrs::vec_slice(inputs$x, seq_len(xlen - n))
 6. └─vctrs:::stop_scalar_type(`<fn>`(NULL), "x", `<env>`)
 7.   └─vctrs:::stop_vctrs(...)
 8.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
> session_info()
Error in session_info() : could not find function "session_info"
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] yfR_0.0.3

loaded via a namespace (and not attached):
 [1] pillar_1.7.0     compiler_4.1.2   tools_4.1.2      xts_0.12.1       digest_0.6.29    evaluate_0.14    lifecycle_1.0.1  tibble_3.1.6    
 [9] lattice_0.20-45  pkgconfig_2.0.3  rlang_1.0.2      cli_3.3.0        DBI_1.1.1        rstudioapi_0.13  curl_4.3.1       yaml_2.2.2      
[17] xfun_0.29        fastmap_1.1.0    dplyr_1.0.9      stringr_1.4.0    knitr_1.37       generics_0.1.2   vctrs_0.4.1      grid_4.1.2      
[25] tidyselect_1.1.1 glue_1.6.1       R6_2.5.0.9000    fansi_0.5.0      rmarkdown_2.11   purrr_0.3.4      TTR_0.24.3       magrittr_2.0.1  
[33] ellipsis_0.3.2   htmltools_0.5.2  assertthat_0.2.1 quantmod_0.4.18  utf8_1.2.2       stringi_1.7.6    crayon_1.4.1     zoo_1.8-9

This may be due to package versioning issues; I hadn't updated any packages in a long time and after updating all of my packages, I successfully ran the examples. Here's my sessionInfo() from after my update:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] yfR_0.0.3

loaded via a namespace (and not attached):
 [1] pillar_1.7.0     compiler_4.1.2   tools_4.1.2      xts_0.12.1       lifecycle_1.0.1  tibble_3.1.7     lattice_0.20-45  pkgconfig_2.0.3 
 [9] rlang_1.0.2      DBI_1.1.2        cli_3.3.0        rstudioapi_0.13  curl_4.3.2       dplyr_1.0.9      stringr_1.4.0    generics_0.1.2  
[17] vctrs_0.4.1      hms_1.1.1        grid_4.1.2       tidyselect_1.1.2 glue_1.6.2       R6_2.5.1         fansi_1.0.3      purrr_0.3.4     
[25] readr_2.1.2      tzdb_0.3.0       TTR_0.24.3       magrittr_2.0.3   scales_1.2.0     ellipsis_0.3.2   assertthat_0.2.1 quantmod_0.4.20 
[33] colorspace_2.0-3 utf8_1.2.2       stringi_1.7.6    munsell_0.5.0    crayon_1.5.1     zoo_1.8-10

So there may be a package in there which you need to add a minimum version for in your DESCRIPTION.
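
To narrow down the candidates, the two sessionInfo() listings can be diffed. The sketch below does this for a handful of entries copied from the outputs above. The helper parse_versions is hypothetical, and the sketch only shows which packages changed, not which one caused the error.

```r
# Split "name_version" strings (as printed by sessionInfo()) into a named vector.
parse_versions <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  versions <- vapply(parts, function(p) p[2], character(1))
  names(versions) <- vapply(parts, function(p) p[1], character(1))
  versions
}

# A subset of entries copied from the two listings in this issue.
before <- parse_versions(c("tibble_3.1.6", "DBI_1.1.1", "curl_4.3.1",
                           "dplyr_1.0.9", "quantmod_0.4.18", "zoo_1.8-9"))
after <- parse_versions(c("tibble_3.1.7", "DBI_1.1.2", "curl_4.3.2",
                          "dplyr_1.0.9", "quantmod_0.4.20", "zoo_1.8-10"))

# Keep packages present in both listings whose version string differs.
common <- intersect(names(before), names(after))
changed <- common[before[common] != after[common]]
data.frame(package = changed,
           before = unname(before[changed]),
           after = unname(after[changed]))
```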

Error with get_collection_data()

df_sp500 <- yfR::yf_get_collection(collection = 'SP500', first_date = '2020-01-01', last_date = Sys.Date())

issue with log returns

Good afternoon Marcelo, congratulations on the yfR package.
I identified 2 problems related to the calculation of the cumret_adjusted_prices column.

For log returns, the function would need to be adjusted (replacing 1+cumsum(x) with cumsum(1+x)):

calc_cum <- function(x, type_return) {
  this_cum_ret <- switch(type_return, arit = cumprod(1 + x), log = 1 + cumsum(x))
  return(this_cum_ret)
}

Even done this way, it can end up with a negative accumulated value (see the reproducible example at the end).

Personally, given that the idea is to show the accumulated return of 1 real, it would be better to compute an index from the adjusted prices (x = price_adjusted), which is valid for both types of returns:
calc_cum1 <- function(x) {
  this_cum_ret <- x / x[1]
  return(this_cum_ret)
}
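
The point can be checked on a small made-up price series (the values are invented for illustration). The price-based index matches both cumprod(1 + r) for arithmetic returns and exp(cumsum(r)) for log returns, while 1 + cumsum(r) on log returns goes negative after a large drop. The exp(cumsum(r)) form is a standard identity, not something taken from the yfR source.

```r
# Invented prices with a large drop, to make the problem visible.
prices <- c(100, 30, 10, 3)

# First-period return set to 0, as in the reproducible example below.
ret_arit <- c(0, diff(prices) / head(prices, -1))
ret_log <- c(0, diff(log(prices)))

index_from_prices <- prices / prices[1]   # the index proposed above
index_from_arit <- cumprod(1 + ret_arit)  # accumulated arithmetic returns
index_from_log <- exp(cumsum(ret_log))    # accumulated log returns
naive_log <- 1 + cumsum(ret_log)          # the expression questioned above

rbind(index_from_prices, index_from_arit, index_from_log, naive_log)
```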

When the list of symbols is not sorted, since you use split in the function calc_cum_ret, the result is sorted alphabetically, which arranges the data differently and does not necessarily match the original order. To get around this without major changes, I suggest adding

tickers <- sort(tickers)
at the start of the function yf_get
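
The ordering problem is easy to reproduce without any download. In the toy sketch below (tickers "A" and "B" and the returns are invented), split() orders the groups alphabetically, so concatenating the per-group results no longer lines up with the original rows. Base R's ave() is shown as one alternative that applies the function per group while keeping row order; it is a suggestion of this sketch, not what yfR does.

```r
# Rows arrive with ticker "B" first, i.e. not in alphabetical order.
tickers <- c("B", "B", "A", "A")
ret <- c(0.1, 0.2, 0.3, 0.4)

# split() sorts groups alphabetically, so binding them back reorders the values.
l_cum <- lapply(split(ret, tickers), cumsum)
misaligned <- unname(do.call(c, l_cum))

# ave() computes the same per-group cumsum but returns values in row order.
aligned <- ave(ret, tickers, FUN = cumsum)

data.frame(tickers, ret, misaligned, aligned)
```

Sorting the tickers up front, as suggested above, avoids the mismatch by making the row order agree with the order that split() produces.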

Regards

library(yfR)

StartDat = as.Date("2015-01-01") # start
EndDat = as.Date("2022-03-25")   # end date

symbols <- c("^BVSP", "PETR4.SA", "VALE3.SA", "LREN3.SA", "MGLU3.SA") # symbols on Yahoo
symbols <- sort(symbols)

# Download data
df_yf <- yf_get(tickers = symbols,
                first_date = StartDat,
                last_date = EndDat,
                type_return = "log") # arit, log

# correct cum for log
ret = df_yf$ret_adjusted_prices
tickers = df_yf$ticker
unique(tickers)
type_return = "log"

# treat missing returns (first observation of each ticker) as zero
idx <- is.na(ret)
ret[idx] <- 0
l_ret <- split(ret, tickers)
l_cot <- split(df_yf$price_adjusted, tickers)
calc_cum <- function(x, type_return) {
  this_cum_ret <- switch(type_return, arit = cumprod(1 + x), log = 1 + cumsum(x))
  return(this_cum_ret)
}
calc_cum1 <- function(x) {
  this_cum_ret <- x / x[1]
  return(this_cum_ret)
}
l_cum_ret <- purrr::map(l_ret, calc_cum, type_return = type_return)
l_cum_ret1 <- purrr::map(l_cot, calc_cum1)
names(l_ret)
names(l_cum_ret)
cum_ret <- do.call(c, l_cum_ret)
cum_ret1 <- do.call(c, l_cum_ret1)
names(cum_ret) <- NULL
names(cum_ret1) <- NULL
df_yf$cumret_adjusted_prices = cum_ret
df_yf$cumret_adjusted_prices1 = cum_ret1

plot(df_yf$price_adjusted[df_yf$ticker == "MGLU3.SA"], log = "y")


library(ggplot2)

# format(..., "%Y") extracts the year without needing an extra package
p <- ggplot(df_yf,
            aes(x = ref_date,
                y = cumret_adjusted_prices1,
                color = ticker)) +
  geom_line() +
  labs(
    title = paste0("Index Value (",
                   format(min(df_yf$ref_date), "%Y"), ' - ',
                   format(max(df_yf$ref_date), "%Y"), ")"
    ),
    x = "Time",
    y = "Accumulated Return (from 100%)",
    caption = "Data from Yahoo Finance <https://finance.yahoo.com//>") +
  theme_light() + scale_y_log10()

p