
pinkfish's People

Contributors

commongeek, dependabot[bot], ericbrown, fja05680, jiuguangw, simongarisch, tombohub


pinkfish's Issues

adjust_percents() uses the closing price to determine total_funds.

Under the hood, total_funds = self._total_funds(row) uses the close price to determine total funds. This impacts the value going into self._adjust_value (total_funds (at close) * weight). But the shares are then purchased/sold at the price passed in. I think this means that if open prices were passed in, the shares would sell at the close price but then be bought at the open price. I suspect that prices shouldn't be passed in at all. The field should be passed and that should be used to rebalance the portfolio.
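A toy sketch of the mismatch being described (the numbers and variable names are hypothetical, not pinkfish internals): if the portfolio is valued at the close but trades execute at the open, the realized weight drifts from the target.

```python
# Hypothetical illustration of the valuation/execution mismatch:
# total funds are valued at the close, but shares trade at the
# price passed in (here, the open).
close_price = 100.0
open_price = 95.0
cash = 0.0
shares_held = 50

total_funds = cash + shares_held * close_price  # valued at close: 5000.0
target_value = total_funds * 0.5                # want a 50% weight: 2500.0

# Shares are then traded at the open price, so the realized weight
# differs from the intended 50%.
target_shares = int(target_value / open_price)  # 26 shares
realized_value = target_shares * open_price     # 2470.0
realized_weight = realized_value / (cash + shares_held * open_price)
```

With these numbers the realized weight comes out 0.52 rather than 0.50; valuing and executing off the same field removes the drift.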

Question

Hello,

I was wondering, say you provided your own minute OHLC data. Would it still be possible to backtest strategies, or is this library written exclusively for daily OHLC?

Best,

Dans.

Unstable backtest results

Farrell,

This really has me baffled... Consider your Example 200 (Antonacci GEM model). To keep things simple I will just talk about CAGRs (other stats more or less follow suit). Here is a summary of my experiments:

You show 10.59% in the downloaded notebook. Seems plausible to me. So I ran it on my Norgate data to verify my pre-processing code. But instead of 10.6%, I got about 8.5%. That seemed odd, and my initial thought was a data problem. So I looked at the logs. I found your version missing trades on yahoo data that are taken on Norgate data. For example, using yahoo data you are net zero on 3 May 2010 and only take a SPY position on June 1. On Norgate data, I got a position in SPY on May 3 (at a higher price!); I think that is correct. The Antonacci model should never be out of the market. But whatever.

I am also seeing a variable number of minuscule trades (1 or 2 shares) on different runs. That suggests that maybe sells aren't happening before buys. But again, let's leave that for now.

The point is that if I clear outputs and rerun on the same data (doesn't matter whether it's from yahoo or Norgate), I get a different CAGR more or less at random! I might get 8.5%, 9.5%, 9.9%, 10.5% or 11.91%. No way of predicting. I normally work in VS Code. I have tried "clear contents/restart kernel" for consecutive runs; I have tried a complete shutdown/restart of VS Code. On the off chance that there is something wrong with my VS Code setup, I just reran some of the experiments in Jupyter Lab. Same deal. I saved 3 examples to html if you want me to send them. It's not accidentally picking up your random lookback code; I make it print the lookback to make sure.

Bottom line: two consecutive runs on the same data will give different results. In my case, always; I have never had two in a row the same. That means the backtest results can't be trusted. I think it may have to do with how the adjust_percent rebalancing code places orders. I will try to figure that out. Beyond that, I'm not sure what to look for.

Have you run into this before? Any advice?

Best regards!

Pandas irow deprecated

def ending_balance(dbal):
    return dbal.irow(-1)['close']

This no longer works on the latest version of Pandas > 0.19.2.

Replace with dbal.tail(1)['close'].values[0] or similar. You also have another example in function total_net_profit.
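For instance (using a throwaway DataFrame), the suggested replacement and the equivalent `.iloc` form both work on modern pandas:

```python
import pandas as pd

dbal = pd.DataFrame({'close': [100.0, 101.5, 103.2]})

# Old, removed API: dbal.irow(-1)['close']
ending_balance = dbal.tail(1)['close'].values[0]  # suggestion above
ending_balance_iloc = dbal.iloc[-1]['close']      # equivalent .iloc form

assert ending_balance == ending_balance_iloc == 103.2
```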

How rigid are the pinkfish requirements?

Pinkfish seems quite promising for rapid backtesting of portfolio strategies using moderate-sized databases. But I note that all the requirements are set to specific package versions. I imagine that just reflects the fact that those are the ones you used to develop most recently, but they make a mess of an existing up-to-date Anaconda environment. If they really are necessary, it would be good to tell potential users that they had better set up a dedicated env. If not, writing them as >= version numbers would make things easier. I don't really like proliferating envs, but could do it if there is a good reason. This is a Win 10 install. The Linux VM might be fun but just adds complexity.

Are there any known requirements that really are locked and couldn't be rewritten as suggested?

regards
ay

Installing issue with Python 3.3

Hello,

I am trying to install pinkfish; according to PyPI, it works with Python 3.3 (https://pypi.org/project/pinkfish/).

But when I try to install it in a Python 3.3 environment, it shows the following error. Does it work with a higher Python version now?

(py33) C:\Users\david>pip install pinkfish
Traceback (most recent call last):
  File "c:\users\david\anaconda3\envs\py33\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\users\david\anaconda3\envs\py33\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Users\david\Anaconda3\envs\py33\Scripts\pip.exe\__main__.py", line 5, in <module>
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\__init__.py", line 40, in <module>
    from pip._internal.cli.autocompletion import autocomplete
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\cli\autocompletion.py", line 8, in <module>
    from pip._internal.cli.main_parser import create_main_parser
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\cli\main_parser.py", line 8, in <module>
    from pip._internal.cli import cmdoptions
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\cli\cmdoptions.py", line 22, in <module>
    from pip._internal.utils.hashes import STRONG_HASHES
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\utils\hashes.py", line 10, in <module>
    from pip._internal.utils.misc import read_chunks
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_internal\utils\misc.py", line 20, in <module>
    from pip._vendor import pkg_resources
  File "c:\users\david\anaconda3\envs\py33\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 92, in <module>
    raise RuntimeError("Python 3.4 or later is required")
RuntimeError: Python 3.4 or later is required

outdated pandas.io

Hello,
in fetch.py a line needs to be changed:
from
from pandas.io.data import DataReader
to
from pandas_datareader import data

otherwise it gives an import error (at least on Linux Mint 18.1).

Can't import pinkfish

Hello, when running the code below, an error occurs. Do you know what causes it?

(base) root@highperformancePC:/home/pinkfish# ipython 
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pinkfish
Segmentation fault (core dumped)

And I install the package as follow:

git clone https://github.com/fja05680/pinkfish.git
cd pinkfish
sudo python setup.py install

is there an option to backtest, but without money?

Hello,
I am interested in the purely statistical performance of a stock, so I would like to backtest without using money as a factor.

For example, if a stock is $6000 and I put in $10000 of capital, it will trade only one share, and all the return stats based on initial capital will not be the same as the real performance, right?

I can use a large capital like 1,000,000 or something, but just in case:
do you have some hidden option to trade without money?
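One workaround, sketched below with made-up numbers: evaluate the strategy in pure return space by compounding per-trade percentage returns (the growth of $1), which makes the result independent of both capital and share price.

```python
# Per-trade percent returns from a hypothetical trade log.
trade_returns = [0.05, -0.02, 0.10]

# Compound the growth of $1 instead of tracking whole-share positions.
equity = 1.0
for r in trade_returns:
    equity *= 1.0 + r

total_return = equity - 1.0  # independent of capital and share price
```

Here total_return is about 13.19%, the same figure you would get with any starting capital large enough to avoid rounding to whole shares.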

What is drawdown_recovery?

Hello,

I am trying to figure out what this stat represents.

As far as I can see, it seems to be the difference in years between max_closed_out_drawdown_end_date and max_closed_out_drawdown_start_date.

And the result is something like : -0.25

My question is, why those two dates and not the difference between max_closed_out_drawdown_recovery_date and max_closed_out_drawdown_end_date?

And why is the value negative?
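If the stat really is the span between the drawdown start and end dates, the arithmetic would look like this (a guess at the intent, not pinkfish's actual code); a value of -0.25 would then be roughly a three-month peak-to-trough decline, negated by convention.

```python
from datetime import datetime

# Hypothetical drawdown start (peak) and end (trough) dates.
start = datetime(2020, 1, 1)
end = datetime(2020, 4, 1)

# Span in years between the two dates; negating it would match
# observed values like -0.25.
drawdown_length_years = (end - start).days / 365
```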

Thanks

Incorrect ordering of buy/sell orders in portfolio adjust_percents()

"This says we want to sell current positions first to obtain cash. I agree that we need to sell current positions first, but this ordering doesn't make sense to me. This sorts the current largest positions first, but there's no guarantee that the largest weighted item will be a sell or a buy. Instead, I think you need to sort the change of the current weight of the position to the new weight of the position and order negative / sell orders first."
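The proposed fix can be sketched as follows (illustrative names and weights, not pinkfish's API): sort by the change from current to target weight so the most negative deltas, i.e. the sells, execute first and free cash.

```python
# Current and target portfolio weights (illustrative).
current = {'SPY': 0.60, 'TLT': 0.30, 'GLD': 0.10}
target  = {'SPY': 0.20, 'TLT': 0.40, 'GLD': 0.40}

# Sort symbols by weight delta, most negative first, so reductions
# (sells) run before increases (buys).
deltas = {s: target[s] - current[s] for s in current}
order = sorted(deltas, key=deltas.get)
```

Here `order` comes out ['SPY', 'TLT', 'GLD']: the large SPY reduction executes before either buy, whereas sorting by current weight alone gives no such guarantee.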

Backtesting with several symbols

Hello,

I've played around a little bit with pinkfish. However, right now I am not sure how to backtest a strategy with more than one symbol, for example by choosing a basket of stocks from a universe depending on several factors. Is this possible with pinkfish? If so, can you provide an example?

Thanks so far,
legout

Not able to specify the leaf data dir for benchmark

benchmark always tries to use the data cache. It only downloads from yahoo finance if it can't find the timeseries for a symbol in the data cache. This part is correct, as it shouldn't be allowed to fetch a newer timeseries than the one used in the backtest. You have identified a valid issue though, the benchmark doesn't use the data cache that's specified in fetch_timeseries() for the non-benchmark backtest. They should use the same data cache.

Bugs in Portfolio.adjust_percents

I think there are issues/bugs inside Portfolio.adjust_percents. This could have been a pull request, but I'd rather have a discussion in case I'm mistaken. I'll walk through what I think are the problems.

def adjust_percents(self, date, prices, weights, row, directions=None):
    w = {}

    # Get current weights
    for symbol in self.symbols:
        w[symbol] = self.share_percent(row, symbol)
Weights in other functions are converted between whole numbers and floats. We don't know if the user is passing in whole numbers or floats yet. Later sorting will fail if there's a mismatch. I think this should be something like:

convert_weight = lambda weight: self._float(weight) if weight <= 1 else self._float(weight) / self._float(100)
weights = {symbol: convert_weight(weight) for symbol, weight in weights.items()}

w = {}

# Get current weights
for symbol in self.symbols:
    w[symbol] = convert_weight(self.share_percent(row, symbol, field=field))

Next,

# If directions is None, set all symbols to trade.Direction.LONG
if directions is None:
    directions = {symbol: trade.Direction.LONG for symbol in self.symbols}

# Reverse sort by weights.  We want current positions first so that
# if they need to be reduced or closed out, cash is freed for
# other symbols.
w = utility.sort_dict(w, reverse=True)

This says we want to sell current positions first to obtain cash. I agree that we need to sell current positions first, but this ordering doesn't make sense to me. This sorts the current largest positions first, but there's no guarantee that the largest weighted item will be a sell or a buy. Instead, I think you need to sort the change of the current weight of the position to the new weight of the position and order negative / sell orders first.

Next, more issues.

# Update weights with new values.
w.update(weights)

# Call adjust_percents() for each symbol.
for symbol, weight in w.items():
    price = prices[symbol]
    direction = directions[symbol]
    self.adjust_percent(date, price, weight, symbol, row, direction)
return w
`self.adjust_percent` adjusts the weight by the total cash on each iteration, but the total cash changes on every iteration. In order to set the overall weights of the portfolio, the total cash available has to be calculated first. More like this:

Update: I was wrong about this. The total funds are recalculated, but they shouldn't change.

total_funds = self._total_funds(row)

def adjust_percent(date, price, weight, symbol, row, direction):
    value = total_funds * weight
    shares = self._adjust_value(date, price, value, symbol, row, direction)
    return shares

# Call adjust_percents() for each symbol.
# Sell first to free capital
buys = {}
for symbol in set(w.keys()).union(weights.keys()):
    weight = w.get(symbol, weights[symbol])
    new_weight = weights.get(symbol, w[symbol])
    if new_weight < weight:
        adjust_percent(date, prices[symbol], new_weight, symbol, row, directions[symbol])
    else:
        buys[symbol] = new_weight

# Now buy
for symbol, weight in buys.items():
    adjust_percent(date, prices[symbol], weight, symbol, row, directions[symbol])

w.update(weights)
return w

Finally, there's a subtler issue still lurking.
Under the hood, total_funds = self._total_funds(row) uses the close price to determine total funds. This impacts the value going into self._adjust_value (total_funds (at close) * weight). But the shares are then purchased/sold at the price passed in. I think this means that if open prices were passed in, the shares would sell at the close price but then be bought at the open price. I suspect that prices shouldn't be passed in at all; the field should be passed and used to rebalance the portfolio. Was there some other intent? I understand we might want to simulate slippage, but I'm not sure that adjust_percent is the right place. I rather liked that this framework was straightforward and that the intricacies of actual trading were mostly abstracted away, so I could focus on strategies -- not on whether my sells and buys were ordered perfectly, or on the fact that technically I can't use 100% of my funds to rebalance my portfolio perfectly.

Undesirable results from select_tradeperiod()

I like the pinkfish codebase a lot. As a practical test I tried to replicate a very simple crossover experiment on data from 1970 - mid 2011. That led me to find several data handling problems. It is probably better to identify them as separate issues. Here is the first one:

  • Try running example 050 Golden Cross Tutorial on ^GSPC.
  • Yahoo data downloads from 1971 ... not good enough, but adequate for a quick test.
  • Except pinkfish throws away 12 years of data (and only starts in 1983).

How come? Because of the dropna() in select_tradeperiod. Why is that? For some reason, the yahoo data has no open from 1971 until a single day in 1978, and then nothing until 1982. (finalize_timeseries drops another year on account of NaNs in the indicator calculation.)

The result is a ts dataframe with: one record for 1978-07-26 and the next record for 1982-04-20. That is not desirable behaviour. I understand that dropping records with no open might be what you want for intraday crypto trading, but for EOD equities or ETFs it's not what is expected. And for sure there is a calendaring problem.

This is a pathological situation that I have never seen before. In my experience, yahoo's data is quite reliable.

Two possible fixes come to mind:

  1. more data checking in select_tradeperiod(), but that is a never-ending struggle
  2. let the user identify priority columns for the dropna()

My own approach is never to dropna() automatically; I've had too many bad experiences.
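Fix 2 above could look like restricting dropna() to user-chosen priority columns via pandas' subset argument, e.g.:

```python
import numpy as np
import pandas as pd

# Mimic the yahoo data described above: early rows missing 'open'.
ts = pd.DataFrame({
    'open':  [np.nan, np.nan, 10.2],
    'close': [10.0, 10.1, 10.3],
})

rows_blanket = len(ts.dropna())                 # drops the early history
rows_subset = len(ts.dropna(subset=['close']))  # keeps it
```

Here the blanket dropna() keeps only 1 of 3 rows, while restricting it to 'close' keeps all 3.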

ay

Issue with Calculating CAGR

I think there's a bug in the code used to call the cagr calculation on line 424 in the statistics.py file.

Currently, it reads:

cagr = annual_return_rate(dbal['close'][-1] + dbal['cash'][-1], capital, start, end)

This adds the ending close balance and the ending cash balance together for the "end_balance". I believe it should read:

cagr = annual_return_rate(dbal['cash'][-1], capital, start, end)
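For reference, the CAGR arithmetic underneath is just (end / start) ** (1 / years) - 1. The simplified function below takes a year count directly (pinkfish's annual_return_rate takes start and end dates instead), so the only question in the issue is which ending balance to feed it.

```python
def annual_return_rate_simple(end_balance, capital, years):
    """Compound annual growth rate over a span of years."""
    return (end_balance / capital) ** (1.0 / years) - 1.0

# Doubling $10,000 over 10 years is about 7.18% per year.
cagr = annual_return_rate_simple(20000, 10000, 10)
```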

Missing adj_close column still causing me problems

Farrell,
I keep coming back to testing pinkfish because it is so clean and efficient! But here is something I don't understand:
It seems to me that, if I set use_adj = False, that should tell pf just to look at a 'close' column, ignoring whether 'adj_close' is there or not.

But when I do that, example 200 still blows up because fetch.select_tradeperiod at line 205 requires an adj_close column even when use_adj is set to False.

I don't see that it's a problem for portfolio.py. In fact, your example 200 just sets fields = ['close'].
I would think that if I don't add 'adj_close' to that list, example 200 should run fine.

So two questions:

  1. is there something I'm too dull to understand about what use_adj is for? and
  2. do you see any easy way to tell select_tradeperiod that if use_adj=False there is no need to look for an adj_close column?

I know there are other ways around it without altering the pf codebase. But I thought you might have an idea.

I have two use cases:

  1. index data doesn't have any adjustments
  2. my price database puts adjusted data into 'close' and unadjusted data into ... yup, 'unadjusted close'

Best regards
arthur

Stats error in case of empty tradelog

If a tradelog is empty (it happens, for example, when optimizing strategy parameters) the stats function generates an error:

File "/home/witold/miniconda3/lib/python3.9/site-packages/pinkfish/statistics.py", line 827, in stats
    stats['total_net_profit'] = _total_net_profit(tlog)

  File "/home/witold/miniconda3/lib/python3.9/site-packages/pinkfish/statistics.py", line 411, in _total_net_profit
    return tlog.iloc[-1]['cumul_total']

  File "/home/witold/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 967, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)

  File "/home/witold/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 1520, in _getitem_axis
    self._validate_integer(key, axis)

  File "/home/witold/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 1452, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds

I suspect that the problem is in line 411: "return tlog.iloc[-1]['cumul_total']"

Full test code:

import datetime
import pinkfish as pf

symbol = 'SPY'
capital = 10000
start = datetime.datetime(1900, 1, 1)
end = datetime.datetime.now()

# Fetch timeseries, select, finalize.
ts = pf.fetch_timeseries(symbol)
ts = pf.select_tradeperiod(ts, start, end, use_adj=True)
ts, start = pf.finalize_timeseries(ts, start)

# Create tradelog and daily balance objects.
tlog = pf.TradeLog(symbol)
dbal = pf.DailyBal()

pf.TradeLog.cash = capital

# Loop through timeseries.
for i, row in enumerate(ts.itertuples()):
    # no trade here for the test
    date = row.Index.to_pydatetime()
    dbal.append(date, row.high, row.low, row.close)

tlog = tlog.get_log()
dbal = dbal.get_log(tlog)

stats = pf.stats(ts, tlog, dbal, capital)

Inconsistent usage of percentage weights

I think there are issues/bugs inside Portfolio.adjust_percents. This could have been a pull request, but I'd rather have a discussion in case I'm mistaken. I'll walk through what I think are the problems.

def adjust_percents(self, date, prices, weights, row, directions=None):
    w = {}

    # Get current weights
    for symbol in self.symbols:
        w[symbol] = self.share_percent(row, symbol)

Weights in other functions are converted between whole numbers and floats. We don't know if the user is passing in whole numbers or floats yet. Later sorting will fail if there's a mismatch. I think this should be something like:

convert_weight = lambda weight: self._float(weight) if weight <= 1 else self._float(weight) / self._float(100)
weights = {symbol: convert_weight(weight) for symbol, weight in weights.items()}

w = {}

# Get current weights
for symbol in self.symbols:
    w[symbol] = convert_weight(self.share_percent(row, symbol, field=field))

Can we buy at the open and sell at the close of the same candlestick?

Hello,
I am exploring a lot of backtesting libraries, and after many hours it seems to me that none of the ones I tried can buy at a candlestick's open and then sell (or close the position) at that same candlestick's close.
Before I start going through the examples: is it possible?
Buy at a candlestick's open, and sell or close at the same candlestick's close?
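Independent of any particular library, the quantity being asked about is just the open-to-close return of each bar, e.g.:

```python
import pandas as pd

# Toy OHLC bars.
ts = pd.DataFrame({
    'open':  [100.0, 102.0, 101.0],
    'close': [101.0, 101.5, 103.0],
})

# Return from buying a bar's open and selling the same bar's close.
intrabar_ret = ts['close'] / ts['open'] - 1.0
```

The first bar yields 1%; whether a given engine lets you place both orders on the same bar is a separate, library-specific question.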

Thanks

P.S. I saw you wrote this library for this exact reason. I am just a bit frustrated now and want confirmation to ease my mind.

Updating dependency versions in requirements.txt

Hi, it seems a lot of the dependencies in requirements.txt are older versions. Because all of them are pinned to exact version numbers, we cannot install pinkfish using poetry unless we downgrade packages that were installed before pinkfish.

Error message poetry:

Because no versions of pinkfish match >1.16.0,<2.0.0
 and pinkfish (1.16.0) depends on requests (2.28.1), pinkfish (>=1.16.0,<2.0.0) requires requests (2.28.1).
So, because stock-analytics depends on both requests (^2.31.0) and pinkfish (^1.16.0), version solving failed.

which means I get requests v2.31.0 installed, but pinkfish requires exactly version 2.28.1.

How do we go about updating the dependencies to at least use carets: ^2.28.1?
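For illustration, the pinned line and two looser alternatives in pip requirement syntax (the last is roughly what poetry's caret means):

```
requests==2.28.1          # current: exact pin, blocks newer installs
requests>=2.28.1          # any newer version
requests>=2.28.1,<3.0.0   # pip equivalent of poetry's ^2.28.1
```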

Need facility for using data sources other than yfinance/Yahoo

The data handling in pinkfish is too tightly bound to yfinance/Yahoo. It would be a valuable enhancement to be able to deliver data easily to pinkfish from other data sources. Lots of individual investors/students/researchers will be using non-yahoo data sources, many housed in databases, not csv files.

It shouldn't be hard, because all we really need (in the first instance) is to deliver the basic ts dataframe. I ran an experiment using my Norgate database since I already know how to make it deliver data to zipline and backtrader. I have now given up.

I thought I would just write the df out of the Norgate db. That turned out to be problematic so I tried copying a Norgate csv into the cache directory. Here are the problems:

  1. pinkfish has yahoo column names hard-coded. But I do all the data configuration and adjustment in the database before getting anywhere near a backtester. Column names are not the same and I have several timeseries columns pinkfish doesn't know about a priori but that I might want to use; no need to throw them away. To try to move forward, I reconfigured my csv.

  2. The showstopper seems to be the fetch_timeseries and select_tradeperiod calls in Benchmark. Benchmark just doesn't want to use the cache at all and seems to insist on trying to download from yahoo. None of the small code patches I tried have solved the problem.

In short,

  • a generic interface to the ts data frame would make pinkfish far more usable
  • exclusive reliance on a pinkfish data cache is not the best approach; users do not want to replicate existing datastores; conversely, data downloaded to serve pinkfish should be generally available; (i.e. ideally, you just provision a pinkfish experiment out of an existing database; the alternative is to point pinkfish at an existing csv repository )
  • select_tradeperiod seems to be more problematic than fetch_timeseries
  • I have no clue at all why fetch_timeseries in benchmark doesn't seem to respect the use_cache setting
  • there should also be a single spot where we could specify the default path to the pinkfish data cache/folder
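As a sketch of what such a generic interface could amount to: renaming an external export's columns into a yahoo-style ts frame with a DatetimeIndex (the source column names 'O'/'H'/'L'/'C'/'V' below are invented for illustration, not Norgate's actual schema):

```python
import pandas as pd

# Invented external export with non-yahoo column names.
df = pd.DataFrame({
    'Date': ['2021-01-04', '2021-01-05'],
    'O': [100.0, 101.0], 'H': [102.0, 103.0],
    'L': [99.0, 100.5], 'C': [101.0, 102.5], 'V': [1000, 1100],
})

# Map to the yahoo-style schema and index by date.
rename = {'O': 'open', 'H': 'high', 'L': 'low', 'C': 'close', 'V': 'volume'}
ts = (df.rename(columns=rename)
        .assign(adj_close=lambda d: d['close'])  # no adjustments available
        .set_index(pd.to_datetime(df['Date']))
        .drop(columns=['Date']))
```

A thin adapter like this is all a "bring your own data" hook would need to accept, rather than forcing everything through the yahoo fetch-and-cache path.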

Once again, I really enjoy pinkfish and appreciate the elegance of the codebase! But I cannot use it if I'm locked into yahoo data.

btw: backtrader already discovered this and offers 3 or 4 generic data interfaces. At the other end of the spectrum, zipline's data handling is almost impossible. Easy, flexible pandas-based data handling could really differentiate pinkfish.

best regards
ay
