Comments (15)
@timothyyu Absolutely right! You should apply the wavelet transform, and any other kind of preprocessing, separately to the train and test datasets.
I am also working on this topic, and I recommend the following article:
"Recurrent Neural Networks for Financial Time-Series Modelling" by Gavin Tsang, Jingjing Deng, and Xianghua Xie
It has some interesting concepts.
Cheers
from aialpha.
Not to speak for the developer, but you are aware that this is how you train a model, right? You preprocess the dataset, then use the model when you run it on a live sample.
In general, if you preprocess your dataset and then split it into train/test, train the model, and check the results on the test part, you are making a mistake: you are assuming knowledge of the future in order to preprocess the whole train/test dataset. I thought this was the case here, but I am not sure anymore; I need to check the code again and I don't have time right now.
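The leakage concern can be sketched on synthetic data (the scaler choice and split sizes below are illustrative, not taken from the repository): fitting the scaler on the full series lets test-set statistics leak into the training features, while fitting on the train portion only does not.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 1))
train, test = series[:80], series[80:]

# Leaky: statistics computed over the full series, including the "future" test part
leaky = StandardScaler().fit(series).transform(train)

# Correct: statistics computed on the train part only, then reused on the test part
scaler = StandardScaler().fit(train)
correct_train = scaler.transform(train)
correct_test = scaler.transform(test)

# The two scaled versions of the train set differ whenever the test part
# shifts the mean/std of the full series
print(np.allclose(leaky, correct_train))
```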
@mg64ve I am looking into this exact issue in implementing the WSAE-LSTM model, which uses the wavelet transform to denoise data (Bao et al., 2017):
https://github.com/timothyyu/wsae-lstm
My implementation is a work in progress and currently far from complete, but my understanding so far is that you cannot apply the wavelet transform to the entire dataset in one pass. However, you can arrange the data in a continuous fashion, in a clearly defined train-validate-test split, which appears to mostly sidestep this issue.
From Bao et al. (2017) defining the train-validate-test split arrangement for continuous training (Fig 7):
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180944#pone-0180944-g007
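That arrangement can be sketched as a rolling sequence of (train, validate, test) index ranges, each successive split sliding forward by the length of the test window (the window sizes below are illustrative placeholders, not the paper's exact values):

```python
def rolling_splits(n, train_len=24, val_len=3, test_len=3):
    """Yield (train, validate, test) slice bounds that slide forward by test_len."""
    start = 0
    while start + train_len + val_len + test_len <= n:
        a = start
        b = a + train_len
        c = b + val_len
        d = c + test_len
        yield (a, b), (b, c), (c, d)
        start += test_len  # next split begins one test-window further forward

splits = list(rolling_splits(36))
print(splits[0])  # ((0, 24), (24, 27), (27, 30))
```

Each split's preprocessing (scaling, denoising) is then fit or applied within that split only, so no statistics cross a split boundary.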
Here's an example of applying the wavelet transform to the first two train-validate-test splits of the csci300 index data:
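As a rough sketch of what "per split" means here, the following applies a wavelet denoising pass to each split independently using pywt (the wavelet name, decomposition level, and thresholding rule are illustrative choices, not necessarily those used by Bao et al. or the repository):

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="haar", level=2):
    """Soft-threshold the detail coefficients of a 1-D series."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise scale estimated from the finest detail coefficients (universal threshold)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(x)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

# Apply the transform to each split on its own, never to the concatenated series
rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(size=300)) + 100.0
train, val, test = prices[:200], prices[200:250], prices[250:]
train_dn, val_dn, test_dn = (wavelet_denoise(s) for s in (train, val, test))
```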
OK @timothyyu, does this example come from your code?
@mg64ve yes, this is from my own code. I have an updated implementation of the above (scaling is done on the train set and then applied to the validate and test sets per period/interval, and then the wavelet transform is applied to each train-validate-test split individually):
https://github.com/timothyyu/wsae-lstm/blob/master/wsae_lstm/visualize.py
@mg64ve here is an updated version of the above that clearly illustrates the train-validate-test split, with the effect of scaling and of scaling + denoising visualized:
Implemented as of v0.1.2 / b715d88
https://github.com/timothyyu/wsae-lstm/releases/tag/v0.1.2
Hi @timothyyu thanks for your reply, let me check the code.
One more question: how do you apply scaling?
The same concept should apply to scaling as well: the validate and test datasets should be scaled without knowing them in advance.
Scaling is done with RobustScaler on the train set, and then the same parameters used to scale the train set are applied to the validate and test sets:
```python
import copy

import pandas as pd
from sklearn import preprocessing

# ddi_scaled[index_name][intervals from 1-24][1=train, 2=validate, 3=test]
def scale_periods(dict_dataframes):
    ddi_scaled = dict()
    for key, index_name in enumerate(dict_dataframes):
        ddi_scaled[index_name] = copy.deepcopy(dict_dataframes[index_name])
    for key, index_name in enumerate(ddi_scaled):
        scaler = preprocessing.RobustScaler(with_centering=True)
        for index, value in enumerate(ddi_scaled[index_name]):
            # Fit the scaler on the train set only...
            X_train = ddi_scaled[index_name][value][1]
            X_train_scaled = scaler.fit_transform(X_train)
            X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=list(X_train.columns))
            # ...then apply the same fitted parameters to validate and test
            X_val = ddi_scaled[index_name][value][2]
            X_val_scaled = scaler.transform(X_val)
            X_val_scaled_df = pd.DataFrame(X_val_scaled, columns=list(X_val.columns))
            X_test = ddi_scaled[index_name][value][3]
            X_test_scaled = scaler.transform(X_test)
            X_test_scaled_df = pd.DataFrame(X_test_scaled, columns=list(X_test.columns))
            ddi_scaled[index_name][value][1] = X_train_scaled_df
            ddi_scaled[index_name][value][2] = X_val_scaled_df
            ddi_scaled[index_name][value][3] = X_test_scaled_df
    return ddi_scaled
```
Hi @timothyyu, I had a look at autoencoder.py and model.py. You basically don't use a windowed embedding.
So you denoise your entire test dataset and then use a value from the denoised test dataset to predict the next step.
This can't happen in real life, because in the test dataset too we only know the past.
That's why I think an embedding is more useful: at each instant t you process only the interval [t-N, t].
What do you think about it?
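A minimal sketch of the causal alternative described above, where the feature at time t is computed only from the trailing window [t-N, t] (using a moving average as a stand-in for a per-window wavelet pass; all names here are hypothetical):

```python
import numpy as np

def causal_denoise(series, window=16):
    """At each t, smooth only the trailing window ending at t."""
    out = np.empty_like(series, dtype=float)
    for t in range(len(series)):
        lo = max(0, t - window + 1)
        out[t] = series[lo : t + 1].mean()  # stand-in for a per-window denoiser
    return out

rng = np.random.default_rng(2)
prices = np.cumsum(rng.normal(size=100))
smooth = causal_denoise(prices)
# The value at time t depends only on samples up to t, so changing the
# future never changes an already-computed feature.
```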
Hi, thank you for sharing your work; it's interesting. I am looking at the code, but I keep getting errors when generating results for stocks (FX rates work fine). I would like to compare my results with yours for AAPL. Could you also present predicted log returns vs. historical log returns for AAPL for the most recent three years, or one year if possible? Thank you very much!
Why `stock_price = np.exp(np.reshape(prediction, (1,))) * stock_data_test[i]`?
File: model.py, line 54 in 18a58ef
@az13js I believe it is because the log return is used during preprocessing.
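For context: if the model is trained on one-step log returns r_t = ln(p_t / p_{t-1}), then a predicted return is turned back into a price via p_t = exp(r_t) * p_{t-1}, which is the shape of that line. A minimal sketch (variable names are illustrative):

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 104.0])
log_returns = np.diff(np.log(prices))  # r_t = ln(p_t / p_{t-1})

# Invert the preprocessing: multiply the previous price by exp(log return)
reconstructed = prices[:-1] * np.exp(log_returns)
print(reconstructed)  # recovers prices[1:]
```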
Yeah, you should first split, then preprocess.