
capmarket's People

Contributors

janschm


capmarket's Issues

Shuffling data samples during training

Hi Jan,

may I ask a question: why don't you shuffle your training data? As far as I know, shuffling batches helps the model converge faster and prevents bias during training. Since you only use the encoder and slice fixed-length data samples, there is no temporal relationship between the samples anymore.
Could you also share a little more insight into your training configuration for the 1k stocks? I am struggling with the training, since most of my experiments result in a straight-line prediction or very poor performance.
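The shuffling the question describes can be sketched as follows. This is not the repo's pipeline, just a minimal NumPy illustration (with hypothetical toy data) of the key point: once windows are sliced, inputs and targets can be permuted together without breaking anything, because each window already carries its own temporal context.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical toy data: 1000 pre-sliced windows with matching targets.
# Row i is built so its first feature equals 4 * target, letting us
# verify alignment after shuffling.
X = np.arange(1000 * 4, dtype=np.float32).reshape(1000, 4)
y = np.arange(1000, dtype=np.float32)

# Shuffle X and y with the SAME permutation so pairs stay aligned.
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]

assert X_shuffled.shape == X.shape
# Each shuffled target still matches its shuffled window.
assert np.all(X_shuffled[:, 0] == y_shuffled * 4)
```

The same effect can be had inside a `tf.data` pipeline via `Dataset.shuffle(buffer_size)`, but the buffer must be large enough relative to the file ordering for the shuffle to matter.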

Have you tried the Spacetimeformer? Having cross-attention in a multivariate architecture is a great idea.

Thanks for sharing your great work.
Best,
Vinh

If you are not shuffling your files during training, it looks like the last files that go into the generator have a lot of entries. What I can derive from the shape [3736448, 256] is that you are passing 3,736,448 sequences of length 256 into the model.

The 3,736,448 is the aggregated batch size of that file batch.

Just check whether you have a very large file in your dataset and, if so, exclude it for now.
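The check suggested above can be done with a short stdlib scan. This is a hypothetical helper (the name, threshold, and file layout are assumptions, not the repo's code) that lists files exceeding a size limit, largest first, so outliers like the one producing the 3,736,448-row batch are easy to spot and exclude.

```python
import os

def find_oversized_files(paths, max_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for files over max_bytes, largest first.

    The 50 MB default is an arbitrary illustrative threshold.
    """
    oversized = [(p, os.path.getsize(p)) for p in paths
                 if os.path.getsize(p) > max_bytes]
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)
```

Running it over the dataset directory (e.g. `find_oversized_files(glob.glob("data/*.csv"))`) shows at a glance which file to drop for now.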

Originally posted by @JanSchm in #1 (comment)

Architecture question

Hi,

Got here from your article. I have a couple of quick questions about the choice of architecture for your model. Specifically, if we are using the transformer architecture, why are we aggregating the results with a pooling layer? The attention mechanism works best for sequential input and sequential output, since it learns how each token relates to the other tokens in the sequence. Aggregating all the tokens feels like it defeats the purpose of using a transformer model. Wouldn't using a mask with a decoder-layer architecture be more appropriate in this case?

Sorry if I am misunderstanding your approach; any clarification would be greatly appreciated!
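The two head designs the question contrasts can be sketched as below. The shapes and tensors are hypothetical stand-ins, not the repo's actual code: the first branch collapses the time axis into one vector (the pooling head the question describes), while the second keeps per-step outputs under a causal mask (the masked, sequence-to-sequence alternative it proposes).

```python
import numpy as np

# Hypothetical encoder output: (batch, seq_len, d_model).
batch, seq_len, d_model = 2, 8, 4
encoder_out = np.ones((batch, seq_len, d_model), dtype=np.float32)

# Pooling head: collapse the time axis, then a regression layer would
# predict a single value from the pooled vector.
pooled = encoder_out.mean(axis=1)  # shape (batch, d_model)
assert pooled.shape == (batch, d_model)

# Masked alternative: keep one output per step and use a causal mask so
# step t can only attend to steps <= t (lower-triangular attention mask).
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
assert not causal_mask[0, 1]  # step 0 cannot see step 1
assert causal_mask[1, 0]      # step 1 can see step 0
```

The pooling head trades per-step predictions for a single summary; which is "right" depends on whether one target per window or one target per time step is wanted.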

Question about residual connection

Got here from this article.

First of all, thank you so much for your work; it gives great insight to time-series analysis beginners like me.

However, in the definition of TransformerEncoder, I found that in the forward() function the residual connection is defined like this:

ff_layer = self.ff_normalize(x[0] + ff_layer)

According to the original paper (Attention Is All You Need), the residual connection should be defined this way:

ff_layer = self.ff_normalize(attn_layer + ff_layer)

Could you provide more explanation of the intuition behind this discrepancy?
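For reference, the residual wiring from the original paper can be sketched as below. This uses identity stand-ins for the attention and feed-forward sublayers (it is not the repo's implementation); the key point is that the second skip connection starts from the attention output, not from the block input.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis, as in post-norm transformer blocks.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def encoder_block(x, attn, ff):
    # Sublayer 1: residual around self-attention, then normalize.
    attn_out = layer_norm(x + attn(x))
    # Sublayer 2: residual around the feed-forward net -- note the skip
    # comes from attn_out (the paper's wiring), not from the input x.
    return layer_norm(attn_out + ff(attn_out))

# Identity stand-ins just to exercise the wiring.
x = np.random.default_rng(0).normal(size=(2, 8, 4))
out = encoder_block(x, attn=lambda t: t, ff=lambda t: t)
assert out.shape == x.shape
```

Using `x[0]` in the second residual instead of the attention output changes which signal the skip path preserves, which is exactly the discrepancy the question raises.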

Training on multiple csvs

Hi, thanks for sharing your awesome work. I want to train the transformer model on multiple CSVs that cover the same time span. Should I just concatenate them into one big dataframe and train the model on that?
Thanks
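One caveat with plain concatenation: if sliding windows are later sliced from the combined table, a window must not span the boundary between two stocks. A minimal stdlib sketch (hypothetical tickers and in-memory CSVs, not the repo's data) that tags each row with its source so windows can be built per stock:

```python
import csv
import io

# Two hypothetical per-stock CSVs covering the same dates.
sources = {
    "AAA": "date,close\n2021-01-04,10.0\n2021-01-05,11.0\n",
    "BBB": "date,close\n2021-01-04,20.0\n2021-01-05,21.0\n",
}

# Concatenate into one long table, tagging each row with its ticker so
# training windows can later be sliced per stock, never across symbols.
rows = []
for ticker, text in sources.items():
    for rec in csv.DictReader(io.StringIO(text)):
        rec["ticker"] = ticker
        rows.append(rec)

assert len(rows) == 4
assert {r["ticker"] for r in rows} == {"AAA", "BBB"}
```

With pandas the same idea is `pd.concat` plus a ticker column, then grouping by ticker before windowing.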

Does Transformer Model perform better on more data? (How to avoid straight line prediction)

I know that in the original post the author states that a straight-line prediction is reasonable for this amount of data, but I wanted to ask if anyone has had success using this model to predict more accurately than a straight line.

If so, what data was used? If not, which aspects of the model have others tried changing to get better predictions? Data shuffling?

Thanks in advance,
Nate
