
capmarket's People

Contributors

janschm


capmarket's Issues

Shuffling data samples during training

Hi Jan,

may I ask a question: why don't you shuffle your training data? As far as I know, shuffling batches helps the model converge faster and prevents bias during training. Since you only use the encoder and slice fixed-length data samples, there is no temporal relationship between the samples anymore.
Could you also share a little more insight into your training configuration for the 1k stocks? I am struggling with the training, since most of my experiments result in a straight-line prediction or very poor performance.
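The shuffling the question describes can be sketched as follows. This is not the repo's pipeline, just a minimal NumPy illustration (with hypothetical toy data) of the key point: once windows are sliced, inputs and targets can be permuted together without breaking anything, because each window already carries its own temporal context.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical toy data: 1000 pre-sliced windows with matching targets.
# Row i is built so its first feature equals 4 * target, letting us
# verify alignment after shuffling.
X = np.arange(1000 * 4, dtype=np.float32).reshape(1000, 4)
y = np.arange(1000, dtype=np.float32)

# Shuffle X and y with the SAME permutation so pairs stay aligned.
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]

assert X_shuffled.shape == X.shape
# Each shuffled target still matches its shuffled window.
assert np.all(X_shuffled[:, 0] == y_shuffled * 4)
```

The same effect can be had inside a `tf.data` pipeline via `Dataset.shuffle(buffer_size)`, but the buffer must be large enough relative to the file ordering for the shuffle to matter.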

Have you tried the Spacetimeformer? Having cross-attention in a multivariate architecture is a great idea.

Thanks for sharing your great work.
Best,
Vinh

If you are not shuffling your files during training, it looks like the last files that go into the generator have a lot of entries. What I can derive from the shape [3736448, 256] is that you are passing 3,736,448 sequences of length 256 into the model.

The 3,736,448 is the aggregated batch size of that file batch.

Just check whether you have a very large file in your dataset and, if so, exclude it for now.
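The check suggested above can be done with a short stdlib scan. This is a hypothetical helper (the name, threshold, and file layout are assumptions, not the repo's code) that lists files exceeding a size limit, largest first, so outliers like the one producing the 3,736,448-row batch are easy to spot and exclude.

```python
import os

def find_oversized_files(paths, max_bytes=50 * 1024 * 1024):
    """Return (path, size) pairs for files over max_bytes, largest first.

    The 50 MB default is an arbitrary illustrative threshold.
    """
    oversized = [(p, os.path.getsize(p)) for p in paths
                 if os.path.getsize(p) > max_bytes]
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)
```

Running it over the dataset directory (e.g. `find_oversized_files(glob.glob("data/*.csv"))`) shows at a glance which file to drop for now.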

Originally posted by @JanSchm in #1 (comment)

Architecture question

Hi,

Got here from your article. I have a couple of quick questions about the choice of architecture for your model. Specifically, if we are using the transformer architecture, why are we aggregating the results with a pooling layer? The attention mechanism works best for sequential input and sequential output, since it learns how each token relates to the other tokens in the sequence. Aggregating all the tokens feels like it defeats the purpose of using a transformer model. Wouldn't using a mask with a decoder-layer architecture be more appropriate in this case?

Sorry if I am misunderstanding your approach; any clarification would be greatly appreciated!
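The two head designs the question contrasts can be sketched as below. The shapes and tensors are hypothetical stand-ins, not the repo's actual code: the first branch collapses the time axis into one vector (the pooling head the question describes), while the second keeps per-step outputs under a causal mask (the masked, sequence-to-sequence alternative it proposes).

```python
import numpy as np

# Hypothetical encoder output: (batch, seq_len, d_model).
batch, seq_len, d_model = 2, 8, 4
encoder_out = np.ones((batch, seq_len, d_model), dtype=np.float32)

# Pooling head: collapse the time axis, then a regression layer would
# predict a single value from the pooled vector.
pooled = encoder_out.mean(axis=1)  # shape (batch, d_model)
assert pooled.shape == (batch, d_model)

# Masked alternative: keep one output per step and use a causal mask so
# step t can only attend to steps <= t (lower-triangular attention mask).
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
assert not causal_mask[0, 1]  # step 0 cannot see step 1
assert causal_mask[1, 0]      # step 1 can see step 0
```

The pooling head trades per-step predictions for a single summary; which is "right" depends on whether one target per window or one target per time step is wanted.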

Question about residual connection

Got here from this article.

First of all, thank you so much for your work; it gives great insight to time-series analysis beginners like me.

However, in the definition of TransformerEncoder, I found that in the forward() function the residual connection is defined like this:

ff_layer = self.ff_normalize(x[0] + ff_layer)

According to the original paper (Attention Is All You Need), the residual connection should be defined this way:

ff_layer = self.ff_normalize(attn_layer + ff_layer)

Could you provide more explanation of the intuition behind this discrepancy?
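For reference, the residual wiring from the original paper can be sketched as below. This uses identity stand-ins for the attention and feed-forward sublayers (it is not the repo's implementation); the key point is that the second skip connection starts from the attention output, not from the block input.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis, as in post-norm transformer blocks.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def encoder_block(x, attn, ff):
    # Sublayer 1: residual around self-attention, then normalize.
    attn_out = layer_norm(x + attn(x))
    # Sublayer 2: residual around the feed-forward net -- note the skip
    # comes from attn_out (the paper's wiring), not from the input x.
    return layer_norm(attn_out + ff(attn_out))

# Identity stand-ins just to exercise the wiring.
x = np.random.default_rng(0).normal(size=(2, 8, 4))
out = encoder_block(x, attn=lambda t: t, ff=lambda t: t)
assert out.shape == x.shape
```

Using `x[0]` in the second residual instead of the attention output changes which signal the skip path preserves, which is exactly the discrepancy the question raises.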

Training on multiple csvs

Hi, thanks for sharing your awesome work. I want to train the transformer model on multiple CSVs that cover the same time span. Should I just concatenate them into one big dataframe and train the model on that?
Thanks
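One caveat with plain concatenation: if sliding windows are later sliced from the combined table, a window must not span the boundary between two stocks. A minimal stdlib sketch (hypothetical tickers and in-memory CSVs, not the repo's data) that tags each row with its source so windows can be built per stock:

```python
import csv
import io

# Two hypothetical per-stock CSVs covering the same dates.
sources = {
    "AAA": "date,close\n2021-01-04,10.0\n2021-01-05,11.0\n",
    "BBB": "date,close\n2021-01-04,20.0\n2021-01-05,21.0\n",
}

# Concatenate into one long table, tagging each row with its ticker so
# training windows can later be sliced per stock, never across symbols.
rows = []
for ticker, text in sources.items():
    for rec in csv.DictReader(io.StringIO(text)):
        rec["ticker"] = ticker
        rows.append(rec)

assert len(rows) == 4
assert {r["ticker"] for r in rows} == {"AAA", "BBB"}
```

With pandas the same idea is `pd.concat` plus a ticker column, then grouping by ticker before windowing.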

Does Transformer Model perform better on more data? (How to avoid straight line prediction)

I know that in the original post the author states that a straight-line prediction is reasonable for this amount of data, but I wanted to ask if anyone has had success using this model to predict more accurately than a straight line.

If so, what data was used? If not, which aspects of the model have others tried changing to get better predictions? Data shuffling?

Thanks in advance,
Nate
