
Comments (4)

albertz commented on July 17, 2024

The behavior of get_seq_length is correct.

load_seqs is supposed to load some seqs and make them available, so that get_data etc will work for those seqs. If there is no other call to load_seqs, those seqs must be kept available - that is the expected behavior of load_seqs.

get_seq_length can be called for a future seq that has not been loaded yet (usually only the next one). In that case it internally calls load_seqs, but in such a way that the other seqs are not removed from memory.
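The contract described above can be illustrated with a toy cache. This is only a sketch under assumptions, not the RETURNN implementation: ToySeqCache and its attributes are hypothetical names, and the eviction rule in load_seqs (dropping everything outside the requested window) is assumed for illustration.

```python
class ToySeqCache:
    """Toy stand-in for a cached dataset (hypothetical, not RETURNN code)."""

    def __init__(self, seq_lengths):
        # Pretend each entry is expensive to load from disk.
        self._source = list(seq_lengths)
        self._loaded = {}  # seq_idx -> length of the loaded seq

    def load_seqs(self, start, end):
        # Assumed behavior: the new window [start, end) replaces the old one.
        self._loaded = {
            i: self._loaded.get(i, self._source[i]) for i in range(start, end)
        }

    def get_seq_length(self, seq_idx):
        if seq_idx not in self._loaded:
            # Load the future seq additionally, without evicting anything.
            self._loaded[seq_idx] = self._source[seq_idx]
        return self._loaded[seq_idx]

    def loaded(self):
        return sorted(self._loaded)
```

After load_seqs(0, 2), asking for the length of seq 2 leaves seqs 0 and 1 in the cache; only a later load_seqs call changes the window.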

init_seq_order with the argument "sorted" (or a related ordering) will normally use get_seq_length, and that will not work, as you describe. But that should not come up for CachedDataset2: its implementation does not support "sorted" or any other ordering option. Sorting cannot work for CachedDataset2 because this dataset is implemented in such a way that it has no real control over the sorting logic. So you are usually supposed to implement init_seq_order yourself if you want some control. See the derived versions of CachedDataset2.init_seq_order for some examples.

I just looked at the RawWavDataset, which may be related to your case. To support "sorted" or related ordering options, you should load the length of each sequence in advance (you can just take the file length instead, which will preserve the right order). Maybe do that lazily, so that the get_seq_len function passed to get_seq_order_for_epoch loads the lengths on the first call. Otherwise, just pass get_seq_len=None, which is also fine; in that case "sorted" and related ordering options will not be supported.
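A minimal sketch of the lazy variant, assuming the file size in bytes is an acceptable proxy for the sequence length. LazyFileLengths and sorted_seq_order are hypothetical helpers written for this illustration; sorted_seq_order only stands in for whatever a "sorted" ordering does with the get_seq_len callback and is not the RETURNN function.

```python
import os


class LazyFileLengths:
    """Hypothetical helper: file size in bytes as a proxy for seq length."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._sizes = None  # filled on the first get_seq_len call

    def get_seq_len(self, seq_idx):
        if self._sizes is None:
            # Stat all files once; no audio data is read here.
            self._sizes = [os.path.getsize(p) for p in self.paths]
        return self._sizes[seq_idx]


def sorted_seq_order(num_seqs, get_seq_len):
    # Stand-in for a "sorted" ordering: shortest sequence first.
    return sorted(range(num_seqs), key=get_seq_len)
```

The bound method lengths.get_seq_len could then be passed as the get_seq_len callback, so the file sizes are only collected if a sorting option actually asks for them.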

from returnn.

mennetob commented on July 17, 2024

Thanks.

For completeness, here is my solution:

If I understand correctly, the "sorted" option is not really supported by CachedDataset2.

But it seems to me that, due to the way the dev dataset is initialized, the seq_ordering option of the cross-validation set in training is set to "sorted" by default?

The config option "batching" only affects the training set, not the dev set.
So simply adding the option "seq_ordering": "default" to the specification of my dev dataset seems to be a sufficient workaround for my problem.
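As a sketch, the workaround might look like this in the config, assuming the dev set is specified as a dict named dev. The class name and the path option are placeholders for illustration and not taken from the actual setup; only the seq_ordering entry is the point here.

```python
# Config fragment (sketch). "MyCachedDataset2Subclass" and "path" are
# placeholders for the custom dataset and its options.
dev = {
    "class": "MyCachedDataset2Subclass",
    "path": "/path/to/dev/data",
    # Explicitly request the unsorted order for the dev set.
    "seq_ordering": "default",
}
```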


albertz commented on July 17, 2024

CachedDataset2 itself will ignore the seq_ordering option. In any case, it is up to your implementation whether to also ignore it (e.g. the ExternSprintDataset just ignores it, as do some others), throw an error, or do something sensible. Using get_seq_order_for_epoch with the default get_seq_length will not work, as you described.


mennetob commented on July 17, 2024

Yeah sure.

I'm now also setting get_seq_len=None in the call of get_seq_order_for_epoch(...) as you suggested. But doing this alone will still make me end up in the sorted branch of Dataset.get_seq_order_for_epoch(...) and thus cause an assertion error.

By setting "seq_ordering": "default" in the config of the cross-validation set, I only avoid entering that if-branch. But you are right, I should probably handle that in the implementation of my dataset.
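Handling it in the dataset could look roughly like the following. This is a sketch under assumptions: resolve_seq_order is a hypothetical helper, not part of RETURNN, and it only covers the two orderings mentioned in this thread. It raises an explicit error instead of running into an assertion when "sorted" is requested without a length function.

```python
def resolve_seq_order(seq_ordering, num_seqs, get_seq_len=None):
    """Hypothetical helper: map a seq_ordering string to a list of indices."""
    if seq_ordering == "default":
        # Keep the sequences in their original order.
        return list(range(num_seqs))
    if seq_ordering == "sorted":
        if get_seq_len is None:
            # Fail early with a clear message.
            raise ValueError(
                'seq_ordering "sorted" needs sequence lengths, '
                "but this dataset provides no get_seq_len"
            )
        return sorted(range(num_seqs), key=get_seq_len)
    raise ValueError("unsupported seq_ordering: %r" % (seq_ordering,))
```

A dataset's init_seq_order could call such a helper with its own seq_ordering setting, so an unsupported option is reported at initialization rather than later during training.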

Thanks.

