
Comments (27)

manning avatar manning commented on March 28, 2024

You should look at the RNN tutorial: http://tensorflow.org/tutorials/recurrent/index.md .


AvantiShri avatar AvantiShri commented on March 28, 2024

(Thanks, I have seen the tutorial but it is not a substitute for an API; I filed the issue after consulting with a friend at Google Brain)


vrv avatar vrv commented on March 28, 2024

Hi Avanti -- internally we've been working on iterating the API for RNNs, and we were happy enough with the current API to use it in the tutorial, but we're making sure it's solid before promoting it to the public API, since we'd then have to support it indefinitely. (Anything not in the public API is a work-in-progress :)

We'll keep this bug open in the meantime, and for now you can look at the source code documentation if you're interested in playing around: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/rnn.py#L9


AvantiShri avatar AvantiShri commented on March 28, 2024

Ah, got it, thanks for the explanation.


zer0n avatar zer0n commented on March 28, 2024

The TensorFlow white paper mentions looping control within the graph. Is it already available? If so, are there examples showing how it can be done?

The RNN example has a Python loop. Will TensorFlow treat that as a symbolic loop and compile it?

Also, the explanation of sequence_length here isn't clear to me. What is meant by dynamic calculations? When t is past max_sequence_length, can it just break from the loop instead of continuing with a zero state? Returning a zero state is different from returning the state at max_sequence_length, isn't it?


vrv avatar vrv commented on March 28, 2024

On your first question, see #208.

On your second question: the core TF engine currently only sees the GraphDef produced by python, so the RNN example is an unrolled one today.

I'm not super familiar with that RNN example -- @lukaszkaiser or @ludimagister might know better.


ludimagister avatar ludimagister commented on March 28, 2024

Hi zer0n,

the current RNN is statically unrolled; there is no dynamic unrolling based on the length of the sequence (not yet). The dynamic calculation means the graph is unrolled up to max_sequence_length, but if a sequence_length is provided, the calculations on the unrolled graph are cut short once sequence_length is reached, using a conditional op. Depending on the application, this may result in shorter processing time.
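To make that concrete, here is a minimal sketch of calling the statically unrolled rnn() with a sequence_length, assuming the TF-0.x tensorflow.models.rnn module layout; the sizes and placeholders are invented for illustration:

```python
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell

# Hypothetical sizes, for illustration only.
num_steps, input_size, num_units = 20, 16, 64

cell = rnn_cell.BasicRNNCell(num_units)
# The graph is statically unrolled: one input tensor per time step.
inputs = [tf.placeholder(tf.float32, [None, input_size])
          for _ in range(num_steps)]
# Per-example lengths; the conditional op uses these to cut the
# per-step computation short past each example's length.
seq_len = tf.placeholder(tf.int32, [None])

outputs, final_state = rnn.rnn(cell, inputs,
                               sequence_length=seq_len,
                               dtype=tf.float32)
```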


ebrevdo avatar ebrevdo commented on March 28, 2024

Yes, to add to what @ludimagister says: the conditional op will plug in zeros to the output & state past max(sequence_length), thus reducing the total amount of computation (if not memory).

I may actually modify this so that instead of plugging in zeros to the state, it just copies the state. This way the final state will represent the "final" state at max(sequence_length). However, I'm undecided on this. If you want the final state at time sequence_length, you can concat the state vectors and use transpose() followed by gather() with sequence_length in order to pull out the states you care about. That's probably what you would want to do, in fact, because if you have batch_size = 2 and sequence_length = [1, 2], then for the first minibatch entry, the state at max(sequence_length) will not equal the state at sequence_length[0].
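A sketch of that transpose()+gather() trick, under the assumption that you have collected the per-timestep states yourself into a list (names and shapes here are illustrative, not part of the API):

```python
import tensorflow as tf

def select_states_at_length(states, sequence_length, batch_size, state_size):
    """Pick, for each batch entry b, the state at time sequence_length[b] - 1.

    states: list of num_steps tensors, each [batch_size, state_size].
    sequence_length: int32 vector of shape [batch_size].
    batch_size, state_size: plain Python ints, assumed known.
    """
    num_steps = len(states)
    # [T, B, D] -> [B, T, D] so each example's timeline is contiguous.
    stacked = tf.transpose(tf.pack(states), [1, 0, 2])
    # Flatten to [B * T, D]; row b * T + (sequence_length[b] - 1) is
    # example b's last valid state.
    flat = tf.reshape(stacked, [-1, state_size])
    indices = tf.range(0, batch_size) * num_steps + (sequence_length - 1)
    return tf.gather(flat, indices)  # [B, D]
```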

An alternative solution is to right-align your inputs so that they always "end" on the final time step. This breaks the dynamic calculation performed when you pass sequence_length (because it assumes left-aligned inputs). I may extend this by adding a bool flag like "right_aligned" to the rnn call, which would assume that calculation starts at len(inputs) - max(sequence_length) and copy the initial state through appropriately. But that doesn't exist now.


zer0n avatar zer0n commented on March 28, 2024

Thanks @vrv, @ludimagister, and @ebrevdo for the answers. However, some details still confuse me.

  1. @ludimagister, the code doesn't seem to unroll statically: it has a loop that depends on the length of the inputs. Plus, max_sequence_length is not a constant; it's just a scalar derived from the sequence_length parameter, which can be (and by default is) None. So, by default, the unrolling is not truncated. Correct me if I misread the code.
  2. @ebrevdo, I understand the computational-saving motivation. However, returning zeros is logically very different from returning the state at sequence_length (if provided). The former is just wrong. Again, please correct me if I misread the code.

Thanks.


ebrevdo avatar ebrevdo commented on March 28, 2024

@zer0n It depends on your task.

Returning zeros is fine if you only care about outputs (i.e., you're not hooking up to a decoder) and your loss function knows to ignore outputs past the sequence_length.

Returning the state from the end of the last time step might also be considered "wrong", but it will generally happen whenever you have inputs of different lengths (and aren't performing dynamic computation). This is a typical approach to performing RNN computation with minibatches. For this reason, when performing encoding/decoding, people usually right-align with left-side padding instead, so the last input of any example always corresponds to the very last state. This seems like the cleanest solution for now.
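For instance, right-aligning with left-side padding can be done entirely outside the graph; a toy numpy sketch (not part of the TF API):

```python
import numpy as np

def right_align(sequences, max_len, pad_value=0.0):
    # Pad on the left so each sequence's last real input lands on the
    # final time step.
    batch = np.full((len(sequences), max_len), pad_value, dtype=np.float32)
    for i, seq in enumerate(sequences):
        batch[i, max_len - len(seq):] = seq
    return batch

right_align([[1, 2], [3]], max_len=3)
# -> [[0., 1., 2.],
#     [0., 0., 3.]]
```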

Anyway, this part of the API may change; not sure yet the best approach.


ebrevdo avatar ebrevdo commented on March 28, 2024

(also, specifically returning the state at sequence_length for every entry is taxing both in terms of computation and in terms of memory, both in short supply with RNNs)


zer0n avatar zer0n commented on March 28, 2024

OK, I did miss the line outputs.append(output). I originally thought that it returned the final state, not a sequence of states.

Anyway, this implementation still looks weird (I'm aware it's changing, so I'm only discussing the current state). Usually, for a truncated-BPTT implementation, people pad short sentences with EOS and truncate sentences whose lengths exceed max_length. This enables static unrolling and efficient mini-batching.

The RNN example seems to do the reverse. What I see is that it's doing dynamic unrolling (i.e. with dynamic output size), but padding zeros to the outputs past max_length.
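For reference, the pad/truncate scheme described above is something like this toy sketch (the EOS id and helper name are invented for illustration):

```python
EOS = 0  # hypothetical end-of-sentence token id

def pad_or_truncate(token_ids, max_length):
    # Truncate long sentences; pad short ones with EOS so every example
    # has exactly max_length steps and the graph can be unrolled statically.
    token_ids = token_ids[:max_length]
    return token_ids + [EOS] * (max_length - len(token_ids))

pad_or_truncate([5, 9, 2], max_length=5)  # -> [5, 9, 2, 0, 0]
```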


vrv avatar vrv commented on March 28, 2024

(This discussion is probably better off had on the discussion mailing list, rather than this bug about documentation)


webmaven avatar webmaven commented on March 28, 2024

@vrv:

but we're making sure it's solid before promoting it to the public API, since we'd then have to support it indefinitely.

May we assume that TF is going to use semantic versioning for releases (i.e. major.minor.patch)?

Major releases can have backward-incompatible API changes, and minor releases can certainly add a new API (or extend an existing one in a backward-compatible way), especially if the old API is marked as deprecated (and to be removed in the next major release).

Since TF has not yet had a major version release (the current release is only 0.5.0), you have a lot of wiggle room between now and an eventual 1.0.0 release, which then really would commit you to maintaining backward compatibility for quite a while.


martinwicke avatar martinwicke commented on March 28, 2024

@webmaven: We are going to be using semver, and we will publish exactly what we mean by that, too. We'll try reasonably hard to maintain API stability even before 1.0.0, especially in the parts of the API that are official. For now, we have decided that something becomes "official" when it shows up in the API docs. You'll notice that many other functions are documented in the source but not included in the docs; that means their interface is still in flux.

I'm closing this bug for now.


webmaven avatar webmaven commented on March 28, 2024

@martinwicke:

We'll try reasonably hard to maintain API stability even before 1.0.0, especially in the parts of the API that are official. For now, we have decided that something becomes "official" when it shows up on the API docs.

Hmm. OK, but wouldn't putting APIs in the docs and marking them as 'draft' or 'unofficial' get you valuable external feedback while you're still iterating on an API? Or is external feedback only wanted/needed after an API becomes 'official'?


vrv avatar vrv commented on March 28, 2024

Yeah, we're definitely considering adding something like this since it helps to get community feedback on some experimental APIs before they are fully ready.


webmaven avatar webmaven commented on March 28, 2024

@vrv and @martinwicke, so getting back to the original topic, is the RNN API currently a good candidate for such 'draft' treatment, and does that mean this issue on documentation should be re-opened?


ebrevdo avatar ebrevdo commented on March 28, 2024

We are reconsidering the API, deciding whether we should make the current RNNCell implementations stateful, or whether to store some of the non-variable state in graph collections. After that decision is made, we will likely add documentation.

webmaven avatar webmaven commented on March 28, 2024

@ebrevdo what are the considerations surrounding this decision between stateful RNNCell implementations vs. storing the state in graph collections?


ebrevdo avatar ebrevdo commented on March 28, 2024

The main consideration is how calculations common across time steps are cached: simpler caching is traded off against having RNNCell be a pure logic object (non-stateful).

For example, if you access two Variables at each time step and then concat them, using the result in your call calculation, this is something that should be cached beforehand because it otherwise creates redundant computation. The two approaches to caching are (see the sketch after this list):

  1. RNNCell is stateful: create and cache this Tensor inside the RNNCell object.
  2. RNNCell is non-stateful: call looks for the cached Tensor inside a graph collection; if it doesn't exist, it creates it (similar to using get_variable).
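A hypothetical sketch of approach 2, caching via a graph collection (the collection key and helper are invented for illustration, not the actual implementation):

```python
import tensorflow as tf

_CACHE_KEY = "rnn_cell_cached_tensors"  # hypothetical collection name

def get_or_create_cached_concat(name, var_a, var_b):
    # Reuse an already-built tensor if one exists in the collection,
    # in the same spirit as get_variable's reuse of Variables.
    for tensor in tf.get_collection(_CACHE_KEY):
        if tensor.name.startswith(name):
            return tensor
    cached = tf.concat(0, [var_a, var_b], name=name)  # TF 0.x concat signature
    tf.add_to_collection(_CACHE_KEY, cached)
    return cached
```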

With a stateful RNNCell, Variables are created when the RNNCell is created, and so that variable scope is used. With a non-stateful RNNCell, Variables are created/accessed during call, and the variable scope used is whatever it was when you ran rnn() or bidirectional_rnn() or whatever.

Because of this, moving from a non-stateful RNNCell to a stateful one (and modifying the associated implementations of the LSTM, GRU, etc. cells) would be a breaking change.

I personally prefer stateful objects, because it's easier to understand and debug them. But there are arguments in both directions that have to be considered.


webmaven avatar webmaven commented on March 28, 2024

@ebrevdo:

[snip explanation]
But there are arguments in both directions that have to be considered.

What are those arguments?


ebrevdo avatar ebrevdo commented on March 28, 2024

For example:

Moving to stateful objects now would break a bunch of external dependencies. Of course, this is an undocumented API, and therefore folks should expect it to break in their projects. However, I'm afraid of breaking external projects in subtle ways that don't emit errors. That may indeed happen with this change, especially for those who depth-stack RNNs on top of each other using the same instance.

In addition, there are those who argue that RNNCell should continue to be a purely logical object with no state, so you can reuse the same instance of RNNCell across multiple RNNs without fear of reusing the same variable in multiple places (though get_variable's checks for over-sharing may ameliorate this somewhat).

EDIT: scratch that last sentence. get_variable would then be called only once, in the RNNCell's initialization, and all those get_variable protections would go out the door :(.


webmaven avatar webmaven commented on March 28, 2024

So, you can't actually reuse the same RNNCell instance across multiple RNNs in either case?


ebrevdo avatar ebrevdo commented on March 28, 2024

Currently you can. You can also use it with a shared name scope to tie the parameters across multiple RNNs.

webmaven avatar webmaven commented on March 28, 2024

OK, so you can currently reuse an instance of RNNCell across multiple RNNs without worrying about reusing the same variable in multiple places.

Does using a shared name scope with a single RNNCell instance across multiple RNNs gain you anything? Or is that really for reusing parameters across multiple RNNCell instances in multiple RNNs?


ebrevdo avatar ebrevdo commented on March 28, 2024

Right. It gains you the ability to tie parameters not only within one LSTM, but also across multiple LSTMs.
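A hedged sketch of that tying, assuming the TF-0.x module layout and that variable_scope reuse works as in the "Sharing Variables" howto (all sizes and names are invented for illustration):

```python
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell

num_steps, input_size, num_units = 10, 32, 128
cell = rnn_cell.BasicLSTMCell(num_units)  # one cell instance, reused

inputs_a = [tf.placeholder(tf.float32, [None, input_size])
            for _ in range(num_steps)]
inputs_b = [tf.placeholder(tf.float32, [None, input_size])
            for _ in range(num_steps)]

# Both rnn() calls run under the same variable scope, so the second
# reuses the first's LSTM parameters: the two RNNs are tied.
with tf.variable_scope("tied_rnn"):
    outputs_a, state_a = rnn.rnn(cell, inputs_a, dtype=tf.float32)
with tf.variable_scope("tied_rnn", reuse=True):
    outputs_b, state_b = rnn.rnn(cell, inputs_b, dtype=tf.float32)
```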