You should look at the RNN tutorial: http://tensorflow.org/tutorials/recurrent/index.md .
from tensorflow.
(Thanks, I have seen the tutorial but it is not a substitute for an API; I filed the issue after consulting with a friend at Google Brain)
Hi Avanti -- internally we've been working on iterating the API for RNNs, and we were happy enough with the current API to use it in the tutorial, but we're making sure it's solid before promoting it to the public API, since we'd then have to support it indefinitely. (Anything not in the public API is a work-in-progress :)
We'll keep this bug open in the meantime, and for now you can look at the source code documentation if you're interested in playing around: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/rnn/rnn.py#L9
Ah, got it, thanks for the explanation.
The white paper of TensorFlow mentions looping control within the graph. Is it already available? If so, are there examples to show how it can be done?
The RNN example has a Python loop. Will TensorFlow treat that as a symbolic loop and compile it?
Also, the explanation of sequence_length here isn't clear to me. What does it mean by dynamic calculations? When t is past max_sequence_length, can it just break from the loop instead of continuing with a zeros state? Returning a zeros state is different from returning the state at max_sequence_length, isn't it?
On your first question, see #208.
On your second question: the core TF engine currently only sees the GraphDef produced by python, so the RNN example is an unrolled one today.
I'm not super familiar with that RNN example -- @lukaszkaiser or @ludimagister might know better.
Hi zer0n,
the current RNN is statically unrolled; there is no dynamic unrolling based on the length of the sequence yet. The dynamic calculation means the graph is unrolled up to max_sequence_length, but if a sequence_length is provided, the calculations on the unrolled graph are cut short once sequence_length is reached, using a conditional op. Depending on the application this may result in shorter processing time.
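The cut-off behavior described above can be sketched as a toy, plain-Python loop (this is not the actual TensorFlow graph code; the scalar "RNN" and the function name are made up for illustration):

```python
def toy_unrolled_rnn(inputs, sequence_length, state=0.0):
    """Toy scalar 'RNN' with update state' = state + x. The loop is
    still unrolled to len(inputs), but steps at t >= sequence_length
    emit zeros, mimicking the conditional op described above."""
    outputs = []
    for t, x in enumerate(inputs):
        if t < sequence_length:
            state = state + x      # real computation for valid steps
            outputs.append(state)
        else:
            outputs.append(0.0)    # conditional op plugs in zeros
            state = 0.0            # the state is zeroed out too
    return outputs, state

outs, final = toy_unrolled_rnn([1.0, 2.0, 3.0, 4.0], sequence_length=2)
# outs == [1.0, 3.0, 0.0, 0.0]; note the final state is zeros,
# not the state reached at t = sequence_length
```

This also illustrates zer0n's later point: the returned final state is zeros, not the state at the end of the valid prefix.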
Yes, to add to what @ludimagister says: the conditional op will plug in zeros to the output & state past max(sequence_length), thus reducing the total amount of computation (if not memory).
I may actually modify this so that instead of plugging in zeros to the state, it just copies the state. This way the final state will represent the "final" state at max(sequence_length). However, I'm undecided on this. If you want the final state at time sequence_length, you can concat the state vectors and use transpose() followed by gather() with sequence_length in order to pull out the states you care about. That's probably what you would want to do, in fact, because if you have batch_size = 2 and sequence_length = [1, 2], then for the first minibatch entry, the state at max(sequence_length) will not equal the state at sequence_length[0].
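The transpose-then-gather trick can be sketched in plain Python (transpose() and gather() are real TF ops, but this helper and its data layout are just a stand-in for illustration):

```python
def gather_final_states(states, sequence_length):
    """states[t][i] is example i's state at time t (shape [T][batch]).
    Picks, for each example i, the state at its own last valid step,
    sequence_length[i] - 1. A plain-Python stand-in for
    concat + transpose() + gather()."""
    batch = len(sequence_length)
    steps = len(states)
    # 'transpose' to shape [batch][T]
    per_example = [[states[t][i] for t in range(steps)] for i in range(batch)]
    # 'gather' each example's final valid state
    return [per_example[i][sequence_length[i] - 1] for i in range(batch)]

# batch_size = 2, sequence_length = [1, 2], as in the example above;
# strings stand in for state vectors
states = [["a1", "b1"],   # t = 0
          ["a2", "b2"]]   # t = 1
final = gather_final_states(states, [1, 2])
# final == ["a1", "b2"]: example 0's last valid step is t = 0,
# example 1's is t = 1
```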
An alternative solution is to right-align your inputs so that they always "end" on the final time step. This breaks down the dynamic calculation performed when you pass sequence_length (because it assumes left-aligned inputs). I may extend this by adding a bool flag like "right_aligned" to the rnn call, which assumes that calculation starts at len(inputs) - max(sequence_length), and copies the initial state through appropriately. But that doesn't exist now.
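Right-alignment amounts to left-side padding, which can be sketched as (hypothetical helper, plain Python, not part of the TF API):

```python
def right_align(sequence, total_steps, pad):
    """Left-pad a sequence so it 'ends' on the final time step."""
    assert len(sequence) <= total_steps
    return [pad] * (total_steps - len(sequence)) + list(sequence)

batch = [right_align(s, 4, 0) for s in [[5, 6], [7, 8, 9]]]
# batch == [[0, 0, 5, 6], [0, 7, 8, 9]]: every example's last real
# input lands on the final step, so the final RNN state lines up
# across the whole minibatch
```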
Thanks @vrv, @ludimagister, and @ebrevdo for the answers. However, some details still confuse me.
- @ludimagister, the code doesn't seem to statically unroll. It has a loop which depends on the length of the inputs. Plus, max_sequence_length is not a const; it's just the scalar of the sequence_length parameter, which can be (and by default is) None. So, by default, the unrolling is not truncated. Correct me if I misread the code.
- @ebrevdo, I understand the computation-saving motivation. However, returning zeros is logically very different from returning the state at sequence_length (if provided). The former is just wrong. Again, please correct me if I misread the code.
Thanks.
@zer0n It depends on your task.
Returning zeros is fine if you only care about outputs (i.e., you're not hooking up to a decoder) and your loss function knows to ignore outputs past the sequence_length.
Returning the state from the end of the last time step might also be considered "wrong", but it is generally what happens when you have inputs of different lengths (and aren't performing dynamic computation); this is a typical approach to performing RNN with minibatches. For this reason, when performing encoding/decoding, people usually right-align with left-side padding instead, so the last input of any example always corresponds to the very last state. This seems like the cleanest solution for now.
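A loss that "knows to ignore" outputs past sequence_length is just a mask over the per-step losses. A plain-Python sketch (masked_loss is a made-up helper, not a TF function):

```python
def masked_loss(losses, sequence_length):
    """losses[i][t] is the per-step loss for example i at time t.
    Counts only steps before each example's sequence_length and
    averages over the valid steps."""
    total, count = 0.0, 0
    for i, row in enumerate(losses):
        for t, step_loss in enumerate(row):
            if t < sequence_length[i]:   # ignore padded steps
                total += step_loss
                count += 1
    return total / count

loss = masked_loss([[1.0, 9.0], [2.0, 4.0]], sequence_length=[1, 2])
# only 1.0, 2.0, and 4.0 are counted: (1 + 2 + 4) / 3
```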
Anyway, this part of the API may change; not sure yet the best approach.
(Also, specifically returning the state at sequence_length for every entry is taxing both in terms of computation and in terms of memory, both of which are in short supply with RNNs.)
OK, I did miss the line outputs.append(output). I originally thought that it returned the final state, not a sequence of states.
Anyway, this implementation still looks weird (I'm aware it's changing, so I'm only discussing the current state). Usually, for a truncated-BPTT implementation, people pad eos for short sentences and truncate sentences whose lengths exceed max_length. This enables static unrolling and efficient mini-batching.
The RNN example seems to be doing the reverse. What I see is that it's doing dynamic unrolling (i.e. with dynamic output size), but padding zeros to the outputs past max_length.
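The pad-or-truncate preprocessing described above can be sketched as (plain Python; the helper name and eos token are made up for illustration):

```python
def pad_or_truncate(tokens, max_length, eos):
    """Classic truncated-BPTT preprocessing: cut long sequences at
    max_length and pad short ones with eos, so every example has the
    same static length and minibatches stack cleanly."""
    tokens = list(tokens)[:max_length]          # truncate long inputs
    return tokens + [eos] * (max_length - len(tokens))  # pad short ones

batch = [pad_or_truncate(s, 3, "<eos>")
         for s in [["a"], ["b", "c", "d", "e"]]]
# batch == [["a", "<eos>", "<eos>"], ["b", "c", "d"]]
```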
(This discussion is probably better off had on the discussion mailing list, rather than this bug about documentation)
@vrv:
but we're making sure it's solid before promoting it to the public API, since we'd then have to support it indefinitely.
May we assume that TF is going to use semantic versioning for releases (ie. major.minor.patch)?
Major releases can have backward-incompatible API changes, and minor releases can certainly add a new API (or extend an existing one in a backward-compatible way), especially if the old API is marked as deprecated (and to be removed in the next major release).
Since TF has not yet had a major version release, (the current release is only 0.5.0), you have a lot of wiggle room between now and an eventual 1.0.0 release that then really would commit you to maintaining backward compatibility for quite a while.
@webmaven: We are going to be using semver, and we will publish exactly what we mean by that too. We'll try reasonably hard to maintain API stability even before 1.0.0, especially in the parts of the API that are official. For now, we have decided that something becomes "official" when it shows up in the API docs. You'll notice that many other functions have docstrings but are not included in the API docs; that means their interface is still in flux.
I'm closing this bug for now.
We'll try reasonably hard to maintain API stability even before 1.0.0, especially in the parts of the API that are official. For now, we have decided that something becomes "official" when it shows up on the API docs.
Hmm. OK, but wouldn't putting APIs in the docs and marking them as 'draft' or 'unofficial' get you valuable external feedback while you're still iterating on an API? Or is external feedback only wanted/needed after an API becomes 'official'?
Yeah, we're definitely considering adding something like this since it helps to get community feedback on some experimental APIs before they are fully ready.
@vrv and @martinwicke, so getting back to the original topic, is the RNN API currently a good candidate for such 'draft' treatment, and does that mean this issue on documentation should be re-opened?
We are reconsidering the API, deciding whether we should make the current RNNCell implementations stateful, or whether to store some of the non-variable state in graph collections. After that decision is made, we will likely add documentation.
@ebrevdo what are the considerations surrounding this decision between stateful rnncell implementations vs. storing the state in the graph collections?
The main consideration is in how calculations common across time are cached. Simpler caching is traded off against having RNNCell be a pure logic object (non-stateful).
For example, if you access two Variables at each time step and then concat them, using the result in your call calculation, then this is something that should be cached beforehand because it creates redundant computation. The two approaches to caching are:
- RNNCell is stateful: create and cache this Tensor inside the RNNCell object.
- RNNCell is non-stateful: call looks for the cached Tensor inside a graph collection; if it doesn't exist, it creates it (similar to using get_variable).
With a stateful RNNCell, Variables are created when the RNNCell is created, and so that variable scope is used. With a non-stateful RNNCell, Variables are created / accessed during call, and the variable scope used is whatever it was when you ran rnn() or bidirectional_rnn() or whatever. Because of this, moving from a non-stateful RNNCell to a stateful one (and modifying the associated implementations of LSTM, GRU, etc. cells) would be a breaking change.
I personally prefer stateful objects, because it's easier to understand and debug them. But there are arguments in both directions that have to be considered.
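The two caching flavors can be contrasted with a toy sketch (plain-Python stand-ins: the class and method shapes are invented for illustration, not the real RNNCell API, and a dict stands in for a graph collection):

```python
class StatefulCell:
    """Stateful flavor: the cached value is built once, in __init__,
    and stored on the object itself (so the scope current at
    construction time is what gets used)."""
    def __init__(self, make_tensor):
        self.cached = make_tensor()   # built eagerly, exactly once
    def __call__(self, x):
        return x + self.cached

class NonStatefulCell:
    """Non-stateful flavor: __call__ looks the cached value up in a
    shared 'graph collection' (a plain dict here), creating it on
    first use, similar in spirit to get_variable."""
    def __init__(self, name, make_tensor):
        self.name, self.make_tensor = name, make_tensor
    def __call__(self, x, collections):
        if self.name not in collections:
            collections[self.name] = self.make_tensor()
        return x + collections[self.name]

builds = []
make = lambda: builds.append(1) or 10   # counts how many times we build

collection = {}
cell = NonStatefulCell("concat_cache", make)
ys = [cell(x, collection) for x in (1, 2, 3)]   # three "time steps"
# ys == [11, 12, 13]; the cached value was created exactly once
```

Either way the redundant per-step computation is avoided; the difference is where the cache lives and, consequently, which scope the Variables end up in.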
[snip explanation]
But there are arguments in both directions that have to be considered.
What are those arguments?
For example:
Moving to stateful objects now would break a bunch of external dependencies. Of course, this is an undocumented API and therefore folks should expect it to break in their projects. However, I'm afraid of breaking external projects in subtle ways that don't emit errors. This indeed may happen with this change. Especially for those who depth-stack RNNs on top of each other using the same instance.
In addition, there are those who argue that the RNNCell should continue to be a purely logical object with no state, so you can reuse the same instance of RNNCell across multiple RNNs without fear of reusing the same variable in multiple places (though get_variable's checks for over-sharing may ameliorate this somewhat).
EDIT: scratch that last sentence. the get_variable would then be called only once in the RNNCell's initialization, and all those get_variable protections would go out the door :(.
So, you can't actually reuse the same RNNCell instance across multiple RNNs in either case?
Currently you can. You can also use it with a shared name scope to tie the parameters across multiple RNNs.
OK, so you can currently reuse an instance of RNNCell across multiple RNNs without worrying about reusing the same variable in multiple places.
Does using a shared name scope with a single RNNCell instance across multiple RNNs gain you anything? Or is that really for reusing parameters across multiple RNNCell instances in multiple RNNs?
Right. It gains you the ability to tie parameters not only within one LSTM, but also across multiple LSTMs.
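Parameter tying through a shared scope can be sketched with a toy get_variable-style registry (plain Python; get_param, the scope strings, and the parameter values are all made up for illustration):

```python
# Toy registry: parameters are keyed by (scope, name), so two RNNs
# built under the same scope resolve to the same parameter object.
_params = {}

def get_param(scope, name, init):
    """Hypothetical get_variable-style helper: create a parameter on
    first request under a given scope, reuse it afterwards."""
    key = (scope, name)
    if key not in _params:
        _params[key] = init()
    return _params[key]

# Two "LSTMs" built under the same scope share (tie) their weights...
w1 = get_param("encoder/lstm", "W", lambda: [0.1, 0.2])
w2 = get_param("encoder/lstm", "W", lambda: [0.9, 0.9])
# ...while a different scope gets its own, independent copy.
w3 = get_param("decoder/lstm", "W", lambda: [0.3, 0.4])
# w1 is w2 (the second init never runs); w3 is separate
```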