google-research / google-research Goto Github PK

Google Research

License: Apache License 2.0

Python 43.53% Jupyter Notebook 50.09% Shell 0.64% MATLAB 0.07% Dockerfile 0.01% JavaScript 0.95% CSS 0.02% HTML 0.57% C++ 3.13% Java 0.66% Starlark 0.17% Perl 0.01% R 0.07% Roff 0.01% Smarty 0.01% C 0.01% Makefile 0.01% Cuda 0.03% NASL 0.04% CMake 0.01%

ai machine-learning research

google-research's Introduction

Google Research

This repository contains code released by Google Research.

All datasets in this repository are released under the CC BY 4.0 International license, which can be found here: https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this repository are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.

Because the repo is large, we recommend you download only the subdirectory of interest:

SUBDIR=foo
svn export https://github.com/google-research/google-research/trunk/$SUBDIR

If you'd like to submit a pull request, you'll need to clone the repository; we recommend making a shallow clone (without history).

git clone [email protected]:google-research/google-research.git --depth=1

Disclaimer: This is not an official Google product.

Updated in 2023.

google-research's People

Contributors

Stargazers

Watchers

Forkers

mazzzystar adolfoeliazat fgdbtkd nd1511 codeaudit wanjinchang nipengmath henryxue tandychao 2017alan hanst xcbat chenghuige ahashisyuu keyonvafa pankajmehar etsangsplk wurentidai yueyedeai ml-lab pythseq errai34 colinsongf c0ns0le jdetras wangguangyuan zmoon111 hephaex nguyenducnhaty knowscieng wenh123 intuitionmachine chirayukong cclauss sosojustdo pandinosaurus yaduvendra vikas-sindhwani itsayushthada bachirelkhadir liuqihan beesitech anirband zhengwang100 devops8012 angusoft-india phiphigengen hanhaotian sagarverma suppergg shafiahmed daniel-s-ingram brettkoonce saeedseyyedi quettabit hulalazz wdan shekharravi awesome-archive zhoujialinmumu shubhank-saxena fangyh09 wookayin hack121 senaygui devhttps ahviplc kamyu104 nehgu layneins haipinglu dongxiao92 hhy5277 sharmer156 shaunstanislauslau lsmray muxinghan xc35 xeransis nvfx yflyzhang airobot-v1 feilipumaluo stry multiplecrashes eycab lhmzll george-kris rheasilvia reinforcement-learning-fun-2 embeddedsamurai michaelchi08 gdcollect hyzcn sunnybalog shubhampachori12110095 tony32769 raphaelsc19 pubfork misalraj

google-research's Issues

[state_of_sparsity] - Knowledge transfer and reconstitution

@sarahooker I really enjoyed the paper. I'd like to engage in a little speculation and I hope you'll indulge me.

Knowledge transfer during iterative sparsification

The lottery ticket result surprises me. I think you should be able to retrain to much closer to the same accuracy given the same initialization and a sparse mask. However I speculate that the magnitude pruning method induces knowledge transfer which prevents this.

Because the sparsity inducing mask changes during the iterative process you're dealing with some number of subnets. If they were fully disjoint you would transfer knowledge using one as a teacher and the other as the student. However in the iterative process you have a gradual knowledge transfer. Which means that the representations (and ultimate accuracy of the sparsified network) are no longer a function of sparse initial weights + training, but the full initial weights and the sparsification procedure.

If this is the case I suspect that if you do a single-step sparsification at the end of training and use that sparse mask along with the same initial weights (lottery ticket) you should see much closer accuracies.

(Iterative pruning is still a better way to do pruning of course.)

Knowledge reconstitution

I'm curious how much work has been done in the area of densifying sparse nets. For example, can you perfectly reverse the accurracy loss curves by increasing sparsity and retraining? Does it work better if you do this in one step (go from 90% sparsity to 70% sparsity by initializing a lot of random weights) or iteratively (90->85->80->75->70)

Ultimately the question is do you think a sparse bottleneck + densification + retraining procedure can produce a highly efficient and compressed version of finetuning?

dql_grasping: Training with default config did not converge to good success rate

We tried to recreate some of the results in dql_grasping. After setting up the environment according to requirements, we ran run_random_collect_oss.sh, and then run_train_collect_eval_oss.sh with dqn on-policy and dqn off-policy. The results shown in images below suggest the training didn't converge on policies with expected success rate, what steps should we take to reproduce similar results to those presented in the paper?

demogen: loading resnet models fails

After working-around a problem in example.py as described in previous issue I could load NIN models but not resnet models. The error is below:

I0716 13:20:53.269759 140097876026944 saver.py:1280] Restoring parameters from data/demogen_models/RESNET_CIFAR10/resnet_wide_1.0x_batchnorm__decay_0.0_1/model.ckpt-150000
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "~/google-research/demogen/example.py", line 62, in <module>
    tf.app.run(main)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "~/google-research/demogen/example.py", line 57, in main
    load_and_run(model_config, root_dir)
  File "~/google-research/demogen/example.py", line 44, in load_and_run
    sess.run(logits)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: Generic conv implementation only supports NHWC tensor format for now.
         [[node resnet/conv2d/Conv2D (defined at /tmp/tmpdCZJAJ.py:12) ]]

Errors may have originated from an input operation.
Input Source operations connected to node resnet/conv2d/Conv2D:
 transpose (defined at demogen/data_util.py:79)
 resnet/conv2d/kernel/read (defined at demogen/models/resnet.py:136)

Original stack trace for u'resnet/conv2d/Conv2D':
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "~/google-research/demogen/example.py", line 62, in <module>
    tf.app.run(main)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "~/.local/lib64/python2.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "~/google-research/demogen/example.py", line 57, in main
    load_and_run(model_config, root_dir)
  File "~/google-research/demogen/example.py", line 41, in load_and_run
    logits = model_fn(image, is_training=False)
  File "demogen/models/resnet.py", line 391, in __call__
    strides=self.conv_stride, data_format=self.data_format)
  File "demogen/models/resnet.py", line 136, in conv2d_fixed_padding
    data_format=data_format)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
    ), args, kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/tmpdCZJAJ.py", line 12, in tf__call
    outputs = ag__.converted_call('_convolution_op', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (inputs, self.kernel), None)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
    return _call_unconverted(f, args, kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
    return f(*args)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1079, in __call__
    return self.conv_op(inp, filter)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 635, in __call__
    return self.call(inp, filter)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 234, in __call__
    name=self.name)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1953, in conv2d
    name=name)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

cluster gcn

Hi,I have few questions for the Amazon-2M dataset
1.Can the code of cluster_gcn run on Amazon-2M dataset?
2.for the dataset Amazon-2M
(1) Will the stopwords be removed?
(2) What is the ratio of training test data?

Is per frame label needed to train Temporal Cycle-Consistency model?

My understand is it seems like we don't need per frame label to train the TCC model, right? But when I feed the tfrecord without per frame label, it throws an error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_<lambda>_61009}} Name: , Feature list 'frame_labels' is required but could not be found.  Did you \
mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
         [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext]]
         [[conv3_block4_1_bn/beta0/buckets/cond/else/_865/Identity/_2076]]
  (1) Invalid argument: {{function_node __inference_<lambda>_61009}} Name: , Feature list 'frame_labels' is required but could not be found.  Did you \
mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
         [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored. [Op:__inference_<lambda>_61009]

Seems like the frame_label is required here. So can I just fill in dummy labels here?

Undefined name: 'embed_translation' in ./qanet/squad_helper.py

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./qanet/squad_helper.py:87:1: F822 undefined name 'embed_translation' in __all__
__all__ = ['preprocess_inputs', 'build_a_layer',
^

Code snippet for simple prediction

Hello, Im trying to get prediction results (depth and camera parameters) given Image1, Image2 and Masks. But Im having hard time how to, assuming I have these images as numpy matrixes, can i make this prediction by calling a single method? I tried doing this myself but code is overwhelmingly complicated and I pretty much got lost

Im running the train code with single step and its executing without any errors, but cant find prediction results anywhere, tensorboard Image folder is empty.

Essentially what i am trying to create is a method with following signature
depth_image, camera_calibration = model.predict(previous_image, current_image, mask)

I have found a method that returns self.est_depth which i presume is depth map, but I couldnt find out how to supply the inputs to that method and how do i retrieve predicted camera calibration

+Update
Setting Summary freq to 1 and training for 10 episodes on single data instance using provided code seems to generate images on Tensorboard
python -m depth_from_video_in_the_wild.train --checkpoint_dir=***\trained_models --data_dir=***\depth_from_video_in_the_wild\data_example --train_steps=10

depth_from_video_in_the_wild: Ego-motion measurement unit

Quick question.
What is the unit of the estimated ego-motion? Is it mm, cm, m?
If it is not real world units, how can it be converted?

Thanks a lot
Edit: I am talking about the project depth_from_video_in_the_wild

Running Inference for Unprocess Net on real-world RGB image

Hi,

Questions regarding denoiser models from paper "Unprocessing Images for Learned Raw Denoising". Can I run the inference for some arbitrary already noisy image (RGB) if I don't have access to noise information (shot and read noises) from metadata?

Specifically in the 'dnd_denoise.py' file, I can see the feed_dict comprised of noisy image, read and shot noise tensors.

Thanks!

Inference code and models for 'Unprocessing Images for Learned Raw Denoising'

Hi,
I was wondering if you could provide the pre-trained models for CVPR2019 paper title 'Unprocessing Images for Learned Raw Denoising'. I have looked into 'unprocessing' dir, no models or inference code, do you intend to release that or not?

Thanks you!
Touqeer

in ./graph_embedding/watch_your_step Reason for scaling up positive loss by the number of nodes, significance of attention and why no validation split?

[Line no 226, graph_attention_learning.py, Watch Your Step] return tf.transpose(d_sum) * GetNumNodes() * 80, feed_dict
Why is an arbitrary scaling by the number of nodes done? I am not sure if it is reported in the Watch Your Step paper.
When I remove the scaling, there is some decrease in the results for most of the datasets
Results and Ablation Study:
PPI(without scaling,learn attention) - 90.84
PPI(with scaling,learn attention) - 91.8
PPI(uniform attention,scaling) - 91.7

ca-HepTh(without scaling,learn attention) - 93.14
ca-HepTh(with scaling,learn attention) - 93.8
ca-HepTh(uniform attention,scaling) - 93.9

Wiki-Vote(without scaling,learn attention) - 94.3
Wiki-Vote(with scaling,learn attention) - 93.7
Wiki-Vote(uniform attention,scaling) - 94.0

Configs:
Embedding dimension:128
Share Embeddings: False
Transition_powers: 5
Loss: nlgl
context_regularizer: 0.1
learnable attention - softmax over 5 hops
uniform attention - Equal attention of earch of 5 hops(0.2 for each hop)

A couple of more questions:

Why is a validation set(validation positive edges, negative edges) not chosen for stopping? I know that Learning Edge Representations via Low-Rank Asymmetric Projections, other baselines and many of the graph embedding literature doesn't use it for link prediction, but I feel that stopping based on validation from ROC-AUC scores is more appropriate than stopping by best train ROC-AUC.

2.a) I see that attention makes only little contribution to an increase in the performance. Uniform attention works well. For example, in PPI, the paper reports that the attention learned favors the first hop. But not learning any attention(or in other words, uniform attention) also performs equally well. This is true even for Soc-Facebook, Wiki-Vote, ca-HepTh, ca-AstroPh.

2.b)
Why is the stopping criteria in line 441
"if i - 100 > eval_metrics['i at best train']:
LogMsg('Reached peak a while ago. Terminating...')
break"

based on training error? Shouldn't stopping criteria be always based on validation error?

2.c) in line 340 "eval_metrics['test auc at best train'] = float(test_auc)"

Why log and report test AUC at best train? We always report metrics and save model that gives the best performance on the validation set. If we modify the code to report the AUC / precision scores at least validation error then there is little to no difference between having weights and trainable attention.

Unprocessing : code and pretrained model has mismatch

While checking the unprocessing, I could not make model similar to the one uploaded in google drive.
When I tried to train the model in our environment, graph does not have short noise and read noise.
I got below error when I tried to denoise the model which is created using the code committed in github.
The name 'stddev/shot_noise:0' refers to a Tensor which does not exist. The operation, 'stddev/shot_noise', does not exist in the graph. The name 'stddev/shot_noise:0' refers to a Tensor which does not exist. The operation, 'stddev/shot_noise', does not exist in the graph.

Could you check training code shared is latest one.

Negative distance function in tcc.visualize_aligment.py

Hello,

regarding the distance function in visualize_alignment:

def dist_fn(x, y): dist = -1.0 * np.matmul(x, y.T) return dist

that is passed as the argument for the dist parameter in the align function,

By using the negative of the matmul call, I believe the dynamic time warping to be finding the worst possible path. Experimentally I have verified that the reconstruction error is 0 when removing the negation.

error running example.py

I am trying to run the demogen example.py in a conda environment, and in either 2.7 or 3.7

get the following error:

File ".../google-research/demogen/data_util.py", line 73, in input_data
    dataset = prob.dataset(mode)
  File ".../anaconda3/envs/demogen/lib/python3.7/site-packages/tensor2tensor/data_generators/problem.py", line 631, in dataset
    assert data_dir

I am running TF versions

tensor2tensor            1.13.4   
tensorboard              1.14.0   
tensorflow               1.14.0   
tensorflow-datasets      1.0.2    
tensorflow-estimator     1.14.0   
tensorflow-metadata      0.13.0   
tensorflow-probability   0.7.0

Error recorded from training_loop: Found Inf or NaN global norm. : Tensor had NaN values

I try to run run_classifier.sh in cpu, it runs ok. But when I run in gpu, sometimes good, sometimes bad. I think it might have something to do with seq_max_length and batch_size. But it's useless to reduce seq_max_length and batch_size.

following is my run_classifier.sh

python3.5 run_classifier.py
--task_name=sim
--do_train=true
--do_eval=true
--do_predict=true
--data_dir=$MY_DATASET
--vocab_file=$BERT_BASE_DIR/vocab.txt
--bert_config_file=$BERT_BASE_DIR/bert_config.json
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt
--max_seq_length=128
--train_batch_size=32
--learning_rate=2e-5
--num_train_epochs=2.0 \

following is error log

2019-03-21 14:21:12.342706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2019-03-21 14:21:13.673814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-21 14:21:13.673891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2019-03-21 14:21:13.673904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N N
2019-03-21 14:21:13.673912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N N
2019-03-21 14:21:13.673918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N N
2019-03-21 14:21:13.673925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: N N N N
2019-03-21 14:21:13.675250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15119 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pc
i bus id: 0000:00:07.0, compute capability: 6.0)
2019-03-21 14:21:13.676003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15119 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pc
i bus id: 0000:00:08.0, compute capability: 6.0)
2019-03-21 14:21:13.676420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15119 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pc
i bus id: 0000:00:09.0, compute capability: 6.0)
2019-03-21 14:21:13.676764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15119 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pc
i bus id: 0000:00:0a.0, compute capability: 6.0)
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./mayi_model_2/model.ckpt.
2019-03-21 14:22:44.995671: E tensorflow/core/kernels/check_numerics_op.cc:185] abnormal_detected_host @0x1085900ab00 = {1, 0} Found Inf or NaN global norm.
INFO:tensorflow:Error recorded from training_loop: Found Inf or NaN global norm. : Tensor had NaN values
[[node VerifyFinite/CheckNumerics (defined at /var/log//bert_chi/optimization.py:74) = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"](global_n
orm/global_norm)]]

bam/run_classifier.py model always load distill_outputs from _train_predictions_1.pkl

I was trying to train a BAM model using the command like python -m bam.run_classifier rte-mrpc-bam-model $BAM_DIR '{"task_names": ["rte", "mrpc"], "distill": true, "teachers": {"rte": "rte-model", "mrpc": "mrpc-model"}}' given in the Readme.md，an error occured like:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1758, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 281, in <module>
    tf.app.run()
  File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 274, in main
    model_runner.write_outputs([task], trial, split)
  File "C:/Users/yirongli/Desktop/bam/run_classifier.py", line 196, in write_outputs
    distill_input_fn, _, _ = self._preprocessor.prepare_predict(tasks, split)
  File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 69, in prepare_predict
    return self._serialize_dataset(tasks, False, split)
  File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 108, in _serialize_dataset
    self.serialize_examples(examples, is_training, tfrecords_path)
  File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 127, in serialize_examples
    tf_example = self._example_to_tf_example(example, is_training)
  File "C:\Users\yirongli\Desktop\bam\data\preprocessing.py", line 136, in _example_to_tf_example
    example, is_training))
  File "C:\Users\yirongli\Desktop\bam\task_specific\classification\classification_tasks.py", line 136, in featurize
    self._distill_inputs[eid])
KeyError: 2490

When I tried to figure out, i found that in configure.py line 131, model always load distill_outputs from _train_predictions_1.pkl no matter the student model requiring training or testing prediction by teacher model in run_classifier.py line 269.
So it obviously goes wrong when get the testing prediction, hope to fix it

The color of the restored image by run processing.py is not right on DND dataset.

hi,can you give the unprocessing_srgb_loss code,I run the process.py to process bayer RGGB into sRGB imag on DND dataset.But the color of the restored image is not right.

Can you publicly release Pouring and Penn_Action Dataset new annotations@debidatta

Recruiter Contacts

Can you post:

recruiter contacts
if posted opening does not march what you can contribute best in contact information where you can discuss or send in a proposal or speculative application

frechet_video_distance

Hello. Thanks for sharing remarkable work.

I want to use the "frechet_video_distance.py" for my research.
But I can't find how to give my video as input.

May I get some help for it?

Trouble training temporal cycle consistency model on a new dataset

I'm getting this error while trying to train the model on a new dataset:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_<lambda>_59670}} Name: , Feature list 'frame_labels' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults? [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]] [[MultiDeviceIteratorGetNextFromShard]] [[RemoteCall]] [[IteratorGetNext]] [Op:__inference_<lambda>_59670]

Trouble in running /mol_dqn/experimental

each .py files in /mol_dqn/experimental contains such codes:

from mol_dqn.chemgraph.mcts import deep_q_networks from mol_dqn.chemgraph.mcts import molecules as molecules_mdp from mol_dqn.chemgraph.mcts import run_dqn

But where is the module mol_dqn.chemgraph.mcts?

Enas_lm child model

In the child.py file, why w_skip[0] is not used for L2 weight reg ?
var_s = [w_prev] + w_skip[1:]

google-research/enas_lm/src/child.py

Line 69 in 4305853

# extract the relevant variables, so that you only do L2-reg on them.

Does w_skip[0] means the weight of node 1 and node 2 (as the red arrow shows )?

TCC gives different embeddings when --frames_per_batch are diff

Hi, I get different embeddings when I run extracting_embeddings.py with the same input and setups except --frames_per_batch. For example:

--frames_per_batch == 1:

<tf.Tensor: id=10110, shape=(128,), dtype=float32, numpy=
array([ 0.7134198 ,  0.73715377, -0.40302375, -0.8916309 ,  0.46344516,
        1.0306991 , -0.616913  , -0.3954195 , -1.4110051 ,  0.05280016,
        0.02743702, -0.47917336, -0.7063019 ,  0.00664697, -0.16832595,
       -0.38978115,  0.28905347, -0.71937335,  0.39218065, -0.2223233 ,
        0.24361739, -0.08869804, -0.9321748 , -0.8480654 ,  0.45671615,
       -0.90358734, -1.3239552 , -0.18341677,  0.22246726, -0.84119105,
        0.41529498,  0.2421515 , -0.12988764,  1.2223002 , -1.2660636 ,
        0.31256717,  1.018894  ,  0.6738411 ,  0.18867303, -0.17254871,
       -1.4501228 ,  1.0448513 , -1.07593   , -0.9447051 , -0.38788766,
        0.20399381, -0.46668968, -0.00173404,  0.36895347,  0.49572152,
        0.11630958, -0.4594518 ,  0.11987424,  1.1069762 , -0.4460541 ,
       -0.5169652 ,  0.3923389 , -0.2448386 ,  0.9658608 ,  0.23109348,
        0.16036353, -0.82762504, -0.20896168, -0.5168912 ,  0.07127902,
        0.25286838,  0.02297238, -1.3294024 ,  0.39561197,  0.5771644 ,
        1.5672271 ,  0.6967077 , -0.5723567 ,  0.3235898 ,  0.7618949 ,
        0.91371095, -0.26501146,  0.10041602,  0.3094164 , -0.27465546,
       -0.083619  , -0.49142212, -0.49883616,  0.08733553, -0.02642126,
       -0.46290073,  0.58624184,  0.8576831 ,  0.23795792,  0.26929522,
        0.40708646,  0.96988857, -1.0064505 ,  0.8797252 ,  0.6761626 ,
        0.970005  , -0.05762405, -0.51743686,  1.468217  , -1.2853948 ,
        0.7068154 ,  0.18635778, -0.16756667,  0.25616115, -1.3703632 ,
       -0.09829516,  0.38073325, -0.5046847 ,  0.30129814, -1.6381143 ,
       -0.41361213, -0.37367654, -0.00442669,  0.24185468, -0.23331967,
        0.46492097, -0.874537  , -0.11011802, -0.29310864, -0.6257624 ,
       -0.13382947, -1.569699  , -0.60893744, -0.9276579 , -0.27602094,
       -0.10803759, -0.69294   ,  0.72386044], dtype=float32)>

--frames_per_batch == 10, and first embedding I get is:

(Pdb) p emb_feats[0]
<tf.Tensor: id=10111, shape=(128,), dtype=float32, numpy=
array([ 0.69585   ,  0.7938789 , -0.40682542, -0.9104449 ,  0.44788656,
        1.0696075 , -0.6545431 , -0.39065826, -1.4388212 ,  0.07879334,
        0.07056048, -0.4912063 , -0.6906045 ,  0.05483497, -0.1918716 ,
       -0.4480733 ,  0.2745942 , -0.69838053,  0.33875132, -0.12655792,
        0.20775412, -0.07518089, -0.9712121 , -0.8582906 ,  0.4940186 ,
       -0.9054945 , -1.3323177 , -0.1797872 ,  0.21466169, -0.842904  ,
        0.4238041 ,  0.18746372, -0.16505198,  1.2667109 , -1.3068607 ,
        0.34267744,  1.0190938 ,  0.6861747 ,  0.10797802, -0.1484639 ,
       -1.4422283 ,  1.0217737 , -1.0802454 , -1.0336714 , -0.3141645 ,
        0.20748997, -0.5034793 , -0.01016946,  0.38425395,  0.4835653 ,
        0.17350678, -0.48503914,  0.09994452,  1.171491  , -0.44424957,
       -0.5473531 ,  0.31822965, -0.22695364,  0.9586529 ,  0.23552223,
        0.19058312, -0.87618846, -0.2797962 , -0.4805327 ,  0.06645918,
        0.21550886, -0.0261419 , -1.3357202 ,  0.36320177,  0.5580914 ,
        1.5055286 ,  0.6597656 , -0.587545  ,  0.3728123 ,  0.78129125,
        0.9374202 , -0.26444304,  0.07730328,  0.28997815, -0.29767683,
       -0.05417863, -0.5795138 , -0.52169895,  0.17690559, -0.04242799,
       -0.5164021 ,  0.563585  ,  0.86161417,  0.2007392 ,  0.1909954 ,
        0.46636117,  1.003835  , -0.98218834,  0.8831477 ,  0.6481319 ,
        0.9996616 , -0.04697989, -0.49288058,  1.4274698 , -1.319891  ,
        0.6997739 ,  0.17412183, -0.15896532,  0.23632345, -1.3237288 ,
       -0.10929373,  0.36972514, -0.5411339 ,  0.30892903, -1.6210634 ,
       -0.3404241 , -0.35616207,  0.01820717,  0.22067611, -0.23308174,
        0.42857918, -0.8491688 , -0.0896553 , -0.35461238, -0.635367  ,
       -0.16244371, -1.6196773 , -0.5958075 , -0.9506139 , -0.3112893 ,
       -0.12266631, -0.6807967 ,  0.72940886], dtype=float32)>

I thought the embeddings should be independent with --frames_per_batch param and should be able to get consistent results, is there something I'm missing?

Symbolic Computation in TF

We've been doing some symbolic computation/mathematics for PyMC in the symbolic-pymc project, and, since we're moving to TensorFlow [Probability] and you folks have also done related things in TFP & Edward2, I would like to get your input on this kind of work in the context of TF[P].

More specifically, is anyone else working on tools for symbolic assessment/manipulation of TF graphs? We've had to do a bit of work to make TF graphs "symbolically manipulatable" and I'm always wondering if there's a better way, or if I'm missing out on any larger, concerted efforts to do so.

problems when running bam/run_classifier.py

Hi there, a problem occured when running the bam/run_classifier.py.

Tensorflow seems to lock the events.out.tfevents file until the whole program end, when execute the command utils.rmkdir(config.checkpoints_dir) at run_classfier.py line 271, tf.gfile.DeleteRecursively appears to can't be done and raise the error given below

Traceback (most recent call last):
  File "D:/bam/run_classifier.py", line 281, in <module>
    tf.app.run()
  File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "D:/bam/run_classifier.py", line 276, in main
    utils.rmkdir(config.checkpoints_dir)
  File "D:\bam\helpers\utils.py", line 71, in rmkdir
    rmrf(path)
  File "D:\bam\helpers\utils.py", line 60, in rmrf
    tf.gfile.DeleteRecursively(path)
  File "C:\Python36\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 563, in delete_recursively
    delete_recursively_v2(dirname)
  File "C:\Python36\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 577, in delete_recursively_v2
    pywrap_tensorflow.DeleteRecursively(compat.as_bytes(path), status)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to remove a directory: bam_dir\models\debug-model\checkpoints; Directory not empty

The enviroment currently use : tensorflow-gpu 1.13.1 windows10.
However, when run this code on centos, the problem doesn't exist.
Really hope to give me some advice

Load node2vec graph into persona graph_embedding

Is it possibile to load a node2vec graph format like

node1_id_int node2_id_int <weight_float, optional>

Into the persona graph embedding?
Also node2vec provides several graph samples (like FB Egonet, etc.) in https://snap.stanford.edu/node2vec/, shall I first convert those graph to the NetworkX format before running in

python3 -m graph_embedding.persona.persona --input_graph=${graph} \
   --output_clustering=${clustering_output}

Thank you.

FailedPreconditionError:2 root erros

Hi, I am running into the issue related to FailedPreconditionError: 2 root error, when I am trying to run the train.py within Unprocessing Images for Learned Raw Denoising. Do you have any idea why this will happen and how to address it. I am using tf_gpu_1.12.0, python 3.5.8, cuda 9.0, Linux. Thanks

About Time-varying Convex Optimization

It's amazing and so magical! Where is the source paper and code?

schema_guided_dst baseline: difference between data_utils.get_num_dialog_examples() and the number of examples in dstc8_single_domain_train_examples.tf_record

Hi, I doubt that data_utils.get_num_dialog_examples() returns correct number.

In dstc8_single_domain & train, data_utils.get_num_dialog_examples() returns 82588, but the number of examples in dstc8_single_domain_train_examples.tf_record is 41294. I think these two numbers should be the same. Is it right? (The former is around as double as the latter because get_num_dialog_examples() counts USER and SYSTEM turns together. I think it's wrong.)

AttributeError: 'KukaGraspingProceduralEnv' object has no attribute 'cid'

In GUI render mode at del function I am getting object has no attribute 'cid' error.

How can i render GUI while training?

Undefined name: NltkAndPunctTokenizer() does not exist in qanet/util/tokenizer_util.py

https://github.com/google-research/google-research/search?q=NltkAndPunctTokenizer&unscoped_q=NltkAndPunctTokenizer

No module named tensorflow.google

Hi, what the hell tensorflow.google is? I checked you official tf API and related topics on websites, but found nothing. My tf version is 1.12.0.
Thanks

No key events and phase labels were found in the Penn Action dataset

Hello, the Penn Action dataset referred to in your paper does not have the key events and phase labels you mentioned after downloading. Can you disclose your work in order to reproduce your work?@debidatta
Thank you

Undefined name: ‘transfer’ in ./qanet/squad_helper.py

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./qanet/squad_helper.py:166:22: F821 undefined name 'transfer'
    encoder_states = transfer.elmo_tensor_from_chars(
                     ^
./qanet/squad_helper.py:181:22: F821 undefined name 'transfer'
    encoder_states = transfer.elmo_tensor_from_sentences(
                     ^

Should these be embed_elmo_chars() and embed_elmo_sentences()?

training simulation for unprocessing net is stuck

Hello,
Thanks for releasing the code for 'Unprocessing Images ... Raw Denoising'. Upon trying the training process, I see that the training simulation gets stuck at this point -

This is my run command -
python train.py --model_dir='./ckpts/' --train_pattern=/disk1/aashishsharma/Datasets/MIRFlickr_Dataset/train/* --test_pattern=/disk1/aashishsharma/Datasets/MIRFlickr_Dataset/test/*

Anybody knows this problem? any workaround? Thanks!

demogen: loading resnet models fails on GPU

The problem described in previous issue is resolved when working with tensorflow with enabled GPU support, but then there is a zoo of behaviors:

5 of the saved resnet models loads correctly
Most fail with Not found: Key resnet/group_norm/beta not found in checkpoint
Many fail with Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
Few fail with ValueError: Trying to share variable resnet/conv2d/kernel, but specified shape (3, 3, 3, 32) and found shape (3, 3, 3, 16).

Correctly loaded:

resnet cifar10 resnet_wide_1.0x_batchnorm_aug_decay_0.0_1
resnet cifar10 resnet_wide_1.0x_batchnorm_aug_decay_0.0_lr_0.001_1

resnet cifar100 resnet_wide_1.0x_batchnorm_aug_decay_0.0_1
resnet cifar100 resnet_wide_1.0x_batchnorm_aug_decay_0.0_lr_0.001_1
resnet cifar100 resnet_wide_1.0x_batchnorm_aug_decay_0.0_lr_0.1_1

Not found:

resnet cifar10 resnet_wide_1.0x_batchnorm_aug_decay_0.0_2
2019-07-31 13:22:08.859328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:22:08.859377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:22:08.859386: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:22:08.859393: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:22:08.859405: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:22:08.859413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:22:08.859420: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:22:08.859428: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:22:08.859802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:22:08.859822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:22:08.859826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-31 13:22:08.859829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-31 13:22:08.860214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
W0731 13:22:08.969132 139832240281408 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:47: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.BatchNormalization instead.  In particular, `tf.control_dependencies(tf.GraphKeys.UPDATE_OPS)` should not be used (consult the `tf.keras.layers.batch_normalization` documentation).
2019-07-31 13:22:10.279506: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet/group_norm/beta not found in checkpoint
Traceback (most recent call last):
  File "demogen/parse_tuning.py", line 84, in <module>
    all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
  File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
    model_config.load_parameters(param_path, sess)
  File "~/google-research/demogen/model_config.py", line 262, in load_parameters
    saver.restore(tf_session, model_dir)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Key resnet/group_norm/beta not found in checkpoint
         [[node save_2/RestoreV2 (defined at ~/google-research/demogen/model_config.py:261) ]]
  (1) Not found: Key resnet/group_norm/beta not found in checkpoint
         [[node save_2/RestoreV2 (defined at ~/google-research/demogen/model_config.py:261) ]]
         [[save_2/RestoreV2/_383]]
0 successful operations.
0 derived errors ignored.

Original stack trace for u'save_2/RestoreV2':
  File "demogen/parse_tuning.py", line 84, in <module>
    all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
  File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
    model_config.load_parameters(param_path, sess)
  File "~/google-research/demogen/model_config.py", line 261, in load_parameters
    saver = tf.train.Saver(model_var_list)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Invalid argument:

resnet cifar10 resnet_wide_1.0x_groupnorm_aug_decay_0.0_1
W0731 13:25:40.192379 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/extract_layers_util.py:68: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-07-31 13:25:40.193601: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-31 13:25:40.530512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:25:40.530699: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.531574: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:25:40.532371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:25:40.532577: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:25:40.533520: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:25:40.534268: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:25:40.536462: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:25:40.537216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:25:40.537544: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-31 13:25:40.596500: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0d5471f80 executing computations on platform CUDA. Devices:
2019-07-31 13:25:40.596528: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-07-31 13:25:40.627506: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3300000000 Hz
2019-07-31 13:25:40.628479: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0d3885e60 executing computations on platform Host. Devices:
2019-07-31 13:25:40.628495: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-31 13:25:40.628967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:25:40.629006: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.629014: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:25:40.629021: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:25:40.629036: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:25:40.629043: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:25:40.629066: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:25:40.629073: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:25:40.629738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:25:40.629757: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:25:40.630519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:25:40.630526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-31 13:25:40.630529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-31 13:25:40.631262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
W0731 13:25:40.637204 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensor2tensor/data_generators/problem.py:680: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0731 13:25:40.649481 140184543594304 deprecation_wrapper.py:119] From ~/.local/lib64/python2.7/site-packages/tensor2tensor/data_generators/image_utils.py:169: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

W0731 13:25:40.820377 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/image_ops_impl.py:1514: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0731 13:25:40.825278 140184543594304 deprecation.py:323] From ~/google-research/demogen/data_util.py:76: make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
W0731 13:25:40.837275 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/extract_layers_util.py:76: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
W0731 13:25:40.838011 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/models/resnet.py:383: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

W0731 13:25:40.838236 140184543594304 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:136: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
W0731 13:25:41.450165 140184543594304 deprecation.py:323] From ~/google-research/demogen/models/resnet.py:430: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
W0731 13:25:41.451108 140184543594304 deprecation.py:506] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0731 13:25:42.371058 140184543594304 deprecation_wrapper.py:119] From ~/google-research/demogen/model_config.py:261: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0731 13:25:42.421227 140184543594304 deprecation.py:323] From ~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "demogen/parse_tuning.py", line 84, in <module>
    all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
  File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
    model_config.load_parameters(param_path, sess)
  File "~/google-research/demogen/model_config.py", line 262, in load_parameters
    saver.restore(tf_session, model_dir)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
         [[node save/Assign_50 (defined at ~/google-research/demogen/model_config.py:261) ]]
         [[save/RestoreV2/_120]]
  (1) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,32,1,1] rhs shape= [32]
         [[node save/Assign_50 (defined at ~/google-research/demogen/model_config.py:261) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node save/Assign_50:
 resnet/group_norm_15/beta (defined at ~/google-research/demogen/models/resnet.py:66)

Input Source operations connected to node save/Assign_50:
 resnet/group_norm_15/beta (defined at ~/google-research/demogen/models/resnet.py:66)

Original stack trace for u'save/Assign_50':
  File "demogen/parse_tuning.py", line 84, in <module>
    all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
  File "~/google-research/demogen/extract_layers_util.py", line 98, in extract_layers
    model_config.load_parameters(param_path, sess)
  File "~/google-research/demogen/model_config.py", line 261, in load_parameters
    saver = tf.train.Saver(model_var_list)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore
    self.op.get_shape().is_fully_defined())
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign
    validate_shape=validate_shape)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign
    use_locking=use_locking, name=name)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

ValueError logs:

resnet cifar10 resnet_wide_1.0x_groupnorm__decay_0.002_lr_0.001_3
2019-07-31 13:19:29.317723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:19:29.317779: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:19:29.317789: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:19:29.317803: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:19:29.317811: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:19:29.317819: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:19:29.317826: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:19:29.317834: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:19:29.318213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:19:29.318235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:19:29.318239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-31 13:19:29.318243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-31 13:19:29.318637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
2019-07-31 13:19:36.919128: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key resnet/batch_normalization/beta not found in checkpoint
resnet cifar10 resnet_wide_2.0x_batchnorm_aug_decay_0.0_1
2019-07-31 13:19:37.275569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
2019-07-31 13:19:37.275624: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-31 13:19:37.275643: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-31 13:19:37.275652: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-31 13:19:37.275659: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-31 13:19:37.275667: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-31 13:19:37.275675: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-31 13:19:37.275691: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-31 13:19:37.276063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-31 13:19:37.276086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-31 13:19:37.276090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-31 13:19:37.276094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-31 13:19:37.276490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Collecting 3072 neurons from 4 layers (5024 samples, 10 objects)
Traceback (most recent call last):
  File "demogen/parse_tuning.py", line 84, in <module>
    all_activations, samples_per_object, layer_names, layer_indices, layer_n_neurons = elu.extract_layers(input_fn, root_dir, model_config)
  File "~/google-research/demogen/extract_layers_util.py", line 89, in extract_layers
    end_points_collection=end_points_collection)
  File "~/google-research/demogen/models/resnet.py", line 391, in __call__
    strides=self.conv_stride, data_format=self.data_format)
  File "~/google-research/demogen/models/resnet.py", line 136, in conv2d_fixed_padding
    data_format=data_format)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 165, in build
    dtype=self.dtype)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/layers/base.py", line 450, in add_weight
    **kwargs)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 384, in add_weight
    aggregation=aggregation)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/training/tracking/base.py", line 663, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "~/.local/lib64/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 869, in _get_single_variable
    (name, shape, found_var.get_shape()))
ValueError: Trying to share variable resnet/conv2d/kernel, but specified shape (3, 3, 3, 32) and found shape (3, 3, 3, 16).

how to run multiple references with best score?

Trouble in running rouge package

I was trying to use the rouge package in a multi reference scenario? Is it possible to provide an example in the rouge/readme on how one can do that?
In the current example in "How to run" where do you define the directory of the files?

Also, I am wondering how do you compute rouge for multi-reference scenario. Looking into the code you do some sort of bootstrap aggregation while in the original paper here (https://www.aclweb.org/anthology/W04-1013) in section 2 looks like they do simple micro-averaging while in section 2.1 it seems they do maximization over pairewise computation in rouge!

Thanks,
MP

Rouge - Handling empty last line of prediction file

Hi,
What's the best way to handle mismatched lines in source and target. Currently, the code fails on line 111:

# Check whether num_targets < num_predictions
if next(pred_gen, None) is not None or next(pred_gen, None) is not None:
  raise ValueError("Must have equal number of lines across target and "
                   "prediction files. Mismatch between files: %s, %s." %
                   (target_filename, prediction_filename))

My predictions files have more sentences than the target and the pyrouge package which is built on the PERL rouge handles this without breaking. Should I be pre-processing my target files to have same no: of lines as the prediction?

Can this be handled the rouge package level instead similar to the perl one?
Commenting out this check leads to 10 points lower in all metrics as compares to the pyrouge outputs.

Let me know if any additional code/output files are required to better understand this issue.

depth_from_video_in_the_wild: not able to reproduce the result

@gariel-google Dear author, Thanks for sharing the source code of the paper.
I was trying to reproduce the result of the paper using your code. However, with your default setting (batch size=4, learning_rate=0.0002, etc.) training from scratch, the result I got it's quite far from what you stated in the paper (Abs Rel 0.147 for the best checkpoint within around 370k-th step vs 0.128 in the paper). For your information, I am using the evaluation code from sfmlearner as what struct2depth does.
Therefore, may I know what's setting for obtaining the paper's result? Or is there anything critical part missing in the current released code (maybe pretrained checkpoint for example)?
Thank you in advance.

Code readability and potential confusion in tcc/visualize_alignment.py

Hello,
a section of the tcc code in visualize_alignment.py has high potential for confusion and misuse. The align function is defined as follows:

def align(candidate_feats, query_feats, use_dtw):
  """Align videos based on nearest neighbor in embedding space."""
  if use_dtw:
    _, _, _, path = dtw(candidate_feats, query_feats, dist=dist_fn)
    _, uix = np.unique(path[0], return_index=True)
    nns = path[1][uix]
  else:
    nns = []
    for i in range(len(candidate_feats)):
      nn_frame_id, _ = get_nn(query_feats, candidate_feats[i])
      nns.append(nn_frame_id)
  return nns

The function call is:
nns.append(align(embs[query], embs[candidate], use_dtw))

The positional arguments for the query and candidate features are reversed. Clearly, we do not want to iterate over the candidate frame matching it to the reference. There is no logical error as the arguments are passed in to the function in reverse order but it may lead to issues downstream if these functions are built upon.

The function definition should read:

def align(query_feats,candidate_feats, use_dtw):
  """Align videos based on nearest neighbor in embedding space."""
  if use_dtw:
    _, _, _, path = dtw(query_feats,candidate_feats, dist=dist_fn)
    _, uix = np.unique(path[0], return_index=True)
    nns = path[1][uix]
  else:
    nns = []
    for i in range(len(query_feats)):
      nn_frame_id, _ = get_nn(query_feats[i], candidate_feats)
      nns.append(nn_frame_id)
  return nns

depth_from_video_in_the_wild: image size for pretrained models

Hi @gariel-google, are the models that you provide trained on images 416x128? When I tried inference with other resolutions it doesn't work well at all.

If it's indeed 416x128, have you tried training with higher resolutions? I know some previous work use 416x128 for training, but recently most methods use higher resolutions and experiments have demonstrated higher resolutions lead to better results. Is it something related to the GPU memory issue?

Setup.py

Hi, what do you think about creating setup.py file to enable pip install git+...
Don't mind contributing if you find this helpful.

Thanks.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte

saver = tf.train.import_meta_graph(FLAGS.model_ckpt + '.meta')

File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1449, in import_meta_graph
**kwargs)[0]

File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1463, in _import_meta_graph_with_return_elements
meta_graph_def = meta_graph.read_meta_graph_file(meta_graph_or_file)

File "C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 695, in read_meta_graph_file
text_format.Merge(file_content.decode("utf-8"), meta_graph_def)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 1: invalid continuation byte

schema_guided_dst baseline code causes error when running on Cloud TPU

I use Tensorflow 1.14.

Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/dhlee347_estsoft_com/google-research/schema_guided_dst/baseline/train_and_predict.py", line 908, in
tf.compat.v1.app.run(main)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/dhlee347_estsoft_com/google-research/schema_guided_dst/baseline/train_and_predict.py", line 854, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
rendezvous.raise_errors()
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
saving_listeners)
File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1158, in _run
self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 487, in init
self._assert_fetchable(graph, fetch.op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 500, in _assert_fetchable
'Operation %r has been marked as not fetchable.' % op.name)
ValueError: Operation u'truediv' has been marked as not fetchable.
ERROR:tensorflow:Closing session due to error Step was cancelled by an explicit call to Session::Close().

In enas_lm model

In the paper, they said
"For the output, we simply average all the loose ends, i.e. the nodes that are not selected as inputs to any other nodes"

however in the code below

next_s = tf.add_n(layers[1:]) / tf.cast(num_layers, dtype=tf.float32)
all_s = all_s.write(step, next_s)

it seems like averaging all the outputs of the nodes..
Is this the right implementation?

google-research / google-research Goto Github PK

google-research's Introduction

Google Research

google-research's People

Contributors

Stargazers

Watchers

Forkers

google-research's Issues

Knowledge transfer during iterative sparsification

Knowledge reconstitution

Recommend Projects

Recommend Topics

Recommend Org