tensorpack / tensorpack

A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

License: Apache License 2.0

Python 99.82% Shell 0.18%

tensorflow deep-learning reinforcement-learning neural-networks machine-learning

tensorpack's Introduction

Tensorpack is a neural network training interface based on graph-mode TensorFlow.

Features:

It's Yet Another TF high-level API, with the following highlights:

Focus on training speed.

Speed comes for free with Tensorpack -- it uses TensorFlow in an efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack.
Scalable data-parallel multi-GPU / distributed training strategies are available off-the-shelf. See tensorpack/benchmarks for more benchmarks.

Squeeze the best data loading performance of Python with tensorpack.dataflow.

Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various autoparallelization strategies.

Focus on reproducible and flexible research:

Tensorpack is built and used by researchers, and we provide high-quality, reproducible implementations of papers.

It's not a model wrapper.

There are too many symbolic function wrappers already. Tensorpack includes only a few common layers. You can use any TF symbolic functions inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....

See the tutorials and documentation to learn more about these features.

Examples:

We refuse toy examples. Instead of showing tiny CNNs trained on MNIST/Cifar10, we provide training scripts that reproduce well-known papers.

We refuse low-quality implementations. Unlike most open source repos which only implement papers, Tensorpack examples faithfully reproduce papers, demonstrating its flexibility for actual research.

Vision:

Train ResNet and other models on ImageNet
Train Mask/Faster R-CNN on COCO object detection
Unsupervised learning with Momentum Contrast (MoCo)
Adversarial training with state-of-the-art robustness
Generative Adversarial Network (GAN) variants, including DCGAN, InfoGAN, Conditional GAN, WGAN, BEGAN, DiscoGAN, Image to Image, CycleGAN
DoReFa-Net: train binary / low-bitwidth CNN on ImageNet
Fully-convolutional Network for Holistically-Nested Edge Detection (HED)
Spatial Transformer Networks on MNIST addition
Visualize CNN saliency maps

Reinforcement Learning:

Deep Q-Network (DQN) variants on Atari games, including DQN, DoubleDQN, DuelingDQN.
Asynchronous Advantage Actor-Critic (A3C) with demos on OpenAI Gym

Speech / NLP:

Install:

Dependencies:

Python 3.3+.
Python bindings for OpenCV. (Optional, but required by a lot of features)
TensorFlow ≥ 1.5
- TF is not required if you only want to use tensorpack.dataflow as a data processing library
- When using TF2, tensorpack uses its TF1 compatibility mode. Note that a few examples in the repo are not yet migrated to support TF2.

pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to install to user's local directories

Please note that tensorpack is not yet stable. If you use tensorpack in your code, remember to pin the exact version of tensorpack you use in your dependencies.

Citing Tensorpack:

If you use Tensorpack in your research or wish to refer to the examples, please cite with:

@misc{wu2016tensorpack,
  title={Tensorpack},
  author={Wu, Yuxin and others},
  howpublished={\url{https://github.com/tensorpack/}},
  year={2016}
}


tensorpack's Issues

Could you teach me how to know the test accuracy?

Thank you for reading my issue.
Could you tell me how to get the test accuracy?
I can see the validation error, but I don't know how to test on new data that was not used in the training phase.

I am trying to use DoReFa-Net.

Quantize scheme for FPGA

This paper uses round((2^k-1)x)/(2^k-1) to quantize x, which may not be easy to represent on an FPGA. For example, if k=2, the quantized values are 0, 0.33, 0.67, 1, whereas an FPGA can only represent 0, 0.25, 0.5, 0.75 without using a lookup table.
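The mismatch described above can be checked numerically. The plain-Python sketch below (function names are illustrative, not tensorpack's) lists the outputs of the quoted quantizer next to unsigned k-bit fixed-point fractions. Each quantized value is an integer code i in {0, ..., 2^k - 1} divided by the constant 2^k - 1, so one possible hardware mapping stores i and folds 1/(2^k - 1) into a per-layer scale; whether that suits a given FPGA design is an assumption.

```python
def quantize_k(x: float, k: int) -> float:
    """DoReFa-style quantizer for x in [0, 1]: round((2^k - 1) * x) / (2^k - 1)."""
    n = 2 ** k - 1
    return round(n * x) / n


def integer_code(x: float, k: int) -> int:
    """The k-bit integer i such that quantize_k(x, k) == i / (2^k - 1)."""
    return round((2 ** k - 1) * x)


def dorefa_levels(k: int) -> list:
    """All values quantize_k can output: i / (2^k - 1) for i = 0 .. 2^k - 1."""
    n = 2 ** k - 1
    return [i / n for i in range(n + 1)]


def fixed_point_levels(k: int) -> list:
    """Unsigned k-bit fixed-point fractions with k fractional bits: i / 2^k."""
    return [i / 2 ** k for i in range(2 ** k)]


print([round(v, 2) for v in dorefa_levels(2)])  # [0.0, 0.33, 0.67, 1.0]
print(fixed_point_levels(2))                    # [0.0, 0.25, 0.5, 0.75]
```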

Equation (1) from DoReFa Net - XNOR-Bitcount equivalent for bitwise dot product

Hello, I'm hoping you could help me understand Equation (1) from the DoReFa-Net paper. You say that

the following equivalence computes the dot product of two bit vectors x and y:
$\sum_i x_i y_i = \mathrm{bitcount}(\mathrm{xnor}(x_i, y_i)), \quad x_i, y_i \in \{0, 1\} \ \forall i$

If we evaluate with two bit-vectors, a := {1, 1, 0} and b := {0, 1, 0} this equivalence does not seem to hold.

a := 1 1 0
b := 0 1 0
let c := xnor(a, b) == 0 1 1
bitcount(c) = 2
bit-wise dot product of (a, b) == 1 * 0 + 1 * 1 + 0 * 0 = 1
2 != 1

What am I misunderstanding? Thank you.
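The counterexample is correct for the equation as quoted. For bits in {0, 1}, the dot product equals bitcount(and(x, y)); the XNOR form counts the positions where the bits agree, which is the right quantity when each bit encodes a value in {-1, +1} (then the dot product is 2 * bitcount(xnor) - n, as in XNOR-Net-style binary kernels). A small plain-Python check of both identities on the vectors from the question (helper names are illustrative):

```python
def pack(bits):
    """Pack a list of 0/1 bits into an int, first element as the most significant bit."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def popcount(v: int) -> int:
    return bin(v).count("1")


def dot_01(a, b) -> int:
    """Dot product of two {0, 1} vectors via AND + bitcount."""
    return popcount(pack(a) & pack(b))


def dot_pm1(a, b) -> int:
    """Dot product when bit 1 encodes +1 and bit 0 encodes -1, via XNOR + bitcount."""
    n = len(a)
    mask = (1 << n) - 1                   # keep only the n valid bit positions
    xnor = ~(pack(a) ^ pack(b)) & mask
    return 2 * popcount(xnor) - n         # (#matches) - (#mismatches)


a, b = [1, 1, 0], [0, 1, 0]
print(popcount(~(pack(a) ^ pack(b)) & 0b111))  # 2, the bitcount(xnor) from the question
print(dot_01(a, b))                            # 1, the {0, 1} dot product
print(dot_pm1(a, b))                           # 1 == (+1)(-1) + (+1)(+1) + (-1)(-1)
```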

How to convert the trained tensorflow model to numpy format

How can I convert a trained TensorFlow model to the numpy format that you provided in the DoReFa demo? Thanks.

Exception gym.error.Error: Error('env has been garbage collected.

After running MsPacman-v0 for about an hour without problems, I get this error:

$ python run-atari.py --load MsPacman-v0.tfmodel --env MsPacman-v0

....
('Total:', 7500.0)
('Total:', 6770.0)
('Total:', 6970.0)
('Total:', 6110.0)
Exception gym.error.Error: Error('env has been garbage collected. To keep using a monitor, you must keep around a reference to the env object. (HINT: try assigning the env to a variable in your code.)',) in <bound method AtariEnv.__del__ of <gym.envs.atari.atari_env.AtariEnv object at 0x11c966450>> ignored
Exception gym.error.Error: Error('env has been garbage collected. To keep using a monitor, you must keep around a reference to the env object. (HINT: try assigning the env to a variable in your code.)',) in <bound method Monitor.__del__ of <gym.monitoring.monitor.Monitor object at 0x11c966410>> ignored

Training a model in Env Pong-v0 from scratch makes no progress

Hi, ppwwyyxx,

I am trying to use your code to learn A3C. Could you tell me how long it takes to train a model from scratch?

The command I use:
./train-atari.py --env Pong-v0 --gpu 0

I use one Tesla K40 and an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz. After 30 minutes of training, it still shows the following:

[1112 22:35:24 @concurrency.py:24] Starting EnqueueThread
[1112 22:35:24 @base.py:130] Start training with global_step=0
  0%|                                                               |0/6000[00:00<?,?it/s]

Is there anything wrong? Thanks a lot

A small question

Python 2

@ppwwyyxx awesome work, and thanks for sharing it. Do you know which parts really require Python 2, or how non-trivial it would be to adapt it to Python 3?

Examples no training progress

I set up tensorpack and I can run all the examples, but the training shows no progress at all. For CIFAR-10 and MNIST, the training and validation errors don't change at all over multiple epochs.

I use TensorFlow 0.9 and a GTX 1080. A hint of any kind would be helpful. I like the architecture of tensorpack and I would like to use it.

Quantize the weights to ternary (-1, 0, +1)?

I would like to quantize the weights to ternary values (-1, 0, +1), which could increase the diversity of the conv kernels without costing much more logic resources on an FPGA. I wrote it like this:

import tensorflow as tf
def ternarize(x):  # hypothetical wrapper name; the original snippet showed only the body
    G = tf.get_default_graph()
    E = tf.stop_gradient(tf.reduce_mean(tf.abs(x)))
    clip_x = tf.clip_by_value(x/E, -2.0, 2.0) / 2.0
    with G.gradient_override_map({"Floor": "Identity"}):
        return tf.round(clip_x) * E

However, it does not seem to work well. Is there a good approach to implement this?
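For reference, here is a plain-Python sketch of what the forward pass of the snippet above computes (an illustration, not a tensorpack API; `ternarize` is a hypothetical name). Because clip(x/E, -2, 2)/2 crosses +/-0.5 at |x| = E, weights with |x| > E become +/-E and the rest become 0. On the backward side, the straight-through trick only takes effect if the override key names the op that tf.round actually creates, whose type is "Round" (the key used by the DoReFa example's quantize function); a "Floor" key likely has no effect, so no straight-through gradient reaches x.

```python
from typing import List, Tuple


def ternarize(w: List[float]) -> Tuple[List[int], float]:
    """Forward pass only. Returns codes in {-1, 0, +1} and the scale E = mean(|w|).

    The quantized weights are code * E.
    """
    E = sum(abs(v) for v in w) / len(w)
    if E == 0.0:                                   # all-zero input; the TF snippet would divide by zero
        return [0] * len(w), 0.0
    codes = []
    for v in w:
        clip_x = max(-2.0, min(2.0, v / E)) / 2.0  # same as tf.clip_by_value(x/E, -2, 2) / 2
        codes.append(round(clip_x))                # Python's round and tf.round both round half to even
    return codes, E


codes, E = ternarize([0.1, -0.5, 2.0, -3.0, 0.0])
print(codes)  # [0, 0, 1, -1, 0]
```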

Missing license file

The license file is missing

A3C batch modification

I wonder what modifications to the original A3C algorithm you made in your Batch-A3C variant. Could you describe them in pseudocode?

resnet on imagenet

Not an issue per se but more of a question: could you provide ResNet configs (34, 50, 101, 1001, etc.) for training on the ImageNet dataset?

Thanks

Before and After Step Design

Hi,
Could you please allow access to "before_step" and "after_step"? Sometimes it is crucial to:

run something more often than once per epoch (or just with a different frequency)
run something explicitly BEFORE train_op (like some tensor statistics)
define timers etc. more easily (it is crucial to benchmark the times of train_op, other run ops, data loading, etc.)
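The requested per-step hooks can be sketched as a minimal training loop in plain Python. Everything here (`Callback`, `StepTimer`, `EveryNSteps`, `train`) is a hypothetical illustration of the design being asked for, not tensorpack's callback API:

```python
import time
from typing import Callable, List


class Callback:
    """Hypothetical callback interface with per-step hooks (not tensorpack's API)."""

    def before_step(self, step: int) -> None:
        pass

    def after_step(self, step: int) -> None:
        pass

    def after_epoch(self, epoch: int) -> None:
        pass


class StepTimer(Callback):
    """Measures the time between its before_step and after_step hooks (train_op plus any hooks in between)."""

    def __init__(self) -> None:
        self.durations: List[float] = []
        self._start = 0.0

    def before_step(self, step: int) -> None:
        self._start = time.perf_counter()

    def after_step(self, step: int) -> None:
        self.durations.append(time.perf_counter() - self._start)


class EveryNSteps(Callback):
    """Runs fn(step) every n steps, independently of the epoch boundary."""

    def __init__(self, n: int, fn: Callable[[int], None]) -> None:
        self.n, self.fn = n, fn

    def after_step(self, step: int) -> None:
        if (step + 1) % self.n == 0:
            self.fn(step)


def train(train_op: Callable[[], None], callbacks: List[Callback],
          steps_per_epoch: int, epochs: int) -> None:
    step = 0
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            for cb in callbacks:
                cb.before_step(step)  # e.g. compute tensor statistics before train_op
            train_op()
            for cb in callbacks:
                cb.after_step(step)
            step += 1
        for cb in callbacks:
            cb.after_epoch(epoch)


timer = StepTimer()
train(lambda: None, [timer, EveryNSteps(3, print)], steps_per_epoch=5, epochs=2)  # prints 2, 5, 8
```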

Equation (3) From DoReFa July v2 Paper - only defining a single fixed-point int, not a sequence?

Hello again, thanks for the updated paper - equation 1 now makes sense to me. Could you please help me understand your intent with equation 3?

You say the following:

However, it does not seem like x and y are sequences of integers, it seems like they are single integers. For example, x is some M-bitwidth fixed-point integer, and y is some K-bitwidth fixed point integer. The reason being, if they were sequences of multiple M-bitwidth (and K-bitwidth) integers, say p many M-bitwidth integers and q many K-bitwidth integers, then the bitwise dot product would need to iterate over these sequences.

Currently, the bitwise dot product as defined only executes MK many summations. There is no Σ for the p many M-bitwidth integers, nor for the q many K-bitwidth integers.

Also, you define x and y as summations of bits to varying powers of 2 (beginning part of the quote above). It seems like they could only represent a single fixed-point integer, not a sequence of fixed-point integers.

Please let me know if I am misunderstanding something, thank you.

EDIT: Two more points:
(1) I think we would also need the constraint that p == q, i.e. that the two sequences are of equivalent lengths, if they are to represent the dot product of two vectors.

(2) The bitcount operation inside the summation seems irrelevant. The bitcount of a bitwise and will be 1 iff the bitwise and evaluates to 1, and 0 otherwise. In other words, it is redundant with the and operation.

I feel I may be misunderstanding what your notation means. Please help me clarify, thank you.
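One reading that resolves these points, stated as an assumption: equation 3 has the form x · y = Σ_m Σ_k 2^(m+k) bitcount[and(c_m(x), c_k(y))], where x and y are equal-length vectors of unsigned M-bit and K-bit integers and c_m(x) is the bit vector formed by the m-th bit of every element of x. Under that reading, the sum over elements is hidden inside bitcount, so bitcount is not redundant. A plain-Python check (helper names are illustrative):

```python
from typing import List


def bit_plane(xs: List[int], m: int) -> int:
    """c_m(x): pack the m-th bit of every element of xs into one integer (element i -> bit i)."""
    packed = 0
    for i, v in enumerate(xs):
        packed |= ((v >> m) & 1) << i
    return packed


def bitwise_dot(xs: List[int], ys: List[int], M: int, K: int) -> int:
    """sum_i xs[i] * ys[i] for unsigned M-bit xs and K-bit ys, via AND + bitcount on bit planes."""
    assert len(xs) == len(ys), "the two sequences must have the same length"
    total = 0
    for m in range(M):
        cm = bit_plane(xs, m)
        for k in range(K):
            ck = bit_plane(ys, k)
            # bitcount counts every element i where bit m of x_i and bit k of y_i are both 1
            total += (1 << (m + k)) * bin(cm & ck).count("1")
    return total


print(bitwise_dot([3, 1, 2], [5, 7, 0], M=2, K=3))  # 22 == 3*5 + 1*7 + 2*0
```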

Fix performance degradation in multi-tower training.

When training large models with multiple towers, iteration speed can drop significantly after running for a while. Need to investigate this.

Quantizing Gradients - Meaning of max0() operator in DoReFa v2 paper?

Thank you for your help so far.

(1) In section 2.5 on quantizing gradients you use an operator called max₀ but do not define it. I did not find a definition in the XNOR or BNN papers either. What does this operator do? How is it different from the regular max() operator?

(2) You say that dr / 2max₀(|dr|) + 1/2 is an affine transform that maps the gradient into [0,1], but it seems that your code applies an additional step to manually clip the values. Why do you need this additional step?

Code: https://github.com/ppwwyyxx/tensorpack/blob/master/examples/DoReFa-Net/dorefa.py

def grad_fg(op, x):
    rank = x.get_shape().ndims
    assert rank is not None
    maxx = tf.reduce_max(tf.abs(x), list(range(1, rank)), keep_dims=True)
    x = x / maxx
    n = float(2**bitG - 1)
    x = x * 0.5 + 0.5 + tf.random_uniform(
        tf.shape(x), minval=-0.5/n, maxval=0.5/n)
    x = tf.clip_by_value(x, 0.0, 1.0)  # this is the extra step not in the paper
    x = quantize(x, bitG) - 0.5
    return x * maxx * 2

(3) I am also having trouble understanding this line; could you please explain it? maxx = tf.reduce_max(tf.abs(x), list(range(1,rank)), keep_dims=True)

It seems like list(range(1,rank)) is somehow related to your statement that "Here dr = ∂c/∂r is the back-propagated gradient of the output r of some layer, and the maximum is taken over all axis of the gradient tensor dr except for the mini-batch axis (therefore each instance in a mini-batch will have its own scaling factor)", but I do not understand this sentence either. Thank you for your help!
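For a gradient of shape (N, H, W, C), rank is 4 and list(range(1, rank)) is [1, 2, 3], so tf.reduce_max takes the maximum of |dr| over every axis except axis 0 (the mini-batch axis), and keep_dims=True keeps the result at shape (N, 1, 1, 1) so it broadcasts back; this appears to be what max₀ denotes in the quoted sentence. The added uniform noise in [-0.5/n, 0.5/n] can push values slightly outside [0, 1], which would explain the extra clip. A plain-Python sketch of grad_fg on a batch of flattened per-instance gradients (an illustration; the zero-gradient guard is an addition not in the TF code):

```python
import random
from typing import List


def quantize(x: float, k: int) -> float:
    n = 2 ** k - 1
    return round(x * n) / n


def quantize_grad(dr: List[List[float]], bit_g: int, rng: random.Random) -> List[List[float]]:
    """dr[b][j]: instance b of the mini-batch, j: all non-batch axes flattened."""
    n = float(2 ** bit_g - 1)
    out = []
    for row in dr:
        maxx = max(abs(v) for v in row)      # one scale per instance: max over non-batch axes
        if maxx == 0.0:                      # guard added here; the TF code would divide by zero
            out.append([0.0] * len(row))
            continue
        q_row = []
        for v in row:
            y = v / maxx * 0.5 + 0.5              # affine map from [-1, 1] into [0, 1]
            y += rng.uniform(-0.5 / n, 0.5 / n)   # noise for stochastic rounding
            y = min(max(y, 0.0), 1.0)             # noise can leave [0, 1], hence the clip
            q_row.append((quantize(y, bit_g) - 0.5) * maxx * 2)
        out.append(q_row)
    return out
```

With bit_g=2, every output value in a row is one of -m, -m/3, m/3, m, where m is that row's own maximum, so each instance in the mini-batch gets its own scaling factor.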

Multi Task Learning

Is multi-task learning supported? If not, which approach would fit this architecture? I am talking about a setup where a new task is chosen for each minibatch pass.

Thank you for any help you can provide.

I am making DoReFa Net for cifar image set.

Thank you for reading my issue.
I am now implementing DoReFa-Net for the CIFAR image dataset.
I would like to use ".BatchNorm()" in the source code, but an error occurred.
The error is this:

Traceback (most recent call last):
  File "cifar-dorefa.py", line 197, in <module>
    SimpleTrainer(config).train()
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 84, in train
    self.main_loop()
  File "/home/tomohiro/github/tensorpack/tensorpack/train/base.py", line 108, in main_loop
    callbacks.setup_graph(self) # TODO use weakref instead?
  File "/home/tomohiro/github/tensorpack/tensorpack/callbacks/base.py", line 52, in setup_graph
    self._setup_graph()
  File "/home/tomohiro/github/tensorpack/tensorpack/callbacks/group.py", line 126, in _setup_graph
    cb.setup_graph(self.trainer)
  File "/home/tomohiro/github/tensorpack/tensorpack/callbacks/base.py", line 52, in setup_graph
    self._setup_graph()
  File "/home/tomohiro/github/tensorpack/tensorpack/callbacks/inference.py", line 88, in _setup_graph
    input_names, self.output_tensors)
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 117, in get_predict_func
    return self.predictor_factory.get_predictor(input_names, output_names, 0)
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 42, in get_predictor
    self._build_predict_tower()
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 55, in _build_predict_tower
    self.model, self.towers, prefix=self.PREFIX)
  File "/home/tomohiro/github/tensorpack/tensorpack/predict/base.py", line 112, in build_multi_tower_prediction_graph
    model._build_graph(input_vars, False)
  File "cifar-dorefa.py", line 76, in _build_graph
    .BatchNorm('bn2')
  File "/home/tomohiro/github/tensorpack/tensorpack/models/__init__.py", line 53, in f
    ret = layer(name, self._t, *args, **kwargs)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/_common.py", line 54, in wrapped_func
    outputs = func(*args, **actual_args)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/batch_norm.py", line 70, in BatchNorm
    assert not use_local_stat
AssertionError

And here is my program:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# File: cifar-convnet.py
# Author: Yuxin Wu
import tensorflow as tf
import argparse
import numpy as np
import os

from tensorpack import *
import tensorpack.tfutils.symbolic_functions as symbf
from tensorpack.tfutils.summary import *
from dorefa import get_dorefa
"""
A small convnet model for Cifar10 or Cifar100 dataset.

Cifar10:
    90% validation accuracy after 40k step.
    91% accuracy after 80k step.
    19.3 step/s on Tesla M40

Not a good model for Cifar100, just for demonstration.
"""
BITW = 1
BITA = 2
BITG = 6
BATCH_SIZE = 32
class Model(ModelDesc):
    def __init__(self, cifar_classnum):
        super(Model, self).__init__()
        self.cifar_classnum = cifar_classnum

    def _get_input_vars(self):
        return [InputVar(tf.float32, [None, 30, 30, 3], 'input'),
                InputVar(tf.int32, [None], 'label')]

    def _build_graph(self, input_vars, is_training):

        image, label = input_vars
        image = image / 4.0     # just to make range smaller

        fw, fa, fg = get_dorefa(BITW, BITA, BITG)
        # monkey-patch tf.get_variable to apply fw
        old_get_variable = tf.get_variable
        def new_get_variable(name, shape=None, **kwargs):
            v = old_get_variable(name, shape, **kwargs)
            # don't binarize first and last layer
            if name != 'W' or 'conv0' in v.op.name or 'fct' in v.op.name:
                return v
            else:
                logger.info("Binarizing weight {}".format(v.op.name))
                return fw(v)
        tf.get_variable = new_get_variable

        def nonlin(x):
            if BITA == 32:
                return tf.nn.relu(x)    # still use relu for 32bit cases
            return tf.clip_by_value(x, 0.0, 1.0)

        def activate(x):
            return fa(nonlin(x))
        def cabs(x):
            return tf.minimum(1.0, tf.abs(x), name='cabs')

        keep_prob = tf.constant(0.5 if is_training else 1.0)

        if is_training:
            tf.image_summary("train_image", image, 10)

        with argscope(BatchNorm, decay=0.9, epsilon=1e-4), \
                argscope(FullyConnected, use_bias=False, nl=tf.identity), \
                argscope(Conv2D, nl=BNReLU(is_training), use_bias=False, kernel_shape=3):
            logits = LinearWrap(image) \
                    .Conv2D('conv1.1', out_channel=64)\
                    .Conv2D('conv1.2', out_channel=64) \
                    .BatchNorm('bn2')\
                    .apply(fg)\
                    .apply(activate)\
                    .MaxPooling('pool1', 3, stride=2, padding='SAME') \
                    .apply(activate)\
                    .Conv2D('conv2.1', out_channel=128)\
                    .apply(activate)\
                    .Conv2D('conv2.2', out_channel=128)\
                    .MaxPooling('pool2', 3, stride=2, padding='SAME') \
                    .apply(activate)\
                    .Conv2D('conv3.1', out_channel=128, padding='VALID') \
                    .apply(fg)\
                    .BatchNorm('bn1')\
                    .apply(activate)\
                    .Conv2D('conv3.2', out_channel=128, padding='VALID') \
                    .apply(activate)\
                    .FullyConnected('fc0', 1024 + 512,
                           b_init=tf.constant_initializer(0.1)) \
                    .tf.nn.dropout(keep_prob) \
                    .FullyConnected('fc1', 512,
                           b_init=tf.constant_initializer(0.1)) \
                    .apply(cabs)\
                    .FullyConnected('linear', out_dim=self.cifar_classnum, nl=tf.identity)()

        cost = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, label)
        cost = tf.reduce_mean(cost, name='cross_entropy_loss')
        tf.get_variable = old_get_variable
        prob = tf.nn.softmax(logits, name='output')

        # compute the number of failed samples, for ClassificationError to use at test time
        wrong = symbf.prediction_incorrect(logits, label)
        nr_wrong = tf.reduce_sum(wrong, name='wrong')
        # monitor training error
        add_moving_summary(tf.reduce_mean(wrong, name='train_error'))

        # weight decay on all W of fc layers
        wd_cost = tf.mul(0.004,
                         regularize_cost('fc.*/W', tf.nn.l2_loss),
                         name='regularize_loss')
        add_moving_summary(cost, wd_cost)

        add_param_summary([('.*/W', ['histogram'])])   # monitor W
        self.cost = tf.add_n([cost, wd_cost], name='cost')

def get_data(train_or_test, cifar_classnum):
    isTrain = train_or_test == 'train'
    if cifar_classnum == 10:
        ds = dataset.Cifar10(train_or_test)
    else:
        ds = dataset.Cifar100(train_or_test)
    if isTrain:
        augmentors = [
            imgaug.RandomCrop((30, 30)),
            imgaug.Flip(horiz=True),
            imgaug.Brightness(63),
            imgaug.Contrast((0.2,1.8)),
            imgaug.GaussianDeform(
                [(0.2, 0.2), (0.2, 0.8), (0.8,0.8), (0.8,0.2)],
                (30,30), 0.2, 3),
            imgaug.MeanVarianceNormalize(all_channel=True)
        ]
    else:
        augmentors = [
            imgaug.CenterCrop((30, 30)),
            imgaug.MeanVarianceNormalize(all_channel=True)
        ]
    ds = AugmentImageComponent(ds, augmentors)
    ds = BatchData(ds, 128, remainder=not isTrain)
    if isTrain:
        ds = PrefetchData(ds, 3, 2)
    return ds
def get_config(cifar_classnum):
    logger.auto_set_dir()

    # prepare dataset
    dataset_train = get_data('train', cifar_classnum)
    step_per_epoch = dataset_train.size()
    dataset_test = get_data('test', cifar_classnum)

    sess_config = get_default_sess_config(0.5)

    nr_gpu = get_nr_gpu()
    lr = tf.train.exponential_decay(
        learning_rate=1e-2,
        global_step=get_global_step_var(),
        decay_steps=step_per_epoch * (30 if nr_gpu == 1 else 20),
        decay_rate=0.5, staircase=True, name='learning_rate')
    tf.scalar_summary('learning_rate', lr)

    return TrainConfig(
        dataset=dataset_train,
        optimizer=tf.train.AdamOptimizer(lr, epsilon=1e-3),
        callbacks=Callbacks([
            StatPrinter(),
            ModelSaver(),
            InferenceRunner(dataset_train, ClassificationError())  # TODO: replace with dataset_test
        ]),
        session_config=sess_config,
        model=Model(cifar_classnum),
        step_per_epoch=step_per_epoch,
        max_epoch=250,
    )

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', help='comma separated list of GPU(s) to use.') # nargs='*' in multi mode
    parser.add_argument('--load', help='load model')
    parser.add_argument('--classnum', help='10 for cifar10 or 100 for cifar100',
                        type=int, default=10)
    args = parser.parse_args()

    if args.gpu:
        os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
    else:
        os.environ['CUDA_VISIBLE_DEVICES'] = '0'

    with tf.Graph().as_default():
        config = get_config(args.classnum)
        if args.load:
            config.session_init = SaverRestore(args.load)
        if args.gpu:
            config.nr_tower = len(args.gpu.split(','))
        #QueueInputTrainer(config).train()
        SimpleTrainer(config).train()

I changed the with statement to "with argscope(BatchNorm, decay=0.9, epsilon=1e-4, use_local_stat=is_training), ", and another error occurred:

Traceback (most recent call last):
  File "cifar-dorefa.py", line 204, in <module>
    QueueInputTrainer(config).train()
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 222, in train
    grads = self._single_tower_grad()
  File "/home/tomohiro/github/tensorpack/tensorpack/train/trainer.py", line 204, in _single_tower_grad
    self.model.build_graph(self.dequed_inputs, True)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/model_desc.py", line 60, in build_graph
    self._build_graph(model_inputs, is_training)
  File "cifar-dorefa.py", line 79, in _build_graph
    .Conv2D('conv1.1', out_channel=64)\
  File "/home/tomohiro/github/tensorpack/tensorpack/models/__init__.py", line 53, in f
    ret = layer(name, self._t, *args, **kwargs)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/_common.py", line 54, in wrapped_func
    outputs = func(*args, **actual_args)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/conv2d.py", line 62, in Conv2D
    return nl(tf.nn.bias_add(conv, b) if use_bias else conv, name='output')
  File "/home/tomohiro/github/tensorpack/tensorpack/models/nonlin.py", line 74, in BNReLU
    x = BatchNorm('bn', x, is_training, **kwargs)
  File "/home/tomohiro/github/tensorpack/tensorpack/models/_common.py", line 54, in wrapped_func
    outputs = func(*args, **actual_args)
TypeError: BatchNorm() got multiple values for keyword argument 'use_local_stat'

Sorry for the very long post, but once my code is complete, I can contribute it to your repository.

Image Augmentors

Error when enabling float64 in training and inference

I want to enable float64 in training and inference. I only changed the input type from float32 to float64, but I got the following error. What is wrong? I checked the TF documentation, and it should support float64.

Input 'filter' of 'Conv2D' Op has type float32 that does not match type float64 of argument 'input'

batch A3C?

Hi Yuxin,

In your A3C implementation for Atari game you mentioned that you "used a modified version where each batch contains transitions from different simulators, which I called "Batch-A3C"" (https://github.com/ppwwyyxx/tensorpack/tree/master/examples/Atari2600), could you give me more detailed information on this? And does it improve the performance of original A3C? Thanks!

Error on custom gym-env with train-atari.py

I've created my own gym environment and am trying to use your code on it. The state is an 84x84x3 image, and the actions are discrete. Everything runs fine until the graph starts running, and then I get the following error.

File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 894, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (2, 84, 84, 16) for Tensor u'state:0', which has shape '(?, 84, 84, 12)'

Have you seen anything like this in your development, or do you have any clue as to what is going wrong?

Thanks so much. This is a really fantastic package.

Half VGG

Hi guys!
At CVPR 2016 I saw your demo of a real-time network that segments people and works on mobile. Are you planning to open-source the model and code for it?
Thank you

Using bitwise convolutions with negative weights?

Could not reopen the issue, please see here for more context: #27 (comment)

The signed-vs-unsigned problem is more relevant in FPGA. But as we are only doing summation, unsigned numbers should be fine.

How do you implement this bitwise dot product kernel (equation 3 in section 2.1, DoReFa v2 paper) for negative weights?

The quantize_k function defined in Section 2.2 as Equation 5 outputs a number r_o ∈ [0, 1]. The affine transform on F_w^k(r_i) in Equation 9 takes the output of a quantize_k function and multiplies by 2 and subtracts 1: F_w^k(r_i) = 2 * quantize_k(stuff) - 1.

Thus F_w^k(r_i) ∈ [-1, 1].

However, the procedure you define in Equation 3 only works for unsigned values. If some values x_i in the sequence x or y_i in the sequence y are negative, then their contribution to the dot product is a subtraction, not an addition, so the simple bitcount(and()) operation no longer suffices.

How did you change the bitwise dot product procedure to account for negative weights?

One possibility:

1. Add an additional sign bit to all M-bit fixed-point integers x_i ∈ x and all K-bit fixed-point integers y_i ∈ y.
2. This bit is 1 if the number x_i is negative, and 0 if x_i is positive (likewise for the y_i), but does not count as a place-value bit for multiplication.
3. Let bitwise_and_{(m, k)} = and(c_m(x), c_k(y)), ∀(m, k), ignoring the sign bits.
4. Let bitwise_sign = xor(x_{i_{signed bit}}, y_{i_{signed bit}}). This gives us the sign of the product of x_i and y_i.
5. For each bitwise_and_{(m, k)}, ∀(m, k), note that bitwise_sign_{(x_i, y_i)} is a vector giving the sign for each element in bitwise_and_{(m, k)}.
6. For each pair of vectors ( bitwise_and_{(m, k)}, bitwise_sign_{(x_i, y_i)} ) ∀(m,k), drop all members of bitwise_and_{(m, k)} and their corresponding signs in bitwise_sign_{(x_i, y_i)} where bitwise_and_{(m, k)} == 0. This leaves us with the cases where the bitwise multiplication produced a 1, along with their signs.
7. For each pair of vectors ( bitwise_and_{(m, k)}, bitwise_sign_{(x_i, y_i)} ) ∀(m,k), compute bitcount[ bitwise_sign_{(x_i, y_i)} ] to get the total number of negatives for the (m*k) place-value. The total number of positives is given by len(bitwise_sign_{(x_i, y_i)}) - bitcount[ bitwise_sign_{(x_i, y_i)} ].
8. Use the negative and positive accumulations from step 7 to get the signed contribution to the dot product.
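An alternative that avoids sign bits, offered as an illustration rather than as the authors' implementation: since F_w^k(r_i) = 2 * quantize_k(stuff) - 1, write each weight as w_i = 2 W_i / (2^kw - 1) - 1 with an unsigned integer code W_i, and each activation as a_i = A_i / (2^ka - 1) (kw, ka, W_i, A_i are names introduced here). Then Σ_i w_i a_i = [2/(2^kw - 1) * Σ_i W_i A_i - Σ_i A_i] / (2^ka - 1), so only an unsigned bitwise dot product and a plain sum of the activation codes are needed, which seems consistent with the remark that unsigned numbers should be fine for summation.

```python
from typing import List


def unsigned_bitwise_dot(xs: List[int], ys: List[int], M: int, K: int) -> int:
    """Dot product of unsigned M-bit codes xs and K-bit codes ys via AND + bitcount on bit planes."""
    total = 0
    for m in range(M):
        cm = sum(((x >> m) & 1) << i for i, x in enumerate(xs))      # bit plane m of xs
        for k in range(K):
            ck = sum(((y >> k) & 1) << i for i, y in enumerate(ys))  # bit plane k of ys
            total += (1 << (m + k)) * bin(cm & ck).count("1")
    return total


def signed_weight_dot(w_codes: List[int], a_codes: List[int], kw: int, ka: int) -> float:
    """sum_i w_i * a_i, with w_i = 2*W_i/(2^kw - 1) - 1 in [-1, 1] and a_i = A_i/(2^ka - 1) in [0, 1]."""
    nw, na = 2 ** kw - 1, 2 ** ka - 1
    wa = unsigned_bitwise_dot(w_codes, a_codes, kw, ka)  # sum_i W_i * A_i, unsigned kernel only
    sa = sum(a_codes)                                    # sum_i A_i, independent of the weights
    return (2 * wa / nw - sa) / na


# weights [1, 1, -1, -1/3], activations [1/3, 2/3, 1, 1]
print(signed_weight_dot([3, 3, 0, 1], [1, 2, 3, 3], kw=2, ka=2))  # about -0.3333
```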

Multi-Task Learning

In reference to #29.

I am also interested in implementing a multi-task learning model using tensorpack - similar to the "alternating training" example in https://jg8610.github.io/Multi-Task/. In this example, there are 2 different datasets for 2 tasks, and we want to train a model which uses 1 shared layer, and 1 task-specific layer for each task. While the example using plain tensorflow is clear, I am not sure what the best approach is using tensorpack.

If we generate two different DataFlow objects, each generating data for a different task, how would you use SyncMultiGPUTrainer with two different DataFlow objects? Since this trainer uses QueueInputTrainer, is it possible to enqueue two different DataFlow objects?

Also, for the task-specific layers, would it be better to use separate layers with separate cost functions (as in the example) or a single layer with selectable weights? How would you tell the trainer to select the appropriate weights during training?

Any help / advice is appreciated.
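One tensorpack-independent way to get the "new task per minibatch" behaviour is to merge the per-task streams into a single stream whose elements carry a task tag; the model could then use that tag (for example, fed as an extra input) to select the task-specific layer and cost. The generator below is a hypothetical helper (`mix_tasks` is not a tensorpack function) that sketches only the data side:

```python
import itertools
import random
from typing import Dict, Iterable, Iterator, Optional, Tuple


def mix_tasks(streams: Dict[str, Iterable], mode: str = "alternate",
              seed: Optional[int] = None) -> Iterator[Tuple[str, object]]:
    """Yield (task_name, batch) pairs drawn from several per-task batch streams.

    mode="alternate" cycles through the tasks in order (alternating training);
    mode="random" picks a task independently for every minibatch.
    Stops as soon as the chosen task's stream is exhausted.
    """
    iters = {name: iter(s) for name, s in streams.items()}
    names = list(iters)
    if mode == "alternate":
        order = itertools.cycle(names)
    elif mode == "random":
        rng = random.Random(seed)
        order = (rng.choice(names) for _ in itertools.count())
    else:
        raise ValueError("mode must be 'alternate' or 'random'")
    for name in order:
        try:
            batch = next(iters[name])
        except StopIteration:
            return
        yield name, batch


print(list(mix_tasks({"task_a": [1, 2, 3], "task_b": [10, 20]})))
# [('task_a', 1), ('task_b', 10), ('task_a', 2), ('task_b', 20), ('task_a', 3)]
```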

DoReFa Classification Error

When running the DoReFa alexnet-126.py classification example on a single image, I encounter the following error:

File "./alexnet-dorefa.py", line 304, in <module>
    run_image(Model(), ParamRestore(np.load(args.load).item()), args.run)
...
tensorflow.python.framework.errors.InvalidArgumentError: AttrValue must not have reference type value of float_ref
     for attr 'dtype'
    ; NodeDef: Placeholder = Placeholder[dtype=DT_FLOAT_REF, shape=[], _device="/device:CPU:0"](); Op<name=Placeholder; signature= -> output:dtype; attr=dtype:type; attr=shape:shape,default=[]>

I am running Python 2.7, and this issue occurs on both CPU and GPU. Any idea what could be causing this?

Gradient Update Step - DoReFa v2 Paper

Thank you for your help. I have three questions about the procedure for updating weights.

(1) When you initialize the weights on the first run of the neural net, do you initialize them to low-bitwidth samples from the normal distribution, or to full-precision values?
(2) What bitwidth does the update step for the weights use? In the algorithm on page 6, it looks like W_k^{t+1} = Update(W_k, gW_k, η) operates on full-precision weights. Why not use the quantized weights instead?
(3) How do you calculate ∂W_k^b / ∂W_k? This should be the partial derivative of the quantized weights with respect to the full-precision weights, but I do not know how you calculate it.

Thank you.
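To make questions (2) and (3) concrete, here is a toy sketch of the scheme the page-6 algorithm appears to describe, using 1-bit weights for simplicity: the full-precision W is kept and updated, the forward pass uses W^b = sign(W/E) * E with E = mean(|W|) treated as a constant, and ∂W^b/∂W is replaced by 1 through a straight-through estimator (the same idea as overriding a quantizer's gradient with Identity). This is plain Python with a made-up squared-error loss, not the paper's code, and it does not address weight initialization.

```python
from typing import List


def binarize(w: List[float]) -> List[float]:
    """Forward pass: W^b = sign(W / E) * E, with E = mean(|W|) treated as a constant."""
    E = sum(abs(v) for v in w) / len(w)
    if E == 0.0:
        return [0.0] * len(w)
    return [((v / E > 0) - (v / E < 0)) * E for v in w]


def train_step(w: List[float], target: List[float], lr: float) -> List[float]:
    """One SGD step on the toy loss L = sum_i (W^b_i - target_i)^2."""
    wb = binarize(w)                                     # quantized weights feed the forward pass
    grad_wb = [2 * (b - t) for b, t in zip(wb, target)]  # dL/dW^b
    # Straight-through estimator: treat sign() as identity, so d(sign(W/E) * E)/dW = (1/E) * E = 1
    grad_w = grad_wb
    return [v - lr * g for v, g in zip(w, grad_w)]       # update the full-precision weights


w = [0.5, -0.2, 0.1]       # full-precision weights (arbitrary toy initialization)
target = [-1.0, 1.0, 1.0]
for _ in range(50):
    w = train_step(w, target, lr=0.1)
print(binarize(w))         # approaches [-1.0, 1.0, 1.0]
```

Keeping the full-precision copy matters here: the individual updates are much smaller than the gap between quantization levels, so applying them directly to the quantized weights would mostly round them away.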

Is there a bug in imgaug

Hi, I have recently been using your great tensorpack, but I think there may be a bug in the imgaug part.
The code below just flips a dataflow, and the result is rather confusing. Perhaps you can try it and see the result?

%matplotlib inline
import matplotlib.pyplot as plt
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import *
from tensorpack.tfutils.summary import *

cifar10 = dataset.Cifar10('train', shuffle=False)
cifar10.reset_state()
for img_label in cifar10.get_data():
    plt.figure()
    plt.imshow(img_label[0])
    break

flip_cifar10 = AugmentImageComponent(cifar10, [imgaug.Flip(horiz=True, prob=1.0),])
flip_cifar10.reset_state()
for img_label in flip_cifar10.get_data():
    plt.figure()
    plt.imshow(img_label[0])
    break

hope to see your doc...

Interesting work. I hope you can write a more detailed README so that we can contribute.

I tried the pretrained AlexNet model, but it does not work.

I tried to run:
./alexnet-dorefa.py --load alexnet-126.npy --run a.jpg --dorefa 1,2,6

but taceback occured.

Traceback (most recent call last):
  File "./alexnet-dorefa.py", line 305, in <module>
    run_image(Model(), ParamRestore(np.load(args.load, encoding='latin1').item()), args.run)
  File "./alexnet-dorefa.py", line 256, in run_image
    meta = dataset.ILSVRCMeta()
  File "/home/sounansu/work/tensorflow/tensorpack/tensorpack/dataflow/dataset/ilsvrc.py", line 34, in __init__
    self.caffepb = get_caffe_pb()
  File "/home/sounansu/work/tensorflow/tensorpack/tensorpack/utils/loadcaffe.py", line 83, in get_caffe_pb
    proto_path = download(CAFFE_PROTO_URL, dir)
  File "/home/sounansu/work/tensorflow/tensorpack/tensorpack/utils/fs.py", line 39, in download
    logger.error("Failed to download {}".format(url))
NameError: global name 'logger' is not defined

Please advise!
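Reading the traceback, the NameError is raised while logging a failed download of CAFFE_PROTO_URL: fs.py calls logger.error without logger being defined in that module, so the real download failure is masked. The general fix is to make a logger available at module level. A minimal stdlib-only sketch of that pattern follows; the download() function here is hypothetical and uses Python's logging module rather than tensorpack's own logger.

```python
import logging
import urllib.request

# Defined once at module level, so every function in this module can use it.
logger = logging.getLogger(__name__)


def download(url, path):
    """Download `url` to `path`; log and re-raise on failure so the original
    error is not hidden behind a NameError."""
    try:
        urllib.request.urlretrieve(url, path)
    except OSError:  # urllib's URLError is a subclass of OSError
        logger.error("Failed to download {}".format(url))
        raise
    return path
```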

Get the output of each layer

Is there a way to get the output of each layer?

Thanks!
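In graph-mode TensorFlow, each layer's output is an ordinary tensor in the graph, so it can be fetched alongside the final output (for example by passing several tensors to Session.run). The framework-agnostic idea, sketched in pure Python with hypothetical layer functions, is to keep a named reference to every intermediate result during the forward pass:

```python
def forward_with_intermediates(layers, x):
    """Run `layers` (a list of (name, function) pairs) in order and return the
    final output plus a dict mapping each layer name to its output."""
    outputs = {}
    for name, fn in layers:
        x = fn(x)
        outputs[name] = x
    return x, outputs


# Hypothetical three-"layer" model on a list of numbers.
layers = [
    ("fc0", lambda v: [2 * a + 1 for a in v]),
    ("relu0", lambda v: [max(a, 0) for a in v]),
    ("sum", lambda v: sum(v)),
]
final, per_layer = forward_with_intermediates(layers, [-3, 0, 2])
print(final)      # 6
print(per_layer)  # {'fc0': [-5, 1, 5], 'relu0': [0, 1, 5], 'sum': 6}
```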

Error with only 1 gpu

I got an error when I tried to train a model with only 1 GPU; I'm not sure whether the error goes away with 2.

Here is the command line:
python2.7 train-atari.py --env Breakout-v0 --gpu 0
Here is the error message:
Traceback (most recent call last):
  File "train-atari.py", line 258, in <module>
    AsyncMultiGPUTrainer(config, predict_tower=predict_tower).train()
TypeError: Can't instantiate abstract class AsyncMultiGPUTrainer with abstract methods get_predict_func

Thanks for any help with this. It looks like a great package. I was able to get run_atari.py to run and got good output movies.
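The TypeError is standard Python behaviour for abstract base classes: a class cannot be instantiated while any method marked @abstractmethod (here get_predict_func) is still unimplemented in its hierarchy. A minimal sketch with hypothetical class names (Python 3 syntax) reproduces the error and shows that defining the method removes it:

```python
from abc import ABC, abstractmethod


class BaseTrainer(ABC):
    @abstractmethod
    def get_predict_func(self):
        """Subclasses must provide this before they can be instantiated."""


class IncompleteTrainer(BaseTrainer):
    pass  # get_predict_func is still abstract


class CompleteTrainer(BaseTrainer):
    def get_predict_func(self):
        return lambda x: x  # placeholder predictor


try:
    IncompleteTrainer()
except TypeError as e:
    print(e)  # "Can't instantiate abstract class IncompleteTrainer ..."

print(CompleteTrainer().get_predict_func()(3))  # 3
```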

protoc error

Which version of protoc is required by tensorpack/tensorpack/utils/loadcaffe.py line 119?
I have 2.6.1, and I get this error:
AssertionError: caffe proto compilation failed! Did you install protoc?

I am on Ubuntu 16.04, running CUDA 8 and CUDNN 5.
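Since the assertion only reports that compilation failed, a first step is to confirm which protoc the script actually picks up from PATH and what version it reports. A small stdlib-only sketch with hypothetical helper names (`protoc --version` normally prints something like `libprotoc 2.6.1`):

```python
import re
import shutil
import subprocess


def parse_protoc_version(text):
    """Turn output such as 'libprotoc 2.6.1' into a tuple like (2, 6, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised protoc version string: %r" % text)
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def installed_protoc_version():
    """Return (path, version tuple) for the protoc found on PATH, or None."""
    exe = shutil.which("protoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=True)
    return exe, parse_protoc_version(result.stdout or result.stderr)


print(installed_protoc_version())
```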

pooling layers

Hi,

Do you have any idea whether removing the pooling layers would affect the training results and convergence rate for reinforcement learning on Atari games?

Thanks!

DoReFa for resnet/inception on ImageNet

Do you have implementations of DoReFa for ResNet and/or Inception on ImageNet?
If not, do you have any pointers on how to get started?

Bug in inference get_output_tensors method

The _get_output_vars method is not defined; it should be _get_output_tensors.
InferenceRunner should call "vc.get_output_tensors()" instead of "vc._get_output_tensors()" in line 94.
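The report describes a common pattern: a public method that callers use, delegating to a private hook that subclasses override, where the bug is a name mismatch between the two sides. A minimal sketch of the consistent version, with hypothetical class names:

```python
class Inferencer:
    """Base class: callers use the public method; subclasses override the hook."""

    def get_output_tensors(self):
        # Public entry point, e.g. what an inference runner would call.
        return self._get_output_tensors()

    def _get_output_tensors(self):
        # Hook with the same name that subclasses are expected to override.
        raise NotImplementedError


class CostInferencer(Inferencer):
    def _get_output_tensors(self):
        return ["cost"]


print(CostInferencer().get_output_tensors())  # ['cost']
```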

Rookie Here, Could Someone Please Explain How to Run This? I'm Easy. All Modules are loaded

It's not you, tensorpack. It's me.
I'm 100% ready to go with everything installed; I just don't know how to write the code that needs to be written here.
I've installed gym (the full version).
I have every program and every module set up in my virtualenv, but I'm not sure what to write in run-atari.py and train-atari.py. I'd like to see this play Breakout, so I have the Breakout-v0.tfmodel file in my folder with the modules. There are also different algorithms for the games, like this one:
https://gym.openai.com/evaluations/eval_L55gczPrQJamMGihq9tzA
I'd appreciate it if someone could tell me where this code goes as well.

I've run SpaceInvaders-v0 from a tutorial; its repo is here:
https://github.com/llSourcell/Game-AI
If there's an easy way to swap SpaceInvaders for another game and algorithm in this setup, that works for me too.

I've installed additional modules, so, just to make it clear: could someone spell out, in specific code, what needs to happen to deploy
run-atari.py and train-atari.py, have them play this file (Breakout-v0.tfmodel),
and where this algorithm goes? https://gym.openai.com/evaluations/eval_L55gczPrQJamMGihq9tzA

Thank You!

GPU memory cost

Could you tell me the GPU memory cost and batch size you used when training ResNet-101 and ResNet-152?

No GPU version?

Is there a version that works without having any GPU?

DoReFa accuracy

Question on accuracy of DoReFa Alexnet on Imagenet dataset.

With "--dorefa 1,2,6", I am getting train-error-top1: 0.51935, val-error-top1: 0.30192 and train-error-top5: 0.26953.

This is better than the top-1 single-crop validation error of 51% mentioned in the comments in alexnet-dorefa.py. Are the numbers I am seeing expected, or am I getting garbage? The numbers above are at 48 epochs on a machine with two Titan X (Pascal) GPUs.

Thanks.

Image segmentation code available?

Hello,

Thank you very much for sharing your code; it is very impressive! I am now using ResNet for an implementation and am wondering whether you already have image segmentation code or examples. If you do, could you let me know where to find them?

I appreciate your help!

Superclass __init__ method not called for some image augmentors

Several image augmentors (RandomCrop, RandomCropRandomShape, ...) do not call the superclass __init__ method, so self.rng is not set until reset is explicitly called.
This makes using them in standalone mode infeasible: it fails with an error saying that self.rng is not defined.
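The failure mode is generic Python: attributes that a base class sets in __init__ do not exist on instances of a subclass whose own __init__ never calls super().__init__(). A minimal sketch with hypothetical classes, where random.Random stands in for whatever RNG the real augmentors use:

```python
import random


class Augmentor:
    def __init__(self):
        self.reset_state()

    def reset_state(self):
        # Hypothetical stand-in for the base class's RNG setup.
        self.rng = random.Random()


class BrokenCrop(Augmentor):
    def __init__(self, size):
        self.size = size  # super().__init__() is never called, so no self.rng


class FixedCrop(Augmentor):
    def __init__(self, size):
        super().__init__()  # base-class initialisation sets self.rng
        self.size = size


print(hasattr(BrokenCrop(32), "rng"))  # False
print(hasattr(FixedCrop(32), "rng"))   # True
```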

Numerical difference between TensorFlow and NumPy

I wrote my own conv and fc layers in NumPy to compare data against my FPGA implementation. However, I found that the results from TensorFlow in your framework and from my NumPy implementation differ slightly. For example, for the fc layer, given the same input, the two outputs keep a ratio of 1.0165, and I don't know where this comes from. My guess is tf.reduce_mean, which may differ slightly from NumPy's mean function. How can I get the E value of each layer at inference time so that I can check it? Or what other reason do you think could cause this difference? Thanks.
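One way to judge whether reduce_mean precision alone could explain the gap is to emulate a float32 mean and compare it with an exact one. The stdlib-only sketch below uses hypothetical helpers, rounds every intermediate result to float32 via struct, and sums left to right (real kernels may use a different summation order):

```python
import math
import random
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_f32(xs):
    """Mean using float32 arithmetic, summing left to right."""
    acc = 0.0
    for x in xs:
        acc = to_f32(acc + to_f32(x))
    return to_f32(acc / len(xs))


def mean_exact(xs):
    """Reference mean with an exactly rounded sum."""
    return math.fsum(xs) / len(xs)


random.seed(0)
xs = [random.random() for _ in range(1000)]
rel = abs(mean_f32(xs) - mean_exact(xs)) / mean_exact(xs)
print(rel)  # for 1000 nonnegative values the worst case is below 1e-4
```

A relative error of that order is at least two orders of magnitude smaller than a constant ratio of 1.0165 (a 1.65% gap), which suggests a systematic difference between the two implementations, such as a scaling factor applied in one but not the other, rather than floating-point rounding in the mean.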

[25 17:32:54 [email protected]:tensorpack] Found cifar10 data in /home/eli/Downloads/tensorpack-master/tensorpack/dataflow/dataset/cifar10_data.
[25 17:32:55 [email protected]:tensorpack] Found cifar10 data in /home/eli/Downloads/tensorpack-master/tensorpack/dataflow/dataset/cifar10_data.
Traceback (most recent call last):
  File "cifar10-resnet.py", line 196, in <module>
    QueueInputTrainer(config).train()
  File "/home/eli/Downloads/tensorpack-master/tensorpack/train/trainer.py", line 185, in train
    grads = self._single_tower_grad()
  File "/home/eli/Downloads/tensorpack-master/tensorpack/train/trainer.py", line 134, in _single_tower_grad
    cost_var = self.model.get_cost(model_inputs, is_training=True)
  File "/home/eli/Downloads/tensorpack-master/tensorpack/models/model_desc.py", line 53, in get_cost
    return self._get_cost(input_vars, is_training)
  File "cifar10-resnet.py", line 83, in _get_cost
    l = conv('conv0', image, 16, 1)
  File "cifar10-resnet.py", line 52, in conv
    W_init=tf.random_normal_initializer(stddev=np.sqrt(2.0/9/channel)))
  File "", line 2, in Conv2D
TypeError: wrapper() takes exactly 1 argument (11 given)
[25 17:32:56 [email protected]:tensorpack] Prefetch process exiting...
[25 17:32:56 [email protected]:tensorpack] Prefetch process exited.

Async Training

Google Drive link is dead

Your Google Drive link to the pretrained models is dead:
https://drive.google.com/a/%20megvii.com/folderview?id=0B308TeQzmFDLa0xOeVQwcXg1ZjQ
Please re-upload them. Thanks in advance.