
benchmarks's Introduction

Tensorpack

Tensorpack is a neural network training interface based on graph-mode TensorFlow.


Features:

It's Yet Another TF high-level API, with the following highlights:

  1. Focus on training speed.
  • Speed comes for free with Tensorpack -- it uses TensorFlow in an efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack.

  • Scalable data-parallel multi-GPU / distributed training strategies are available off-the-shelf. See tensorpack/benchmarks for more benchmarks.

  2. Squeeze the best data loading performance out of Python with tensorpack.dataflow.
  • Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various auto-parallelization strategies (see the sketch just after this list).
  3. Focus on reproducible and flexible research.
  4. It's not a model wrapper.
  • There are too many symbolic function wrappers already. Tensorpack includes only a few common layers. You can use any TF symbolic functions inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....
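For the data-loading point above, here is a minimal sketch of a pure-Python DataFlow pipeline (no TensorFlow is required for this part). It is not taken from the repo: the image paths and the decode/augment function are placeholders, and exact class names may differ slightly between tensorpack versions.

import cv2
from tensorpack.dataflow import DataFromList, BatchData, MultiProcessMapDataZMQ

def load_and_augment(dp):
    # placeholder decode/augment step; dp is a datapoint, here [path]
    img = cv2.imread(dp[0])
    return [cv2.resize(img, (224, 224))]

# hypothetical list of image paths; each datapoint is a list of components
paths = [['/path/to/images/%06d.jpg' % i] for i in range(1000)]

df = DataFromList(paths, shuffle=True)                 # yields one datapoint at a time
df = MultiProcessMapDataZMQ(df, 8, load_and_augment)   # run the map in 8 worker processes
df = BatchData(df, 32)                                 # stack 32 datapoints into one batch

df.reset_state()                                       # call once before iterating
for dp in df:
    images = dp[0]   # a (32, 224, 224, 3) numpy array, usable by any training loop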

See the tutorials and documentation to learn more about these features.
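To illustrate how these pieces fit together, here is a minimal sketch of the training interface. It is not one of the repo's examples: the network is a throwaway placeholder built with plain tf.layers calls (any TF symbolic functions would do), and it assumes a recent tensorpack where inputs() is declared with tf.TensorSpec (older versions use InputDesc).

import tensorflow as tf
from tensorpack import ModelDesc, TrainConfig, SimpleTrainer, launch_train_with_config

class Model(ModelDesc):
    def inputs(self):
        # declare the input signature; tensorpack creates the placeholders
        return [tf.TensorSpec([None, 224, 224, 3], tf.float32, 'image'),
                tf.TensorSpec([None], tf.int64, 'label')]

    def build_graph(self, image, label):
        # any TF symbolic functions can be used here: tf.layers, slim, Keras, ...
        l = tf.layers.conv2d(image, 32, 3, activation=tf.nn.relu)
        l = tf.reduce_mean(l, axis=[1, 2])            # global average pooling
        logits = tf.layers.dense(l, 1000)
        return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)

    def optimizer(self):
        return tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)

# df = a DataFlow such as the one sketched earlier
# launch_train_with_config(TrainConfig(model=Model(), dataflow=df, max_epoch=100),
#                          SimpleTrainer())
# For data-parallel multi-GPU training, replace SimpleTrainer() with e.g.
# SyncMultiGPUTrainerReplicated(8).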

Examples:

We refuse toy examples. Instead of showing tiny CNNs trained on MNIST/Cifar10, we provide training scripts that reproduce well-known papers.

We refuse low-quality implementations. Unlike most open source repos which only implement papers, Tensorpack examples faithfully reproduce papers, demonstrating its flexibility for actual research.

Vision:

Reinforcement Learning:

Speech / NLP:

Install:

Dependencies:

  • Python 3.3+.
  • Python bindings for OpenCV. (Optional, but required by a lot of features)
  • TensorFlow ≥ 1.5
    • TF is not required if you only want to use tensorpack.dataflow alone as a data processing library.
    • When using TF2, tensorpack uses its TF1 compatibility mode. Note that a few examples in the repo are not yet migrated to support TF2.
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to install to user's local directories

Please note that tensorpack is not yet stable. If you use tensorpack in your code, remember to pin the exact version of tensorpack you use in your dependencies.
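For example, a pinned install could look like the following (the version number is only illustrative; pin whichever release you actually tested against):

pip install tensorpack==0.9.4
# or list `tensorpack==0.9.4` in your requirements.txt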

Citing Tensorpack:

If you use Tensorpack in your research or wish to refer to the examples, please cite with:

@misc{wu2016tensorpack,
  title={Tensorpack},
  author={Wu, Yuxin and others},
  howpublished={\url{https://github.com/tensorpack/}},
  year={2016}
}

benchmarks's People

Contributors

ppwwyyxx, see--, vfdev-5

benchmarks's Issues

Some questions about dataflow

I have read the tensorpack docs, but DataFlow still confuses me in the following ways:

  1. Why use the LMDB format to store serialized data rather than TFRecord, given that the latter has native TF API support?

  2. According to this benchmark, tf.data seems to be faster than DataFlow. Is that actually the case?

xx.lmdb-lock

After using benchmarks/ImageNet/dump-lmdb.py to generate an LMDB file, I find that there are two files: xx.lmdb and xx.lmdb-lock.

Does this mean that something went wrong during generation?

What is the performance of the serve-data.py utility?

I get poor scaling results even with two workers using Horovod in my environment, and want to locate the reason. Hence the question: what are the typical results (in your environment) of running
./serve-data.py --data ~/data/imagenet/ --batch 32 --benchmark ?

#horovod# Data size does not match

I modified my code to run on CIFAR-10, and the error trace tells me that the sizes do not match (one is twice the size of the other). The error is shown below:
[screenshot: size-mismatch error]

Here is my code:
[screenshots: code1, code2]

I run it as follows (I have two GPUs on my local machine):
python3 server_data.py --batch 32
mpirun -np 2 --output-filename test.log python3 cifar10-sparse-densenet-bc-horovod.py --batch 32 --gpu=0,1

Any suggestion would be appreciated!

Training with multiple GPUs does not converge

Hello,

I use the same Python scripts from benchmarks/ResNet-Horovod to train the baseline model (only modifying the IPC address from 'ipc://@imagenet-train-b{}'.format(batch) to 'ipc:///tmp'). Unfortunately, the result did not converge as expected. The environment and commands I used are listed below:

Python version: python3
Horovod version: 0.16.0
Tensorpack version: 0.9.4

mpirun --allow-run-as-root -np 4 python3 imagenet-resnet-horovod.py -d 50 --logdir ${LOG_DIR} --data /data/glusterfs_data/11082824/Data/CLS-LOC/ --batch 64 --no-zmq-ops

and some of the results are:
[0416 05:11:23 @monitor.py:467] train-error-top1: 0.64265
[0416 05:24:37 @monitor.py:467] train-error-top1: 0.63513
[0416 05:36:53 @monitor.py:467] train-error-top1: 0.63307
[0416 05:48:24 @monitor.py:467] train-error-top1: 0.63591
[0416 06:00:05 @monitor.py:467] train-error-top1: 0.64311
[0416 06:12:09 @monitor.py:467] train-error-top1: 0.62931
[0416 06:23:40 @monitor.py:467] train-error-top1: 0.62378
[0416 06:34:44 @monitor.py:467] train-error-top1: 0.62418
[0416 06:45:55 @monitor.py:467] train-error-top1: 0.6367
[0416 06:56:54 @monitor.py:467] train-error-top1: 0.62604
[0416 07:07:54 @monitor.py:467] train-error-top1: 0.6293
[0416 07:19:04 @monitor.py:467] train-error-top1: 0.62163
[0416 07:30:15 @monitor.py:467] train-error-top1: 0.61299
[0416 07:41:18 @monitor.py:467] train-error-top1: 0.6069
[0416 07:53:04 @monitor.py:467] train-error-top1: 0.60632
[0416 08:04:33 @monitor.py:467] train-error-top1: 0.60605
[0416 08:16:11 @monitor.py:467] train-error-top1: 0.61349
[0416 08:28:04 @monitor.py:467] train-error-top1: 0.60073
[0416 08:40:10 @monitor.py:467] train-error-top1: 0.60852
[0416 08:52:47 @monitor.py:467] train-error-top1: 0.59606
[0416 09:05:11 @monitor.py:467] train-error-top1: 0.59727
[0416 09:17:37 @monitor.py:467] train-error-top1: 0.60537
[0416 09:30:35 @monitor.py:467] train-error-top1: 0.60737
[0416 09:42:55 @monitor.py:467] train-error-top1: 0.59626
[0416 09:55:24 @monitor.py:467] train-error-top1: 0.61557
[0416 10:08:30 @monitor.py:467] train-error-top1: 0.60756
[0416 10:26:31 @monitor.py:467] train-error-top1: 0.58888
[0416 10:41:43 @monitor.py:467] train-error-top1: 0.61388
[0416 10:54:00 @monitor.py:467] train-error-top1: 0.62244
[0416 11:06:38 @monitor.py:467] train-error-top1: 0.59488
[0416 11:18:57 @monitor.py:467] train-error-top1: 0.60776
[0416 11:31:28 @monitor.py:467] train-error-top1: 0.60432
[0416 11:44:13 @monitor.py:467] train-error-top1: 0.59858

As you can see, even though I finished running all the epochs, the top-1 error is still very high.

Thanks for your help

ResNet-Horovod example doesn't use DistributedOptimizer?

I'm looking at the ResNet-Horovod example, and it doesn't use Horovod's DistributedOptimizer, which is mentioned in the Horovod docs. Is it not necessary?

# Add Horovod Distributed Optimizer
opt = hvd.DistributedOptimizer(opt)

Question on speed for ResNet 50 on single-GPU

Hi Yuxin, in your benchmarks, the speed of Tensorpack is 333 imgs per second.

ResNet50 on fake ImageNet | 333 | 266

My question is: is this the limit for ResNet-50 on a single GPU? Thanks.
