
benchmarks's Introduction

Tensorpack

Tensorpack is a neural network training interface based on graph-mode TensorFlow.


Features:

It's Yet Another TF high-level API, with the following highlights:

  1. Focus on training speed.
  • Speed comes for free with Tensorpack -- it uses TensorFlow in an efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably get faster if written with Tensorpack.

  • Scalable data-parallel multi-GPU / distributed training strategies are available off-the-shelf. See tensorpack/benchmarks for more benchmarks.

  2. Squeeze the best data loading performance out of Python with tensorpack.dataflow.
  • Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various auto-parallelization strategies (see the sketch just after this list).
  3. Focus on reproducible and flexible research.
  4. It's not a model wrapper.
  • There are too many symbolic function wrappers already. Tensorpack includes only a few common layers. You can use any TF symbolic functions inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....
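For the data-loading point above, here is a minimal sketch of a pure-Python DataFlow pipeline (no TensorFlow is required for this part). It is not taken from the repo: the image paths and the decode/augment function are placeholders, and exact class names may differ slightly between tensorpack versions.

import cv2
from tensorpack.dataflow import DataFromList, BatchData, MultiProcessMapDataZMQ

def load_and_augment(dp):
    # placeholder decode/augment step; dp is a datapoint, here [path]
    img = cv2.imread(dp[0])
    return [cv2.resize(img, (224, 224))]

# hypothetical list of image paths; each datapoint is a list of components
paths = [['/path/to/images/%06d.jpg' % i] for i in range(1000)]

df = DataFromList(paths, shuffle=True)                 # yields one datapoint at a time
df = MultiProcessMapDataZMQ(df, 8, load_and_augment)   # run the map in 8 worker processes
df = BatchData(df, 32)                                 # stack 32 datapoints into one batch

df.reset_state()                                       # call once before iterating
for dp in df:
    images = dp[0]   # a (32, 224, 224, 3) numpy array, usable by any training loop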

See the tutorials and documentation to learn more about these features.
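To illustrate how these pieces fit together, here is a minimal sketch of the training interface. It is not one of the repo's examples: the network is a throwaway placeholder built with plain tf.layers calls (any TF symbolic functions would do), and it assumes a recent tensorpack where inputs() is declared with tf.TensorSpec (older versions use InputDesc).

import tensorflow as tf
from tensorpack import ModelDesc, TrainConfig, SimpleTrainer, launch_train_with_config

class Model(ModelDesc):
    def inputs(self):
        # declare the input signature; tensorpack creates the placeholders
        return [tf.TensorSpec([None, 224, 224, 3], tf.float32, 'image'),
                tf.TensorSpec([None], tf.int64, 'label')]

    def build_graph(self, image, label):
        # any TF symbolic functions can be used here: tf.layers, slim, Keras, ...
        l = tf.layers.conv2d(image, 32, 3, activation=tf.nn.relu)
        l = tf.reduce_mean(l, axis=[1, 2])            # global average pooling
        logits = tf.layers.dense(l, 1000)
        return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)

    def optimizer(self):
        return tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)

# df = a DataFlow such as the one sketched earlier
# launch_train_with_config(TrainConfig(model=Model(), dataflow=df, max_epoch=100),
#                          SimpleTrainer())
# For data-parallel multi-GPU training, replace SimpleTrainer() with e.g.
# SyncMultiGPUTrainerReplicated(8).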

Examples:

We refuse toy examples. Instead of showing tiny CNNs trained on MNIST/Cifar10, we provide training scripts that reproduce well-known papers.

We refuse low-quality implementations. Unlike most open source repos which only implement papers, Tensorpack examples faithfully reproduce papers, demonstrating its flexibility for actual research.

Vision:

Reinforcement Learning:

Speech / NLP:

Install:

Dependencies:

  • Python 3.3+.
  • Python bindings for OpenCV. (Optional, but required by a lot of features)
  • TensorFlow ≥ 1.5
    • TF is not required if you only want to use tensorpack.dataflow alone as a data processing library.
    • When using TF2, tensorpack uses its TF1 compatibility mode. Note that a few examples in the repo are not yet migrated to support TF2.
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to install to user's local directories

Please note that tensorpack is not yet stable. If you use tensorpack in your code, remember to pin the exact version of tensorpack you use in your dependencies.
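For example, a pinned install could look like the following (the version number is only illustrative; pin whichever release you actually tested against):

pip install tensorpack==0.9.4
# or list `tensorpack==0.9.4` in your requirements.txt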

Citing Tensorpack:

If you use Tensorpack in your research or wish to refer to the examples, please cite with:

@misc{wu2016tensorpack,
  title={Tensorpack},
  author={Wu, Yuxin and others},
  howpublished={\url{https://github.com/tensorpack/}},
  year={2016}
}

benchmarks's People

Contributors

ppwwyyxx, see--, vfdev-5

benchmarks's Issues

Some questions about dataflow

I have read the tensorpack docs, but DataFlow still confuses me in the following ways:

  1. Why use the LMDB format to store serialized data rather than TFRecord, given that the latter has native TF API support?

  2. According to this benchmark, tf.data seems to be faster than DataFlow. Is that actually the case?

xx.lmdb-lock

After using benchmarks/ImageNet/dump-lmdb.py to generate an LMDB file, I find that there are two files: xx.lmdb and xx.lmdb-lock.

Does this mean that something went wrong during generation?

What is the performance of the serve-data.py utility?

I get poor scaling results even with two workers using Horovod in my environment, and want to locate the reason. Hence the question: what are the typical results (in your environment) of running
./serve-data.py --data ~/data/imagenet/ --batch 32 --benchmark ?

#horovod# Data size does not match

I modified my code to run on CIFAR-10, and the error trace tells me that the sizes do not match (one is twice the size of the other). The error is shown below:
[screenshot: size-mismatch error]

Here is my code:
[screenshots: code1, code2]

I run it as follows (I have two GPUs on my local machine):
python3 server_data.py --batch 32
mpirun -np 2 --output-filename test.log python3 cifar10-sparse-densenet-bc-horovod.py --batch 32 --gpu=0,1

Any suggestion would be appreciated!

Training with multiple GPUs does not converge

Hello,

I use the same Python scripts from benchmarks/ResNet-Horovod to train the baseline model (only modifying the IPC address from 'ipc://@imagenet-train-b{}'.format(batch) to 'ipc:///tmp'). Unfortunately, the result did not converge as expected. The environment and commands I used are listed below:

Python version: python3
Horovod version: 0.16.0
Tensorpack version: 0.9.4

mpirun --allow-run-as-root -np 4 python3 imagenet-resnet-horovod.py -d 50 --logdir ${LOG_DIR} --data /data/glusterfs_data/11082824/Data/CLS-LOC/ --batch 64 --no-zmq-ops

and some of the results are:
[0416 05:11:23 @monitor.py:467] train-error-top1: 0.64265
[0416 05:24:37 @monitor.py:467] train-error-top1: 0.63513
[0416 05:36:53 @monitor.py:467] train-error-top1: 0.63307
[0416 05:48:24 @monitor.py:467] train-error-top1: 0.63591
[0416 06:00:05 @monitor.py:467] train-error-top1: 0.64311
[0416 06:12:09 @monitor.py:467] train-error-top1: 0.62931
[0416 06:23:40 @monitor.py:467] train-error-top1: 0.62378
[0416 06:34:44 @monitor.py:467] train-error-top1: 0.62418
[0416 06:45:55 @monitor.py:467] train-error-top1: 0.6367
[0416 06:56:54 @monitor.py:467] train-error-top1: 0.62604
[0416 07:07:54 @monitor.py:467] train-error-top1: 0.6293
[0416 07:19:04 @monitor.py:467] train-error-top1: 0.62163
[0416 07:30:15 @monitor.py:467] train-error-top1: 0.61299
[0416 07:41:18 @monitor.py:467] train-error-top1: 0.6069
[0416 07:53:04 @monitor.py:467] train-error-top1: 0.60632
[0416 08:04:33 @monitor.py:467] train-error-top1: 0.60605
[0416 08:16:11 @monitor.py:467] train-error-top1: 0.61349
[0416 08:28:04 @monitor.py:467] train-error-top1: 0.60073
[0416 08:40:10 @monitor.py:467] train-error-top1: 0.60852
[0416 08:52:47 @monitor.py:467] train-error-top1: 0.59606
[0416 09:05:11 @monitor.py:467] train-error-top1: 0.59727
[0416 09:17:37 @monitor.py:467] train-error-top1: 0.60537
[0416 09:30:35 @monitor.py:467] train-error-top1: 0.60737
[0416 09:42:55 @monitor.py:467] train-error-top1: 0.59626
[0416 09:55:24 @monitor.py:467] train-error-top1: 0.61557
[0416 10:08:30 @monitor.py:467] train-error-top1: 0.60756
[0416 10:26:31 @monitor.py:467] train-error-top1: 0.58888
[0416 10:41:43 @monitor.py:467] train-error-top1: 0.61388
[0416 10:54:00 @monitor.py:467] train-error-top1: 0.62244
[0416 11:06:38 @monitor.py:467] train-error-top1: 0.59488
[0416 11:18:57 @monitor.py:467] train-error-top1: 0.60776
[0416 11:31:28 @monitor.py:467] train-error-top1: 0.60432
[0416 11:44:13 @monitor.py:467] train-error-top1: 0.59858

As you can see, even though I finished running all the epochs, the top-1 error is still very high.

Thanks for your help

ResNet-Horovod example doesn't use DistributedOptimizer?

I'm looking at the ResNet-Horovod example, and it doesn't use Horovod's DistributedOptimizer, which is mentioned in the Horovod docs. Is it not necessary?

# Add Horovod Distributed Optimizer
opt = hvd.DistributedOptimizer(opt)

Question on speed for ResNet 50 on single-GPU

Hi Yuxin, in your benchmarks, the speed of Tensorpack is 333 imgs per second.

ResNet50 on fake ImageNet | 333 | 266

My question is: is this the limit for ResNet-50 on a single GPU? Thanks.
