uoa-cares / built

BuilT - Build a Trainer of deep neural networks

License: MIT License

Python 100.00%

dnn builder experiments pytorch torchvision tensorboard wandb sacred trainer gui

built's Introduction

BuilT (Build a Trainer)

Easily build a trainer for your deep neural network model and run as many experiments as you want to find the optimal combination of components (model, optimizer, scheduler) and hyper-parameters in a well-organized manner.

No more boilerplate code to train and evaluate your DNN model; just focus on your model.
Simply swap the dataset, model, optimizer and scheduler in the configuration file to find the optimal combination. Your code doesn't need to change!
Supports cross-validation and OOF (out-of-fold) prediction.
Supports WandB (https://wandb.ai/) or TensorBoard logging.
Supports checkpoint management (save and load a model, resume previous training).
Integrates easily with Kaggle (https://www.kaggle.com/) notebooks. (todo: add notebook link)

Installation

Please follow the instructions below to install BuilT.

Installing the BuilT package from source

git clone https://github.com/UoA-CARES/BuilT.git
cd BuilT
python setup.py install

Installing the BuilT package using pip

BuilT can be installed using pip (https://pypi.org/project/BuilT/).

pip install built

Usage

Configuration
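Component entries in a BuilT configuration, as printed in the segfault report further down this page, take the form `{'name': ..., 'params': {...}}`. Below is a minimal sketch of how such an entry could be turned into an object, assuming a simple name-to-class registry. `REGISTRY`, `register`, `build_from_config` and `MultiStepLRStub` are hypothetical names for illustration, not BuilT's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a config 'name' to a constructor.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Class decorator that makes cls buildable by its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build_from_config(entry: Dict[str, Any]) -> Any:
    """Instantiate REGISTRY[entry['name']] with entry['params'] as keyword arguments."""
    name = entry["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown component {name!r}; registered: {sorted(REGISTRY)}")
    params = entry.get("params") or {}
    return REGISTRY[name](**params)


@register
class MultiStepLRStub:
    """Stand-in for a scheduler; it only stores its arguments."""

    def __init__(self, milestones, gamma=0.1):
        self.milestones = list(milestones)
        self.gamma = gamma


# Same shape as the 'scheduler' entry in the printed configuration.
scheduler_cfg = {"name": "MultiStepLRStub", "params": {"gamma": 0.1, "milestones": [3, 4, 5]}}
scheduler = build_from_config(scheduler_cfg)
```

Under this assumption, swapping a component means changing only the `name` string (and its `params`) in the configuration file, which is the workflow the introduction describes.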

Builder

Trainer

Dataset

Model

Loss

Optimizer

Scheduler

Logger

Metric

Inference

Ensemble

Examples

MNIST hand-written image classification

(todo)

Sentiment Classification

(todo)

Developer Guide

(todo)

conda create -n conda_BuilT python=3.7
conda activate conda_BuilT
pip install -r requirements.txt

Reference

https://packaging.python.org/tutorials/packaging-projects/

built's People

Contributors

Stargazers

Watchers

Forkers

afters-cool csut017 rrayhka

built's Issues

Postprocessing

Support recursive & ordered postprocessing
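One possible reading of "recursive & ordered" is: apply a list of postprocessing steps in a fixed order, descending into nested lists, tuples and dicts of outputs. The sketch below assumes that reading; `postprocess` is a hypothetical helper, not part of BuilT.

```python
from typing import Any, Callable, Sequence


def postprocess(output: Any, steps: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply steps in order to every leaf, recursing into dicts, lists and tuples."""
    if isinstance(output, dict):
        return {key: postprocess(value, steps) for key, value in output.items()}
    if isinstance(output, (list, tuple)):
        # Rebuild the same container type so the output structure is preserved.
        return type(output)(postprocess(value, steps) for value in output)
    for step in steps:
        output = step(output)
    return output
```

Because the steps run in list order, `[add_one, times_ten]` and `[times_ten, add_one]` produce different results, so the order would need to be part of the configuration.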

Setup script

Add a setup script to install the BuilT module

Clear a warning

/home/anyone/projects/BuilT/built/utils/smooth_label_loss.py:38: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
scores = self.LogSoftmax(dec_outs)

AUC metric

Need to support AUC from metrics
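For reference, binary ROC AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count as half). scikit-learn's `sklearn.metrics.roc_auc_score` would be the usual choice in practice; the dependency-free sketch below only illustrates what the metric computes. `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary ROC AUC via the Mann-Whitney U statistic (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    # O(len(pos) * len(neg)) pairwise comparison; fine for small validation sets.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give 3 winning pairs out of 4, i.e. an AUC of 0.75.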

yaml gui builder

A static, web-based YAML file builder for easily writing training configuration files.

Exception handling

An exception handler needs to be added to each base category; debugging is quite hard without one.
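A minimal sketch of such a handler, assuming each category (dataset, model, optimizer, ...) is built by calling a factory with the entry's `params`. `BuildError` and `build_with_context` are hypothetical names; the point is to re-raise with the category, component name and parameters attached while keeping the original exception as the cause.

```python
from typing import Any, Callable, Dict


class BuildError(RuntimeError):
    """Raised when a component in a given category fails to build."""


def build_with_context(category: str, entry: Dict[str, Any], factory: Callable[..., Any]) -> Any:
    """Call factory(**params); on failure, re-raise with category/name context."""
    name = entry.get("name", "<missing name>")
    params = entry.get("params") or {}
    try:
        return factory(**params)
    except Exception as exc:
        # 'from exc' keeps the original traceback available as __cause__.
        raise BuildError(f"failed to build {category} {name!r} with params {params}: {exc}") from exc
```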

Bug for replace_placeholder

If the YAML file contains the following configuration:
in_path: "text2emospch/input/sentiment-extraction"
transformer_path: "{in_path}/bert-base-uncased/"

Expected behavior

> print(config['transformer_path'])
text2emospch/input/sentiment-extraction/bert-base-uncased

Actual behavior

> print(config['transformer_path'])
{in_path}/bert-base-uncased

BuilT version
v0.0.4

Python version (e.g. python --version)
3.7
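The expected behaviour above can be sketched as a resolver that replaces `{key}` with the top-level string value of `key`, recursing into nested dicts and lists and repeating until chained placeholders are fully expanded. `resolve_placeholders` is a hypothetical stand-in, not the project's `replace_placeholder`; per a later issue on this page, a function like this could be called from the builder's initialization so users cannot forget it.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {in_path}.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_placeholders(config: Dict[str, Any], max_passes: int = 10) -> Dict[str, Any]:
    """Return a copy of config where {key} is replaced by the top-level string value of key."""

    def substitute(value: Any, scope: Dict[str, Any]) -> Any:
        if isinstance(value, str):
            def lookup(match):
                replacement = scope.get(match.group(1))
                # Unknown or non-string keys are left as-is rather than raising.
                return replacement if isinstance(replacement, str) else match.group(0)
            return _PLACEHOLDER.sub(lookup, value)
        if isinstance(value, dict):
            return {k: substitute(v, scope) for k, v in value.items()}
        if isinstance(value, list):
            return [substitute(v, scope) for v in value]
        return value

    resolved = config
    # Repeat so that chained placeholders ({a} used inside "{b}/x") are fully expanded.
    for _ in range(max_passes):
        updated = substitute(resolved, resolved)
        if updated == resolved:
            return updated
        resolved = updated
    raise ValueError("placeholders still changing after max_passes; check for self-references")
```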

Multi-gpu support

For training and evaluation, it needs to support multi-gpu.

Option for iteration of coverage scale

Support list type for ensemble

If a model's output is a list of tensors, the ensemble should correctly split the list and merge the elements back together in the ensembled output.
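A minimal sketch of that split-and-merge step, assuming plain averaging as the ensembling rule. Multiplying a Python list by a float raises `TypeError: can't multiply sequence by non-int of type 'float'`, the same message as in the "error while running Ensemble via vscode" issue below, so the two may be related. `average_outputs` is a hypothetical helper; it works for any leaf type supporting `+` and `/` (floats, NumPy arrays, PyTorch tensors).

```python
from typing import Any, Sequence


def average_outputs(outputs: Sequence[Any]) -> Any:
    """Average one output per model; lists/tuples are averaged position by position."""
    if not outputs:
        raise ValueError("need at least one model output")
    first = outputs[0]
    if isinstance(first, (list, tuple)):
        if any(len(o) != len(first) for o in outputs):
            raise ValueError("all models must return the same number of outputs")
        # Split: regroup the i-th element of every model's output, average it,
        # then merge the results back into the original container type.
        return type(first)(average_outputs([o[i] for o in outputs]) for i in range(len(first)))
    total = first
    for other in outputs[1:]:
        total = total + other
    return total / len(outputs)
```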

Epoch count not logged

Expected behavior
The epoch should be logged during training by default.

Actual behavior
Only the global step is logged, not the epoch count.

BuilT version
v0.0.4

Python version (e.g. python --version)
3.7
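A minimal sketch of logging the epoch alongside the global step, assuming the logger writes through a callable with the `(tag, value, step)` shape of `torch.utils.tensorboard.SummaryWriter.add_scalar(tag, scalar_value, global_step)`. `EpochAwareLogger` is hypothetical; with wandb the analogue would be including `"epoch"` in the dict passed to `wandb.log(..., step=global_step)`.

```python
from typing import Callable, Dict


class EpochAwareLogger:
    """Hypothetical wrapper: every logging call also records the epoch."""

    def __init__(self, add_scalar: Callable[[str, float, int], None]):
        # Any callable with an add_scalar(tag, value, step)-style signature,
        # e.g. the bound method SummaryWriter(...).add_scalar.
        self.add_scalar = add_scalar

    def log(self, metrics: Dict[str, float], global_step: int, epoch: int) -> None:
        """Write the epoch and each metric against the same global step."""
        self.add_scalar("epoch", epoch, global_step)
        for tag, value in metrics.items():
            self.add_scalar(tag, value, global_step)
```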

CAM for coverage model

CAM visualization needs to be added, using the hidden states of the model's transformer.

segfault when running 'sh ./train.sh'

System versions
OS Platform and Distribution: Ubuntu 18.04.3 LTS
Linux Kernel: 5.4.0-51-generic
TensorFlow installed from (source or binary): pip install within conda env
TensorFlow version: 2.1.0
Python version: 3.7.9
Installed using virtualenv? pip? conda?: conda
CUDA/cuDNN version: 11.1, Driver Version: 455.23.05
GPU model and memory: TITAN Xp

I followed the instructions on the nlp branch but encountered the following error messages:

(conda_BuilT) testmony@testmony-desktop:~/workspace/tweet/BuilT (nlp)$ sh ./train.sh
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/testmony/.kaggle/kaggle.json'
Downloading tweet-sentiment-extraction.zip to input
 72%|███████████████████████████████████████▍               | 1.00M/1.39M [00:00<00:00, 3.27MB/s]
100%|███████████████████████████████████████████████████████| 1.39M/1.39M [00:00<00:00, 3.55MB/s]
Archive:  input/tweet-sentiment-extraction.zip
  inflating: input/tweet-sentiment-extraction/sample_submission.csv  
  inflating: input/tweet-sentiment-extraction/test.csv  
  inflating: input/tweet-sentiment-extraction/train.csv  
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/testmony/.kaggle/kaggle.json'
Downloading roberta-base.zip to input/roberta-base
100%|████████████████████████████████████████████████████████▉| 291M/291M [01:11<00:00, 4.86MB/s]
100%|█████████████████████████████████████████████████████████| 291M/291M [01:11<00:00, 4.26MB/s]
WARNING - orsum - No observers have been added to this run
INFO - orsum - Running command 'train'
INFO - orsum - Started
{ 'dataset': { 'name': 'TweetDataset',
               'params': { 'csv_path': 'tweet/input/tweet-sentiment-extraction/train.csv',
                           'max_len': 96,
                           'model_path': 'tweet/input/roberta-base/'},
               'splits': None},
  'description': 'Tweet Sentiment Classification',
  'evaluation': {'batch_size': 8},
  'forward_hook': {'name': 'TweetForwardHook'},
  'logger_hook': { 'name': 'DefaultLogger',
                   'params': {'use_tensorboard': True, 'use_wandb': False}},
  'loss': {'name': 'TweetLoss'},
  'metric_hook': {'name': 'TweetMetric'},
  'model': { 'name': 'TweetExtractModel',
             'params': { 'drop_out_rate': 0.1,
                         'num_classes': 3,
                         'transformer_path': 'tweet/input/roberta-base/',
                         'transformer_type': 'roberta'}},
  'optimizer': {'name': 'AdamW', 'params': {'lr': 3e-05}},
  'post_forward_hook': {'name': 'TweetPostForwardHook'},
  'scheduler': { 'name': 'MultiStepLR',
                 'params': {'gamma': 0.1, 'milestones': [3, 4, 5]}},
  'seed': 478623980,
  'splitter': { 'name': 'TweetSplitter',
                'params': { 'csv_path': 'tweet/input/tweet-sentiment-extraction/train.csv',
                            'n_splits': 5,
                            'random_state': 42,
                            'shuffle': True}},
  'train': { 'batch_size': 256,
             'dir': 'train_dirs/tweet_classification',
             'gradient_accumulation_step': 1,
             'name': '',
             'num_epochs': 2},
  'transforms': { 'name': '',
                  'num_preprocessor': 1,
                  'params': [ {'ToTensor': None, 'name': 'ToTensor'},
                              { 'Normalize': None,
                                'name': 'Normalize',
                                'params': { 'mean': [0.1307],
                                            'std': [0.3081]}}]},
  'wandb': {'sweep': {'name': 'Sweep', 'use': False, 'yaml': 'sweep.yaml'}}}
./tweet
<module 'tweet_coverage_model' from './tweet/tweet_coverage_model.py'> is loaded
2020-11-18 23:00:34.809912: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-11-18 23:00:34.809979: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-11-18 23:00:34.809986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO - matplotlib.font_manager - Could not open font file /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: In FT2Font: Can not load face.  Unknown file format.
INFO - matplotlib.font_manager - generated new fontManager
<module 'tweet_dataset' from './tweet/tweet_dataset.py'> is loaded
<module 'tweet_extract_model' from './tweet/tweet_extract_model.py'> is loaded
<module 'tweet_forward_hook' from './tweet/tweet_forward_hook.py'> is loaded
<module 'tweet_loss' from './tweet/tweet_loss.py'> is loaded
<module 'tweet_metric' from './tweet/tweet_metric.py'> is loaded
<module 'tweet_post_forward_hook' from './tweet/tweet_post_forward_hook.py'> is loaded
<module 'tweet_splitter' from './tweet/tweet_splitter.py'> is loaded
Training start: 0 fold
Segmentation fault (core dumped)

and dmesg output

python[10332]: segfault at 7f0b0000556f ip 00007f0b0000556f sp 00007ffe2b94ad48 error 7 in _pywrap_tensorflow_internal.so[7f0af6474000+41514000]
[ 1562.083556] Code: 85 c0 74 5f 48 8d 7d a0 ba 03 00 00 00 48 89 fe c5 f8 77 ff d0 48 8b 83 50 01 00 00 48 85 c0 74 11 48 8d bb 40 01 00 00 ba 03 <00> 00 00 48 89 fe ff d0 48 8b bb 08 01 00 00 48 85 ff 74 05 e8 78

and pip list output

Package                Version
---------------------- -------------------
absl-py                0.11.0
astor                  0.8.1
astroid                2.4.2
cached-property        1.5.2
cachetools             4.1.1
certifi                2020.6.20
chardet                3.0.4
click                  7.1.2
colorama               0.4.4
configparser           5.0.1
cycler                 0.10.0
decorator              4.4.2
docker-pycreds         0.4.0
docopt                 0.6.2
easydict               1.9
efficientnet-pytorch   0.6.3
filelock               3.0.12
gast                   0.2.2
gitdb                  4.0.5
GitPython              3.1.11
google-auth            1.23.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
gql                    0.2.0
graphql-core           1.1
grpcio                 1.33.2
h5py                   3.1.0
idna                   2.10
imageio                2.6.1
importlib-metadata     2.0.0
isort                  4.3.21
joblib                 0.17.0
jsonpickle             1.4.1
kaggle                 1.5.9
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.2
kiwisolver             1.3.1
lazy-object-proxy      1.4.3
Markdown               3.3.3
matplotlib             3.1.3
mccabe                 0.6.1
munch                  2.5.0
networkx               2.5
numpy                  1.18.1
nvidia-ml-py3          7.352.0
oauthlib               3.1.0
opencv-python          4.2.0.32
opt-einsum             3.3.0
packaging              20.4
pandas                 1.0.0
pathtools              0.1.2
pep8                   1.7.1
pickleshare            0.7.5
Pillow                 7.0.0
pip                    20.2.4
promise                2.3
protobuf               3.14.0
psutil                 5.7.3
py-cpuinfo             7.0.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pylint                 2.3.1
pyparsing              2.4.7
python-dateutil        2.8.1
python-slugify         4.0.1
pytz                   2020.4
PyWavelets             1.1.1
PyYAML                 5.3.1
regex                  2020.11.13
requests               2.25.0
requests-oauthlib      1.3.0
rsa                    4.6
sacred                 0.8.1
sacremoses             0.0.43
scikit-image           0.16.2
scikit-learn           0.23.1
scipy                  1.4.1
sentencepiece          0.1.94
sentry-sdk             0.19.3
setuptools             50.3.1.post20201107
shortuuid              1.0.1
six                    1.15.0
slugify                0.0.1
smmap                  3.0.4
subprocess32           3.5.4
tensorboard            2.2.2
tensorboard-plugin-wit 1.7.0
tensorflow-estimator   2.1.0
tensorflow-gpu         2.1.0
termcolor              1.1.0
text-unidecode         1.3
threadpoolctl          2.1.0
tokenizer              2.0.6
tokenizers             0.8.1rc2
torch                  1.4.0
torchvision            0.5.0
tqdm                   4.47.0
transformers           3.1.0
typed-ast              1.4.1
urllib3                1.26.2
wandb                  0.9.4
watchdog               0.10.3
Werkzeug               1.0.1
wheel                  0.35.1
wrapt                  1.12.1
zipp                   3.4.0

Metric class abstraction

Providing methods for metric keys would improve code readability and reduce errors in child classes.
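A minimal sketch of that abstraction, assuming metrics return a dict of named values. `MetricBase`, `keys`, `_compute` and `AccuracyMetric` are hypothetical names: each subclass declares its keys once, and the base class checks that the computed result matches them, so a typo in a child class fails loudly instead of silently logging the wrong key.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetricBase(ABC):
    """Hypothetical base class: subclasses declare their metric keys once."""

    @abstractmethod
    def keys(self) -> List[str]:
        """Names of the values returned by compute()."""

    @abstractmethod
    def _compute(self, outputs, targets) -> Dict[str, float]:
        """Subclass-specific computation."""

    def compute(self, outputs, targets) -> Dict[str, float]:
        result = self._compute(outputs, targets)
        missing = set(self.keys()) - set(result)
        extra = set(result) - set(self.keys())
        if missing or extra:
            raise KeyError(f"{type(self).__name__}: missing {sorted(missing)}, unexpected {sorted(extra)}")
        return result


class AccuracyMetric(MetricBase):
    def keys(self) -> List[str]:
        return ["accuracy"]

    def _compute(self, outputs, targets) -> Dict[str, float]:
        correct = sum(int(o == t) for o, t in zip(outputs, targets))
        return {"accuracy": correct / len(targets)}
```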

Logging to wandb for ensemble

After ensembling the trained models, the results should be properly logged to wandb based on the configuration.

versions of transformers and tokenizers packages

When encountering the following error,

File "/home/workspace/BuilT/tweet/src/tweet_dataset.py", line 30, in __init__
    self.tokenizer = tokenizers.ByteLevelBPETokenizer(
TypeError: __init__() got an unexpected keyword argument 'vocab_file'

a possible workaround is to check the versions of your transformers and tokenizers packages with pip list.

transformers           3.1.0
tokenizers             0.8.1rc2

is a confirmed working combination.

To update these packages, simply run pip install transformers==3.1.0 tokenizers==0.8.1rc2
Hope this helps someone...
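The same check can be done from Python, for example at the start of a training script. This is a sketch: `KNOWN_GOOD` and `version_mismatches` are hypothetical names, and `importlib.metadata` requires Python 3.8+. On Python 3.7 (used in these reports), the `importlib-metadata` backport provides the same functions under `importlib_metadata`.

```python
# Python 3.8+; on 3.7, install the importlib-metadata backport and import from importlib_metadata.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Combination reported as working in this issue.
KNOWN_GOOD = {"transformers": "3.1.0", "tokenizers": "0.8.1rc2"}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from expected to that version (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

Calling `version_mismatches(KNOWN_GOOD)` returns an empty dict when the installed versions match.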

Test dataset evaluation

The test dataset (used for neither training nor validation) needs to be evaluated after each training epoch.

Training summary

After all configurations have been trained, the training results should be briefly summarized in the terminal.
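A minimal sketch of such a summary, assuming results are collected as a mapping from configuration name to per-fold scores. `format_summary` and that data layout are hypothetical; the function returns a string so it can be printed or written to a log.

```python
from statistics import mean
from typing import Dict, List


def format_summary(results: Dict[str, List[float]], metric: str = "score") -> str:
    """Render one line per configuration: mean score, then the per-fold values."""
    lines = [f"{'config':<20} {metric + ' (mean)':>14}  folds"]
    for name, fold_scores in results.items():
        folds = ", ".join(f"{s:.4f}" for s in fold_scores)
        lines.append(f"{name:<20} {mean(fold_scores):>14.4f}  [{folds}]")
    return "\n".join(lines)
```

For example, `print(format_summary({"roberta-base": [0.5, 0.7]}))` prints a header line followed by one row with a mean of 0.6000.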

Adding a datetime to train_dirs

Currently, all trained models and split sets are stored under train_dirs and may be overwritten when multiple experiments are conducted. A datetime (e.g., year, month, day, hour, minute) could be added as a suffix of train_dirs (e.g., train_dirs/tweet/classification/2020-11-23-13-20/roberta-base/ or similar).
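A minimal sketch of the suggested layout using the standard library; `timestamped_train_dir` is a hypothetical helper and only builds the path (creating it, e.g. with `Path.mkdir(parents=True)`, is left to the caller).

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_train_dir(base: str, model_name: str, now: Optional[datetime] = None) -> Path:
    """Return base/<YYYY-MM-DD-HH-MM>/model_name for the given (or current) time."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M")
    return Path(base) / stamp / model_name
```

With minute resolution, two runs started within the same minute would still share a directory; adding `-%S` to the format string, or checking `Path.exists()` first, would avoid that.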

for_sensors_results branch error

ERROR - orsum - Failed after 0:06:15!
Traceback (most recent calls WITHOUT Sacred internals):
File "run.py", line 454, in train
run.finish()
AttributeError: 'Run' object has no attribute 'finish'

replace_placeholder() needs to be encapsulated

Currently, it's exposed to users, and users sometimes forget to call it.
It could instead be called in the builder object's initialization routine, which would make the process simpler.

error while running Ensemble via vscode

The number of models to ensemble: 5
ERROR - orsum - Failed after 0:07:14!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/workspace/kg/BuilT/run.py", line 40, in ensemble
ensembled_output = ensembler.forward_models()
File "/home/workspace/kg/BuilT/built/ensembler.py", line 61, in forward_models
output = torch.sigmoid(
TypeError: can't multiply sequence by non-int of type 'float'
