
built's Introduction

BuilT(Build a Trainer)

Easily build a trainer for your Deep Neural Network (DNN) model and run as many experiments as you want to find the optimal combination of components (model, optimizer, scheduler) and hyper-parameters in a well-organized manner.

  • No more boilerplate code to train and evaluate your DNN model. Just focus on your model.
  • Simply swap your dataset, model, optimizer, and scheduler in the configuration file to find the optimal combination. Your code doesn't need to change.
  • Supports cross-validation and OOF (out-of-fold) prediction.
  • Supports WandB (https://wandb.ai/) or TensorBoard logging.
  • Supports checkpoint management (save and load a model, resume a previous training run).
  • Integrates easily with Kaggle (https://www.kaggle.com/) notebooks. (todo: add notebook link)
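
To illustrate how swapping a component through configuration alone can work, here is a minimal sketch of a name-based registry. The `name`/`params` layout mirrors the configuration dump shown in the issues further down this page; everything else (`REGISTRY`, `register`, `build`, and the stand-in factories) is hypothetical and is not BuilT's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: category -> component name -> factory.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(category: str, name: str):
    """Decorator that stores a factory under REGISTRY[category][name]."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY.setdefault(category, {})[name] = factory
        return factory
    return decorator


def build(category: str, config: Dict[str, Any]) -> Any:
    """Instantiate the component that config[category] names."""
    entry = config[category]
    factory = REGISTRY[category][entry["name"]]
    # A missing or empty "params" entry means "use the factory defaults".
    return factory(**(entry.get("params") or {}))


@register("optimizer", "AdamW")
def make_adamw(lr: float = 1e-3) -> Dict[str, Any]:
    # Stand-in for a real optimizer object, to keep the sketch dependency-free.
    return {"type": "AdamW", "lr": lr}


@register("optimizer", "SGD")
def make_sgd(lr: float = 1e-2, momentum: float = 0.0) -> Dict[str, Any]:
    return {"type": "SGD", "lr": lr, "momentum": momentum}


config = {"optimizer": {"name": "AdamW", "params": {"lr": 3e-05}}}
optimizer = build("optimizer", config)
```

Changing `name` to `SGD` in the configuration selects the other factory without touching the training code, which is the behavior the feature list describes.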

Installation

Please follow the instructions below to install BuilT.

Installation of BuilT package from the source code

git clone https://github.com/UoA-CARES/BuilT.git
cd BuilT
python setup.py install

Installation of BuilT package using pip

BuilT can be installed using pip (https://pypi.org/project/BuilT/).

pip install built

Usage

Configuration

Builder

Trainer

Dataset

Model

Loss

Optimizer

Scheduler

Logger

Metric

Inference

Ensemble

Examples

MNIST hand-written image classification

(todo)

Sentiment Classification

(todo)

Developer Guide

(todo)

conda create -n conda_BuilT python=3.7
conda activate conda_BuilT
pip install -r requirements.txt

Reference

https://packaging.python.org/tutorials/packaging-projects/

built's People

Contributors

jlim262

built's Issues

Setup script

Add a setup script to install the BuilT module

Clear a warning

/home/anyone/projects/BuilT/built/utils/smooth_label_loss.py:38: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
scores = self.LogSoftmax(dec_outs)

yaml gui builder

  • A static, web-based YAML builder that makes it easy to write YAML files for training.

Exception handling

An exception handler needs to be added to each base category. Debugging is quite hard without one.

Bug for replace_placeholder

If the YAML file contains the following configuration:
in_path: "text2emospch/input/sentiment-extraction"
transformer_path: "{in_path}/bert-base-uncased/"

Expected behavior

> print(config['transformer_path'])
text2emospch/input/sentiment-extraction/bert-base-uncased

Actual behavior

> print(config['transformer_path'])
{in_path}/bert-base-uncased

BuilT version
v0.0.4

Python version (e.g. python --version)
3.7
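
For reference, the expected substitution can be sketched as follows. This is an illustrative implementation under stated assumptions, not BuilT's `replace_placeholder`: the function name `resolve_placeholders` is hypothetical, only top-level string keys are considered, and unknown or self-referencing placeholders are left untouched.

```python
import re
from typing import Any, Dict

# Matches {key} where key is a simple identifier-like name.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_placeholders(config: Dict[str, Any], max_passes: int = 10) -> Dict[str, Any]:
    """Return a copy of config with {key} references to top-level strings expanded."""
    resolved = dict(config)
    for _ in range(max_passes):
        changed = False
        for key in list(resolved):
            value = resolved[key]
            if not isinstance(value, str):
                continue

            def substitute(match, current_key=key):
                name = match.group(1)
                target = resolved.get(name)
                # Leave unknown, non-string, or self-referencing placeholders as-is.
                if name == current_key or not isinstance(target, str):
                    return match.group(0)
                return target

            new_value = _PLACEHOLDER.sub(substitute, value)
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            # Nothing left to expand, so stop early.
            break
    return resolved


config = resolve_placeholders({
    "in_path": "text2emospch/input/sentiment-extraction",
    "transformer_path": "{in_path}/bert-base-uncased/",
})
```

The repeated passes let chained references (a value that refers to another value containing a placeholder) settle, while `max_passes` bounds the work if two keys refer to each other. The sketch keeps any trailing slash exactly as written in the YAML.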

Epoch count not logged

Expected behavior
The epoch should be logged during training by default.

Actual behavior
Only the global step is logged, not the epoch count.

BuilT version
v0.0.4

Python version (e.g. python --version)
3.7
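
One way to add the epoch to each log record is to derive it from the global step. The sketch below assumes the number of steps per epoch is known and constant; `epoch_from_step` and `log_record` are hypothetical helpers, not part of BuilT's logger.

```python
from typing import Dict


def epoch_from_step(global_step: int, steps_per_epoch: int) -> int:
    """Zero-based epoch index that a given global step falls into."""
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    return global_step // steps_per_epoch


def log_record(global_step: int, steps_per_epoch: int,
               metrics: Dict[str, float]) -> Dict[str, float]:
    """Build a log entry that carries both the global step and the epoch."""
    record = {
        "global_step": global_step,
        "epoch": epoch_from_step(global_step, steps_per_epoch),
    }
    record.update(metrics)
    return record


record = log_record(global_step=250, steps_per_epoch=100, metrics={"loss": 0.5})
```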

CAM for coverage model

CAM visualization needs to be added using the hidden states of the transformer from the model.

segfault when running 'sh ./train.sh'

System versions
OS Platform and Distribution: Ubuntu 18.04.3 LTS
Linux Kernel: 5.4.0-51-generic
TensorFlow installed from (source or binary): pip install within conda env
TensorFlow version: 2.1.0
Python version: 3.7.9
Installed using virtualenv? pip? conda?: conda
CUDA/cuDNN version: 11.1, Driver Version: 455.23.05
GPU model and memory: TITAN Xp

I followed the instructions from the nlp branch but encountered the following error messages:

(conda_BuilT) testmony@testmony-desktop:~/workspace/tweet/BuilT (nlp)$ sh ./train.sh
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/testmony/.kaggle/kaggle.json'
Downloading tweet-sentiment-extraction.zip to input
 72%|███████████████████████████████████████▍               | 1.00M/1.39M [00:00<00:00, 3.27MB/s]
100%|███████████████████████████████████████████████████████| 1.39M/1.39M [00:00<00:00, 3.55MB/s]
Archive:  input/tweet-sentiment-extraction.zip
  inflating: input/tweet-sentiment-extraction/sample_submission.csv  
  inflating: input/tweet-sentiment-extraction/test.csv  
  inflating: input/tweet-sentiment-extraction/train.csv  
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/testmony/.kaggle/kaggle.json'
Downloading roberta-base.zip to input/roberta-base
100%|████████████████████████████████████████████████████████▉| 291M/291M [01:11<00:00, 4.86MB/s]
100%|█████████████████████████████████████████████████████████| 291M/291M [01:11<00:00, 4.26MB/s]
WARNING - orsum - No observers have been added to this run
INFO - orsum - Running command 'train'
INFO - orsum - Started
{ 'dataset': { 'name': 'TweetDataset',
               'params': { 'csv_path': 'tweet/input/tweet-sentiment-extraction/train.csv',
                           'max_len': 96,
                           'model_path': 'tweet/input/roberta-base/'},
               'splits': None},
  'description': 'Tweet Sentiment Classification',
  'evaluation': {'batch_size': 8},
  'forward_hook': {'name': 'TweetForwardHook'},
  'logger_hook': { 'name': 'DefaultLogger',
                   'params': {'use_tensorboard': True, 'use_wandb': False}},
  'loss': {'name': 'TweetLoss'},
  'metric_hook': {'name': 'TweetMetric'},
  'model': { 'name': 'TweetExtractModel',
             'params': { 'drop_out_rate': 0.1,
                         'num_classes': 3,
                         'transformer_path': 'tweet/input/roberta-base/',
                         'transformer_type': 'roberta'}},
  'optimizer': {'name': 'AdamW', 'params': {'lr': 3e-05}},
  'post_forward_hook': {'name': 'TweetPostForwardHook'},
  'scheduler': { 'name': 'MultiStepLR',
                 'params': {'gamma': 0.1, 'milestones': [3, 4, 5]}},
  'seed': 478623980,
  'splitter': { 'name': 'TweetSplitter',
                'params': { 'csv_path': 'tweet/input/tweet-sentiment-extraction/train.csv',
                            'n_splits': 5,
                            'random_state': 42,
                            'shuffle': True}},
  'train': { 'batch_size': 256,
             'dir': 'train_dirs/tweet_classification',
             'gradient_accumulation_step': 1,
             'name': '',
             'num_epochs': 2},
  'transforms': { 'name': '',
                  'num_preprocessor': 1,
                  'params': [ {'ToTensor': None, 'name': 'ToTensor'},
                              { 'Normalize': None,
                                'name': 'Normalize',
                                'params': { 'mean': [0.1307],
                                            'std': [0.3081]}}]},
  'wandb': {'sweep': {'name': 'Sweep', 'use': False, 'yaml': 'sweep.yaml'}}}
./tweet
<module 'tweet_coverage_model' from './tweet/tweet_coverage_model.py'> is loaded
2020-11-18 23:00:34.809912: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-11-18 23:00:34.809979: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-11-18 23:00:34.809986: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
INFO - matplotlib.font_manager - Could not open font file /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: In FT2Font: Can not load face.  Unknown file format.
INFO - matplotlib.font_manager - generated new fontManager
<module 'tweet_dataset' from './tweet/tweet_dataset.py'> is loaded
<module 'tweet_extract_model' from './tweet/tweet_extract_model.py'> is loaded
<module 'tweet_forward_hook' from './tweet/tweet_forward_hook.py'> is loaded
<module 'tweet_loss' from './tweet/tweet_loss.py'> is loaded
<module 'tweet_metric' from './tweet/tweet_metric.py'> is loaded
<module 'tweet_post_forward_hook' from './tweet/tweet_post_forward_hook.py'> is loaded
<module 'tweet_splitter' from './tweet/tweet_splitter.py'> is loaded
Training start: 0 fold
Segmentation fault (core dumped)

and dmesg output

python[10332]: segfault at 7f0b0000556f ip 00007f0b0000556f sp 00007ffe2b94ad48 error 7 in _pywrap_tensorflow_internal.so[7f0af6474000+41514000]
[ 1562.083556] Code: 85 c0 74 5f 48 8d 7d a0 ba 03 00 00 00 48 89 fe c5 f8 77 ff d0 48 8b 83 50 01 00 00 48 85 c0 74 11 48 8d bb 40 01 00 00 ba 03 <00> 00 00 48 89 fe ff d0 48 8b bb 08 01 00 00 48 85 ff 74 05 e8 78

and pip list output

Package                Version
---------------------- -------------------
absl-py                0.11.0
astor                  0.8.1
astroid                2.4.2
cached-property        1.5.2
cachetools             4.1.1
certifi                2020.6.20
chardet                3.0.4
click                  7.1.2
colorama               0.4.4
configparser           5.0.1
cycler                 0.10.0
decorator              4.4.2
docker-pycreds         0.4.0
docopt                 0.6.2
easydict               1.9
efficientnet-pytorch   0.6.3
filelock               3.0.12
gast                   0.2.2
gitdb                  4.0.5
GitPython              3.1.11
google-auth            1.23.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
gql                    0.2.0
graphql-core           1.1
grpcio                 1.33.2
h5py                   3.1.0
idna                   2.10
imageio                2.6.1
importlib-metadata     2.0.0
isort                  4.3.21
joblib                 0.17.0
jsonpickle             1.4.1
kaggle                 1.5.9
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.2
kiwisolver             1.3.1
lazy-object-proxy      1.4.3
Markdown               3.3.3
matplotlib             3.1.3
mccabe                 0.6.1
munch                  2.5.0
networkx               2.5
numpy                  1.18.1
nvidia-ml-py3          7.352.0
oauthlib               3.1.0
opencv-python          4.2.0.32
opt-einsum             3.3.0
packaging              20.4
pandas                 1.0.0
pathtools              0.1.2
pep8                   1.7.1
pickleshare            0.7.5
Pillow                 7.0.0
pip                    20.2.4
promise                2.3
protobuf               3.14.0
psutil                 5.7.3
py-cpuinfo             7.0.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pylint                 2.3.1
pyparsing              2.4.7
python-dateutil        2.8.1
python-slugify         4.0.1
pytz                   2020.4
PyWavelets             1.1.1
PyYAML                 5.3.1
regex                  2020.11.13
requests               2.25.0
requests-oauthlib      1.3.0
rsa                    4.6
sacred                 0.8.1
sacremoses             0.0.43
scikit-image           0.16.2
scikit-learn           0.23.1
scipy                  1.4.1
sentencepiece          0.1.94
sentry-sdk             0.19.3
setuptools             50.3.1.post20201107
shortuuid              1.0.1
six                    1.15.0
slugify                0.0.1
smmap                  3.0.4
subprocess32           3.5.4
tensorboard            2.2.2
tensorboard-plugin-wit 1.7.0
tensorflow-estimator   2.1.0
tensorflow-gpu         2.1.0
termcolor              1.1.0
text-unidecode         1.3
threadpoolctl          2.1.0
tokenizer              2.0.6
tokenizers             0.8.1rc2
torch                  1.4.0
torchvision            0.5.0
tqdm                   4.47.0
transformers           3.1.0
typed-ast              1.4.1
urllib3                1.26.2
wandb                  0.9.4
watchdog               0.10.3
Werkzeug               1.0.1
wheel                  0.35.1
wrapt                  1.12.1
zipp                   3.4.0

Metric class abstraction

Providing methods for metric keys would improve code readability and reduce errors in child classes.
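
A minimal sketch of what such an abstraction might look like is given below. The class and method names (`MetricBase`, `keys`, `compute`) are assumptions made for illustration and do not come from the BuilT code base.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class MetricBase(ABC):
    """Hypothetical base class: subclasses declare their keys up front."""

    @abstractmethod
    def keys(self) -> List[str]:
        """Names of the values this metric reports."""

    @abstractmethod
    def compute(self, outputs: Sequence, targets: Sequence) -> Dict[str, float]:
        """Return one value per declared key."""

    def __call__(self, outputs: Sequence, targets: Sequence) -> Dict[str, float]:
        result = self.compute(outputs, targets)
        declared = set(self.keys())
        # Fail early when a child class returns keys it did not declare, or omits one.
        if set(result) != declared:
            raise KeyError(
                f"expected keys {sorted(declared)}, got {sorted(result)}"
            )
        return result


class AccuracyMetric(MetricBase):
    def keys(self) -> List[str]:
        return ["accuracy"]

    def compute(self, outputs: Sequence, targets: Sequence) -> Dict[str, float]:
        correct = sum(o == t for o, t in zip(outputs, targets))
        return {"accuracy": correct / len(targets)}


scores = AccuracyMetric()([1, 0, 1, 1], [1, 1, 1, 1])
```

Checking the returned keys in the base class means a misspelled key in a child class surfaces as an error at the first call, not as a silently missing value in the logs.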

versions of transformers and tokenizers packages

When encountering the following error,

File "/home/workspace/BuilT/tweet/src/tweet_dataset.py", line 30, in __init__
    self.tokenizer = tokenizers.ByteLevelBPETokenizer(
TypeError: __init__() got an unexpected keyword argument 'vocab_file'

a possible workaround is to check the versions of your transformers and tokenizers packages via pip list.

transformers           3.1.0
tokenizers             0.8.1rc2

are confirmed to be working versions.

To install these versions, run pip install transformers==3.1.0 tokenizers==0.8.1rc2
Hope this helps someone...

Test dataset evaluation

A test dataset (one used for neither training nor validation) needs to be evaluated after each epoch of training.

Training summary

After all configurations have been trained, a brief summary of the training results should be printed in the terminal.

adding datetime for train_dirs

Currently, all trained models and split sets are stored under train_dirs and may be overwritten when multiple experiments are run. I suggest adding a datetime (e.g., year, month, day, hour, minute) as a suffix of the train_dirs path (e.g., train_dirs/tweet/classification/2020-11-23-13-20/roberta-base/ or similar).
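
The suggested directory layout could be produced along these lines. This is a sketch only; `timestamped_train_dir` is a hypothetical helper, and the format string matches the example path given in this issue.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_train_dir(base_dir: str, now: Optional[datetime] = None) -> Path:
    """Append a year-month-day-hour-minute suffix to the training directory."""
    # Accepting `now` as a parameter keeps the function easy to test.
    now = now or datetime.now()
    return Path(base_dir) / now.strftime("%Y-%m-%d-%H-%M")


run_dir = timestamped_train_dir(
    "train_dirs/tweet/classification",
    now=datetime(2020, 11, 23, 13, 20),
)
```

With minute resolution, two runs started within the same minute would still share a directory, so adding seconds or a run counter may be worth considering.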

for_sensors_results branch error

ERROR - orsum - Failed after 0:06:15!
Traceback (most recent calls WITHOUT Sacred internals):
File "run.py", line 454, in train
run.finish()
AttributeError: 'Run' object has no attribute 'finish'

error while running Ensemble via vscode

The number of models to ensemble: 5
ERROR - orsum - Failed after 0:07:14!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/workspace/kg/BuilT/run.py", line 40, in ensemble
ensembled_output = ensembler.forward_models()
File "/home/workspace/kg/BuilT/built/ensembler.py", line 61, in forward_models
output = torch.sigmoid(
TypeError: can't multiply sequence by non-int of type 'float'
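
The traceback does not show the expression passed to torch.sigmoid, so the cause is an assumption here. Python raises this exact message when a list or tuple is multiplied by a float, which would happen if, for example, the model outputs were still a plain Python sequence at that point. The sketch below reproduces the message without torch and shows an element-wise alternative; `scale` is a hypothetical helper.

```python
from typing import List, Sequence


def scale(values: Sequence[float], factor: float) -> List[float]:
    """Scale each element; works where `values * factor` fails for a list."""
    return [v * factor for v in values]


outputs = [0.2, 0.8]  # stand-in for per-model outputs held in a plain list

message = ""
try:
    outputs * 0.5  # list * float is not defined in Python
except TypeError as exc:
    message = str(exc)

scaled = scale(outputs, 0.5)
```

If the real operand is a list of tensors, converting it to a single tensor before the multiplication (for instance by stacking) would avoid the error in the same way.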
