wandb / wandb Goto Github PK

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

License: MIT License

Python 86.95% Shell 0.32% Jupyter Notebook 0.14% C 0.02% Dockerfile 0.06% Go 9.45% C++ 0.15% Swift 2.13% Rust 0.78%

machine-learning experiment-track deep-learning keras tensorflow pytorch hyperparameter-search reinforcement-learning mlops data-science

wandb's Introduction

Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production machine learning models. Get started with W&B today, sign up for a free account!

🎓 W&B is free for students, educators, and academic researchers. For more information, visit https://wandb.ai/site/research.

Want to use Weights & Biases for seamless collaboration between your ML or Data Science team? Looking for Production-grade MLOps at scale? Sign up to one of our plans or contact the Sales Team.

Documentation

See the W&B Developer Guide and API Reference Guide for a full technical description of the W&B platform.

Quickstart

Get started with W&B in four steps:

First, sign up for a free W&B account.
Second, install the W&B SDK with pip. Navigate to your terminal and type the following command:

pip install wandb

Third, log into W&B:

wandb.login()

Use the example code snippet below as a template to integrate W&B to your Python script:

import wandb

# Start a W&B Run with wandb.init
run = wandb.init(project="my_first_project")

# Save model inputs and hyperparameters in a wandb.config object
config = run.config
config.learning_rate = 0.01

# Model training code here ...

# Log metrics over time to visualize performance with wandb.log
for i in range(10):
    run.log({"loss": loss})

That's it! Navigate to the W&B App to view a dashboard of your first W&B Experiment. Use the W&B App to compare multiple experiments in a unified place, dive into the results of a single run, and much more!

Example W&B Dashboard that shows Runs from an Experiment.

Integrations

Use your favorite framework with W&B. W&B integrations make it fast and easy to set up experiment tracking and data versioning inside existing projects. For more information on how to integrate W&B with the framework of your choice, see the Integrations chapter in the W&B Developer Guide.

🔥 PyTorch

Call .watch and pass in your PyTorch model to automatically log gradients and store the network topology. Next, use .log to track other metrics. The following example demonstrates an example of how to do this:

import wandb

# 1. Start a new run
run = wandb.init(project="gpt4")

# 2. Save model inputs and hyperparameters
config = run.config
config.dropout = 0.01

# 3. Log gradients and model parameters
run.watch(model)
for batch_idx, (data, target) in enumerate(train_loader):
    ...
    if batch_idx % args.log_interval == 0:
        # 4. Log metrics to visualize performance
        run.log({"loss": loss})

Run an example Google Colab Notebook.
Read the Developer Guide for technical details on how to integrate PyTorch with W&B.
Explore W&B Reports.

🌊 TensorFlow/Keras

Use W&B Callbacks to automatically save metrics to W&B when you call `model.fit` during training.

The following code example demonstrates how your script might look like when you integrate W&B with Keras:

# This script needs these libraries to be installed:
#   tensorflow, numpy

import wandb
from wandb.keras import WandbMetricsLogger, WandbModelCheckpoint

import random
import numpy as np
import tensorflow as tf


# Start a run, tracking hyperparameters
run = wandb.init(
    # set the wandb project where this run will be logged
    project="my-awesome-project",
    # track hyperparameters and run metadata with wandb.config
    config={
        "layer_1": 512,
        "activation_1": "relu",
        "dropout": random.uniform(0.01, 0.80),
        "layer_2": 10,
        "activation_2": "softmax",
        "optimizer": "sgd",
        "loss": "sparse_categorical_crossentropy",
        "metric": "accuracy",
        "epoch": 8,
        "batch_size": 256,
    },
)

# [optional] use wandb.config as your config
config = run.config

# get the data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train, y_train = x_train[::5], y_train[::5]
x_test, y_test = x_test[::20], y_test[::20]
labels = [str(digit) for digit in range(np.max(y_train) + 1)]

# build a model
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(config.layer_1, activation=config.activation_1),
        tf.keras.layers.Dropout(config.dropout),
        tf.keras.layers.Dense(config.layer_2, activation=config.activation_2),
    ]
)

# compile the model
model.compile(optimizer=config.optimizer, loss=config.loss, metrics=[config.metric])

# WandbMetricsLogger will log train and validation metrics to wandb
# WandbModelCheckpoint will upload model checkpoints to wandb
history = model.fit(
    x=x_train,
    y=y_train,
    epochs=config.epoch,
    batch_size=config.batch_size,
    validation_data=(x_test, y_test),
    callbacks=[
        WandbMetricsLogger(log_freq=5),
        WandbModelCheckpoint("models"),
    ],
)

# [optional] finish the wandb run, necessary in notebooks
run.finish()

Get started integrating your Keras model with W&B today:

Run an example Google Colab Notebook
Read the Developer Guide for technical details on how to integrate Keras with W&B.
Explore W&B Reports.

🤗 Hugging Face Transformers

Pass wandb to the report_to argument when you run a script using a Hugging Face Trainer. W&B will automatically log losses, evaluation metrics, model topology, and gradients.

Note: The environment you run your script in must have wandb installed.

The following example demonstrates how to integrate W&B with Hugging Face:

# This script needs these libraries to be installed:
#   numpy, transformers, datasets

import wandb

import os
import numpy as np
from datasets import load_dataset
from transformers import TrainingArguments, Trainer
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": np.mean(predictions == labels)}


# download prepare the data
dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

small_train_dataset = dataset["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(300))

small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
small_eval_dataset = small_train_dataset.map(tokenize_function, batched=True)

# download the model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5
)

# set the wandb project where this run will be logged
os.environ["WANDB_PROJECT"] = "my-awesome-project"

# save your trained model checkpoint to wandb
os.environ["WANDB_LOG_MODEL"] = "true"

# turn off watch to log faster
os.environ["WANDB_WATCH"] = "false"

# pass "wandb" to the `report_to` parameter to turn on wandb logging
training_args = TrainingArguments(
    output_dir="models",
    report_to="wandb",
    logging_steps=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="steps",
    eval_steps=20,
    max_steps=100,
    save_steps=100,
)

# define the trainer and start training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()

# [optional] finish the wandb run, necessary in notebooks
wandb.finish()

Run an example Google Colab Notebook.
Read the Developer Guide for technical details on how to integrate Hugging Face with W&B.

⚡️ PyTorch Lightning

Build scalable, structured, high-performance PyTorch models with Lightning and log them with W&B.

# This script needs these libraries to be installed:
#   torch, torchvision, pytorch_lightning

import wandb

import os
from torch import optim, nn, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


class LitAutoEncoder(pl.LightningModule):
    def __init__(self, lr=1e-3, inp_size=28, optimizer="Adam"):
        super().__init__()

        self.encoder = nn.Sequential(
            nn.Linear(inp_size * inp_size, 64), nn.ReLU(), nn.Linear(64, 3)
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, inp_size * inp_size)
        )
        self.lr = lr

        # save hyperparameters to self.hparamsm auto-logged by wandb
        self.save_hyperparameters()

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)

        # log metrics to wandb
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=self.lr)
        return optimizer


# init the autoencoder
autoencoder = LitAutoEncoder(lr=1e-3, inp_size=28)

# setup data
batch_size = 32
dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(dataset, shuffle=True)

# initialise the wandb logger and name your wandb project
wandb_logger = WandbLogger(project="my-awesome-project")

# add your batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = batch_size

# pass wandb_logger to the Trainer
trainer = pl.Trainer(limit_train_batches=750, max_epochs=5, logger=wandb_logger)

# train the model
trainer.fit(model=autoencoder, train_dataloaders=train_loader)

# [optional] finish the wandb run, necessary in notebooks
wandb.finish()

Run an example Google Colab Notebook.
Read the Developer Guide for technical details on how to integrate PyTorch Lightning with W&B.

💨 XGBoost

Use W&B Callbacks to automatically save metrics to W&B when you call `model.fit` during training.

The following code example demonstrates how your script might look like when you integrate W&B with XGBoost:

# This script needs these libraries to be installed:
#   numpy, xgboost

import wandb
from wandb.xgboost import WandbCallback

import numpy as np
import xgboost as xgb


# setup parameters for xgboost
param = {
    "objective": "multi:softmax",
    "eta": 0.1,
    "max_depth": 6,
    "nthread": 4,
    "num_class": 6,
}

# start a new wandb run to track this script
run = wandb.init(
    # set the wandb project where this run will be logged
    project="my-awesome-project",
    # track hyperparameters and run metadata
    config=param,
)

# download data from wandb Artifacts and prep data
run.use_artifact("wandb/intro/dermatology_data:v0", type="dataset").download(".")
data = np.loadtxt(
    "./dermatology.data",
    delimiter=",",
    converters={33: lambda x: int(x == "?"), 34: lambda x: int(x) - 1},
)
sz = data.shape

train = data[: int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7) :, :]

train_X = train[:, :33]
train_Y = train[:, 34]

test_X = test[:, :33]
test_Y = test[:, 34]

xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)
watchlist = [(xg_train, "train"), (xg_test, "test")]

# add another config to the wandb run
num_round = 5
run.config["num_round"] = 5
run.config["data_shape"] = sz

# pass WandbCallback to the booster to log its configs and metrics
bst = xgb.train(
    param, xg_train, num_round, evals=watchlist, callbacks=[WandbCallback()]
)

# get prediction
pred = bst.predict(xg_test)
error_rate = np.sum(pred != test_Y) / test_Y.shape[0]

# log your test metric to wandb
run.summary["Error Rate"] = error_rate

# [optional] finish the wandb run, necessary in notebooks
run.finish()

Run an example Google Colab Notebook.
Read the Developer Guide for technical details on how to integrate XGBoost with W&B.

🧮 Sci-Kit Learn

Use wandb to visualize and compare your scikit-learn models' performance:

# This script needs these libraries to be installed:
#   numpy, sklearn

import wandb
from wandb.sklearn import plot_precision_recall, plot_feature_importances
from wandb.sklearn import plot_class_proportions, plot_learning_curve, plot_roc

import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


# load and process data
wbcd = datasets.load_breast_cancer()
feature_names = wbcd.feature_names
labels = wbcd.target_names

test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(
    wbcd.data, wbcd.target, test_size=test_size
)

# train model
model = RandomForestClassifier()
model.fit(X_train, y_train)
model_params = model.get_params()

# get predictions
y_pred = model.predict(X_test)
y_probas = model.predict_proba(X_test)
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]

# start a new wandb run and add your model hyperparameters
run = wandb.init(project="my-awesome-project", config=model_params)

# Add additional configs to wandb
run.config.update(
    {
        "test_size": test_size,
        "train_len": len(X_train),
        "test_len": len(X_test),
    }
)

# log additional visualisations to wandb
plot_class_proportions(y_train, y_test, labels)
plot_learning_curve(model, X_train, y_train)
plot_roc(y_test, y_probas, labels)
plot_precision_recall(y_test, y_probas, labels)
plot_feature_importances(model)

# [optional] finish the wandb run, necessary in notebooks
run.finish()

Run an example Google Colab Notebook.
Read the Developer Guide for technical details on how to integrate Scikit-Learn with W&B.

W&B Hosting Options

Weights & Biases is available in the cloud or installed on your private infrastructure. Set up a W&B Server in a production environment in one of three ways:

Production Cloud: Set up a production deployment on a private cloud in just a few steps using terraform scripts provided by W&B.
Dedicated Cloud: A managed, dedicated deployment on W&B's single-tenant infrastructure in your choice of cloud region.
On-Prem/Bare Metal: W&B supports setting up a production server on most bare metal servers in your on-premise data centers. Quickly get started by running wandb server to easily start hosting W&B on your local infrastructure.

See the Hosting documentation in the W&B Developer Guide for more information.

Contribution guidelines

Weights & Biases ❤️ open source, and we welcome contributions from the community! See the Contribution guide for more information on the development workflow and the internals of the wandb library. For wandb bugs and feature requests, visit GitHub Issues or contact [email protected] .

W&B Community

Be a part of the growing W&B Community and interact with the W&B team in our Discord. Stay connected with the latest ML updates and tutorials with W&B Fully Connected.

License

MIT License

wandb's People

Contributors

Stargazers

Watchers

Forkers

connorhough arroadie rguerrettaz borisdayma pbamotra hhy5277 shaunstanislauslau yushu-liu mohirio cclauss mojothejojo casillas-qf leo-xxx youtang1993 literaryprogrammer repson juliansteam nbardy monoloxo laudehenri tmke8 stjordanis cuimengonly lordofavernus guoanjin mikewlange intoffset bguisard ahmedcs keawang cctail akshan115 jakubczakon ruyuej frankxwang guysmoilov jeanpierrethach tokestermw jungx098 danmou leoyichen dreamflasher giftcarder-api yanickschraner rsomani95 jwanga sanketsaurav hmattie corysaildrone anuragsinghchaudhary sitic bujji00 letsmakeadeal liuyuuan gampx mpgek louiskirsch galatolofederico yuanjie-ai lisadunlap usama13o yue1harriet1 sanyam07 margaritaderaney chrka ktobah hujiatao0 neomatrix369 eracah shange1996 wangmj6 lyrics-wangkl oddecust zbpjlc clhne wangxl12 captainwoon xxxhycl2010 33ss33ss33 johndan070 heartfu wangsheng1991 wnov june01 nipi64310 cike0cop calclavia tqcai jotterbach techworkerbee templeblock dylanturpin nanqiai storife shoujieqiu ishandutta2007 auto-ml qiyuehan xinyuan-zheng yhf888666

wandb's Issues

summary only updated at end of run

Which is a change I made to fix a hang (if we modified a file while uploading, the upload might hang). We can fix this by copying files to a temp file before uploading.

Preserve job's stdout and stderr

We should save the jobs stdout and stderr. We should prefix each line with a UTC microsecond timestamp (timestamped on the client side, since the timestamp is used for interleaving), and call the saved files stdout and stderr in cloud storage. In the UI we can interleave these by default but give the user the option to show one or the other.

Files may not match cloud after run completes

We should compare md5s of files on disk to cloud stored files to ensure we have the most up to date versions.

You do not have permission to access Project: l2k owned by mnist-test as anonymous

Weights and Biases version: 0.4.21
Python version: 2.7
Operating System: os x

Description

Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/site-packages/watchdog/observers/api.py", line 199, in run
    self.dispatch_events(self.event_queue, self.timeout)
  File "/usr/local/lib/python2.7/site-packages/watchdog/observers/api.py", line 368, in dispatch_events
    handler.dispatch(event)
  File "/usr/local/lib/python2.7/site-packages/watchdog/events.py", line 454, in dispatch
    _method_map[event_type](event)
  File "/usr/local/lib/python2.7/site-packages/wandb/sync.py", line 358, in on_file_created
    self._get_handler(event.src_path, save_name).on_created()
  File "/usr/local/lib/python2.7/site-packages/wandb/sync.py", line 136, in on_created
    self.on_modified()
  File "/usr/local/lib/python2.7/site-packages/wandb/sync.py", line 142, in on_modified
    progress=False)
  File "/usr/local/lib/python2.7/site-packages/wandb/api.py", line 96, in wrapper
    raise CommError(message)
CommError: You do not have permission to access Project: l2k owned by mnist-test  as anonymous

Fix up API retry logic

We now have util.request_with_retry that does somewhat smart retrying. We should probably use it everywhere, including instead of gql.Client which retries without backoff.

Later we'll refine our network issue handling design. For example if our servers go down we can still let the job finish by saving all files to local disk, then present them with "wandb servers seem to be down, this process will remain open until everything is ok and sync'd. Or you can ctrl-c this process and run 'wandb sync' at a later time to finish saving your data"

There should probably be a message if you wandb pull on an empty bucket

Right now it looks like

  % wandb pull bottle-angle                                                                                                     !10664
Downloading: bottle-angle/bottle-angle

And then returns happily. Maybe it should say "bottle-angle/bottle-angle is empty"

graphing not working?

Weights and Biases version: 0.4.24
Python version: 2.7.12
Operating System: linux

Description

I'm using WandbKerasCallback() but I'm not getting a graph or I'm only getting a single point on the graph. Weirdly, the point moves around inside a single epoch.

https://app.wandb.ai/l2k/simple-test/runs/q0ji253u

description editor should cause program to abort if file is not saved.

With git when I want to bail from description editing (vim) I do ":q!" which doesn't save the file. This causes git to abort

"wandb.Error: history.add expects dict" when adding OrderedDict

Instead of isinstance check to see if dict-like by calling methods.

Traceback (most recent call last):
File "./run_pg.py", line 79, in
run_policy_gradient_algorithm(env, agent, callback=callback, usercfg = cfg)
File "/Users/shawn/code/modular_rl/modular_rl/core.py", line 117, in run_policy_gradient_algorithm
if callback: callback(stats)
File "./run_pg.py", line 74, in callback
run.history.add(stats.items())
File "/Users/shawn/.pyenv/versions/crowdai-run/lib/python2.7/site-packages/wandb/history.py", line 28, in add
raise wandb.Error('history.add expects dict')
wandb.Error: history.add expects dict

Ability to disable exception wrapping for debugging

Passing WANDB_DEBUG=1 through the wandb cli environment should disable exception wrapping, so that we can see the full stack trace for cli exceptions.

Exceptions in my training script don't cause it to die.

Probably because of the wrapping we're doing.

When this occurred for me I had commented out all of the @normalize_exceptions lines in api.py

Something funny is happening with this log

https://app.wandb.ai/l2k/segmentor/runs/s4s9ic

wandb init throws error

Weights and Biases version: 0.4.4
Python version: 3.6.1
Operating System: ubuntu

Description

I did

wandb init

got

bottle-angle - This model finds the angle of a bottle.
lassen - Project to build neural net from scratch.
mnist-sample - Messing around with simple handwritten digits models
Traceback (most recent call last):
  File "/home/l2k/.pyenv/versions/3.6.1/bin/wandb", line 11, in <module>
    sys.exit(cli())
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/wandb/cli.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/l2k/.pyenv/versions/3.6.1/lib/python3.6/site-packages/wandb/cli.py", line 309, in init
    question = inquirer.List('project', message="Which project should we use?", choices=project_names + ["Create New"])
AttributeError: module 'inquirer' has no attribute 'List'

Make sure we're setting up paths properly

Allow the user to specify relative paths, but only store absolute paths. For example Run.dir should probably be an absolute path. There may be other places to look for this. This issue will remind me to do that :)

Whaaaaaaat blew up on bash for windows

Weights and Biases version: latest
Python version: 3.6
Operating System: Windows 10

Description

Seems like a funky bash for windows env caused shit to blow up.

What I Did

$ wandb init
Let's setup this directory for W&B!
Opening [https://app.wandb.ai/profile] in a new tab in your default browser.
Not authenticated! Paste an API key from your profile: 1a5139a74345175a78e5e63f9354d5ac5d32968c
Appending to netrc C:\Users\bharathr/.netrc
What username or org should we use? [bharath374]: qualcomm
Traceback (most recent call last):
  File "c:\apps\anaconda3\envs\wandb\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\apps\anaconda3\envs\wandb\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\apps\Anaconda3\envs\wandb\Scripts\wandb.exe\__main__.py", line 9, in <module>
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\wandb\cli.py", line 50, in wrapper
    return func(*args, **kwargs)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\wandb\cli.py", line 403, in init
    project = prompt_for_project(ctx, entity)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\wandb\cli.py", line 96, in prompt_for_project
    project = whaaaaat.prompt([question])['project_name']
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\whaaaaat\prompt.py", line 65, in prompt
    eventloop=eventloop)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\prompt_toolkit\shortcuts.py", line 576, in run_application
    output=create_output(true_color=true_color))
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\prompt_toolkit\shortcuts.py", line 118, in create_output
    return Win32Output(stdout)
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\prompt_toolkit\terminal\win32_output.py", line 80, in __init__
    info = self.get_win32_screen_buffer_info()
  File "c:\apps\anaconda3\envs\wandb\lib\site-packages\prompt_toolkit\terminal\win32_output.py", line 172, in get_win32_screen_buffer_info
    raise NoConsoleScreenBufferError
prompt_toolkit.terminal.win32_output.NoConsoleScreenBufferError: Found xterm, while expecting a Windows console. Maybe try to run this program using "winpty" or run it in cmd.exe instead. Or otherwise, in case of Cygwin, use the Python executable that is compiled for Cygwin.
(wandb)

put a newline after displaying stderr

all my stuff looks like

Using TensorFlow backend.
Syncing https://app.wandb.ai/l2k/perceptron-test/runs/z5b9r5
Starting...
Running
Pushing log
Synced https://app.wandb.ai/l2k/perceptron-test/runs/z5b9r5
ERROR: Error when checking target: expected dense_1 to have shape (None, 10) but got array with shape (10000, 1)ubuntu@ip-172-31-6-151:~/ml-class$

~/.netrc indentation gets screwed up by wandb login

Try "wandb login" a few times, it seems to add indentation to ~/.netrc

wandb without a project configured throws an error

Weights and Biases version: 4.1
Python version: 2.7
Operating System: ubuntu

wandb sdf

l2k@l2k-Oryx-Pro:~$ wandb sdf
Traceback (most recent call last):
  File "/home/l2k/.local/bin/wandb", line 11, in <module>
    sys.exit(cli())
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 1061, in invoke
    cmd_name, cmd, args = self.resolve_command(ctx, args)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 1100, in resolve_command
    cmd = self.get_command(ctx, cmd_name)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/cli.py", line 60, in get_command
    project, bucket = api.parse_slug(cmd_name)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/api.py", line 128, in parse_slug
    raise Error("No default project configured.")
wandb.api.Error: No default project configured.

Files not synced

Log here: https://app.wandb.ai/shawn/gaze/runs/i9cvt9

These two files exist and should have been synced:

Final model: ./run-170906-103948/model-final.h5
Epochs csv: ./run-170906-103948/epochs.csv

But the run only has one file (stdout)

would be nice runs page showed the name of the machine the code ran on

-L

wandb-history.jsonl tailing not working in python2.7

Using l2k's wandb-examples/tensorflow/mnist_cnn.py

It looks like FileTailer's self._file.read() never returns any data. But it works in python 3.6

Should fail more gracefully when you forget to do wandb init

Weights and Biases version: 0.4.21
Python version: 2.7.11
Operating System: os X

Description

If I forget to run wandb init in my directory I get an unhelpful error message

`Traceback (most recent call last):

File "/usr/local/bin/wandb", line 11, in

sys.exit(cli())

File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in call

return self.main(*args, **kwargs)

File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main

rv = self.invoke(ctx)

File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke

return _process_result(sub_ctx.command.invoke(sub_ctx))

File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke

return ctx.invoke(self.callback, **ctx.params)

File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke

return callback(*args, **kwargs)

File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func

return f(get_current_context(), *args, **kwargs)

File "/usr/local/lib/python2.7/site-packages/wandb/cli.py", line 37, in wrapper

return func(*args, **kwargs)

File "/usr/local/lib/python2.7/site-packages/wandb/cli.py", line 369, in run

dir = wandb_run.run_dir_path(id, dry=False)

File "/usr/local/lib/python2.7/site-packages/wandb/wandb_run.py", line 58, in run_dir_path

return os.path.join(__stage_dir__, '%s-%s' % (prefix, run_id))

File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py", line 70, in join

elif path == '' or path.endswith('/'):

AttributeError: 'NoneType' object has no attribute 'endswith'

Permissions errors should be prettier

Weights and Biases version: 0.4
Python version:
Operating System:

Description

If I try to read or write to a project I don’t own I shouldn’t see a trace back.

What I Did

Insure you have a user that isn’t an admin and try to write to a project that isn’t yours.

traceback when dryrunning with no wandb dir

wandb.run isn't set. We might want to fail in this case like a non dry-run would. But doing so naively breaks unit tests.

Crashes

Weights and Biases version: latest
Python version: 2.7
Operating System: ubuntu

Description

I installed wandb and typed wandb init

Traceback (most recent call last):
File "/home/l2k/.local/bin/wandb", line 7, in
from wandb.cli import cli
File "/home/l2k/.local/lib/python2.7/site-packages/wandb/init.py", line 8, in
from .sync import Sync
File "/home/l2k/.local/lib/python2.7/site-packages/wandb/sync.py", line 3, in
from watchdog.observers import Observer
ImportError: No module named watchdog.observers

Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.

Importing WandBKerasCallback without running wandb init leads to strange error

Weights and Biases version: 0.4.22
Python version: 2.7
Operating System: OS X

Description

Traceback (most recent call last):
File "mnist-cnn.py", line 6, in
from wandb.wandb_keras import WandBKerasCallback
File "/Users/l2k/client/wandb/init.py", line 144, in
_do_sync(run.dir)
File "/Users/l2k/client/wandb/init.py", line 107, in _do_sync
syncer.watch(files='*')
File "/Users/l2k/client/wandb/sync.py", line 278, in watch
config=self._config.as_dict(), description=self._description, host=socket.gethostname())
File "/Users/l2k/client/wandb/api.py", line 398, in upsert_run
'host': host, 'debug': os.getenv('DEBUG')})
File "/usr/local/lib/python2.7/site-packages/gql/client.py", line 52, in execute
raise Exception(str(result.errors[0]))
Exception: {u'message': u"'_model'", u'code': 500, u'locations': [{u'column': 3, u'line': 2}]}

Runs are bouncing around when I train multiple at same time

Weights and Biases version: 0.4.17

Client not saving description

Weights and Biases version: 4.24
Python version: 3.6
Operating System: High Sierra

Description

Locally I'm only seeing the run_id with no description

What I Did

cd wandb-example-simple
wandb run train.py

Another network error

wandb: Script ended.
wandb: Run summary:
wandb: epoch 7
wandb: acc 0.9425042652743575
wandb: Run history:
wandb: epoch ▁▂▃▃▄▅▆▆▇█
wandb: acc ▃▇▄▇▅▁▇█▅▄
wandb: Waiting for final file modifications.
wandb: Syncing files in wandb/run-5z0byl:
wandb: config.yaml
wandb: diff.patch
wandb: modify_file.txt
wandb: wandb-history.jsonl
wandb: events.out.tfevents.1485040358.gpu
wandb: wandb-summary.json
wandb: model.txt
Exception in thread Thread-9:000 of 53295.000 bytes uploaded
Traceback (most recent call last):
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 386, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 382, in _make_request
httplib_response = conn.getresponse()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/adapters.py", line 423, in send
timeout=timeout
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/util/retry.py", line 347, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 386, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 382, in _make_request
httplib_response = conn.getresponse()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/api.py", line 525, in upload_file
url, data=progress, headers=extra_headers)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/api.py", line 124, in put
return request('put', url, data=data, **kwargs)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/adapters.py", line 473, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/api.py", line 73, in wrapper
return func(*args, **kwargs)
File "/Users/shawn/code/wandb/client/wandb/api.py", line 533, in upload_file
completed = int(status.headers['Range'].split("-")[-1])
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/requests/structures.py", line 54, in getitem
return self._store[key.lower()][1]
KeyError: 'range'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/api.py", line 73, in wrapper
return func(*args, **kwargs)
File "/Users/shawn/code/wandb/client/wandb/api.py", line 608, in push
responses.append(self.upload_file(file_info['url'], open_file, progress))
File "/Users/shawn/code/wandb/client/wandb/api.py", line 93, in wrapper
raise CommError(message)
wandb.api.CommError: 'range'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/.pyenv/versions/3.6.0/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/Users/shawn/code/wandb/client/wandb/file_pusher.py", line 26, in run
self._push_function(self.save_name, self.path)
File "/Users/shawn/code/wandb/client/wandb/sync.py", line 266, in push_function
progress=lambda _, total: self._stats.update_progress(path, total))
File "/Users/shawn/code/wandb/client/wandb/api.py", line 93, in wrapper
raise CommError(message)
wandb.api.CommError: 'range'

wandb: File changed while uploading, restarting: wandb-summary.json
wandb: Synced https://app.wandb.ai/shawn/example-simple1/runs/5z0byl
wandb: job (python) Process exited with code: 0

Change logic that determines which files to sync

This logic determines which files are synced

            if len(files) > 0:
                self._handler._patterns = ["*"+file for file in files]
            else:
                self._handler._patterns = ["*.h5", "*.hdf5", "*.json", "*.meta", "*checkpoint*"]

I propose:

If user provides files (which can be globs), do what they say verbatim. The "*"+file logic above doesn't work with my gaze code (see #19), which passes in paths like "./run-170906-113928/epochs.csv"
By default sync any new files in the directory tree rooted at cwd.

Consider run_id generation strategy

Currently we generate run_ids as a random combination of 6 letters and digits. There are 36**6 = ~2billion possible combinations. Is this enough to make collisions extremely unlikely, given the birthday problem?

'latest' has a special meaning when used as a run_id. Should we blacklist it from being generated? Probably.

We've also discussed using a monotonically increasing number per project as the run_id.

diff.patch goes into wandb during dryrun

Should go into the generated run directory.

Network error caused crash

Someother stuff 6
Repeat: 0, Epoch: 7, accuracy: 0.544050224345607
Someother stuff 7
Repeat: 0, Epoch: 8, accuracy: 0.31182500329903007
Someother stuff 8
Repeat: 0, Epoch: 9, accuracy: 0.8873673942273215
Someother stuff 9

wandb: Script ended.
wandb: Run summary:
wandb: epoch 0
wandb: acc 0.9474932868908995
wandb: Run history:
wandb: epoch ▁▂▃▃▄▅▆▆▇█
wandb: acc █▁▃▂█▇▃▅▃█
wandb: Waiting for final file modifications.
wandb: Syncing files in wandb/run-r56z43:
wandb: config.yaml
wandb: diff.patch
wandb: modify_file.txt
wandb: wandb-history.jsonl
wandb: events.out.tfevents.1485040358.gpu
wandb: wandb-summary.json
wandb: model.txt
Exception in thread Thread-14:00 of 1091637.000 bytes uploaded
Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/api.py", line 73, in wrapper
return func(*args, **kwargs)
File "/Users/shawn/code/wandb/client/wandb/api.py", line 442, in upload_urls
'files': [file for file in files]
File "/Users/shawn/.pyenv/versions/wandb_cli-3.6/lib/python3.6/site-packages/gql-0.1.0-py3.6.egg/gql/client.py", line 52, in execute
raise Exception(str(result.errors[0]))
Exception: {'message': 'The API call urlfetch.Fetch() took too long to respond and was cancelled.', 'code': 500, 'locations': [{'column': 13, 'line': 9}]}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/api.py", line 73, in wrapper
return func(*args, **kwargs)
File "/Users/shawn/code/wandb/client/wandb/api.py", line 597, in push
project, files, run, entity, description)
File "/Users/shawn/code/wandb/client/wandb/api.py", line 93, in wrapper
raise CommError(message)
wandb.api.CommError: The API call urlfetch.Fetch() took too long to respond and was cancelled.

During handling of the above exception, another exception occurred:

Error in atexit._run_exitfuncs:0 of 1091637.000 bytes uploaded
Traceback (most recent call last):
File "/Users/shawn/code/wandb/client/wandb/sync.py", line 400, in stop
local_md5 = util.md5_file(local_path)
File "/Users/shawn/code/wandb/client/wandb/util.py", line 186, in md5_file
hash_md5 = hashlib.md5()
NameError: name 'hashlib' is not defined
wandb: job (python) Process exited with code: 0

Consider printing wandb log warnings/errors to console

Currently they all go to wandb.log, it might also be nice to display them to the user (for example network errors). But probably we don't inject them into the user's saved training.log? We also may want to save wandb.log to wandb for each run.

weird line breaks in the logging

Weights and Biases version: 4.24
Python version: 2.7
Operating System: linux

Description

from https://app.wandb.ai/l2k2/perceptron/runs/z7z6bhqc

Epoch 14/100 60000/60000 [==============================] - 2s - loss: 0.2445 - val_loss: 0.2666������������������������������������������������������� 
Epoch 15/100 17440/60000 [=======>......................] - ETA: 1s - loss: 0.2434���������������������������������������������������������������������
60000/60000 [==============================] - 2s - loss: 0.2341 - val_loss: 0.2728������������������������������������������������������� Epoch 30/100 60000/60000 [==============================] - 2s - loss: 0.2338 - val_loss: 0.2757�������������������������������������������������������

CLI hangs indefinitely when it can't talk to host

Weights and Biases version: feature/streaming_and_wandb_run
Python version: 3.6
Operating System: Mac

Description

If the cli can't talk to the server, it will hang retrying indefinitely

What I Did

Just set the base uri to something invalid or to a not running local instance and run a command that talks to the server.

If I modify the files near the end of my script, they don't get uploaded.

Version 0.4.17

If I modify the files near the end of my script, they don't get uploaded.

wandb.sync() swallowing my script's traceback?

It looks like my training script bailed at train.py:76, which attempts to open an h5 file.

Running from /wandb/gaze at this commit:
https://github.com/wandb/gaze/tree/63e2ddb54f38f8a4a0e621b7bf5b3c1c1b509795

Log when wandb.sync call is not commented out:

(gaze) Shawns-MacBook-Pro:gaze shawn$ python train.py --max_epochs=1  ~/code/gaze-data/ds-office-shawn-1/ ~/code/gaze-data/ds-office-shawn-2/
Using TensorFlow backend.
Syncing https://app.wandb.ai/shawn/gaze/runs/5bp7ws
Pushing log
Synced https://app.wandb.ai/shawn/gaze/runs/5bp7ws
ERROR: unable to create file (File accessability: Unable to open file)(gaze) Shawns-MacBook-Pro:gaze shawn$

Log when wandb.sync call is commented out:

(gaze) Shawns-MacBook-Pro:gaze shawn$ python train.py --env=floyd --max_epochs=1  ~/code/gaze-data/ds-office-shawn-1/ ~/code/gaze-data/ds-office-shawn-2/
Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 158, in <module>
    main(sys.argv)
  File "train.py", line 76, in main
    [h5py.File(d)['face_img'] for d in args.dataset])
  File "/Users/shawn/.pyenv/versions/gaze/lib/python2.7/site-packages/h5py/_hl/files.py", line 207, in __init__
    fid = make_fid(name, mode, userblock_size, fapl)
  File "/Users/shawn/.pyenv/versions/gaze/lib/python2.7/site-packages/h5py/_hl/files.py", line 98, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5f.pyx", line 90, in h5py.h5f.create (h5py/h5f.c:1709)
IOError: unable to create file (File accessability: Unable to open file)

wandb init shouldn't upsertModel if the project isn't owned by the user

Description

If a user selects a project that isn't their own, we shouldn't upsert the model. Currently we do it to set the git remote of the project, which might be something we can get rid of all-together. Maybe we only call upsert model if the project doesn't have a git remote? Unclear the best thing to do here.

crash when "wandb init" with new project.

"wandb init" and select "create new" when asked which project.

History quietly accepts a single string and then behaves weirdly

Weights and Biases version: 0.4.19
Python version: 2.7
Operating System: os x

Description

If I call:
history = wandb.History('loss')

I get a csv that looks like
l,o,s,s
,,,
,,,
,,,
,,,
,,,
,,,
,,,

I would prefer it to fail and ask me to initialize with an array.

ugly error message when execution file is missing

Weights and Biases version: 0.4.25
Python version: 2.7
Operating System: linux

Description

If i do wandb run file.py and file.py doesn't exist

Traceback (most recent call last):
  File "/usr/local/bin/wandb", line 11, in <module>
    sys.exit(cli())
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wandb/cli.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wandb/cli.py", line 49, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wandb/cli.py", line 522, in run
    proc.run()
  File "/usr/local/lib/python2.7/dist-packages/wandb/util.py", line 79, in run
    self._popen = subprocess.Popen(self._args, env=self._env)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

wandb error when no .wandb should exit

Log:

shawn@wbeast:~/code/gaze$ CUDA_VISIBLE_DEVICES=0 python train.py --env=wandb --max_epochs=2 ~/data/gaze/prep-ds-office-shawn-2-0.h5 ~/data/gaze/prep-ds-office-shawn-1-0.h5
Using TensorFlow backend.
WARNING:root:Unable to persist config, no .wandb directory exists.  Run `wandb config init` in this directory.
!!! Fatal W&B Error: '_model'
ERROR:wandb.sync:Traceback (most recent call last):

  File "/home/shawn/.local/lib/python2.7/site-packages/wandb/sync.py", line 87, in watch
    config=self.config.__dict__, description=self._description)

  File "/home/shawn/.local/lib/python2.7/site-packages/wandb/api.py", line 64, in wrapper
    raise Error(message)

Error: '_model'

('face_train shape: ', (2835, 260, 260, 3))
('masks_train shape:', (2835, 36, 64))
('y_train shape: ', (2835, 2))
('face_val shape: ', (119, 260, 260, 3))
('mask_val shape:', (119, 36, 64))
('y_val shape: ', (119, 2))
('model output_shape: ', (None, 2))
Train on 2835 samples, validate on 119 samples
Epoch 1/2
2017-09-06 09:36:49.221831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:06:00.0
Total memory: 10.91GiB
Free memory: 10.76GiB
2017-09-06 09:36:49.221857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-09-06 09:36:49.221863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-09-06 09:36:49.221873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0)

Ugly error messages in log streaming

Weights and Biases version: 0.4.19
Python version: 2.7
Operating System: os x

Description

Error messages from the log thread showing up in my console like:

Exception in thread Thread-240234:
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/site-packages/wandb/streaming_log.py", line 122, in write
super(StreamingLog, self).write(chars)
File "/usr/local/lib/python2.7/site-packages/wandb/streaming_log.py", line 77, in flush
self.posting = False
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 216, in exit
self.release()
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 204, in release
raise RuntimeError("cannot release un-acquired lock")
RuntimeError: cannot release un-acquired lock

wandb.config.update doesn't work

Weights and Biases version: 0.4.22
Python version: 2.7
Operating System: OS X

Description

wandb.config.update(FLAGS) leads to the following error - maybe this is expected behavior?

Traceback (most recent call last):
File "mnist_cnn.py", line 29, in
wandb.config.update(FLAGS)
AttributeError: 'module' object has no attribute 'update'

wandb.history.add produces confusing error message

Should say module object has no attribute 'history', but we most have history in wandb. Let's fix that.

Traceback (most recent call last):
File "./run_pg.py", line 79, in
run_policy_gradient_algorithm(env, agent, callback=callback, usercfg = cfg)
File "/Users/shawn/code/modular_rl/modular_rl/core.py", line 117, in run_policy_gradient_algorithm
if callback: callback(stats)
File "./run_pg.py", line 74, in callback
wandb.history.add(stats.items())
AttributeError: 'module' object has no attribute 'add'

Another python 2 issue

Weights and Biases version: 0.4.1
Python version: 2.7
Operating System: ubuntu

Description

wandb init

What entity should we scope to? [models]: Traceback (most recent call last):
  File "/home/l2k/.local/bin/wandb", line 11, in <module>
    sys.exit(cli())
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/cli.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/cli.py", line 298, in init
    result = ctx.invoke(projects, entity=entity)
  File "/home/l2k/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/cli.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/cli.py", line 86, in projects
    projects = api.list_projects(entity=entity)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/api.py", line 41, in wrapper
    return func(*args, **kwargs)
  File "/home/l2k/.local/lib/python2.7/site-packages/wandb/api.py", line 156, in list_projects
    'entity': entity or self.config('entity')})['models'])
  File "/home/l2k/.local/lib/python2.7/site-packages/gql/client.py", line 52, in execute
    raise Exception(str(result.errors[0]))
Exception: {u'message': u'Resource not found', u'locations': [{u'column': 3, u'line': 2}]}

Don't always auto-create wandb/

Currently for example if you run "wandb --help", we'll create a wandb directory, wherever you happen to be.

We need to think about the correct behavior here.

funny error when setting up wandb on server

Weights and Biases version: 0.4.16
Python version: 2.7.12
Operating System: ubuntu

Let's setup this directory for W&B!
You can find your API keys here: https://app.wandb.ai/profile
Not authenticated! Paste an API key from your profile []: 53326a0434aab4846d127b65fd0d79471071715f
Appending to netrc /home/ubuntu/.netrc
Traceback (most recent call last):
File "/usr/local/bin/wandb", line 11, in
sys.exit(cli())
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 722, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/wandb/cli.py", line 34, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/wandb/cli.py", line 349, in init
entity = click.prompt("What username or org should we use?", default=api.viewer().get('entity', 'models'))
AttributeError: 'NoneType' object has no attribute 'get'

ctrl-c "wandb run" can be confusing

"wandb run" popens the user's training script. When you hit ctrl-c, both processes receive the SIGINT signal (https://stackoverflow.com/questions/19807134/python-sub-process-ctrlc)

But if the child process doesn't want to die, like in our case where we have an exit hook that may continue trying to save files, things look strange. The parent process dies and the user gets access to their terminal, but the child process continues to run in the background, occasionally printing things.

A few options:

when parent receives ctrl-c, SIGINT the child twice. That should actually make it die though it's probably unexpected.
handle SIGINT in the child and die immediately. This sucks because we may be interfering with the user's script's signal handling
in our exit hook detect the cause of death and die faster in most cases (other than normal exit)
keep the parent alive on ctrl-c until the child dies.
and more!

At first glance a combo of 3 and 4 might be a good solution.

wandb / wandb Goto Github PK

wandb's Introduction

Documentation

Quickstart

Integrations

W&B Hosting Options

Contribution guidelines

W&B Community

License

wandb's People

Contributors

Stargazers

Watchers

Forkers

wandb's Issues

Description

Description

Description

Description

What I Did

Description

Description

What I Did

Description

Description

Description

What I Did

Description

Description

What I Did

Description

Description

Description

Description

Description

Description

Recommend Projects

Recommend Topics

Recommend Org