Hello, I tried to run the model-based genetic baseline by following your command.


Device error about lambo HOT 7 CLOSED

samuelstanton commented on June 29, 2024

Device error


Comments (7)

samuelstanton commented on June 29, 2024

Can you tell me more about your setup? Are you running the code on a machine with more than one GPU?


yuyang-0825 commented on June 29, 2024

Hi,
I followed the steps in the README to install it and ran it on a machine with only one GPU.
After debugging, I found the following at the point where the error is raised:

File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/torch/distributions/lkj_cholesky.py", line 117, in log_prob
unnormalized_log_pdf = torch.sum(order * diag_elems.log(), dim=-1)

The device of order is cpu and the device of diag_elems is cuda:0. I think that's the problem.
Thank you.
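
A quick way to confirm this kind of diagnosis is to list which operands of the failing expression live on which device. The following is only a minimal sketch: `group_by_device` is a hypothetical helper, not part of lambo, gpytorch or torch, and the two stand-in objects mimic the `.device` attribute of a tensor so the snippet runs without a GPU. In a real debugging session you would pass the actual `order` and `diag_elems` tensors instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_device(**named_tensors):
    """Map each device string to the names of the tensors living on it.

    Hypothetical helper: it only relies on each argument having a
    `.device` attribute, as torch tensors do.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors.items():
        # str() turns a device into a plain label such as "cpu" or "cuda:0".
        groups[str(tensor.device)].append(name)
    return dict(groups)


# Stand-ins for the two operands named in the report (assumed values).
order = SimpleNamespace(device="cpu")
diag_elems = SimpleNamespace(device="cuda:0")

devices = group_by_device(order=order, diag_elems=diag_elems)
if len(devices) > 1:
    print("device mismatch:", devices)
```

More than one key in the result means the operands are split across devices, which is the condition the RuntimeError reports.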


samuelstanton commented on June 29, 2024

Can you post the full stack trace?


yuyang-0825 commented on June 29, 2024

Sure, the following is the full stack trace:

logger:
  _target_: upcycle.logging.DataFrameLogger
  log_dir: data/experiments/test/vibrant-flower-23/2022-06-17_09-42-45
task:
  _target_: lambo.tasks.regex.RegexTask
  regex_list:
  - (?=AV)
  - (?=VC)
  - (?=CA)
  obj_dim: 3
  log_prefix: regex
  min_len: 32
  max_len: 36
  num_start_examples: 512
  batch_size: 16
  max_num_edits: null
  max_ngram_size: 1
  allow_len_change: true
acquisition:
  _target_: lambo.acquisitions.ehvi.NoisyEHVI
  num_samples: 2
  batch_size: 16
encoder:
  _target_: lambo.models.lm_elements.LanguageModel
  name: mlm_cnn
  model:
    _target_: lambo.models.shared_elements.mCNN
    tokenizer:
      _target_: lambo.utils.ResidueTokenizer
    max_len: 36
    embed_dim: 64
    latent_dim: 16
    out_dim: 16
    kernel_size: 5
    p: 0.0
    layernorm: true
    max_len_delta: 0
  batch_size: 32
  num_epochs: 128
  patience: 32
  lr: 0.001
  max_shift: 0
  mask_ratio: 0.125
optimizer:
  _target_: lambo.optimizers.pymoo.ModelBasedGeneticOptimizer
  _recursive_: false
  num_rounds: 64
  num_gens: 32
  seed: 0
  concentrate_pool: 1
  residue_sampler: uniform
  resampling_weight: 1.0
  encoder_obj: mll
  algorithm:
    _target_: pymoo.algorithms.soo.nonconvex.ga.GA
    pop_size: 16
    n_offsprings: null
    sampling:
      _target_: lambo.optimizers.sampler.BatchSampler
      batch_size: 16
    crossover:
      _target_: lambo.optimizers.crossover.BatchCrossover
      prob: 0.25
      prob_per_query: 0.25
    mutation:
      _target_: lambo.optimizers.mutation.LocalMutation
      prob: 1.0
      eta: 16
      safe_mut: false
    eliminate_duplicates: true
tokenizer:
  _target_: lambo.utils.ResidueTokenizer
surrogate:
  _target_: lambo.models.gp_models.MultiTaskExactGP
  max_shift: 0
  mask_size: 0
  bootstrap_ratio: null
  min_num_train: 128
  task_noise_init: 0.25
  gp_lr: 0.005
  enc_lr: 0.005
  bs: 32
  eval_bs: 16
  num_epochs: 256
  holdout_ratio: 0.2
  early_stopping: true
  patience: 32
  eval_period: 2
  out_dim: 3
  feature_dim: 16
  encoder_wd: 0.0001
  rank: null
  task_covar_prior:
    _target_: gpytorch.priors.LKJCovariancePrior
    'n': 3
    eta: 2.0
    sd_prior:
      _target_: gpytorch.priors.SmoothedBoxPrior
      a: 0.0001
      b: 1.0
  data_covar_module:
    _target_: gpytorch.kernels.MaternKernel
    ard_num_dims: 16
    lengthscale_prior:
      _target_: gpytorch.priors.NormalPrior
      loc: 0.7
      scale: 0.01
  likelihood:
    _target_: gpytorch.likelihoods.MultitaskGaussianLikelihood
    num_tasks: 3
    has_global_noise: false
    noise_constraint:
      _target_: gpytorch.constraints.GreaterThan
      lower_bound: 0.0001
seed: 0
trial_id: 0
project_name: lambo
version: v0.2.1
data_dir: data/experiments
exp_name: test
job_name: vibrant-flower-23
timestamp: 2022-06-17_09-42-45
log_dir: data/experiments/test
wandb_mode: online

GPU available: True
|    |   round_idx |   hypervol_abs |   hypervol_rel |   num_bb_evals |   time_elapsed |
|---:|------------:|---------------:|---------------:|---------------:|---------------:|
|  0 |           0 |          2.048 |              1 |              0 |     0.00963354 |

 best candidates
|    |   obj_val_0 |   obj_val_1 |   obj_val_2 |
|---:|------------:|------------:|------------:|
|  0 |     -2.0000 |     -2.0000 |     -2.0000 |

active set contracted to 4 pareto points
active set augmented with 12 random points
402 train, 56 val, 54 test

---- preparing checkpoint ----
starting val NLL: 1.6021

---- fitting all params ----
[2022-06-17 10:09:25,165][root][ERROR] - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Traceback (most recent call last):
  File "/home/bcell/home/lambo/scripts/black_box_opt.py", line 55, in main
    metrics = optimizer.optimize(
  File "/home/bcell/home/lambo/lambo/optimizers/pymoo.py", line 189, in optimize
    problem = self._create_inner_task(
  File "/home/bcell/home/lambo/lambo/optimizers/pymoo.py", line 389, in _create_inner_task
    records = self.surrogate_model.fit(
  File "/home/bcell/home/lambo/lambo/models/gp_models.py", line 321, in fit
    return fit_gp_surrogate(**fit_kwargs)
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 208, in fit_gp_surrogate
    enc_sup_loss = fit_encoder_only(
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 76, in fit_encoder_only
    loss = gp_train_step(surrogate, optimizer, inputs, targets, mll)
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 60, in gp_train_step
    loss = -mll(output, targets).mean()
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 63, in forward
    res = self._add_other_terms(res, params)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 43, in _add_other_terms
    res.add_(prior.log_prob(closure(module)).sum())
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/lkj_prior.py", line 105, in log_prob
    log_prob_corr = self.correlation_prior.log_prob(correlations)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/lkj_prior.py", line 62, in log_prob
    return super().log_prob(X_cholesky)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/prior.py", line 27, in log_prob
    return super(Prior, self).log_prob(self.transform(x))
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/torch/distributions/lkj_cholesky.py", line 117, in log_prob
    unnormalized_log_pdf = torch.sum(order * diag_elems.log(), dim=-1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


samuelstanton commented on June 29, 2024

I just tested things again, starting with a fresh conda env and following the installation instructions in the README, and I wasn't able to reproduce the issue. Most likely this is a problem with your Python virtual environment. What versions of pytorch, gpytorch, and cudatoolkit are installed? You can run conda list to check. What version of CUDA do you have installed, and what OS are you running?
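
For reference, part of the requested information can also be collected from inside the environment with the standard library alone. This is a sketch under the assumption that the packages were installed under the distribution names `torch` and `gpytorch`; `installed_versions` is a hypothetical helper. It does not cover cudatoolkit, which is a conda package and is best checked with conda list.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to match your environment.
report = installed_versions(["torch", "gpytorch"])

print("OS:", platform.platform())
print("Python:", platform.python_version())
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```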


samuelstanton commented on June 29, 2024

Closing due to inactivity. Feel free to reopen if you have further questions.


samuelstanton commented on June 29, 2024

@yuyang-0825 you might be interested to know that I was able to reproduce your error while investigating #7, and I can confirm the device error occurs if you have gpytorch>=1.7 installed. If you install gpytorch using the pinned commit hash in requirements.txt, the program should run as intended.
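
To check whether an environment falls in the affected range, the version string can be compared against the 1.7 threshold mentioned above. This is a minimal sketch: `parse_major_minor` and `is_affected` are hypothetical helpers, and the parsing only looks at the leading major.minor numbers, so suffixes such as development or local build tags are ignored.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.7.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(gpytorch_version, threshold=(1, 7)):
    """True if the version is at or above the threshold reported in this thread."""
    # Tuple comparison handles cases like 1.10 > 1.7 correctly.
    return parse_major_minor(gpytorch_version) >= threshold


for candidate in ["1.6.0", "1.7.0", "1.10.0"]:
    print(candidate, "affected" if is_affected(candidate) else "not affected")
```

In practice the version string would come from `gpytorch.__version__` or from conda list rather than from a hard-coded list.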

