
Comments (7)

samuelstanton commented on June 29, 2024

Can you tell me more about your setup? Are you running the code on a machine with more than one GPU?

from lambo.

yuyang-0825 commented on June 29, 2024

Hi,
I followed the steps in the README to install it and ran it on a machine with only one GPU.
After debugging, I found the following at the line where the error is raised:

File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/torch/distributions/lkj_cholesky.py", line 117, in log_prob
unnormalized_log_pdf = torch.sum(order * diag_elems.log(), dim=-1)

Here order is on cpu while diag_elems is on cuda:0, so the elementwise product mixes two devices. I think that's the problem.
Thank you.
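
For anyone debugging a similar error, here is a minimal sketch of how the devices could be compared at a breakpoint. The helper names (group_by_device, describe_mismatch) and the FakeTensor stand-in are hypothetical and not part of lambo, gpytorch, or torch. The helpers rely only on each object exposing a .device attribute, which torch tensors do.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Group names by the string form of each object's `.device` attribute."""
    groups = defaultdict(list)
    for name, tensor in named_tensors.items():
        groups[str(tensor.device)].append(name)
    return dict(groups)


def describe_mismatch(named_tensors):
    """Return a summary string if more than one device is in use, else None."""
    groups = group_by_device(named_tensors)
    if len(groups) <= 1:
        return None
    return "; ".join(
        f"{device}: {', '.join(sorted(names))}"
        for device, names in sorted(groups.items())
    )


class FakeTensor:
    """Hypothetical stand-in so the example runs without torch or a GPU."""

    def __init__(self, device):
        self.device = device


# Mirrors the situation reported above: `order` on cpu, `diag_elems` on cuda:0.
example = {"order": FakeTensor("cpu"), "diag_elems": FakeTensor("cuda:0")}
print(describe_mismatch(example))
```

With real tensors, the same call would be describe_mismatch({"order": order, "diag_elems": diag_elems}) from inside the debugger.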


samuelstanton commented on June 29, 2024

Can you post the full stack trace?


yuyang-0825 commented on June 29, 2024

Sure, the following is the full stack trace:

logger:
  _target_: upcycle.logging.DataFrameLogger
  log_dir: data/experiments/test/vibrant-flower-23/2022-06-17_09-42-45
task:
  _target_: lambo.tasks.regex.RegexTask
  regex_list:
  - (?=AV)
  - (?=VC)
  - (?=CA)
  obj_dim: 3
  log_prefix: regex
  min_len: 32
  max_len: 36
  num_start_examples: 512
  batch_size: 16
  max_num_edits: null
  max_ngram_size: 1
  allow_len_change: true
acquisition:
  _target_: lambo.acquisitions.ehvi.NoisyEHVI
  num_samples: 2
  batch_size: 16
encoder:
  _target_: lambo.models.lm_elements.LanguageModel
  name: mlm_cnn
  model:
    _target_: lambo.models.shared_elements.mCNN
    tokenizer:
      _target_: lambo.utils.ResidueTokenizer
    max_len: 36
    embed_dim: 64
    latent_dim: 16
    out_dim: 16
    kernel_size: 5
    p: 0.0
    layernorm: true
    max_len_delta: 0
  batch_size: 32
  num_epochs: 128
  patience: 32
  lr: 0.001
  max_shift: 0
  mask_ratio: 0.125
optimizer:
  _target_: lambo.optimizers.pymoo.ModelBasedGeneticOptimizer
  _recursive_: false
  num_rounds: 64
  num_gens: 32
  seed: 0
  concentrate_pool: 1
  residue_sampler: uniform
  resampling_weight: 1.0
  encoder_obj: mll
  algorithm:
    _target_: pymoo.algorithms.soo.nonconvex.ga.GA
    pop_size: 16
    n_offsprings: null
    sampling:
      _target_: lambo.optimizers.sampler.BatchSampler
      batch_size: 16
    crossover:
      _target_: lambo.optimizers.crossover.BatchCrossover
      prob: 0.25
      prob_per_query: 0.25
    mutation:
      _target_: lambo.optimizers.mutation.LocalMutation
      prob: 1.0
      eta: 16
      safe_mut: false
    eliminate_duplicates: true
tokenizer:
  _target_: lambo.utils.ResidueTokenizer
surrogate:
  _target_: lambo.models.gp_models.MultiTaskExactGP
  max_shift: 0
  mask_size: 0
  bootstrap_ratio: null
  min_num_train: 128
  task_noise_init: 0.25
  gp_lr: 0.005
  enc_lr: 0.005
  bs: 32
  eval_bs: 16
  num_epochs: 256
  holdout_ratio: 0.2
  early_stopping: true
  patience: 32
  eval_period: 2
  out_dim: 3
  feature_dim: 16
  encoder_wd: 0.0001
  rank: null
  task_covar_prior:
    _target_: gpytorch.priors.LKJCovariancePrior
    'n': 3
    eta: 2.0
    sd_prior:
      _target_: gpytorch.priors.SmoothedBoxPrior
      a: 0.0001
      b: 1.0
  data_covar_module:
    _target_: gpytorch.kernels.MaternKernel
    ard_num_dims: 16
    lengthscale_prior:
      _target_: gpytorch.priors.NormalPrior
      loc: 0.7
      scale: 0.01
  likelihood:
    _target_: gpytorch.likelihoods.MultitaskGaussianLikelihood
    num_tasks: 3
    has_global_noise: false
    noise_constraint:
      _target_: gpytorch.constraints.GreaterThan
      lower_bound: 0.0001
seed: 0
trial_id: 0
project_name: lambo
version: v0.2.1
data_dir: data/experiments
exp_name: test
job_name: vibrant-flower-23
timestamp: 2022-06-17_09-42-45
log_dir: data/experiments/test
wandb_mode: online

GPU available: True
|    |   round_idx |   hypervol_abs |   hypervol_rel |   num_bb_evals |   time_elapsed |
|---:|------------:|---------------:|---------------:|---------------:|---------------:|
|  0 |           0 |          2.048 |              1 |              0 |     0.00963354 |

 best candidates
|    |   obj_val_0 |   obj_val_1 |   obj_val_2 |
|---:|------------:|------------:|------------:|
|  0 |     -2.0000 |     -2.0000 |     -2.0000 |

active set contracted to 4 pareto points
active set augmented with 12 random points
402 train, 56 val, 54 test

---- preparing checkpoint ----
starting val NLL: 1.6021

---- fitting all params ----
[2022-06-17 10:09:25,165][root][ERROR] - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Traceback (most recent call last):
  File "/home/bcell/home/lambo/scripts/black_box_opt.py", line 55, in main
    metrics = optimizer.optimize(
  File "/home/bcell/home/lambo/lambo/optimizers/pymoo.py", line 189, in optimize
    problem = self._create_inner_task(
  File "/home/bcell/home/lambo/lambo/optimizers/pymoo.py", line 389, in _create_inner_task
    records = self.surrogate_model.fit(
  File "/home/bcell/home/lambo/lambo/models/gp_models.py", line 321, in fit
    return fit_gp_surrogate(**fit_kwargs)
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 208, in fit_gp_surrogate
    enc_sup_loss = fit_encoder_only(
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 76, in fit_encoder_only
    loss = gp_train_step(surrogate, optimizer, inputs, targets, mll)
  File "/home/bcell/home/lambo/lambo/models/gp_utils.py", line 60, in gp_train_step
    loss = -mll(output, targets).mean()
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 63, in forward
    res = self._add_other_terms(res, params)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 43, in _add_other_terms
    res.add_(prior.log_prob(closure(module)).sum())
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/lkj_prior.py", line 105, in log_prob
    log_prob_corr = self.correlation_prior.log_prob(correlations)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/lkj_prior.py", line 62, in log_prob
    return super().log_prob(X_cholesky)
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/gpytorch/priors/prior.py", line 27, in log_prob
    return super(Prior, self).log_prob(self.transform(x))
  File "/home/bcell/anaconda3/envs/lambo-env/lib/python3.8/site-packages/torch/distributions/lkj_cholesky.py", line 117, in log_prob
    unnormalized_log_pdf = torch.sum(order * diag_elems.log(), dim=-1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


samuelstanton commented on June 29, 2024

I just tested things again starting with a fresh conda env, following the installation instructions in the README, and I wasn't able to reproduce the issue. Most likely this is a problem with your Python virtual environment. What versions of pytorch, gpytorch and cudatoolkit are installed? You can use conda list to see what versions you're using. What version of CUDA do you have installed? What OS are you running on?
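
As a complement to conda list, the requested details could be gathered with a short script along these lines. This is only a sketch: the function names are hypothetical, it assumes the packages are registered under the distribution names torch and gpytorch, and it does not see conda-only packages such as cudatoolkit, so conda list is still needed for that one.

```python
import platform
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report():
    """Collect OS, Python, package versions and the CUDA build torch reports."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    report.update(installed_versions(["torch", "gpytorch"]))
    try:
        import torch  # imported lazily so the report works without torch

        report["cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["cuda_build"] = None
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives the OS, the Python version, and the torch and gpytorch versions in one place.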


samuelstanton commented on June 29, 2024

Closing due to inactivity. Feel free to reopen if you have further questions.


samuelstanton commented on June 29, 2024

@yuyang-0825 you might be interested to know that I was able to reproduce your error while investigating #7. I can confirm the device error occurs if you have gpytorch>=1.7 installed. If you install gpytorch using the pinned commit hash in requirements.txt, the program should run as intended.
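
A quick way to tell whether an environment falls in the affected range is sketched below. The 1.7 threshold comes from the comment above. The function names are hypothetical, and the version parsing is deliberately simplified: it reads only the leading major.minor numbers, so development builds with unusual version strings may be misclassified.

```python
import re
from importlib import metadata


def is_affected(version_string, threshold=(1, 7)):
    """Return True if the leading major.minor of the version is >= threshold."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= threshold


def check_installed_gpytorch():
    """Return a short status message for the installed gpytorch, if any."""
    try:
        version = metadata.version("gpytorch")
    except metadata.PackageNotFoundError:
        return "gpytorch is not installed"
    if is_affected(version):
        return f"gpytorch {version} is in the range reported to trigger the device error"
    return f"gpytorch {version} is below the reported range"


if __name__ == "__main__":
    print(check_installed_gpytorch())
```

If the check reports an affected version, reinstalling gpytorch from the pinned commit in requirements.txt is the fix described above.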

