yandex-research / tabular-dl-tabr

The implementation of "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"

Home Page: https://arxiv.org/abs/2307.14338

License: MIT License

Python 82.72% Jupyter Notebook 17.28%

deep-learning machine-learning paper pytorch research tabular-data

tabular-dl-tabr's Issues

micromamba environment setup issue

Hi, thanks for sharing this repo.

I tried to set up an environment by following the instructions in the README: micromamba create -f environment.yaml
However, I got the errors below.
I was able to resolve the cudatoolkit, panel and bokeh conflicts by changing their versions, but not the pytorch one.
Could you help me address this issue?

nvidia/linux-64                                               No change
nvidia/noarch                                                 No change
conda-forge/noarch                                            No change
conda-forge/linux-64                                          No change
pytorch/noarch                                                No change
pytorch/linux-64                                              No change
pyviz/linux-64                                                No change
pyviz/noarch                                                  No change
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
warning  libmamba Problem type not implemented SOLVER_RULE_STRICT_REPO_PRIORITY
error    libmamba Could not solve for environment specs
    The following packages are incompatible
    ├─ bokeh 3.0.3**  is requested and can be installed;
    ├─ cudatoolkit 11.8.0**  is not installable because it conflicts with any installable versions previously reported;
    ├─ panel 0.10.3**  is not installable because there are no viable options
    │  ├─ panel 0.10.3 would require
    │  │  └─ bokeh >=2.2,<2.3 , which conflicts with any installable versions previously reported;
    │  └─ panel 0.10.3 conflicts with any installable versions previously reported;
    └─ pytorch 1.13.1*  is not installable because it conflicts with any installable versions previously reported.
critical libmamba Could not solve for environment specs

When n_classes > 1, how does 'self.label_encoder' handle float labels?

'context_y_emb = self.label_encoder(candidate_y[context_idx][..., None])' is W(y_i) from "Step-1. Adding context labels". Here candidate_y[context_idx] are the labels of the retrieved samples.
For a classification task (n_classes > 1), the values of candidate_y[context_idx] are integers,
so self.label_encoder is:

           else nn.Sequential(
                nn.Embedding(n_classes, d_main), delu.nn.Lambda(lambda x: x.squeeze(-2)) 
            )

I would like to know how to make this nn.Sequential operate on non-integer values.
For example, the labels change from {Tensor:(512,96,)}=tensor([[7, 7, 7, ..., 8, 0, 7],
[8, 8, 8, ..., 8, 8, 8],
[2, 2, 2, ..., 6, 2, 1],
...,
[5, 5, 5, ..., 5, 5, 5],
[5, 5, 5, ..., 5, 5, 5],
[5, 5, 5, ..., 5, 5, 5]], device='cuda:0') to
{Tensor:(512,96,)}=tensor([[7.2188, 7.2188, 7.2188, ..., 7.7451, 0.0000, 7.2188],
[7.7451, 7.7451, 7.7451, ..., 7.7451, 7.7451, 7.7451],
[1.9530, 1.9530, 1.9530, ..., 6.3834, 1.9530, 0.9778],
...,
[5.5154, 5.5154, 5.5154, ..., 5.5154, 5.5154, 5.5154],
[5.5154, 5.5154, 5.5154, ..., 5.5154, 5.5154, 5.5154],
[5.5154, 5.5154, 5.5154, ..., 5.5154, 5.5154, 5.5154]],
device='cuda:0').

err: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)

Thanks~
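
A possible explanation, offered as a sketch rather than as the repository's behavior: nn.Embedding is a lookup table, so it can only be indexed by integer class ids, and the error above is PyTorch rejecting float indices. If the float labels are a one-to-one transformation of the original class ids (the example values suggest this, but it is an assumption), the ids can be recovered before the lookup. The NumPy code below mimics the lookup with a plain array; `float_labels`, `table` and the sizes are hypothetical.

```python
import numpy as np

# Hypothetical transformed labels: each distinct float stands for one class.
float_labels = np.array([7.2188, 7.7451, 0.0, 7.2188, 1.9530])

# np.unique sorts the distinct values and returns, for every element,
# the position of its value in that sorted list: an integer id per label.
classes, class_ids = np.unique(float_labels, return_inverse=True)
class_ids = class_ids.astype(np.int64)

# Stand-in for an embedding matrix of shape (n_classes, d_main).
d_main = 4
table = np.arange(len(classes) * d_main, dtype=np.float32).reshape(len(classes), d_main)

# Integer ids select rows of the table; indexing with the float labels
# themselves would raise IndexError, mirroring the PyTorch error.
label_embeddings = table[class_ids]
```

These ids are positions among the values present in the data, so they match the original class ids only if every class occurs at least once. In PyTorch the analogous step would be converting the label tensor to an integer dtype (for example with .long()) before calling the encoder. For genuinely real-valued targets an embedding lookup does not apply.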

Could you please share the code to create a new dataset directory?

Could you please share the code that transforms a new dataset into X_num_train.npy, X_num_val.npy, X_num_test.npy, ...?
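
Until the authors share their own script, here is a minimal sketch of how such a directory could be produced with NumPy. Only the X_num_{train,val,test}.npy names come from this issue; the Y_{part}.npy names, the fields written to info.json, the split fractions and the helper name `save_dataset` are assumptions and may not match what the repository expects.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_dataset(out_dir, X_num, Y, task_type, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle, split into train/val/test and save one .npy file per part."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    n = len(Y)
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    parts = {
        "test": order[:n_test],
        "val": order[n_test:n_test + n_val],
        "train": order[n_test + n_val:],
    }
    for part, idx in parts.items():
        np.save(out_dir / f"X_num_{part}.npy", X_num[idx].astype(np.float32))
        np.save(out_dir / f"Y_{part}.npy", Y[idx])  # assumed file name
    # Assumed metadata file; the exact fields the repository reads may differ.
    info = {"name": out_dir.name, "task_type": task_type}
    (out_dir / "info.json").write_text(json.dumps(info, indent=4))
    return {part: len(idx) for part, idx in parts.items()}


# Usage with synthetic data written to a temporary directory.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
with tempfile.TemporaryDirectory() as tmp:
    sizes = save_dataset(Path(tmp) / "regression-my-dataset", X, y, "regression")
    reloaded = np.load(Path(tmp) / "regression-my-dataset" / "X_num_train.npy")
```

Comparing the result against one of the dataset directories shipped with the repository is the safest way to check which files and info.json fields are actually required.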

How to understand the relationship between tune.py and evaluate.py?

Hello,
My understanding is:
When tune.py finishes, it produces checkpoint.pt, DONE, report.json and summary.json, but evaluate.py uses only report.json and DONE (which is empty); report.json provides the lr, weight_decay and dropout parameters. checkpoint.pt is not used.

Usually train.py gets all these parameters with a single trial (trial=1) and early stopping, rather than running 100 trials.

So would you please explain the relationship between tune.py, evaluate.py and ensemble.py in more detail?

RuntimeError: mat1 and mat2 must have the same dtype

Thanks very much for this great work.

I am trying to understand the code and use it in my research.

I encountered an error and don't know how to fix it. Any suggestions would be greatly appreciated.

Here is the code:

# %%

data = {
    "X_num": {
        "train": X_train,
        "val": X_test
    },
    "Y": {
        "train": y_train,
        "val": y_test
    }
}

# %%

dataset = Dataset(
    data=data,
    task_type=TaskType.REGRESSION,
    score='rmse',
    y_info=None,
    _Y_numpy=None
)

seed = 42
model = {
    'num_embeddings': None,  # Example embedding configuration
    'd_main': 64,
    'd_multiplier': 1.0,
    'encoder_n_blocks': 2,
    'predictor_n_blocks': 2,
    'mixer_normalization': False,
    'context_dropout': 0.1,
    'dropout0': 0.1,
    'dropout1': 0.1,
    'normalization': 'BatchNorm1d',
    'activation': 'ReLU'
}

# define Config

config = Config(
    seed=seed,
    data=dataset,
    model=model,
    context_size=5,
    optimizer={'type': 'Adam', 'lr': 0.001},
    batch_size=64,
    patience=10,
    n_epochs=10,
)

# %%

output_path = "./output"
force = True
report = main(config, output_path, force=force)

The error details are as follows:

RuntimeError Traceback (most recent call last)
File /Users/hjyu/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/tabr_test.py:4
2 output_path = "./output"
3 force = True
----> 4 report = main(config, output_path, force=force)

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/bin/tabr.py:508, in main(config, output, force)
503 epoch_losses = []
504 for batch_idx in tqdm(
505 lib.make_random_batches(train_size, C.batch_size, device),
506 desc=f'Epoch {epoch}',
507 ):
--> 508 loss, new_chunk_size = lib.train_step(
509 optimizer,
510 lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
511 batch_idx,
512 chunk_size or C.batch_size,
513 )
514 epoch_losses.append(loss.detach())
515 if new_chunk_size and new_chunk_size < (chunk_size or C.batch_size):

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/lib/deep.py:447, in train_step(optimizer, step_fn, batch, chunk_size)
445 optimizer.zero_grad()
446 if batch_size <= chunk_size:
--> 447 loss = step_fn(batch)
448 loss.backward()
449 else:

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/bin/tabr.py:510, in main.<locals>.<lambda>(idx)
503 epoch_losses = []
504 for batch_idx in tqdm(
505 lib.make_random_batches(train_size, C.batch_size, device),
506 desc=f'Epoch {epoch}',
507 ):
508 loss, new_chunk_size = lib.train_step(
509 optimizer,
--> 510 lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
511 batch_idx,
512 chunk_size or C.batch_size,
513 )
514 epoch_losses.append(loss.detach())
515 if new_chunk_size and new_chunk_size < (chunk_size or C.batch_size):

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/bin/tabr.py:436, in main.<locals>.apply_model(part, idx, training)
428 candidate_indices = candidate_indices[~torch.isin(candidate_indices, idx)]
429 candidate_x, candidate_y = get_Xy(
430 'train',
431 # This condition is here for historical reasons, it could be just
432 # the unconditional candidate_indices.
433 None if candidate_indices is train_indices else candidate_indices,
434 )
--> 436 return model(
437 x_=x,
438 y=y if is_train else None,
439 candidate_x_=candidate_x,
440 candidate_y=candidate_y,
441 context_size=C.context_size,
442 is_train=is_train,
443 ).squeeze(-1)

File ~/anaconda3/envs/tabr/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/bin/tabr.py:243, in Model.forward(self, x_, y, candidate_x_, candidate_y, context_size, is_train)
212 def forward(
213 self,
214 *,
(...)
221 ) -> Tensor:
222 # >>>
223 with torch.set_grad_enabled(
224 torch.is_grad_enabled() and not self.memory_efficient
225 ):
(...)
240 # performed without gradients.
241 # Later, it is recomputed with gradients only for the context objects.
242 candidate_k = (
--> 243 self.encode(candidate_x)[1]
244 if self.candidate_encoding_batch_size is None
245 else torch.cat(
246 [
247 self.encode(x)[1]
248 for x in delu.iter_batches(
249 candidate_x, self.candidate_encoding_batch_size
250 )
251 ]
252 )
253 )
254 x, k = self.encode(x)
255 if is_train:
256 # NOTE: here, we add the training batch back to the candidates after the
257 # function apply_model removed them. The further code relies
258 # on the fact that the first batch_size candidates come from the
259 # training batch.

File ~/Library/Mobile Documents/com~~apple~~CloudDocs/Code/Transfer_Learning_Tabular/TabR/bin/tabr.py:206, in Model._encode(failed resolving arguments)
203 assert x # Assert that the list x is not empty, presumably to validate the input data
204 x = torch.cat(x, dim=1)
--> 206 x = self.linear(x)
207 for block in self.blocks0:
208 x = x + block(x)

File ~/anaconda3/envs/tabr/lib/python3.9/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 must have the same dtype
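
One common cause of this error (an assumption here, since the report does not show the array dtypes) is that NumPy and pandas produce float64 arrays by default, while PyTorch layers hold float32 weights by default, so the linear layer receives a float64 input. A minimal sketch of casting the arrays before building the Dataset follows; `to_float32` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def to_float32(parts):
    """Return a copy of {part: array} with every array cast to float32."""
    return {part: np.asarray(values, dtype=np.float32) for part, values in parts.items()}


# Synthetic stand-ins for X_train / X_test; NumPy creates float64 by default.
X_train = np.random.default_rng(0).normal(size=(8, 3))
X_val = np.random.default_rng(1).normal(size=(4, 3))

X_num = to_float32({"train": X_train, "val": X_val})
```

The result would then be passed as the "X_num" entry of the data dictionary. For a regression task the same cast may be needed for the "Y" arrays.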

Is there a mistake in (params_with_wd if needs_wd else params_with_wd)['params'].append(parameter) in deep.py? Why do both branches of the if...else select params_with_wd?

Thanks for your interesting work, which gives us new hope for tabular data research!
I have some questions:
(1) Is there a mistake in (params_with_wd if needs_wd else params_with_wd)['params'].append(parameter) in deep.py? Why do both branches of the if...else select params_with_wd?
(2) The error below occurs during tuning, after trial_0 has completed and while trial_1 is in progress. I observed that GPU memory usage increases with each trial; is this normal? Is it possible to release the GPU memory after trial_0 completes and before trial_1 starts? We currently have only 12 GB of GPU memory.
[...] tmp22mqfk7j_trial_1/output | 0:03:04.846372
Epoch 39: 100%|███████████████████████████████████████████████████████████████████| 52/52 [00:04<00:00, 12.
(val) -0.426 (test) -0.414 (loss) 0.11597█████████████████████████████████████████| 52/52 [00:04<00:00, 13.
[W 2023-10-24 17:37:27,179] Trial 1 failed with parameters: {'model.d_main': 353, 'model.context_dropout': 76563006176, 'model.dropout0': 0.2300649112954666, 'optimizer.lr': 0.0004817508474772368, '?optimizer.weigh': True, 'optimizer.weight_decay': 7.098936257405907e-05} because of the following error: AttributeError("module 'torch.cuda' has no attribute 'OutOfMemoryError'").

Disable time-dependent leaks during training

I have now the first results for TabR on my custom dataset! Thanks for your repo so far!

However, I still have a problem with the current implementation of TabR.
This is what the paper states:
"Figure 4: A simplified illustration of the retrieval module R, introduced in Figure 2. For the target object’s representation x˜, the module takes the m nearest neighbors among the candidates {x˜i} according to the similarity module S and aggregates their values produced by the value module V"

This approach works well for datasets without time dependence, such as the Titanic dataset, where each row is independent of the others.

However, our data comes from an auto-completion use case in which the column "CREATIONDATE" is a date column that strongly affects the results. Information from future dates leaks into predictions for past rows. This is why the train and test sets are split as follows:

df_train = df[(df.CREATIONDATE >= '20190101') & (df.CREATIONDATE <= '20191231')]
df_test = df[(df.CREATIONDATE >= '20200101') & (df.CREATIONDATE <= '20200229')]

As you can see, the test set lies strictly after the train set on the timeline. Unless the model learns under the same constraint during training, the test results are not very good.

We need to enforce the same constraint inside the train set during training: when predicting the class of one row at train time, only train rows from the past (i.e. with CREATIONDATE_candidates < CREATIONDATE_train_element_we_want_to_predict) should be available as candidates.

Where in the code would be the best place to change this logic?
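
One way to think about this, as a hedged sketch and not the repository's code: the restriction is different for every target row, so it cannot be expressed by shrinking the shared candidate set once per batch. Instead, the similarity scores of candidates that are not strictly older than the target can be set to minus infinity before the top-k selection. The function name, the integer date encoding and the similarity values below are all illustrative assumptions.

```python
import heapq
import math


def top_k_past_candidates(similarities, target_time, candidate_times, k):
    """Indices of the k most similar candidates strictly older than the target."""
    masked = [
        sim if cand_time < target_time else -math.inf
        for sim, cand_time in zip(similarities, candidate_times)
    ]
    best = heapq.nlargest(k, range(len(masked)), key=masked.__getitem__)
    # Drop masked entries in case fewer than k candidates are old enough.
    return [i for i in best if masked[i] != -math.inf]


# Toy example: dates as YYYYMMDD integers, similarities are made-up numbers.
candidate_times = [20190105, 20190301, 20190620, 20191101]
similarities = [0.2, 0.9, 0.5, 0.99]
context = top_k_past_candidates(similarities, 20190701, candidate_times, k=2)
```

In a batched tensor implementation the same idea would become a boolean mask of shape (batch, candidates) applied to the similarity matrix before the nearest-neighbor selection. Where exactly that fits into the model's retrieval step is something the maintainers would need to confirm.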

The change of the candidate set during training

Thank you for your interesting work; it inspires me a lot. A couple of questions have been bugging me.
The work initially uses the entire training set as the fixed set of candidates for all objects.
1. The function 'apply_model' removes the current batch from the candidates.
2. However, when computing the forward output, the model adds the current batch back to the candidate set and then predicts the output. Should the current batch be added to the candidate set only after predicting the output?
3. Additionally, once the batch is in the candidate set, retrieving context samples is guaranteed to retrieve the target sample itself. The corresponding index is then removed from the retrieved indices.
I can't understand the role of these three operations or how they connect. Could you explain?

Bug in make_parameter_groups

Hi! The following line in the method make_parameter_groups looks very much like a typo:

https://github.com/yandex-research/tabular-dl-tabr/blob/75105013189c76bc4f247633c2fb856bc948e579/lib/deep.py#L364C46-L364C60

params_with_wd if needs_wd else params_with_wd

Because of it, we never add anything to params_without_wd, which defeats the purpose of zero_weight_decay_condition.
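
For illustration, the presumably intended logic is sketched below in plain Python. The function signature and the `needs_weight_decay` predicate are hypothetical simplifications, not the code in lib/deep.py; the list-of-dicts result follows the per-parameter options format that torch.optim optimizers accept.

```python
def make_parameter_groups(named_parameters, needs_weight_decay):
    """Split parameters into a group with weight decay and a group without."""
    params_with_wd = {"params": []}
    params_without_wd = {"params": [], "weight_decay": 0.0}
    for name, parameter in named_parameters:
        # The second branch must select the *without* group.
        group = params_with_wd if needs_weight_decay(name) else params_without_wd
        group["params"].append(parameter)
    return [params_with_wd, params_without_wd]


# Toy usage: strings stand in for parameter tensors.
named = [("linear.weight", "W"), ("linear.bias", "b"), ("norm.weight", "g")]
groups = make_parameter_groups(
    named, lambda name: not (name.endswith("bias") or name.startswith("norm"))
)
```

With the line as currently written, the second group would stay empty and every parameter would receive the optimizer's default weight decay.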

How is the final result calculated from the results of the 100 trials?

Hello,
From your code, it is clear that n_trials=100, but I find that the result of the last trial is not the best one, and the paper doesn't seem to go into detail about this. How do the results reported in the paper use these 100 trial results? Is it the best one or the average?

Also, what are the 15 random seeds used for in testing, and are they not used in training?

Thanks~
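
Under the usual tune-then-evaluate protocol (an assumption about this repository that the authors would need to confirm), the last trial is not expected to be the best one: the tuner keeps the trial with the best validation score, and that configuration is then retrained from scratch under several random seeds, with the test metric aggregated over those runs. A minimal sketch with made-up trial records; the field names and the seed values 0..14 are hypothetical:

```python
def select_best_trial(trials):
    """Return the trial with the highest validation score."""
    return max(trials, key=lambda trial: trial["val_score"])


# Made-up trial records; higher score is better (e.g. negative RMSE).
trials = [
    {"number": 0, "val_score": -0.452, "config": {"lr": 3e-4}},
    {"number": 1, "val_score": -0.426, "config": {"lr": 5e-4}},
    {"number": 2, "val_score": -0.431, "config": {"lr": 1e-3}},
]
best = select_best_trial(trials)

# The best configuration is reused for every evaluation run; only the seed changes.
seeds = list(range(15))
runs = [{"seed": seed, "config": best["config"]} for seed in seeds]
```

In this reading, the seeds affect training as well (initialization, batch order), since each evaluation run trains a fresh model.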

Bug during evaluation: AttributeError: module 'torch.cuda' has no attribute 'OutOfMemoryError'

After an epoch of training, evaluation runs with eval_batch_size=32768, which means the whole validation set is treated as a single batch. On the otto dataset, this runs out of GPU memory. How can I work around this? The error is:
RuntimeError: CUDA out of memory. Tried to allocate 2.25 GiB (GPU 0; 11.17 GiB total capacity; 5.74 GiB already allocated; 2.04 GiB free; 7.13 GiB reserved in total by PyTorch)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/bin/tune.py", line 160, in objective
report = function(raw_config, Path(tmp) / 'output') #the objective function, in turn, calls the function = "bin.tabr.main"
File "/bin/tabr.py", line 592, in main
metrics, predictions, eval_batch_size = evaluate(
File "/anaconda3/envs/torch/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/bin/tabr.py", line 537, in evaluate
if not lib.is_oom_exception(err):
File "/lib/util.py", line 493, in is_oom_exception
return isinstance(err, torch.cuda.OutOfMemoryError) or any(
AttributeError: module 'torch.cuda' has no attribute 'OutOfMemoryError'
[W 2023-10-31 20:35:10,691] Trial 0 failed with value None.
thanks~
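
The traceback shows that the installed PyTorch build does not define torch.cuda.OutOfMemoryError, so the out-of-memory check itself fails before the batch size can be reduced. A possible workaround, sketched under assumptions: look the class up defensively and fall back to the error message. The function below takes the CUDA module as an argument so that it can be shown without importing torch; it is not the repository's lib.is_oom_exception, and matching on the message text is a heuristic.

```python
from types import SimpleNamespace


def is_oom_exception(err, cuda_module):
    """Detect a CUDA out-of-memory error with or without OutOfMemoryError."""
    oom_type = getattr(cuda_module, "OutOfMemoryError", None)
    if oom_type is not None and isinstance(err, oom_type):
        return True
    # Fallback for builds without the dedicated class: match the message
    # seen in the report ("RuntimeError: CUDA out of memory. ...").
    return isinstance(err, RuntimeError) and "CUDA out of memory" in str(err)


# Stand-ins for torch.cuda in an older and a newer PyTorch build.
old_cuda = SimpleNamespace()


class FakeOutOfMemoryError(RuntimeError):
    pass


new_cuda = SimpleNamespace(OutOfMemoryError=FakeOutOfMemoryError)

old_result = is_oom_exception(RuntimeError("CUDA out of memory. Tried to allocate 2.25 GiB"), old_cuda)
new_result = is_oom_exception(FakeOutOfMemoryError("any message"), new_cuda)
```

In real code the second argument would be torch.cuda. Upgrading PyTorch to a release that provides the class would be the other option.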

inference

Hello, I trained your model on my dataset, thank you a lot for this brilliant work.
However, I don't understand how to make predictions on my X_test without y_test. (For now, I use 50% of the validation set in place of the real X_test.)

Expected 2d tensor for the single feature of such type, got 1d

Hi! Thank you for your interesting work.
I ran into a problem with this function (tabular-dl-tabr/lib/data.py, line 117 in d628ec7):

def to_torch(self, device=None) -> 'Dataset[Tensor]':

I have a dataset with only one binary feature. It is flattened to a 1d tensor, but a 2d tensor is expected later. Writing torch.atleast_2d(torch.as_tensor(value)).to(device) instead of torch.as_tensor(value).to(device) solved this problem.
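
A side note on the workaround, shown with NumPy because torch.atleast_2d follows the same shape rule: for a 1d input of length n the result has shape (1, n), i.e. a single row. If the rest of the pipeline expects one row per object (an assumption about the expected layout), an explicit reshape to (n, 1) may be the safer fix. The variable names are illustrative.

```python
import numpy as np

# One binary feature observed for five objects, stored as a flat array.
x_bin = np.array([0, 1, 1, 0, 1])

# atleast_2d prepends an axis: the five values become one row.
as_row = np.atleast_2d(x_bin)

# reshape(-1, 1) keeps one row per object and a single feature column.
as_column = x_bin.reshape(-1, 1)
```

Printing the shape of the tensor right after the conversion is a quick way to check which of the two layouts the rest of the code receives.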

Make a pip-installable Python package

At least a fit/predict example would be great.

How do I get the ± values in my experimental results?

Hi, bothering you again~
For example, on the wine quality dataset the ensemble performance is 0.620±0.007, and go.py produces three sets of scores. How is the ±0.007 obtained?
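
A plausible reading, to be confirmed by the authors, is that the value after ± is the standard deviation of the metric over the independent runs (here, the three ensembles). The sketch below uses placeholder scores, not real results, and the helper name is hypothetical; whether the population or the sample standard deviation is used is also an assumption, so both are available.

```python
import statistics


def mean_and_std(scores, sample=False):
    """Mean and standard deviation (population by default) of a list of scores."""
    std = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return statistics.mean(scores), std


ensemble_scores = [0.61, 0.62, 0.63]  # placeholders, not real results
mean, std = mean_and_std(ensemble_scores)
summary = f"{mean:.3f}±{std:.3f}"
```

With only three values the two definitions differ noticeably (about 0.008 versus 0.010 for these placeholders), so it matters which one a reported number uses.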

Can I use this model's code as a backbone network in an academic publication?

If I use this work as a backbone network, add my own modules, and then publish an academic paper, does that constitute copyright infringement? Of course, I would state that I am citing your work.

OneHotEncoder of Cat features

Hi. Does OneHotEncoder work better with the suggested architecture? Did you test other encoders (OrdinalEncoder, etc.)?

Bug: delu.nn.Lambda(lambda x: x.squeeze(-2))

For a classification task (n_classes > 1), calling self.label_encoder in tabr.py raises this error:
ValueError: fn must be a function from torch or a method of torch.Tensor, but ...
How do I fix this?

Bug when training on parallel GPUs

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4"
torch.cuda.empty_cache()
print('os.environ["CUDA_VISIBLE_DEVICES"]', os.environ["CUDA_VISIBLE_DEVICES"])
print('Free gpu memory: torch.cuda.empty_cache()')

When I train TabR on the otto dataset with 2 GPUs, the following error occurs.
How can I debug it?
P.S. Training with one GPU works; the error appears only with two GPUs in parallel.

The error is:
...
Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [22,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [23,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [24,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [25,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [26,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [27,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [28,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [29,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [30,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [6055,0,0], thread: [31,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Faiss assertion 'err == cudaSuccess' failed in virtual void faiss::gpu::StandardGpuResourcesImpl::deallocMemory(int, void*) at /project/faiss/faiss/gpu/StandardGpuResources.cpp:518; details: Failed to cudaFree pointer 0x420b046000 (error 59 device-side assert triggered)
Aborted (core dumped)

Thanks~

How to evaluate the performance of MLP on regression-cat-medium-0-OnlineNewsPopularity?

Hello,
In your code, I checked the evaluation report.json of MLP on regression-cat-medium-0-OnlineNewsPopularity. The metrics at the best epoch are as follows:
"n_parameters": 495793,
"prediction_type": null,
"best_epoch": 26,
"metrics": {
    "train": {
        "rmse": 0.8142614908186779,
        "mae": 0.5985163409301961,
        "r2": 0.23417308630867428,
        "score": -0.8142614908186779
    },
    "val": {
        "rmse": 0.844946381250874,
        "mae": 0.6250255374955493,
        "r2": 0.15331082859100664,
        "score": -0.844946381250874
    },
    "test": {
        "rmse": 0.8618776869166989,
        "mae": 0.6317802140393205,
        "r2": 0.14868952248971512,
        "score": -0.8618776869166989
    }
}
Generally, lower RMSE and MAE are better, and an R² closer to 1 indicates better explanatory power. Based on these results, the model performs relatively poorly on the validation and test sets, and the R² values suggest limited explanatory power. Is further optimization of the model, or some alternative improvement strategy, still necessary? In addition, TensorBoard is provided in the project. How can we use it to analyze this model?
Thanks~

Help request: an error when reproducing results on the wine_quality dataset

Here is what I changed:

cp -r exp/tabr/why/regression-num-medium-2-wine_quality/ exp/tabr/why/regression-num-medium-3-wine_quality/
cp -r data/regression-num-medium-2-wine_quality/ data/regression-num-medium-3-wine_quality/
cp exp/tabr/why/regression-num-medium-3-wine_quality/0-tuning.toml exp/tabr/why/regression-num-medium-3-wine_quality/1-tuning.toml

In 'data/regression-num-medium-3-wine_quality/info.json', I changed 'regression-num-medium-2-wine_quality' to 'regression-num-medium-3-wine_quality'.
In 'exp/tabr/why/regression-num-medium-3-wine_quality/1-tuning.toml', I changed 'data/regression-num-medium-2-wine_quality' to 'data/regression-num-medium-3-wine_quality'.

python bin/tune.py exp/tabr/why/regression-num-medium-2-wine_quality/1-tuning.toml

python bin/tune.py exp/tabr/why/regression-num-medium-3-wine_quality/1-tuning.toml

Then 'python bin/tune.py exp/tabr/why/regression-num-medium-2-wine_quality/1-tuning.toml' works, but 'python bin/tune.py exp/tabr/why/regression-num-medium-3-wine_quality/1-tuning.toml' fails with:

[W 2023-11-17 18:09:18,793] Trial 0 failed with value None.
  0%|                                                                                       | 0/100 [00:48<?, ?it/s]
Traceback (most recent call last):
  File "~/bin/tune.py", line 216, in <module>
    lib.run_Function_cli(main)
  File "~/tabR_lzd/lib/util.py", line 276, in run_Function_cli
    function(
  File "~/tabR_lzd/bin/tune.py", line 202, in main
    study.optimize(
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/study.py", line 442, in optimize
    _optimize(
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/_optimize.py", line 251, in _run_trial
    raise func_err
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "~/tabR_lzd/bin/tune.py", line 161, in objective
    report = function(raw_config, Path(tmp) / 'output')  #the objective function, in turn, calls the function = "bin.tabr.main"
  File "~/tabR_lzd/bin/tabr.py", line 595, in main
    metrics, predictions, eval_batch_size = evaluate(
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "~/tabR_lzd/bin/tabr.py", line 549, in evaluate
    dataset.calculate_metrics(predictions, report['prediction_type'])
  File "~/tabR_lzd/lib/data.py", line 235, in calculate_metrics
    metrics = {
  File "~/tabR_lzd/lib/data.py", line 236, in <dictcomp>
    part: calculate_metrics_(
  File "~/tabR_lzd/lib/metrics.py", line 58, in calculate_metrics
    'rmse': sklearn.metrics.mean_squared_error(y_true, y_pred) ** 0.5 * y_std,
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/sklearn/metrics/_regression.py", line 474, in mean_squared_error
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/sklearn/metrics/_regression.py", line 101, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
  File "~/anaconda3/envs/torch/lib/python3.9/site-packages/sklearn/utils/validation.py", line 951, in check_array
    raise ValueError(
ValueError: Found array with dim 3. None expected <= 2.

How can I debug this?
Thanks!
