Comments (26)
Hi @Bezdarnost, I am sorry for losing track of this issue. Is it resolved?
Yes, everything is alright. Thank you!
Hello. During inference, y=None should be passed to the model: link. And we do exactly that during evaluation: link. Does this help?
I don't understand. How do I run inference after training?
I don't see instructions for inference in your README.md, only a training example.
In the beginning, it required me to provide X_train, X_val, X_test, Y_train, Y_val, and Y_test. I tried putting my original X_test and keeping Y_test out, but the training refused to start. So I created X_test and Y_test from the validation set (50/50). But after the training was over, I couldn't figure out how to re-run the trained model for prediction on the real test set.
I tried putting my original X_test and keeping Y_test out, but the training refused to start.
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
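For instance, a minimal sketch of generating such a placeholder file (the sizes, dtype, and task type below are assumptions to adapt to your dataset):

```python
import numpy as np

# Hypothetical placeholder labels so that training can start; the real
# Y_test is unknown anyway, so random values are fine here.
n_test = 1904      # number of rows in the real X_test
n_classes = 5      # for a classification task
np.save("Y_test.npy", np.random.randint(0, n_classes, size=n_test).astype(np.int64))
# For regression, np.random.randn(n_test).astype(np.float32) should work instead.
```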
I couldn't figure out how to re-run the trained model for prediction on the real test
The repository is more suitable for reproducing the results reported in the paper and conducting further similar research. If your use case is different (e.g. making predictions with a trained model), then you may need to do some extra work and adjust the codebase accordingly. In your case, you can try copying and adjusting bin/tabr.py for inference so that it only loads a checkpoint and makes predictions.
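As a rough illustration of that direction, here is the generic load-a-checkpoint-and-predict pattern. This is a self-contained sketch with a stand-in linear model, not TabR's actual API: TabR's forward pass additionally takes retrieval candidates (and y=None at inference time), and the checkpoint layout with a "model" key is an assumption.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in model keeps the sketch runnable end to end; for TabR you would
# instead construct the model exactly as bin/tabr.py does.
model = nn.Linear(11, 5)
torch.save({"model": model.state_dict()}, "checkpoint.pt")  # pretend checkpoint

state = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(state["model"])  # the "model" key is an assumption
model.eval()

x_test = np.random.randn(8, 11).astype(np.float32)  # stand-in for X_num_test.npy
with torch.no_grad():
    logits = model(torch.as_tensor(x_test))
print(logits.argmax(dim=1).numpy())  # class predictions
```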
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
Wow, that sounds very clever, almost genius; how did I never figure that out? I will try it tonight and report back if it works well.
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
Hi, it is not working badly, but it seems that with the fake test set the training process goes worse than it could.
The repository is more suitable for reproducing the results reported in the paper and conducting further similar research. If your use case is different (e.g. making predictions with a trained model), then you may need to do some extra work and adjust the codebase accordingly. In your case, you can try copying and adjusting bin/tabr.py for inference so that it only loads a checkpoint and makes predictions.
Can you suggest what needs to be changed to make predictions after training?
Hi @Bezdarnost, I am sorry for losing track of this issue. Is it resolved?
Hello, I trained your model on my dataset; thank you a lot for this brilliant work. But I don't understand how to make predictions on my X_test without y_test. (I used 50% of the validation set in place of the real X_test.)
Hello, I am trying to train this model on my dataset too, but I am confused by the .toml file. Could you please help me understand how to write the file, or let me know where to learn this? Thank you.
Could you please help me understand how to write the file or let me know where to learn this?
Hello, TOML is a configuration language: https://toml.io/en. Also, when running the quick test, you can experiment with the exp/debug/0.toml file to become more familiar with TOML. Numerous other TOML files that we used to run our experiments are scattered across the exp/ folder. Does this help?
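For a quick look at a config's structure from Python, a sketch (tomllib is standard in Python 3.11+; on the Python 3.10 environment seen in the tracebacks below, the third-party tomli package offers the same API):

```python
import tomllib  # Python 3.11+; on 3.10: `pip install tomli` and `import tomli as tomllib`

# Load an experiment config to see which sections and keys it defines.
# The path assumes you run this from the repository root.
with open("exp/debug/0.toml", "rb") as f:
    config = tomllib.load(f)
print(list(config))  # top-level sections, e.g. tables like data/model/optimizer
```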
Yes, thanks a lot. But the exp/debug/0.toml file is for MLP, and I am trying to use tabr. In fact, I have tried to copy exp/tabr/otto/0-tuning.toml to run my dataset on a multi-class job. (Is this a good idea?) What I have done is change path = ":data/otto" to the path of my dataset and cat_policy = "__null__" to cat_policy = "ordinal".
But I got the following error:
[W 2024-03-27 18:01:32,928] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1').
It seems that changing context_size = 96 affects the error: when I change it to 512, I get a = 512 but b = 256 instead.
[W 2024-03-27 18:08:01,286] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1').
What I mean is that I don't know the meaning of some of the config items, as there are no annotations.
I would appreciate it if you could help me. Thank you.
Is this a good idea?
Yes, it should work just fine.
Before further investigation, can you confirm that you can successfully run the experiment with the original config on the otto dataset (not on your custom dataset)?
Yes, it works just fine.
Here is the end of its output.
[...] exp/tabr/otto/0-reproduce-evaluation/14 | 0:01:39
Epoch 91: 100%|█████████████████████████████████████████████████████████| 78/78 [00:00<00:00, 85.37it/s]
(val) 0.823 (test) 0.821 (loss) 0.34059
eval_batch_size = 16384
--------------------------------------------------------------------------------
{'function': 'bin.tabr.main',
'gpus': ['NVIDIA GeForce RTX 4090'],
'n_parameters': 734616,
'best_epoch': 74,
'scores': {'train': 0.8865937728845231,
'val': 0.8237551762448238,
'test': 0.8242566257272139},
'time': '0:01:40'}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-evaluation/14 | 2024-03-27 20:25:43.457278
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8684376657155123,
'val': 0.8258761741238259,
'test': 0.8236102133160956}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/0 | 2024-03-27 20:25:43.771569
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8689427034670841,
'val': 0.8256741743258257,
'test': 0.8240142210730446}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/1 | 2024-03-27 20:25:44.036619
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8700285346329638,
'val': 0.8270881729118271,
'test': 0.8240950226244343}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/2 | 2024-03-27 20:25:44.305022
My dataset was made according to the README.md file. I think at least the shapes and datatypes of my dataset are correct.
And here is my info.json.
{
"name": "mydataset",
"id": "mydataset--default",
"task_type": "multiclass",
"n_num_features": 11,
"n_bin_features": 0,
"n_cat_features": 1,
"test_size": 1904,
"train_size": 5713,
"val_size": 1905,
"n_classes": 5
}
Thanks a lot.
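A quick way to cross-check such a layout against info.json, sketched below with assumed paths (the ":data/mydataset" path appears in the config dump later in this thread); the categorical/binary files are omitted for brevity:

```python
import json
import numpy as np

# Cross-check the saved arrays against info.json. Note in particular that
# the label arrays are expected to be one-dimensional.
data_dir = "data/mydataset"
with open(f"{data_dir}/info.json") as f:
    info = json.load(f)

for split in ("train", "val", "test"):
    x_num = np.load(f"{data_dir}/X_num_{split}.npy")
    y = np.load(f"{data_dir}/Y_{split}.npy")
    assert x_num.shape == (info[f"{split}_size"], info["n_num_features"]), x_num.shape
    assert y.ndim == 1 and len(y) == info[f"{split}_size"], y.shape
```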
I see, thank you for the detailed reply. Can you provide the complete error traceback that you get when running the experiment on your dataset?
I see, thank you for the detailed reply. Can you provide the complete error traceback that you get when running the experiment on your dataset?
[W 2024-03-27 20:29:46,573] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1').
Traceback (most recent call last):
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 153, in objective
report = function(raw_config, Path(tmp) / 'output')
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 488, in main
loss, new_chunk_size = lib.train_step(
File "/mnt/d/path/tabular-dl-tabr/lib/deep.py", line 447, in train_step
loss = step_fn(batch)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 490, in <lambda>
lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 416, in apply_model
return model(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 305, in forward
values = context_y_emb + self.T(k[:, None] - context_k)
RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1
[W 2024-03-27 20:29:46,576] Trial 0 failed with value None.
0%| | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/mnt/d/path/tabular-dl-tabr/bin/go.py", line 52, in <module>
lib.run_cli(main)
File "/mnt/d/path/tabular-dl-tabr/lib/util.py", line 534, in run_cli
return fn(**vars(args))
File "/mnt/d/path/tabular-dl-tabr/bin/go.py", line 36, in main
bin.tune.main(tuning_config, tuning_output, continue_=continue_, force=force)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 185, in main
study.optimize(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/study.py", line 425, in optimize
_optimize(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 66, in _optimize
_optimize_sequential(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
frozen_trial = _run_trial(study, func, catch)
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 251, in _run_trial
raise func_err
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 153, in objective
report = function(raw_config, Path(tmp) / 'output')
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 488, in main
loss, new_chunk_size = lib.train_step(
File "/mnt/d/path/tabular-dl-tabr/lib/deep.py", line 447, in train_step
loss = step_fn(batch)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 490, in <lambda>
lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 416, in apply_model
return model(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 305, in forward
values = context_y_emb + self.T(k[:, None] - context_k)
RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1
Here is the error traceback. Thank you for your patient help.
[>>>] /tmp/tmpgkf3nayj_trial_0/output | 2024-03-27 20:29:44.940583
Creating the output
--------------------------------------------------------------------------------
{'seed': 0,
'data': {'seed': 0,
'cache': True,
'path': ':data/mydataset',
'num_policy': None,
'cat_policy': 'ordinal',
'y_policy': None},
'model': {'num_embeddings': None,
'd_main': 254,
'context_dropout': 0.4291136198234517,
'd_multiplier': 2.0,
'encoder_n_blocks': 0,
'predictor_n_blocks': 1,
'mixer_normalization': 'auto',
'dropout0': 0.3616580256429863,
'dropout1': 0.0,
'normalization': 'LayerNorm',
'activation': 'ReLU'},
'context_size': 96,
'optimizer': {'type': 'AdamW', 'lr': 0.000202727317015276, 'weight_decay': 7.501954443620125e-06},
'batch_size': 512,
'patience': 16,
'n_epochs': inf}
--------------------------------------------------------------------------------
Using cached dataset: build_dataset__mydataset__None__ordinal__None__None__0__cdbe2990a2672ace77e70398b7bcc78d.pickle
n_parameters = 589285
I should admit that I don't have any immediate ideas for why this is happening; it looks like debugging will require taking a closer look at the shapes of the problematic tensors. Also, for the time of debugging, I recommend commenting out the line cache = true in the config section related to the dataset.
Commenting out the line cache = true causes TypeError: build_dataset() missing 1 required keyword-only argument: 'cache'. I set cache = false instead, which results in the same error: RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1.
Thank you for your help.
Fixed by the following change: in bin/tabr.py, line 305 becomes
values = context_y_emb.squeeze(2) + self.T(k[:, None] - context_k)
@xingyunjohn1 I have one hypothesis: perhaps you stored the files Y_train.npy, Y_val.npy, and Y_test.npy as two-dimensional arrays? This repository expects these label arrays to be one-dimensional.
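A plausible reconstruction of why 2-D labels trigger exactly this error (the shapes are taken from the failing run: batch_size = 512, context_size = 96, d_main = 254; the extra singleton axis in the label embedding is an inference from the squeeze(2) workaround above, not confirmed against the code):

```python
import torch

# With 2-D labels, the context label embedding plausibly picks up an extra
# singleton axis, and the addition on line 305 broadcasts the wrong way.
batch, context, d = 512, 96, 254
context_y_emb = torch.zeros(batch, context, 1, d)  # extra axis from 2-D labels
mixed = torch.zeros(batch, context, d)             # self.T(k[:, None] - context_k)

try:
    _ = context_y_emb + mixed
except RuntimeError as e:
    print(e)  # The size of tensor a (96) must match ... b (512) at non-singleton dimension 1

# .squeeze(2) removes the spurious axis, which is why the workaround "fixes"
# it; with 1-D labels the extra axis never appears in the first place.
_ = context_y_emb.squeeze(2) + mixed  # shapes now match: (512, 96, 254)
```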
@xingyunjohn1 I have one hypothesis: perhaps you stored the files Y_train.npy, Y_val.npy, and Y_test.npy as two-dimensional arrays? This repository expects these label arrays to be one-dimensional.
Thank you, I found this problem just a few hours ago and fixed it. Now I am trying to solve another problem.
Fixed by the following change: in bin/tabr.py, line 305 becomes
values = context_y_emb.squeeze(2) + self.T(k[:, None] - context_k)
It seems line 124 in bin/tabr.py has no effect, so I added .squeeze(2) here. Is this expected? @Yura52
With this change, I can now run your model, but the result was quite bad: I got about 30% precision with tabr, while XGBoost gets 80+%. I am wondering whether the change at line 305 has affected this.
To verify this, I tried to reproduce the result on the otto dataset with the changed model, and it works just fine.
I am wondering whether I should use the otto dataset's .toml file as a template for my dataset, which is multi-class and has an extra X_cat_*.npy compared with the otto dataset (with this extra file, I have to set cat_policy = "ordinal" in the .toml file to avoid an error).
Thank you again.
Here is what a run looks like:
[...] /tmp/tmph377hm0y_trial_1/output | 0:00:01.296021
Epoch 7: 100%|█████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 90.93it/s]
/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
[the same warning is repeated five more times]
(val) 0.368 (test) 0.365 (loss) 1.43124
Is this warning a problem?
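For context, the warning means that some classes receive no predictions at all, so their precision is undefined and is reported as 0. A toy reproduction with scikit-learn (the values are made up; zero_division silences the warning):

```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]
y_pred = [0, 0, 0, 0]  # classes 1 and 2 are never predicted
# Without zero_division=0 this emits the UndefinedMetricWarning above.
print(precision_score(y_true, y_pred, average="macro", zero_division=0))
```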
I am wondering whether I should use the otto dataset's .toml file as a template for my dataset, which is multi-class and has an extra X_cat_*.npy compared with the otto dataset (with this extra file, I have to set cat_policy = "ordinal" in the .toml file to avoid an error).
Now I have changed my strategy: I turned the X_cat_*.npy feature (in fact, Gender) into GENDER_Male and GENDER_Female columns, whose values are either 0 or 1, and made them part of X_num_*.npy. Now I use the same .toml file as the otto dataset, except that the dataset path is different. But the result is still about 30%-40%, with no improvement.
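A sketch of that encoding step (the column values and array shapes are examples, not the real dataset):

```python
import numpy as np

# One-hot encode the single categorical column and append it to the
# numerical features, so that X_cat_*.npy is no longer needed.
gender = np.array(["Male", "Female", "Male"])        # example raw column
gender_male = (gender == "Male").astype(np.float32)
gender_female = (gender == "Female").astype(np.float32)
x_num = np.random.randn(3, 11).astype(np.float32)    # stand-in X_num_train
x_num_new = np.column_stack([x_num, gender_male, gender_female])
print(x_num_new.shape)  # (3, 13)
```

Note that with this change, n_num_features in info.json grows by two and n_cat_features drops to zero.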
Finally, I found the solution. I am posting it here for those who experience this issue like me. For my dataset, 0-tuning gets about 40% precision, 2-lr-tuning gets about 60%, and 2-plr-lite-tuning gets about 90%. Maybe something about the embeddings affects the results, and I should review the paper for further research. At last, thank you @Yura52, you helped me a lot. And I have to say tabr is a brilliant model.
Hi @xingyunjohn1, it seems that I missed the reply. Thank you for the kind words! And yes, indeed, embeddings for numerical features can have a big impact on the performance. Glad that you resolved the issue!
Related Issues (20)
- Expected 2d tensor for the single feature of such type, got 1d
- Disable time dependent leaks during training
- (params_with_wd if needs_wd else params_with_wd)['params'].append(parameter) in deep.py: is there a mistake here? Why are if...else... both connected to params_with_wd?
- ? bug: delu.nn.Lambda(lambda x: x.squeeze(-2))
- bug: when eval, AttributeError: module 'torch.cuda' has no attribute 'OutOfMemoryError'
- The change of the candidate set during training
- When n_classes>1, how does 'self.label_encoder' handle float labels?
- Bugs in parallel gpus
- Could you please share the code to create a new dataset directory?
- Request for debugging help: I encounter a bug when reproducing the winequality dataset
- How to calculate the final result from the results of 100 trials
- How to understand the relationship between tune.py and evaluate.py?
- How to evaluate the performance of MLP on regression-cat-medium-0-OnlineNewsPopularity?
- micromamba environment setup issue
- Bug in make_parameter_groups
- How do I get the ± values in my experimental results?
- Can I use this code model as a backbone network for academic paper publishing?
- RuntimeError: mat1 and mat2 must have the same dtype
- add a new dataset