Comments (26)
Hi @Bezdarnost, I am sorry for losing track of this issue. Is it resolved?
Yes, everything is alright. Thank you!
Hello. During inference, y=None should be passed to the model: link. And we do exactly that during evaluation: link. Does this help?
I don't understand. How do I run inference after training?
I don't see instructions for inference in your README.md, only a training example.
In the beginning, it required me to provide X_train, X_val, X_test, Y_train, Y_val, and Y_test. I tried putting my original X_test and keeping Y_test out, but the training refused to start. So I created X_test and Y_test from the validation set (50/50). But after the training was over, I couldn't figure out how to re-run the trained model for prediction on the real test set.
I tried putting my original X_test and keeping Y_test out, but the training refused to start.
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
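For instance, a minimal sketch of generating such a placeholder file (the sizes, dtype, and task type below are assumptions to adapt to your dataset):

```python
import numpy as np

# Hypothetical placeholder labels so that training can start; the real
# Y_test is unknown anyway, so random values are fine here.
n_test = 1904      # number of rows in the real X_test
n_classes = 5      # for a classification task
np.save("Y_test.npy", np.random.randint(0, n_classes, size=n_test).astype(np.int64))
# For regression, np.random.randn(n_test).astype(np.float32) should work instead.
```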
I couldn't figure out how to re-run the trained model for prediction on the real test
The repository is more suitable for reproducing the results reported in the paper and conducting further similar research. If your use case is different (e.g. making predictions with a trained model), then you may need to do some extra work and adjust the codebase accordingly. In your case, you can try copying and adjusting bin/tabr.py for inference so that it only loads a checkpoint and makes predictions.
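As a rough illustration of that direction, here is the generic load-a-checkpoint-and-predict pattern. This is a self-contained sketch with a stand-in linear model, not TabR's actual API: TabR's forward pass additionally takes retrieval candidates (and y=None at inference time), and the checkpoint layout with a "model" key is an assumption.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in model keeps the sketch runnable end to end; for TabR you would
# instead construct the model exactly as bin/tabr.py does.
model = nn.Linear(11, 5)
torch.save({"model": model.state_dict()}, "checkpoint.pt")  # pretend checkpoint

state = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(state["model"])  # the "model" key is an assumption
model.eval()

x_test = np.random.randn(8, 11).astype(np.float32)  # stand-in for X_num_test.npy
with torch.no_grad():
    logits = model(torch.as_tensor(x_test))
print(logits.argmax(dim=1).numpy())  # class predictions
```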
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
Wow, that sounds very clever, almost genius; how did I never figure that out? I will try it tonight and report back if it works well.
To solve this, generate a random Y_test.npy and put it in the dataset directory. After the training, you can get the predictions for the real X_test.npy from predictions.npz.
Hi, it is not working badly, but it seems that with the fake test set the training process goes worse than it could.
The repository is more suitable for reproducing the results reported in the paper and conducting further similar research. If your use case is different (e.g. making predictions with a trained model), then you may need to do some extra work and adjust the codebase accordingly. In your case, you can try copying and adjusting bin/tabr.py for inference so that it only loads a checkpoint and makes predictions.
Can you suggest what needs to be changed to make predictions after training?
Hi @Bezdarnost, I am sorry for losing track of this issue. Is it resolved?
Hello, I trained your model on my dataset; thank you a lot for this brilliant work. But I don't understand how to make predictions on my X_test without y_test. (I used 50% of the validation set in place of the real X_test.)
Hello, I am trying to train this model on my dataset too, but I am confused by the .toml file. Could you please help me understand how to write the file, or let me know where to learn this? Thank you.
Could you please help me understand how to write the file or let me know where to learn this?
Hello, TOML is a configuration language: https://toml.io/en. Also, when running the quick test, you can experiment with the exp/debug/0.toml file to become more familiar with TOML. Numerous other TOML files that we used to run our experiments are scattered across the exp/ folder. Does this help?
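For a quick look at a config's structure from Python, a sketch (tomllib is standard in Python 3.11+; on the Python 3.10 environment seen in the tracebacks below, the third-party tomli package offers the same API):

```python
import tomllib  # Python 3.11+; on 3.10: `pip install tomli` and `import tomli as tomllib`

# Load an experiment config to see which sections and keys it defines.
# The path assumes you run this from the repository root.
with open("exp/debug/0.toml", "rb") as f:
    config = tomllib.load(f)
print(list(config))  # top-level sections, e.g. tables like data/model/optimizer
```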
Yes, thanks a lot. But the exp/debug/0.toml file is for MLP, and I am trying to use tabr. In fact, I have tried to copy exp/tabr/otto/0-tuning.toml to run my dataset on a multi-class job. (Is this a good idea?) What I have done is change path = ":data/otto" to the path of my dataset and cat_policy = "__null__" to cat_policy = "ordinal".
But I got the following error:
[W 2024-03-27 18:01:32,928] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1').
It seems that changing context_size = 96 affects the error: when I change it to 512, I get a = 512 but b = 256 instead.
[W 2024-03-27 18:08:01,286] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 1').
What I mean is that I don't know the meaning of some of the config items, as there are no annotations.
I would appreciate it if you could help me. Thank you.
Is this a good idea?
Yes, it should work just fine.
Before further investigation, can you confirm that you can successfully run the experiment with the original config on the otto dataset (not on your custom dataset)?
Yes, it works just fine.
Here is the end of its output.
[...] exp/tabr/otto/0-reproduce-evaluation/14 | 0:01:39
Epoch 91: 100%|█████████████████████████████████████████████████████████| 78/78 [00:00<00:00, 85.37it/s]
(val) 0.823 (test) 0.821 (loss) 0.34059
eval_batch_size = 16384
--------------------------------------------------------------------------------
{'function': 'bin.tabr.main',
'gpus': ['NVIDIA GeForce RTX 4090'],
'n_parameters': 734616,
'best_epoch': 74,
'scores': {'train': 0.8865937728845231,
'val': 0.8237551762448238,
'test': 0.8242566257272139},
'time': '0:01:40'}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-evaluation/14 | 2024-03-27 20:25:43.457278
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8684376657155123,
'val': 0.8258761741238259,
'test': 0.8236102133160956}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/0 | 2024-03-27 20:25:43.771569
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8689427034670841,
'val': 0.8256741743258257,
'test': 0.8240142210730446}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/1 | 2024-03-27 20:25:44.036619
--------------------------------------------------------------------------------
{'function': 'bin.ensemble.main',
'gpus': [],
'scores': {'train': 0.8700285346329638,
'val': 0.8270881729118271,
'test': 0.8240950226244343}}
--------------------------------------------------------------------------------
[<<<] exp/tabr/otto/0-reproduce-ensemble-5/2 | 2024-03-27 20:25:44.305022
My dataset was made according to the README.md file. I think at least the shapes and datatypes of my dataset are correct.
And here is my info.json.
{
"name": "mydataset",
"id": "mydataset--default",
"task_type": "multiclass",
"n_num_features": 11,
"n_bin_features": 0,
"n_cat_features": 1,
"test_size": 1904,
"train_size": 5713,
"val_size": 1905,
"n_classes": 5
}
Thanks a lot.
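A quick way to cross-check such a layout against info.json, sketched below with assumed paths (the ":data/mydataset" path appears in the config dump later in this thread); the categorical/binary files are omitted for brevity:

```python
import json
import numpy as np

# Cross-check the saved arrays against info.json. Note in particular that
# the label arrays are expected to be one-dimensional.
data_dir = "data/mydataset"
with open(f"{data_dir}/info.json") as f:
    info = json.load(f)

for split in ("train", "val", "test"):
    x_num = np.load(f"{data_dir}/X_num_{split}.npy")
    y = np.load(f"{data_dir}/Y_{split}.npy")
    assert x_num.shape == (info[f"{split}_size"], info["n_num_features"]), x_num.shape
    assert y.ndim == 1 and len(y) == info[f"{split}_size"], y.shape
```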
I see, thank you for the detailed reply. Can you provide the complete error traceback that you get when running the experiment on your dataset?
I see, thank you for the detailed reply. Can you provide the complete error traceback that you get when running the experiment on your dataset?
[W 2024-03-27 20:29:46,573] Trial 0 failed with parameters: {'model.d_main': 254, 'model.context_dropout': 0.4291136198234517, 'model.dropout0': 0.3616580256429863, 'optimizer.lr': 0.000202727317015276, '?optimizer.weight_decay': True, 'optimizer.weight_decay': 7.501954443620125e-06} because of the following error: RuntimeError('The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1').
Traceback (most recent call last):
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 153, in objective
report = function(raw_config, Path(tmp) / 'output')
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 488, in main
loss, new_chunk_size = lib.train_step(
File "/mnt/d/path/tabular-dl-tabr/lib/deep.py", line 447, in train_step
loss = step_fn(batch)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 490, in <lambda>
lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 416, in apply_model
return model(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 305, in forward
values = context_y_emb + self.T(k[:, None] - context_k)
RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1
[W 2024-03-27 20:29:46,576] Trial 0 failed with value None.
0%| | 0/100 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/mnt/d/path/tabular-dl-tabr/bin/go.py", line 52, in <module>
lib.run_cli(main)
File "/mnt/d/path/tabular-dl-tabr/lib/util.py", line 534, in run_cli
return fn(**vars(args))
File "/mnt/d/path/tabular-dl-tabr/bin/go.py", line 36, in main
bin.tune.main(tuning_config, tuning_output, continue_=continue_, force=force)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 185, in main
study.optimize(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/study.py", line 425, in optimize
_optimize(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 66, in _optimize
_optimize_sequential(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
frozen_trial = _run_trial(study, func, catch)
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 251, in _run_trial
raise func_err
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/mnt/d/path/tabular-dl-tabr/bin/tune.py", line 153, in objective
report = function(raw_config, Path(tmp) / 'output')
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 488, in main
loss, new_chunk_size = lib.train_step(
File "/mnt/d/path/tabular-dl-tabr/lib/deep.py", line 447, in train_step
loss = step_fn(batch)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 490, in <lambda>
lambda idx: loss_fn(apply_model('train', idx, True), Y_train[idx]),
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 416, in apply_model
return model(
File "/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/d/path/tabular-dl-tabr/bin/tabr.py", line 305, in forward
values = context_y_emb + self.T(k[:, None] - context_k)
RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1
Here is the error traceback. Thank you for your patient help.
[>>>] /tmp/tmpgkf3nayj_trial_0/output | 2024-03-27 20:29:44.940583
Creating the output
--------------------------------------------------------------------------------
{'seed': 0,
'data': {'seed': 0,
'cache': True,
'path': ':data/mydataset',
'num_policy': None,
'cat_policy': 'ordinal',
'y_policy': None},
'model': {'num_embeddings': None,
'd_main': 254,
'context_dropout': 0.4291136198234517,
'd_multiplier': 2.0,
'encoder_n_blocks': 0,
'predictor_n_blocks': 1,
'mixer_normalization': 'auto',
'dropout0': 0.3616580256429863,
'dropout1': 0.0,
'normalization': 'LayerNorm',
'activation': 'ReLU'},
'context_size': 96,
'optimizer': {'type': 'AdamW', 'lr': 0.000202727317015276, 'weight_decay': 7.501954443620125e-06},
'batch_size': 512,
'patience': 16,
'n_epochs': inf}
--------------------------------------------------------------------------------
Using cached dataset: build_dataset__mydataset__None__ordinal__None__None__0__cdbe2990a2672ace77e70398b7bcc78d.pickle
n_parameters = 589285
I should admit that I don't have any immediate ideas for why this is happening; it looks like debugging will require taking a closer look at the shapes of the problematic tensors. Also, for the time of debugging, I recommend commenting out the line cache = true in the config section related to the dataset.
Commenting out the line cache = true causes TypeError: build_dataset() missing 1 required keyword-only argument: 'cache'. I set cache = false instead, which results in the same error: RuntimeError: The size of tensor a (96) must match the size of tensor b (512) at non-singleton dimension 1.
Thank you for your help.
Fixed by the following change: in bin/tabr.py, line 305 becomes
values = context_y_emb.squeeze(2) + self.T(k[:, None] - context_k)
@xingyunjohn1 I have one hypothesis: perhaps you stored the files Y_train.npy, Y_val.npy, and Y_test.npy as two-dimensional arrays? This repository expects these label arrays to be one-dimensional.
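A plausible reconstruction of why 2-D labels trigger exactly this error (the shapes are taken from the failing run: batch_size = 512, context_size = 96, d_main = 254; the extra singleton axis in the label embedding is an inference from the squeeze(2) workaround above, not confirmed against the code):

```python
import torch

# With 2-D labels, the context label embedding plausibly picks up an extra
# singleton axis, and the addition on line 305 broadcasts the wrong way.
batch, context, d = 512, 96, 254
context_y_emb = torch.zeros(batch, context, 1, d)  # extra axis from 2-D labels
mixed = torch.zeros(batch, context, d)             # self.T(k[:, None] - context_k)

try:
    _ = context_y_emb + mixed
except RuntimeError as e:
    print(e)  # The size of tensor a (96) must match ... b (512) at non-singleton dimension 1

# .squeeze(2) removes the spurious axis, which is why the workaround "fixes"
# it; with 1-D labels the extra axis never appears in the first place.
_ = context_y_emb.squeeze(2) + mixed  # shapes now match: (512, 96, 254)
```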
@xingyunjohn1 I have one hypothesis: perhaps you stored the files Y_train.npy, Y_val.npy, and Y_test.npy as two-dimensional arrays? This repository expects these label arrays to be one-dimensional.
Thank you, I found this problem just a few hours ago and fixed it. Now I am trying to solve another problem.
Fixed by the following change: in bin/tabr.py, line 305 becomes
values = context_y_emb.squeeze(2) + self.T(k[:, None] - context_k)
It seems line 124 in bin/tabr.py has no effect, so I added .squeeze(2) here. Is this expected? @Yura52
With this change, I can now run your model, but the result was quite bad: I got about 30% precision with tabr, while XGBoost gets 80+%. I am wondering whether the change at line 305 has affected this.
To verify this, I tried to reproduce the result on the otto dataset with the changed model, and it works just fine.
I am wondering whether I should use the otto dataset's .toml file as a template for my dataset, which is multi-class and has an extra X_cat_*.npy compared with the otto dataset (with this extra file, I have to set cat_policy = "ordinal" in the .toml file to avoid an error).
Thank you again.
Here is what a run looks like:
[...] /tmp/tmph377hm0y_trial_1/output | 0:00:01.296021
Epoch 7: 100%|█████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 90.93it/s]
/home/path/miniconda3/envs/tabr/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
[the same warning is repeated five more times]
(val) 0.368 (test) 0.365 (loss) 1.43124
Is this warning a problem?
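For context, the warning means that some classes receive no predictions at all, so their precision is undefined and is reported as 0. A toy reproduction with scikit-learn (the values are made up; zero_division silences the warning):

```python
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2]
y_pred = [0, 0, 0, 0]  # classes 1 and 2 are never predicted
# Without zero_division=0 this emits the UndefinedMetricWarning above.
print(precision_score(y_true, y_pred, average="macro", zero_division=0))
```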
I am wondering whether I should use the otto dataset's .toml file as a template for my dataset, which is multi-class and has an extra X_cat_*.npy compared with the otto dataset (with this extra file, I have to set cat_policy = "ordinal" in the .toml file to avoid an error).
Now I have changed my strategy: I turned the X_cat_*.npy feature (in fact, Gender) into GENDER_Male and GENDER_Female columns, whose values are either 0 or 1, and made them part of X_num_*.npy. Now I use the same .toml file as the otto dataset, except that the dataset path is different. But the result is still about 30%-40%, with no improvement.
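A sketch of that encoding step (the column values and array shapes are examples, not the real dataset):

```python
import numpy as np

# One-hot encode the single categorical column and append it to the
# numerical features, so that X_cat_*.npy is no longer needed.
gender = np.array(["Male", "Female", "Male"])        # example raw column
gender_male = (gender == "Male").astype(np.float32)
gender_female = (gender == "Female").astype(np.float32)
x_num = np.random.randn(3, 11).astype(np.float32)    # stand-in X_num_train
x_num_new = np.column_stack([x_num, gender_male, gender_female])
print(x_num_new.shape)  # (3, 13)
```

Note that with this change, n_num_features in info.json grows by two and n_cat_features drops to zero.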
Finally, I found the solution. I am posting it here for those who experience this issue like me. For my dataset, 0-tuning gets about 40% precision, 2-lr-tuning gets about 60%, and 2-plr-lite-tuning gets about 90%. Maybe something about the embeddings affects the results, and I should review the paper for further research. At last, thank you @Yura52, you helped me a lot. And I have to say tabr is a brilliant model.
Hi @xingyunjohn1, it seems that I missed the reply. Thank you for the kind words! And yes, indeed, embeddings for numerical features can have a big impact on the performance. Glad that you resolved the issue!
Related Issues (20)
- Expected 2d tensor for the single feature of such type, got 1d
- Disable time dependent leaks during training
- (params_with_wd if needs_wd else params_with_wd)['params'].append(parameter) in deep.py: is there a mistake here? Why are if...else... both connected to params_with_wd?
- ? bug: delu.nn.Lambda(lambda x: x.squeeze(-2))
- bug: when eval, AttributeError: module 'torch.cuda' has no attribute 'OutOfMemoryError'
- The change of the candidate set during training
- When n_classes>1, how does 'self.label_encoder' handle float labels?
- Bugs in parallel gpus
- Could you please share the code to create a new dataset directory?
- Request for debugging help: I encounter a bug when reproducing the winequality dataset
- How to calculate the final result from the results of 100 trials
- How to understand the relationship between tune.py and evaluate.py?
- How to evaluate the performance of MLP on regression-cat-medium-0-OnlineNewsPopularity?
- micromamba environment setup issue
- Bug in make_parameter_groups
- How do I get the ± values in my experimental results?
- Can I use this code model as a backbone network for academic paper publishing?
- RuntimeError: mat1 and mat2 must have the same dtype
- add a new dataset