Comments (4)
I am also facing the same issue. I trained the model on my own data and sampled from it, but in the output all the generated data columns are shuffled and the column names are missing. How can I recover the column names from such output? @rotot0 or @erjanmx, please help asap.
from tab-ddpm.
Hi,
Sorry for the late answer. Please see this answer: #3 (comment)
In short, there is no way to reconstruct the column names. All I can say is that the columns are very likely in the same order as in the original .csv file. So, if the .csv file has [num1, num2, cat1, num3, cat2], then X_num=[num1, num2, num3] and X_cat=[cat1, cat2]. The easiest way is to do the data partitioning yourself.
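A minimal sketch of doing the partitioning yourself while keeping the column names on the side, assuming pandas and NumPy. The toy table, the `generated_num`/`generated_cat` arrays, and the rule "non-numeric means categorical" are all hypothetical stand-ins for illustration; none of this is tab-ddpm's own API or output.

```python
import pandas as pd

# Hypothetical toy table standing in for the original .csv file.
df = pd.DataFrame({
    "num1": [0.1, 0.2, 0.3],
    "num2": [1.0, 2.0, 3.0],
    "cat1": ["a", "b", "a"],
    "num3": [10.0, 20.0, 30.0],
    "cat2": ["x", "y", "x"],
})

# Assumption: every non-numeric column is categorical.
# select_dtypes keeps the original column order within each group.
num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = [c for c in df.columns if c not in num_cols]

# Arrays to hand to training; the name lists are saved separately.
X_num = df[num_cols].to_numpy()
X_cat = df[cat_cols].to_numpy()

# Stand-ins for the sampled arrays (here just copies of the inputs).
generated_num = X_num.copy()
generated_cat = X_cat.copy()

# Reattach the saved names, then restore the original column order.
synthetic = pd.concat(
    [
        pd.DataFrame(generated_num, columns=num_cols),
        pd.DataFrame(generated_cat, columns=cat_cols),
    ],
    axis=1,
)[df.columns.tolist()]
```

This only works if the generated arrays keep the column order of the arrays that went in, which is what the answer above suggests is very likely the case.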
@rotot0 I have already partitioned the whole csv data into X_num and X_cat myself, but after generation even the columns of the generated cat and num dataframes are shuffled. Please have a look at my screenshot above. It shows only the original categorical dataframe and the generated categorical dataframe, and in the generated categorical data all the columns have been shuffled. Please fix this issue; otherwise the library is of no use.
@shamikdhar Sorry, but I cannot reproduce your problem in my experiments. The original and generated columns are aligned. It may be a bug on your side. Otherwise, please provide additional code or information about your problem. Also, you might want to open another issue.
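One way to gather that additional information could be a quick alignment check on the categorical arrays. The sketch below is a heuristic, not part of tab-ddpm: the helper `match_columns` and the toy arrays are hypothetical, and it assumes each categorical column has a reasonably distinct set of values. For every generated column it lists the original columns whose value set contains all of its values.

```python
import numpy as np

def match_columns(original, generated):
    """Map each generated column index to the original column indices
    whose set of values contains every value in that generated column."""
    matches = {}
    for j in range(generated.shape[1]):
        gen_values = set(generated[:, j])
        matches[j] = [
            i for i in range(original.shape[1])
            if gen_values <= set(original[:, i])
        ]
    return matches

# Hypothetical toy data: two categorical columns with disjoint values.
original = np.array([["a", "x"], ["b", "y"], ["a", "x"]])
swapped = original[:, ::-1]  # same data with the two columns reversed

aligned_report = match_columns(original, original)
swapped_report = match_columns(original, swapped)
```

If every generated column `j` maps back to `[j]`, the columns are aligned; a mapping to a different index points to a reordering somewhere in the preprocessing or sampling code. Columns that share values will match more than one original column, so an ambiguous result does not prove anything either way.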