lmassaron / deep_learning_for_tabular_data

A presentation of core concepts and a data generator that makes it easier to use tabular data with TensorFlow and Keras

Jupyter Notebook 96.03% Python 3.97%

deep_learning_for_tabular_data's Introduction

Deep learning for tabular data

Deep learning can also be used for predictions based on tabular data, the kind of data you most commonly find in databases and tables. The presentation session of this workshop discusses how such an approach works and how it can be competitive with more popular machine learning algorithms such as gradient boosting. The workshop itself demonstrates how to achieve good results using TensorFlow and its high-level API, Keras, integrated with more classical approaches based on Scikit-learn and Pandas.

Workshop code on Colab:

Follow the tutorial on YouTube (GDG Venezia 2019)

https://www.youtube.com/watch?v=nQgUt_uADSE&t=1533s


deep_learning_for_tabular_data's Issues

error "Passing list-likes to .loc or [] with any missing labels is no longer supported."

I used my own data to run your code. My model is a regression model. Following your code works fine for CatBoost, but for the deep learning part I got the following error:

KeyError Traceback (most recent call last)
in
52 shuffle=True)
53
---> 54 history = model.fit(train_batch,
55 # validation_data=(tb.transform(X.iloc[test_idx]), y[test_idx]),
56 validation_data=test_batch,

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1048 training_utils.RespectCompiledTrainableState(self):
1049 # Creates a tf.data.Dataset and handles batch and epoch iteration.
-> 1050 data_handler = data_adapter.DataHandler(
1051 x=x,
1052 y=y,

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution)
1098
1099 adapter_cls = select_data_adapter(x, y)
-> 1100 self._adapter = adapter_cls(
1101 x,
1102 y,

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weights, shuffle, workers, use_multiprocessing, max_queue_size, model, **kwargs)
900 self._keras_sequence = x
901 self._enqueuer = None
--> 902 super(KerasSequenceAdapter, self).__init__(
903 x,
904 shuffle=False, # Shuffle is handed in the _make_callable override.

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weights, workers, use_multiprocessing, max_queue_size, model, **kwargs)
777 # Since we have to know the dtype of the python generator when we build the
778 # dataset, we have to look at a batch to infer the structure.
--> 779 peek, x = self._peek_and_restore(x)
780 peek = self._standardize_batch(peek)
781 peek = _process_tensorlike(peek)

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in _peek_and_restore(x)
911 @staticmethod
912 def _peek_and_restore(x):
--> 913 return x[0], x
914
915 def _handle_multiprocessing(self, x, workers, use_multiprocessing,

~/projects/ifp85/tabular.py in __getitem__(self, index)
348 def __getitem__(self, index):
349 indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
--> 350 samples, labels = self.__data_generation(indexes)
351 return samples, labels
352

~/projects/ifp85/tabular.py in __data_generation(self, selection)
342 return dct, self.y[selection]
343 else:
--> 344 return self.tbt.transform(self.X.iloc[selection, :]), self.y[selection]
345 else:
346 return self.X.iloc[selection, :], self.y[selection]

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
904 return self._get_values(key)
905
--> 906 return self._get_with(key)
907
908 def _get_with(self, key):

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/series.py in _get_with(self, key)
939 # (i.e. self.iloc) or label-based (i.e. self.loc)
940 if not self.index._should_fallback_to_positional():
--> 941 return self.loc[key]
942 else:
943 return self.iloc[key]

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1097 raise ValueError("Cannot index with multidimensional key")
1098
-> 1099 return self._getitem_iterable(key, axis=axis)
1100
1101 # nested tuple slicing

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1035
1036 # A collection of keys
-> 1037 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
1038 return self.obj._reindex_with_indexers(
1039 {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256

~/.virtualenvs/tf24/lib/python3.8/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1313
1314 with option_context("display.max_seq_items", 10, "display.width", 80):
-> 1315 raise KeyError(
1316 "Passing list-likes to .loc or [] with any missing labels "
1317 "is no longer supported. "

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([ 963, 26089, 37285, 32796, 21419,\n ...\n 7514, 35430, 5619, 9022, 40319],\n dtype='int64', length=253). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

I don't know how to solve this.
By the way, I don't fully understand the meaning of the variables sizes and categorical_levels:

tb = TabularTransformer(numeric=numeric_variables,
                        ordinal=[],
                        lowcat=[],
                        highcat=categorical_variables)

tb.fit(X.iloc[train_idx])
sizes = tb.shape(X.iloc[train_idx])
categorical_levels = dict(zip(categorical_variables, sizes[1:]))
print(f"Input array sizes: {sizes}")
print(f"Categorical levels: {categorical_levels}\n")

Thank you very much!
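
A possible cause, judging from the traceback: `self.X` is sliced with `.iloc` (positional), while `self.y[selection]` uses plain `[]`, which pandas treats as label-based when `y` is a Series with an integer index. After a train/test split or any filtering, the labels of `y` no longer run from 0 to n-1, so some batch positions are not valid labels. The following is a minimal sketch of the failure and two workarounds, assuming pandas 1.0 or later; the names `y` and `selection` mirror the traceback, while the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy target whose index has gaps, as happens after filtering or splitting.
y = pd.Series([10.0, 20.0, 30.0, 40.0], index=[3, 7, 8, 42])

# Positional row numbers, like the batch indexes built by the generator.
selection = np.array([0, 2, 3])

try:
    y[selection]  # label-based lookup: 0 and 2 are not labels of y
    failed = False
except KeyError:
    failed = True

# Workaround 1: index by position explicitly.
by_position = y.iloc[selection].to_numpy()

# Workaround 2: give the generator a NumPy array instead of a Series.
y_array = y.to_numpy()
from_array = y_array[selection]
```

Passing `y.values` (or `y.reset_index(drop=True)`) to the generator should have the same effect without editing `tabular.py`, though this has not been checked against the repository code.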

Feature importance

How would you go about finding the feature importance for the DNN model?
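
One model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the error grows. The sketch below is not taken from the workshop code; `permutation_importance`, `mse`, and the stand-in `predict` function are illustrative names, and the data is synthetic. With a real network, `predict` would wrap `model.predict` (plus any preprocessing the model needs).

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in `metric` (a loss, lower is better) per shuffled column."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between this column and the target.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            scores.append(metric(y, predict(X_perm)))
        importances[col] = np.mean(scores) - baseline
    return importances

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Stand-in for a trained model: only column 0 influences the prediction.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 3.0 * X[:, 0]
predict = lambda data: 3.0 * data[:, 0]

scores = permutation_importance(predict, X, y, mse)
```

Columns whose shuffling leaves the error unchanged get a score of zero; larger scores suggest the model relies more on that feature. Correlated features can share or hide importance, so the ranking is best read as a rough guide.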

High GPU Memory-Usage but zero volatile gpu-util

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     27737      C   ...rtualenvs/tf24/bin/python    23079MiB |
+-----------------------------------------------------------------------------+

I checked that my GPU is available. I suspect the GPU is always waiting for the CPU to process data. Do you know how to improve GPU utilization? When I tried your example, GPU-Util was 0% most of the time and only occasionally reached 20%.
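
The traceback in the first issue suggests the generator calls `tbt.transform(...)` on every batch, so the CPU may be repeating the preprocessing at each step while the GPU waits. One mitigation, sketched here under that assumption, is to transform the data once up front and make each batch a cheap array slice. `PretransformedBatches` and `transform` are hypothetical names; in real use the class would subclass `tf.keras.utils.Sequence`, and a larger batch size may help as well.

```python
import numpy as np

class PretransformedBatches:
    """Sequence-like batch source that preprocesses the whole dataset once."""

    def __init__(self, X, y, transform, batch_size=256, shuffle=True, seed=0):
        self.inputs = transform(X)  # expensive step, done a single time
        self.y = np.asarray(y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indexes = np.arange(len(self.y))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return int(np.ceil(len(self.y) / self.batch_size))

    def __getitem__(self, index):
        rows = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        return [arr[rows] for arr in self.inputs], self.y[rows]

    def on_epoch_end(self):
        if self.shuffle:
            self.rng.shuffle(self.indexes)

# Hypothetical transform returning one array per model input.
calls = []
def transform(X):
    calls.append(1)
    return [X[:, :2].astype("float32"), X[:, 2].astype("int32")]

X = np.arange(30).reshape(10, 3)
y = np.arange(10)
batches = PretransformedBatches(X, y, transform, batch_size=4, shuffle=False)
first_inputs, first_labels = batches[0]
second_inputs, second_labels = batches[1]
```

The trade-off is memory: the transformed dataset must fit in RAM, which is usually the case for tabular data but is worth checking for very wide or very long tables.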

errors when training deep learning model ('list' object has no attribute 'keys')

AttributeError Traceback (most recent call last)
in
43 shuffle=True)
44
---> 45 history = model.fit_generator(train_batch,
46 validation_data=(tb.transform(X.iloc[test_idx]), y[test_idx]),
47 epochs=30,

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1845 'will be removed in a future version. '
1846 'Please use Model.fit, which supports generators.')
-> 1847 return self.fit(
1848 generator,
1849 steps_per_epoch=steps_per_epoch,

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution)
1115 dataset = self._adapter.get_dataset()
1116 if class_weight:
-> 1117 dataset = dataset.map(_make_class_weight_map_fn(class_weight))
1118 self._inferred_steps = self._infer_steps(steps_per_epoch, dataset)
1119

~/.virtualenvs/tf24/lib/python3.8/site-packages/tensorflow/python/keras/engine/data_adapter.py in _make_class_weight_map_fn(class_weight)
1276 weighting.
1277 """
-> 1278 class_ids = list(sorted(class_weight.keys()))
1279 expected_class_ids = list(range(len(class_ids)))
1280 if class_ids != expected_class_ids:

AttributeError: 'list' object has no attribute 'keys'
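
The last frame shows Keras calling `class_weight.keys()`, so `class_weight` is expected to be a dict mapping class indices to weights; the error suggests that a list (or something list-like) was passed instead. A minimal sketch of the conversion follows, assuming the list is ordered by class index. The weights are made up for illustration.

```python
# Hypothetical per-class weights, ordered by class index 0, 1, 2.
weights_list = [0.5, 1.0, 2.5]

# Keras expects {class_index: weight} with indices 0..n_classes-1.
class_weight = dict(enumerate(weights_list))
```

The resulting dict can then be passed as `class_weight=class_weight` to `model.fit`. For a regression target, class weights do not apply and the argument can simply be left out.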

GPU utilization mostly 0% during training

I think my GPU is always waiting for the CPU to process data. Do you know how to improve GPU utilization?
