afshinea / keras-data-generator
Template for data generator in Keras
Home Page: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
Hi Shervine, great repo. I want to perform real-time data augmentation with the data generator. One possible solution I see is to use ImageDataGenerator(...).flow_from_dataframe(…) to generate the batch of data inside the __getitem__(self, idx) method. Concretely, I want the data to be read from a list of file paths, with real-time augmentation applied.
Do you think this approach is correct?
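One way to sketch this idea, without depending on Keras itself: a Sequence-style class whose __getitem__ loads samples from file paths and applies an augmentation on the fly (here a random horizontal flip done in NumPy; in a real pipeline the class would subclass keras.utils.Sequence and the loader/augmentation would be swapped for your own). The class name and `load_fn` hook are hypothetical, not part of the original template.

```python
import numpy as np

class AugmentingGenerator:
    """Minimal sketch of a Sequence-style generator: reads samples from
    a list of file paths and augments each batch on the fly.
    In a real pipeline this would subclass keras.utils.Sequence."""
    def __init__(self, list_paths, load_fn, batch_size=32, seed=0):
        self.list_paths = list_paths
        self.load_fn = load_fn          # e.g. lambda p: np.load(p)
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Number of full batches per epoch
        return int(np.floor(len(self.list_paths) / self.batch_size))

    def __getitem__(self, index):
        # Select the paths belonging to this batch and load them
        batch_paths = self.list_paths[index * self.batch_size:(index + 1) * self.batch_size]
        X = np.stack([self.load_fn(p) for p in batch_paths])
        # Real-time augmentation: flip roughly half the samples left-right
        flip = self.rng.random(len(X)) < 0.5
        X[flip] = X[flip, :, ::-1]
        return X
```

Replacing the flip with a call into ImageDataGenerator (or any other augmentation library) inside __getitem__ follows the same pattern.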
I have n_classes = 1251, but it comes up with this error:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/data_utils.py", line 401, in get_index
return _SHARED_SEQUENCES[uid][i]
File "", line 28, in __getitem__
X, y = self.__data_generation(list_IDs_temp)
File "", line 55, in __data_generation
return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/np_utils.py", line 34, in to_categorical
categorical[np.arange(n), y] = 1
IndexError: index 1251 is out of bounds for axis 1 with size 1251
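The IndexError suggests that at least one label in y equals 1251, while to_categorical expects integer labels in the range 0 to n_classes - 1. This typically happens when labels are 1-based; shifting them to 0-based (or passing num_classes=1252) fixes it. A minimal reproduction of the contract, using a NumPy stand-in for keras.utils.to_categorical so the sketch runs without Keras:

```python
import numpy as np

def to_categorical(y, num_classes):
    """NumPy stand-in with the same contract as keras.utils.to_categorical:
    y must hold integer labels in the range [0, num_classes - 1]."""
    y = np.asarray(y, dtype=int)
    categorical = np.zeros((y.size, num_classes))
    # Raises IndexError if any label equals num_classes (the error above)
    categorical[np.arange(y.size), y] = 1
    return categorical

n_classes = 1251
y_raw = [1, 500, 1251]             # 1-based labels: the max equals n_classes
y_fixed = [v - 1 for v in y_raw]   # shift to 0-based before one-hot encoding
onehot = to_categorical(y_fixed, n_classes)
```

With the raw 1-based labels, the same call reproduces the `index 1251 is out of bounds for axis 1 with size 1251` error.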
Hi, thanks for such a nice post. I'm a new learner and have a few doubts here.
I want to train my model on a big dataset of images organized into folders named after their attributes, like red_floral_skirts/1.jpeg, 2.jpeg, ... n.jpeg and black_striped_shirts/1.jpeg, 2.jpeg, ... n.jpeg. I then want the model to predict something like black: 98%, striped: 97%, shirt: 98% (i.e. it should tell me that this is a black, striped shirt), so my training data should be the collection of images, and the label for every image under the folder black_striped_shirts should be something like ['black', 'striped', 'shirt'].
I really need your help to understand how to fit this requirement into the DataGenerator. Do I need to replace the partition dictionary entries (id_1, ...) with the actual image paths, and the labels dictionary with the folder names split into three parts? Also, in the code snippet below, what is the data/ID_1.npy file? Where would the .npy files come from in my case?
for i, ID in enumerate(list_IDs_temp):
X[i,] = np.load('data/' + ID + '.npy')
Thanks a lot for your help,
Jitender
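The question above describes a multi-label setup: each image has several attributes (color, pattern, garment type) encoded in its folder name. One hedged sketch of the label side, assuming a naming scheme like black_striped_shirts (the attribute vocabulary and the plural-stripping rule below are assumptions about the data, not part of the original template). For the image side, np.load('data/' + ID + '.npy') would be replaced by an image loader (e.g. PIL), and the model head would use a sigmoid output with binary cross-entropy rather than to_categorical/softmax, since several labels can be active at once.

```python
import numpy as np

# Hypothetical attribute vocabulary built from the folder names
ATTRIBUTES = ['black', 'red', 'floral', 'striped', 'shirt', 'skirt']
ATTR_INDEX = {name: i for i, name in enumerate(ATTRIBUTES)}

def labels_from_path(path):
    """Turn 'black_striped_shirts/1.jpeg' into a multi-hot vector.
    The folder name encodes the attributes; 'shirts' -> 'shirt' by
    stripping the trailing 's' (an assumption about the naming scheme)."""
    folder = path.split('/')[0]
    parts = [p.rstrip('s') if p.rstrip('s') in ATTR_INDEX else p
             for p in folder.split('_')]
    y = np.zeros(len(ATTRIBUTES), dtype=np.float32)
    for p in parts:
        y[ATTR_INDEX[p]] = 1.0   # mark each attribute present in the name
    return y
```

Inside __data_generation, y[i] would then be filled with labels_from_path(ID) instead of a single class index.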
First, thank you for writing the blog post and sharing this sample code with us!
Could you please add an explicit LICENSE
file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?
[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.
Thanks!
I adapted DataGenerator to my Deep Learning pipeline.
When the sample size is not divisible by batch_size, the DataGenerator wraps back to the first batch without ever yielding the last (smaller) batch.
Example
Let A be an array of train samples, and batch_size = 4.
A = [4, 7, 8, 7, 9, 78, 8, 4, 78, 51, 6, 5, 1, 0], so A.size = 14.
Clearly, in this situation, A.size is not divisible by batch_size.
During training the DataGenerator therefore yields only the three full batches (12 samples) per epoch; the last two samples are never seen.
For comparison, here is a case where another generator handles a sample size not divisible by the batch size correctly: https://stackoverflow.com/questions/54159034/what-if-the-sample-size-is-not-divisible-by-batch-size-in-keras-model
For your information, I kept the following instruction as is:
int(np.floor(len(self.list_IDs) / self.batch_size))
If I simply change np.floor to np.ceil, it breaks during the training/validation phases.
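Changing np.floor to np.ceil by itself fails because the template's __data_generation pre-allocates X with a fixed batch_size shape, which the final short batch cannot fill. A sketch of the alternative, assuming the batch is built by slicing rather than pre-allocation (class name hypothetical): np.ceil in __len__ plus a plain slice in __getitem__, since Python slices clip at the end of the array and simply yield a shorter final batch.

```python
import numpy as np

class CeilGenerator:
    """Sketch: np.ceil in __len__ so the last partial batch is counted,
    with __getitem__ slicing the data so that batch comes out shorter
    instead of raising."""
    def __init__(self, data, batch_size):
        self.data = np.asarray(data)
        self.batch_size = batch_size

    def __len__(self):
        # Count the final partial batch as well
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        # A Python slice clips at the end of the array, so the final
        # batch is simply smaller rather than out of bounds
        return self.data[index * self.batch_size:(index + 1) * self.batch_size]

A = [4, 7, 8, 7, 9, 78, 8, 4, 78, 51, 6, 5, 1, 0]
gen = CeilGenerator(A, batch_size=4)
# len(gen) == 4; gen[3] is the final partial batch [1, 0]
```

Any code downstream of __getitem__ (loss averaging, metrics) must then tolerate a batch of variable size.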
The line in __getitem__ in the generator
indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
returns exactly batch_size results, because the upper bound of a Python slice is exclusive. E.g. if batch_size is 10 and index is 0, it returns indices 0 through 9, which is 10 items, so it does not overflow when batch_size evenly divides the row count. Subtracting 1 from the upper bound, as in
indexes = self.indexes[index*self.batch_size:((index+1)*self.batch_size)-1]
would instead drop the last item of every batch.
Your code and blog are wonderful! I still have one question: in def __getitem__(self, index), where does the index argument come from? Could you explain it further? Many thanks!
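The short answer is that Keras supplies `index` itself: during fit, the training loop asks the Sequence for len(seq) batches per epoch by calling seq[i] for i = 0 .. len(seq) - 1 (possibly from worker processes, and in shuffled order when shuffle=True). A toy illustration of that calling pattern, with hypothetical names standing in for the Keras internals:

```python
class ToySequence:
    """Stand-in for a keras.utils.Sequence subclass that records
    which indices the training loop asks for."""
    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.calls = []

    def __len__(self):
        return self.n_batches

    def __getitem__(self, index):   # `index` is supplied by the training loop
        self.calls.append(index)
        return f"batch {index}"

def toy_fit(seq, epochs=1):
    """Sketch of the loop fit() runs over a Sequence."""
    for _ in range(epochs):
        for i in range(len(seq)):   # this loop is where `index` comes from
            _ = seq[i]

seq = ToySequence(3)
toy_fit(seq, epochs=2)
# seq.calls == [0, 1, 2, 0, 1, 2]
```

That is also why the template shuffles self.indexes in on_epoch_end rather than relying on the order of `index` itself.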