afshinea / keras-data-generator
Template for data generator in Keras
Home Page: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
Hi Shervine, great repo. I want to perform real-time data augmentation with the data generator. One possible solution I see is to use ImageDataGenerator(...).flow_from_dataframe(…) to generate the batch of data inside the __getitem__(self, idx) method. Concretely, I want the data to be read from a list of file paths, with real-time augmentation applied.
Do you think this approach is correct?
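One way to sketch this idea, without depending on Keras itself: a Sequence-style class whose __getitem__ loads samples from file paths and applies an augmentation on the fly (here a random horizontal flip done in NumPy; in a real pipeline the class would subclass keras.utils.Sequence and the loader/augmentation would be swapped for your own). The class name and `load_fn` hook are hypothetical, not part of the original template.

```python
import numpy as np

class AugmentingGenerator:
    """Minimal sketch of a Sequence-style generator: reads samples from
    a list of file paths and augments each batch on the fly.
    In a real pipeline this would subclass keras.utils.Sequence."""
    def __init__(self, list_paths, load_fn, batch_size=32, seed=0):
        self.list_paths = list_paths
        self.load_fn = load_fn          # e.g. lambda p: np.load(p)
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Number of full batches per epoch
        return int(np.floor(len(self.list_paths) / self.batch_size))

    def __getitem__(self, index):
        # Select the paths belonging to this batch and load them
        batch_paths = self.list_paths[index * self.batch_size:(index + 1) * self.batch_size]
        X = np.stack([self.load_fn(p) for p in batch_paths])
        # Real-time augmentation: flip roughly half the samples left-right
        flip = self.rng.random(len(X)) < 0.5
        X[flip] = X[flip, :, ::-1]
        return X
```

Replacing the flip with a call into ImageDataGenerator (or any other augmentation library) inside __getitem__ follows the same pattern.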
I have n_classes = 1251, but it comes up with this error:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/data_utils.py", line 401, in get_index
return _SHARED_SEQUENCES[uid][i]
File "", line 28, in __getitem__
X, y = self.__data_generation(list_IDs_temp)
File "", line 55, in __data_generation
return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/np_utils.py", line 34, in to_categorical
categorical[np.arange(n), y] = 1
IndexError: index 1251 is out of bounds for axis 1 with size 1251
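The IndexError suggests that at least one label in y equals 1251, while to_categorical expects integer labels in the range 0 to n_classes - 1. This typically happens when labels are 1-based; shifting them to 0-based (or passing num_classes=1252) fixes it. A minimal reproduction of the contract, using a NumPy stand-in for keras.utils.to_categorical so the sketch runs without Keras:

```python
import numpy as np

def to_categorical(y, num_classes):
    """NumPy stand-in with the same contract as keras.utils.to_categorical:
    y must hold integer labels in the range [0, num_classes - 1]."""
    y = np.asarray(y, dtype=int)
    categorical = np.zeros((y.size, num_classes))
    # Raises IndexError if any label equals num_classes (the error above)
    categorical[np.arange(y.size), y] = 1
    return categorical

n_classes = 1251
y_raw = [1, 500, 1251]             # 1-based labels: the max equals n_classes
y_fixed = [v - 1 for v in y_raw]   # shift to 0-based before one-hot encoding
onehot = to_categorical(y_fixed, n_classes)
```

With the raw 1-based labels, the same call reproduces the `index 1251 is out of bounds for axis 1 with size 1251` error.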
Hi, thanks for such a nice post. I'm a new learner and have a few doubts here.
I want to train my model on a big dataset of images organized into folders named after their attributes, like red_floral_skirts/1.jpeg, 2.jpeg, ... n.jpeg and black_striped_shirts/1.jpeg, 2.jpeg, ... n.jpeg. I then want the model to predict something like black: 98%, striped: 97%, shirt: 98% (i.e. it should tell me that this is a black, striped shirt), so my training data should be the collection of images, and the label for every image under the folder black_striped_shirts should be something like ['black', 'striped', 'shirt'].
I really need your help to understand how to fit this requirement into the DataGenerator. Do I need to replace the partition dictionary entries (id_1, ...) with the actual image paths, and the labels dictionary with the folder names split into three parts? Also, in the code snippet below, what is the data/ID_1.npy file? Where would the .npy files come from in my case?
for i, ID in enumerate(list_IDs_temp):
X[i,] = np.load('data/' + ID + '.npy')
Thanks a lot for your help,
Jitender
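The question above describes a multi-label setup: each image has several attributes (color, pattern, garment type) encoded in its folder name. One hedged sketch of the label side, assuming a naming scheme like black_striped_shirts (the attribute vocabulary and the plural-stripping rule below are assumptions about the data, not part of the original template). For the image side, np.load('data/' + ID + '.npy') would be replaced by an image loader (e.g. PIL), and the model head would use a sigmoid output with binary cross-entropy rather than to_categorical/softmax, since several labels can be active at once.

```python
import numpy as np

# Hypothetical attribute vocabulary built from the folder names
ATTRIBUTES = ['black', 'red', 'floral', 'striped', 'shirt', 'skirt']
ATTR_INDEX = {name: i for i, name in enumerate(ATTRIBUTES)}

def labels_from_path(path):
    """Turn 'black_striped_shirts/1.jpeg' into a multi-hot vector.
    The folder name encodes the attributes; 'shirts' -> 'shirt' by
    stripping the trailing 's' (an assumption about the naming scheme)."""
    folder = path.split('/')[0]
    parts = [p.rstrip('s') if p.rstrip('s') in ATTR_INDEX else p
             for p in folder.split('_')]
    y = np.zeros(len(ATTRIBUTES), dtype=np.float32)
    for p in parts:
        y[ATTR_INDEX[p]] = 1.0   # mark each attribute present in the name
    return y
```

Inside __data_generation, y[i] would then be filled with labels_from_path(ID) instead of a single class index.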
First, thank you for writing the blog post and sharing this sample code with us!
Could you please add an explicit LICENSE
file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?
[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.
Thanks!
I adapted DataGenerator to my Deep Learning pipeline.
When the sample size is not divisible by batch_size, the DataGenerator wraps back to the first batch without ever yielding the last (smaller) batch.
Example
Let A be an array of train samples, and batch_size = 4.
A = [4, 7, 8, 7, 9, 78, 8, 4, 78, 51, 6, 5, 1, 0], so A.size = 14.
Clearly, in this situation, A.size is not divisible by batch_size.
During training the DataGenerator therefore yields only the three full batches (12 samples) per epoch; the last two samples are never seen.
For comparison, here is a case where another generator handles a sample size not divisible by the batch size correctly: https://stackoverflow.com/questions/54159034/what-if-the-sample-size-is-not-divisible-by-batch-size-in-keras-model
For your information, I kept the following instruction as is:
int(np.floor(len(self.list_IDs) / self.batch_size))
If I simply change np.floor to np.ceil, it breaks during the training/validation phases.
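Changing np.floor to np.ceil by itself fails because the template's __data_generation pre-allocates X with a fixed batch_size shape, which the final short batch cannot fill. A sketch of the alternative, assuming the batch is built by slicing rather than pre-allocation (class name hypothetical): np.ceil in __len__ plus a plain slice in __getitem__, since Python slices clip at the end of the array and simply yield a shorter final batch.

```python
import numpy as np

class CeilGenerator:
    """Sketch: np.ceil in __len__ so the last partial batch is counted,
    with __getitem__ slicing the data so that batch comes out shorter
    instead of raising."""
    def __init__(self, data, batch_size):
        self.data = np.asarray(data)
        self.batch_size = batch_size

    def __len__(self):
        # Count the final partial batch as well
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        # A Python slice clips at the end of the array, so the final
        # batch is simply smaller rather than out of bounds
        return self.data[index * self.batch_size:(index + 1) * self.batch_size]

A = [4, 7, 8, 7, 9, 78, 8, 4, 78, 51, 6, 5, 1, 0]
gen = CeilGenerator(A, batch_size=4)
# len(gen) == 4; gen[3] is the final partial batch [1, 0]
```

Any code downstream of __getitem__ (loss averaging, metrics) must then tolerate a batch of variable size.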
The line in __getitem__ in the generator
indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
returns exactly batch_size results, because the upper bound of a Python slice is exclusive. E.g. if batch_size is 10 and index is 0, it returns indices 0 through 9, which is 10 items, so it does not overflow when batch_size evenly divides the row count. Subtracting 1 from the upper bound, as in
indexes = self.indexes[index*self.batch_size:((index+1)*self.batch_size)-1]
would instead drop the last item of every batch.
Your code and blog are wonderful! I still have one question: in def __getitem__(self, index), where does the index argument come from? Could you explain it further? Many thanks!
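The short answer is that Keras supplies `index` itself: during fit, the training loop asks the Sequence for len(seq) batches per epoch by calling seq[i] for i = 0 .. len(seq) - 1 (possibly from worker processes, and in shuffled order when shuffle=True). A toy illustration of that calling pattern, with hypothetical names standing in for the Keras internals:

```python
class ToySequence:
    """Stand-in for a keras.utils.Sequence subclass that records
    which indices the training loop asks for."""
    def __init__(self, n_batches):
        self.n_batches = n_batches
        self.calls = []

    def __len__(self):
        return self.n_batches

    def __getitem__(self, index):   # `index` is supplied by the training loop
        self.calls.append(index)
        return f"batch {index}"

def toy_fit(seq, epochs=1):
    """Sketch of the loop fit() runs over a Sequence."""
    for _ in range(epochs):
        for i in range(len(seq)):   # this loop is where `index` comes from
            _ = seq[i]

seq = ToySequence(3)
toy_fit(seq, epochs=2)
# seq.calls == [0, 1, 2, 0, 1, 2]
```

That is also why the template shuffles self.indexes in on_epoch_end rather than relying on the order of `index` itself.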