
keras-data-generator's People

Contributors

afshinea


keras-data-generator's Issues

Support for real time Data Augmentation

Hi Shervine, great repo. I want to perform real-time data augmentation with the data generator. I have sketched a possible solution:
we can use ImageDataGenerator(...).flow_from_dataframe(…) to generate a batch of data in the __getitem__(self, idx) method. Essentially, I want the data to be read from a list of file paths, with real-time data augmentation applied.

Is this approach correct?
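One way the approach above could look in code: a minimal sketch of a `keras.utils.Sequence` subclass that reads images from file paths and applies a random augmentation per sample via `ImageDataGenerator.random_transform`. All names here (`AugmentingGenerator`, the constructor parameters) are illustrative, not part of the original repo.

```python
import numpy as np
from tensorflow import keras

# Hypothetical sketch: on-the-fly augmentation inside a Sequence that
# reads from a list of image file paths (names are illustrative).
class AugmentingGenerator(keras.utils.Sequence):
    def __init__(self, file_paths, labels, batch_size=32, dim=(224, 224)):
        self.file_paths = file_paths
        self.labels = labels
        self.batch_size = batch_size
        self.dim = dim
        # ImageDataGenerator supplies the random transforms
        self.augmenter = keras.preprocessing.image.ImageDataGenerator(
            rotation_range=15, horizontal_flip=True)

    def __len__(self):
        return int(np.floor(len(self.file_paths) / self.batch_size))

    def __getitem__(self, index):
        batch_paths = self.file_paths[index * self.batch_size:
                                      (index + 1) * self.batch_size]
        batch_labels = self.labels[index * self.batch_size:
                                   (index + 1) * self.batch_size]
        X = np.empty((len(batch_paths), *self.dim, 3))
        for i, path in enumerate(batch_paths):
            # Read the image from disk at batch time, not up front
            img = keras.preprocessing.image.load_img(path, target_size=self.dim)
            x = keras.preprocessing.image.img_to_array(img)
            # A fresh random augmentation is drawn for every access
            X[i] = self.augmenter.random_transform(x)
        return X, np.array(batch_labels)
```

`flow_from_dataframe` would also work, but calling `random_transform` directly keeps the `Sequence` in control of batching and shuffling, which is the point of the repo's design.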

about keras.utils.to_categorical(y, num_classes=self.n_classes)

I have n_classes = 1251, but it comes up with this error:


RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/data_utils.py", line 401, in get_index
return _SHARED_SEQUENCES[uid][i]
File "", line 28, in __getitem__
X, y = self.__data_generation(list_IDs_temp)
File "", line 55, in __data_generation
return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
File "/data/hktxt/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/utils/np_utils.py", line 34, in to_categorical
categorical[np.arange(n), y] = 1
IndexError: index 1251 is out of bounds for axis 1 with size 1251
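The traceback above is consistent with one-based labels: `to_categorical` expects class indices in the range `0..n_classes-1`, so a label of 1251 with `num_classes=1251` is out of bounds. A minimal sketch of the fix, assuming the labels run 1..1251:

```python
import numpy as np
from tensorflow import keras

n_classes = 1251

# One-based labels (1..1251) trigger the IndexError above, because
# to_categorical indexes columns 0..n_classes-1.
y_one_based = np.array([1, 5, 1251])
y_zero_based = y_one_based - 1          # shift to 0..1250

onehot = keras.utils.to_categorical(y_zero_based, num_classes=n_classes)
print(onehot.shape)        # (3, 1251)
print(onehot[2].argmax())  # 1250
```

If the labels are not simply off by one, remapping them through a dictionary of sorted unique labels achieves the same zero-based range.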

What are the .npy files in the image recognition model?

Hi, thanks for such a nice post. I'm a new learner and have a few doubts here.

I want to train my model on a big dataset of images that are classified by folder name, e.g. red_floral_skirts/1.jpeg, 2.jpeg, ..., n.jpeg and black_striped_shirts/1.jpeg, 2.jpeg, ..., n.jpeg. I want my model to predict something like black: 98%, striped: 97%, shirt: 98% (i.e. it should tell me that this is a black, striped shirt). So my training data should be the collection of images, with labels like ['black', 'striped', 'shirt'] for all images under the folder black_striped_shirts.

I really need your help to understand how to fit my requirement into this DataGenerator. Do I need to replace the partition dictionary entries (id_1, ...) with the actual images, and the labels dictionary with the folder names split into three parts? Also, in the code snippet below, what is the data/ID_1.npy file? Where would a .npy file come from in my case?

    for i, ID in enumerate(list_IDs_temp):
        # Store sample
        X[i,] = np.load('data/' + ID + '.npy')

Thanks a lot for your help,
Jitender
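A short answer sketch for the question above: the `.npy` files are simply NumPy arrays saved to disk with `np.save`, typically produced once in a preprocessing pass so the generator can load each sample quickly with `np.load`. The helper below is hypothetical, not part of the repo:

```python
import numpy as np
from PIL import Image

def image_to_npy(jpeg_path, npy_path, size=(224, 224)):
    """One-off conversion: decode a JPEG, resize it, and save the
    resulting array as a .npy file the generator can np.load later."""
    img = Image.open(jpeg_path).convert('RGB').resize(size)
    np.save(npy_path, np.asarray(img))

# The labels dict then carries the multi-label target per sample ID,
# e.g. (illustrative) labels['black_striped_shirts_1'] could be the
# attribute indices derived from splitting the folder name into three.
```

Alternatively, the generator can skip `.npy` entirely and decode the JPEG inside `__data_generation`; pre-converting just trades disk space for faster epoch times.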

Please add a license to this repo

First, thank you for writing the blog post and sharing this sample code with us!

Could you please add an explicit LICENSE file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?

Per GitHub docs on licensing:

[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.

Thanks!

A problem when the sample_size is not divisible by the batch_size

I adapted the DataGenerator to my deep learning pipeline.
When the sample size is not divisible by the batch_size, the DataGenerator seems to wrap around to the first batch instead of yielding the last (smaller) batch.

Example
Let A be an array of train samples, and batch_size = 4.
A = [4,7,8,7,9,78,8,4,78,51,6,5,1,0]. Here A.size = 14
It is clear, in this situation, that A.size is not divisible by batch_size.

The batches the DataGenerator yields during the training process are the following:

  • Batch_0 = [4,7,8,7],
  • Batch_1 = [9,78,8,4]
  • Batch_2 = [78,51,6,5]
  • Batch_3 = [4,7,8,7] This is where the problem lies: instead of yielding Batch_3 = [1,0], it goes back to the first batch.

Here is a situation where another generator behaves correctly when the sample_size is not divisible by the batch_size: https://stackoverflow.com/questions/54159034/what-if-the-sample-size-is-not-divisible-by-batch-size-in-keras-model

For your information, I kept the following instruction as is:
int(np.floor(len(self.list_IDs) / self.batch_size))
If I change np.floor to np.ceil, it breaks during the training/validation phases.
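A likely reason `np.ceil` alone breaks: the original `__data_generation` allocates arrays of exactly `batch_size`, which no longer matches the final partial batch. A minimal sketch (class name and shapes are illustrative) that sizes each batch from the actual slice:

```python
import numpy as np
from tensorflow import keras

# Hypothetical sketch: a Sequence that keeps the last, smaller batch.
class PartialBatchGenerator(keras.utils.Sequence):
    def __init__(self, data, batch_size=4):
        self.data = np.asarray(data)
        self.batch_size = batch_size

    def __len__(self):
        # ceil so the trailing partial batch is counted
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        # Slicing past the end just truncates, so the last batch
        # naturally holds len(data) % batch_size samples.
        batch = self.data[index * self.batch_size:(index + 1) * self.batch_size]
        X = np.empty((len(batch),))   # size from the slice, not batch_size
        X[:] = batch
        return X
```

With the A from the example above and batch_size = 4, len(gen) is 4 and gen[3] yields [1, 0].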

the __getitem__ code delivers (batch_size + 1) results, not (batch_size)

The following line in __getitem__ in the generator

indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]

returns batch_size + 1 results. E.g. if batch_size is 10, when index is 0 it returns indices 0 through 10, which is 11 items. If batch_size is 1, it returns items 0 and 1, etc. Therefore it will overflow when batch_size is an even multiple of the row count.

The correct code is:

 indexes = self.indexes[index*self.batch_size:((index+1)*self.batch_size)-1]
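Before applying the proposed change, the slice length is worth checking in isolation, since Python slices are half-open (the stop index is excluded), so `a[i*b:(i+1)*b]` yields indices `i*b` through `(i+1)*b - 1`:

```python
# Verify the length of the generator's slice expression directly
indexes = list(range(100))
batch_size = 10
index = 0

batch = indexes[index * batch_size:(index + 1) * batch_size]
print(len(batch))            # 10
print(batch[0], batch[-1])   # 0 9
```

Because the stop index is excluded, the original expression already returns exactly batch_size items; subtracting 1 from the stop index would drop the last item of every batch.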
