embeddings2image's Introduction

Embeddings2Image

Formerly visualize-tsne.

This small project creates 2D images from the embeddings of a set of images.
It was inspired by Andrej Karpathy's blog post on visualizing CNNs with t-SNE.
(This guy is pretty sharp 😉 - you should definitely follow him!)

UPDATE #1
At first the package only supported dimension reduction using t-SNE, but now it also supports the great UMAP.
Check it out: https://github.com/lmcinnes/umap

UPDATE #2
I saw that the project is useful to some people so I uploaded it to PyPI for easier integration.

UPDATE #3
Check out the end-to-end example added by @nivha.

Examples

[Images: MNIST 2D grid via t-SNE; MNIST scatter via t-SNE; MNIST scatter via UMAP]

[Images: CIFAR-10 grid; CIFAR-10 scatter]

Installation

  1. via pip
    1. pip install Embeddings2Image
  2. Download / Clone
    1. install - python setup.py install
    2. Or just use it as is
      1. pip install -r requirements.txt
      2. see documentation below

Usage

if installed via PyPI

from e2i import EmbeddingsProjector  
 
image = EmbeddingsProjector()
image.path2data = 'data.hdf5'
image.load_data()
image.calculate_projection()
image.create_image()

Important! The module expects an hdf5 file with 2 datasets:

  • urls - dataset which contains the path/url of each image
  • vectors - dataset which contains the corresponding vector for each image.
    Make sure that both are in the same order.
  • check out this hdf5 example
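
As a rough illustration of the layout described above, the following sketch writes an hdf5 file with a `urls` dataset and a `vectors` dataset in matching order. It assumes `h5py` and `numpy` are installed; the file paths, the vector size of 128, and the choice to store the urls as variable-length strings are made up for the example and are not taken from the package.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical inputs: 5 image paths and one 128-d embedding per image.
# Row i of `vectors` must belong to urls[i].
urls = [f"images/{i}.jpg" for i in range(5)]
vectors = np.random.default_rng(0).normal(size=(5, 128)).astype(np.float32)

# Written to a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "data.hdf5")

with h5py.File(path, "w") as f:
    # Store the paths as variable-length strings (an assumption of this sketch).
    f.create_dataset(
        "urls",
        data=np.array(urls, dtype=object),
        dtype=h5py.string_dtype(),
    )
    f.create_dataset("vectors", data=vectors)

# Read the file back to confirm both datasets have the same number of rows.
with h5py.File(path, "r") as f:
    n_urls = f["urls"].shape[0]
    vec_shape = f["vectors"].shape

print(n_urls, vec_shape)
```

If the two datasets end up with different lengths, the url-to-vector pairing is already broken before any projection is computed, so it is worth checking this when the file is written.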

another option is to load the data and urls explicitly:

  • urls - create a np.asarray out of a url list and load to image.image_list
  • vectors - create a np.ndarray of the vectors and load to image.data_vectors
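
A minimal sketch of the explicit route, assuming only that the projector exposes the `image_list` and `data_vectors` attributes named above. The helper `attach_data` and the sample values are hypothetical, and a plain `SimpleNamespace` stands in for the projector so the snippet runs without the package installed.

```python
from types import SimpleNamespace

import numpy as np

# Hypothetical sample data: three image paths and their 4-d embeddings.
urls = ["images/0.jpg", "images/1.jpg", "images/2.jpg"]
vectors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]


def attach_data(projector, urls, vectors):
    """Convert the inputs to numpy arrays and set them on the projector."""
    image_list = np.asarray(urls)
    data_vectors = np.asarray(vectors, dtype=np.float32)
    # One vector per url, in the same order.
    if data_vectors.ndim != 2 or len(image_list) != len(data_vectors):
        raise ValueError("urls and vectors must have the same length")
    projector.image_list = image_list
    projector.data_vectors = data_vectors
    return projector


# Stand-in object; with the package installed this would be EmbeddingsProjector().
projector = attach_data(SimpleNamespace(), urls, vectors)
print(projector.image_list.shape, projector.data_vectors.shape)
```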

if cloned - you can use it from the cmd

root@yonti:~/github/Embeddings2Image$ python cmd.py -h
usage: cmd.py [-h] -d PATH2DATA [-n OUTPUT_NAME] [-t OUTPUT_TYPE]
              [-s OUTPUT_SIZE] [-i EACH_IMG_SIZE] [-c BG_COLOR] [--no-shuffle]
              [--no-sklearn] [--no-svd] [-b BATCH_SIZE]

Creating 2d images out of the embeddings ot the images

optional arguments:
  -h, --help            show this help message and exit
  -d PATH2DATA, --path2data PATH2DATA
                        Path to the hdf5 file   
  -n OUTPUT_NAME, --output_name OUTPUT_NAME
                        output image name. Default is tsne_scatter/grid.jpg
  -t OUTPUT_TYPE, --output_type OUTPUT_TYPE
                        the type of the output images (scatter/grid)
  -s OUTPUT_SIZE, --output_size OUTPUT_SIZE
                        output image size (default=2500)
  -i EACH_IMG_SIZE, --img_size EACH_IMG_SIZE
                        each image size (default=50)
  -c BG_COLOR, --background BG_COLOR
                        choose output background color (black/white)
  --no-shuffle          use this flag if you don't want to shuffle
  --method              chose which method to use for projection.
                        umap(default) / sklearn - for sklearn's tsne / matten
                        - for his implementation of tsne
  --no-svd              it is better to reduce the dimension of long dense
                        vectors to a size of 50 or smallerbefore computing the
                        tsne.use this flag if you don't want to do so
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        for speed/memory size errors consider using just a
                        portion of your data (default=all)

root@yonti:~/github/visualize-tsne$ python cmd.py -d /home/data/data.hdf5 -i 50 -s 4000 -n test 
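
The `--no-svd` help text refers to reducing long dense vectors to 50 dimensions or fewer before running t-SNE. The sketch below shows one way such a reduction can be done with a truncated SVD in plain numpy. It is an illustration of the idea only, not the package's own code; the data shape and the name `truncated_svd` are assumptions.

```python
import numpy as np


def truncated_svd(vectors, n_components=50):
    """Project the rows of `vectors` onto the top singular directions."""
    # Thin SVD: vectors = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(vectors, full_matrices=False)
    k = min(n_components, S.shape[0])
    # U[:, :k] * S[:k] equals vectors @ Vt[:k].T
    return U[:, :k] * S[:k], Vt[:k]


# Hypothetical data: 200 embeddings of dimension 512.
X = np.random.default_rng(0).normal(size=(200, 512))
reduced, components = truncated_svd(X, n_components=50)
print(reduced.shape)
```

The reduced array can then be passed to whichever projection method is in use. t-SNE's cost grows with the input dimension, which is presumably why the reduction is on by default.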

full usage options

# the following have both getter and setter
image.path2data # getter 
image.path2data = '/home/data/data.hdf5' # setter -> expects string and correct path to an hdf5 file

image.output_img_name  #  getter
image.output_img_name = 'be_creative'  # expects string. default is 'tsne'
                                       # don't add the file type - jpg is set automatically
                                       # also the image type(scatter/grid) is added automatically
image.output_img_type  #  getter
image.output_img_type = 'grid' # expects string. default is 'scatter'. set to 'grid' for a grid layout.

image.output_img_size  #  getter
image.output_img_size =  2500  # expects int. default is 2500. 
                               # all output images are square, so this means a 2500x2500 img.

image.each_img_size    #  getter
image.each_img_size =  50      # expects int. default is 50. 
                               # the output looks better when constructed with squared images
                               # but can also handle rects
                               
image.image_list       #  getter
image.image_list = img_list    # expects numpy array of strings. 
                               # this is filled up automatically when load_data is called.
                               # set this explicitly only if you don't load your data from 
                               # an hdf5 file

image.data_vectors      #  getter
image.data_vectors = data       # expects numpy ndarray of dense vectors. 
                               # this is filled up automatically when load_data is called.
                               # set this explicitly only if you don't load your data from 
                               # an hdf5 file

image.batch_size       #  getter
image.batch_size =  5000       # expects int. default is 0 which means that all images are taken
                               # use this when you have memory issues. 
                               # it will shuffle your data and take only a subset in order to 
                               # compute the tsne. 

image.method       #  getter
image.method =  'maaten'       # expects string. default is 'umap',
                               # which is both efficient in time and, to my naked eye, separates the clusters better.
                               # the other options are 'sklearn' and 'maaten':
                               # sklearn's tsne vs the python version
                               # of Maaten's tsne.
                               # i guess they both do the same but didn't fully check it 
                               # so i left it as an option

image.background_color         #  getter
image.background_color =  'white'  # expects string. default is 'black'. the other option is 'white'
                                        
image.tsne_vectors      #  getter
image.tsne_vectors = data       # expects numpy ndarray of dense 2d vectors. 
                               # this is filled up automatically when 
                               # image.calculate_tsne is called.
                               # set this explicitly only if you already have the tsne vectors

# the following are methods
image.load_data()  #  opens the hdf5 file that the configured path points to
                   #  fills image.data_vectors and image.image_list  
                   
image.calculate_tsne()  #  straightforward

image.create_image()  #  straightforward
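
The `batch_size` comments above describe shuffling the data and keeping only a subset. A minimal numpy sketch of that idea follows. The function name `take_random_subset` and the `seed` parameter are hypothetical, and the package may do this differently internally. The one thing that matters is that the urls and the vectors are indexed with the same permutation, so each image stays paired with its own vector.

```python
import numpy as np


def take_random_subset(image_list, data_vectors, batch_size, seed=None):
    """Return a random subset of size batch_size, keeping urls and vectors aligned."""
    if len(image_list) != len(data_vectors):
        raise ValueError("image_list and data_vectors must have the same length")
    n = len(image_list)
    # Mirror the documented convention: 0 means "use everything".
    if batch_size <= 0 or batch_size >= n:
        return image_list, data_vectors
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)[:batch_size]
    # The same index array is applied to both arrays.
    return image_list[idx], data_vectors[idx]


# Hypothetical data: vector i is filled with the value i, so pairing is easy to check.
image_list = np.array([f"img_{i}.jpg" for i in range(10)])
data_vectors = np.arange(10, dtype=np.float32)[:, None] * np.ones((1, 4), dtype=np.float32)

sub_urls, sub_vectors = take_random_subset(image_list, data_vectors, batch_size=4, seed=0)
print(sub_urls, sub_vectors[:, 0])
```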

embeddings2image's People

Contributors

dependabot[bot], nivha, yontilevin

embeddings2image's Issues

End to end example

Great project! Adding an end to end example would be very helpful.
(for example, the mnist example, but adding how the data was created and so on)

(also, there are hard-coded paths in the code, such as in modules.py line 165)

Image "fullness" control

When embedding my images I am left with a huge canvas with sparse tiny images. Changing the image size parameter does not increase the density of images, as the final resolution increases as well. Do you know which parameters control how "dense" the output canvas is? I would like to have large images relative to the total canvas size. Thanks!

row, column index error

Hi
Thank you for your work. This saved me a lot of time!

I ran into the same problem as one of the previous issues. The shape of small_img doesn't match the row, column indexing.

File "./visualize-tsne/modules.py", line 267, in _scatter
image[x0 + dx:x0 + dx + x1, y0 + dy:y0 + dy + y1] = small_img
ValueError: could not broadcast input array from shape (100,75,3) into shape (75,100,3)

I found that the indexing has to be changed like this:
image[x0 + dx:x0 + dx + x1, y0 + dy:y0 + dy + y1] = small_img
--> image[ y0 + dy:y0 + dy + y1, x0 + dx:x0 + dx + x1] = small_img

If you could update the code, it would be great!
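
For context on why the index order matters: a numpy image array has shape (height, width, channels), so the first slice selects rows (y) and the second selects columns (x). The sketch below pastes a non-square thumbnail onto a canvas using that order. The `paste` helper, the canvas size, and the offsets are made up for illustration and are not the code in modules.py.

```python
import numpy as np

# Canvas of height 300 and width 400; thumbnail of height 100 and width 75.
canvas = np.zeros((300, 400, 3), dtype=np.uint8)
small_img = np.full((100, 75, 3), 255, dtype=np.uint8)


def paste(canvas, small_img, x0, y0):
    """Paste small_img with its top-left corner at column x0, row y0."""
    h, w = small_img.shape[:2]
    # Rows (y) come first, columns (x) second.
    canvas[y0:y0 + h, x0:x0 + w] = small_img
    return canvas


paste(canvas, small_img, x0=200, y0=50)
print(canvas[50:150, 200:275].shape)
```

Swapping the two slices would select a 75x100 region for a 100x75 thumbnail, which is the kind of mismatch that produces the broadcast error quoted above. With square thumbnails the two shapes coincide, so the problem would not show up.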

Row/column indexing error?

When running this code, I get the following error:

 File "./visualize-tsne/modules.py", line 267, in _scatter
    image[x0 + dx:x0 + dx + x1, y0 + dy:y0 + dy + y1] = small_img
ValueError: could not broadcast input array from shape (80,65,3) into shape (65,80,3)

This can be fixed by adding the following line:

if np.max(image[x0 + dx:x0 + dx + x1, y0 + dy:y0 + dy + y1]) > 0:
  continue
--> small_img = small_img.transpose(1,0,2)
image[x0 + dx:x0 + dx + x1, y0 + dy:y0 + dy + y1] = small_img

But then the images are all rotated 90 degrees.

Simply switching the indices leads to some other issues.
