Wordmap

Visualize large collections of text data with WebGL

App preview

Installation

pip install wordmap

Basic Usage

To create a visualization from a directory of text files, you can call wordmap as follows:

wordmap --texts "data/*.txt"

This command writes a visualization to ./web, which you can view by starting a local web server:

# python 2
python -m SimpleHTTPServer 7090

# python 3
python -m http.server 7090

After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
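If you would rather start the server from a Python script than from the shell, the following is a minimal sketch using only the standard library (Python 3.7+). The function name start_server is hypothetical and is not part of wordmap; the port 7090 matches the commands above.

```python
import functools
import http.server
import threading


def start_server(directory=".", port=7090):
    """Serve `directory` over HTTP on localhost in a background thread.

    `start_server` is an illustrative helper, not a wordmap function.
    """
    # Bind the handler to the directory we want to expose.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the request loop in a daemon thread so the caller is not blocked.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling start_server(".", 7090) from the directory that contains web/ should make the visualization reachable at http://localhost:7090/web/ for as long as the calling script keeps running. Call server.shutdown() on the returned object to stop it.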

Command Line Arguments

The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:

--texts A glob of files to process

--encoding The encoding of input files

--max_n The maximum number of words/docs to include in the visualization

--layouts The layouts to render {umap, tsne, grid, img, obj}

--obj_file An .obj file that should be used to create the obj layout

--img_file A .png or .jpg file that should be used to create the img layout

--n_components The number of dimensions to use when creating the layouts

--tsne_perplexity The perplexity value to use when creating TSNE layout

--umap_n_neighbors The n_neighbors value to use when creating UMAP layout

--umap_min_distance The min_distance value to use when creating the UMAP layout

--model_type The model type to use {word2vec}

--use_cache Boolean that, if True, will load saved layouts from models

--model_name The name to use when saving a model to disk

--model A persisted model to use to create layouts

--size The number of dimensions to include in Word2Vec vectors

--window The number of words to include in windows when creating a Word2Vec model

--iter The maximum number of iterations to run the created model

--min_count The minimum occurrences of each word to be included in the Word2Vec model

--workers The number of computer cores to use when processing input data

--verbose If true, logs progress during layout construction

Examples:

Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:

wordmap --texts "data/*.txt" \
  --layouts umap tsne grid

Create a wordmap using a saved Word2Vec model with 3 dimensions and a maximum of 10000 words:

wordmap --model "1563222036.model" \
  --n_components 3 \
  --max_n 10000

Create a wordmap with several layouts, each with multiple parameter steps:

python wordmap/wordmap.py \
  --texts "data/philosophical_transactions/*.txt" \
  --layouts tsne umap grid \
  --tsne_perplexity 5 25 100 \
  --umap_n_neighbors 2 20 200 \
  --umap_min_dist 0.01 0.1 1.0 \
  --n_clusters 10 25 \
  --iter 100
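To get a feel for how many parameter settings a command like the one above describes, here is a small sketch that enumerates them. It assumes that the UMAP values are combined as a full grid (every n_neighbors value paired with every min_dist value); the source does not confirm how wordmap combines them, so treat the grid as an assumption. The function name umap_settings is hypothetical.

```python
from itertools import product

# Values copied from the example command above.
tsne_perplexity = [5, 25, 100]
umap_n_neighbors = [2, 20, 200]
umap_min_dist = [0.01, 0.1, 1.0]


def umap_settings(n_neighbors_values, min_dist_values):
    """Return every (n_neighbors, min_dist) pair.

    Assumption: the two value lists are combined as a full grid.
    """
    return list(product(n_neighbors_values, min_dist_values))


settings = umap_settings(umap_n_neighbors, umap_min_dist)
print(len(settings), "UMAP settings;", len(tsne_perplexity), "TSNE settings")
```

Under that assumption, each extra value added to a flag multiplies the number of UMAP settings, so long value lists can increase processing time quickly.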

wordmap's People

Contributors

duhaime, idroz, pleonard212

wordmap's Issues

Support NMF model

It'd be nice to support NMF as a model_type, then render only points, where each point represents a document (or subregion of a document if a user specified a window_length argument or similar). By hovering on a point we could fetch the text from that passage:

preview

More generally it'd be nice to create a uniform vector space in which, say, a text region describing cats would hash to the same position as an image of cats. Then the mouseover could allow one to explore a vector space with heterogeneous object types...
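For the window_length idea mentioned above, one possible way to split a document into subregions is sketched below. This is only an illustration of the proposal: split_into_windows is a hypothetical helper, and it assumes windows are measured in whitespace-separated words and do not overlap.

```python
def split_into_windows(text, window_length):
    """Split text into consecutive chunks of at most window_length words.

    Hypothetical helper: assumes non-overlapping, word-based windows.
    """
    if window_length < 1:
        raise ValueError("window_length must be a positive integer")
    words = text.split()
    return [
        " ".join(words[i:i + window_length])
        for i in range(0, len(words), window_length)
    ]


# Each passage would become one point in the proposed layout.
passages = split_into_windows("the cat sat on the mat and purred", 3)
print(passages)
```

Each returned passage could then be vectorized and plotted as its own point, with the passage text shown on hover.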

Add Loaders

There are XHR hooks that give loading progress callbacks. We should dip into those to show a loader so users know that new data is on the way...

Complaints about dependencies

I'm not sure if this is related to how the module is registered in PyPI or if I'm doing something wrong, but when I do pip install wordmap in a clean Python3 environment and then run the local version of wordmap, I get a ModuleNotFoundError: No module named 'vertices'.

After installing vertices manually, I then get a ModuleNotFoundError: No module named 'tensorflow' -- perhaps due to the use of EpochLogger in the code?

Installing tensorflow manually then leads to an ImportError: cannot import name 'imread' from 'scipy.misc', because the tensorflow install has overwritten the version of scipy specified for the module (1.1.0).

Forcing scipy back to 1.1.0 finally allows wordmap to run, but probably there's a pretty simple fix to avoid this runaround?
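A quick way to see which of the modules named in these errors are missing from an environment is sketched below. It uses only the standard library; missing_modules is a hypothetical helper, and the module list simply repeats the names from the error messages above. It checks only whether a module can be found, not whether its version is compatible.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the error messages in this report.
required = ["vertices", "tensorflow", "scipy"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules found")
```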
