Wordmap

Visualize large collections of text data with WebGL

Installation

pip install wordmap

Basic Usage

To create a visualization from a directory of text files, you can call wordmap as follows:

wordmap --texts "data/*.txt"

This command writes the visualization to ./web. To view it, start a local web server from the directory that contains ./web:

# python 2
python -m SimpleHTTPServer 7090

# python 3
python -m http.server 7090

After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
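The same server can also be started from a Python script. The following is a minimal sketch using only the standard library (Python 3.7+); the helper name make_server is hypothetical and is not part of wordmap.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=7090):
    """Return an HTTP server that serves files from `directory` on localhost."""
    # Bind the handler to a directory so the server does not depend on the
    # current working directory of the script.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_server(".", 7090).serve_forever()
```

Serving the directory that contains ./web keeps the URL above (http://localhost:7090/web/) unchanged.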

Command Line Arguments

The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:

--texts A glob of files to process

--encoding The encoding of input files

--max_n The maximum number of words/docs to include in the visualization

--layouts The layouts to render {umap, tsne, grid, img, obj}

--obj_file An .obj file that should be used to create the obj layout

--img_file A .png or .jpg file that should be used to create the img layout

--n_components The number of dimensions to use when creating the layouts

--tsne_perplexity The perplexity value to use when creating the TSNE layout

--umap_n_neighbors The n_neighbors value to use when creating the UMAP layout

--umap_min_distance The min_distance value to use when creating the UMAP layout

--model_type The model type to use {word2vec}

--use_cache Boolean that, if True, will load saved layouts from models

--model_name The name to use when saving a model to disk

--model A persisted model to use to create layouts

--size The number of dimensions to include in Word2Vec vectors

--window The number of words to include in windows when creating a Word2Vec model

--iter The maximum number of iterations to run the created model

--min_count The minimum number of occurrences a word needs to be included in the Word2Vec model

--workers The number of computer cores to use when processing input data

--verbose If true, logs progress during layout construction

Examples:

Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:

wordmap --texts "data/*.txt" \
  --layouts umap tsne grid
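Before running a command like the one above, it can help to check which files the quoted glob matches. The sketch below uses Python's standard glob module; that wordmap expands the pattern the same way is an assumption, and the helper name preview_texts is hypothetical.

```python
import glob


def preview_texts(pattern):
    """Return the sorted list of file paths that `pattern` matches."""
    # Assumption: wordmap resolves --texts with comparable glob semantics.
    return sorted(glob.glob(pattern))


# Example: list the first few matches for the pattern used above.
for path in preview_texts("data/*.txt")[:5]:
    print(path)
```

An empty result usually means the pattern is relative to a different working directory than the one wordmap is run from.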

Create a wordmap using a saved Word2Vec model with 3 dimensions and a maximum of 10000 words:

wordmap --model "1563222036.model" \
  --n_components 3 \
  --max_n 10000

Create a wordmap with several layouts, each with multiple parameter steps:

python wordmap/wordmap.py \
  --texts "data/philosophical_transactions/*.txt" \
  --layouts tsne umap grid \
  --tsne_perplexity 5 25 100 \
  --umap_n_neighbors 2 20 200 \
  --umap_min_dist 0.01 0.1 1.0 \
  --n_clusters 10 25 \
  --iter 100
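Commands with many multi-valued flags can be easier to assemble from a script. The following is a minimal sketch that builds an argument list using only flags listed in this README; the helper build_wordmap_command is hypothetical and is not part of wordmap.

```python
import shlex


def build_wordmap_command(texts, layouts=(), **options):
    """Build an argv list for the wordmap CLI.

    Each keyword option becomes `--name value [value ...]`.
    """
    cmd = ["wordmap", "--texts", texts]
    if layouts:
        cmd += ["--layouts", *layouts]
    for name, values in options.items():
        # Accept either a single value or a list of parameter steps.
        if not isinstance(values, (list, tuple)):
            values = [values]
        cmd += [f"--{name}", *map(str, values)]
    return cmd


cmd = build_wordmap_command(
    "data/*.txt",
    layouts=["tsne", "umap"],
    tsne_perplexity=[5, 25, 100],
    umap_n_neighbors=[2, 20, 200],
)
# Print a shell-quoted version of the command (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) runs it without a shell, so the glob reaches wordmap unexpanded, just as it does when quoted on the command line.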

Complaints about dependencies

I'm not sure if this is related to how the module is registered in PyPI or if I'm doing something wrong, but when I do pip install wordmap in a clean Python 3 environment and then run the local version of wordmap, I get a ModuleNotFoundError: No module named 'vertices'.

After installing vertices manually, I then get a ModuleNotFoundError: No module named 'tensorflow' -- perhaps due to the use of EpochLogger in the code?

Installing tensorflow manually then leads to an ImportError: cannot import name 'imread' from 'scipy.misc' error, because installing tensorflow replaces the version of scipy specified for the module (1.1.0) with a newer one.

Forcing scipy back to 1.1.0 finally allows wordmap to run, but probably there's a pretty simple fix to avoid this runaround?
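A quick way to see which of the packages named in this report are missing from an environment is a check like the one below. It is a minimal sketch using only the standard library (Python 3.8+); the helper names are hypothetical, and the list of modules is taken from the errors quoted above, not from wordmap's actual requirements.

```python
import importlib.util
from importlib import metadata


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("missing:", missing_modules(["vertices", "tensorflow", "scipy"]))
print("scipy version:", installed_version("scipy"))
```

If the reported scipy version is newer than 1.1.0, the imread import error described above is the likely result.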

yaledhlab / wordmap
