Pretty and opinionated topic model visualization in Python.
- Exclude pages that are not needed
- Self-contained interactive figures
- Topic name inference is now default behavior and is done implicitly.
- Investigate complex relations between topics, words and documents
- Highly interactive
- Automatically infer topic names
- Name topics manually
- Pretty
- Intuitive
- Clean API
- Sklearn, Gensim and BERTopic compatible
- Easy deployment
Install from PyPI:
pip install topic-wizard
Usage (documentation)
Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
# Create topic pipeline
topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=10),
)
# Then fit it on the given texts
topic_pipeline.fit(texts)
Visualize with the topicwizard web app:
import topicwizard
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts)
From version 0.3.0 you can also disable pages you do not wish to display, which can save a lot of preprocessing time:
import topicwizard
# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling the documents page altogether.
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts, exclude_pages=["documents"])
Ooooor...
Produce high-quality, self-contained HTML plots and create your own dashboards/reports:
from topicwizard.figures import word_map
word_map(corpus=texts, pipeline=topic_pipeline)
from topicwizard.figures import document_topic_timeline
document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=topic_pipeline,
)
from topicwizard.figures import topic_wordclouds
topic_wordclouds(corpus=texts, pipeline=topic_pipeline)