topicwizard


Pretty and opinionated topic model visualization in Python.

New in version 0.3.0 🌟 🌟

  • Exclude pages that are not needed
  • Self-contained interactive figures
  • Topic name inference is now default behavior and is done implicitly.

Features

  • Investigate complex relations between topics, words and documents
  • Highly interactive
  • Automatically infer topic names
  • Name topics manually
  • Pretty 🎨
  • Intuitive
  • Clean API
  • Sklearn, Gensim and BERTopic compatible
  • Easy deployment

Installation

Install from PyPI:

pip install topic-wizard

Step 1:

Train a scikit-learn compatible topic model. (If you want to use non-scikit-learn topic models, check compatibility)

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Create topic pipeline
topic_pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=10),
)

# Then fit it on the given texts
topic_pipeline.fit(texts)

Step 2a:

Visualize with the topicwizard webapp 💡

import topicwizard

topicwizard.visualize(pipeline=topic_pipeline, corpus=texts)

From version 0.3.0 you can also exclude pages you do not want to display, which can save a lot of preprocessing time:

import topicwizard

# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling the documents page altogether.
topicwizard.visualize(pipeline=topic_pipeline, corpus=texts, exclude_pages=["documents"])
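If the same script runs on corpora of very different sizes, the decision to skip the documents page can be automated. The helper below is a hypothetical sketch and not part of topicwizard; the threshold is an arbitrary assumption, and only the `"documents"` page name shown above is used.

```python
from typing import List

# Hypothetical cut-off; tune it to your hardware and patience.
LARGE_CORPUS_THRESHOLD = 50_000


def pages_to_exclude(
    n_documents: int, threshold: int = LARGE_CORPUS_THRESHOLD
) -> List[str]:
    """Return the page names to skip for a corpus of the given size."""
    # Skip the documents page only when the corpus exceeds the threshold.
    if n_documents > threshold:
        return ["documents"]
    return []


print(pages_to_exclude(1_000))    # []
print(pages_to_exclude(200_000))  # ['documents']
```

The result could then be passed as `exclude_pages=pages_to_exclude(len(texts))`, assuming `visualize` accepts an empty list for that parameter.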

Ooooor...

Step 2b:

Produce high-quality self-contained HTML plots and create your own dashboards/reports

Map of words

from topicwizard.figures import word_map

word_map(corpus=texts, pipeline=topic_pipeline)

Timelines of topic distributions

from topicwizard.figures import document_topic_timeline

document_topic_timeline(
    "Joe Biden takes over presidential office from Donald Trump.",
    pipeline=topic_pipeline,
)

Wordclouds of your topics ☁️

from topicwizard.figures import topic_wordclouds

topic_wordclouds(corpus=texts, pipeline=topic_pipeline)
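To build a report from several figures, it can help to write each one to its own HTML file. The sketch below assumes that the figure functions return Plotly figures, which expose a `write_html` method; this is an assumption, not something stated above, so check the documentation. `save_figures` is a hypothetical helper, not part of topicwizard.

```python
from pathlib import Path
from typing import Any, Dict, List


def save_figures(figures: Dict[str, Any], out_dir: str) -> List[Path]:
    """Write each figure to <out_dir>/<name>.html and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = target / f"{name}.html"
        # Assumption: fig is a Plotly figure, so write_html() is available.
        fig.write_html(str(path))
        paths.append(path)
    return paths
```

A call might look like `save_figures({"word_map": word_map(corpus=texts, pipeline=topic_pipeline)}, "report")`, which would produce `report/word_map.html` under the assumption above.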

And much more; see the documentation.

Contributors

x-tabdeveloping, kitchentable99
