priyanmuthu / lloom

This project forked from michelle123lam/lloom

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-level concepts to analyze unstructured text.

Home Page: https://stanfordhci.github.io/lloom

License: BSD 3-Clause "New" or "Revised" License

JavaScript 0.81% Python 56.18% Svelte 43.01%

LLooM

LLooM is an interactive text analysis tool introduced as part of an ACM CHI 2024 paper:

Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM. Michelle S. Lam, Janice Teoh, James Landay, Jeffrey Heer, Michael S. Bernstein. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24).

[Figure: LLooM splash figure]

What is LLooM?

LLooM is an interactive data analysis tool for unstructured text data, such as social media posts, paper abstracts, and articles. Manual text analysis is laborious and challenging to scale to large datasets, and automated approaches like topic modeling and clustering tend to focus on lower-level keywords that can be difficult for analysts to interpret.

By contrast, the LLooM algorithm turns unstructured text into meaningful high-level concepts that are defined by explicit inclusion criteria in natural language. For example, on a dataset of toxic online comments, while a BERTopic model outputs "women, power, female", LLooM produces concepts such as "Criticism of gender roles" and "Dismissal of women's concerns". We call this process concept induction: a computational process that produces high-level concepts from unstructured text.

The LLooM Workbench is an interactive text analysis tool that visualizes data in terms of the concepts that LLooM surfaces. With the LLooM Workbench, data analysts can inspect the automatically-generated concepts and author their own custom concepts to explore the data.

What can I do with LLooM?

LLooM can assist with a range of data analysis goals, from preliminary exploratory analysis to theory-driven confirmatory analysis. Analysts can review LLooM concepts to interpret emergent trends in the data, but they can also author concepts to actively seek out certain phenomena in the data. Concepts can be compared with existing metadata or other concepts to perform statistical analyses, generate plots, or train a model.
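
As a rough illustration of comparing a concept with existing metadata, the sketch below computes how prevalent one concept is within each value of a metadata column. The rows, the `platform` column, the score values, and the 0.75 threshold are all made up for illustration; this is not the text_lloom API or its output format.

```python
from collections import defaultdict

# Hypothetical per-document results: one metadata value plus a concept score in [0, 1].
rows = [
    {"platform": "forum", "score": 0.9},
    {"platform": "forum", "score": 0.2},
    {"platform": "news", "score": 0.8},
    {"platform": "news", "score": 1.0},
    {"platform": "news", "score": 0.1},
]


def prevalence_by_slice(rows, slice_key, threshold=0.75):
    """Return the fraction of documents in each slice whose score meets the threshold."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for row in rows:
        totals[row[slice_key]] += 1
        if row["score"] >= threshold:
            matches[row[slice_key]] += 1
    return {key: matches[key] / totals[key] for key in totals}


result = prevalence_by_slice(rows, "platform")
print(result)
```

The resulting per-slice fractions could then feed a plot or a statistical test of the analyst's choosing.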

[Figure: LLooM pull figure]

Example notebooks

Check out the Examples section to walk through case studies using LLooM.

Workbench visualization

[Figure: LLooM Workbench UI]

After running concept induction, the Workbench can display an interactive visualization like the one above. LLooM Workbench features include:

  • A: Concept Overview: Displays an overview of the dataset in terms of concepts and their prevalence.
  • B: Concept Matrix: Provides an interactive summary of the concepts. Users can click on concept rows to inspect concept details and associated examples. User-defined slice columns aid comparison between concepts and other metadata columns.
  • C: Detail View (for Concept or Slice):
    • C1: Concept Details: Includes concept information like the Name, Inclusion criteria, Number of doc matches, and Representative examples.
    • C2: Concept Matches and Non-Matches: Shows all input documents in table form. Includes the original text, bullet summaries, concept scores, highlighted text that exemplifies the concept, score rationale, and metadata columns.

How does LLooM work?

LLooM is a concept induction algorithm that extracts and applies concepts to make sense of unstructured text datasets. LLooM leverages large language models (specifically GPT-3.5 and GPT-4 in the current implementation) to synthesize sampled text spans, generate concepts defined by explicit criteria, apply concepts back to data, and iteratively generalize to higher-level concepts.
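
To make the idea of "concepts defined by explicit criteria" and "applying concepts back to data" more concrete, here is a minimal sketch. Everything in it is hypothetical: the `Concept` class, the `keyword_judge` helper, and the sample documents are illustrative stand-ins, and the keyword check replaces the language model call that LLooM actually relies on. It is not the text_lloom API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Concept:
    name: str
    criteria: str                      # inclusion criteria in natural language
    judge: Callable[[str, str], bool]  # (criteria, document) -> does the document match?


def keyword_judge(keywords: List[str]) -> Callable[[str, str], bool]:
    """Build a toy judge. A real system would ask a language model to apply the criteria."""
    def judge(criteria: str, doc: str) -> bool:
        text = doc.lower()
        return any(keyword in text for keyword in keywords)
    return judge


def apply_concepts(docs: List[str], concepts: List[Concept]) -> List[Dict[str, bool]]:
    """Score every document against every concept, giving one row per document."""
    return [{c.name: c.judge(c.criteria, doc) for c in concepts} for doc in docs]


docs = [
    "We ran a user study with 12 participants.",
    "We propose a new sorting algorithm.",
    "The weather was nice today.",
]
concepts = [
    Concept("User studies", "Does the text describe a study with human participants?",
            keyword_judge(["user study"])),
    Concept("New algorithms", "Does the text propose a new algorithm?",
            keyword_judge(["algorithm"])),
]

matrix = apply_concepts(docs, concepts)
for doc, row in zip(docs, matrix):
    print(doc, row)
```

Keeping the criteria as plain text next to each concept name is what lets an analyst read, question, and edit a concept, rather than treating it as an opaque cluster label.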

[Figure: LLooM splash figure]

Get Started

Follow the Get Started instructions in our documentation for a walkthrough of the main LLooM functions to run on your own dataset. We suggest starting with this template Colab Notebook.

This will involve downloading our Python package, available on PyPI as text_lloom. We recommend setting up a virtual environment with venv or conda.

pip install text_lloom
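
One way to confirm that the install landed in the active environment is to query the package metadata with the standard library. This is only a sketch: `installed_version` is a helper name introduced here, and it assumes Python 3.8 or newer, where `importlib.metadata` is available.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("text_lloom")
print(version if version else "text_lloom is not installed in this environment")
```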

Contact

LLooM is a research prototype and still under active development! Feel free to reach out to Michelle Lam at [email protected] if you have questions, run into issues, or want to contribute.

Citation

If you find this work useful, we'd appreciate you citing our paper!

@inproceedings{lam2024conceptInduction,
    author = {Lam, Michelle S. and Teoh, Janice and Landay, James and Heer, Jeffrey and Bernstein, Michael S.},
    title = {Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM},
    year = {2024},
    isbn = {9798400703300},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3613904.3642830},
    doi = {10.1145/3613904.3642830},
    booktitle = {Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems},
    articleno = {933},
    numpages = {28},
    location = {Honolulu, HI, USA},
    series = {CHI '24}
}

lloom's People

Contributors

michelle123lam, justinebreuch
