stjordanis / nlp_profiler

This project forked from neomatrix369/nlp_profiler

License: Other

Python 7.78% Jupyter Notebook 90.99% Shell 1.23%

nlp_profiler's Introduction

NLP Profiler

A simple NLP library that allows profiling datasets with one or more text columns.

When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

In short: think of it as calling describe() on a pandas DataFrame or running Pandas Profiling on your data frame, but for datasets containing text columns rather than the usual columnar datasets.

More detail: what you get from the library:

  • you pass in a pandas Series (a single text column of a DataFrame) as the input parameter
  • and you get back a new DataFrame with various features about the parsed text, one row per input row
    • high-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, etc.
    • low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.
  • from the numerical data in the resulting DataFrame, descriptive statistics can be drawn by calling describe() on it
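
The last point can be illustrated with plain pandas. The snippet below is only a sketch: sketch_text_features is a hypothetical stand-in written for this example, not NLP Profiler's implementation, and the column names characters_count and words_count are assumptions chosen for illustration. It computes two granular features per row and then summarises them with describe().

```python
import pandas as pd


def sketch_text_features(texts: pd.Series) -> pd.DataFrame:
    """Hypothetical stand-in for a text profiler (NOT nlp_profiler's code).

    Returns one row per input text with two low-level features.
    """
    texts = texts.fillna("").astype(str)
    return pd.DataFrame(
        {
            "text": texts,
            # Number of characters in each text
            "characters_count": texts.str.len(),
            # Number of whitespace-separated tokens in each text
            "words_count": texts.str.split().str.len(),
        }
    )


dataset = pd.DataFrame(
    {"text_column": ["I love this library", "Not bad", "Could be better, honestly"]}
)

features = sketch_text_features(dataset["text_column"])

# describe() summarises the numeric feature columns only by default
summary = features.describe()
print(summary)
```

Because describe() skips non-numeric columns by default, the original text column is left out of the summary and only the derived numeric features are aggregated.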

See screenshots under the Jupyter section and also under Screenshots for further illustrations.

Under the hood it makes use of a number of libraries that are popular in the AI and ML communities, but its functionality can be extended by replacing or adding other libraries as well.

A simple notebook has been provided to illustrate the usage of the library.

Note: this is a new endeavour and it's probably NOT capable of doing many things yet, including running at scale. Many of these gaps are opportunities we can work on and plug, as we go along using it.

Requirements

  • Python 3.7.x or higher
  • Dependencies described in the requirements.txt
  • (Optional)
    • Jupyter Lab (on your local machine)
    • Google Colab account
    • Kaggle account

Get started

Demo

Take a look at this short demo of the NLP Profiler library by clicking on the image below (Demo of the NLP Profiler library), or find the rest of the talk here.

Installation

Install directly from the GitHub repo:

pip install git+https://github.com/neomatrix369/nlp_profiler.git@master

Usage

import nlp_profiler.core as nlpprof

new_text_column_dataset = nlpprof.apply_text_profiling(dataset['text_column'])

or

from nlp_profiler.core import apply_text_profiling

new_text_column_dataset = apply_text_profiling(dataset['text_column'])

See the Notebooks section for further illustrations.

Notebooks

Jupyter

See Jupyter Notebook

Google Colab

You can open these notebooks directly in Google Colab

Kaggle kernels

Notebook/Kernel | Script | Other related links

Screenshots

Importing the library


Pandas describe() function

Contributing

Contributions are very welcome. Please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and also read about our licensing (and warranty) policy.

