jon-chun / nlp-in-practice

This project forked from kavgan/nlp-in-practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Home Page: http://kavita-ganesan.com/kavitas-tutorials/#.WvIizNMvyog

Jupyter Notebook 99.28% Python 0.72%

NLP-IN-PRACTICE

Use these NLP, Text Mining and Machine Learning code samples and tools to solve real world text data problems.

Notebooks / Source

Links in the first column take you to the subfolder/repository with the source code.

| Task | Related Article | Source Type | Description |
| --- | --- | --- | --- |
| Large Scale Phrase Extraction | phrase2vec article | python script | Extract phrases from large amounts of data using PySpark. Annotate text using these phrases or use the phrases for other downstream tasks. |
| Word Cloud for Jupyter Notebook and Python Web Apps | word_cloud article | python script + notebook | Visualize top keywords using word counts or TF-IDF. |
| Gensim Word2Vec (with dataset) | word2vec article | notebook | How to work correctly with Word2Vec to get the desired results. |
| Reading files and word count with Spark | spark article | python script | How to read files of different formats using PySpark, with a word count example. |
| Extracting Keywords with TF-IDF and SKLearn (with dataset) | tfidf article | notebook | How to extract interesting keywords from text using TF-IDF and Python's scikit-learn. |
| Text Preprocessing | text preprocessing article | notebook | A few code snippets on how to perform text preprocessing. Includes stemming, noise removal, lemmatization and stop word removal. |
| TFIDFTransformer vs. TFIDFVectorizer | tfidftransformer and tfidfvectorizer usage article | notebook | How to use TFIDFTransformer and TFIDFVectorizer correctly, the difference between the two, and when to use which. |
| Accessing Pre-trained Word Embeddings with Gensim | Pre-trained word embeddings article | notebook | How to access pre-trained GloVe and Word2Vec embeddings using Gensim, with an example of how these embeddings can be used for text similarity. |
| Text Classification in Python (with news dataset) | Text classification with Logistic Regression article | notebook | Get started with text classification. Learn how to build and evaluate a text classifier for news classification using Logistic Regression. |
| CountVectorizer Usage Examples | How to Correctly Use CountVectorizer? An In-Depth Look article | notebook | Learn how to get more out of CountVectorizer: not just computing word counts, but also preprocessing your text data appropriately and extracting additional features from your text dataset. |
| HashingVectorizer Examples | HashingVectorizer Vs. CountVectorizer article | notebook | Learn the differences between HashingVectorizer and CountVectorizer and when to use which. |
| CBOW vs. SkipGram | Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI article | notebook | A quick comparison of the three embedding architectures. |
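
As a rough illustration of the TF-IDF keyword extraction task listed above, here is a minimal, dependency-free sketch. It is not the code from the tfidf notebook, which uses scikit-learn. The function names `tokenize` and `tfidf_keywords` and the sample documents are hypothetical, made up for this example. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is the form scikit-learn documents for its default settings; no normalization or stop word removal is applied here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k (term, score) pairs for docs[doc_index].

    score = raw term count in the document * smoothed IDF over all docs.
    This is an illustrative sketch, not the repository's implementation.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the chosen document.
    tf = Counter(tokenized[doc_index])

    scores = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical sample documents.
docs = [
    "spark reads files and spark counts words",
    "gensim trains word vectors",
    "files of words and vectors",
]

for term, score in tfidf_keywords(docs, doc_index=0):
    print(f"{term}: {score:.3f}")
```

In the first sample document, "spark" ranks highest because it occurs twice there and in no other document, while terms shared with other documents ("files", "and", "words") are down-weighted by the IDF factor.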

Notes

Contact

This repository is maintained by Kavita Ganesan. Connect with me on LinkedIn or Twitter.

Contributors

kavgan, brusic
