pwharrison / modern-nlp-in-python



Jupyter Notebook 100.00%

modern-nlp-in-python's People

Stargazers

Watchers

Forkers

vav1288 juliegkim1 mandeepbal amit-dingare navinshah vijayendra-g atyamsriharsha jeleandro 2legit l06102128 lalithakishore pgnepal bruso randy3465 shaneshifflett earlynr ipsolar zouzias fehiepsi ynalcakan dataclip kmrvijay maxsop cbgk46 little1tow allensmile zhanglae benjamesbabala zhangruiskyline mllog andrewjsiu mukeshjaiswal44 dkuang1980 cristinaandronescu ocarneiro kpdir cswanghao davidbradway th93ce stchau4work m7catsue chaitanyacixlive avi990 johnkabler chao-shi-git richmcaleavey yeyaum akshayjh fichel henghuiz-zz overfitter rsarthakshekhar rtao isaac34mi johnshushu gp2454 chatkausik margaretnym adaj d0tn3t chetankhatri narulkargunjan rachelmadler diegoami pawanpatil94 nathandwalker tomyc toclim ingokl seanreed1111 gdpan919 pramodkumar8 coneeleven ak-py herdingbats ersiu kkonz minas1900 grumpylittleted alvijohn hehuan0430puphy perezv72 jayteesf anki1909 akg003 kolliparap ryanmetz hardikgw maryamnajafian jjediny carlosandres12 anthonyyeo dataist2019 jagdeepsingh28 luxiaolingfei ronakshah92 magicwanda pandagod cwrather alexjmsherman

modern-nlp-in-python's Issues

Python 3.7 Migration

Suggested modifications for modern-nlp-in-python/executable/Modern_NLP_in_Python.ipynb:

# Change the sort key to index the third term of each tuple rather than using a multi-parameter lambda (tuple-parameter unpacking in lambdas is not allowed in Python 3):
ordered_vocab = sorted(ordered_vocab, key=lambda x: -x[2])

# Change 'w' to 'wb' when writing the binary pickle file:
with open(tsne_filepath, 'wb') as f:
    pickle.dump(tsne_vectors, f)

# Pass 'rb' explicitly rather than relying on the default (text) mode when reading:
with open(tsne_filepath, 'rb') as f:
    tsne_vectors = pickle.load(f)
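A minimal, self-contained illustration of why these changes are needed under Python 3. The vocabulary tuples and the `tsne_vectors` list below are made-up stand-ins, not the notebook's real objects. Python 3 rejects Python 2 style tuple-unpacking lambdas such as `lambda (term, index, count): -count` (PEP 3113), and `pickle` writes bytes, so the file has to be opened in binary mode.

```python
import os
import pickle
import tempfile

# Hypothetical (term, index, count) tuples standing in for the vocabulary.
ordered_vocab = [("the", 0, 120), ("pizza", 1, 45), ("sushi", 2, 60)]

# Python 2 allowed lambda (term, index, count): -count; Python 3 does not,
# so index the third element of each tuple instead (highest count first).
ordered_vocab = sorted(ordered_vocab, key=lambda x: -x[2])

# Hypothetical stand-in for the t-SNE output, saved to a temporary file.
tsne_vectors = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]]
tsne_filepath = os.path.join(tempfile.mkdtemp(), "tsne_vectors.pkl")

# pickle.dump writes bytes, so open the file with 'wb' ...
with open(tsne_filepath, "wb") as f:
    pickle.dump(tsne_vectors, f)

# ... and read it back with 'rb'.
with open(tsne_filepath, "rb") as f:
    restored = pickle.load(f)

# Opening in text mode ('w') fails under Python 3 with a TypeError,
# because a text file only accepts str, not bytes.
try:
    with open(tsne_filepath, "w") as f:
        pickle.dump(tsne_vectors, f)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc
```

`sorted(ordered_vocab, key=operator.itemgetter(2), reverse=True)` is an equivalent alternative to the negated lambda.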


# --------------------------------------
# Bokeh unfortunately does not render when I open the notebook, so the visualization does not appear.

# Tested with my own data (rather than the Yelp data); this code may help someone get started:
%matplotlib inline
import matplotlib.pyplot as plt

# tsne_vectors: a DataFrame indexed by term, with 'x_coord' and 'y_coord' columns
ax = tsne_vectors.plot.scatter(x='x_coord', y='y_coord', figsize=(14, 14))

# for demo purposes, label only an arbitrary subset of points (the first 52 rows)
for cntr, (txt, row) in enumerate(tsne_vectors.iterrows()):
    # pass scalar coordinates to annotate rather than one-element Series
    ax.annotate(txt, (row.x_coord, row.y_coord))
    if cntr > 50:
        break
plt.show()

Did you use your own machine for the Yelp food reviews analysis?

Hi Patrick,

Apologies for asking here; I looked for another way to contact you with this question but could not find one.

Did you run all of the Yelp food review analyses on your own machine, without resorting to AWS or another cloud service? I need to analyze about 20 GiB of text data myself, and I would like to try Python if you were able to do this without running into RAM issues.

Thanks ever so much in advance.
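For text that does not fit in RAM, one common approach in Python is to stream the file line by line through a generator, so only one line is held in memory at a time. A minimal sketch, assuming one document per line and simple whitespace tokenization; both assumptions, and the names `stream_lines` and `count_tokens`, are hypothetical and not taken from the notebook.

```python
from collections import Counter


def stream_lines(path, encoding="utf-8"):
    """Yield one line at a time; the file is never read into memory whole."""
    with open(path, encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


def count_tokens(path):
    """Count whitespace-separated tokens across a file of any size.

    Memory use grows with the number of distinct tokens, not with file size.
    """
    counts = Counter()
    for line in stream_lines(path):
        counts.update(line.split())
    return counts
```

The same generator can feed any downstream step that accepts an iterable, instead of building a full list of documents first.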

Code giving error

Hi,

This is a great resource. I wanted to point out that some of the code, especially for the LDA analysis and word vector creation, no longer works. I am guessing this is due to an updated version of gensim. Some of the code also raises errors under Python 3.

# build a list of the terms, integer indices,
# and term counts from the food2vec model vocabulary
ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]

This is giving me an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-2c76ac03d027> in <module>()
      1 # build a list of the terms, integer indices,
      2 # and term counts from the food2vec model vocabulary
----> 3 ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]

AttributeError: 'Word2Vec' object has no attribute 'vocab'

I am trying to update the code to reproduce the original output.
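The `AttributeError` suggests the model was built with a newer gensim than the notebook targets. In newer releases the vocabulary is reached through `model.wv` rather than the model itself, and gensim 4.x replaced `wv.vocab` with `wv.key_to_index` plus per-key attributes read via `wv.get_vecattr(key, "count")`. A minimal sketch that tries the 4.x API first and falls back to the older `wv.vocab` dict; the helper name `vocab_with_counts` is hypothetical, and the checks are based on gensim's documented APIs rather than on the notebook.

```python
def vocab_with_counts(model):
    """Return (term, index, count) tuples from a gensim Word2Vec model."""
    wv = model.wv
    if hasattr(wv, "key_to_index"):
        # gensim 4.x: key_to_index maps term -> index; counts are vector attributes
        return [(term, index, wv.get_vecattr(term, "count"))
                for term, index in wv.key_to_index.items()]
    # older gensim: wv.vocab maps term -> Vocab object with .index and .count
    return [(term, voc.index, voc.count) for term, voc in wv.vocab.items()]
```

With the notebook's model this would replace the failing line as `ordered_vocab = vocab_with_counts(food2vec)`.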

Code Giving Error

After downloading the dataset from Kaggle, the following code raises an error:
import json

restaurant_ids = set()

# open the businesses file
with codecs.open(businesses_filepath, encoding='utf_8') as f:

    # iterate through each line (json record) in the file
    for business_json in f:

        # convert the json record to a Python dict
        business = json.loads(business_json)
        #print(business)
        # if this business is not a restaurant, skip to the next one
        if u'Restaurants' not in business[u'categories']:
            continue

        # add the restaurant business id to our restaurant_ids set
        restaurant_ids.add(business[u'business_id'])

# turn restaurant_ids into a frozenset, as we don't need to change it anymore
restaurant_ids = frozenset(restaurant_ids)

# print the number of unique restaurant ids in the dataset
print('{:,}'.format(len(restaurant_ids)), u'restaurants in the dataset.')

The error is:

TypeError                                 Traceback (most recent call last)
in <module>()
     13         #print(business)
     14         # if this business is not a restaurant, skip to the next one
---> 15         if u'Restaurants' not in business[u'categories']:
     16             continue
     17

TypeError: argument of type 'NoneType' is not iterable
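The traceback means `business[u'categories']` is `None` for at least one record. A likely cause (an assumption, since it depends on which release of the Yelp dataset Kaggle serves) is that newer releases store `categories` as a comma-separated string that may be `null`, whereas the notebook expects a list. A defensive sketch that accepts `None`, a string, or a list; the helper names `is_restaurant` and `restaurant_ids_from_lines` are hypothetical.

```python
import json


def is_restaurant(business):
    """Return True if the business record lists 'Restaurants' as a category."""
    categories = business.get("categories")
    if categories is None:
        # some records have no categories at all; treat them as non-restaurants
        return False
    if isinstance(categories, str):
        # e.g. "Pizza, Restaurants" -> ["Pizza", "Restaurants"]
        categories = [c.strip() for c in categories.split(",")]
    return "Restaurants" in categories


def restaurant_ids_from_lines(lines):
    """Collect business ids of restaurants from an iterable of JSON lines."""
    restaurant_ids = set()
    for business_json in lines:
        business = json.loads(business_json)
        if is_restaurant(business):
            restaurant_ids.add(business["business_id"])
    return frozenset(restaurant_ids)
```

Inside the existing `with codecs.open(businesses_filepath, encoding='utf_8') as f:` block, the loop could then be replaced by `restaurant_ids = restaurant_ids_from_lines(f)`.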

Thanks,
Maniceet Sahay

Using the modern-nlp notebook at spacy-notebooks

Hello! I've been collecting spaCy tutorials as Jupyter notebooks, and I came across your repo in the tutorial section. The notebook was very useful, and I wanted to ask whether it's OK to add it to the spacy-notebooks repo.

It'll be credited (and linked) appropriately, obviously.

Thanks again for the notebook and your help!
