modern-nlp-in-python's Issues

Python 3.7 Migration

Suggested modifications for modern-nlp-in-python/executable/Modern_NLP_in_Python.ipynb:

# Change the sort key to index the third tuple element rather than using a tuple-unpacking lambda function:
ordered_vocab = sorted(ordered_vocab, key=lambda x: -x[2])

# Change 'w' to 'wb' when writing binary pickle file:
with open(tsne_filepath, 'wb') as f:

# Add 'rb' as argument rather than default argument:
with open(tsne_filepath, 'rb') as f:
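
The three changes above can be tried together in a minimal, self-contained sketch. The sample vocabulary and the temporary file path are made up for illustration and are not taken from the notebook. Python 3 no longer accepts tuple parameters in a lambda (PEP 3113), and pickle reads and writes bytes, so the file has to be opened in binary mode in both directions.

```python
import os
import pickle
import tempfile

# Hypothetical (term, index, count) tuples standing in for the notebook's vocabulary.
vocab = [("taco", 0, 10), ("pizza", 1, 25), ("salsa", 2, 3)]

# Python 2 allowed `lambda (term, index, count): -count`; Python 3 does not,
# so index into the tuple instead. Sorting on -count gives descending counts.
ordered_vocab = sorted(vocab, key=lambda x: -x[2])

# Hypothetical path; the notebook uses its own tsne_filepath.
demo_filepath = os.path.join(tempfile.mkdtemp(), "demo.pkl")

# pickle produces bytes, so write with 'wb' ...
with open(demo_filepath, "wb") as f:
    pickle.dump(ordered_vocab, f)

# ... and read with 'rb'.
with open(demo_filepath, "rb") as f:
    restored = pickle.load(f)
```

Opening the file with plain 'w' or the default 'r' mode is what raises a TypeError or a decoding error under Python 3.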


# --------------------------------------
# Unfortunately, bokeh does not work when I open the notebook, so the visualization is not displayed.

# Tested with my own data (rather than the Yelp data); this code may help someone get started:
%matplotlib inline
import matplotlib.pyplot as plt

tsne_vectors.plot.scatter('x_coord', 'y_coord', figsize=(14,14))
ax = plt.gca()

# iterate over the rows so each label is placed at scalar x/y coordinates
for cntr, (txt, row) in enumerate(tsne_vectors.iterrows()):
    ax.annotate(txt, (row.x_coord, row.y_coord))
    if cntr > 50:
        # for demo purposes, label an arbitrary set of points
        break
plt.show()

Did you use your own machine for Yelp Food Reviews' Analysis?

Hi Patrick,

Apologies, I tried finding other means of approaching you and asking this question but could not find any.

Did you run all of the Yelp Food Reviews analyses on your own machine, without resorting to AWS or similar services? I need to analyze about 20 GiB of text data myself, and I would like to use Python if you managed it without running into RAM issues.

Thanks ever so much in advance.

Code giving error

Hi,

This is a great resource. I wanted to point out that some of the code, especially for the LDA analysis and word vector creation, is not working. I am guessing this is due to an updated version of gensim. Some of the code also raises errors under Python 3.

# build a list of the terms, integer indices,
# and term counts from the food2vec model vocabulary
ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]

This is giving me an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-2c76ac03d027> in <module>()
      1 # build a list of the terms, integer indices,
      2 # and term counts from the food2vec model vocabulary
----> 3 ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]

AttributeError: 'Word2Vec' object has no attribute 'vocab'

I am trying to update the code to reproduce the original output.
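
One possible way to build the same list across gensim versions is sketched below. It relies on the vocabulary having moved to the model's `wv` attribute in later gensim releases, and on gensim 4.0 replacing `vocab` with `key_to_index` and `get_vecattr`. The helper name `build_ordered_vocab` is made up for this sketch, and it has not been run against the notebook's own model.

```python
def build_ordered_vocab(model):
    """Return (term, index, count) tuples from a Word2Vec-like model.

    Hypothetical helper: it looks for whichever vocabulary attribute
    the installed gensim version provides.
    """
    # Newer gensim keeps the vocabulary on model.wv; very old versions
    # keep it on the model itself.
    keyed_vectors = getattr(model, "wv", model)

    if hasattr(keyed_vectors, "key_to_index"):
        # gensim >= 4.0: term -> index mapping, counts via get_vecattr
        return [
            (term, index, keyed_vectors.get_vecattr(term, "count"))
            for term, index in keyed_vectors.key_to_index.items()
        ]

    # older gensim: term -> object with .index and .count
    return [
        (term, voc.index, voc.count)
        for term, voc in keyed_vectors.vocab.items()
    ]
```

With the notebook's model this would be called as `ordered_vocab = build_ordered_vocab(food2vec)`.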

Code Giving Error

After getting the data set from Kaggle, the following lines of code are giving an error:

import json

restaurant_ids = set()

# open the businesses file
with codecs.open(businesses_filepath, encoding='utf_8') as f:

    # iterate through each line (json record) in the file
    for business_json in f:

        # convert the json record to a Python dict
        business = json.loads(business_json)
        #print(business)
        # if this business is not a restaurant, skip to the next one
        if u'Restaurants' not in business[u'categories']:
            continue

        # add the restaurant business id to our restaurant_ids set
        restaurant_ids.add(business[u'business_id'])

# turn restaurant_ids into a frozenset, as we don't need to change it anymore
restaurant_ids = frozenset(restaurant_ids)

# print the number of unique restaurant ids in the dataset
print('{:,}'.format(len(restaurant_ids)), u'restaurants in the dataset.')

The error is

TypeError                                 Traceback (most recent call last)
in ()
     13         #print(business)
     14         # if this business is not a restaurant, skip to the next one
---> 15         if u'Restaurants' not in business[u'categories']:
     16             continue
     17

TypeError: argument of type 'NoneType' is not iterable

Thanks,
Maniceet Sahay
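
The traceback shows that `business[u'categories']` is `None` for at least one record, so the `in` test has nothing to iterate over. A minimal sketch of a guard follows. It assumes that the Kaggle release may store categories as `null`, as a comma-separated string, or as a list; the exact schema should be checked against the downloaded file. The helper name `is_restaurant` is made up for illustration.

```python
import json


def is_restaurant(business):
    """Return True if a business record lists 'Restaurants' as a category.

    Hypothetical helper: tolerates a missing or null 'categories' field.
    """
    # a missing key or an explicit null both fall back to an empty string
    categories = business.get("categories") or ""

    # assumption: some releases use a comma-separated string, not a list
    if isinstance(categories, str):
        categories = [name.strip() for name in categories.split(",")]

    return "Restaurants" in categories


# made-up sample records, one JSON document per line as in the dataset file
sample_lines = [
    '{"business_id": "a1", "categories": "Mexican, Restaurants"}',
    '{"business_id": "b2", "categories": null}',
    '{"business_id": "c3", "categories": ["Restaurants", "Bars"]}',
    '{"business_id": "d4", "categories": "Shopping"}',
]

restaurant_ids = frozenset(
    record["business_id"]
    for record in map(json.loads, sample_lines)
    if is_restaurant(record)
)
```

In the notebook loop, the check `if u'Restaurants' not in business[u'categories']` could then be replaced by `if not is_restaurant(business)`.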

Using the modern-nlp notebook at spacy-notebooks

Hello! I've been trying to collect all the spaCy tutorials in the form of Jupyter notebooks, and I came across your repo in the tutorial section. The notebook was very useful, and I wanted to know whether it's OK for me to add it to the spacy-notebooks repo.

It'll be credited (and linked) appropriately, obviously.

Thanks again for the notebook and your help!
