modern-nlp-in-python's Issues
Python 3.7 Migration
Suggested modifications for modern-nlp-in-python/executable/Modern_NLP_in_Python.ipynb:
# Python 3 removed tuple-parameter unpacking in lambdas, so index the third element of each tuple instead of using a multi-parameter lambda:
ordered_vocab = sorted(ordered_vocab, key=lambda x: -x[2])
# Change 'w' to 'wb' when writing binary pickle file:
with open(tsne_filepath, 'wb') as f:
# Pass 'rb' explicitly when reading the pickle file rather than relying on the default text mode:
with open(tsne_filepath, 'rb') as f:
# --------------------------------------
# Unfortunately bokeh does not work when I open the notebook, so the visualization is not displayed.
# Tested with my own data (rather than the Yelp data); this code may help someone get started:
%matplotlib inline
import matplotlib.pyplot as plt
tsne_vectors.plot.scatter('x_coord', 'y_coord', figsize=(14,14))
ax = plt.gca()
for cntr, txt in enumerate(tsne_vectors.index):
    ax.annotate(txt, (tsne_vectors[tsne_vectors.index==txt].x_coord, tsne_vectors[tsne_vectors.index==txt].y_coord))
    if cntr > 50:
        # for demo purposes label an arbitrary set of points
        break
plt.show()
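The first three suggestions above can be tried in isolation. The following is a minimal sketch, not the notebook's own code: the toy `ordered_vocab` tuples and the temporary file path are made up for illustration, and only the sort key and the 'wb'/'rb' file modes come from the issue.

```python
import os
import pickle
import tempfile

# Hypothetical (term, index, count) tuples standing in for the notebook's vocabulary
ordered_vocab = [("pizza", 0, 120), ("taco", 1, 340), ("sushi", 2, 95)]

# Sort by descending count using the third tuple element (valid in Python 3)
ordered_vocab = sorted(ordered_vocab, key=lambda x: -x[2])

# Illustrative path; the notebook uses its own tsne_filepath
demo_filepath = os.path.join(tempfile.mkdtemp(), "demo.pkl")

# Pickle data is bytes, so write with 'wb' ...
with open(demo_filepath, "wb") as f:
    pickle.dump(ordered_vocab, f)

# ... and read it back with 'rb'
with open(demo_filepath, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

In Python 3, opening the file with 'w' instead of 'wb' makes `pickle.dump` fail with a TypeError, because a text-mode file object accepts `str` rather than `bytes`.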
Did you use your own machine for Yelp Food Reviews' Analysis?
Hi Patrick,
Apologies, I tried finding other means of approaching you and asking this question but could not find any.
Did you run all of the Yelp food review analyses on your own machine, without resorting to AWS or another service? I need to analyze 20 GiB of text data myself, and I would like to try Python if you were able to use it without running into RAM issues.
Thanks ever so much in advance.
Code giving error
Hi,
This is a great resource. I wanted to point out that some of the code, especially for the LDA analysis and word vector creation, is not working. I am guessing that this is due to an updated version of gensim. Some of the code also raises errors under Python 3.
# build a list of the terms, integer indices,
# and term counts from the food2vec model vocabulary
ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]
This is giving me an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-2c76ac03d027> in <module>()
1 # build a list of the terms, integer indices,
2 # and term counts from the food2vec model vocabulary
----> 3 ordered_vocab = [(term, voc.index, voc.count) for term, voc in food2vec.vocab.items()]
AttributeError: 'Word2Vec' object has no attribute 'vocab'
I am trying to update the code to reproduce the original output.
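One possible workaround, sketched under assumptions about the installed gensim version: from gensim 1.0 the vocabulary is reached through `model.wv`, and in gensim 4.x the `vocab` dict is replaced by `key_to_index` together with `get_vecattr(term, 'count')`. The helper name `vocab_entries` is hypothetical and is not part of the notebook or of gensim.

```python
def vocab_entries(model):
    """Return (term, index, count) tuples sorted by descending count.

    Hypothetical helper: tries the gensim 4.x layout first and falls
    back to the older `vocab` dict of objects with .index and .count.
    """
    # gensim >= 1.0 keeps the word vectors under model.wv; older models do not
    wv = getattr(model, "wv", model)

    if hasattr(wv, "key_to_index"):
        # gensim 4.x: term -> index mapping, counts via get_vecattr
        entries = [
            (term, index, wv.get_vecattr(term, "count"))
            for term, index in wv.key_to_index.items()
        ]
    else:
        # gensim < 4.0: term -> Vocab object with .index and .count
        entries = [
            (term, voc.index, voc.count)
            for term, voc in wv.vocab.items()
        ]

    # Python 3 compatible sort on the third tuple element
    return sorted(entries, key=lambda x: -x[2])
```

With the notebook's model this would be called as `ordered_vocab = vocab_entries(food2vec)`. Other parts of the notebook may need similar adjustments that this sketch does not cover.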
Code Giving Error
After getting the dataset from Kaggle, the following lines of code are giving an error:
``import codecs
import json

restaurant_ids = set()

# open the businesses file
with codecs.open(businesses_filepath, encoding='utf_8') as f:
    # iterate through each line (json record) in the file
    for business_json in f:
        # convert the json record to a Python dict
        business = json.loads(business_json)
        #print(business)
        # if this business is not a restaurant, skip to the next one
        if u'Restaurants' not in business[u'categories']:
            continue
        # add the restaurant business id to our restaurant_ids set
        restaurant_ids.add(business[u'business_id'])

# turn restaurant_ids into a frozenset, as we don't need to change it anymore
restaurant_ids = frozenset(restaurant_ids)

# print the number of unique restaurant ids in the dataset
print ('{:,}'.format(len(restaurant_ids)), u'restaurants in the dataset.')``
The error is
TypeError Traceback (most recent call last)
in ()
13 #print(business)
14 # if this business is not a restaurant, skip to the next one
---> 15 if u'Restaurants' not in business[u'categories']:
16 continue
17
TypeError: argument of type 'NoneType' is not iterable
Thanks,
Maniceet Sahay
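The traceback indicates that at least one business record has `categories` set to null, which `json.loads` turns into `None`. A minimal sketch of a guard follows. It assumes that `categories` may be either a list of strings or a single comma-separated string, and both the helper name `is_restaurant` and the sample records are hypothetical.

```python
import json

def is_restaurant(business):
    """Return True if 'Restaurants' is among the record's categories.

    Hypothetical helper: tolerates a missing or null 'categories' field
    and accepts either a list or a comma-separated string.
    """
    categories = business.get("categories") or []
    if isinstance(categories, str):
        categories = [c.strip() for c in categories.split(",")]
    return "Restaurants" in categories

# Made-up records standing in for lines of the businesses file
sample_lines = [
    '{"business_id": "a1", "categories": "Mexican, Restaurants"}',
    '{"business_id": "b2", "categories": null}',
    '{"business_id": "c3", "categories": ["Restaurants", "Bars"]}',
    '{"business_id": "d4", "categories": "Shopping"}',
]

restaurant_ids = frozenset(
    business["business_id"]
    for business in map(json.loads, sample_lines)
    if is_restaurant(business)
)

print(sorted(restaurant_ids))
```

In the notebook loop, the check `if u'Restaurants' not in business[u'categories']` could then be replaced with `if not is_restaurant(business)`.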
Using the modern-nlp notebook at spacy-notebooks
Hello! I've been aggregating spaCy tutorials in the form of Jupyter notebooks, and I came across your repo in the tutorial section. The notebook was very useful, and I wanted to know if it's OK for me to add it to this repo.
It'll be credited (and linked) appropriately, obviously.
Thanks again for the notebook and your help!