Creates a VM for developing data science tools. Installs Zeppelin for Spark/Scala notebooks and Jupyter for notebooks using PySpark and Python 3.
Clone the repo.
Starting the VM:
$ vagrant up
$ vagrant ssh
For Zeppelin notebooks:
$ zeppelin/bin/zeppelin-daemon.sh start
For Jupyter notebooks (http://localhost:8888/):
$ jupyter notebook
For Jupyter notebooks with PySpark (http://localhost:8888/):
$ pyspark
Launching TensorBoard (http://localhost:6006/):
$ tensorboard --logdir path/to/tf_logs/
Launching a Bokeh server application (http://localhost:5006/):
$ bokeh serve path/to/application/
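Once one or more of these servers has been started, it can be handy to confirm which of the URLs above actually respond. The following is a minimal sketch, not part of the VM itself: `check_services` is a hypothetical helper name, and it assumes `curl` is available wherever it is run. Zeppelin is left out because no URL is given for it above.

```shell
# Probe the notebook and dashboard URLs listed above and report which respond.
# check_services is a hypothetical helper; it assumes curl is installed.
check_services() {
    local failed=0
    local entry name port
    for entry in "Jupyter:8888" "TensorBoard:6006" "Bokeh:5006"; do
        name="${entry%%:*}"   # text before the colon
        port="${entry##*:}"   # text after the colon
        # -s silences progress output, -o discards the response body,
        # --max-time bounds how long an unresponsive port can stall the check.
        # Any HTTP response (including a redirect to a login page) counts as up.
        if curl -s -o /dev/null --max-time 2 "http://localhost:${port}/"; then
            echo "up    ${name} (port ${port})"
        else
            echo "down  ${name} (port ${port})"
            failed=1
        fi
    done
    # Non-zero exit status if at least one service did not respond
    return "${failed}"
}
```

Calling `check_services` prints one line per service; a `down` line for a server that was never started is expected.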
Package | Link | Usage |
---|---|---|
cython | cython | Python C extensions |
jupyter | Jupyter | Python/PySpark Notebooks |
numpy | NumPy | Data Manipulation |
pandas | Pandas | Data Manipulation |
pretty-pandas | PrettyPandas | Data Display |
pandas-profiling | pandas-profiling | Data Description |
dora | Dora | Data Exploration |
scikit-learn | Scikit-learn | Machine Learning |
annoy | annoy | Efficient k-NN |
imblearn | imbalanced-learn | Class Resampling |
pomegranate | pomegranate | Probabilistic Models |
edward | Edward | Probabilistic Models |
modal | modAL | Active Learning |
snorkel | Snorkel | Data Programming |
tdigest | tdigest | Online Quantiles |
keras | Keras | Deep Learning |
tensorflow | TensorFlow/TensorBoard | Deep Learning |
darkon | DARKON | Deep Learning Hacking |
pytorch | PyTorch | Deep Learning |
dynet | dynet | Deep Learning |
scikit-surprise | scikit-surprise | Association Learning |
bokeh | Bokeh/Bokeh Server | Visualization |
matplotlib | Matplotlib | Visualization |
folium | Folium | Visualization |
seaborn | seaborn | Visualization |
colorcet | ColorCet | Visualization |
wordcloud | WordCloud | Visualization (NLP) |
gensim | gensim | NLP |
spacy | spaCy | NLP |
textacy | textacy | NLP |
nltk | NLTK | NLP |
textblob | TextBlob | NLP |
stop_words | stop_words | NLP |
langid | LangID | NLP |
bs4 | Beautiful Soup | Text Extraction |
statsmodels | StatsModels | Statistics |
pymc3 | PyMC3 | Bayesian Statistics |
pystan | PyStan | Bayesian Statistics |
prophet | Prophet | Time Series Prediction |
networkx | NetworkX | Network Analysis |
tweepy | tweepy | Twitter API tool |
sympy | sympy | Symbolic Mathematics |
autograd | autograd | Automatic Differentiation |
sacred | sacred | Experimentation |
elasticsearch | elasticsearch | ES Connector |
prospector | prospector | Static Code Analysis |
optml | OptML | Hyperparameter Optimization |
ftfy | ftfy | Unicode Cleaning |
beautifier | beautifier | Cleans URLs |
scrubadub | scrubadub | Data Anonymization |
bandits | bandits | Multi-Armed Bandits |
datasketch | datasketch | Text Search |
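To confirm that packages from the table are usable inside the VM, one option is to try importing them. The sketch below assumes `python3` is on the PATH; `check_imports` is a hypothetical helper name. Note that it takes import names, which can differ from the package names in the table (for example, scikit-learn imports as `sklearn` and pytorch as `torch`).

```shell
# Report which Python modules fail to import under python3.
# check_imports is a hypothetical helper; arguments are import names,
# not necessarily the package names shown in the table.
check_imports() {
    local missing=0
    local module
    for module in "$@"; do
        if python3 -c "import ${module}" >/dev/null 2>&1; then
            echo "ok      ${module}"
        else
            echo "MISSING ${module}"
            missing=1
        fi
    done
    # Non-zero exit status if at least one import failed
    return "${missing}"
}
```

For example, `check_imports numpy pandas sklearn bs4` prints one `ok` or `MISSING` line per module.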
- If notebooks are not running properly, reset ownership of the Jupyter data directory to the vagrant user:
$ sudo chown -R vagrant: ~/.local/share/jupyter
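To see whether ownership is the likely culprit before (or after) changing it, the files that do not belong to the expected user can be listed. This is a small sketch; `not_owned_by` is a hypothetical helper name built on the standard `find` utility.

```shell
# List everything under a directory that is NOT owned by the given user.
# Empty output means ownership is already as expected.
# not_owned_by is a hypothetical helper name.
not_owned_by() {
    local owner="$1"   # user name or numeric user ID
    local dir="$2"     # directory to scan recursively
    find "$dir" ! -user "$owner" -print
}
```

Inside the VM, `not_owned_by vagrant ~/.local/share/jupyter` should print nothing once the ownership is correct.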