Python Workshop Materials

This repo contains a series of introductory tutorials to the Python programming language. The materials were originally written by Sam Zorowitz for an Intro to Python Workshop for the Columbia Quantitative Methods in the Social Sciences (QMSS) Master's Program in summer 2017. The tutorials are broken into six sequential modules and cover the following topics:

| # | Topic | Description | Libraries Covered |
|---|-------|-------------|-------------------|
| 1 | Intro to Python & NumPy | An overview of the Python language, including basic types, control flow, defining functions, and basic scripting. Also introduces the NumPy library and its core functions. | numpy |
| 2 | DataFrames, Statistics, and Visualization | Introduces the Pandas library for generating, manipulating, and saving DataFrames. Basic statistical functions with SciPy and Statsmodels are also covered. Examples of data visualization with Matplotlib and Seaborn are provided. | scipy, pandas, statsmodels, matplotlib, seaborn |
| 3 | Machine Learning | An overview of machine learning in Python with the Scikit-Learn library. Topics covered include preprocessing and standardizing data; unsupervised learning (PCA, K-means, agglomerative clustering); supervised learning (linear models, SVMs, decision trees, random forests, and neural networks); and cross-validation. | scikit-learn |
| 4 | Text Processing | Introduces the Natural Language Toolkit (NLTK) library. Steps in processing text are described, including encoding, tokenizing, stop-word removal, stemming/lemmatizing, and spellchecking. Machine learning models useful for text analysis are also discussed using Scikit-Learn (naive Bayes classifiers, Latent Dirichlet Allocation model). | nltk, scikit-learn |
| 5 | API Wrappers & Webscraping | Python wrappers for several major APIs (Facebook, Twitter, Reddit) are discussed. Links to Python wrappers for other major APIs are provided but not further discussed. Webcrawling and webscraping with BeautifulSoup are also discussed. | facebook-sdk, twython, praw, beautifulsoup4, scrapy |
| 6 | Network Analysis | A brief overview of network analysis with Python using the NetworkX library. The basics of generating, analyzing, and visualizing graphs are introduced. | networkx |

Contents

All modules are broken into three parts: (1) notes, which provide overviews and use cases for the Python libraries covered; (2) exercises, which provide example problems for learners to work on; and (3) solutions, which provide solutions to the exercises.

To further motivate the topics, many of the modules include and make use of real datasets collected from various online repositories. These are detailed below and in their respective folders.

Prerequisites

The tutorials are all written as Jupyter Notebooks and were developed with the Anaconda distribution of Python 3.6. Please download and install the Anaconda distribution (Python version 3.6) before starting.
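To confirm that an environment is ready for the modules, a quick check along the following lines may help. This is a minimal sketch, not part of the original workshop materials: the `REQUIRED` mapping and the `find_missing` helper are hypothetical names introduced here, and the import names paired with each package (for example `sklearn` for scikit-learn, `bs4` for beautifulsoup4, `facebook` for facebook-sdk) are assumptions about how each library is usually imported.

```python
import importlib.util
import sys

# Package names as listed in the module table, mapped to the names used in
# `import` statements. The import names are assumptions; several differ
# from the package name.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "statsmodels": "statsmodels",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
    "facebook-sdk": "facebook",
    "twython": "twython",
    "praw": "praw",
    "beautifulsoup4": "bs4",
    "scrapy": "scrapy",
    "networkx": "networkx",
}


def find_missing(packages):
    """Return the sorted package names whose import name cannot be located."""
    return sorted(
        name
        for name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    )


if __name__ == "__main__":
    # Report the interpreter version, then any libraries that are not installed.
    print("Python version:", sys.version.split()[0])
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All workshop libraries were found.")
```

The check only locates each library without importing it, so it runs quickly and does not fail on a partially broken install. Some of the libraries (for example the API wrappers in Module 5) may not ship with Anaconda by default and might need to be installed separately.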

References

Essential Texts

The following references were instrumental in writing these tutorials and provided tremendous insight in introducing topics, structuring walkthroughs, and describing functions. They are exceptional guides to their respective topics and are cited repeatedly. I extend my sincere thanks to the authors.

Kevin Sheppard. Introduction to Python for Econometrics, Statistics, and Data Analysis, 3rd Edition. https://www.kevinsheppard.com/Python_for_Econometrics.

Andreas Muller & Sarah Guido. Introduction to Machine Learning with Python: A Guide for Data Scientists. http://shop.oreilly.com/product/0636920030515.do.

Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python. http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf.

Datasets

Gambling Dataset (Module 2)

Stroop Dataset (Module 2)

Iris Dataset (Module 3)

Diabetes Dataset (Module 3)

Phenotype Dataset (Module 3)

Wine Dataset (Module 3)

NSF Abstracts Dataset (Module 4)

Amazon Food Reviews (Module 4)

Les Miserables Network Dataset (Module 6)
