2016_fordham_python_seminar

Overview

This is the repository for a Python seminar that was taught to students of a few different programs at the Fordham Graduate School of Business in October of 2016.

The bulk of our time was spent working through a variety of techniques for data manipulation in Pandas. Our coursework was broken up over four sessions in roughly the following way:

1) Python fundamentals
2) Python practice & intro to Pandas
3) Deeper into Pandas
4) Heavier data engineering in Pandas, with some light modeling at the end

This code was written using Python 3.5, with Pandas as a heavy dependency. Much of this code will run in 2.7, but there will be some issues.

This is a work in progress that I will continue to update (more detail on that below).

Repository contents

stock_csvs/ contains 500 csv files, each with daily stock price data for one of the 500 stocks in the S&P 500, for the time period ranging Jan 1998 - August 2013. Presumably, this data corresponds to the 500 stocks in the S&P 500 as of August 9th, 2013, where this data set ends. This was a free data set obtained from https://quantquote.com/historical-stock-data.
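As a rough sketch of what loading one of these files into Pandas could look like: the column names and header row below are assumptions made for illustration (the actual files in stock_csvs/ may be laid out differently), and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical layout: the real files in stock_csvs/ may use different
# columns, or may have no header row at all.
sample_csv = io.StringIO(
    "date,open,high,low,close,volume\n"
    "2013-08-07,10.0,10.5,9.8,10.2,1000\n"
    "2013-08-08,10.2,10.9,10.1,10.8,1500\n"
    "2013-08-09,10.8,11.0,10.4,10.5,1200\n"
)

# Parse the date column and use it as the index, so rows can be
# selected and aligned by trading day.
prices = pd.read_csv(sample_csv, parse_dates=["date"], index_col="date")

print(prices.shape)
print(prices["close"].max())
```

For a file on disk, a path would be passed to `pd.read_csv` in place of the `io.StringIO` object.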

ppts/ contains the powerpoint presentations delivered in class. Though these presentations generally did a pretty mediocre job of predicting what would take place in subsequent classes, they did stay pretty close to whatever was covered in their respective sessions. I plan on eventually moving all of the powerpoint notes into Jupyter notebooks.

text_example/ contains a Jupyter notebook that walks through a practice example: searching for names in a text file using built-in Python data structures. This notebook contains detailed notes around what is being done and why - a great place to start for someone interested in learning or refreshing some Python fundamentals.
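To give a flavor of that kind of exercise, here is a minimal sketch that counts known names in a piece of text using only a set and a dict. The sample text, the name list, and the function name `count_names` are all made up for illustration; the notebook itself may take a different approach.

```python
# Hypothetical inputs: the notebook's actual text file and names may differ.
text = "Alice met Bob. Later, Bob called Carol, and Alice joined them."
known_names = {"Alice", "Bob", "Carol", "Dave"}  # a set gives fast membership tests


def count_names(text, names):
    """Return a dict mapping each name found in `text` to its occurrence count."""
    counts = {}
    for raw_word in text.split():
        # Strip surrounding punctuation so "Bob." matches "Bob".
        word = raw_word.strip(".,;:!?\"'()")
        if word in names:
            counts[word] = counts.get(word, 0) + 1
    return counts


name_counts = count_names(text, known_names)
print(name_counts)
```

Names that never appear in the text (here, "Dave") are simply absent from the result.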

output/ contains output from some of the programs run.

code/ is where the bulk of the code lives:

      stock_data.py: functions used for aggregating the 500 csv's into a workable format

      pandas_basics.ipynb: provides a first look at DataFrames and columnar access

      pandas_more.ipynb: takes things a little further, introducing hierarchical columns

      correlation_fun.ipynb: we look at stocks whose returns were most correlated to other S&P 500 members' returns

      build_table.ipynb: we compile a daily table with derived columns
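The ideas behind the notebooks listed above (hierarchical columns and return correlations) can be sketched in a few lines. This is not the code from the notebooks: the tickers and prices are invented, and the layout with one small frame per ticker is an assumption.

```python
import pandas as pd

dates = pd.date_range("2013-08-05", periods=4, freq="D")

# Made-up prices for three hypothetical tickers.
frames = {
    "AAA": pd.DataFrame({"close": [10.0, 11.0, 10.45, 11.495],
                         "volume": [100, 110, 120, 130]}, index=dates),
    "BBB": pd.DataFrame({"close": [20.0, 22.0, 20.9, 22.99],
                         "volume": [200, 210, 220, 230]}, index=dates),
    "CCC": pd.DataFrame({"close": [30.0, 27.0, 28.35, 25.515],
                         "volume": [300, 310, 320, 330]}, index=dates),
}

# Concatenating a dict of frames along axis=1 yields hierarchical columns:
# the ticker is the top level, the field name is the second level.
combined = pd.concat(frames, axis=1)

# Take a cross-section of the second level to get every ticker's close.
closes = combined.xs("close", axis=1, level=1)

# Daily simple returns, then the pairwise correlation matrix between tickers.
returns = closes.pct_change().dropna()
corr = returns.corr()
print(corr)
```

In this toy data AAA and BBB move together while CCC moves against them, so the correlation matrix shows values near 1 and -1 respectively.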

Additionally, in code/:

      basic_examples.ipynb: provides some very intro-to-Python style activities

      additional_exercises/: on-the-fly exercises from class, with & without solutions

      python_demo.py: a less polished & heavily commented stock_data; a first look at what Python can do

What this repository does not contain

Data analysis & predictive modeling

While much of the work we were doing might fall under the umbrella of data analysis (and to be fair I would consider the correlation exercise a useful piece of analysis), we did not get into real exploratory data analysis, which would include things like:

      - handling missing data

      - understanding how stock price returns are distributed

      - understanding how daily stock price returns correlate to volume and intraday movement

      - understanding how daily stock price returns relate to monthly returns
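For a sense of what the first two items in that list might involve, here is a small sketch on invented prices. Forward-filling is used only as one possible way of handling a missing value, not as a recommendation, and the numbers are not drawn from the data set.

```python
import pandas as pd

dates = pd.date_range("2013-08-01", periods=6, freq="D")

# Made-up closing prices with one missing observation.
close = pd.Series([100.0, 102.0, float("nan"), 104.04, 102.0, 105.0],
                  index=dates)

# Step 1: find out how much data is missing before doing anything else.
n_missing = close.isna().sum()

# Step 2: one possible treatment is to carry the last known price forward.
# Whether that is appropriate depends on why the value is missing.
filled = close.ffill()

# Step 3: compute daily simple returns and summarize their distribution.
returns = filled.pct_change().dropna()
summary = returns.describe()

print(n_missing)
print(summary)
```

Note that forward-filling produces a zero return on the filled day, which is exactly the kind of side effect a real exploratory analysis would need to examine.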

I might sum up these missing analysis pieces by saying that we did a whole lot of doing and not a whole lot of understanding, which is fine given the fact that this was a) at heart a Python seminar and not a data analysis seminar and b) a very time-constrained seminar. And with that in mind, I thought it best to hold off on posting any modeling work that is not accompanied by a real exploratory analysis of the data. I am working through producing these materials but at this point in time they are not quite finished.

Additionally, while a textual overview of the relational data concepts of joins and group by's is covered in the ppts directory, I think it would be useful to have a dedicated notebook covering these topics in here.
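Until that notebook exists, a minimal sketch of a join followed by a group by in Pandas might look like the following. The two tables, their columns, and the sector labels are invented for illustration and do not come from the repository.

```python
import pandas as pd

# Hypothetical price rows: several observations per ticker.
prices = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "close": [10.0, 12.0, 20.0, 22.0, 30.0],
})

# Hypothetical lookup table: one row per ticker, with CCC deliberately absent.
sectors = pd.DataFrame({
    "ticker": ["AAA", "BBB"],
    "sector": ["Tech", "Energy"],
})

# Left join: keep every price row; tickers without a match get NaN for sector.
joined = prices.merge(sectors, on="ticker", how="left")

# Group by: average close per sector. Rows whose sector is NaN are
# excluded from the groups by default.
avg_close = joined.groupby("sector")["close"].mean()

print(joined)
print(avg_close)
```

Switching `how="left"` to `how="inner"` would drop the unmatched CCC row at the join step instead.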

Advanced, non-Pandas Python

Our focus was the Pandas library, as it is highly relevant to the QF and GF coursework. We touched on basic Python too, as that's necessary for being able to properly use Pandas. That said, there is a lot of other Python functionality that we didn't cover (or at least didn't thoroughly cover), including:

      - generators

      - the collections library

      - the NumPy library

I would like to at some point in the near future provide you with at least some kind of reference for using these.
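As a small taste of the first two items in the list above, here is a sketch using only the standard library. The function name `daily_returns`, the prices, and the ticker and sector labels are all made up for illustration.

```python
from collections import Counter, defaultdict


def daily_returns(prices):
    """Lazily yield the simple return between each consecutive pair of prices."""
    previous = None
    for price in prices:
        if previous is not None:
            yield price / previous - 1
        previous = price


closes = [100.0, 110.0, 99.0, 99.0]

# A generator produces values one at a time; list() consumes it fully.
returns = list(daily_returns(closes))

# Counter tallies hashable items, here the direction of each daily move.
directions = Counter(
    "up" if r > 0 else "down" if r < 0 else "flat" for r in returns
)

# defaultdict(list) creates an empty list the first time a key is seen.
by_sector = defaultdict(list)
for ticker, sector in [("AAA", "Tech"), ("BBB", "Energy"), ("CCC", "Tech")]:
    by_sector[sector].append(ticker)

print(directions)
print(dict(by_sector))
```

Because `daily_returns` is a generator, it never holds more than two prices in memory at once, which matters when the input is a large file read line by line.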

Computer science

We did not really get into any computer science, so to speak - this was a look at how to get started using this language. But understanding computer science - e.g. object-oriented programming, data structures & algorithms - is phenomenally interesting, and it will make you much more proficient as a programmer, even if you are just hacking quick solutions together. If you enjoyed what we worked on, I would highly recommend you take the time to investigate Python through a more computer-science-tinted lens. interactivepython.org would be a great place to start.

Thank you again for your participation in the seminar!

Contributors

lermana
