DAT6 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (2/21/15 - 5/2/15). View student work in the student repository.

Instructors: Sinan Ozdemir and Josiah Davis.

Office hours: 5-7pm on Tuesdays and Saturdays at General Assembly.

Course Project information

Saturday        Topic                                        Project Milestone
2/21            Introduction / Pandas
2/28            Git(hub) / Getting Data
3/7             Advanced Pandas / Machine Learning           One Page Write-up with Data
π == τ/2 day    Model Evaluation / Logistic Regression
3/21            Linear Regression                            2-3 Minute Presentation
3/28            Data Problem / Clustering and Visualization
4/2             Naive Bayes / Natural Language Processing    Deadline for Topic Changes
4/11            Decision Trees / Ensembles                   First Draft Due (Peer Review)
4/18            PCA / Databases / MapReduce
4/25            Recommendation Engines
5/2             Project Presentations                        Presentation

Installation and Setup

  • Install the Anaconda distribution of Python 2.7.x.
  • Install Git and create a GitHub account.
  • Once you receive an email invitation from Slack, join our "DAT6 team" and add your photo!
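Once the steps above are done, a short script run with the Anaconda Python can confirm that everything is in place. This is a sketch, not an official course tool: the helper names `check_setup` and `find_on_path` are hypothetical, and it assumes pandas and numpy are the packages worth checking and that the Git installer added `git` to your PATH. It sticks to calls available in both Python 2.7 and Python 3.

```python
# Sketch of a setup check (hypothetical helpers). Assumes pandas and numpy
# came with Anaconda and that Git was added to PATH. Runs on Python 2.7 and 3.
import importlib
import os
import sys


def find_on_path(program):
    """Return the first executable named `program` on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        for name in (program, program + ".exe"):  # ".exe" covers Windows
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def check_setup(packages=("pandas", "numpy")):
    """Summarize the Python version, package versions, and the Git location."""
    report = {"python": "%d.%d.%d" % tuple(sys.version_info[:3])}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None  # package is not installed
    report["git"] = find_on_path("git")  # None means Git was not found on PATH
    return report


if __name__ == "__main__":
    for key, value in sorted(check_setup().items()):
        print("%-8s %s" % (key, value))
```

A `None` next to a package or next to `git` points to the step that needs another look before the first class.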

Class 1: Introduction / Pandas

Agenda:

  • Introduction to General Assembly
  • Course overview: our philosophy and expectations (slides)
  • Data science overview (slides)
  • Data Analysis in Python (code)
  • Tools: check for proper setup of Anaconda, overview of Slack

Homework:

Optional:

  • Review your base Python (code)

Class 2: Git(hub) and Getting Data

Agenda:

Homework:

Resources:


Class 3: Advanced Pandas and Machine Learning

Agenda:

Homework:

  • Complete the advanced Pandas homework (submit it on the Dat6-students repo via a pull request).
  • Continue to develop your project. If you have a dataset, explore it with pandas. If you don't have a dataset yet, prioritize getting the data. (Nothing to turn in for next week.)
  • Read this excellent article, Understanding the Bias-Variance Tradeoff, and be prepared to discuss it next class. (You can ignore sections 4.2 and 4.3.) Here are some questions to think about while you read:
    • In the Party Registration example, what are the features? What is the response? Is this a regression or classification problem?
    • In the interactive visualization, try using different values for K across different sets of training data. What value of K do you think is "best"? How do you define "best"?
    • In the visualization, what do the lighter colors versus the darker colors mean? How is the darkness calculated?
    • How does the choice of K affect model bias? How about variance?
    • As you experiment with K and generate new training data, how can you "see" high versus low variance? How can you "see" high versus low bias?
    • Why should we care about variance at all? Shouldn't we just minimize bias and ignore variance?
    • Does a high value for K cause over-fitting or under-fitting?
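One way to experiment with these questions outside the interactive visualization is a small K-nearest neighbors sketch in plain Python. The data here is synthetic and hypothetical (points in the unit square labeled by whether x + y > 1, with 20% of labels flipped as noise); it is not the Party Registration data from the article, and the helper names `make_data`, `knn_predict`, and `error_rate` are illustrative. Comparing training error with test error across values of K is one way to "see" bias and variance.

```python
# Toy K-nearest neighbors on made-up data (not the article's dataset), used to
# compare training and test error for several values of K.
import random


def make_data(n, seed):
    """n points in the unit square; label 1 if x + y > 1, with 20% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 1 if x + y > 1 else 0
        if rng.random() < 0.2:
            label = 1 - label  # flip the label to simulate noise
        data.append(((x, y), label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point` (use odd k)."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    votes = sum(label for _, label in by_distance[:k])
    return 1 if 2 * votes > k else 0


def error_rate(train, test, k):
    """Fraction of points in `test` that the K-NN model misclassifies."""
    wrong = sum(knn_predict(train, p, k) != label for p, label in test)
    return wrong / float(len(test))


if __name__ == "__main__":
    train = make_data(201, seed=1)
    test = make_data(200, seed=2)
    for k in (1, 5, 25, 201):
        print("K=%3d  train error=%.2f  test error=%.2f"
              % (k, error_rate(train, train, k), error_rate(train, test, k)))
```

With K=1 every training point is its own nearest neighbor, so training error is exactly zero while test error reflects the noise the model memorized. With K equal to the training set size (201 here), every prediction is the overall majority label, no matter where the point is. Regenerating `train` with different seeds shows how much each model's predictions move.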

Resources:


Class 4: Model Evaluation and Logistic Regression

Agenda:
