
2020spring_stat3302_final_project's Introduction

STAT3302_Final_Project (Due Friday, 4/17)

Group members:

Description

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Variable description:

  • Response:

    • Survived (contains your binary predictions: 1 for survived, 0 for deceased)
  • Other variables

    • Continuous variables: Age, Fare
    • Discrete variables: Sex, SibSp, Parch, Pclass, Embarked
    • Binary variables: Sex, Survived
    • Factor variables: Pclass, Sex, Embarked

| Variable | Description |
| --- | --- |
| PassengerId | (sorted in any order) |
| Pclass | Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd) |
| Sex | Sex |
| Age | Age in years |
| SibSp | # of siblings / spouses aboard the Titanic |
| Parch | # of parents / children aboard the Titanic |
| Ticket | Ticket number |
| Fare | Passenger fare |
| Cabin | Cabin number |
| Embarked | Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton) |

Note: There are 11 columns in total, excluding the response Survived. However, not all variables are useful for prediction; some serve only for identification, such as PassengerId, Name and Ticket. Thus, only the following variables will be considered for survival prediction: Pclass, Sex, Age, SibSp, Parch, Fare, Embarked.
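As a rough illustration of this selection step, the sketch below drops the identification columns and dummy-codes the factor variables. It is written in plain Python for illustration only. The two rows are made up and are not records from the dataset, the helper names `select_covariates` and `encode` are hypothetical, and the reference levels (Pclass = 1, Sex = female, Embarked = S) are an assumption.

```python
# Hypothetical rows for illustration only; NOT records from the real dataset.
rows = [
    {"PassengerId": "1", "Name": "Passenger A", "Ticket": "T1", "Cabin": "",
     "Pclass": "3", "Sex": "male", "Age": "22", "SibSp": "1", "Parch": "0",
     "Fare": "7.25", "Embarked": "S", "Survived": "0"},
    {"PassengerId": "2", "Name": "Passenger B", "Ticket": "T2", "Cabin": "C1",
     "Pclass": "1", "Sex": "female", "Age": "", "SibSp": "0", "Parch": "2",
     "Fare": "80.0", "Embarked": "C", "Survived": "1"},
]

# Covariates kept for prediction, as listed in the note above.
COVARIATES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def select_covariates(row):
    """Keep only the chosen covariates plus the response."""
    return {key: row[key] for key in COVARIATES + ["Survived"]}


def encode(row):
    """Convert strings to numbers and dummy-code the factor variables.

    Assumed reference levels: Pclass = 1, Sex = female, Embarked = S.
    A missing Age (empty string) becomes None.
    """
    return {
        "Pclass2": int(row["Pclass"] == "2"),
        "Pclass3": int(row["Pclass"] == "3"),
        "SexMale": int(row["Sex"] == "male"),
        "Age": float(row["Age"]) if row["Age"] else None,
        "SibSp": int(row["SibSp"]),
        "Parch": int(row["Parch"]),
        "Fare": float(row["Fare"]),
        "EmbarkedC": int(row["Embarked"] == "C"),
        "EmbarkedQ": int(row["Embarked"] == "Q"),
        "Survived": int(row["Survived"]),
    }


encoded = [encode(select_covariates(r)) for r in rows]
```

Treating Pclass as a factor rather than a number avoids assuming that the step from 1st to 2nd class has the same effect as the step from 2nd to 3rd.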

Scientific Question:

What sorts of people were more likely to survive the Titanic sinking?
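A first descriptive pass at this question could compare survival proportions across groups before any model is fitted. The sketch below does this for Sex in plain Python. The records are invented for illustration and are not taken from the dataset, and `survival_rate_by` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (group, survived) pairs for illustration only; not real data.
records = [
    ("female", 1), ("female", 1), ("female", 0),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]


def survival_rate_by(records):
    """Return the proportion of survivors within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for group, survived in records:
        counts[group][0] += survived
        counts[group][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


rates = survival_rate_by(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

Raw proportions like these ignore the other covariates, so they suggest which groups to look at but do not replace the regression model.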

TO-DO:

  1. What covariates are useful for answering our question? (Plot the graphs and give a graphical summary.)
  2. Model comparison (Which model has the best "performance"?)
  3. Do the models fit well? Explain why.
  4. EDA (Exploratory Data Analysis)
  5. Process the data: 1) Clean out missing values.
  • Optional:
    • Cross validation
    • Train, Tune and Ensemble machine learning models
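The data-cleaning and cross-validation items above could be sketched as follows, in plain Python for illustration. Median imputation is only one possible reading of "clean out missing values" (dropping incomplete rows is another), so treat it as an assumption. The ages are made up, and `impute_median` and `k_fold_indices` are hypothetical helper names.

```python
import random
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k (train, test) pairs after shuffling."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every k-th shuffled index
    return [
        (sorted(set(indices) - set(fold)), sorted(fold))
        for fold in folds
    ]


# Hypothetical ages with missing entries; not real data.
ages = [22.0, None, 38.0, 26.0, None, 35.0]
ages_filled = impute_median(ages)

splits = k_fold_indices(len(ages_filled), k=3)
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

For an honest performance estimate, the imputation value would normally be computed from the training fold only and then applied to the test fold; the sketch imputes once up front purely to stay short.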

DONE:

  1. Decide on the dataset for the project.
    • Analysis must involve a logistic/Poisson regression model.
    • Should have enough covariates to make an interesting statistical analysis (Q: What does "enough" mean?)
  2. Decide on the scientific question to ask.
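Because Survived is binary, logistic regression is the natural choice of the two model families named above. The sketch below fits a one-covariate logistic model by plain gradient ascent on the log-likelihood, to show what the fitted coefficients mean. It is not the project's method: the analysis itself would presumably use R's `glm(..., family = binomial)`. The data are invented, and `fit_logistic` is a hypothetical helper.

```python
import math


def sigmoid(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, steps=5000):
    """Fit logistic regression by batch gradient ascent.

    X holds covariate rows without an intercept column.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            err = yi - sigmoid(z)  # observed minus fitted probability
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        # Step along the gradient of the mean log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: one dummy covariate SexMale; not real records.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

beta = fit_logistic(X, y)
odds_ratio_male = math.exp(beta[1])  # odds of survival, male vs female
print(f"intercept={beta[0]:.3f}, SexMale={beta[1]:.3f}, OR={odds_ratio_male:.3f}")
```

In this toy data 3 of 4 women and 1 of 4 men survive, so the fitted odds ratio approaches (1/3) / 3 = 1/9: the odds of survival for men are about one ninth of those for women.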

Checklist (based on the project description):

  1. Make sure that you write down a scientific question for your group to answer in your project. Make sure you answer the question using the results of your statistical analysis.
  2. Select a dataset with enough cases so you can adequately estimate the different effects in your model. Your dataset should have enough covariates to make for an interesting statistical analysis. Make sure to investigate for possible interaction effects.
  3. Exploratory data analysis, as well as model building, model selection, and diagnostics must be part of your statistical analysis.
  4. You should take time to explain the interpretation of your statistical model and to answer the original scientific question.
  5. You may want to include some discussion of your results and ideas for further analysis at the end of the report. If you have references, format them appropriately.
  6. You may include R code in an appendix (not counted in the 5–6 page limit), but no R code or R summaries can be included in the main report. For example, present your results in tables and make sure to discuss your results in the text of the report.
  7. For the report use a 12pt font with 1.5 spacing (like this document!)
  8. Make sure that all students contribute equally in your group. Each student in a group will be assigned the same grade.
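Items 2 and 4 ask for interaction effects and for an interpretation of the model. As an illustration only, a logistic model with a Sex-by-Age interaction could be written as below; the choice of these two covariates is an assumption, not the group's final model.

```latex
\operatorname{logit}(p_i)
  = \log\frac{p_i}{1 - p_i}
  = \beta_0
  + \beta_1\,\mathrm{SexMale}_i
  + \beta_2\,\mathrm{Age}_i
  + \beta_3\,(\mathrm{SexMale}_i \times \mathrm{Age}_i)
```

Here $p_i$ is the survival probability of passenger $i$. Under this model, $e^{\beta_2}$ is the multiplicative change in the odds of survival per additional year of age for women, and $e^{\beta_2 + \beta_3}$ is the corresponding change for men, so $\beta_3 \neq 0$ means the age effect differs by sex.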

Reference

Contributors

drago1234, shuhan111, wyx34
