Giter VIP home page Giter VIP logo

data-analysis-visualization's Introduction

DATA-ANALYSIS-VISUALIZATION

(Titanic Data set)

by (Hassan Ali)

Dataset

the data set we used for this project is titanic data set which provide information about each persons like age,Sex,did he/she survived or not source

Feature information

Survival:0 = didn't survive, 1 = survived
pclass:Ticket class 1 = 1st, 2 = 2nd, 3 = 3rd
sex:the gender of the passnger (male or female)
Age:Age of each passnger
sibsp: number of siblings / spouses aboard the Titanic
parch: number of parents / children aboard the Titanic
ticket:Ticket number
fare:Passenger fare
cabin:Cabin number
embarked:Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton

further information taken from the source
pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5

sibsp: The dataset defines family relations in this way... Sibling = brother, sister, stepbrother, stepsister Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way... Parent = mother, father Child = daughter, son, stepdaughter, stepson Some children traveled only with a nanny, therefore parch=0 for them.

Data wrangling

the source of data provided us with three CSV file (train,test,gender_submission)

  • we assigned the gender_submission to the test data to make the number of the column the same as the train set
  • we merged the train set and test set(after adding the new column )
  • We checked for NaN value in each column and after that, we decided to drop the cabin column because it contains the most NaN value
  • we also dropped the raws with Age==NaN and Embarked=NaN

FEATURE ENCODING

the Survived column we encoded

  • 0-didn't survive
  • 1-did survive

FEATURE ENGINEERING

  • we created a new column called the Family
  • which is the sum of SibSp + Parch and it will indicate to the family size that was on the ship
  • when we create the new column we will add 1 to give indicate to that passenger is only one from his family on this ship
    and if it 2 that means he and one of his family members are on the ship and the same goes from 3 or 4,...

after all of that the data we used for the exploration part contains (1043 row, 12 columns)
we did merge the train and test and used the finale merged data frame as the final data

Summary of Findings

  • we can say that most survived gender is female
  • we can say that most most gender that did not survive is male
  • Furthermore, half of our data were men and did not survive with percentage =54%
  • the most common port for male and female passenger to embark from was S-Southampton
  • the most common port that people emparked from did survive and did not was S-Southampton
  • half of the passenger were men that embarked from the S-Southampton port with 50% percentage from our data set
  • the most gender of the passengers that didn't survive and embarked form S-Southampton port were male with a 43% percentage form all data set
  • the most gender of the passengers that did survive and embarked form S-Southampton port was female with a 20% percentage form all data set
  • the most family size that survived and didn't survive for both the female and male passenger was = 1 meaning by that the most affected people by survived and didn't survive rate was the solo passenger

Key Insights for Presentation

we started the presentation with a plot to tell us what is the dominated gender in our data set, after that, we investigate what is the most survived and didn't survive gender in our data set. then we investigated what is the port that most passenger that embarked from it from and after that we investigate what is the common (age, family size, gender) for the passenger that emparked from that port

data-analysis-visualization's People

Contributors

hassanali-svg avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.