diabetes-prediction's Introduction

Diabetes-Prediction

diabetes-prediction's People

Contributors

krishnaik06

diabetes-prediction's Issues

Hi, Krish! I'm new to Python. When I run 'random_search.best_estimator_', the output shows "nan" as the value of missing, and I get an error after copying that output into the next step. Can you help me with it?

This is my whole code:`import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

data = pd.read_csv("diabetes.csv")

data.shape

data.head(5)

#check if any null value is present
data.isnull().values.any()

# Correlation

import seaborn as sns

# Get correlations of each feature in the dataset
corrmat = data.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))

# Plot heatmap
g = sns.heatmap(data[top_corr_features].corr(), annot=True, cmap="RdYlGn")

data.corr()

data.head(5)

diabetes_true_count = len(data.loc[data['Outcome']==True])
diabetes_false_count = len(data.loc[data['Outcome']==False])

(diabetes_true_count, diabetes_false_count)

# Train test split
from sklearn.model_selection import train_test_split
feature_columns = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
predicted_class = ['Outcome']

X=data[feature_columns].values
y=data[predicted_class].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=10)

# Check how many other missing (zero) values

print("total no. of rows : {0}".format(len(data)))
print("no. of missing rows Pregnancies : {0}".format(len(data.loc[data['Pregnancies'] == 0])))
print("no. of missing rows Glucose : {0}".format(len(data.loc[data['Glucose'] == 0])))
print("no. of missing rows BloodPressure : {0}".format(len(data.loc[data['BloodPressure'] == 0])))
print("no. of missing rows SkinThickness : {0}".format(len(data.loc[data['SkinThickness'] == 0])))
print("no. of missing rows Insulin : {0}".format(len(data.loc[data['Insulin'] == 0])))
print("no. of missing rows BMI : {0}".format(len(data.loc[data['BMI'] == 0])))
print("no. of missing rows DiabetesPedigreeFunction : {0}".format(len(data.loc[data['DiabetesPedigreeFunction'] == 0])))
print("no. of missing rows Age : {0}".format(len(data.loc[data['Age'] == 0])))

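from sklearn.impute import SimpleImputer

np.isnan(X_test)

# Treat zeros as missing and replace them with the column mean
fill_values = SimpleImputer(missing_values=0, strategy='mean')
X_train = fill_values.fit_transform(X_train)
# Reuse the means learned from the training set (transform, not fit_transform)
X_test = fill_values.transform(X_test)

# random_forest_model is not defined anywhere in the posted code;
# the definition below is an assumption about what was intended
from sklearn.ensemble import RandomForestClassifier
random_forest_model = RandomForestClassifier(random_state=10)

random_forest_model.fit(X_train, y_train.ravel())

# Despite the variable name, these are predictions on the test set
predict_train_data = random_forest_model.predict(X_test)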

from sklearn import metrics

print("Accuracy = {0:.3f}".format(metrics.accuracy_score(y_test, predict_train_data)))

# Hyperparameter optimization

params = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth": [3, 4, 5, 6, 8, 10, 12, 15],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.0, 0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7]  # xgboost spells this parameter colsample_bytree
}

# Hyperparameter optimization using RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV
import xgboost

# As a standalone assignment this only creates a variable; it is not passed to XGBClassifier
use_label_encoder = False
classifier = xgboost.XGBClassifier()

random_search = RandomizedSearchCV(classifier, param_distributions=params, n_iter=5, scoring='roc_auc', n_jobs=-1, cv=5, verbose=3)

from datetime import datetime

def timer(start_time=None):
    # Called with no start time: return the current time as the starting point.
    # Called with a start time: print the time elapsed since then.
    if not start_time:
        start_time = datetime.now()
        return start_time
    elif start_time:
        thour, temp_sec = divmod((datetime.now() - start_time).total_seconds(), 3600)
        tmin, tsec = divmod(temp_sec, 60)
        print('\n Time taken: %i hours %i minutes %s seconds.' % (thour, tmin, round(tsec, 2)))

start_time = timer(None)  # timing starts from this point for "start_time" variable
random_search.fit(X, y.ravel())
timer(start_time)  # timing ends here for "start_time" variable

random_search.best_estimator_`
This is the dataset I've used for my project:
diabetes.csv
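
A note on the "nan" value of missing, assuming the error in the next step is a `NameError` raised when the printed estimator is pasted back in as code: the printed text shows the default value of `missing` as `nan`, but `nan` on its own is not a defined Python name. The sketch below reproduces that with a shortened, made-up stand-in for the printed text (`printed` and its contents are hypothetical, not real xgboost output) and shows that giving the name a value resolves the error.

```python
import math

# Hypothetical, shortened stand-in for a printed estimator; the real text is
# longer and its exact form depends on the installed xgboost version.
printed = "dict(max_depth=5, missing=nan)"

# Pasting the printed text back as code fails: `nan` is not a defined name.
try:
    eval(printed, {})
    paste_error = None
except NameError as exc:
    paste_error = str(exc)

# Supplying a value for the name fixes it. float("nan") is used here so the
# sketch needs only the standard library; np.nan is the same float value.
fixed = eval(printed, {"nan": float("nan")})

print(paste_error)
print(fixed["max_depth"], math.isnan(fixed["missing"]))
```

In the notebook, the equivalent fix would be writing `missing=np.nan` in the pasted code. It is usually simpler to skip the copy step altogether: `random_search.best_estimator_` can be assigned to a variable and used directly, and `random_search.best_params_` holds just the tuned values.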
