Hi, Krish! I'm new to Python. When I run `random_search.best_estimator_`, the output shows `nan` as the value of `missing`, and I get an error when I copy that output into the next step. Can you help me with it?

This is my whole code:`import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

data = pd.read_csv("diabetes.csv")

data.shape

data.head(5)

#check if any null value is present
data.isnull().values.any()

# Correlation

import seaborn as sns

# get correlations of each feature in dataset

corrmat = data.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))

# plot heatmap

g = sns.heatmap(data[top_corr_features].corr(), annot=True, cmap="RdYlGn")

data.corr()

data.head(5)

diabetes_true_count = len(data.loc[data['Outcome']==True])
diabetes_false_count = len(data.loc[data['Outcome']==False])

(diabetes_true_count, diabetes_false_count)

# Train test split
from sklearn.model_selection import train_test_split
feature_columns = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
predicted_class = ['Outcome']

X=data[feature_columns].values
y=data[predicted_class].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=10)

# Check how many other missing (zero) values

print("total no. of rows : {0}".format(len(data)))
print("no. of missing rows Pregnancies : {0}".format(len(data.loc[data['Pregnancies'] == 0])))
print("no. of missing rows Glucose : {0}".format(len(data.loc[data['Glucose'] == 0])))
print("no. of missing rows BloodPressure : {0}".format(len(data.loc[data['BloodPressure'] == 0])))
print("no. of missing rows SkinThickness : {0}".format(len(data.loc[data['SkinThickness'] == 0])))
print("no. of missing rows Insulin : {0}".format(len(data.loc[data['Insulin'] == 0])))
print("no. of missing rows BMI : {0}".format(len(data.loc[data['BMI'] == 0])))
print("no. of missing rows DiabetesPedigreeFunction : {0}".format(len(data.loc[data['DiabetesPedigreeFunction'] == 0])))
print("no. of missing rows Age : {0}".format(len(data.loc[data['Age'] == 0])))

import numpy as np
from sklearn.impute import SimpleImputer

np.isnan(X_test)

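# Treat zeros as missing and replace them with the column means learned from the training set
fill_values = SimpleImputer(missing_values=0, strategy='mean')
X_train = fill_values.fit_transform(X_train)
# Reuse the training-set means on the test set instead of refitting on it
X_test = fill_values.transform(X_test)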

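# random_forest_model was not defined in the pasted code; a RandomForestClassifier is assumed here
# (random_state=10 is also an assumption, chosen to match the split above)
from sklearn.ensemble import RandomForestClassifier
random_forest_model = RandomForestClassifier(random_state=10)

random_forest_model.fit(X_train, y_train.ravel())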

predict_train_data = random_forest_model.predict(X_test)

from sklearn import metrics

print("Accuracy = {0:.3f}".format(metrics.accuracy_score(y_test, predict_train_data)))

# Hyperparameter optimization

params={
"learning_rate" : [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
"max_depth" : [ 3, 4, 5, 6, 8, 10, 12, 15 ],
"min_child_weight" : [ 1, 3, 5, 7 ],
"gamma" : [ 0.0, 0.1, 0.2, 0.3, 0.4 ],
"colsample_bytree" : [ 0.3, 0.4, 0.5, 0.7 ]
}

# Hyperparameter optimization using RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV
import xgboost

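# A bare `use_label_encoder=False` assignment does nothing; it only has an effect
# when passed to XGBClassifier, and only in older xgboost releases
classifier = xgboost.XGBClassifier()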

random_search = RandomizedSearchCV(classifier,param_distributions = params,n_iter=5,scoring='roc_auc',n_jobs=-1, cv=5,verbose=3)

from datetime import datetime

def timer(start_time=None):
    # Called without a start mark: return the current time as the start mark
    if not start_time:
        start_time = datetime.now()
        return start_time
    # Called with a start mark: print the time elapsed since then
    elif start_time:
        thour, temp_sec = divmod((datetime.now() - start_time).total_seconds(), 3600)
        tmin, tsec = divmod(temp_sec, 60)
        print('\n Time taken: %i hours %i minutes %s seconds.' % (thour, tmin, round(tsec, 2)))

start_time = timer(None) # timing starts from this point for "start_time" variable
random_search.fit(X,y.ravel())
timer(start_time) # timing ends here for "start_time" variable
random_search.best_estimator_`
This is the dataset I've used for my project:
diabetes.csv
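
A possible explanation, offered as a minimal sketch rather than a confirmed diagnosis: the printed form of an `XGBClassifier` includes `missing=nan`, which is the library's default marker for missing values, and `nan` is not a name Python knows unless it is defined or imported. Pasting the printed text into a new cell would then raise a `NameError`. The snippet below reproduces that with a shortened, hypothetical stand-in for the printed text (the `printed_text` string and its `max_depth=5` value are illustrative only, not taken from the real output) and shows that supplying a definition for `nan` makes the same text valid.

```python
import math

# Hypothetical, shortened stand-in for the text printed by
# random_search.best_estimator_; the real output lists many more parameters.
printed_text = "dict(max_depth=5, missing=nan)"

# Simulate pasting the text as-is: `nan` is not a built-in name, so this fails.
try:
    eval(printed_text, {"dict": dict})
    paste_failed = False
except NameError:
    paste_failed = True

# Supplying a definition for `nan` (float("nan") behaves like numpy.nan here)
# makes the same text evaluate without error.
params = eval(printed_text, {"dict": dict, "nan": float("nan")})

print(paste_failed)
print(math.isnan(params["missing"]))
```

In the notebook itself, replacing `nan` with `np.nan` in the pasted text should have the same effect, since `numpy` is already imported as `np`. Alternatively, with the default `refit=True`, `random_search.best_estimator_` is already a fitted model and can be used directly without copying its printed form.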

krishnaik06 / diabetes-prediction
