Warmup for after the logistic regression and metrics lectures in V2.1


dsc-chi-warmup-log-reg-metrics

Do you even compare the metrics of your models bro

#run as-is

import pandas as pd

from sklearn.datasets import make_classification

from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

data = make_classification(n_samples=10000, random_state=666, n_informative=6)

X = pd.DataFrame(data[0])
y = data[1]

data = X.copy()
data['target'] = y

How many features are in `data`? How many classes does the target have? Is there a class imbalance?

#your work here
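One possible way to answer these questions is sketched below. It rebuilds `data` exactly as in the setup cell so that it runs on its own; the names `n_features`, `n_classes`, and `class_shares` are illustrative and not part of the original warmup.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Rebuild the dataset from the setup cell so this sketch is self-contained
features, target = make_classification(n_samples=10000, random_state=666, n_informative=6)
data = pd.DataFrame(features)
data['target'] = target

# Every column except 'target' is a feature
n_features = data.drop(columns='target').shape[1]

# Number of distinct labels in the target
n_classes = data['target'].nunique()

# Share of rows per class; shares far from equal would indicate an imbalance
class_shares = data['target'].value_counts(normalize=True)

print(f"features: {n_features}, classes: {n_classes}")
print(class_shares)
```

Using `normalize=True` reports proportions rather than raw counts, which makes an imbalance easier to spot at a glance.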

Perform a train-test split (`random_state=666`) and standard-scale all features

Why do we standardize after the train test split, and not before?
Why do we scale the training data separately from the testing data?

#your work here
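A minimal sketch of this step, assuming the default 75/25 split of `train_test_split`. The scaler is fit on the training data only and then reused to transform the test data, so no information about the test set leaks into preprocessing. The names `X_train_scaled` and `X_test_scaled` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild X and y from the setup cell so this sketch is self-contained
features, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X = pd.DataFrame(features)

# Split first, so the test rows play no part in computing means and variances
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# Learn the scaling parameters from the training data only
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)

# Apply the training-set parameters to the test data (transform, not fit_transform)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X_test.columns, index=X_test.index
)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Wrapping the scaled arrays back into DataFrames is optional; it only keeps the column labels and row index for later slicing.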

Fit a logistic regression model on the first three features of the training data (with no regularization)

#your work here
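The sketch below shows one way to fit such a model. It is an approximation under a stated assumption: rather than switching the penalty off, it sets `C=1e12`, which makes the default L2 penalty negligible and behaves the same across scikit-learn versions. Recent versions also accept `penalty=None`, and older ones `penalty='none'`. The name `model_3` and the `max_iter=1000` setting are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild and preprocess the data so this sketch is self-contained
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Assumption: a very large C stands in for "no regularization"
model_3 = LogisticRegression(C=1e12, max_iter=1000)

# Keep only the first three columns of the scaled training data
model_3.fit(X_train_scaled[:, :3], y_train)

print(model_3.coef_, model_3.intercept_)
```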

Get predictions for this 3-feature model for the training data

Assign them to `train_preds_3`

#your work here

Get predictions for this 3-feature model for the testing data

Assign them to `test_preds_3`

#your work here

Generate two confusion matrices, one each for the training predictions and testing predictions

#your work here
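A sketch of the prediction and confusion-matrix steps together, assuming the same split, scaling, and large-`C` stand-in for an unregularized model. `cm_train_3` and `cm_test_3` are illustrative names. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild, split, and scale so this sketch is self-contained
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumption: a very large C approximates "no regularization"
model_3 = LogisticRegression(C=1e12, max_iter=1000)
model_3.fit(X_train_scaled[:, :3], y_train)

# Predictions must use the same three columns the model was trained on
train_preds_3 = model_3.predict(X_train_scaled[:, :3])
test_preds_3 = model_3.predict(X_test_scaled[:, :3])

# Layout for binary labels 0/1: [[TN, FP], [FN, TP]]
cm_train_3 = confusion_matrix(y_train, train_preds_3)
cm_test_3 = confusion_matrix(y_test, test_preds_3)

print("train:\n", cm_train_3)
print("test:\n", cm_test_3)
```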

Calculate the accuracy, recall, and precision for the training predictions

Calculate the accuracy, recall, and precision for the testing predictions

#your work here
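The three metrics can be computed with the helpers in `sklearn.metrics`, as in the sketch below. It repeats the setup so that it runs on its own; the dictionaries `train_scores` and `test_scores` are illustrative names. Recall is TP / (TP + FN) and precision is TP / (TP + FP), both taken with respect to class 1.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild, split, scale, and fit so this sketch is self-contained
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Assumption: a very large C approximates "no regularization"
model_3 = LogisticRegression(C=1e12, max_iter=1000)
model_3.fit(X_train_scaled[:, :3], y_train)
train_preds_3 = model_3.predict(X_train_scaled[:, :3])
test_preds_3 = model_3.predict(X_test_scaled[:, :3])

# Same three metrics for each split, so they can be compared side by side
train_scores = {
    'accuracy': accuracy_score(y_train, train_preds_3),
    'recall': recall_score(y_train, train_preds_3),
    'precision': precision_score(y_train, train_preds_3),
}
test_scores = {
    'accuracy': accuracy_score(y_test, test_preds_3),
    'recall': recall_score(y_test, test_preds_3),
    'precision': precision_score(y_test, test_preds_3),
}

for name in train_scores:
    print(f"{name:<9} train={train_scores[name]:.3f} test={test_scores[name]:.3f}")
```

As a general guide, training scores well above test scores suggest overfitting, while similar but modest scores on both splits suggest underfitting.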

Is the model over- or under-fitting? How can you tell?

Is bias or variance more of a problem with this model?

#your work here

Fit a model with the first 10 features, then another model with all 20 features

Generate confusion matrices and calculate accuracy, precision and recall as you did above
BONUS: use functions to do so!

How is the problem you diagnosed in the 3-variable model altered in the 10-variable and 20-variable models?

What new problems crop up?

#your work here
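For the bonus, one option is to wrap the fit, predict, and score steps in functions and loop over the feature counts. The sketch below does that; `score_predictions` and `evaluate_first_k` are hypothetical helper names, and the large `C` is again an assumed stand-in for an unregularized model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild, split, and scale so this sketch is self-contained
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)


def score_predictions(y_true, y_pred):
    """Return the confusion matrix and the three metrics for one set of predictions."""
    return {
        'confusion_matrix': confusion_matrix(y_true, y_pred),
        'accuracy': accuracy_score(y_true, y_pred),
        'recall': recall_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred),
    }


def evaluate_first_k(k):
    """Fit on the first k scaled features and score both splits."""
    # Assumption: a very large C approximates "no regularization"
    model = LogisticRegression(C=1e12, max_iter=1000)
    model.fit(X_train_scaled[:, :k], y_train)
    return {
        'train': score_predictions(y_train, model.predict(X_train_scaled[:, :k])),
        'test': score_predictions(y_test, model.predict(X_test_scaled[:, :k])),
    }


# One entry per model: 3, 10, and all 20 features
results = {k: evaluate_first_k(k) for k in (3, 10, 20)}

for k, scores in results.items():
    for split in ('train', 'test'):
        s = scores[split]
        print(f"{k:>2} features, {split:<5}: accuracy={s['accuracy']:.3f} "
              f"recall={s['recall']:.3f} precision={s['precision']:.3f}")
```

Keeping the scoring in its own function means the same code evaluates every model, so differences in the printed numbers reflect the models rather than the bookkeeping.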
