#run as-is
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
data = make_classification(n_samples=10000, random_state=666, n_informative=6)
X = pd.DataFrame(data[0])
y = data[1]
data = X.copy()
data['target'] = y
#your work here
- Why do we standardize after the train test split, and not before?
- Why do we scale the training data separately from the testing data?
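One way to see what the two questions above are getting at is to fit the scaler on the training rows only and then reuse its statistics on the test rows. The sketch below is only an illustration: `test_size=0.25` and `random_state=42` are assumed values, not taken from this notebook, and it works on the raw NumPy arrays rather than the `X` DataFrame.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same synthetic data as the setup cell (20 features by default).
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)

# Split first, so no test-set statistics leak into the scaler.
# test_size and random_state here are assumptions for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Learn the mean and standard deviation from the training rows only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Apply the *training* statistics to the test rows; do not refit.
X_test_scaled = scaler.transform(X_test)

# Training columns are centered at 0 with unit variance; test columns
# are only approximately so, because they were not used in the fit.
print(np.round(X_train_scaled.mean(axis=0)[:3], 6))
print(np.round(X_test_scaled.mean(axis=0)[:3], 6))
```

Calling `fit_transform` on the test set instead would give the test data its own mean and variance, so the model would see test features on a different scale from the one it was trained on.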
#your work here
Create a logistic regression model with the first three features of the training data (with no regularization)
#your work here
- Assign them to train_preds_3
#your work here
- Assign them to test_preds_3
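A minimal sketch of the three-feature model and its predictions might look like the following. It is one possible approach, not the reference solution. Regularization is switched off here by setting `C` to a very large value, which is an approximation; recent scikit-learn releases also accept `penalty=None`, depending on the version installed. The split parameters and the name `model_3` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the data and a scaled train/test split so this cell runs alone.
# test_size and random_state are assumed values, not from the notebook.
X, y = make_classification(n_samples=10000, random_state=666, n_informative=6)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Keep only the first three feature columns.
X_train_3 = X_train_scaled[:, :3]
X_test_3 = X_test_scaled[:, :3]

# A very large C makes the penalty term negligible, which approximates
# an unregularized fit (C is the inverse of regularization strength).
model_3 = LogisticRegression(C=1e10, max_iter=1000)
model_3.fit(X_train_3, y_train)

# Class predictions for both splits, using the names from the prompts.
train_preds_3 = model_3.predict(X_train_3)
test_preds_3 = model_3.predict(X_test_3)
```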
#your work here
#your work here
#your work here
#your work here
- Generate confusion matrices and calculate accuracy, precision and recall as you did above
- BONUS: use functions to do so!
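For the bonus, one option is a small helper that wraps `confusion_matrix` and derives the three metrics from its cells. The function name `classification_summary` and the toy labels below are hypothetical, chosen only to keep the example self-contained; in the notebook you would pass your own targets and predictions.

```python
from sklearn.metrics import confusion_matrix


def classification_summary(y_true, y_pred):
    """Return accuracy, precision and recall for binary 0/1 labels."""
    # For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    return {
        "accuracy": float((tp + tn) / total),
        # Guard against division by zero when a class is never predicted
        # or never present.
        "precision": float(tp / (tp + fp)) if (tp + fp) > 0 else 0.0,
        "recall": float(tp / (tp + fn)) if (tp + fn) > 0 else 0.0,
    }


# Toy labels (made up for illustration): 3 TP, 3 TN, 1 FP, 1 FN.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_metrics = classification_summary(toy_true, toy_pred)
print(toy_metrics)
```

Calling the same helper on the training and test predictions of each model keeps the comparisons consistent and avoids repeating the arithmetic by hand.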
How is the problem you diagnosed in the 3-variable model altered in the 10-variable and 20-variable models?
#your work here