- Data pre-processing on the Insurance.csv dataset (Data_Mining.ipynb).
- Handling missing values in the dataset (Missing_values.ipynb).
- Binning and scaling in the dataset.
- Insurance.csv is the dataset used for pre-processing.
- Wholesale customers data
- Pima-indians-diabetes.data.csv (https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv)
- KNN (lazy learning and non-parametric learning)
- HMM (Hidden Markov Model and Viterbi)
- K_Means Clustering (Hierarchical Clustering)
- Random Forest Classifier
- Accuracy
- Confusion Matrix
- Precision
- Recall (Sensitivity / True Positive Rate)
- F1 score
- ROC curve
- Precision-recall curve
- Average Precision
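
The classification metrics listed above can be computed with scikit-learn. The snippet below is a minimal sketch, not the notebooks' actual code: the labels in `y_true`, the hard predictions in `y_pred`, and the scores in `y_score` are made-up illustrative values, and ROC AUC is used here as a single-number summary of the ROC curve.

```python
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth, hard predictions, and predicted probabilities.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]

# Metrics based on hard predictions.
accuracy = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)  # sensitivity / true positive rate
f1 = f1_score(y_true, y_pred)

# Metrics based on scores (they summarize the ROC and precision-recall curves).
roc_auc = roc_auc_score(y_true, y_score)
avg_precision = average_precision_score(y_true, y_score)

print("accuracy:", accuracy)
print("confusion matrix:\n", cm)
print("precision:", precision, "recall:", recall, "f1:", f1)
print("ROC AUC:", roc_auc, "average precision:", avg_precision)
```

To plot the full curves rather than their summaries, `sklearn.metrics.roc_curve` and `sklearn.metrics.precision_recall_curve` return the points to draw.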
Internal goodness metrics: These do not use any external information and assess the goodness of clusters based only on the initial data.
External metrics: These use information about the known true split (the ground-truth labels).
- Adjusted Rand Index (ARI)
- Adjusted Mutual Information (AMI)
- Homogeneity, Completeness, V-Measure
- Silhouette
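
The clustering metrics above can be illustrated with scikit-learn as follows. This is a minimal sketch under stated assumptions: the data comes from `make_blobs` (synthetic, not one of the datasets listed earlier), and the choice of 3 clusters and the `random_state` values are arbitrary. ARI, AMI, homogeneity, completeness, and V-measure compare the predicted labels with the known true labels, while the silhouette score uses only the data and the predicted labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_completeness_v_measure,
    silhouette_score,
)

# Synthetic data with known cluster membership (assumption for illustration).
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Cluster the data; the label ids themselves are arbitrary.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# External metrics: require the true split.
ari = adjusted_rand_score(y_true, labels)
ami = adjusted_mutual_info_score(y_true, labels)
homogeneity, completeness, v_measure = homogeneity_completeness_v_measure(
    y_true, labels
)

# Internal metric: uses only the data and the predicted labels.
silhouette = silhouette_score(X, labels)

print(f"ARI={ari:.3f} AMI={ami:.3f}")
print(f"homogeneity={homogeneity:.3f} completeness={completeness:.3f} "
      f"V-measure={v_measure:.3f}")
print(f"silhouette={silhouette:.3f}")
```

The external metrics are invariant to how cluster ids are numbered, so a clustering that matches the true split under a different labeling still scores 1.0 on ARI.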