- To become familiar with implementing a machine learning algorithm from scratch, without using any machine learning API.
- To apply the KNN algorithm to classify the age of abalone using the Abalone dataset.
- To become familiar with evaluating the performance of a machine learning algorithm.
There are 4,177 observations in the dataset, each with 8 input attributes and 1 output variable. The attributes are as follows:
- Sex [Male (M), Female (F), or Infant (I)]
- Length
- Diameter
- Height
- Whole weight
- Shucked weight
- Viscera weight
- Shell weight
- Rings (output)
- Use the function `loadData()` to load data from file. The command `X = loadData('abalone.data')` returns an array of size 4177 × 9. In this function, the values of the first attribute have been converted into floats:
- M: 0.333
- F: 0.666
- I: 1.000
- Normalize the dataset. You are to normalize the 8 input attributes by writing a function, `dataNorm()`. The normalization equation is given as: `(data - min) / (max - min)`
- Split the dataset into training and testing sets by: (i) using the train-and-test split method; (ii) using the k-fold cross-validation method. Note that the k-value here is different from the K-value in the KNN algorithm. Set the k value to 5, 10, and 15 respectively.
- Implement the KNN algorithm by writing a function, `KNN()`. You can use the Euclidean distance as the similarity measure for any two samples.
- Use the `classification_report()` function provided by the scikit-learn library to construct a classification report for the 5-fold cross-validation with K = 15.
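
The normalization, splitting, and KNN steps above can be sketched in plain Python as follows. This is only one possible outline, not the expected solution. All function names (`data_norm`, `split_train_test`, `k_fold_splits`, `knn_predict`, `knn`, `cross_validate`) are hypothetical, and so are the 0.7 training fraction and the fixed shuffle seed. Each row is assumed to hold the input attributes followed by the label in the last position. The toy rows are made up for illustration and are not abalone records.

```python
import math
import random
from collections import Counter

def data_norm(rows, n_inputs=8):
    """Min-max scale the first n_inputs columns to [0, 1]; other columns are untouched."""
    out = [list(r) for r in rows]  # copy so the caller's data is not modified
    for j in range(n_inputs):
        col = [r[j] for r in rows]
        lo, span = min(col), max(col) - min(col)
        for r in out:
            # A constant column has zero range; map it to 0.0 instead of dividing by zero.
            r[j] = (r[j] - lo) / span if span else 0.0
    return out

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle, then split into (train, test). The 0.7 default is an assumption."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(rows, k, seed=0):
    """Yield k (train, test) pairs; every row is used for testing exactly once."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, K):
    """Majority label among the K training rows closest to query."""
    nearest = sorted(train, key=lambda r: euclidean(r[:-1], query))[:K]
    return Counter(r[-1] for r in nearest).most_common(1)[0][0]

def knn(train, test, K):
    """Return (y_true, y_pred, accuracy) for the test rows."""
    y_true = [r[-1] for r in test]
    y_pred = [knn_predict(train, r[:-1], K) for r in test]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(test)
    return y_true, y_pred, accuracy

def cross_validate(rows, k, K):
    """Pool true and predicted labels over all k folds."""
    y_true_all, y_pred_all = [], []
    for train, test in k_fold_splits(rows, k):
        y_true, y_pred, _ = knn(train, test, K)
        y_true_all.extend(y_true)
        y_pred_all.extend(y_pred)
    return y_true_all, y_pred_all

# Toy data: two input columns followed by a label (made up, not abalone records).
toy = [[0, 0, "a"], [0, 1, "a"], [1, 0, "a"], [1, 1, "a"],
       [8, 8, "b"], [8, 9, "b"], [9, 8, "b"], [9, 9, "b"]]
scaled = data_norm(toy, n_inputs=2)
y_true, y_pred = cross_validate(scaled, k=4, K=3)
```

For the report requested in the last step, the pooled lists from a call such as `cross_validate(rows, k=5, K=15)` could be passed to scikit-learn's `classification_report(y_true, y_pred)`. Note that the lowercase `k` selects the number of folds while the uppercase `K` selects the number of neighbours, matching the distinction drawn in the task description.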