▪In this project, we analyzed a dataset containing various health metrics from cancer patients.
The training dataset contained 1514 rows and 16340 columns, i.e., 1514 samples and 16340 features.
▪With far more features than samples (16340 / 1514 ≈ 10.8 features per sample), the feature-to-sample ratio was high and overfitting was likely. Feature selection was therefore needed to build a reliable ML model.
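The ratio above is easy to check. One common way to shrink such a feature space is a univariate filter: score each feature by the one-way ANOVA F-statistic across classes, then keep the k highest-scoring features. This is the idea behind scikit-learn's `SelectKBest` with `f_classif`. The sketch below is a minimal, dependency-free illustration under that assumption. The project may well have used a different selection method. The function names `feature_to_sample_ratio`, `anova_f` and `select_k_best` and the toy matrix are hypothetical.

```python
from statistics import mean


def feature_to_sample_ratio(n_samples: int, n_features: int) -> float:
    """Features per sample; values well above 1 signal a high overfitting risk."""
    return n_features / n_samples


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Return the (sorted) indices of the k columns of X with the highest F-scores."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Ratio for the dataset described above.
print(round(feature_to_sample_ratio(1514, 16340), 1))  # 10.8

# Hypothetical toy data: 6 samples, 4 features, the three disease classes.
X = [
    [1.0, 5.0, 0.1, 9.0],
    [1.2, 5.1, 0.2, 1.0],
    [3.0, 5.0, 0.5, 4.0],
    [3.1, 4.9, 0.6, 6.0],
    [6.0, 5.2, 0.9, 2.0],
    [6.2, 5.0, 1.0, 8.0],
]
y = [
    "breast invasive carcinoma", "breast invasive carcinoma",
    "lung squamous cell carcinoma", "lung squamous cell carcinoma",
    "lung adenocarcinoma", "lung adenocarcinoma",
]
# Columns 0 and 2 separate the classes; columns 1 and 3 are noise.
print(select_k_best(X, y, k=2))  # [0, 2]
```

A filter like this is cheap even with 16340 columns because it scores each feature independently. The trade-off is that it ignores interactions between features. To avoid leaking information into the evaluation, the selection should be fitted on the training split only.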
▪The primary disease column contains three unique values: ‘breast invasive carcinoma’, ‘lung squamous cell carcinoma’, and ‘lung adenocarcinoma’.
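Before training, it can help to confirm the label set, check class balance, and map each disease name to an integer class id. The sketch below does this with the standard library on a tiny in-memory CSV. The file contents, the `sample_id` and `primary_disease` headers, and the `label_summary` helper are all hypothetical, since the text does not give the actual file layout.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: the real file name and column headers are not given in the text.
SAMPLE_CSV = """sample_id,primary_disease
s1,breast invasive carcinoma
s2,lung squamous cell carcinoma
s3,lung adenocarcinoma
s4,breast invasive carcinoma
s5,lung adenocarcinoma
"""


def label_summary(rows, column="primary_disease"):
    """Count each class and assign it a stable integer id (alphabetical order)."""
    counts = Counter(row[column] for row in rows)
    mapping = {label: i for i, label in enumerate(sorted(counts))}
    return counts, mapping


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts, mapping = label_summary(rows)
print(len(counts))  # 3
print(mapping)
# {'breast invasive carcinoma': 0, 'lung adenocarcinoma': 1, 'lung squamous cell carcinoma': 2}
```

Sorting the labels before numbering them keeps the ids reproducible across runs, which matters when predictions are mapped back to disease names.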