Correctly identifying driver mutations in a patient’s tumor is a major challenge in precision oncology. This project applies logistic regression, k-nearest neighbors (KNN), and support vector machines (SVM) to classify the transcriptional activity of p53 mutants.
Notes:
- Store the data files in the same repository as the source code.
- Before running any of the analysis code, clean the data: run "clean_data.py" to produce "cleaned_K8.csv". All of the methods below are then run on "cleaned_K8.csv".
- Logistic regression is implemented in the following files: "Logistic Regression_imb.ipynb", "Logistic Regression_SMOTE.ipynb", "Logistic Regression_SMOTE_underSampling.ipynb".
- K-nearest neighbors is implemented in: "ML KNN - update.ipynb".
- Support vector machine is implemented in: "imbalanced_svm.ipynb".
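For orientation, the following is a minimal sketch of how the three methods could be compared side by side with scikit-learn. It is not the code in the notebooks. The column layout of "cleaned_K8.csv" is not described here, so the demo uses synthetic, imbalanced data from `make_classification` as a stand-in. The helper name `compare_classifiers`, the 75/25 stratified split, `n_neighbors=5`, the RBF kernel, and the use of balanced accuracy are all illustrative assumptions. The sketch handles class imbalance with `class_weight="balanced"` rather than the SMOTE or undersampling used in some of the notebooks; that is a swapped-in technique, chosen only to keep the example dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, seed=0):
    """Hypothetical helper: fit LR, KNN and SVM on one stratified split and
    return the balanced accuracy of each model on the held-out part."""
    # Stratify so the rare class appears in both train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # class_weight="balanced" reweights the rare class instead of resampling it
        "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
        # KNN has no class_weight option; it sees the imbalanced data as-is
        "knn": KNeighborsClassifier(n_neighbors=5),
        "svm": SVC(kernel="rbf", class_weight="balanced"),
    }
    scores = {}
    for name, model in models.items():
        # Standardize features so distance- and margin-based models are not
        # dominated by large-valued columns
        pipe = make_pipeline(StandardScaler(), model)
        pipe.fit(X_train, y_train)
        # Balanced accuracy averages recall over classes, so always predicting
        # the majority class scores only 0.5
        scores[name] = balanced_accuracy_score(y_test, pipe.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the cleaned data: roughly 5% minority class
    X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                               weights=[0.95, 0.05], random_state=0)
    for name, score in compare_classifiers(X, y).items():
        print(f"{name}: balanced accuracy = {score:.3f}")
```

With the real data, `X` and `y` would be the feature matrix and activity labels read from "cleaned_K8.csv". How they are laid out in that file depends on "clean_data.py".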