BioML: Machine Learning for Biomedical Data (Georgetown School of Medicine Graduate Course)
This course covers practical and conceptual aspects of applying machine learning to high-throughput biomedical data using Python. Students will gain an understanding of the opportunities and limitations of machine learning in pre-clinical and clinical research. The course combines online resources, practical assignments, and live sessions conducted online. We will review several project examples, built on data from public repositories, that demonstrate the successes and limitations of conventional machine learning (ML) methods and deep learning (DL). On completing this course, each participant should be able to differentiate between methods, apply an appropriate method to a given dataset or problem statement, and develop a complete project using ML or DL.
Readings will be provided in the form of relevant publications. All additional content, including coding practice, will be available through the OmicsLogic Learn portal: learn.omicslogic.com
- OmicsLogic Learn portal: https://learn.omicslogic.com
- T-BioInfo server for Big Data Analysis: https://server.t-bio.info
- Recommended reading: Deep Learning in Omics Data Analysis and Precision Medicine (book - https://www.ncbi.nlm.nih.gov/books/NBK550335/)
- Overview of Machine Learning Part 1: Fundamentals and Classic Approaches https://www.sciencedirect.com/science/article/pii/S1052514920300629?via%3Dihub
SYLLABUS SYSM-578 (https://systemsmedicine.georgetown.edu)
- Introduction to the course: objectives and outcomes (refer to https://learn.omicslogic.com)
- Data Processing and Exploratory Analysis (https://learn.omicslogic.com/courses/course/course-7-bioml-machine-learning-for-biomedical-data)
- Machine Learning Methods: unsupervised and supervised types of analysis
- Dimensionality Reduction: Ordination and Embedding
- Unsupervised Learning: Clustering
- Supervised Learning: Discriminant Analysis and Classification
- Explainable AI: Feature selection
- Classification vs. Regression
- Generalized Linear Models: an introduction to Deep Learning
- Network analysis: neighborhoods, manifold, and regression
- Deep Learning: Multi-layer Perceptron (MLP), Network Topology, Activation Functions
- Model Accuracy and Validation: Cross Validation, Randomized and Grid Search for Hyperparameter Optimization
- Project Examples and Case Studies (https://learn.omicslogic.com/courses)
- How to design your data science project (https://learn.omicslogic.com/courses/course/course-9-designing-a-bioinformatics-research-project)
- Project Submissions and Final Exam (https://learn.omicslogic.com/projects)
- T-test, F-test, chi-square, ANOVA and Regression
- PCA, t-SNE, LDA, Clustering (hierarchical, k-means, DBSCAN, Fuzzy, PAM)
- Classification: Decision Trees, Random Forest, Support Vector Machine, Naive Bayes
- Feature Selection Strategies (Feature Significance & Greedy Methods)
- Deep Learning: Deep Feedforward Neural Network (DFNN), Convolutional Neural Network (CNN) and other implementations for time-series data.
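The validation topics in the syllabus above (cross-validation and grid search for hyperparameter optimization) can be sketched with one of the listed classifiers. This is a minimal illustration, not course material: the data is synthetic (generated with scikit-learn's `make_classification` as a stand-in for a samples-by-features expression matrix), and the parameter grid, fold count, and split size are arbitrary assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an omics table: 120 samples x 50 features (assumption).
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=8, n_redundant=0, random_state=0
)

# Hold out a test set that the hyperparameter search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is re-fit on each training fold only.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(random_state=0)),
    ]
)

# Illustrative grid; real projects would choose ranges suited to the data.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [3, None],
}

# 5-fold stratified cross-validation keeps class proportions in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
cv_accuracy = search.best_score_          # mean accuracy across the 5 folds
test_accuracy = search.score(X_test, y_test)  # accuracy on the held-out set

print("Best parameters:", best_params)
print("Cross-validated accuracy:", round(cv_accuracy, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the `Pipeline` means preprocessing is learned from the training folds alone, which avoids leaking information from validation data into the model. `RandomizedSearchCV` from the same module accepts a similar setup when the grid is too large to search exhaustively.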
- Loading data from csv, txt, or xlsx sources and converting it to various data structures (dataframe, matrix, lists and vectors)
- Summarizing categorical and continuous datasets
- Data preparation using log-normal transformation and quantile normalization
- Statistical tests and outputs (p-value, t-value, standard error, FDR, logFC)
- Popular packages like pandas, numpy, and sklearn
- Visualization using matplotlib, seaborn and plotly
- Reading, understanding and loading code examples
General Coding & Data Sharing Practices:
- Organizing your scripts with comments and functions (syntax)
- Setting up a development environment (IDE)
- Dealing with errors and troubleshooting code (debugging)
- Preparing data summaries and submitting curated data and metadata tables to sharing repositories (FAIR principles)
- Sharing your analysis in Jupyter notebooks, on GitHub or Google Colab
- Creating interactive visualizations in plotly
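The data preparation step listed above can be sketched in plain numpy. This is a minimal illustration under stated assumptions: the log step is taken to be a log2 transform with a pseudocount of 1, the matrix is laid out with features in rows and samples in columns, and the function names `log2_transform` and `quantile_normalize` are hypothetical helpers, not part of any course package. Ties are assigned ranks in order of appearance rather than averaged, which is a simplification.

```python
import numpy as np


def log2_transform(counts, pseudocount=1.0):
    """Return log2(counts + pseudocount); the pseudocount avoids log2(0)."""
    return np.log2(np.asarray(counts, dtype=float) + pseudocount)


def quantile_normalize(matrix):
    """Force every column (sample) to share the same value distribution.

    Rows are features, columns are samples. Each value is replaced by the
    mean, across samples, of the values holding the same rank.
    """
    matrix = np.asarray(matrix, dtype=float)
    # Row indices that would sort each column independently.
    order = np.argsort(matrix, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    # Mean of the k-th smallest value across all samples.
    rank_means = sorted_vals.mean(axis=1)
    normalized = np.empty_like(matrix)
    # Write each rank mean back to the position its rank came from.
    np.put_along_axis(normalized, order, rank_means[:, None], axis=0)
    return normalized


# Toy count matrix: 4 features (rows) x 3 samples (columns).
counts = np.array(
    [
        [5, 4, 3],
        [2, 1, 4],
        [3, 4, 6],
        [4, 2, 8],
    ]
)

raw_normalized = quantile_normalize(counts)
log_normalized = quantile_normalize(log2_transform(counts))

print(np.round(raw_normalized, 3))
print(np.round(log_normalized, 3))
```

After normalization, sorting any column gives the same sequence of values, so differences between samples reflect rank changes of individual features rather than overall shifts in scale.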
The course is open to those who are just getting started and does not require in-depth knowledge of programming or machine learning. Some background in the basics of molecular biology and an introduction to bioinformatics is preferred. Please complete the following free tutorials to get a head start:
- Bytes and Molecules (https://learn.omicslogic.com/courses/course/course-1-bytes-and-molecules)
- Getting Started with Bioinformatics in Python (https://learn.omicslogic.com/courses/course/getting-started-with-bioinformatics-in-python)
- Understanding of analytical methods for processing, visualization, and analysis of complex biomedical data
- Learning terminology for machine learning and artificial intelligence in biomedical discovery
- Becoming familiar with project examples where ML was used effectively to achieve meaningful results
- Hands-on practice in applying standard unsupervised and supervised learning methods to various types of data, such as genomic, transcriptomic, metagenomic, imaging, and clinical
- Understand the ML taxonomy and the commonly used machine learning algorithms for analyzing “omics” data
- Understand the differences between categories of ML algorithms and the kinds of problems to which each can be applied
- Understand how ML is applied across different -omics studies and project design objectives
- Use popular Python packages for data visualization, analysis and ML
- Interpret and visualize the results obtained from ML analyses on omics datasets
- Apply ML techniques to analyze public-domain datasets or their own data