Giter VIP home page Giter VIP logo

alzheimers-prediction's Introduction

Alzheimer's Diagnosis

Data Science Project by Grant Gasser under advisement of Dr. Joshua Patrick

ADNI Picture

Table of Contents

  1. Summary of Alzheimer's
  2. Project Motivation
  3. First Data Set
  4. Prediction Models, First Data Set

Summary of Alzheimer's Disease

Alzheimer's disease (AD) is a progressive neurodegenerative disease. Though best known for its role in declining memory function, symptoms also include: difficulty thinking and reasoning, making judgements and decisions, and planning and performing familiar tasks. It may also cause alterations in personality and behavior. The cause of AD is not well understood. There is thought to be a significant hereditary component. For example, a variation of the APOE gene, APOE e4, increases risk of Alzheimer's disease. Pathologically, AD is associated with amyloid beta plaques and neurofibrillary tangles.

Diagnosis

Onset of the disease is slow and early symptoms are often dismissed as normal signs of aging. A diagnosis is typically given based on history of illness, cognitive tests, medical imaging, and blood tests.

Treatment

There is no medication that stops or reverses the progression of AD. There are two types of drugs that attempt to treat the cognitive symptoms:

  • Acetylcholinesterase Inhibitors that work to prevent the breakdown of acetylcholine, a neurotransmitter critical in memory and cognition.
  • Memantine (Namenda), which works to inhibit NMDA receptors in the brain.

These medications can slightly slow down the progression of the disease.

Prevention

It is thought that frequent mental and physical exercise may reduce risk.


Project Motivation

The Alzheimer's Association estimates nearly 6 million Americans suffer from the disease and it is the 6th leading cause of death in the US. The estimated cost of AD was $277 billion in the US in 2018. The association estimates that early and accurate diagnoses could save up to $7.9 trillion in medical and care costs over the next few decades.

Sources: Mayo Clinic, Alzheimer's Association, Wikipedia

Project Description:

Using data provided by the ADNI Project, it is our goal to develop a computer model that assists in the diagnosis of the disease. We will try multiple models recently popularized in machine learning (Neural Network, SVM, etc.) and more traditional statistical models such as ordinal regression, multinomial regression, and decision trees.


First Data Set: ADNI Q3

  • 628 observations, 15 features (will likely use subset of features)

  • Labels: (CN, LMCI, AD)

  • Class Label distribution: alt-text

  • Features include age, gender, years of education, race, genotype, cognitive test score (MMSE), and more

  • There are six error scenarios:

Prediction Actual Error Type
CN LMCI False Negative
CN AD False Negative
LMCI CN False Positive
LMCI AD ?
AD CN False Positive
AD LMCI ?

Important Note: The models using this data set assume the physician diagnoses (DX.bl) are correct.


Prediction Models

Ordinal Regression (Ranking Learning) in R (CN < LMCI < AD)

  • File: ordinal.R
  • Features/Predictor Variables Used: AGE (Age at baseline), PTGENDER (Sex), PTEDUCAT (Years of Education), PTRACCAT (Race), APOE4 (APOE4) genotype, MMSE (MMSE score), Imputed_genotype (Challenge specific designation, TRUE=has imputed genotypes)

Results: 70% Test Accuracy (110/157)

  • Main problem with the model is False Negatives. As pointed out at the end of the script, when the model makes incorrect predictions, it often predicts Cognitively Normal (CN) when a patient has Limited Mild Conitive Impairment (LMCI) or Alzheimer's (AD). Roughly 50% of the errors were False Negatives.
  • This leads to a model with low sensitivity.

Proposed Solution: Only predict CN if P(CN) > some threshold instead of predicting max(P(CN), P(LMCI), P(AD)). This should reduce the amount of CN predictions and thus, reduce the amount of False Negatives.

Multi-Class Prediction in Python (Jupyter Notebook)

  • File: Multi-Class Classification Jupyter Notebook

  • Since the data was processed with Scikit-Learn, it was easy to try several models using the library such as logistic regression, random forest, k-nearest-neighbor, and multi-layer perceptron.

  • 5-Fold Cross Validation: logistic regression had the highest validation score of .69

Results: 74% Test Accuracy

alzheimers-prediction's People

Contributors

grantgasser avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.