Adverse Drug Event Analysis with Hadoop, R, and Gephi

Introduction

This project contains code for analyzing adverse drug events with the Multi-Item Gamma Poisson Shrinker (MGPS) model described in the paper "Empirical Bayes Screening for Multi-Item Associations".

Prerequisites

Code

This analysis is small enough to run on a single machine if you do not have access to a Hadoop cluster. You will need CDH3 installed on your local machine, along with a version of Pig that is compatible with it.

You will need Maven to compile the Pig user-defined functions (UDFs). You may also want a copy of R and Gephi for certain phases of the analysis.

Data

The input data for this analysis may be downloaded from the FDA's AERS website. Download the ASCII version of the data files for as many quarters as you want to analyze. For my own analysis, I used the data from 2008 through 2010.

The Pig scripts below assume that the input data is stored in three HDFS directories under the user's home directory: aers/drugs, aers/demos, and aers/reactions. All of the DRUG*.TXT files from the AERS website should go into aers/drugs, all of the DEMO*.TXT files should go into aers/demos, and all of the REAC*.TXT files should go into aers/reactions.
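The directory layout above can be sketched as a small routing function, which is handy for checking a download folder before uploading anything. This is only an illustration: the names `LAYOUT`, `target_dir`, and `plan_uploads` are hypothetical and are not part of this project, and the quarter-style file names in the usage note are assumed examples.

```python
from fnmatch import fnmatchcase

# File-name patterns and their HDFS target directories, as described above.
LAYOUT = [
    ("DRUG*.TXT", "aers/drugs"),
    ("DEMO*.TXT", "aers/demos"),
    ("REAC*.TXT", "aers/reactions"),
]


def target_dir(filename):
    """Return the HDFS directory for an AERS file, or None if the pipeline ignores it."""
    upper = filename.upper()  # tolerate lower-case file names
    for pattern, directory in LAYOUT:
        if fnmatchcase(upper, pattern):
            return directory
    return None


def plan_uploads(filenames):
    """Group file names by target directory, skipping files that match no pattern."""
    plan = {}
    for name in sorted(filenames):
        directory = target_dir(name)
        if directory is not None:
            plan.setdefault(directory, []).append(name)
    return plan
```

For example, `plan_uploads(["DRUG08Q1.TXT", "REAC10Q4.TXT", "notes.txt"])` groups the first two names under `aers/drugs` and `aers/reactions` and leaves `notes.txt` out.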

Running the Pipeline

If you have not done so already, load the input data into the Hadoop cluster:

hadoop fs -mkdir aers
hadoop fs -mkdir aers/drugs
hadoop fs -put DRUG*.TXT aers/drugs
hadoop fs -mkdir aers/demos
hadoop fs -put DEMO*.TXT aers/demos
hadoop fs -mkdir aers/reactions
hadoop fs -put REAC*.TXT aers/reactions

Run each of the following commands from the project's top-level directory, i.e., the directory that contains this README file.

mvn package  # Builds the Pig UDFs
pig -f src/main/pig/step1_join_drugs_reactions.pig
pig -f src/main/pig/step2_generate_drug_reaction_counts.pig
pig -f src/main/pig/step3_generate_squashed_distribution.pig

At this point, you can optionally run the R code to solve the MGPS optimization problem. If you do not already have the BB package, install it in your local copy of R with install.packages("BB").

hadoop fs -getmerge aers/drugs2_reacs_stats d2r_stats.csv
Rscript src/main/R/ebgm.R d2r_stats.csv
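For orientation, the optimization can be pictured as a maximum-likelihood fit of the MGPS prior. The sketch below assumes the usual formulation: each observed count n with expected count e is Poisson with mean lambda * e, and lambda has a prior that mixes two gamma distributions, so the marginal distribution of n is a mixture of two negative binomials. It is not a port of ebgm.R. The function names, the `(n, e)` input pairs, and the parameter order are assumptions made for illustration, and the real column layout of d2r_stats.csv may differ.

```python
import math


def log_nb(n, e, alpha, beta):
    """Log marginal probability of count n given expected count e > 0, when
    n ~ Poisson(lam * e) and lam ~ Gamma(shape=alpha, rate=beta)."""
    return (
        math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
        + alpha * math.log(beta / (beta + e))
        + n * math.log(e / (beta + e))
    )


def neg_log_likelihood(params, observations):
    """Negative log-likelihood of the two-component mixture.

    params is (alpha1, beta1, alpha2, beta2, p), where p is the weight of the
    first component; observations is an iterable of (n, e) pairs. Both layouts
    are assumptions of this sketch."""
    alpha1, beta1, alpha2, beta2, p = params
    total = 0.0
    for n, e in observations:
        a = math.log(p) + log_nb(n, e, alpha1, beta1)
        b = math.log(1.0 - p) + log_nb(n, e, alpha2, beta2)
        m = max(a, b)  # log-sum-exp, to avoid underflow on large counts
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return -total
```

A general-purpose optimizer would then minimize `neg_log_likelihood` over the five parameters, subject to all four gamma parameters being positive and 0 < p < 1.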

You can plug the parameters produced by the optimization run into the Pig script that scores the tuples, or keep the default parameters already in that script:

pig -f src/main/pig/step4_apply_ebgm.pig
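As a rough picture of what scoring a tuple involves, the sketch below computes an EBGM-style score (the exponential of the posterior mean of log lambda) under the standard two-gamma-mixture MGPS formulation. It is an assumption-laden illustration, not the logic of step4_apply_ebgm.pig: the function names and argument order are hypothetical, no parameter values from the script are reproduced, and `digamma` is a small pure-Python approximation included only to keep the example self-contained.

```python
import math


def digamma(x):
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def log_nb(n, e, alpha, beta):
    """Log negative-binomial marginal of count n given expected count e > 0."""
    return (
        math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
        + alpha * math.log(beta / (beta + e))
        + n * math.log(e / (beta + e))
    )


def ebgm(n, e, alpha1, beta1, alpha2, beta2, p):
    """exp(E[log lam | n]) when the prior on lam is
    p * Gamma(alpha1, beta1) + (1 - p) * Gamma(alpha2, beta2)."""
    a = math.log(p) + log_nb(n, e, alpha1, beta1)
    b = math.log(1.0 - p) + log_nb(n, e, alpha2, beta2)
    m = max(a, b)
    wa, wb = math.exp(a - m), math.exp(b - m)
    q = wa / (wa + wb)  # posterior weight of the first component
    mean_log = (
        q * (digamma(alpha1 + n) - math.log(beta1 + e))
        + (1.0 - q) * (digamma(alpha2 + n) - math.log(beta2 + e))
    )
    return math.exp(mean_log)
```

The point of the shrinkage is visible with a single rare pair: for n = 1 and e = 0.01 the raw ratio n / e is 100, while `ebgm` with a moderately informative prior returns a much smaller value, because one observation is weak evidence.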

The final output will be in aers/scored_drugs2_reacs. To generate the GEXF file of drug-drug interactions to load into Gephi, run:

hadoop fs -getmerge aers/scored_drugs2_reacs scored_d2r.csv
./src/main/python/gephi.py scored_d2r.csv > drugs.gexf
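For readers unfamiliar with GEXF, the sketch below writes a minimal GEXF 1.2 graph from weighted edges using only the Python standard library. How gephi.py actually derives edges from scored_d2r.csv is not described here, so the `(source, target, weight)` input shape and the `build_gexf` name are assumptions, and the drug names and weights in the usage note are invented placeholders rather than results.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(edges):
    """Return a GEXF 1.2 document (as a string) for an undirected graph.

    edges is an iterable of (source_label, target_label, weight) tuples;
    a node is created the first time a label is seen."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    node_ids = {}

    def node_id(label):
        # Assign sequential ids and emit each node exactly once.
        if label not in node_ids:
            node_ids[label] = str(len(node_ids))
            ET.SubElement(nodes_el, "node", {"id": node_ids[label], "label": label})
        return node_ids[label]

    for index, (source, target, weight) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(index),
            "source": node_id(source),
            "target": node_id(target),
            "weight": str(weight),
        })
    return ET.tostring(root, encoding="unicode")
```

Calling `build_gexf([("DRUG_A", "DRUG_B", 2.5), ("DRUG_A", "DRUG_C", 1.2)])` yields a document with three nodes and two edges that Gephi should be able to open.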

Contributors

jwills
