##IBMBigDataCertification - Analysis of Medicare Plans

##Problem Statement

Several Medicare plans, offered by different health insurance companies, are available every year for senior citizens and other qualified members to enrol in. While a lot of information exists about individual plans, it is difficult to compare plans on multiple criteria. Members need such comparisons to make an informed choice that suits their individual situations, and plan benefit designers need them to design benefits that meet specific requirements and stay competitive in different markets. The purpose of this document is to detail the analysis of Medicare plan data across the US and provide useful summary details.

##Project Description

The primary purpose of this project is to facilitate analysis of Medicare plans and provide meaningful insights that help members choose an appropriate plan, by comparing all relevant details of the plans available in each county throughout the country.

While CMS provides rich details on all the plans offered in each county, the analysis is more useful when the plans are compared on the finer details of their cost and coverage.

##Objectives

  1. To implement an efficient system to extract, load and transform all data related to Medicare plans for analytics.
  2. To analyze Medicare plans and compare plan offerings by various criteria, so that members can select a suitable plan.
  3. To analyze Medicare plans and compare plan offerings, so that suitable benefit plans can be designed for different regions.

##Analysis Required for this project

  1. Identify the top 5 plans with the lowest premiums for a given county in the US
  2. Find the plans that have the highest co-pays for doctors in a given county
  3. Compare plans based on features, such as plans that offer free ambulance services
  4. Compare plans based on features, such as the benefits available for diabetes under a specific plan
  5. Compare plan benefits for diabetes and mental healthcare offered by all companies in a particular county
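The first analysis in the list above (lowest premiums in a given county) can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the project's Hive query. The record layout, the field names `planname`, `countyid` and `premium`, and the sample values are all assumptions made for the example.

```python
import heapq


def lowest_premium_plans(plans, county_id, n=5):
    """Return up to n plans with the lowest premium in one county."""
    in_county = (p for p in plans if p["countyid"] == county_id)
    return heapq.nsmallest(n, in_county, key=lambda p: p["premium"])


# Hypothetical records; the plan names and premiums are invented.
plans = [
    {"planname": "Plan A", "countyid": "39023", "premium": 42.0},
    {"planname": "Plan B", "countyid": "39023", "premium": 0.0},
    {"planname": "Plan C", "countyid": "19011", "premium": 10.0},
    {"planname": "Plan D", "countyid": "39023", "premium": 18.5},
]

cheapest = lowest_premium_plans(plans, "39023", n=2)
print([p["planname"] for p in cheapest])
```

The same shape of query (filter by county, order by premium, keep the first few rows) applies to the highest co-pay analysis with the ordering reversed.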

##Solution Architecture

The proposed solution uses the Hadoop framework and its ecosystem tools to implement distributed storage and processing of all the data needed for the analysis.

Client Machine: The Medicare files are downloaded from the Medicare website onto the client machine.

HDFS: The downloaded files are uploaded onto the Hadoop Distributed File System (HDFS). The upload is done on a local machine running Hadoop in pseudo-distributed mode.

PIG: The raw Medicare files on HDFS are cleansed and transformed using Pig Latin scripts, and the results are written as CSV files to an HDFS location.

HIVE: Partitioned Hive tables are built over the cleansed plan info and plan services CSV files that the Pig Latin scripts generated and stored in HDFS.

SQOOP: Sqoop is used to export the analyzed summary data into a MySQL RDBMS.

##System Requirements

Software: Apache Hadoop 2.6.0, Apache Pig 0.13.0, Hive 0.12.0, Sqoop, GitHub

Hardware: Mac OS X Yosemite

Cluster: Pseudo-distributed mode cluster

Libraries: piggybank-0.12.0.jar

##Data Source

The Medicare data can be downloaded from the Medicare government website at the following location: https://www.medicare.gov/download/downloaddb.asp

Specifically, the following CSV files contain the plan information and plan services data on which the analysis is performed:

  1. PlanInfoCounty_FipsCodeMoreThan30000: This file contains the contractid, planid, segmentid, planname and county ID, for counties whose ID is greater than 30000. A sample of the plan info data:

"H0022","001","0","2015","Buckeye Health Plan - MyCare Ohio","Buckeye Health Plan - MyCareOhio (Medicare-Medicaid Plan)","","Cleveland Dayton Toledo and surrounding counties","1","For Profit","Con Fines de Lucro","48","Medicare-Medicaid Plan","http://mmp.buckeyehealthplan.com","","http://mmp.buckeyehealthplan.com/","http://mmp.buckeyehealthplan.com/","Approved by Medicare and Medicaid","Aprobado por Medicare y Medicaid","FALSE","TRUE","FALSE","88"," ","","","

  • This is a Medicare-Medicaid plan for people with both Medicare and Medicaid. Contact the plan for details.
","","","","BUCKEYE COMMUNITY HEALTH PLAN INC.","","","","","4349 Easton Way Suite 200","Columbus","OH","43219","[email protected]","1-866-549-8289","1-866-549-8289","711","711","[email protected]","1-866-549-8289","1-866-549-8289","711","711","","4349 Easton Way Suite 200","Columbus","OH","43219","[email protected]","1-866-549-8289","1-866-549-8289","711","711","[email protected]","1-866-549-8289","1-866-549-8289","711","711","FALSE","FALSE","0","Not SNP","No hay planes para necesidades especiales","$0.00","$0.00","$0.00","$0.00","FALSE","39023"

  2. PlanInfoCounty_FipsCodeLessThan30000: This file contains the contractid, planid, segmentid, planname and county ID, for counties whose ID is less than 30000. A sample of the plan info data:

"H0028","001","0","2015","CHA HMO Inc.","Humana Gold Plus H0028-001 (HMO)","","Cedar Rapids Metro Area","1","For Profit","Con Fines de Lucro","1","HMO","www.humana-medicare.com","www.humana.com","https://www.humana.com/pharmacy/medicare/tools/druglist/","https://www.humana.com/pharmacy/medicare/","Approved by Medicare","Aprobado por Medicare","FALSE","TRUE","FALSE","88"," ","","","

  • This plan does not charge an annual deductible for all drugs. The $320 annual deductible only applies to drugs on certain tiers.
","","","","CHA HMO INC.","","1501-2000 physicians and providers.","1501-2000 m?dicos y proveedores.","","500 West Main Street","Louisville","KY","40202","[email protected]","1-800-833-2364","1-800-833-2364","711","711","","1-800-457-4708","1-800-457-4708","711","711","","500 West Main Street","Louisville","KY","40202","[email protected]","1-800-833-2364","1-800-833-2364","711","711","","1-800-457-4708","1-800-457-4708","711","711","FALSE","FALSE","0","Not SNP","No hay planes para necesidades especiales","$0.00","$0.00","$0.00","$0.00","FALSE","19011"

  3. vwPlanServices: This file contains the contractid, category description, category code and benefit information. A sample of the plan services data:

"English","2015","H0001","001","0","Monthly Premium Deductible and Limits on How Much You Pay for Covered Services","1","In 2015 the monthly Part B Standard Premium is $104.90","Base Plan","000","1"

"English","2015","H0001","001","0","Monthly Premium Deductible and Limits on How Much You Pay for Covered Services","1","This plan has deductibles for some hospital and medical services.","Base Plan","000","4"

##Project Work Flow

#Data Collection (Source URL)

The source files were downloaded from the Medicare government website onto the client machine and decompressed to CSV files.

#Data Ingestion (HDFS)

The files are uploaded from the client machine onto HDFS using Hadoop FS commands. Because these are sample Medicare files uploaded to a pseudo-distributed setup on the client machine, all the files are stored on a single data node in HDFS.

#Data Transformation (PIG)

Every field in the plan info and plan services files was enclosed in double quotes. The files were cleaned using Pig scripts, which will be listed in the next section.
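To illustrate what this cleaning step has to handle, here is a minimal Python sketch (not the project's Pig script). It assumes the input is standard quoted CSV in which a quoted field may span several lines, as the plan info sample suggests. The function name `clean_rows` and the shortened sample record are hypothetical.

```python
import csv
import io


def clean_rows(raw_text):
    """Parse quoted CSV text and return rows of unquoted, single-line fields."""
    # newline="" leaves line breaks untouched so the csv module can
    # recognise the ones that sit inside quoted fields.
    reader = csv.reader(io.StringIO(raw_text, newline=""))
    cleaned = []
    for row in reader:
        # split() followed by join() collapses embedded line breaks
        # and runs of spaces into single spaces.
        cleaned.append([" ".join(field.split()) for field in row])
    return cleaned


# Shortened, hypothetical record with a note field that spans three lines.
raw = (
    '"H0022","001","0","Example Plan","\n'
    '  Contact the plan for details.\n'
    '","39023"\n'
)

rows = clean_rows(raw)
print(rows)
```

A plain search-and-replace on the quote character would not be enough here, because the line breaks inside quoted fields would still split one record across several lines.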

The plan services file had two rows for each record, one in English and the other in Spanish. This file was cleaned using a separate relation in the Pig script.
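The language filter can be sketched as follows. This is a Python stand-in for the Pig relation, assuming the language label is the first column, as in the plan services sample. The label `"Spanish"` and the placeholder text of the second row are assumptions, since the source only shows English rows.

```python
def keep_language(rows, language="English", language_index=0):
    """Keep only the rows written in the requested language."""
    return [row for row in rows if row[language_index] == language]


rows = [
    ["English", "2015", "H0001", "001", "0",
     "This plan has deductibles for some hospital and medical services."],
    # Hypothetical counterpart row; the real label and text are not shown.
    ["Spanish", "2015", "H0001", "001", "0",
     "(Spanish text of the same benefit)"],
]

english_rows = keep_language(rows)
print(len(english_rows))
```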

The plan info and plan services files were filtered using Pig scripts to remove records where contractId, SegmentId, PlanId or CountyId was null. The details will be listed in the next section.
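A minimal Python sketch of this null check is shown below, assuming that a record is dropped when any of the four key fields is missing or blank. The lowercase field names and the sample records are hypothetical, and the project itself performs this step in Pig.

```python
KEY_FIELDS = ("contractid", "planid", "segmentid", "countyid")


def has_all_keys(record):
    """Return True when every key field is present and non-blank."""
    return all((record.get(name) or "").strip() for name in KEY_FIELDS)


# Hypothetical records: the second has a blank planid, the third a null countyid.
records = [
    {"contractid": "H0022", "planid": "001", "segmentid": "0", "countyid": "39023"},
    {"contractid": "H0028", "planid": "", "segmentid": "0", "countyid": "19011"},
    {"contractid": "H0001", "planid": "001", "segmentid": "0", "countyid": None},
]

valid = [r for r in records if has_all_keys(r)]
print([r["contractid"] for r in valid])
```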

#Data Analysis (HIVE)

Hive tables were built over the cleaned output files from the Pig Latin scripts. Custom Java code was written to sort the filtered records in ascending or descending order. The custom code was compiled into a JAR file and deployed onto HDFS. The JAR file was then added to Hive, and the summary queries were developed.
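The source does not show the custom Java code, so the following is only a Python sketch of the ordering idea, not a port of it. It assumes the amounts arrive as text such as `"$0.00"`, as in the plan info sample, and converts them to numbers before sorting. Sorting the raw text would place `"$100.00"` before `"$25.00"`. The helper names and sample records are hypothetical.

```python
from decimal import Decimal


def parse_dollars(text):
    """Convert a string such as '$1,104.90' into a Decimal amount."""
    return Decimal(text.replace("$", "").replace(",", ""))


def order_by_amount(records, field, descending=False):
    """Sort records by a dollar-formatted field, ascending by default."""
    return sorted(records, key=lambda r: parse_dollars(r[field]),
                  reverse=descending)


# Hypothetical records; the plan names and premiums are invented.
records = [
    {"planname": "Plan A", "premium": "$100.00"},
    {"planname": "Plan B", "premium": "$25.00"},
    {"planname": "Plan C", "premium": "$0.00"},
]

ascending = [r["planname"] for r in order_by_amount(records, "premium")]
descending = [r["planname"]
              for r in order_by_amount(records, "premium", descending=True)]
print(ascending, descending)
```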

#Export Data using Sqoop

The summary data in the Hive tables can be exported out of HDFS into MySQL databases using Sqoop. However, this step is outside the scope of the project and is included among the future use cases.

#The following are some of the analyses performed

- Grouping the plans by the companies managing them and the counties where they are offered.
- Finding plans that offer specific services, such as free ambulance service, in their coverage descriptions.
- Comparing plans based on the benefits offered for specific conditions, such as diabetes.
- Comparing plans based on premiums and co-pays for specific coverage criteria, such as doctors' co-pays.
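Two of these analyses, the keyword search over coverage descriptions and the grouping by company and county, can be sketched in Python as follows. This is an illustration under assumptions, not the project's Hive queries. The field names (`benefit`, `company`, `countyid`, `planname`), the benefit wording and all sample values are hypothetical, and a plain substring match is a deliberately naive stand-in for whatever matching the real queries use.

```python
from collections import defaultdict


def plans_matching(services, *keywords):
    """Return (contractid, planid) pairs whose benefit text has every keyword."""
    wanted = [k.lower() for k in keywords]
    hits = {
        (s["contractid"], s["planid"])
        for s in services
        if all(k in s["benefit"].lower() for k in wanted)
    }
    return sorted(hits)


def group_by_company_and_county(plans):
    """Map (company, countyid) to the list of plan names offered there."""
    groups = defaultdict(list)
    for p in plans:
        groups[(p["company"], p["countyid"])].append(p["planname"])
    return dict(groups)


# Hypothetical benefit rows and plan rows.
services = [
    {"contractid": "H0001", "planid": "001", "benefit": "Ambulance: $0 copay"},
    {"contractid": "H0002", "planid": "003", "benefit": "Ambulance: $150 copay"},
    {"contractid": "H0003", "planid": "002", "benefit": "Diabetes supplies: $0 copay"},
]
plans = [
    {"company": "Company X", "countyid": "39023", "planname": "Plan A"},
    {"company": "Company X", "countyid": "39023", "planname": "Plan B"},
    {"company": "Company Y", "countyid": "19011", "planname": "Plan C"},
]

free_ambulance = plans_matching(services, "ambulance", "$0 copay")
grouped = group_by_company_and_county(plans)
print(free_ambulance, grouped)
```

Because the match is on free text, results depend heavily on how each plan words its benefits, so the keyword list would need checking against the real coverage descriptions.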
