a script that creates an EMR cluster from EC2 to run a Spark job daily
using JDBC connections to databases to run queries
  |-hiveConnection
  |-redshiftConnection
  |-mySQLConnection
Python connecting to Redshift and downloading data as a DataFrame
a few ways to detect outliers and show their effects
open a list of URLs for review from R
tuning an XGBoost model using the scikit-learn wrapper of XGBoost (XGBClassifier) and GridSearchCV
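
For the outlier-detection entry above, here is a minimal sketch of two common rules, the IQR fence and the z-score. It is not the code from this repo: the function names, thresholds, and sample data are illustrative assumptions, and it uses only the Python standard library.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" treats the data as the whole population when interpolating.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical sample with one obviously extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 95]

flagged = iqr_outliers(sample)
cleaned = [v for v in sample if v not in flagged]

# Effect of the outlier on the mean, before and after removal.
mean_before = statistics.fmean(sample)
mean_after = statistics.fmean(cleaned)
```

On this small sample the IQR rule flags the extreme value, while the z-score rule at the usual threshold of 3 does not, because the outlier itself inflates the standard deviation. That masking effect is one reason to compare several rules rather than rely on one.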