da_case_studies's Issues

ch10 hotels minor

List of 5 best deals

df <- data.frame(hotel_id = hotels$hotel_id, price = hotels$price, lnprice_resid = hotels$lnprice_resid,
                 distance = hotels$distance, stars = hotels$stars, rating = hotels$rating)
df[order(df$lnprice_resid)[1:5], ]
# TODO: print out a nice table
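For the Python version of ch10, the same list could be produced with pandas. This is a minimal sketch. It assumes a `hotels` DataFrame with the column names used in the R snippet (`hotel_id`, `price`, `lnprice_resid`, `distance`, `stars`, `rating`). `best_deals` is a hypothetical helper name, not a function from the repo.

```python
import pandas as pd


def best_deals(hotels: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n hotels with the most negative log-price residuals (hypothetical helper)."""
    cols = ["hotel_id", "price", "lnprice_resid", "distance", "stars", "rating"]
    # Most negative residual = cheapest relative to what the price model predicts.
    return hotels[cols].nsmallest(n, "lnprice_resid")
```

For a plain-text table, `print(best_deals(hotels).to_string(index=False))` drops the row index. `DataFrame.to_markdown()` is an alternative, but it needs the optional `tabulate` package.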

ch22 market definition

Markets served by AA only or by US only are actually untreated. This is a typo in the book.
--> must add comments in all languages

ch23 python R2

The R2 calculation in the FE models has been corrected in R: the within R2 should be printed. --> check Python.

Added a bit (not in the book) that prints the R2 for the unweighted model. --> add Python
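For the Python check, here is a minimal numpy sketch of the within R2 for an unweighted fixed-effects model. It demeans y and X by group (the within transformation), regresses the demeaned y on the demeaned X without a constant, and reports 1 - SSR/TSS on the demeaned data. The function name and inputs are hypothetical. It ignores weights and is not the repo's Python code.

```python
import numpy as np


def within_r2(y, X, groups):
    """R-squared of the fixed-effects (within) regression, unweighted."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    groups = np.asarray(groups)
    y_w, X_w = y.copy(), X.copy()
    for g in np.unique(groups):
        m = groups == g
        # Within transformation: subtract the group means of y and of each regressor.
        y_w[m] -= y[m].mean()
        X_w[m] -= X[m].mean(axis=0)
    # OLS on the demeaned data; no constant is needed after demeaning.
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    resid = y_w - X_w @ beta
    return float(1 - (resid @ resid) / (y_w @ y_w))
```

This R2 measures how much of the variation within groups the regressors explain. It is typically much lower than the R2 of the same model estimated with group dummies, which also counts the variation between groups.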

[urgent] swim log model

The RMSE of model M6 is wrong in R.
The problem is the log correction.
The code wants to use the system-generated residuals, but those cannot be obtained for the holdout set,
so it uses the training-set residuals instead.
Yet the holdout-set "residuals" are easy to obtain: they are simply y - yhat.
The corrected RMSE is very different.

This is what we have in R:

# had to cheat and use train error on full train set because could not obtain CV fold train errors
corrb <- mean((reg6$finalModel$residuals)^2)
rmse_CV["reg6"] <- reg6$pred %>%
  mutate(pred = exp(pred + corrb/2)) %>%
  group_by(Resample) %>%
  summarise(rmse = RMSE(pred, exp(obs))) %>%
  as.data.frame() %>%
  summarise(mean(rmse)) %>%
  as.numeric()
rmse_CV["reg6"]

Below is new Stata code that does it:

*** 5 MODELS WITH QUANTITY AS TARGET VARIABLE (M1-M5)
local M1 t i.month
local M2 t i.month i.dayofweek
local M3 t i.month i.dayofweek i.natholiday
local M4 t i.month i.school_off##i.dayofweek
local M5 t i.month i.school_off##i.dayofweek i.month##i.dayofweek

forvalues i=1/5 {
    forvalues y=2010/2015 {
        qui use "$data_out/swim-daily-workfile.dta", clear
        dis ""
        dis "*********************************************"
        dis "Model M`i'"
        dis "test year: `y'"
        dis "training years:"
        tab year if year!=`y'
        reg quantity `M`i'' if year!=`y'
        qui predict yhat
        qui gen sq_error = (quantity - yhat)^2
        qui sum sq_error if year==`y'
        local mse_`y' = r(mean)
    }
    gen cv_mse_M`i' = (`mse_2010'+`mse_2011'+`mse_2012'+`mse_2013'+`mse_2014'+`mse_2015')/6
    gen cv_rmse_M`i' = sqrt(cv_mse_M`i')
    keep if t==1 /* one-observation file with the forecast statistics */
    keep t cv*

    cap merge 1:1 t using "$data_out/swim-daily-forecasts.dta", nogen
    save "$data_out/swim-daily-forecasts.dta", replace
}

*** +1 MODEL (M6) WITH LN(QUANTITY) AS TARGET VARIABLE
local M6 t i.month i.school_off##i.dayofweek
forvalues y=2010/2015 {
    use "$data_out/swim-daily-workfile.dta", clear
    cap gen lnq = ln(quantity)
    keep if year>=2010 & year<=2015
    qui reg lnq `M6' if year!=`y'
    qui predict yhat
    local sig = e(rmse)
    replace yhat = exp(yhat) * exp(`sig'^2/2)
    qui gen sq_error = (quantity - yhat)^2
    qui sum sq_error if year==`y'
    local mse_`y' = r(mean)
}
gen cv_mse_M6 = (`mse_2010'+`mse_2011'+`mse_2012'+`mse_2013'+`mse_2014'+`mse_2015')/6
gen cv_rmse_M6 = sqrt(cv_mse_M6)
cap merge 1:1 t using "$data_out/swim-daily-forecasts.dta", nogen
aorder
order t
save "$data_out/swim-daily-forecasts.dta", replace

tabstat cv_rmse_M*, col(s) format(%4.2f)
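The Stata code works in three steps for each held-out year. It re-estimates the log model on the training years and takes the log correction from that fold's own regression (`e(rmse)`). It then scores the back-transformed prediction on the held-out year using plain y - yhat. Here is a minimal numpy sketch of the same leave-one-year-out logic, for the Python side. It assumes a design matrix `X` that already contains the constant and any dummies. `loyo_log_rmse` is a hypothetical name. The n - k denominator mimics Stata's `e(rmse)`, which uses the residual degrees of freedom.

```python
import numpy as np


def loyo_log_rmse(X, y, years):
    """Leave-one-year-out CV RMSE, in levels, for a regression of ln(y) on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    lny = np.log(y)
    mse = {}
    for test_year in np.unique(years):
        train = years != test_year
        test = ~train
        # OLS of ln(y) on X, estimated on the training years only.
        beta, *_ = np.linalg.lstsq(X[train], lny[train], rcond=None)
        resid = lny[train] - X[train] @ beta
        n, k = X[train].shape
        sigma2 = resid @ resid / (n - k)  # squared root MSE of this fold's regression
        # Back-transform with the log correction exp(sigma^2 / 2).
        yhat = np.exp(X[test] @ beta) * np.exp(sigma2 / 2)
        # Holdout "residuals" are simply y - yhat in levels.
        mse[int(test_year)] = np.mean((y[test] - yhat) ** 2)
    # As in the Stata code: average the yearly MSEs, then take the square root.
    return float(np.sqrt(np.mean(list(mse.values())))), mse
```

The R snippet takes the correction term from the model fit on the full training set. Here it comes from each training fold, which is what the cross-validation setup calls for.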

requirements.txt looks skinny

It's great that exact versions are provided, but are you sure we have everything there? I am missing jupyter for instance.

ch-21-wms-analysis.R matching on propensity score

It seems to me that the treatment effects from propensity score matching in lines 186-187 and 215-216 are average treatment effects on the treated (ATET), not average treatment effects (ATE) as the variable names in the code suggest (see the documentation of the matchit function in R).
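A toy example shows why the distinction matters. Matching each treated firm to its nearest control only imputes the missing untreated outcomes of treated units, so averaging those differences estimates the ATET. The ATE also needs the reverse match, which imputes the missing treated outcomes of control units. This is a minimal 1-nearest-neighbour sketch with replacement. It assumes the propensity scores are already estimated. It is an illustration only, not MatchIt's algorithm and not the code in ch-21-wms-analysis.R, and `nn_match_effects` is a hypothetical name.

```python
import numpy as np


def nn_match_effects(pscore, treat, y):
    """1-nearest-neighbour propensity-score matching with replacement.

    Returns (ATET, ATC, ATE) as simple averages of matched differences.
    """
    pscore = np.asarray(pscore, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    y = np.asarray(y, dtype=float)
    p_t, p_c = pscore[treat], pscore[~treat]
    y_t, y_c = y[treat], y[~treat]
    # Treated units: impute the missing untreated outcome from the closest control.
    nearest_c = np.abs(p_t[:, None] - p_c[None, :]).argmin(axis=1)
    diff_treated = y_t - y_c[nearest_c]
    # Control units: impute the missing treated outcome from the closest treated unit.
    nearest_t = np.abs(p_c[:, None] - p_t[None, :]).argmin(axis=1)
    diff_control = y_t[nearest_t] - y_c
    atet = diff_treated.mean()
    atc = diff_control.mean()
    # ATE averages over all units, i.e. weights ATET and ATC by group sizes.
    ate = np.concatenate([diff_treated, diff_control]).mean()
    return float(atet), float(atc), float(ate)
```

With scores `[0.8, 0.8, 0.2, 0.2, 0.2, 0.8]`, treatment `[1, 1, 1, 0, 0, 0]` and outcomes `[8, 8, 2, 0, 0, 0]`, the effect grows with the score. In that case the ATET (6) differs from the ATE (5).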

ch16 RF -- PDP todo

#########################################################################################
# Partial Dependence Plots -------------------------------------------------------
#########################################################################################

# TODO: somehow adding the scale screws it up. Ideally both graphs should have y between 70 and 130,
# n:accom should run from 1 to 7, by = 1

# FIXME: should be on holdout, right? pred.grid = distinct_(data_train, ...) --> pred.grid = distinct_(data_holdout, ...)
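On the FIXME: a partial dependence curve is the average model prediction over whichever data set is passed in. Switching from `data_train` to `data_holdout` therefore changes only the rows being averaged over. Here is a minimal model-agnostic sketch in Python. It assumes `predict` is any function that maps a 2-D array to predictions. The grid `np.arange(1, 8)` corresponds to 1 to 7 by 1. The function name is hypothetical.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` of X is set to each value in `grid`."""
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value            # every row gets the same value of the feature
        curve.append(predict(X_mod).mean())  # average over the rows of X (e.g. the holdout set)
    return np.array(curve)
```

With a fitted forest `rf` and a holdout matrix `X_holdout` (both hypothetical here), the call would be `partial_dependence(rf.predict, X_holdout, feature=0, grid=np.arange(1, 8))`.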

R library freeze

Need to add renv or another solution that offers library options.
switchr was another idea: it offers a separate platform for the book, i.e. people can keep using other package versions for other projects.

Something we started discussing with @zholler in the summer; not done.

[urgent] ch21 wms

Something is wrong.
R 4.0.2, dplyr 1.0.2

# Ownership: define founder/family owned and drop ownership that's missing or not relevant
# Ownership
data %>%
  dplyr::group_by(ownership) %>%
  dplyr::summarise(Freq = n()) %>%
  mutate(Percent = Freq / sum(Freq)*100, Cum = cumsum(Percent))
Error in UseMethod("group_by_") :
  no applicable method for 'group_by_' applied to an object of class "function"
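For reference, here is the table this pipeline is meant to produce (counts, percentages and cumulative percentages by ownership category) in pandas, for the Python side of the project. This is a minimal sketch, and `freq_table` is a hypothetical helper. Unlike dplyr's `group_by()`, pandas `groupby` drops missing categories by default.

```python
import pandas as pd


def freq_table(s: pd.Series) -> pd.DataFrame:
    """Frequency, percent and cumulative percent for each category of s (hypothetical helper)."""
    out = s.groupby(s).size().rename("Freq").to_frame()  # one row per category, sorted by category
    out["Percent"] = out["Freq"] / out["Freq"].sum() * 100
    out["Cum"] = out["Percent"].cumsum()
    return out
```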

ch14 airbnb logs

We are purging logs from the code
and moving them to separate code that will not be shared.

I started it, but it is not done.

Also: some TODOs in the log bit.

Python type hints

Have you thought of providing Python type hints for functions? It wouldn't be a huge effort to add them, IMO.

It's becoming the standard (at least I'd like to think so) and definitely helps instructors understand the codebase.
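Here is an illustration of what that could look like: a hypothetical helper, not a function from the repo, with type hints on its arguments and return value. Python does not enforce the hints at runtime. A static checker such as mypy can verify them.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y) != len(yhat) or len(y) == 0:
        raise ValueError("y and yhat must be non-empty and of the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))
```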

world bank files ch02, ch23

The R and Python code reads the World Bank immunization files in .dta format;
it should read in the cleaned CSV from OSF instead.

Also: add cleaners to OSF that save the cleaned CSV, to be read in by the code on GitHub.
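Here is a minimal sketch of such a cleaner in Python, using pandas `read_stata` and `DataFrame.to_csv`. The file paths and the cleaning step (trimming and lower-casing column names) are hypothetical placeholders, because the actual cleaning rules are not described here.

```python
from pathlib import Path

import pandas as pd


def dta_to_clean_csv(dta_path: Path, csv_path: Path) -> pd.DataFrame:
    """Read a Stata file, apply placeholder cleaning, and save it as CSV."""
    df = pd.read_stata(dta_path)
    # Placeholder cleaning step (assumption): normalise column names.
    df.columns = [str(c).strip().lower() for c in df.columns]
    df.to_csv(csv_path, index=False)
    return df
```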

reformat R code folders as part of package

- change folder management in all R scripts
- create a packages.txt for all code

Example: like ch07 (ch07-hotel-simple-reg).
Zsuzsi to help Adam get started.
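One way to build such a packages.txt is to scan the R scripts for package references. This is a rough sketch in Python. It assumes packages appear as `library(pkg)`, `require(pkg)` or `pkg::function` calls. Packages loaded any other way are missed, and mentions inside comments are picked up. `collect_r_packages` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# library(pkg), require(pkg), library("pkg") ...
_LOAD = re.compile(r"""\b(?:library|require)\(\s*["']?([A-Za-z][A-Za-z0-9.]*)""")
# ... and namespace-qualified calls such as dplyr::filter or pkg:::internal
_NAMESPACE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?")


def collect_r_packages(sources: Iterable[str]) -> List[str]:
    """Return the sorted set of package names referenced in the given R source texts."""
    found = set()
    for text in sources:
        found.update(_LOAD.findall(text))
        found.update(_NAMESPACE.findall(text))
    return sorted(found)
```

To use it, read the `.R` files of a chapter folder with `Path.read_text()`, pass the texts in, and write the result one package per line to packages.txt.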

geom

Why do I get this with every geom_smooth?
E.g. ch07 hotels simple:

geom_smooth() using formula 'y ~ x'

filter

ch04 wms

filter also complains in many places.

# Sample selection
df <- df %>%
  filter(country=="Mexico" & wave==2013 & emp_firm>=100 & emp_firm<=5000)
Error in UseMethod("filter_") :
  no applicable method for 'filter_' applied to an object of class "function"
In addition: Warning message:
filter_() is deprecated as of dplyr 0.7.0.
Please use filter() instead.
See vignette('programming') for more help
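Here is the same sample selection in pandas, for the Python side of the project. This is a minimal sketch that assumes a DataFrame with the `country`, `wave` and `emp_firm` columns used above. `select_sample` is a hypothetical name.

```python
import pandas as pd


def select_sample(df: pd.DataFrame) -> pd.DataFrame:
    """Mexican firms in the 2013 wave with 100 to 5000 employees (hypothetical helper)."""
    mask = (
        (df["country"] == "Mexico")
        & (df["wave"] == 2013)
        & df["emp_firm"].between(100, 5000)  # inclusive at both ends, like >= 100 & <= 5000
    )
    return df.loc[mask]
```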

funs

ch04 wms

df %>%
  dplyr::select(management, emp_firm) %>%
  summarise_all(funs(min, max, mean, median, sd, n()))
# A tibble: 1 x 12
  management_min emp_firm_min management_max emp_firm_max management_mean emp_firm_mean management_medi~ emp_firm_median
1           1.28          100           4.61         5000            2.94          761.             2.94             353
# ... with 4 more variables: management_sd, emp_firm_sd, management_n, emp_firm_n
Warning message:
funs() is deprecated as of dplyr 0.8.0.
Please use a list of either functions or lambdas:

  # Simple named list:
  list(mean = mean, median = median)

  # Auto named with tibble::lst():
  tibble::lst(mean, median)

  # Using lambdas
  list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call lifecycle::last_warnings() to see where this warning was generated.
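For comparison, here are the same summary statistics in pandas, as a minimal sketch that assumes a DataFrame with `management` and `emp_firm` columns. pandas `std` uses the n - 1 denominator, like R's `sd`. However, `count` excludes missing values, whereas dplyr's `n()` counts all rows. `summary_stats` is a hypothetical name.

```python
import pandas as pd


def summary_stats(df: pd.DataFrame, cols=("management", "emp_firm")) -> pd.DataFrame:
    """Min, max, mean, median, sd and count of each column (hypothetical helper)."""
    # Result: one row per statistic, one column per variable.
    return df[list(cols)].agg(["min", "max", "mean", "median", "std", "count"])
```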
