convey's People

Contributors

ajdamico, djalmapessoa, guilhermejacob

convey's Issues

main user functions probably need an `na.rm=` argument

it looks like the user-level svy* functions should have an na.rm= parameter (defaulting to FALSE, like the survey package), and that na.rm= value should then be passed down to the lower-level functions.

you can see how the na.rm object gets used in
survey:::svymean.survey.design
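
a minimal base R sketch of the pass-through pattern described above. `svy_example` and `compute_stat` are hypothetical stand-ins, not convey or survey functions; the only point is that the user-level function accepts `na.rm = FALSE` and forwards it unchanged to the lower-level one.

```r
# hypothetical low-level worker: weighted mean with optional NA removal
compute_stat <- function( x , w , na.rm = FALSE ) {
  if ( na.rm ) {
    keep <- !is.na( x )
    x <- x[ keep ]
    w <- w[ keep ]
  }
  # with na.rm = FALSE any NA in x propagates to the result
  sum( w * x ) / sum( w )
}

# hypothetical user-level wrapper: na.rm defaults to FALSE and is passed down
svy_example <- function( x , w , na.rm = FALSE , ... ) {
  compute_stat( x , w , na.rm = na.rm )
}

x <- c( 10 , NA , 30 )
w <- c( 1 , 1 , 2 )

svy_example( x , w )                 # NA, because the missing value is kept
svy_example( x , w , na.rm = TRUE )  # ( 1 * 10 + 2 * 30 ) / 3
```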

missing values

Examples in the function svyarpr:

  library(survey)
  library(vardpoor)
  library(convey)
  data(eusilc)

linearized design

  des_eusilc <- svydesign( ids = ~rb030 , strata = ~db040 ,  weights = ~rb050 , data = eusilc )
  des_eusilc <- convey_prep( des_eusilc )

linearized design using a variable with missings

  svyarpr( ~ py010n , design = des_eusilc )
  svyarpr( ~ py010n , design = des_eusilc , na.rm = TRUE )

replicate-weighted design using a variable with missings

  des_eusilc_rep <- as.svrepdesign( des_eusilc , type = "bootstrap" )
  des_eusilc_rep <- convey_prep( des_eusilc_rep )
  svyarpr( ~ py010n , design = des_eusilc_rep )
  svyarpr( ~ py010n , design = des_eusilc_rep , na.rm = TRUE )

for the third call we get an error message, and for the second we get (NA NA). Perhaps the third call should also return NA NA?

calling stats:::model.frame.default instead of survey:::model.frame.survey.design

working with data frame

pnad2011<- dbGetQuery( db , 'select v4618 , v4617 , pre_wgt , v4609 , v4720, v8005 from pnad2011' )

pnad2011_des <- svydesign(
  id = ~v4618 ,
  strata = ~v4617 ,
  data = pnad2011 ,
  weights = ~pre_wgt ,
  nest = TRUE
  )

pop.post<- data.frame(v4609= as.character(unique(pnad2011$v4609)), Freq= unique(pnad2011$v4609))

pnad2011_des_pos <- postStratify(pnad2011_des, ~v4609, pop.post)

pnad2011_des_pos_sub <- subset(pnad2011_des_pos, v4720!=0 & is.na(v4720)==FALSE & v8005>=15 )

library(convey)
pnad2011_des_pos_sub<-convey_prep(pnad2011_des_pos_sub)

svypoormed(~as.numeric(v4720),pnad2011_des_pos_sub, na.rm=TRUE )

svygini (~as.numeric(v4720),pnad2011_des_pos_sub, na.rm=TRUE)
densfun (~as.numeric(v4720), pnad2011_des_pos_sub, 1000, fun = "F", na.rm=TRUE) 

additional information in the svyfgt output

Perhaps it would be useful to include the following information:

  1. type of threshold: abs, relq, relm
  2. value of threshold
  3. the label: instead of fgt, it could be fgt0 or fgt1, or in general paste0("fgt", g)
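
a small base R sketch of the three items above. `describe_fgt` and its argument names are hypothetical, not part of convey, and the threshold value is a made-up number.

```r
# hypothetical helper that builds the extra output fields suggested above
describe_fgt <- function( g , type_thresh , thresh_value ) {
  # restrict the type to the three labels named in the list above
  type_thresh <- match.arg( type_thresh , c( "abs" , "relq" , "relm" ) )
  list(
    statistic = paste0( "fgt" , g ) ,
    type_thresh = type_thresh ,
    thresh_value = thresh_value
  )
}

# made-up threshold, only to show the structure
info <- describe_fgt( g = 1 , type_thresh = "relq" , thresh_value = 10000 )
info$statistic
```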

confint for svyrep not working

library(survey)
library(vardpoor)
library(convey)
data(eusilc)
des_eusilc <- survey::svydesign(ids = ~rb030, strata =~db040, weights = ~rb050, data = eusilc)
des_eusilc_rep <- as.svrepdesign(des_eusilc, type= "bootstrap")
des_eusilc_rep <- convey_prep(des_eusilc_rep)
b1 <- svyarpr(~eqIncome, design = des_eusilc_rep, 0.5, 0.6)
confint(b1)

svyarpt linearization technique does not account for records outside of the subset

https://github.com/DjalmaPessoa/convey/blob/master/all_funs.R#L267-L289

# CORRECT method to break out by groups
dati <- data.frame(IDd = 1:nrow(eusilc), eusilc)
d1 <- linarpt(Y="eqIncome", id="IDd", weight = "rb050", Dom = "db040",
  dataset = dati, percentage = 60, order_quant=50)

# INCORRECT SUBSET
ics <- subset(dati,db040=='Tyrol')
d2 <- linarpt(Y="eqIncome", id="IDd", weight = "rb050", Dom = NULL,
  dataset = ics , percentage = 60, order_quant=50)

# your function matches the incorrect subsetting technique
d3 <-  svyarpt(~eqIncome, subset( des_eusilc , db040 == 'Tyrol' ) , .5, .6)


# this is the correct way to use `linarpt`
table( d1$lin$lin_arpt__db040.Tyrol )
# NOTICE that the zeroes are maintained!

# this is the incorrect way to use `linarpt`
summary(d2$lin)

# your function matches the incorrect method
table(d3$lin)
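
the point about the zeroes can be sketched in base R without either package. the region labels and the linearized values below are made-up toy numbers, not output of `linarpt` or `svyarpt`; the sketch only shows the difference between a vector that covers the subset and one that covers the full sample.

```r
# toy data: six records, three of them in the domain of interest
region <- c( "Tyrol" , "Vienna" , "Tyrol" , "Salzburg" , "Tyrol" , "Vienna" )
in_domain <- region == "Tyrol"

# made-up linearized values for the three domain records only
lin_domain <- c( 0.4 , -0.1 , -0.3 )

# full-length version: records outside the domain stay in the vector as zeroes
lin_full <- numeric( length( region ) )
lin_full[ in_domain ] <- lin_domain

length( lin_domain )  # 3: what a plain subset leaves behind
length( lin_full )    # 6: one entry per record in the full sample
lin_full
```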

cvystat objects should print with the SE already calculated

this means convey functions need to "reach around" the svyby in order to get the full design object with sys.call() or something similar so that the SE_lin can be run within the main wrapper function rather than secondarily. once that's been implemented, re-integrate the correct print methods

fbc0da6

should be an error in svyfgt

library(vardpoor)
data(eusilc)
dati = data.frame(1:nrow(eusilc), eusilc)
colnames(dati)[1] <- "IDd"
library(survey)
# create a design object
#des_eusilc <- svydesign(ids = ~db040, weights = ~rb050, data = eusilc)

des_eusilc <- svydesign(ids = ~rb030, strata =~db040,  weights = ~rb050, data = eusilc)
des_eusilc_rep <- as.svrepdesign(des_eusilc, type = "bootstrap")

library(convey)
des_eusilc <- convey_prep(des_eusilc)
des_eusilc_rep<- convey_prep(des_eusilc_rep)

svyby(~eqIncome,  by = ~db040, des_eusilc, FUN=svyfgt,  alpha=1, deff = FALSE) 
svyby(~eqIncome,  by = ~db040, des_eusilc_rep, FUN=svyfgt, alpha=1, deff = FALSE)

## the se estimates are too different, probably an error in the svyfgt.survey.design

additional information in the functions output

Some useful information is not yet contained in the output of the functions.
One possibility would be to include additional attributes, for instance:
svyarpr - threshold , quantile
svyarpt - quantile, order, percent
svypoormed - arpr
svyqsr - upper quantile, lower quantile, upper total, lower total
svyrmir - median of income of people older than 65, median of people younger than 65
svyrmpg - median poor
svyfgt - type of poverty threshold, value of threshold

this is what I recall for the moment.
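
one possible way to carry this kind of extra information is base R attributes. `make_result` is a hypothetical helper and the numbers are made up; this is a sketch of the idea, not how convey stores its output.

```r
# hypothetical result object: an estimate carrying extra information as attributes
make_result <- function( estimate , ... ) {
  extras <- list( ... )
  for ( nm in names( extras ) ) attr( estimate , nm ) <- extras[[ nm ]]
  estimate
}

# made-up numbers, only to show the structure
rval <- make_result( 0.15 , threshold = 10000 , quantile = 0.5 )

attr( rval , "threshold" )
attr( rval , "quantile" )

# the estimate itself is still available as a plain number
as.numeric( rval )
```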

in order for sqlite & monetdblite to work with convey, this line needs to change

https://github.com/DjalmaPessoa/convey/blob/master/R/all_funs.R#L225

rather than passing in an object you will need to pass in an expression that will then update the survey design. remember all the trouble we had with t= expecting one length and object being a different length? this is the root of the problem..

does it make sense to eliminate SE_lin2 and write the appropriate code directly into each of the main functions? appropriate code means keeping everything within the data set, never creating an external object that is separated from the design

this might mean you need to use bquote and model.matrix more often (which is what lumley does) but i am not sure
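
a rough base R sketch of the "expression, not object" idea, using a plain data.frame in place of a survey design. the column names and the threshold are made up; with a real design the analogous step would presumably go through survey's update(), which is not shown here.

```r
df <- data.frame( income = c( 100 , 250 , 400 , 50 ) )
threshold <- 200  # made-up value

# build the expression with the threshold substituted in, without evaluating it yet
expr <- bquote( as.numeric( income <= .( threshold ) ) )

# evaluate inside the data and store the result as a new column of the same data,
# so the new variable always has one entry per record
df$below <- eval( expr , df )
df
```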

NA treatment in variable age in the function svyrmir

library(vardpoor)
data(eusilc)
dati = data.frame(1:nrow(eusilc), eusilc)
colnames(dati)[1] <- "IDd"
library(survey)
library(convey)

create a design object

des_eusilc <- svydesign(ids = ~db040, weights = ~rb050, data = eusilc)

set.seed(123)
nas<-sample(rownames(eusilc),100,replace=FALSE)
eusilc$eqIncome0<-eusilc$eqIncome
eusilc[nas, "eqIncome0"]<-NA
nas1<-sample(rownames(eusilc),50,replace=FALSE)
eusilc$age0<-eusilc$age
eusilc[nas1, "age0"]<-NA

des_eusilc <- svydesign(ids = ~rb030, strata =~db040,  weights = ~rb050, data = eusilc)
des_eusilc_rep <- as.svrepdesign(des_eusilc, type = "bootstrap")

library(convey)
des_eusilc <- convey_prep(des_eusilc)
des_eusilc_rep<- convey_prep(des_eusilc_rep)

test NA treatment for variables eqIncome0 and age0

svyrmir.survey.design( ~eqIncome0 , design = des_eusilc ,age = ~age0, agelim=65, na.rm=TRUE)

function breaks when calling iqalpha. The full_design for iqalpha discards only the NAs from eqIncome0, and not the NAs from age0.
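
a toy base R illustration of the difference, using a plain data.frame rather than a design object. this is only a sketch of the NA handling, not the svyrmir code.

```r
# toy data with missing values in both the income and the age variable
df <- data.frame(
  eqIncome0 = c( 100 , NA , 300 , 400 , 500 ) ,
  age0 = c( 30 , 70 , NA , 66 , 40 )
)

# dropping NAs from income only still leaves an NA in age
income_only <- df[ !is.na( df$eqIncome0 ) , ]
anyNA( income_only$age0 )

# dropping rows that are incomplete in either variable
both <- df[ complete.cases( df[ , c( "eqIncome0" , "age0" ) ] ) , ]
anyNA( both )
nrow( both )
```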

dynamic statements do not currently work inside any of the functions

the way that you are slimming the data.frame object inside of all of the functions down to only the variables for the current computation will not work if the variable is constructed on-the-fly.

here is a reproducible example that produces the bug. notice the as.numeric() causes the problem. this should work if we are going to align with the survey package

library(convey)
library(survey)
library(vardpoor)
data(eusilc)

# linearized design

des_eusilc <- svydesign( ids = ~rb030 , strata = ~db040 ,  weights = ~rb050 , data = eusilc )
des_eusilc <- convey_prep( des_eusilc )

# works
svyarpt( ~eqIncome , design = des_eusilc )
# breaks
svyarpt( ~as.numeric(eqIncome) , design = des_eusilc )

you can view the point that this breaks by typing

debug(convey:::svyarpt.survey.design)

here are the lines leading up to the error

Browse[2]> 
debug: inc <- terms.formula(formula)[[2]]
Browse[2]> 
debug: df <- model.frame(design)
Browse[2]> 
debug: incvar <- df[[as.character(inc)]]
Browse[2]> 
Error in .subset2(x, i, exact = exact) : no such index at level 1

look at the code inside

survey:::svymean.survey.design

and also

survey:::svymean.survey.design2

for ideas about how to work around this issue?
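
one possible direction, sketched in base R on a plain data.frame: let model.frame() evaluate the formula inside the data instead of looking the term up as a column name. the toy data is made up, and this is an assumption about what might work, not the fix that was adopted.

```r
# toy data: the income column is stored as character on purpose
df <- data.frame( eqIncome = c( "10" , "20" , "30" ) , stringsAsFactors = FALSE )
formula <- ~ as.numeric( eqIncome )

# approach from the debug trace: look the term up as a column name
inc <- terms.formula( formula )[[ 2 ]]
by_name <- tryCatch( df[[ as.character( inc ) ]] , error = function( e ) e )
# 'inc' is the call as.numeric(eqIncome), not a column name, so this lookup fails

# alternative: let model.frame() evaluate the expression inside the data
incvar <- model.frame( formula , data = df , na.action = na.pass )[[ 1 ]]
incvar
```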

database-backed design examples in convey could use MonetDBLite instead of RSQLite?

just replacing does not work:

database-backed design

#' require(MonetDBLite)
#' tfile <- tempfile()
#' conn <- dbConnect( MonetDBLite() , tfile )
#' dbWriteTable( conn , 'eusilc' , eusilc )
#'
#' dbd_eusilc <- svydesign(ids = ~rb030 , strata = ~db040 , weights = ~rb050 , data="eusilc", dbname=tfile, dbtype="MonetDBLite")
#'
#' dbd_eusilc <- convey_prep( dbd_eusilc )
#' svyrmpg( ~ eqIncome , design = dbd_eusilc )

fix column names for coef and SE

library(devtools)
install_github( "djalmapessoa/convey" )

library(convey)
library(vardpoor)
data(eusilc)
library(survey)
des_eusilc <- svydesign(ids=~db040, weights=~rb050, data=eusilc)
des_eusilc <- convey_prep( des_eusilc )
gini_eqIncome <- svygini(~eqIncome, design=des_eusilc)

svygini( ~ eqIncome , des_eusilc )
svyby( ~ eqIncome , ~ rb090 , des_eusilc , svygini )
svygini( ~ eqIncome , subset( des_eusilc , rb090 == 'male' ) )
svygini( ~ eqIncome , subset( des_eusilc , rb090 == 'female' ) )

?svyrmir help page examples do not have all of the necessary parameters

see the examples at

https://github.com/DjalmaPessoa/convey/blob/master/R/svyrmir.R#L42-L52

could you update these with a variable that makes sense? thanks! each of these break with

> svyrmir( ~eqIncome , design = des_eusilc_rep )
Error in terms.formula(age) : argument "age" is missing, with no default
> 
> # linearized design using a variable with missings
> svyrmir( ~ py010n , design = des_eusilc )
Error in terms.formula(age) : argument "age" is missing, with no default
> svyrmir( ~ py010n , design = des_eusilc , na.rm = TRUE )
Error in terms.formula(age) : argument "age" is missing, with no default
> # replicate-weighted design using a variable with missings
> svyrmir( ~ py010n , design = des_eusilc_rep )
Error in terms.formula(age) : argument "age" is missing, with no default
> svyrmir( ~ py010n , design = des_eusilc_rep , na.rm = TRUE )
Error in terms.formula(age) : argument "age" is missing, with no default
>

deff=TRUE causing problems with svyby

    The functions svygpg has the sex and svyrmir has the age as extra
    arguments. Perhaps the NA treatment has to be extended also to these
    variables.

    For svyrmir:

    svyrmir( ~eqIncome , subset(des_eusilc,db040=="Tyrol") , age = ~age,
    agelim=65, na.rm=TRUE)

    works, but for

    svyby(~eqIncome,by=~db040, design=des_eusilc, FUN=svyrmir, age = ~age,
    agelim=65, na.rm=TRUE, deff=FALSE )

    I got:

    Error in svyrmir.survey.design(data, design[byfactor %in% byfactor[i], :
    unused argument (deff = deff)

    Any suggestion?
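
the error message suggests that the caller forwards deff= to FUN whether or not FUN accepts it. a base R sketch of that situation follows; `call_with_deff`, `strict_stat` and `tolerant_stat` are hypothetical names, and this is not the svyby or svyrmir code.

```r
# hypothetical caller that always forwards deff=, as the error message above suggests
call_with_deff <- function( FUN , x , deff = FALSE , ... ) FUN( x , deff = deff , ... )

# no deff= argument and no dots: the forwarded deff= is rejected
strict_stat <- function( x , na.rm = FALSE ) mean( x , na.rm = na.rm )

# accepting deff= (or ...) lets the same call go through
tolerant_stat <- function( x , na.rm = FALSE , deff = FALSE , ... ) mean( x , na.rm = na.rm )

x <- c( 1 , 2 , NA )

res_strict <- tryCatch(
  call_with_deff( strict_stat , x , na.rm = TRUE ) ,
  error = function( e ) conditionMessage( e )
)
res_tolerant <- call_with_deff( tolerant_stat , x , na.rm = TRUE )

res_strict    # the error message, as a character string
res_tolerant  # 1.5
```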

DBIsvydesign subsets

library(convey)
library(survey)
library(DBI)
library(MonetDBLite)
library(downloader)  # provides source_url()
options( monetdb.debug.query = TRUE )
setwd( "C:\\Djalma\\PnadMonetdb\\PNAD" )
pnad.dbfolder <- paste0( getwd() , "/MonetDB" )
db <- dbConnect( MonetDBLite() , pnad.dbfolder )
dbListTables(db)

  options(survey.lonely.psu = "adjust")

  source_url( "https://raw.githubusercontent.com/ajdamico/asdfree/master/Pesquisa%20Nacional%20por%20Amostra%20de%20Domicilios/pnad.survey.R" , prompt = FALSE )

  sample.pnad <-
    svydesign(
      id = ~v4618 ,
      strata = ~v4617 ,
      data = 'pnad2011' ,
      weights = ~pre_wgt ,
      nest = TRUE ,
      dbtype = "MonetDBLite" ,
      dbname = pnad.dbfolder
    )

  y <-
    pnad.postStratify(
      design = sample.pnad ,
      strata.col = 'v4609' ,
      oldwgt = 'pre_wgt'
    )

  y.sub <- subset (y,  !is.na(v4720) & v4720!=0 & v8005>=15)

dim(y)

dim(y.sub)

Now check:

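dim.des <- function (formula, design, na.rm = FALSE, ...)
{
  # evaluate the formula inside the design's data, keeping missing values
  incvar <- model.frame(formula, design$variables, na.action = na.pass)[[1]]
  w <- 1/design$prob
  ncom <- names(w)
  if (na.rm) {
    # drop records with a missing value from the design, the variable and the weights
    nas <- is.na(incvar)
    design <- design[!nas, ]
    incvar <- incvar[!nas]
    w <- w[!nas]
  }

  # compare the dimensions of the design with the length of the variable
  list(dim(design), length(incvar))
}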

dim.des(~ v4720,y.sub, na.rm=TRUE)

within the testthat folder, could each function have at least four use case tests?

for each of the main svy*() functions you have written,

a1 <- svy*( ~variable , design ) run on linearized design
a2 <- svyby( ~ variable , ~ byvar , design , svy* ) run on linearized design

b1 <- svy*( ~variable , design ) run on svyrep design
b2 <- svyby( ~ variable , ~ byvar , design , svy* ) run on svyrep design

and then could you also write testing code that checks and confirms that
coef and SE and confint all work on the outputted objects a1, a2, b1, and b2?

so for each function, 4 examples x 3 helper functions
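
a sketch of what the 4 x 3 check could look like. it uses survey's own svymean and the api data that ships with the survey package as stand-ins, since the convey functions are the ones under test; `check_helpers` is a hypothetical helper, and inside the testthat folder the stopifnot calls would presumably become expect_* calls.

```r
library(survey)
data(api)

# hypothetical helper: confirm that coef, SE and confint all return numeric output
check_helpers <- function( obj ) {
  stopifnot( is.numeric( coef( obj ) ) )
  stopifnot( is.numeric( SE( obj ) ) )
  stopifnot( is.numeric( confint( obj ) ) )
  TRUE
}

# linearized design
des <- svydesign( id = ~dnum , weights = ~pw , data = apiclus1 , fpc = ~fpc )
a1 <- svymean( ~api00 , des )
a2 <- svyby( ~api00 , ~stype , des , svymean )

# replicate-weighted design
des_rep <- as.svrepdesign( des , type = "bootstrap" )
b1 <- svymean( ~api00 , des_rep )
b2 <- svyby( ~api00 , ~stype , des_rep , svymean )

# four use cases, three helper functions each
results <- vapply( list( a1 , a2 , b1 , b2 ) , check_helpers , logical( 1 ) )
results
```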

change the variable py010n in the svyrmir example

If the goal is only to show the NA treatment, we could create a new variable by inserting NA values into the variable eqincome:

indNA<- rbinom(nrow(eusilc),1, .10)
eusilc$eqincome.miss <- eusilc$eqincome
eusilc$eqincome.miss[indNA==1] <- NA

and use the variable eqincome.miss instead of py010n in the svyrmir example?

svyarpt.DBIsvydesign is breaking with pnad data

example using pnad2011
library(RSQLite)
library(downloader)
library(survey)
library(convey)

    setwd('C:\\Djalma\\PNAD2012')
    pnad.dbname <- "pnad.db"

    db<- dbConnect(SQLite(),pnad.dbname )
    dbListTables(db)
    dbListFields(db, "pnad2011")



    options( survey.lonely.psu = "adjust" )
    source_url( "https://raw.github.com/ajdamico/usgsd/master/Pesquisa Nacional por Amostra de Domicilios/pnad.survey.R" , prompt = FALSE )

create design
sample.pnad <-
svydesign(
id = ~v4618 ,
strata = ~v4617 ,
data = "pnad2011" ,
weights = ~pre_wgt ,
nest = TRUE ,
dbtype = "SQLite" ,
dbname = "pnad.db"
)

      y <-
        pnad.postStratify(
          design = sample.pnad ,
          strata.col = 'v4609' ,
          oldwgt = 'pre_wgt'
        )

filter used by IBGE to define Gini index

    ysub<- subset(y, v4720!=0 & is.na(v4720)==FALSE & v8005>=15)
    ysub<-convey_prep(ysub)

estimate the mean of v4720

    svymean(~as.numeric(v4720), ysub, na.rm=TRUE )
    svyquantile(~as.numeric(v4720), ysub, .5, na.rm=TRUE )

estimate the arpt:

svyarpt(~as.numeric(v4720), ysub, na.rm=TRUE ) 

svygini(~as.numeric(v4720), ysub, na.rm=TRUE)

svyarpt is breaking. After some debugging, it first breaks at htot <- h_fun(incvec, wf):
incvec and wf have different lengths. I don't see why.

Error in UseMethod("SE_lin2", design) : no applicable method for 'SE_lin2' applied to an object of class "c('survey.design2', 'survey.design')"

library(vardpoor)
data(eusilc)
dati = data.frame(1:nrow(eusilc), eusilc)
colnames(dati)[1] <- "IDd"
library(survey)

# create a design object

des_eusilc <- svydesign(ids = ~rb030, strata =~db040,  weights = ~rb050, data = eusilc)
des_eusilc_rep <- as.svrepdesign(des_eusilc, type = "bootstrap")

library(convey)
des_eusilc <- convey_prep(des_eusilc)
des_eusilc_rep<- convey_prep(des_eusilc_rep)

# svyfgt for a domain:
svyfgt(~eqIncome,  subset(des_eusilc, db040=="Tyrol"),  alpha=0)

no applicable method for SE_lin2()

still get

Error in UseMethod("SE_lin2", design) :
no applicable method for 'SE_lin2' applied to an object of class
"c('survey.design2', 'survey.design')"

in svyarpt.

example from pnad2011

library(RSQLite)
library(downloader)
library(survey)
library(convey)

setwd('C:\\Djalma\\PNAD2012')
pnad.dbname <- "pnad.db"

db<- dbConnect(SQLite(),pnad.dbname )
dbListTables(db)
dbListFields(db, "pnad2011")


options( survey.lonely.psu = "adjust" )
source_url( "https://raw.github.com/ajdamico/usgsd/master/Pesquisa Nacional por Amostra de Domicilios/pnad.survey.R" , prompt = FALSE )

create design:

  sample.pnad <-
    svydesign(
      id = ~v4618 ,
      strata = ~v4617 ,
      data = "pnad2011" ,
      weights = ~pre_wgt ,
      nest = TRUE ,
      dbtype = "SQLite" ,
      dbname = "pnad.db"
    )

post-stratify:
y <-
pnad.postStratify(
design = sample.pnad ,
strata.col = 'v4609' ,
oldwgt = 'pre_wgt'
)

filter used by IBGE to define Gini index:

ysub<- subset(y, v4720!=0 & is.na(v4720)==FALSE & v8005>=15)
ysub<-convey_prep(ysub)

estimate the mean of v4720:

svymean(~as.numeric(v4720), ysub, na.rm=TRUE )

estimate the arpt:

svyarpt(~as.numeric(v4720), ysub, na.rm=TRUE )

got the error:
Error in .subset2(x, i, exact = exact) : no such index at level 1

any idea?

should the ?examples pages include basic info about the parameters?

you previously had lines like

#' arpt_eqIncome <-svyarpt(~eqIncome, design=des_eusilc, .5, .6,comp=TRUE)

but the 0.5 and 0.6 and comp=TRUE are just the default for the function, so they are unnecessary. do you want to add examples to each svy*() function that show what the non-default parameters do? for example, what is the difference between

svyarpt( ~eqIncome , des_eusilc )

and

svyarpt( ~eqIncome , des_eusilc , order = 0.5 )

and

svyarpt( ~eqIncome , des_eusilc , comp = TRUE )

some notes about how each parameter changes each function would make it easier for users.. if you do not add notes, then the svy*() examples should only use the defaults, i think?

svyrmir outputting zeroes

using the example code from ?svyrmir

library(survey)
library(vardpoor)
data(eusilc) ; names( eusilc ) <- tolower( names( eusilc ) )

# linearized design
des_eusilc <- svydesign( ids = ~rb030 , strata = ~db040 ,  weights = ~rb050 , data = eusilc )

svyrmir( ~eqincome , design = des_eusilc , age = ~age , agelim = 65 , med_old = TRUE )

# replicate-weighted design
des_eusilc_rep <- survey::as.svrepdesign( des_eusilc , type = "bootstrap" )
svyrmir( ~eqincome , design = des_eusilc_rep, age= ~age, agelim = 65, med_old = TRUE )

# linearized design using a variable with missings
svyrmir( ~ py010n , design = des_eusilc,age= ~age, agelim = 65)
svyrmir( ~ py010n , design = des_eusilc , age= ~age, agelim = 65, na.rm = TRUE )
# replicate-weighted design using a variable with missings
svyrmir( ~ py010n , design = des_eusilc_rep,age= ~age, agelim = 65 )
svyrmir( ~ py010n , design = des_eusilc_rep ,age= ~age, agelim = 65, na.rm = TRUE )

are these zeroes expected?

svyrmir( ~ py010n , design = des_eusilc , age= ~age, agelim = 65, na.rm = TRUE )
       rmir     SE
py010n    0 0.0013

svyrmir( ~ py010n , design = des_eusilc_rep ,age= ~age, agelim = 65, na.rm = TRUE )
       rmir SE
py010n    0  0
