
cchsflow's People

Contributors

bhaqa044, cbjerke, dougmanuel, esucha, jasminecandeliere, kittychenn, olivroy, rvyuha, wyusuf068, yulric

cchsflow's Issues

Binge drinking

Assess whether binge drinking can be added to cchsflow.

Variable request: sedentary behaviour

Several studies have assessed sedentary behaviour across CCHS cycles:

Prince, S.A., Melvin, A., Roberts, K.C. et al. Sedentary behaviour surveillance in Canada: trends, challenges and lessons learned. Int J Behav Nutr Phys Act 17, 34 (2020). https://doi.org/10.1186/s12966-020-00925-8

Joundi RA, Patten SB, Williams JVA, Smith EE. Association Between Excess Leisure Sedentary Time and Risk of Stroke in Young Individuals. Stroke. 2021 Aug 19:STROKEAHA121034985. doi: 10.1161/STROKEAHA.121.034985. Epub ahead of print. PMID: 34407638.

What is the name of the variable?

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

  • Let's connect with the authors of the above papers. The papers don't include details of the variable used or harmonization.

Language variables

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

These are derived variables that already exist in CCHS cycles.

What is the name of the variable?

  1. SDCAGLNG/SDCGLNG - Language of conversation
  2. SDCGLHM - Language spoken at home
  3. SDC_5A_1 - Knowledge of official languages
  4. SDCDFOLS - First official language spoken (FOLS)

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the derived variable?

Description of variable

  1. Language of conversation: Language in which the respondent can converse
  2. Language spoken at home: Language often or regularly spoken at home by the respondent
  3. Knowledge of official languages: Knowledge of the Canadian official languages
  4. First official language spoken: Derived variable based on knowledge of official languages, mother tongue, and language spoken at home.

Is it consistent across CCHS cycles? If not, explain changes between cycles

The language of conversation is consistent after 2007.
Language of conversation:

  • SDCAGLNG (c 1.1, c 2.1)
  • SDCEGLNG (c 3.1)
  • SDCGLNG (c 2007-2008, c 2009-2010)

The language spoken at home is consistent across cycles after 2007.
Language spoken at home:

  • SDCGLHM (c 2007-2008 and onward)

Knowledge of official languages and FOLS are consistent across cycles after 2011.
Knowledge of official languages:

  • SDC_5A_1 (c 2011-2012 and onward)

First official language spoken (FOLS):

  • SDCDFOLS (c 2011-2012 and onward)

Which cycles is this variable found?

  1. Language of conversation (SDCAGLNG/SDCGLNG): c 1.1, c 2.1, c 3.1, c 2007-2008, and c 2009-2010
  2. Language spoken at home (SDCGLHM): c 2007-2008, c 2009-2010, c 2011-2012, and c 2013-2014
  3. Knowledge of official languages (SDC_5A_1): c 2011-2012 and c 2013-2014
  4. First official language spoken (SDCDFOLS): c 2011-2012 and c 2013-2014

Lang vars for cchsflow_mar25.xlsx

Hypertension

Add two additional variables:

  • history of hypertension
  • medications for hypertension

CCHS yearly data

I have been running into some issues using the yearly data files with cchsflow.

For example, I want to look at the length of time in Canada in the 2001 CCHS file.

library(readr)
library(cchsflow)
library(dplyr)

# Read the 2001 CCHS yearly file (the path must be a quoted string)
cchs2001 <- read_csv("~/cchs2001")

# Length of time in Canada 2001 (recoding a single variable)
Ctime2001 <- rec_with_table(cchs2001, "SDCGRES", log = TRUE, attach_data_name = TRUE)
View(Ctime2001)

After I read the CSV file and run rec_with_table(), a warning message pops up and the recoded table (Ctime2001) appears in the environment window.
[image: screenshot of the warning message]

When I look at the survey responses (1, 2, ..., NA(a), NA(b), etc.) in this table, it does not show the individual response options; it shows a vector instead (screenshot below). All of the results are NA.
[image: screenshot of the recoded table]

I didn't run into this issue with the test data, and I'm not sure how to correct it. Does it have to do with the warning message?

I also had an issue running derived variable functions with the yearly data. For example, I want to look at the percentage of time in Canada using the 2001 CCHS file.

# Percentage of time in Canada 2001 (derived variable)
pct_time2001 <- rec_with_table(cchs2001, c("DHHGAGE_cont", "SDCGCBG", "SDCGRES", "pct_time_der"), log = TRUE)

After I run rec_with_table(), a warning message pops up and the recoded table does not appear in the environment window.
[image: screenshot of the error and warning messages]
Again, I didn't run into this issue with the test data, and I'm not sure how to correct it. Does it have to do with the error and warning messages?

Thank you in advance!

Add export to PMML

Export transformations to Predictive Model Markup Language (PMML) to support predictive analytic studies.

Provide support for Python, SAS and Stata

Create a SAS macro, or a Python or Stata function, to transform CCHS variables using variables.csv and variableDetails.csv.

rec_with_table() is the function that supports almost all of cchsflow. It is a straightforward extension of common recode functions. The difference is that rec_with_table() reads its transformation rules from a data frame, as opposed to the more common approach of hard-coding transformation attributes in the function call.

SAS and Stata are commonly used statistical packages in the CCHS community.
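The table-driven idea described above can be illustrated with a minimal base R sketch. This is not the cchsflow implementation: the `rules` data frame, its column names (`variable`, `rec_from`, `rec_to`), and the helper `recode_with_rules()` are hypothetical and do not reflect the variableDetails.csv schema. It only shows why a rules table is easier to port to other languages than hard-coded recodes.

```r
# Hypothetical rules table: one row per (variable, source value) pair.
# Column names are illustrative, not the variableDetails.csv schema.
rules <- data.frame(
  variable = c("sex", "sex", "smoker", "smoker"),
  rec_from = c("1", "2", "1", "2"),
  rec_to   = c("male", "female", "yes", "no"),
  stringsAsFactors = FALSE
)

# Recode the requested variables by looking each value up in the rules table.
# Values with no matching rule become NA.
recode_with_rules <- function(data, variables, rules) {
  out <- data.frame(row.names = seq_len(nrow(data)))
  for (v in variables) {
    r <- rules[rules$variable == v, ]
    out[[v]] <- r$rec_to[match(as.character(data[[v]]), r$rec_from)]
  }
  out
}

# Toy input: the value 9 for sex has no rule, so it is recoded to NA.
raw <- data.frame(sex = c(1, 2, 2, 9), smoker = c(2, 1, 2, 1))
recoded <- recode_with_rules(raw, c("sex", "smoker"), rules)
print(recoded)
```

Because the rules live in a plain table, a SAS macro or a Python or Stata function would only need to reimplement the lookup loop, not the rules themselves.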

New Variable Request: Back Problems, Fibromyalgia, Migraine Headache, Anxiety

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

Existing CCHS Variable

Does this variable exist already in CCHS cycles, or is it a newly derived variable?

What is the name of the variable?

CCC_041 (Fibromyalgia)

2001 - CCCA_041
2003 - CCCC_041
2005 - CCCE_041
2014 - CCC_041

CCC_061 (Back problems)

2001 - CCCA_061
2003 - CCCC_061
2005 - CCCE_061
2007 - 2014 - CCC_061

CCC_081 (Migraine headache)

2001 - CCCA_081
2003 - CCCC_081
2005 - CCCE_081
2007 - 2014 - CCC_081

CCC_290 (anxiety)

2001 - NA
2003 - CCCC_290
2005 - CCCE_290
2007 - 2014 - CCC_290
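As a rough illustration of what harmonizing these names involves, the sketch below renames a cycle-specific column to the common name using a lookup vector. The fibromyalgia names are copied from the list above; `harmonize_name()` and the toy data frame are hypothetical and are not part of cchsflow.

```r
# Cycle-specific names for fibromyalgia, copied from the list above.
fibro_names <- c("2001" = "CCCA_041", "2003" = "CCCC_041",
                 "2005" = "CCCE_041", "2014" = "CCC_041")

# Rename the cycle-specific column to a common name.
# `harmonize_name` is a hypothetical helper, not a cchsflow function.
harmonize_name <- function(data, cycle, lookup, common_name) {
  old <- lookup[[as.character(cycle)]]
  names(data)[names(data) == old] <- common_name
  data
}

# Toy 2003 data with the cycle-specific column name.
cchs2003 <- data.frame(CCCC_041 = c(1, 2, 2))
harmonized <- harmonize_name(cchs2003, 2003, fibro_names, "CCC_041")
print(names(harmonized))
```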

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

Description of variable

Provide a brief description of the variable.

Is it consistent across CCHS cycles? If not, explain changes between cycles

For existing variables in the CCHS.

Which cycles is this variable found?

For existing variables in the CCHS.

Derived variables only. What variables are used to create this variable?

List the variables in cchsflow used to create this variable

Additional context

Add any other context or screenshots about the feature request here.

Additional instructions for derived variables

Derived variables use R code for more complex operations and multiple starting variables. Note: cchsflow currently uses only base R to derive variables. More complex derived variables that require dependencies are currently out-of-scope for cchsflow.

If possible, attach an .R file that includes documentation of the derived variable as per roxygen2 standards, along with the code to derive the variable. Include all starting variables.

Corrected BMI

BMI corrected for self-reporting bias of height and weight was added in 2015.

Let's add this variable for all cycles.

Health regions

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

GEODPMF health region - currently exists in all cycles.

What is the name of the variable?

GEODPMF health region for 2013-14.

Description of variable

Geographic variable for health region.

Is it consistent across CCHS cycles? If not, explain changes between cycles

Consistency is a major challenge because health regions have changed over time.

Which cycles is this variable found?

A regional variable is available in all cycles.

Additional context

We'll need to find documentation that helps crosswalk health regions over time. However, I suggest that we start by bringing the 2013-14 health regions into cchsflow.

Perez diet score

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

Does this variable exist already in CCHS cycles, or is it a newly derived variable?

What is the name of the variable?

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

Description of variable

Provide a brief description of the variable.

Is it consistent across CCHS cycles? If not, explain changes between cycles

For existing variables in the CCHS.

Which cycles is this variable found?

For existing variables in the CCHS.

Derived variables only. What variables are used to create this variable?

List the variables in cchsflow used to create this variable

Additional context

Add any other context or screenshots about the feature request here.

Additional instructions for derived variables

Derived variables use R code for more complex operations and multiple starting variables. Note: cchsflow currently uses only base R to derive variables. More complex derived variables that require dependencies are currently out-of-scope for cchsflow.

If possible, attach an .R file that includes documentation of the derived variable as per roxygen2 standards, along with the code to derive the variable. Include all starting variables.

New variable request: income ratio

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

Does this variable exist already in CCHS cycles, or is it a newly derived variable?

What is the name of the variable?

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

Description of variable

Provide a brief description of the variable.

Is it consistent across CCHS cycles? If not, explain changes between cycles

For existing variables in the CCHS.

Which cycles is this variable found?

For existing variables in the CCHS.

Derived variables only. What variables are used to create this variable?

List the variables in cchsflow used to create this variable

Additional context

Add any other context or screenshots about the feature request here.

Additional instructions for derived variables

Derived variables use R code for more complex operations and multiple starting variables. Note: cchsflow currently uses only base R to derive variables. More complex derived variables that require dependencies are currently out-of-scope for cchsflow.

If possible, attach an .R file that includes documentation of the derived variable as per roxygen2 standards, along with the code to derive the variable. Include all starting variables.

Don't add the variable type in labels

Describe the bug

Variables that are categorized versions of continuous variables, such as pct_time_der_cat10 and diet_score_cat3, have the word "categorical" in their label. When variables are described in a paper, for example in a table, their type is usually listed alongside the label. As a result, the label and the type currently carry overlapping information for these variables, as in this table:

Variable                 Variable Type   Categories
Categorical diet score   Categorical     Poor diet
                                         Fair diet
                                         Adequate diet

Consider removing the type for a variable from its label.
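A minimal sketch of the requested cleanup, assuming the type word always appears as a leading "Categorical" in the label. The helper `strip_type()` is hypothetical; in practice the labels would more likely be edited directly in the package's variable sheets.

```r
# Remove a leading "Categorical" from a label and re-capitalize the remainder.
# `strip_type` is a hypothetical helper, not a cchsflow function.
strip_type <- function(label) {
  out <- sub("^categorical[[:space:]]+", "", label, ignore.case = TRUE)
  paste0(toupper(substring(out, 1, 1)), substring(out, 2))
}

# Label taken from the table above.
clean_label <- strip_type("Categorical diet score")
print(clean_label)
```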

To Reproduce

NA

Expected behavior

NA

Screenshots

NA

Desktop (please complete the following information):

NA

Additional context

NA

New variable request: Year as continuous variable

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

Does this variable exist already in CCHS cycles, or is it a newly derived variable?

What is the name of the variable?

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

Description of variable

Provide a brief description of the variable.

Is it consistent across CCHS cycles? If not, explain changes between cycles

For existing variables in the CCHS.

Which cycles is this variable found?

For existing variables in the CCHS.

Derived variables only. What variables are used to create this variable?

List the variables in cchsflow used to create this variable

Additional context

Add any other context or screenshots about the feature request here.

Additional instructions for derived variables

Derived variables use R code for more complex operations and multiple starting variables. Note: cchsflow currently uses only base R to derive variables. More complex derived variables that require dependencies are currently out-of-scope for cchsflow.

If possible, attach an .R file that includes documentation of the derived variable as per roxygen2 standards, along with the code to derive the variable. Include all starting variables.

Fix missings in age variables

The NA(a) and NA(b) codes for DHH_AGE-related variables overlap with valid age values, so ages 96-99 are coded as missing.
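The overlap can be reproduced with a small base R sketch. The ages are toy values, and the three-digit codes 996 to 999 used in the second step are an assumption for illustration only, not the documented CCHS coding. The point is that missing codes for a continuous age variable must fall outside the valid age range.

```r
# Toy ages, including valid values in the 96-99 range and one assumed missing code.
age <- c(45, 96, 97, 99, 997)

# Problem: two-digit missing codes applied to continuous age drop real ages.
age_wrong <- ifelse(age %in% 96:99, NA, age)

# Possible fix (assumption): use missing codes outside the valid age range.
assumed_missing_codes <- 996:999
age_fixed <- ifelse(age %in% assumed_missing_codes, NA, age)

print(age_wrong)
print(age_fixed)
```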

Smoke_simple

Add smoke_simple, a variable that creates four smoking categories from the original CCHS derived smoking variable (SMKDSTY).

Remove disease name from the categories for disease variables

Describe the bug

Most of the disease variables include the name of the disease in their category labels, for example arthritis and back problems. This information is redundant, since the categories of a variable are usually presented in the context of the variable itself. In practice, it makes a paper's table unnecessarily wordy, as in this table:

Variable    Type          Categories
Arthritis   Dichotomous   Yes Arthritis/Rheumatism
                          No Arthritis/Rheumatism

Consider renaming the category labels to Yes and No.

To Reproduce

NA

Expected behavior

NA

Screenshots

NA

Desktop (please complete the following information):

NA

Additional context

NA

Health care access

Options:

  • regular family doctor - add
  • immunization - in the future?
  • cancer screening - in the future?

CCHS 2015 to 2018 shared variables for cardiovascular disease

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

Existing variables in CCHS 2015 to 2018 used to implement the CVDPoRT algorithm.

What is the name of the variable?

Approximately 12-15 variables.

Description of variable

Variables covering sociodemographics, health behaviours, diseases.

Is it consistent across CCHS cycles? If not, explain changes between cycles

There are concerns about consistency with variables prior to 2015. As well, this project used CCHS shared data.

Which cycles is this variable found?

2015-2018.

Derived variables only. What variables are used to create this variable?

TBA

New variables will be added to cchs-shared branch.

Food insecurity

Which variable(s) are best used for multiple CCHS cycles?

Check HRUPoRT and studies by Valerie Tarasuk.

Missing weekly alcohol consumption across provinces

I was looking at weekly alcohol consumption for each province/territory over the survey cycles. I noticed that certain cycles (2007_2008, 2009_2010, 2011_2012, 2013_2014, 2017_2018) have weekly consumption for certain provinces/territories but not all. This might have to do with the skip patterns for certain provinces/territories (?).

We should look into why that is the case and make a note in the cchsflow documentation.
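One way to start the investigation is to tabulate the proportion of missing values by cycle and province. The sketch below uses toy stacked data. The column names `cycle`, `province`, and `alc_weekly` are hypothetical and would need to be replaced with the names in the harmonized data.

```r
# Toy stacked data. Column names are hypothetical, not cchsflow variable names.
d <- data.frame(
  cycle      = c("2007_2008", "2007_2008", "2007_2008",
                 "2009_2010", "2009_2010", "2009_2010"),
  province   = c("ON", "ON", "QC", "ON", "QC", "QC"),
  alc_weekly = c(3, NA, NA, 5, 2, NA)
)

# Flag missing responses, then take the proportion missing per cycle and province.
d$missing <- is.na(d$alc_weekly)
miss <- aggregate(missing ~ cycle + province, data = d, FUN = mean)
print(miss)
```

A proportion of 1 for a cycle and province would suggest that the question was not asked there at all, rather than scattered non-response.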

Installation error: over-long path length

Would it be possible to revisit the file naming conventions used in this package? By default, Windows limits full paths (directories, file name, and extension) to 260 characters. The current file names are long enough that the full path exceeds this limit during installation, which causes the installation to fail. This is particularly problematic for users with limited permissions.

There is a known issue regarding file naming conventions and install_github() on Windows platforms.

> devtools::install_github("Big-Life-Lab/cchsflow")
Downloading GitHub repo Big-Life-Lab/cchsflow@master
checking for file 'C:\Users\jboudr\AppData\Local\Temp\Rtmps9j4NF\remotes248c4e7e6d63\Big-Life-Lab-cchsflow-281846f/DESCRIPTION' (470ms)
   Warning in file.copy(pkgname, Tdir, recursive = TRUE, copy.date = TRUE) :
     over-long path length
    ERROR
   copying to build directory failed
Error: Failed to install 'cchsflow' from GitHub:
  System command error, exit status: 1, stdout + stderr:
E> * checking for file 'C:\Users\jboudr\AppData\Local\Temp\Rtmps9j4NF\remotes248c4e7e6d63\Big-Life-Lab-cchsflow-281846f/DESCRIPTION' ... OK
E> Warning in file.copy(pkgname, Tdir, recursive = TRUE, copy.date = TRUE) :
E>   over-long path length
E>  ERROR
E> copying to build directory failed
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
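To find which files are at risk, a base R sketch such as the following could list the paths under a directory that exceed a given length. `long_paths()` is a hypothetical helper, and the temporary directory and file name below are made up for the demonstration. In real use, `root` would be a local checkout of the package and the limit should leave room for the temporary build prefix seen in the log above.

```r
# Return the paths under `root` whose normalized full length exceeds `limit` characters.
# `long_paths` is a hypothetical helper, not part of cchsflow or devtools.
long_paths <- function(root, limit = 260) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  full <- normalizePath(files, winslash = "/", mustWork = FALSE)
  full[nchar(full) > limit]
}

# Demonstration on a throwaway directory with one file and a small limit.
root <- tempfile("pkg_check")
dir.create(root)
file.create(file.path(root, "a_very_long_file_name_for_testing.csv"))
flagged <- long_paths(root, limit = 10)
print(basename(flagged))
```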

Hypertension

Can we confirm how we've ascertained hypertension in our past papers?
CCC_071 - current hypertension
CCC_072 - previous diagnosis of hypertension.

Multiple chronic conditions

There are different options. Consider major conditions (mostly lower prevalence) and minor conditions (such as hypertension and obesity that are really risk factors more than chronic diseases).

Reference is Muggah et al.

New Variable: Symptoms of Depression

Is the variable an existing CCHS variable, or a derived variable?

It's a newly derived variable.

What is the name of the variable?

There are two variables to consider for this PR:
DPSDSF - Depression Scale - Short form score
DPSDPP - Depression Scale - Predicted probability

Description of variable

The variable we want to include should indicate whether a person has symptoms of depression or the condition itself. We can use one of the two variables described above.

Is it consistent across CCHS cycles? If not, explain changes between cycles

DPSDSF: The derivation for this variable does not change a lot between cycles. Its name changes from DPSADSF (CCHS2001) to DPSCDSF (CCHS2003) to DPSEDSF (CCHS2005) to DPSDSF from CCHS 2007-2008 to CCHS 2014.

ADMC_PRX = 1 (CCHS2003) & ADME_PRX = 1 (CCHS2005) & ADM_PRX = 1 (CCHS2007-2008, CCHS2009 - 2010, CCHS2010, CCHS2011 - 2012, CCHS2012, CCHS2013-2014, CCHS2014) was an additional condition for 99 (NS);
DPSCFOPT = 2 (CCHS2003) & DPSEFOPT = 2 (CCHS2005) & DPSFOPT = 2 (CCHS2007-2008) & DODEP = 2 (CCHS2009-2010, CCHS2010, CCHS2011-2012, CCHS2012, CCHS2013-2014, CCHS2014) was an additional condition for 96 (NA).

DPSDPP: The derivation has not changed a lot. Its name changes from DPSADPP (CCHS 2001) to DPSCDPP (CCHS 2003) to DPSEDPP (CCHS 2005) to DPSDPP from CCHS 2007-2010 to CCHS 2014.

CCHS 2005 to CCHS 2014: ADME_PRX = 1 was also coded as NS

Which cycles is this variable found?

In all cycles from CCHS 2001 to CCHS 2014

The attached file has harmonization information for all variables found in the CCHS by searching for the term "depress"
Depression Symtoms Harmonization.xlsx

New variable request: Derived COPD variable

Introduction

We accept requests and PR for new variables. Providing information about your request helps discussion about whether and how to include the variable.

Is the variable an existing CCHS variable, or a derived variable?

There is currently resp_condition_der, which combines COPD (COPD + emphysema) and asthma.

There is not a harmonized COPD variable and there are quite a few changes to COPD over the CCHS cycles.

2001-2003: COPD & Emphysema are combined (CCC_091)
2005-2007: COPD & Emphysema are separate (CCC_91E & CCC_91F)
2009-2014: COPD & Emphysema combined (CCC_091)

Given that COPD and asthma have different disease burdens, I suggest creating a new derived COPD variable.

Does this variable exist already in CCHS cycles, or is it a newly derived variable?

What is the name of the variable?

What is the most consistent name for this variable (usually the variable name used from 2007-2014). If this is a derived variable, what is the name of the newly derived variable?

Suggested new variable name: CCC_091F

Description of variable

See above.

Is it consistent across CCHS cycles? If not, explain changes between cycles

See above.

Which cycles is this variable found?

See above.

Derived variables only. What variables are used to create this variable?

2001-2003: COPD & Emphysema are combined (CCC_091)
2005-2007: COPD & Emphysema are separate (CCC_91E & CCC_91F)
2009-2014: COPD & Emphysema combined (CCC_091)
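The derivation implied by the list above could be sketched in base R as follows. This is only an outline under stated assumptions: responses are assumed to be coded 1 = yes and 2 = no with NA for missing, and `derive_copd()` and its argument names are hypothetical. In cycles with a combined variable the value is passed through. In cycles with separate variables, the result is "yes" if either is "yes".

```r
# Derive a combined COPD/emphysema indicator.
# Assumed coding: 1 = yes, 2 = no, NA = missing.
# `derive_copd` is a hypothetical helper, not a cchsflow function.
derive_copd <- function(ccc_091 = NULL, ccc_91e = NULL, ccc_91f = NULL) {
  # Cycles with a combined variable: use it as is.
  if (!is.null(ccc_091)) {
    return(ccc_091)
  }
  # Cycles with separate variables: yes if either is yes, no if both are no,
  # otherwise missing.
  ifelse(ccc_91e == 1 | ccc_91f == 1, 1,
         ifelse(ccc_91e == 2 & ccc_91f == 2, 2, NA))
}

# Toy example for a cycle with separate variables.
copd_separate <- derive_copd(ccc_91e = c(1, 2, 2, NA), ccc_91f = c(2, 2, 1, 2))
print(copd_separate)
```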

Additional context

Add any other context or screenshots about the feature request here.

Additional instructions for derived variables

Derived variables use R code for more complex operations and multiple starting variables. Note: cchsflow currently uses only base R to derive variables. More complex derived variables that require dependencies are currently out-of-scope for cchsflow.

If possible, attach an .R file that includes documentation of the derived variable as per roxygen2 standards, along with the code to derive the variable. Include all starting variables.

Diet - SDS

Add 'simplified diet score' using CCHS 2007 as the reference population for means (normalization).

Low-income cut-off

Is it possible to replicate ratio of low-income cut-off (LICO)? Note that CCHS does not include a variable for family size.

`CCHS share` synthetic data

Add CCHS share synthetic data to develop and test transformations for the share version of the CCHS data.

  • 200 respondents
  • all variables

Share data will be identified with _s.
e.g. CCHS2009_s

Improper recoding with round brackets in recTo

Categorical variables that use round brackets in recTo to exclude end values result in observations being improperly recoded to NA(b).

Round brackets are often used when deriving categorical variables from existing variables where a particular category contains a range of values excluding the end value. The end value is then used as a starting value for the following category.
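The intended interval semantics can be sketched in base R as follows. This is not the cchsflow parser: `in_interval()` is a hypothetical helper that reads a string such as "[18.5,25)", where a square bracket includes the end value and a round bracket excludes it. The cut points are illustrative.

```r
# Test whether values fall inside an interval written as "[a,b)", "(a,b]", etc.
# Square brackets include the end value and round brackets exclude it.
# `in_interval` is a hypothetical helper, not a cchsflow function.
in_interval <- function(x, interval) {
  n <- nchar(interval)
  open_ch  <- substr(interval, 1, 1)
  close_ch <- substr(interval, n, n)
  bounds <- as.numeric(strsplit(substr(interval, 2, n - 1), ",", fixed = TRUE)[[1]])
  lower_ok <- if (open_ch == "[") x >= bounds[1] else x > bounds[1]
  upper_ok <- if (close_ch == "]") x <= bounds[2] else x < bounds[2]
  lower_ok & upper_ok
}

# The end value 25 is excluded from the first category and starts the second.
values <- c(18.5, 24.9, 25, 30)
first_cat  <- in_interval(values, "[18.5,25)")
second_cat <- in_interval(values, "[25,30)")
print(first_cat)
print(second_cat)
```

With these semantics every value between the lowest and highest cut points belongs to exactly one category, so none should fall through to NA(b).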
