r4epi's Introduction

This repository is for the R for Epidemiology electronic textbook. This electronic book was originally created to accompany my Introduction to R Programming for Epidemiologic Research course at the University of Texas Health Science Center School of Public Health. However, I hope it will be useful to anyone who is interested in R and epidemiology.

Useful sites:

Tasks are located at: https://github.com/orgs/brad-cannell/projects/3
Bookdown help: https://bookdown.org/yihui/bookdown/

Textbook version Notes:

  • Major: physical copy editions
  • Minor: new chapters, deletion of chapters, chapter reordering.
  • 3rd level: significant edits to existing chapters
  • Version number doesn’t change with typo (i.e., spelling and grammar) corrections.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

r4epi's People

Contributors

angie-benedetto, berdeguea0367, bhatiasundip, brad-cannell-test-user, cegepi, cjcotter, edambo, erinplaw, grifbai, hillkristinad, kayagrocott, mateussfigueiredo, mbcann01, mbh038, mpatel4321, ratterstrom, shirlyns, williamlai2, yiqunwangmeow, yuiiwase

r4epi's Issues

Move over front matter to Quarto

Overview

Move over the front matter from index.Rmd to this repository.

Although it isn't directly related to moving over index, there are a number of issues in the R4Epi project related to the functionality of Quarto. We should probably get those knocked out before

Tasks

  • Create a glossary, add hyperlinked keywords, and maybe a note to NOTES (#97)
  • Add reference page
  • Finish moving over the rest of the content for the contributing chapter
    • Revise the text
    • Republish so that we can get accurate screenshots
    • Update screenshots as needed
    • Move over the Issues section
  • Move over the License information section
  • Move over the About the authors page

Add skip patterns chapters

The PowerPoints should be pretty easy to adapt. I just didn't have time in Summer 2020. I assigned them the videos on YouTube instead.

Review and improve the Populations chapter

Overview

In the Fall of 2023, I moved over a bunch of stuff from PowerPoint slides (nearly) verbatim. I was in a rush, so I told myself to just move it over and improve it later.

Go back, reread, and improve. PowerPoint doesn't always translate perfectly to book format.

Population plots

In the future, I may actually want to show readers how to make population plots. That may be useful information in a book about using R to do applied epidemiology. Perhaps just add the functions to the appendix?

Clean up the wiki

Overview

I'm trying to create a wiki that will help me (and any other potential coauthors) create and revise content for the book in a more efficient and consistent way.

On 2022-12-14, I moved over a lot of the content from two Google Docs that were sort of serving the same purpose (R for Epidemiology Textbook Notes and 📚Textbook). However, it needs a lot of cleaning up. Some of the content is outdated, some notes were just quickly jotted down, and the organization isn't good.

Task list

  • Figure out the nuts and bolts of building the wiki (#82, #78, #79)
  • Complete first draft of formatting page
  • Complete first draft of ideas page
  • Complete first draft of references page

Left off at...

I was working on this and then jumped into #82

Add version number conventions to README

Textbook versions:

  • Major: physical copy editions
  • Minor: new chapters, deletion of chapters, chapter reordering.
  • 3rd level: significant edits to existing chapters
  • Version number doesn’t change with typo (i.e., spelling and grammar) corrections.

Revise the intro to epi chapter

Overview

I want to add all the modules from the Fall 2022 Epidemiology III class to the epidemiology half of the book.

The plan for right now is just to get the slides into Rmd format wholesale. I can make them better later.

Left off

  • Reading through the Intro to epi chapter.
  • I got sidetracked creating the wiki. Come back to this when the wiki is done.
  • I turned the entire PowerPoint into images that can be added. I need to trim out the slides I don't actually need and give the slides that I will use more informative names.

Tasks

  • Revise the intro to epi page. There was some stuff on the Google Doc that made me feel like the intro section needs more development.

Expand discussion of vector types in section 5.2

Overview

From CRC Press review: One suggestion I have is to maybe expand a bit the vector types in section 5.2 and introduce vectors of characters and factors. Factor vectors are introduced in section 19, which seems rather late, since the importing part often involves messing with character and/or factor variables/columns.
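
As a rough illustration of what the reviewer is asking for, the sketch below contrasts a character vector with a factor vector built from the same values. The object names and values are made up for this example and are not taken from the book.

```r
# Character vector: each element is stored as a string
sex_chr <- c("female", "male", "female", "female")
class(sex_chr)

# Factor vector: the same values stored as integer codes
# that point to a fixed set of levels
sex_fct <- factor(sex_chr, levels = c("female", "male"))
class(sex_fct)
levels(sex_fct)

# The underlying integer codes (1 = female, 2 = male)
as.integer(sex_fct)

# Factors know their levels, so tables include every category
table(sex_fct)
```

Showing the two side by side early on may make the import chapters easier to follow, since imported columns often arrive as one type or the other.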

Add vocabulary to Using R for Epidemiology chapter

We may want to introduce some basic vocabulary very early in the book. This is not a complete appendix, it's just a short review chapter that will get us up to speed. Here is a running list of potential words to start with:

  • Model
  • Distribution
  • Sample
  • Study design
  • Primary and secondary data
  • Observe (sort of implies that we're counting)

Turn downloading and installing R and RStudio into an appendix

Overview

I was reading some samples of books on Power BI and SharePoint on my Kindle last night. It was annoying how they all started with chapters on installing the software. Then, I realized that R4Epi does the same thing. Let's keep the downloading and installing material for people who want it, but let's make it an appendix. That way, people who don't need that content can jump directly into something meatier.

Write first draft of Random Error chapter

Overview

Fall 2023

This content needs to come before we start calculating confidence intervals for anything. Given that the source of random error we focus on the most is sampling variability, perhaps it should come right after our discussion of populations. We can then immediately start calculating confidence intervals in the measures of occurrence chapter. This will also mean that we won't have learned any measures that we can use for examples in the random error chapter. So, we would just have to say something like, "don't worry about the measure for now. Just interpret the p-value and/or confidence interval." I don't think I like that. So, let's keep random error between measures of occurrence and measures of association.

  • I didn't have time to write this chapter. I assigned Modern Epidemiology chapter 15 instead.
  • My intent is to create a lab warm-up in PowerPoint that can serve as a foundation for the random error chapter in R4Epi.
  • If I get that PowerPoint made this week, then I can start the chapter by moving that content over.
  • This task is related to brad-cannell/epi_3_public#26

Terms/concepts to include

  • For each measure covered in the measures of occurrence chapter, show readers how to calculate the confidence interval, p-value, and p-value curve.
  • What is random error? Chance vs. deterministic
  • P-value curves (https://paperpile.com/app/p/9c5734e3-eace-4919-9300-f8c046bcdd5d)
  • Sample size effect on p-values, confidence intervals, and p-value curves
  • Simulate differential and non-differential misclassification using the methods in Rudolph and Fox, example 2.
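
One way the "sample size effect" item above could be demonstrated is sketched below. It assumes a simple Wald confidence interval for a proportion; the function name `wald_ci`, the observed proportion of 0.20, and the sample sizes are all hypothetical choices for illustration, not material from the chapter.

```r
# Wald confidence interval for a proportion (illustrative helper)
wald_ci <- function(p_hat, n, conf_level = 0.95) {
  z  <- qnorm(1 - (1 - conf_level) / 2)   # critical value, about 1.96 for 95%
  se <- sqrt(p_hat * (1 - p_hat) / n)     # standard error of the proportion
  data.frame(
    n     = n,
    lower = p_hat - z * se,
    upper = p_hat + z * se
  )
}

# Same observed proportion, increasing sample sizes
ci_by_n <- wald_ci(p_hat = 0.20, n = c(50, 500, 5000))
ci_by_n$width <- ci_by_n$upper - ci_by_n$lower
ci_by_n
```

The interval narrows as the sample size grows even though the point estimate is unchanged, which is the pattern the chapter would need to explain.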

Tasks

  • Currently, the lab warm-up R code uses a regression model to demonstrate p-values, but we haven't covered regression yet. I should probably come back and change this to a measure from measures of occurrence, which we have already covered.

Add executable embedded R code chunks throughout the chapters

Overview

I want to embed interactive coding practice blocks and quiz questions in the chapters.

Best solution for embedding R coding exercises into a bookdown book: https://rstudio.github.io/learnr/. It doesn’t look like you can currently add interactive quiz questions directly into bookdown books. I think the best you can currently do is build a learnr app with shiny, post it to shinyapps.io, and then add links to shinyapps.io into your bookdown book. Alternatively, you could create an R4Epi package that includes data, interactive tutorials, and automatically downloads freqtables and meantables as dependencies. Eventually, this may be the sort of thing you want to charge for.

See if Quarto changes this.

2023-01-23: Brian Law suggested I try WebR. He just warns that it is "very beta" right now.

Chapter 35.2 Across with filter

Hi,

"Chapter 35.2 Across with filter" needs to be revised. The reason is that using across() in filter() is deprecated.
For instance, this code in the book

df_xyz %>% 
  filter(
    across(
      .cols = everything(),
      .fns  = ~ !is.na(.x)
    )
  )

will generate the following message:
Using across() in filter() is deprecated, use if_any() or if_all().

Kind regards,
Leyla
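
A possible replacement, following the deprecation message's own suggestion, is sketched below using if_all(). The small df_xyz tibble here is a hypothetical stand-in, since the book's actual data frame is not shown in this issue, and the sketch assumes a dplyr version that provides if_all() (1.0.4 or later).

```r
library(dplyr)

# Hypothetical stand-in for the book's df_xyz
df_xyz <- tibble(
  x = c(1, NA, 3),
  y = c(4, 5, NA),
  z = c(7, 8, 9)
)

# Keep only the rows where every column is non-missing
df_complete <- df_xyz %>%
  filter(
    if_all(
      .cols = everything(),
      .fns  = ~ !is.na(.x)
    )
  )

df_complete
```

Swapping if_all() for if_any() would instead keep rows where at least one column is non-missing, so the chapter text may need to say which behavior is intended.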

Switch from magrittr pipe to base R pipe

Overview

From CRC Press review: The authors also use the magrittr pipe, %>%, rather than the base R pipe, |>. It might be worth mentioning both in chapter 11 and briefly comparing them (or pointing the reader to an external source for further details).
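
A brief comparison of the kind the reviewer suggests might look like the sketch below. The vector `x` is an arbitrary example, and the base R pipe assumes R 4.1.0 or later.

```r
library(magrittr)

x <- c(1, 4, 9)

# magrittr pipe
result_magrittr <- x %>% sqrt() %>% sum()

# Base R pipe (available in R 4.1.0 and later)
result_base <- x |> sqrt() |> sum()

# Both pass the left-hand side as the first argument of the next call
identical(result_magrittr, result_base)
```

One difference worth mentioning in chapter 11: the base pipe requires a function call with parentheses on the right-hand side, whereas magrittr also accepts a bare function name such as `x %>% sqrt`.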

Add a section on using RStudio's Find and Replace Tool

Overview

I actually use the Find and Replace tool quite a bit. In the Intro to R class, I teach students how to use the Find and Replace tool to make it easier to copy and paste data into RStudio. I think I should also add this to the textbook.

Scenarios:

  • Add commas between values in a vector.
  • Add spaces and commas to a data frame -- use a baby example data frame.
  • Changing the name of a variable or data frame.
  • Regular expressions?
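
If the regular expressions scenario is included, the same replacements can also be sketched in R code with gsub(), which may help connect the point-and-click tool to code. The strings below are invented for illustration.

```r
# Values pasted from another source, separated only by spaces
raw_values <- "12 15 9 21"

# Replace each run of spaces with a comma and a space
with_commas <- gsub(pattern = " +", replacement = ", ", x = raw_values)
with_commas

# Rename a variable everywhere it appears as a whole word
code_line <- "mean(old_name) + sd(old_name)"
renamed   <- gsub("\\bold_name\\b", "new_name", code_line, perl = TRUE)
renamed
```

The `\\b` word boundaries keep the rename from touching longer names that merely contain `old_name`.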

Terms to add to the measures of occurrence chapter

Overview

In the Fall of 2023, I was adding the content from PowerPoint to the book. There were some hidden slides with terms I wanted to add to the chapter, but hadn't gotten around to yet. I'm writing them below in hopes that I will get time to add them sometime soon.

Terms

In looking through this list, some of these are not appropriate for the measures of occurrence chapter. I need to move them to a different list at some point.

  • Counts
  • Numerator/denominator
  • Define ratio
  • Define proportion
  • Define probability 
  • Conditional probability
  • Define odds
  • Relationship between continuous variables and event occurrence.
  • Dummy variables
  • When to group as “any”
  • Prevalence (Point, period, cumulative lifetime)
  • Incidence
  • Cumulative incidence (incidence proportion)
  • Survival analysis
  • Censoring
  • Life table
  • Kaplan-Meier method
  • Person-time
  • Incidence rate (incidence density)
  • Hazard rate
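
To make a few of the terms above concrete, here is a minimal worked sketch of a proportion, the corresponding odds, and an incidence rate based on person-time. All of the counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical counts
n_population     <- 1000   # people in the population at one point in time
n_existing_cases <- 50     # people with the condition at that time

# Point prevalence is a proportion: cases / everyone
prevalence <- n_existing_cases / n_population

# Odds compare the probability of the event to the probability of no event
odds <- prevalence / (1 - prevalence)

# Incidence rate uses person-time in the denominator
n_new_cases  <- 12
person_years <- 2400
incidence_rate <- n_new_cases / person_years   # cases per person-year

c(prevalence = prevalence, odds = odds, incidence_rate = incidence_rate)
```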

Figure out how to automatically check for broken links

Overview

I'd like to implement some kind of automatic checks for broken links in the book. There is an open issue requesting an automatic URL checker on Quarto's GitHub. That GitHub thread also recommends this website for doing manual checks.

Currently, I'm using the Test Quarto Book to experiment.

Left off at

Still looking for an automated solution. Try continuous integration?

Tasks

  • Figure out why the "Edit page on Github" links aren't working. I ran the 2023-07-12 version of Test Quarto Book through the link checker, and none of those links were working.
  • Find a more automated, R-like way of checking for broken links than manually checking in dead link checker.
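
One possible R-based approach is sketched below, using only base R. It is an assumption about how this could work, not a tested solution for the book: `extract_urls()` and `check_url()` are hypothetical helpers, the URL pattern is deliberately simple, and the check itself needs internet access and an R build with libcurl support.

```r
# Pull URLs out of lines of text (for example, the lines of a .qmd file)
extract_urls <- function(text) {
  matches <- regmatches(
    text,
    gregexpr("https?://[^\\s)\"'>]+", text, perl = TRUE)
  )
  unique(unlist(matches))
}

# Request the headers for one URL and report the HTTP status code
check_url <- function(url) {
  status <- tryCatch(
    attr(curlGetHeaders(url), "status"),
    error = function(e) NA_integer_   # unreachable hosts, timeouts, etc.
  )
  data.frame(
    url    = url,
    status = status,
    ok     = !is.na(status) && status < 400
  )
}

sample_text <- c(
  "See [bookdown](https://bookdown.org/yihui/bookdown/) for help.",
  "Tasks: https://github.com/orgs/brad-cannell/projects/3"
)

urls <- extract_urls(sample_text)
urls

# Requires internet access, so it is left commented out here:
# results <- do.call(rbind, lapply(urls, check_url))
```

A script like this could be run by hand or from a continuous integration job, which would fit the "Try continuous integration?" note above.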

Edit instructions for changing preferences on Mac

Overview

Section 3.5 discusses how to change global options in RStudio.

Clicking on the Apple menu and then Preferences no longer works. Now, Mac users need to click on Global options... in the Tools menu just like Windows users.

Tasks

  • Update the text
  • Add a new screenshot

Incorporate a WebR practice exercise into the book

Overview

WebR allows users to run R in the browser. Can I use it to add practice exercises, with feedback, directly into R4Epi?

I have a working example in Test Quarto Book

Useful websites/resources

Tasks

  • Figure out what to do about PDF format
  • Figure out if you can pass data to webR code chunks
  • Figure out if you can make multiple choice questions with webR chunks

Review and improve the measures of occurrence chapter

Overview

In the Fall of 2023, I moved over a bunch of stuff from PowerPoint slides (nearly) verbatim. I was in a rush, so I told myself to just move it over and improve it later.

Go back, reread, and improve. PowerPoint doesn't always translate perfectly to book format.

Tasks

  • Add life table methods to this module (starting on Szklo and Nieto pg. 52).
  • Add tables like Exhibit 2-1, Exhibit 2-2, and Exhibit 2-3 from Epidemiology: Beyond the Basics.
  • Change the values in the hypothetical population of 10 people. Currently, it is identical to the values used in figure 4-1 of Modern Epidemiology (pg 54)
  • Add more "here's what we did above" text to the code chunks.
  • In Fall 2023, I didn't have enough time to fully adapt a few sections of the chapter (e.g., relation between incidence and prevalence) that I wanted. The code for those sections is pasted in a comment below titled "Add these sections back". Add them back to the book and adapt the wording to be your own.
  • There was some stuff in the Google Doc that you may or may not want to use.
  • Make a table of key terms at the end of the chapter.
  • See if there is anything you can quickly add from #105
  • Add a chapter summary.
  • Add sensitivity and specificity

Review and improve the Using R for Epidemiology Chapter

Overview

In the Fall of 2023, I moved over a bunch of stuff from PowerPoint slides (nearly) verbatim. I was in a rush, so I told myself to just move it over and improve it later.

Go back, reread, and improve. PowerPoint doesn't always translate perfectly to book format.

Tasks

  • Move the probability and conditional probability stuff out of measures of association and into intro to epi. We need to be able to discuss probabilities early on. Especially in the random error module.

Test using a Qmd file

Overview

We can create the README markdown file using an Rmd file with the output set to github_document. I wonder if we can use a qmd document to create the markdown pages used in the wiki?

Potential advantages

  • We can add R code/output to the wiki
  • We can add font awesome icons to the wiki

Footer and sidebar

I explored generating the footer and sidebar markdown documents from a qmd document, but at this point, I don't see any advantage to doing so. I'm going to continue writing those in pure markdown for now.

Tasks

  • Convert Home
  • Convert Formatting
  • Convert Ideas

Left off

When I left off, I had just finished converting formatting and ideas. I want to double-check everything before I close out this issue. If it all looks good, then move on to cleaning up the text.

Review and improve the measures of association chapter

Overview

In the Fall of 2023, I moved over a bunch of stuff from PowerPoint slides (nearly) verbatim. I was in a rush, so I told myself to just move it over and improve it later.

Go back, reread, and improve. PowerPoint doesn't always translate perfectly to book format.

Left off

2023-09-25

  • Finished the first draft of the chapter. There is lots of room for improvement.

Tasks

  • Add life table and Kaplan-Meier table calculations. There are slides and R code in the cohort studies module.
  • In Fall 2023, I was just trying to get everything moved over from PP. It's probably worth rereading for clarity and figuring out if there are places where it would make sense to replace PP slide images with native R images.
  • Add a discussion of relative and absolute differences (took this out in Fall 2023). See slides (measures_of_association_tree_01, measures_of_association_tree_02, and measures_of_association_tree_03)
  • Add difference between relative and absolute difference example from GPLI presentation
  • Use Rothman's investment analogy for absolute vs. relative differences (see below)
  • Add a terminology table for probabilities (see below)
  • The slides for null values and no association are heavily geared toward incidence measures. However, they could also be about prevalence measures. I should make these more general.
  • I deleted the slides about null values (incidence_proportion_difference_null, incidence_proportion_ratio_null, and odds_ratio_null). I don't dislike them. I think they are good, but I was having trouble describing them. You may want to add them back.
  • Show readers how to interpret relative measures < 1
  • Show readers how to interpret absolute measures < 0
  • Show readers how to calculate confidence intervals, p-values, and p-value curves for each measure of association.
  • Add a terminology table for measures of association. For example, incidence proportion ratio is also called risk ratio and relative risk.
  • Demonstrate the equivalence between exposure OR and outcome OR
  • Break probability off into its own chapter
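
Several of the tasks above (relative vs. absolute differences, the terminology table, and the equivalence between the exposure OR and the outcome OR) could share one small worked example. The sketch below uses a hypothetical 2x2 table; the counts are invented for illustration.

```r
# Hypothetical cohort: rows are exposure status, columns are outcome status
tab <- matrix(
  c(20, 80,
    10, 90),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    exposure = c("exposed", "unexposed"),
    outcome  = c("yes", "no")
  )
)

# Incidence proportions (risks) in each exposure group
risk_exposed   <- tab["exposed", "yes"]   / sum(tab["exposed", ])
risk_unexposed <- tab["unexposed", "yes"] / sum(tab["unexposed", ])

# Relative measure: incidence proportion ratio (risk ratio, relative risk)
risk_ratio <- risk_exposed / risk_unexposed

# Absolute measure: incidence proportion difference (risk difference)
risk_difference <- risk_exposed - risk_unexposed

# Outcome OR: odds of the outcome among exposed vs. unexposed
outcome_or <- (tab["exposed", "yes"] / tab["exposed", "no"]) /
  (tab["unexposed", "yes"] / tab["unexposed", "no"])

# Exposure OR: odds of exposure among those with vs. without the outcome
exposure_or <- (tab["exposed", "yes"] / tab["unexposed", "yes"]) /
  (tab["exposed", "no"] / tab["unexposed", "no"])

c(risk_ratio = risk_ratio, risk_difference = risk_difference,
  outcome_or = outcome_or, exposure_or = exposure_or)
```

Both odds ratios reduce to the same cross-product, (20 × 90) / (80 × 10), which is why they agree.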

Add a section on writing commit messages

  • Do this after you complete the chapter on git and GitHub.
  • Then, go back and link the text in intro to git and github that says "That way, it will be easy to find that version in the future if we ever need to refer to it (assuming we give it an informative name)." to the newly created section on writing good commit messages.

Add a lot more general information about ggplot2

Overview

We really don't have much explanation about how to use ggplot2 in the book. There are just a couple of plots with minimal explanation. I don't think we want to cover all of the basics of ggplot2; instead, we should just refer readers to Hadley's book (https://ggplot2-book.org/index.html). However, learners have repeatedly asked for more information than we currently give them.

In this part:

  1. Cover the very basics of the grammar of graphics and how ggplot2 works.
  2. Numeric descriptions of variables

In the presenting results part:

  1. Formatting ggplots and making them pretty
  2. Other types of plots
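
For the "very basics of the grammar of graphics" item, a minimal sketch might show the three pieces every plot needs: data, an aesthetic mapping, and at least one geometry layer. The example uses the built-in mtcars data purely for convenience; it is not a plot from the book.

```r
library(ggplot2)

# Data + aesthetic mapping + geometry layer, with labels added on top
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon")

p
```

Each `+` adds another component to the same plot object, which is the idea the later formatting material would build on.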

Try making a new repository for a quarto version of R4Epi

Overview

I was reading about Quarto Books last night. I think we may want to try making a version of R4Epi that is created with Quarto. However, I think it's best if we use a totally new project/repository for this.

Additionally, you may first want to experiment with R Notes Bookdown.

Why?

New Repositories

Add Doug's info to about the authors

https://sph.emory.edu/faculty/profile/index.php?FID=melvin-livingston-8970

Add check on learning to the end of each chapter (learnr)

Overview

I think it would be great to add COL questions to the end of each section or chapter like Hadley does in R4DS. To begin with, the questions can be the same as the questions we use for COL quizzes in Canvas.

How?

Use learnr

Using the learnr package seems like a good place to start. However, I'm pretty sure there are big limitations as to what we can add directly to the book. One potential option is to create an accompanying package that only contains quiz questions and add links to those quiz questions into R4Epi.

    • Name the package exerciser4epi

Use webr

Brian Law from Posit suggested looking into the webr package as well, although he warns that it is very beta right now.

Make an R style guide appendix

Overview

Style, and misuses of it, are one of the biggest issues I see with student code. I think it would be helpful to create a style guide appendix. I imagine it would be similar to the Tidyverse style guide.

I don't think this should replace the chapter on style best practices. It should augment it. Tell them to use this appendix to look things up.

You may want to include something like this snippet you wrote for your Power Automate wiki:

This page will serve as a style guide for authoring this wiki and for authoring Power Automate flows. The ultimate goal of a style guide is to reduce cognitive load, and as a result, make it easier to write text or code. How does a style guide do this? First, it reduces the number of choices you have to make as you are writing. For example, "should I write this variable name in snake case or camel case?" Second, having the predetermined choices written down for reference reduces the amount of information you need to store in your intentional memory (i.e., "OK, remember to always use snake case"); although, it may eventually bleed over into your incidental memory. Third, having uniformly styled text/code makes it easier for others -- including future you -- to read. You can focus on the content instead of the style and/or organization.

Turn measures of occurrence module into a chapter

Overview

I want to add all the modules from the Fall 2022 Epidemiology III class to the epidemiology half of the book.

The plan for right now is just to get the slides into Rmd format wholesale. I can make them better later.

Left off

2023-09-05
Working on moving over the PP slides.

  • Left off at slide 2 and line 127 of 03_measures_of_occurrence.
  • Just get the PowerPoint slides moved over. Improve later.

Tasks

  • Move usable parts of PowerPoint slides over to Rmd.
  • Create a first draft of this chapter.
  • Add to reading list.
  • Post announcement

Update the language in the section on Tidy Evaluation

The language about non-standard evaluation in the Tidy Evaluation section of the introduction to repeated operations chapter isn't wrong, but I think the language and examples used in the rlang data-masking article are probably more helpful. Let's update that language.

From Advanced R Second Edition:

Closely related to metaprogramming is non-standard evaluation, NSE for short. This term, which is commonly used to describe the behaviour of R functions, is problematic in two ways. Firstly, NSE is actually a property of the argument (or arguments) of a function, so talking about NSE functions is a little sloppy. Secondly, it’s confusing to define something by what it’s not (standard), so in this book I’ll introduce more precise vocabulary.
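
A short example in the data-masking vocabulary might look like the sketch below. The data frame and the `mean_of()` helper are hypothetical names made up for illustration; the embrace operator `{{ }}` is the mechanism dplyr and rlang provide for passing a column through a function argument.

```r
library(dplyr)

df <- tibble(
  group = c("a", "a", "b"),
  value = c(1, 3, 10)
)

# Data masking: `value` is looked up as a column of df,
# not as an object in the global environment
df %>% summarise(mean_value = mean(value))

# Inside a function, embrace the argument so that the column name
# the caller supplies is passed through to the data-masked code
mean_of <- function(data, var) {
  data %>% summarise(mean = mean({{ var }}))
}

result <- mean_of(df, value)
result
```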

Create a cross-referenced glossary

Overview

  • I was working on moving the front matter from the original bookdown Rmd documents to the new qmd documents (#96).
  • As I was doing so, I decided it would be a good time to double-check and enforce our conventions for emphasizing text.
  • That led me to notice that a bolded word in contributing.qmd should really be hyperlinked to the glossary.
  • That led me to try to figure out how to make that happen.

I don't think we can link random words in the glossary. However, I think we may be able to make the glossary words headers, style them with CSS, then link words in the chapters to those headers in the glossary.

Useful links

Left off at

2023-07-19: I think this is working now, thanks to this SO post.

Tasks

  • Get cross-references to work in cross_references.qmd of test quarto book.
  • Link at least one word to the glossary in the test quarto book.
  • Figure out how to link to words in the glossary without them showing up in the PDF table of contents.
