worldbank / dime-r-training

Dime Analytics R Training

License: MIT License

CSS 1.34% JavaScript 8.04% HTML 89.85% R 0.70% SCSS 0.06%

dime-r-training's Introduction

R for Stata users

Background

This material was developed by the DIME Analytics team as an introduction to the R statistical package for its staff.

Course description

R is a programming language for statistical analysis and data science. It is a powerful and flexible tool widely used among statisticians and data scientists, and has a growing user base in economics research. This course is designed to familiarize participants with the language, focusing on common tasks and analysis in development research, and showing how to use R through RStudio, a popular integrated development environment for R. The course will build upon comparisons to Stata syntax and requires familiarity with the use of do-files, loops and macros. It also assumes some degree of familiarity with DIME's coding practices. All sessions are designed to last 90 minutes.

Training content

01 - Introduction to R

Introduction to the RStudio interface, R syntax, objects and classes.

02 - Introduction to R programming

Code organization, R libraries, loops, custom functions, and R programming practices.

03 - Data wrangling

Basic functions for processing data using the tidyverse collection of packages.

04 - Data visualization

An introduction to creating and exporting graphs in ggplot2.

05 - Descriptive analysis

How to create and export descriptive statistics tables in R.

06 - Geospatial data

An overview of R resources on GIS.

07 - Introduction to R Markdown

An introduction to dynamic documents and R Markdown.

License

This material is developed under the MIT license. See http://adampritchard.mit-license.org/ or the LICENSE file for details.

Main Contact

Luis Eduardo San Martin - [email protected]

Authors

Luiza Cardoso de Andrade
Marc-Andrea Fiorina
Robert A. Marty
Maria Reyes Retana Torre
Rony Rodriguez Ramirez
Luis Eduardo San Martin
Leonardo Teixeira Viotti

dime-r-training's Issues

R Training - GIS

Map colors are ugly.
Put West Wing maps video back in

01-intro-to-R-part-1 review notes

Can the slides be aligned to the middle of the page instead of top?
Introduction - It might be better to rephrase the part of the sentence that reads "and why it is so much better than Stata" as how Stata users can adapt and use some features which are better in R

I notice this trend through all the intro slides. I don't think we mean to start a war between R and Stata (there's enough of that already!). If that was the plan all along, it's fine to leave those comments in, but I think it might be a better approach to just point out how some of R's functions are better than Stata's and let people decide which they prefer to use. Same for the wording of R vs Python

Introduction - move part II within brackets
Introduction - Certain common tasks are simpler in Stata. Mention a couple of examples.
Getting started - Ask participants to clone repository before pointing to file locations
Explain the interface before importing the dataset and then go back to the interface to explain where the data frame shows
head(whr2015) and head(whr) use the exact same output image. Something is wrong there. Also, might be good to show both the outputs in the same slide
Data in R - "Two important concepts to take note" has 3 points

Add content on purrr

Let's try to add this to the last session for the next training after April 2021

Review Best Practices Session

2/7/2019 preparation meetings

Proposed dates: 2nd to 18th of April

Transform Coding for Rep.(...) into Lab2 - Intro 2/ R 102 @luizaandrade

R training - 2018 Dec 20

Lab 1 review

Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
Update folder paths
Remove (some) Stata jokes

R training - 2018 Nov 20

@ramarty, let's write any ideas, comments etc. that we might think of during the session in this issue thread. Anyone can add to this issue

R training - June 13, 2019

Descriptive analysis

Exercise 4: add a version of the solution using select instead of indexing
Exercise 5: use the out argument (export summary statistics)
Exercise 5: it's not clear that you should export to LaTeX

Visualization

Slide 34: How to remove the decimals from the year

Data processing - Feedback April 06 2021

R training-Spatial data

Additional resource: https://geocompr.robinlovelace.net/
Additional resource: https://serialmentor.com/dataviz/geospatial-data.html
For those participants who want to go further, it might be useful to do a task/assignment on geospatial data beyond the training and get feedback from the trainers.
Maybe add a session on the sf package.
Maybe add some slides on working with different spatial data sources and how to harmonize them. Or a bullet point on slide 54 on the sf function that converts a data frame into geospatial data (st_as_sf())
The "introduction" from version 1 of the training (types of spatial data -vector-raster) might be useful to provide a big picture of the topic.

Feedback for session of 23 Nov 2020

Sessions: Intro I and Intro II

Lab 5 - Apr 30

Alternative way of loading shapefile: readOGR(file.path(finalData, "worldmap.shp"))
Add raster package to the list of packages
Make sure the packages we install are the same we load
geom_point() instead of geom_points()

Feedback - Geo Spatial - Dec 1 2020

R Training 1 - June 11, 2019

Tell people to write code in the script window

R training-Intro to R

Slide 12: Open datasets from the environment window
Slide 31: include matrix and array as type of objects
Slide 55: R for data science as resource https://r4ds.had.co.nz/

How to fix problem with tidyverse on macs

Installing Tidyverse: install.packages("dplyr", dep=T) worked for one person. A window appeared asking to install command line tools. Click "yes." install.packages("dplyr", dep=T) will need to be run again after the command line tools have installed.

R training - 2018 Dec 06

Don't repeat section on packages (move slides to the appendix -- we can still go over it if someone asks)
Move function inception to an earlier point. Give a better example (such as subsetting a data set when creating a graph) and discuss function results.
Help files: what are required arguments?

Notes for future trainings

Make sure to note that the spatial data session is an advanced session and requires attendance at the previous classes
Where to discuss the way R treats missing values?
Data cleaning: instead of write.dta13, maybe use write_dta from the haven package? I think this is a tidyverse package -- and I think haven treats things like value labels & descriptions nicely
Data viz: maybe before teaching pdf(), dev.off(), teach ggsave() always with height/width? This could just be my preference in using ggsave... I'm wondering if this is more common too? (It also avoids the issue of having a graphics device already open.) Or maybe at least include a slide on ggsave?
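
The two export approaches mentioned in the last note can be compared side by side. This is only a sketch: it assumes ggplot2 is installed, uses the built-in mtcars data in place of the training data, and the object and file names (`p`, `scatter_device.pdf`, `scatter_ggsave.pdf`) are made up for illustration.

```r
# Assumes ggplot2 is installed; plot and file names are illustrative only
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

  # Graphics device approach: open a device, print the plot, close the device
  pdf_file <- file.path(tempdir(), "scatter_device.pdf")
  pdf(pdf_file, width = 6, height = 4)
  print(p)
  dev.off()

  # ggsave() approach: the device is opened and closed for you
  ggsave_file <- file.path(tempdir(), "scatter_ggsave.pdf")
  ggsave(ggsave_file, plot = p, width = 6, height = 4)
}
```

With ggsave() there is no device left open if the code stops halfway, which is the issue the note refers to. Width and height are in inches by default.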

Lab 3 review

Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR, e.g. treating outliers slide.
Check folder paths.
Add merging functions to presentation.
Include R's data format.

r-training-Nov-2020

Update slides to xaringan-metropolis theme:

Rwanda training - Intro to R I & II

List suggestions for training update

@luizaandrade : review content with policy-makers in mind
@luisesanmartin : think of how to remove Stata and LaTeX
@RRMaximiliano : adapt examples to use Rwanda data

To make edits, use this branch: https://github.com/worldbank/dime-r-training/tree/rwanda-2020

R training- Data processing

Slide 16: add the command skim (skimr package)

Feedback - Descriptive Analysis - Nov 30 2020

Feedback from Nov-2020 course

The course was held online. This feedback comes from the anonymized feedback survey results and from observations made by the DIME Analytics team that have not been applied yet to the materials.

Intro 2 session was the one attendees found the least useful. For future editions, we could consider spreading its content across the other sessions and increasing the time of those.
The contents for Intro I, Intro II, and Data analysis are currently too much for 90-minute sessions. For future editions we need to consider which ones we'll be moving to the annex.
Switching between the slides and the R window increased the difficulty of following the session. Some options:

Keep R/RStudio and the slides on the same screen at the same time
Focus on live code instead of the contents of the slides
Embed R in the presentation -- there's a way to do this if the presentations are html

Edit presentations so the code section doesn't use the fira font. The problem with it is that it merges sets of two characters like <- or == into a single one
Session 1: exercise 7 was confusing. Consider dropping it or adding additional instructions to clarify
Session 2: before explaining why we usually don't use the equivalents to cd() and clear all, add a slide explaining why we don't want previous objects to be loaded when we start a new session
Session 2: add "raising hands" example when explaining apply() -- ask Luis Eduardo about this
Session 2 slide 47: change image for loading package from stargazer to tidyverse
Session 1: mention explicitly that this training assumes knowledge of Stata

Feedback - Session: Data Processing (24th Nov)

Session: Data Processing (Wrangling)

GIS: something in between the 2 sessions we've had

Keep all the code and material so people can copy it later, but make the part that is discussed in the session more step-by-step

Notes from 2nd session at NISR

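  # `packages` is a character vector of package names
  sapply(packages, function(x) {
      # Install x only if it is not among the installed packages
      if (!(x %in% rownames(installed.packages()))) {
          install.packages(x, dependencies = TRUE)
      }
    }
  )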

This was a suggestion by @rmurenzi

Explaining object to Stata users

We started to talk about this already in the session, but I think it helped so much to explain to Stata users what objects are that it should go into the slides.

An object is like a global in Stata, but while you can only put a number or a string in a global, you can put anything into an object: strings, data sets, vectors, graphs. In Stata you can reference the global later in your code to get its content, but with an object you can do so much more. Exactly what you can do with an object depends on what you saved in it, not on the fact that it is an object.

Something like that.
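
A slide could back this up with a few lines of base R. The sketch below is only a suggestion; the object names (`n_obs`, `country`, `scores`, `survey`) and their values are invented for illustration.

```r
# Hypothetical objects: names and values are illustrative only
n_obs   <- 25                                     # a single number
country <- "Rwanda"                               # a string
scores  <- c(4.5, 6.1, 5.3)                       # a numeric vector
survey  <- data.frame(id = 1:3, score = scores)   # a whole data set

# What you can do with an object depends on what is stored in it
class(n_obs)      # "numeric"
class(country)    # "character"
class(survey)     # "data.frame"
mean(scores)      # works because `scores` holds numbers
nrow(survey)      # works because `survey` holds a data frame
```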

Graph and table sessions: use built-in data

Notes from 1st session of NISR training

when to write code in the console vs the script?
what happens when you create two different objects with the same name?
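
The second question can be answered with a very short example. This is a sketch with a made-up object name `x`: assigning to an existing name replaces the earlier object, and R gives no warning.

```r
x <- 10        # creates x
x <- "ten"     # same name: the earlier value is replaced, with no warning
class(x)       # "character"

# Names are case sensitive, so X is a different object from x
X <- 10
```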

R training - 2018 Dec 13

FCT 2019 changes

Consolidating previous issues

Overall:

Remove all mentions of master in presentations
Make sure to note that the spatial data session is an advanced session and requires attendance at the previous classes #32

Intro I:

Explain how functions take parameters: ordered parameters, named parameters (header = T) #18
Explain that functions return values. Explain that returned values can be passed directly to another function; you do not have to store them in a local and then use them in the next function #18
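
Both points could be illustrated with base R functions only. This is a sketch rather than the slide content; the inline CSV text and the names `values`, `df` and `rounded_mean` are assumptions made for the example.

```r
# Ordered (positional) arguments: matched by their position
substr("Stata users", 1, 5)

# Named arguments: matched by name, so the order does not matter
substr(stop = 5, start = 1, x = "Stata users")

# Mixing both: the first argument by position, header by name
df <- read.csv(text = "id,score\n1,4.5\n2,6.1", header = TRUE)

# Functions return values, and a returned value can go straight into
# another function without being stored first
values <- c(2.5, 3.7, 9.1)
rounded_mean <- round(mean(values), digits = 1)
```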

Data visualization:

Change to ggsave() always with height/width #32

Data cleaning:

Discuss the way R treats missing values #32
Instead of write.dta13, maybe use write_dta from the haven package? #32
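
For the discussion of missing values, a short base R example may be enough. The vector `income` below is hypothetical. One contrast worth stating for this audience: Stata treats a numeric missing value as larger than any number in comparisons, whereas in R a comparison with NA returns NA.

```r
income <- c(100, NA, 300)    # hypothetical vector with one missing value

mean(income)                 # NA: missing values propagate by default
mean(income, na.rm = TRUE)   # 200: missing values dropped explicitly
income > 150                 # FALSE NA TRUE: comparing with NA gives NA
is.na(income)                # FALSE TRUE FALSE: the test for missingness
income[!is.na(income)]       # keeps only the non-missing values
```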

Descriptive analysis:

In exercise 3 (stargazer and labels) #18

make it clear that they should go back to the code they just ran with stargazer and modify it so that it takes the labels as an argument.
in the solution, do not use obj rawOutput in the file.path() function, as we have not introduced that object in class

Slide with example for aggregate(): #18

the first slide has outdated syntax
this exercise is also called exercise 3
did not catch what it was, but there was some type of typo in by(region)
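
For reference when updating that slide, here are the two usual ways of calling aggregate(). This sketch uses the built-in mtcars data instead of the training data, and the object names are invented. The typo seen in class is not known; one common stumbling block, which may or may not be the one seen, is that the `by` argument has to be a list.

```r
# mtcars ships with R and stands in for the training data here

# Formula interface: mean of mpg for each value of cyl
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl

# Equivalent call with the `by` argument, which must be a list
mpg_by_cyl2 <- aggregate(mtcars$mpg, by = list(cyl = mtcars$cyl), FUN = mean)
mpg_by_cyl2
```

The second form names the result column `x`, while the formula form keeps the variable name.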

Lab 4 review

Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
Remove latex
Expand Create any table exercise
Better function to export to excel

Lab 5 review

Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR

Lab 4 - Apr 25

explain that whr is the data in slide 21
several typos in slide 21
Ex 5: show what the data looks like and put aggregate and ggplot code in the same slide
Ex 6: Lots of typos (region not treatment)
ggplot different datasets, move aes() to ggplot function?

Session 3 - Data Viz - Review

Slide 5: include the same file paths as in previous sessions, so people can reuse their code
Is there a particular reason why sometimes we see ggplot and sometimes {ggplot}?
Slide 17: you can also indicate the data inside the aesthetics. Explain that both will work, but that this will make a difference when you want to use more than one dataset or more than one mapping. Then add an example.
Slide 19: this link is super cool! I suggest you open it and scroll a bit.
Slide 26: This is a great opportunity to show the difference between using colors for numeric and factor variables. I think we should first do it without the factor, then ask how many people made the same graph, then ask what the problem with the graph is (that is, that there's little variation in the color, so it's hard to tell them apart) and show the solution of using factors
Slide 36: I expect the first thing people will ask is how to remove the legend, so maybe put that on the next slide?
Can we also add some color palettes? Doesn't need to be an exercise, but just do the same graph with a few different palettes is enough.

Keep slides 1-12
Change slide 6 to make it similar to other labs
Using admin data
Interactive map

Lab 2: process spatial data

Spatial data frame: create a polygon from scratch
Spatial operations

Intro to R - Feedback April 05 2021

Review Spatial Data presentation

Comments from @kbjarkefur :

Important to talk about projection and the two data types, but don't have it as the first thing. It just throws people off. Have them open a data set first so that the participants have something tangible on their computer before talking about abstract things
Go through a bit of terminology. I think some people were in this class to learn R, and I am not sure if all of them knew topics like:
spatial analysis - explain that it is analysis of mapped data
polygons
Since some parts depend so heavily on ggplot, maybe it is worth doing a quick recap. Maybe just remind people that it is a graphing tool, and perhaps remind them of an option or two there

Comments from @MRuzzante:

It would be nice to store every map which is plotted in an object (say map_1, map_2, etc.) consistently with what @luizaandrade did in Lab 3, so people can then print all of them and see what was done in the training
Also, you could add some "bonus track" at the end (can be appendix) on embedding leaflet in a Shiny app

Comments from myself:

Possibility: Make this a session about data visualization on maps:

Move parts about raster to appendix
Move parts about processing shapefiles to appendix
Present it right after the ggplot session, since it has a lot of related content

Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
Update folder paths in slides
Remove stata joke from packages slide
Add an explanation of the character.only argument when using sapply to load packages
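
The explanation could be built around an example like the following. It is a sketch: the `packages` vector is hypothetical and lists two packages that ship with R, so nothing needs to be installed to run it.

```r
# Hypothetical package list; both ship with R
packages <- c("stats", "utils")

# Inside sapply, each package name arrives as a character string.
# library(x) would look for a package literally called "x";
# character.only = TRUE tells library() to use the string stored in x.
invisible(sapply(packages, library, character.only = TRUE))
```

The call is wrapped in invisible() only to suppress the list of attached packages that sapply() would otherwise print.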

R training - 2018 Nov 29 - Data Visualization

This is for suggestions or bugs found during the training

Add Analysis Session

Regressions: felm package, clustering, logit
Access results
Export results
Power calculations

Lab 3 - April 23

Ask in the beginning of the session if people use LaTeX or Excel
Explain how R treats missing observations
Exercise 4:
Mention out argument
Replace RawOutput by "your/file/path/here"
Exercise 5 :
Fix stata code
Exercise 9 : Typo
Add a Thank you slide

Feedback - Data Visualization - Nov 30 2020

R training-Data visualization

Maybe an additional slide with the types of plots (geom_point, geom_line, geom_density, etc.) and color palettes
For those who want to go further it might be useful to develop some visualizations beyond the training and get feedback from the instructors
The exercise looks good, but I'd provide two prepared datasets for visualizations so participants are able to work more in the code rather than in data cleaning. Besides, they could share their plots in GitHub or in a shared folder and have a brief peer-review session for feedback.
