dime-r-training's Introduction

R for Stata users

Background

This material was developed by the DIME Analytics team as an introduction to the R statistical package for its staff.

Course description

R is a programming language for statistical analysis and data science. It is a powerful and flexible tool widely used among statisticians and data scientists, and has a growing user base in economics research. This course is designed to familiarize participants with the language, focusing on common tasks and analysis in development research, and showing how to use R through RStudio, a popular integrated development environment for R. The course will build upon comparisons to Stata syntax and requires familiarity with the use of do-files, loops and macros. It also assumes some degree of familiarity with DIME's coding practices. All sessions are designed to last 90 minutes.

Training content

01 - Introduction to R

  • Introduction to the RStudio interface, R syntax, objects and classes.

02 - Introduction to R programming

  • Code organization, R libraries, loops, custom functions, and R programming practices.

03 - Data wrangling

  • Basic functions for processing data using the tidyverse meta library.

04 - Data visualization

  • An introduction to creating and exporting graphs in ggplot2.

05 - Descriptive analysis

  • How to create and export descriptive statistics tables in R.

06 - Geospatial data

  • An overview of R resources on GIS.

07 - Introduction to R Markdown

  • An introduction to dynamic documents and R Markdown.

License

This material is developed under the MIT license. See http://adampritchard.mit-license.org/ or the LICENSE file for details.

Main Contact

Luis Eduardo San Martin - [email protected]

Authors

  • Luiza Cardoso de Andrade
  • Marc-Andrea Fiorina
  • Robert A. Marty
  • Maria Reyes Retana Torre
  • Rony Rodriguez Ramirez
  • Luis Eduardo San Martin
  • Leonardo Teixeira Viotti

dime-r-training's People

Contributors

kbjarkefur, leonardoviotti, luisesanmartin, luizaandrade, mariaarnalcanudo, mariarrt94, mfiorina, mrimal, ramarty, rrmaximiliano

dime-r-training's Issues

01-intro-to-R-part-1 review notes

  • Can the slides be aligned to the middle of the page instead of top?

  • Introduction - It might be better to rephrase the part of the sentence "and why it is so much better than Stata" as how Stata users can adapt to and use some features that are better in R.

I notice this trend through all the intro slides. I don't think we mean to start a war between R and Stata (there's enough of that already!). If that was the plan all along, it's fine to leave those comments in, but I think a better approach would be to just point out how some of R's functions are better than Stata's and let people decide which they prefer to use. The same goes for the wording of R vs Python.

  • Introduction - move part II within brackets

  • Introduction - Certain common tasks are simpler in Stata. Mention a couple of examples.

  • Getting started - Ask participants to clone repository before pointing to file locations

  • Explain the interface before importing the dataset, and then go back to the interface to explain where the data frame shows

  • head(whr2015) and head(whr) use the exact same output image. Something is wrong there. Also, might be good to show both the outputs in the same slide

  • Data in R - "Two important concepts to take note" has 3 points

Add content on purrr

Let's try to add this to the last session for the next training after April 2021

Lab 1 review

  • Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
  • Update folder paths
  • Remove (some) Stata jokes

R training - 2018 Nov 20

@ramarty, let's write any ideas, comments etc. that we might think of during the session in this issue thread. Anyone can add to this issue

R training - June 13, 2019

Descriptive analysis

  • Exercise 4: add a version of the solution using select instead of indexing
  • Exercise 5: use the out argument (export summary statistics)
  • Exercise 5: it's not clear that you should export to LaTeX
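
For the Exercise 4 note, the two approaches could be compared as in the sketch below. The `whr` data frame here is a small made-up stand-in (its columns are assumptions, not the training data), and "select" is assumed to mean the dplyr function.

```r
library(dplyr)

# Hypothetical stand-in for the training data; column names are assumptions
whr <- data.frame(
  country   = c("A", "B", "C"),
  year      = c(2015, 2016, 2017),
  happiness = c(5.1, 6.3, 4.8)
)

# Base R indexing: keep columns by name
by_index <- whr[, c("country", "happiness")]

# dplyr: the same subset using select()
by_select <- select(whr, country, happiness)

# Both approaches keep the same columns
identical(names(by_index), names(by_select))
```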

Visualization

  • Slide 34: How to remove the decimals from the year
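
Two possible ways to address the decimals on the year axis are sketched below. The data frame is made up for illustration, and which option suits slide 34 depends on the plot used there.

```r
library(ggplot2)

# Hypothetical data: one value per year
df <- data.frame(year = c(2015, 2016, 2017), score = c(5.1, 5.4, 5.2))

# Option 1: keep year numeric and set the axis breaks to the observed years
p1 <- ggplot(df, aes(x = year, y = score)) +
  geom_line() +
  scale_x_continuous(breaks = unique(df$year))

# Option 2: treat year as a category, so only observed years are labelled
p2 <- ggplot(df, aes(x = factor(year), y = score)) +
  geom_col() +
  labs(x = "year")
```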

R training-Spatial data

  • Additional resource: https://geocompr.robinlovelace.net/

  • Additional resource: https://serialmentor.com/dataviz/geospatial-data.html

  • For those participants who want to go further, it might be useful to do a task/assignment on geospatial data beyond the training and get feedback from the trainers.

  • Maybe add a session on the sf package.

  • Maybe add some slides on working with different spatial data sources and how to harmonize them. Or a bullet point on slide 54 on the sf function that converts a data frame into geospatial data (e.g. st_as_sf())

  • The "introduction" from version 1 of the training (types of spatial data -vector-raster) might be useful to provide a big picture of the topic.
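
As an illustration of converting a data frame into geospatial data with the sf package, here is a minimal sketch. The coordinates and column names are invented for the example, and WGS 84 (EPSG 4326) is an assumed coordinate reference system.

```r
library(sf)

# Hypothetical survey locations with longitude/latitude columns
villages <- data.frame(
  name = c("Village A", "Village B"),
  lon  = c(30.06, 29.74),
  lat  = c(-1.94, -2.60)
)

# Convert the data frame to an sf object; crs = 4326 declares WGS 84 coordinates
villages_sf <- st_as_sf(villages, coords = c("lon", "lat"), crs = 4326)

# The result is still a data frame, now with a geometry column
class(villages_sf)
```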

Lab 5 - Apr 30

  • Alternative way of loading shapefile: readOGR(file.path(finalData, "worldmap.shp"))
  • Add raster package to the list of packages
  • Make sure the packages we install are the same we load
  • geom_point() instead of geom_points()

How to fix problem with tidyverse on macs

Installing Tidyverse: install.packages("dplyr", dep=T) worked for one person. A window appeared asking to install command line tools. Click "yes." install.packages("dplyr", dep=T) will need to be run again after the command line tools have installed.

R training - 2018 Dec 06

  • Don't repeat section on packages (move slides to the appendix -- we can still go over it if someone asks)

  • Move function inception to an earlier point. Give a better example (such as subsetting a data set when creating a graph) and discuss function results.

  • Help files: what are required arguments?

Notes for future trainings

  • Make sure to note that the spatial data session is an advanced session and requires attendance to previous classes
  • Where to discuss the way R treats missing values?
  • Data cleaning: instead of write.dta13, maybe use write_dta from the haven package? I think this is a tidyverse package -- and I think haven treats things like value labels & descriptions nicely
  • Data viz: maybe before teaching pdf(), dev.off(), teach ggsave() always with height/width? This could just be my preference in using ggsave... I'm wondering if this is more common too? (It also avoids the issue of having a graphics device already open.) Or maybe at least include a slide on ggsave?
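
A slide on ggsave() could be built around something like the sketch below. It uses the built-in mtcars data and a temporary file as a placeholder output path; both are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical output path; a temporary file keeps the example self-contained
out_file <- file.path(tempdir(), "scatter.pdf")

# ggsave() opens and closes the graphics device itself;
# width and height are in inches by default
ggsave(out_file, plot = p, width = 6, height = 4)
```

Because the device is handled inside ggsave(), there is no separate dev.off() call to forget.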

Lab 3 review

  • Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR, e.g. treating outliers slide.
  • Check folder paths.
  • Add merging functions to presentation.
  • Include R's data format.

r-training-Nov-2020

Update slides to xaringan-metropolis theme:

  • Intro to R - I
  • Intro to R - II
  • Data processing
  • Data visualization
  • Exploratory analysis
  • Geospatial data

Feedback from Nov-2020 course

The course was held online. This feedback comes from the anonymized feedback survey results and from observations made by the DIME Analytics team that have not been applied yet to the materials.

  • The Intro 2 session was the one attendees found the least useful. For future editions, we could consider spreading its content across the other sessions and increasing their length.
  • The contents for Intro I, Intro II, and Data analysis are currently too much for 90-minute sessions. For future editions we need to consider which ones we'll be moving to the annex.
  • Switching between the slides and the R window increased the difficulty of following the session. Some options:
  1. Keep R/RStudio and the slides on the same screen at the same time
  2. Focus on live code instead of the contents of the slides
  3. Embed R in the presentation -- there's a way to do this if the presentations are html
  • Edit presentations so the code section doesn't use the fira font. The problem with it is that it merges sets of two characters like <- or == into a single one
  • Session 1: exercise 7 was confusing. Consider dropping it or adding additional instructions to clarify
  • Session 2: before explaining why we usually don't use the equivalents to cd() and clear all, add a slide explaining why we don't want previous objects to be loaded when we start a new session
  • Session 2: add "raising hands" example when explaining apply() -- ask Luis Eduardo about this
  • Session 2 slide 47: change image for loading package from stargazer to tidyverse
  • Session 1: mention explicitly that this training assumes knowledge in Stata

Notes from 2nd session at NISR

  • Add Stata code for loops
  • lwh_panel vs panel in Lab 3
  • Exercise 1 in lab 3 is hard to follow. Do it with the group instead of as an exercise
  • Use smaller datasets for Lab3. Takes a while to load, cannot use View() and crashes R constantly
  • Remove grep exercise, it's too confusing
  • In the indexing exercise of Lab 3, create just one vector called keep
  • income_vars was assigned twice in line 14 of Exercise 14
  • the use of indexing to create the income_vars vector creates too many problems
  • giving people a script that breaks can be a problem at this stage
  • maybe change the order of the labs?
  • Invert if statements and loops: use new install.packages condition
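  # Install every package listed in `packages` that is not installed yet
  sapply(packages, function(x) {
      # Package names are the row names of the installed.packages() matrix
      if (!(x %in% rownames(installed.packages()))) {
          install.packages(x, dependencies = TRUE)
      }
    }
  )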

This was a suggestion by @rmurenzi

Explaining object to Stata users

We already started to talk about this in the session, but I think it helped so much to explain to Stata users what objects are that it should go into the slides.

An object is like a global in Stata, but while you can only put a number or a string in a global, you can put anything into an object: strings, data sets, vectors, graphs. In Stata you can reference the global later in your code to get its content, but with an object you can do so much more. Exactly what you can do with an object depends on what you saved in it, not on the fact that it is an object.

Something like that.
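
A slide on this could use a short example along these lines. All object names and values are made up for illustration.

```r
# A single number or string, much like what a Stata global can hold
n_rounds <- 3
survey_name <- "baseline"

# Objects can also hold vectors and entire data sets
ages <- c(23, 35, 41)
households <- data.frame(id = 1:3, age = ages)

# What you can do with an object depends on what it contains
class(n_rounds)      # "numeric"
class(survey_name)   # "character"
class(households)    # "data.frame"
mean(ages)           # 33
```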

FCT 2019 changes

Consolidating previous issues

Overall:

  • Remove all mentions of master in presentations
  • Make sure to note that the spatial data session is an advanced session and requires attendance to previous classes #32

Intro I:

  • Explain how functions take parameters: ordered parameters, named parameters (header = T) #18
  • Explain that functions return values. Explain that returned values can be passed directly to another function; you do not have to store them in a local and then use it in the next function #18
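
Both points could be shown with base R functions, for example as in this sketch. The choice of round(), sqrt() and mean() is only for illustration.

```r
# Arguments matched by position: x first, then digits
rounded_by_position <- round(3.14159, 2)

# The same call with named arguments, which can be given in any order
rounded_by_name <- round(digits = 2, x = 3.14159)

# A returned value can be passed straight into another function,
# with no need to store it in an intermediate object first
values <- c(4, 9, 16)
mean_of_roots <- mean(sqrt(values))   # sqrt() gives 2, 3, 4, so the mean is 3
```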

Data visualization:

  • Change to ggsave() always with height/width #32

Data cleaning:

  • Discuss the way R treats missing values #32
  • Instead of write.dta13, maybe use write_dta from the haven package? #32
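
As a starting point for the discussion of missing values, here is a minimal base R sketch. The `income` vector is made up for illustration.

```r
income <- c(100, NA, 250)

# Most summary functions return NA if any value is missing
mean(income)                # NA

# na.rm = TRUE drops missing values before computing
mean(income, na.rm = TRUE)  # 175

# Comparisons with NA are themselves NA, so test with is.na() instead of ==
income == NA                # NA NA NA
is.na(income)               # FALSE TRUE FALSE

# Unlike Stata, where a missing value counts as larger than any number,
# a comparison such as income > 200 yields NA for the missing entry
income > 200                # FALSE NA TRUE
```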

Descriptive analysis:

  • In exercise 3 (stargazer and labels) #18
    • make it clear that they should go back to the code they just ran with stargazer and modify it so that it takes the labels as an argument.
    • in the solution, do not use the object rawOutput in the file.path() function, as we have not introduced that object in class
  • Slide with example for aggregate(): #18
    • the first slide has outdated syntax
    • this exercise is also called exercise 3
    • did not catch what it was, but there was some type of typo in by(region)
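
For reference when updating the aggregate() slide, the sketch below shows two equivalent calls. The data frame and its column names are assumptions, not the training data.

```r
# Hypothetical data; the variable names are assumptions
whr <- data.frame(
  region    = c("East", "East", "West", "West"),
  happiness = c(5, 7, 4, 6)
)

# Formula interface: mean of happiness within each region
region_means <- aggregate(happiness ~ region, data = whr, FUN = mean)

# Equivalent call with the by argument, which must be a list
region_means_by <- aggregate(whr$happiness,
                             by = list(region = whr$region),
                             FUN = mean)
region_means
```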

Lab 4 review

  • Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
  • Remove latex
  • Expand Create any table exercise
  • Better function to export to excel

Lab 5 review

  • Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR

Lab 4 - Apr 25

  • explain that whr is the data in slide 21
  • several typos in slide 21
  • Ex 5: show what the data looks like and put aggregate and ggplot code in the same slide
  • Ex 6: Lots of typos (region not treatment)
  • ggplot different datasets, move aes() to ggplot function?

Session 3 - Data Viz - Review

  • Slide 5: include the same file paths as in previous sessions, so people can reuse their code
  • Is there a particular reason why sometimes we see ggplot and sometimes {ggplot}?
  • Slide 17: you can also indicate the data inside the aesthetics. Explain that both will work, but that this will make a difference when you want to use more than one dataset or more than one mapping. Then add an example.
  • Slide 19: this link is super cool! I suggest you open it and scroll a bit.
  • Slide 26: This is a great opportunity to show the difference between using colors for numeric and factor variables. I think we should first do it without the factor, then ask how many people made the same graph, then ask what the problem with the graph is (that is, that there's little variation in the color, so it's hard to tell them apart) and show the solution of using factors
  • Slide 36: I expect the first thing people will ask is how to remove the legend, so maybe put that on the next slide?
  • Can we also add some color palettes? It doesn't need to be an exercise; just doing the same graph with a few different palettes is enough.
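
The points on slides 17 and 26 and on palettes could be illustrated roughly as follows. This sketch uses the built-in mtcars data rather than the training data, and the "Set1" palette is an arbitrary choice.

```r
library(ggplot2)

# Mapping set in ggplot(): inherited by every layer
p_global <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Data and mapping set inside the geom: applies to that layer only,
# which is what allows a second layer to use a different data set
heavy <- subset(mtcars, wt > 4)
p_layers <- ggplot() +
  geom_point(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(data = heavy, aes(x = wt, y = mpg), colour = "red", size = 3)

# A numeric colour variable gives a continuous gradient;
# wrapping it in factor() gives one distinct colour per value
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point()
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")
```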

Develop evaluation

Ideally, we would use one test that allows us to sort people into two groups: completely new to code and familiar with code. We should also be able to use the same test for a pre and post evaluation.

Lab 6 Review

Lab 1: Display spatial data

  • Keep slides 1-12
  • Change slide 6 to make it similar to other labs
  • Using admin data
  • Interactive map

Lab 2: process spatial data

  • Spatial data frame: create a polygon from scratch
  • Spatial operations

Review Spatial Data presentation

Comments from @kbjarkefur :

  • It is important to talk about projection and the two data types, but don't have it as the first thing. It just throws people off. Have them open a data set first so that the participants have something tangible on their computer before talking about abstract things

  • Go through a bit of terminology. I think some people were in this class to learn R, and I am not sure all of them knew topics like:

    • spatial analysis - explain that it is analysis of mapped data

    • polygons

  • Since some parts depend so heavily on ggplot, maybe it is worth doing a quick recap. Maybe just remind people that it is a graphing tool, and perhaps remind them of an option or two there

Comments from @MRuzzante:

  • It would be nice to store every map which is plotted in an object (say map_1, map_2, etc.) consistently with what @luizaandrade did in Lab 3, so people can then print all of them and see what was done in the training

  • Also, you could add some "bonus track" at the end (can be appendix) on embedding leaflet in a Shiny app

Comments from myself:

Possibility: Make this a session about data visualization on maps:

  • Move parts about raster to appendix
  • Move parts about processing shapefiles to appendix
  • Present it right after the ggplot session, since it has a lot of related content

Lab 2 Review

  • Check if there's anything too specific to DIME's workflow that wouldn't be relevant to NISR
  • Update folder paths in slides
  • Remove stata joke from packages slide
  • Add an explanation of the character.only argument when using sapply to load packages
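
The character.only explanation could be backed by an example like the one below. It loads two base packages so that nothing needs to be installed; the `packages` vector is a stand-in for the list used in the lab.

```r
# Package names stored as strings; base packages are used so nothing needs installing
packages <- c("stats", "utils")

# By default library() takes the package name as typed instead of evaluating it.
# When the name is stored in a string, character.only = TRUE tells library()
# to use the string's value.
loaded <- sapply(packages, library,
                 character.only = TRUE,
                 logical.return = TRUE)

# One TRUE per package that was loaded successfully
loaded
```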

Add Analysis Session

  • Regressions: felm package, clustering, logit
  • Access results
  • Export results
  • Power calculations
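
A possible skeleton for the first three topics is sketched below. It swaps in base R's lm() and glm() on the built-in mtcars data in place of the felm package, so it does not cover clustering or fixed effects, and the output path is a placeholder.

```r
# Linear regression on a built-in data set
ols <- lm(mpg ~ wt + hp, data = mtcars)

# Logit: binary outcome (am) with a binomial family
logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Access results: coefficients as a named vector, full table as a matrix
coef(ols)
coef_table <- coef(summary(ols))
coef_table[, c("Estimate", "Std. Error")]

# Export results: write the coefficient table to a CSV file (hypothetical path)
out_file <- file.path(tempdir(), "ols_results.csv")
write.csv(coef_table, out_file)
```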

Lab 3 - April 23

  • Ask in the beginning of the session if people use LaTeX or Excel
  • Explain how R treats missing observations
  • Exercise 4:
    • Mention out argument
    • Replace RawOutput by "your/file/path/here"
  • Exercise 5:
    • Fix Stata code
  • Exercise 9: Typo
  • Add a Thank you slide

R training-Data visualization

  • Maybe an additional slide with the types of plots (geom_point, geom_line, geom_density, etc.) and color palettes

  • For those who want to go further it might be useful to develop some visualizations beyond the training and get feedback from the instructors

  • The exercise looks good, but I'd provide two prepared datasets for visualizations so participants are able to work more on the code rather than on data cleaning. Besides, they could share their plots on GitHub or in a shared folder and have a brief peer-review session for feedback.
