Gesis

Introduction

The GESIS Data Catalogue offers a repository of approximately 5,000 datasets.

To install the package from GitHub:

# install.packages("devtools")
devtools::install_github("expersso/gesis")
library(gesis)

A simple example

We start by listing all available groups of studies:

groups <- get_study_groups()
head(groups, 10)
##    group_no                                        value
## 1      0001 International Social Survey Programme (ISSP)
## 2      0002                     EB - Flash Eurobarometer
## 3      0003                               Travel Surveys
## 4      0004                            Time Budget Study
## 5      0005       EB - Central and Eastern Eurobarometer
## 6      0006       EB - Candidate Countries Eurobarometer
## 7      0007                                       ALLBUS
## 8      0008      EB - Standard and Special Eurobarometer
## 9      0009                  European Values Study (EVS)
## 10     0010                               Politbarometer

We see that the Standard and Special Eurobarometer has study group number 0008. Let's look at all available Eurobarometer waves:

eurobars <- get_datasets("0008")
head(eurobars)
##    doi                           title
## 1 0078 Attitudes towards Europe (1962)
## 2 0626 European Communities Study 1970
## 3 0627 European Communities Study 1971
## 4 0628 European Communities Study 1973
## 5 0986  Eurobarometer 2 (Oct-Nov 1974)
## 6 0987      Eurobarometer 3 (May 1975)
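Since the result is printed as an ordinary data frame with doi and title columns, it can be narrowed down with base R before anything is downloaded. A minimal sketch, assuming that structure: the data frame below is re-typed by hand from the output above so that the snippet runs without network access, and the name ecs is only illustrative.

```r
# Hand-copied from the printed output above (not fetched from GESIS).
eurobars <- data.frame(
  doi = c("0078", "0626", "0627", "0628", "0986", "0987"),
  title = c(
    "Attitudes towards Europe (1962)",
    "European Communities Study 1970",
    "European Communities Study 1971",
    "European Communities Study 1973",
    "Eurobarometer 2 (Oct-Nov 1974)",
    "Eurobarometer 3 (May 1975)"
  ),
  stringsAsFactors = FALSE
)

# Keep only the rows whose title contains the given phrase.
ecs <- eurobars[grepl("European Communities Study", eurobars$title, fixed = TRUE), ]
print(ecs)
```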

We would now like to download the first three studies. To do so, we first log in to the GESIS website and then pass the DOIs (unique data set identifiers) to download_dataset:

# username and password stored as environment 
# variables "GESIS_USER" and "GESIS_PASS"
gesis_session <- login()
if(!dir.exists("downloads")) dir.create("downloads")
download_dataset(s = gesis_session, doi = eurobars$doi[1:3], 
                 path = "downloads", filetype = ".dta")
## Downloading DOI: 0078

## Downloading DOI: 0626

## Downloading DOI: 0627
(files <- list.files("downloads", full.names = TRUE))
## [1] "downloads/ZA0078_v1-0-1.dta" "downloads/ZA0626_v1-0-1.dta"
## [3] "downloads/ZA0627_v1-0-1.dta"
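The login step above relies on the credentials being available as the environment variables GESIS_USER and GESIS_PASS. A minimal sketch of setting them for the current session and checking that both are present; the values are placeholders, and the variable have_credentials is only illustrative. For regular use, putting the two variables in ~/.Renviron keeps the password out of scripts.

```r
# Placeholder credentials; replace with your own GESIS account details.
Sys.setenv(GESIS_USER = "me@example.org", GESIS_PASS = "my-password")

# Both variables must be non-empty before attempting to log in.
have_credentials <- all(nzchar(Sys.getenv(c("GESIS_USER", "GESIS_PASS"))))
if (!have_credentials) {
  stop("Set GESIS_USER and GESIS_PASS before calling login().")
}
```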

We can also download the codebooks for the same studies:

download_codebook(eurobars$doi[1:3], path = "downloads")
## Downloading codebook for DOI: 0078

## Downloading codebook for DOI: 0626

## Downloading codebook for DOI: 0627

Using the haven package we can now read the data sets:

library(haven)
df <- read_dta(files[1])
dim(df)
## [1] 4774  175

Disclaimer: the gesis package is neither affiliated with, nor endorsed by, the Leibniz Institute for the Social Sciences. I have been unable to find any indication that programmatic access to the website is disallowed under its terms of use (indeed, its guidelines appear to encourage it). That said, I would discourage users from using the gesis package to put undue pressure on their servers by initiating unnecessary (or unnecessarily large) batch downloads.

gesis's People

Contributors

briatte, expersso, katrinleinweber

Forkers

fsolt

gesis's Issues

argument name doi can be misleading

download_codebook and download_dataset both have an argument called doi, described as "The unique identifier(s) for the data set(s)". This may confuse users, because the value needed here is the study number (i.e., the number that comes after "ZA" in the dataset title in the DBK), not a DOI. For example, the DOI for ZA4587 is 10.4232/1.13048 (as specified in the bibliographic citation for that dataset), but the "doi" required by the gesis package functions is "4587". To avoid this confusion, it might help to change the name and description of the argument. One suggestion: rename doi to study (or studynr) and describe it as "Study number(s) for the data set(s). The number that comes after 'ZA' in the DBK study title".
For consistency, this would also require renaming the doi column created by the get_datasets function to study, studynr, or whatever name is chosen for the argument of download_codebook and download_dataset.
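To make the distinction concrete, here is a small sketch that pulls the study number out of a "ZA" identifier or a downloaded file name. The helper study_number is hypothetical and not part of the gesis package; it only assumes the "ZA" followed by digits pattern seen in this document.

```r
# Hypothetical helper: extract the digits that follow "ZA".
study_number <- function(x) {
  sub("^ZA([0-9]+).*$", "\\1", basename(x))
}

ids <- study_number(c("ZA4587", "downloads/ZA0078_v1-0-1.dta"))
print(ids)
```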

GESIS identifiers should always be 4-digits long

Kinda related to #12.

The download_dataset function will fail if the "DOI" (the GESIS ID) is 3 digits long instead of 4, as is the case for e.g. old Eurobarometer survey waves.

Example:

gesis::download_dataset(s = GESIS_SESSION, doi = 990, path = ".", filetype = ".dta")

This will fail unless 990 is converted to 0990 first. In fact, the same thing happens if the "DOI" / GESIS number provided by gesis is used to access the studies online.

A simple fix should be:

doi <- dplyr::if_else(doi < 1000, stringr::str_c("0", doi), as.character(doi))

I guess the issue was introduced by converting "DOIs" / GESIS IDs to integers.
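A base R alternative that pads to four digits in every case (including 2-digit IDs such as 78) could look like the sketch below. The helper pad_study_id is hypothetical and not part of the gesis package; it assumes the IDs are whole numbers or digit strings.

```r
# Hypothetical helper: left-pad study IDs with zeros to four digits.
pad_study_id <- function(id) {
  sprintf("%04d", as.integer(id))
}

padded <- pad_study_id(c(78, 990, 4587))
print(padded)

# IDs that are already padded strings pass through unchanged.
already_padded <- pad_study_id("0626")
print(already_padded)
```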

Please let me know if you can reproduce the error, and I'll submit a PR.

I do not know what other functions are affected.

Resubmit to CRAN?

Hello,

I guess it's not a priority, given how easy it is to install packages from GitHub, but I was saddened to see that CRAN had removed your excellent package. Resubmit one day, perhaps?

(I confess I do not know if packages can be resubmitted after removal).

All the best,

François

download_codebook throws an error message

First of all, I wanted to thank you for creating this great package!
I am planning on using it in a workshop I will teach and noticed that download_codebook gives the error message "Error in is.url(url) : length(url) == 1 is not TRUE"
Could it be that this is because the particular study whose codebook I wanted to download has multiple files whose names include "_cdb"?
If you want to reproduce the error, the study number is "4587".
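The suspected cause can be illustrated without the package: if several links match "_cdb", any code that expects exactly one URL fails a length check like the one in the error message. The sketch below uses invented link names and does not reflect how download_codebook is actually implemented; it only shows one way to handle multiple matches one at a time.

```r
# Hypothetical links from a study page; the file names are invented.
links <- c(
  "https://example.org/ZA4587_cdb.pdf",
  "https://example.org/ZA4587_cdb_methods.pdf",
  "https://example.org/ZA4587_q.pdf"
)

# Keep every link containing "_cdb" (may be zero, one, or many).
codebook_links <- grep("_cdb", links, value = TRUE, fixed = TRUE)

# Process each match separately so later code always sees a single URL.
for (url in codebook_links) {
  stopifnot(length(url) == 1)
  message("Would download: ", basename(url))
}
```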

Add GitHub topics

webservice-client, r-package, rstats, research-data for example ;-)

download_codebook fails for files without "_cdb"

Thanks for writing this package, it's very useful!

I have noticed that not all dois have a codebook section. So, there's an error when the download_codebook() function does not find one on the Gesis website. For example download_codebook(doi = c("5876", "5913", "5689")) evaluates the first doi and then stops because doi "5913" does not have a codebook on the Gesis website (although "5689" does and should be downloaded).

However, it would be great to have some documentation/codebook for the corresponding data. One solution would be to download the questionnaire for those cases and display a warning that a questionnaire was downloaded instead of a codebook. Another would be to find/construct a codebook from another online source (like ICPSR).
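Until the function handles missing codebooks itself, a caller can keep a batch going by wrapping each download in tryCatch. A minimal sketch: download_each and fake_download are hypothetical names, and fake_download merely simulates a study without a codebook so that the example runs offline.

```r
# Hypothetical wrapper: try each ID in turn, warn on failure, keep going.
download_each <- function(ids, download_fn) {
  vapply(ids, function(id) {
    tryCatch({
      download_fn(id)
      TRUE
    }, error = function(e) {
      warning("Skipping ", id, ": ", conditionMessage(e), call. = FALSE)
      FALSE
    })
  }, logical(1))
}

# Stand-in for a real download; fails for one ID to mimic a missing codebook.
fake_download <- function(id) {
  if (id == "5913") stop("no codebook found")
  invisible(id)
}

result <- suppressWarnings(
  download_each(c("5876", "5913", "5689"), fake_download)
)
print(result)
```

With the package loaded, the same wrapper could be called as download_each(ids, function(id) download_codebook(id, path = "downloads")).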
