Comments (7)

jmobrien commented on June 27, 2024

So, what I "know" I've mostly cobbled together from working with this package over the past couple years + my own use of it for data cleaning. My interaction with it doesn't overlap with the original owner. But basically, I think the situation was/is:

  • Qualtrics provided some extra metadata in their response downloads that was worth preserving in qualtRics, specifically question text
  • As far as storing this sort of metadata, R standard practice was to use an attribute called "label"
  • Package sjlabelled has tended to be the primary (only?) toolkit specifically for interacting with label/labels attributes.
  • The actual paradigm for using the "label" attribute comes from haven, where it's somewhat bound up in a more complex idea of a "labelled" class including other things like an attribute for response options ("labels") and more complex missing data codes. However, sjlabelled mostly just focuses on working with the label/labels attributes, AS attributes.
  • For a couple of reasons I'm surmising, going with the simpler sjlabelled approach rather than haven's was likely viewed as the best approach for this package:
    • haven itself has always presented the "labelled" class as more of a transitional tool to help with importing/exporting from other statistical suites, and not a "proper" class you'd actually want to use natively in R.
    • Qualtrics's response downloads don't include metadata about questions' response options (nor info about missing data, save for some partial info added relatively recently). So, we only had the content for populating the main "label" attribute.
    • Having the extra "labelled" class attribute could sometimes create errors with generic functions that weren't expecting them (including some really critical stuff like the common modeling functions lm(), lmer(), etc.)
    • The tidyverse was much less mature & dominant then, and neither it nor the more base-R approaches to data manipulation could be relied upon to preserve class and/or attributes effectively by default. So, you were probably going to need to bake in label preservation to your workflow regardless.
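
A small base-R illustration of that last point. This is only a sketch on a made-up vector `x`, not real survey data: some operations carry the "label" attribute along and others silently drop it, which is why label preservation had to be handled explicitly.

```r
# A plain numeric vector carrying a "label" attribute, similar to what
# qualtRics attaches to a survey column (the values here are made up)
x <- structure(c(1, 2, 50), label = "a label")

# Subsetting an atomic vector with `[` drops the "label" attribute
x_sub <- x[1:2]
attr(x_sub, "label")

# Arithmetic with a shorter vector keeps the attributes of the longer one
x_plus <- x + 1
attr(x_plus, "label")
```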

Now, even if I'm right about the above, I kind of think that a lot of that might be viewed differently today. IMO the current, post-vctrs incarnation of haven and the "labelled" class offers a lot that might warrant it being seen as a "real" class. Perhaps we should consider using it if we're going to continue incorporating label metadata (and we definitely will continue). For a number of this-is-too-long-already reasons I'm not quite convinced that's the right choice, but I'm writing this up to at least put the idea in the water.

Meanwhile, if you want to preserve things in your own workflows you're going to need some options. The obvious option is to convert to the "labelled" class, though there are other approaches:

    require(tidyverse, quietly = TRUE)
    require(haven, quietly = TRUE)
    require(sjlabelled, quietly = TRUE)
    
    # Function for converting to the labelled class:
    make_labelled <- 
      \(x){
        haven::labelled(x = x, 
                        label = attr(x, "label"),
                        labels = attr(x, "labels")
        )
      }
    
    # Example data frame:
    test <- 
      tibble(
        a = sample(c(1,2,50), 15, replace = TRUE) |> 
          structure(label = "a label"),
      )
    
    test |> get_label()
    #>         a 
    #> "a label"
    
    test <- 
      test |> 
      mutate(
        # This approach loses label/labels attributes:
        a_conv = 
          a |> 
          case_match(50 ~ 3, .default = a),
        # But you can convert first: 
        a_lab = 
          a |> 
          make_labelled(),
        # then the standard dplyr tools will preserve attributes (if used properly):
        a_lab_conv = 
          a_lab |> 
          case_match(50 ~ 3, .default = a_lab),
        # If you want to preserve attributes but don't want to end up with 
        # labelled vars, you can do it in place (this requires magrittr's %>%):
        a_conv2 = 
          a %>%
          make_labelled() %>%
          case_match(.x = . ,50 ~ 3, .default = .) |> 
          sjlabelled::unlabel(),
        # or, an even simpler manual approach:
        a_conv3 = 
          a |> 
          case_match(50 ~ 3, .default = a) |> 
          structure(label = attr(a, "label"))
      )

    # Some give you a "labelled" class, some don't:
    test |> purrr::map(class)
    #> $a
    #> [1] "numeric"
    #> 
    #> $a_conv
    #> [1] "numeric"
    #> 
    #> $a_lab
    #> [1] "haven_labelled" "vctrs_vctr"     "double"        
    #> 
    #> $a_lab_conv
    #> [1] "haven_labelled" "vctrs_vctr"     "double"        
    #> 
    #> $a_conv2
    #> [1] "numeric"
    #> 
    #> $a_conv3
    #> [1] "numeric"

    # But all of them except a_conv preserve the "label" attribute:
    test |> sjlabelled::get_label()
    #>          a     a_conv      a_lab a_lab_conv    a_conv2    a_conv3 
    #>  "a label"         ""  "a label"  "a label"  "a label"  "a label"

Created on 2023-08-08 with reprex v2.0.2

from qualtrics.

juliasilge commented on June 27, 2024

> I kind of think that a lot of that might be viewed differently today

This is spot-on IMO; the decisions around sjlabelled were made quite a long time ago, before some newer and better options existed. I do think these attributes are worth revisiting so folks have data that works better with current tools. I would be open to dropping these kinds of attributes altogether in favor of nicer tools for dealing with the labels and other metadata, but if that would be too much of a change, we can think through how this should be updated, maybe using haven's infrastructure instead of sjlabelled.

jmobrien commented on June 27, 2024

What would you say are the current newer & better options? On its face I don't love the label/labels attribute approach either, but I'm not up-to-date on what alternatives might be emerging as best practice.

I will say that one case for sticking with the attribute-centric approach is haven's exporting tools. AFAIK haven provides the only reasonably up-to-date approach for making datasets available in Stata, SAS, etc., which can sometimes be valuable, esp. in academia/gov't. Exporting can include the metadata (and other things), but that does require following their conventions.

juliasilge commented on June 27, 2024

Ah sorry, I may not have been clear.

  • I think that current (modern) haven may be easier to work with than current sjlabelled, and this may be the way to go for better handling of, for example, question text.
  • If we have good enough tools for getting the kind of metadata that exists in the labels in a different way, like a dataframe of metadata with the question text, part of me wants to get rid of all the attributes and "labelled" class business altogether. I do not come from the SPSS or SAS world, though, so this may be too extreme of an option.
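
To make the second option concrete, here is a minimal base-R sketch of moving question text out of the "label" attributes and into a separate metadata data frame. The helper names `labels_to_metadata()` and `strip_labels()` and the example columns are hypothetical, not part of qualtRics, haven, or sjlabelled.

```r
# Hypothetical helper: collect each column's "label" attribute into a lookup table
labels_to_metadata <- function(df) {
  labs <- vapply(
    df,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else as.character(lab)[1]
    },
    character(1)
  )
  data.frame(
    variable = names(df),
    question_text = unname(labs),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: remove the "label" attribute from every column
strip_labels <- function(df) {
  df[] <- lapply(df, function(col) {
    attr(col, "label") <- NULL
    col
  })
  df
}

# Made-up data standing in for a survey download
survey <- data.frame(Status = c(0, 0, 1), Progress = c(100, 80, 100))
attr(survey$Status, "label") <- "Response Type"
attr(survey$Progress, "label") <- "Progress"

question_meta <- labels_to_metadata(survey)  # one row per variable
survey_plain <- strip_labels(survey)         # same data, no label attributes
```

The trade-off is that the question text no longer travels with each column, so it cannot be lost by a transformation, but any export that expects per-variable labels would need them joined back on first.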

rempsyc commented on June 27, 2024

Thanks for the workaround @jmobrien. Just for the sake of completeness, here is the workaround I was using (basically saving labels and manually adding them back after to avoid relying on another package):

    suppressWarnings(suppressPackageStartupMessages(library(qualtRics)))
    suppressWarnings(suppressPackageStartupMessages(library(sjlabelled)))
    suppressWarnings(suppressPackageStartupMessages(library(dplyr)))

    # Extract all surveys
    surveys <- all_surveys()

    # Identify right survey
    survey1.id <- surveys$id[
      which("Projet priming-aggression (Part 1)_Study 3" == surveys$name)]

    # Fetch right survey
    data <- suppressMessages(fetch_survey(surveyID = survey1.id, verbose = FALSE))

    # sjlabelled works
    get_label(data$Status)
    #>          Status 
    #> "Response Type"

    # Save question labels
    labels.data <- data |>
      get_label() |>
      bind_rows()

    # case_when
    data <- data %>% 
      mutate(Status = case_when(Status == 50 ~ 1),
             Progress = case_when(Progress == 100 ~ 1))

    # Labels lost
    get_label(data$Status)
    #> NULL

    # Repair labels
    data <- data %>% 
      mutate(Status = set_label(Status, labels.data$Status))

    # Labels recovered
    get_label(data$Status)
    #> [1] "Response Type"

    # Problem: needs to be done for each variable
    get_label(data$Progress)
    #> NULL

Created on 2023-08-14 with reprex v2.0.2

There would probably be a way to automate this process more efficiently through a function for all relevant variables though...
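
One possible way to automate the repair step, sketched in base R so it needs no extra package: save all labels once, then reapply them to every column whose name matches. The helper `restore_labels()` and the small data frame are hypothetical stand-ins for the real survey data.

```r
# Hypothetical helper: reapply saved labels to every column with a matching name.
# `saved` can be a named list, a named character vector, or a one-row data frame.
restore_labels <- function(df, saved) {
  for (nm in intersect(names(df), names(saved))) {
    attr(df[[nm]], "label") <- saved[[nm]]
  }
  df
}

# Made-up data standing in for the fetched survey
dat <- data.frame(Status = c(0, 50, 50), Progress = c(100, 40, 100))
attr(dat$Status, "label") <- "Response Type"
attr(dat$Progress, "label") <- "Progress"

# Save all labels as a named list before transforming anything
saved <- lapply(dat, attr, which = "label", exact = TRUE)

# Recoding builds new vectors, so the labels are gone afterwards
dat$Status <- as.numeric(dat$Status == 50)
dat$Progress <- as.numeric(dat$Progress == 100)
label_after_recode <- attr(dat$Status, "label")

# Restore every label in one call
dat <- restore_labels(dat, saved)
```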

jmobrien commented on June 27, 2024

Great, that works too. For automation across multiple variables, in one of my cases I ended up creating a set_attributes() function that worked analogously to set_names() for inline restoration of attributes. I just set key attributes aside en masse for (a) dataframe(s), then used across() to reapply attributes when needed. Don't have the code in front of me but it wasn't too complex.
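
For readers who want something to start from, here is a rough guess at what such a helper could look like. It is not the original code: the body of `set_attributes()`, the choice of which attributes to keep, and the example data are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical reconstruction: like stats::setNames(), but for arbitrary
# attributes. `attrs` is a named list of attribute values to put back on `x`.
set_attributes <- function(x, attrs) {
  for (nm in names(attrs)) {
    attr(x, nm) <- attrs[[nm]]
  }
  x
}

# Made-up data standing in for a survey download
dat <- tibble(
  Status = structure(c(0, 50, 50), label = "Response Type"),
  Progress = structure(c(100, 40, 100), label = "Progress")
)

# Set the key attributes aside for every column in one go
keep <- c("label", "labels")
saved <- lapply(dat, function(col) {
  attributes(col)[intersect(keep, names(attributes(col)))]
})

# Reapply them inline: cur_column() gives the name of the column being processed
dat2 <- dat |>
  mutate(
    across(
      c(Status, Progress),
      \(x) as.numeric(x > 45) |> set_attributes(saved[[cur_column()]])
    )
  )
```

Keeping the saved attributes in a named list keyed by column name means the same lookup works for any set of columns passed to `across()`.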

jmobrien commented on June 27, 2024

Expanding on your response @juliasilge, yes, this runs up against where we're already using a dual-approach model wherein question text metadata can be embedded at the variable level via labels, at the dataframe level via the attached column map (attribute), or both. We could definitely move more specifically in either direction if we saw fit.

Also, I suppose an alternative approach would be to write some helper functions that can add/restore labels from the column map as needed.
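
A minimal sketch of what such a helper might look like, in base R. The function `labels_from_colmap()` is hypothetical, and the column names `qname` and `description` are assumptions about the column map's layout that should be checked against the real attribute; the example uses a mock map rather than a real download.

```r
# Hypothetical helper: set each column's "label" from a lookup table.
# Assumes `colmap` has one column of variable names and one of question text;
# the defaults "qname" and "description" are assumptions, adjust as needed.
labels_from_colmap <- function(df, colmap,
                               name_col = "qname",
                               text_col = "description") {
  lookup <- setNames(as.character(colmap[[text_col]]), colmap[[name_col]])
  for (nm in intersect(names(df), names(lookup))) {
    attr(df[[nm]], "label") <- lookup[[nm]]
  }
  df
}

# Mock data standing in for survey responses whose labels were lost
dat <- data.frame(Status = c(0, 1, 1), Q1 = c(3, 5, 4))

# Mock column map with made-up question text
colmap <- data.frame(
  qname = c("Status", "Q1"),
  description = c("Response Type", "How satisfied are you?"),
  stringsAsFactors = FALSE
)

dat <- labels_from_colmap(dat, colmap)
```

Because the map lives at the data frame level, this kind of helper can be run at the end of a cleaning pipeline regardless of which intermediate steps dropped the per-variable labels, provided the column names are unchanged.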