Comments (7)
So, what I "know" I've mostly cobbled together from working on this package over the past couple of years, plus my own use of it for data cleaning. My involvement with it didn't overlap with the original owner's. But basically, I think the situation was/is:
- Qualtrics provided some extra metadata in their response downloads that was worth preserving in qualtRics, specifically question text.
- As far as storing this sort of metadata, standard R practice was to use an attribute called "label".
- The sjlabelled package has tended to be the primary (only?) toolkit specifically for interacting with label/labels attributes.
- The actual paradigm for using the "label" attribute comes from haven, where it's somewhat bound up in a more complex idea of a "labelled" class that includes other things, like an attribute for response options ("labels") and more complex missing-data codes. However, sjlabelled mostly just focuses on working with the label/labels attributes, AS attributes.
- For a couple of reasons I'm surmising, the simpler sjlabelled approach was likely viewed as a better fit for this package than haven's:
  - haven itself has always presented the "labelled" class as more of a transitional tool to help with importing/exporting from other statistical suites, and not a "proper" class you'd actually want to use natively in R.
  - Qualtrics's response downloads don't include metadata about questions' response options (nor info about missing data, save for some partial info added relatively recently). So, we only had the content for populating the main "label" attribute.
  - Having the extra "labelled" class attribute could sometimes create errors with generic functions that weren't expecting it (including some really critical stuff like the common modeling functions lm(), lmer(), etc.).
  - The tidyverse was much less mature & dominant then, and neither it nor the more base-R approaches to data manipulation could be relied upon to preserve class and/or attributes effectively by default. So, you were probably going to need to bake label preservation into your workflow regardless.
Now, even if I'm right about the above, I kind of think that a lot of that might be viewed differently today. IMO the current, post-vctrs incarnation of haven and the "labelled" class offers a lot that might warrant it being seen as a "real" class. Perhaps we should consider using it if we're going to continue incorporating label metadata (and we definitely will continue). For a number of this-is-too-long-already reasons I'm not quite convinced that's the right choice, but I'm writing this up to at least put the idea in the water.
Meanwhile, if you want to preserve labels in your own workflows, you're going to need some options. The obvious one is to convert to the "labelled" class, though there are other approaches:
require(tidyverse, quietly = TRUE)
require(haven, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
# Function for converting to the labelled class:
make_labelled <- \(x) {
  haven::labelled(
    x = x,
    label = attr(x, "label"),
    labels = attr(x, "labels")
  )
}
# Example data frame:
test <-
  tibble(
    a = sample(c(1, 2, 50), 15, replace = TRUE) |>
      structure(label = "a label")
  )
test |> get_label()
#> a
#> "a label"
test <-
  test |>
  mutate(
    # This approach loses label/labels attributes:
    a_conv =
      a |>
      case_match(50 ~ 3, .default = a),
    # But you can convert first:
    a_lab =
      a |>
      make_labelled(),
    # then the standard dplyr tools will preserve attributes (if used properly):
    a_lab_conv =
      a_lab |>
      case_match(50 ~ 3, .default = a_lab),
    # If you want to preserve attributes but don't want to end up with
    # labelled vars, you can do it in place (this requires magrittr's %>%,
    # because the `.` placeholder is used more than once):
    a_conv2 =
      a %>%
      make_labelled() %>%
      case_match(.x = ., 50 ~ 3, .default = .) |>
      sjlabelled::unlabel(),
    # or, an even simpler manual approach:
    a_conv3 =
      a |>
      case_match(50 ~ 3, .default = a) |>
      structure(label = attr(a, "label"))
  )
# Some give you a "labelled" class, some don't:
test |> purrr::map(class)
#> $a
#> [1] "numeric"
#>
#> $a_conv
#> [1] "numeric"
#>
#> $a_lab
#> [1] "haven_labelled" "vctrs_vctr" "double"
#>
#> $a_lab_conv
#> [1] "haven_labelled" "vctrs_vctr" "double"
#>
#> $a_conv2
#> [1] "numeric"
#>
#> $a_conv3
#> [1] "numeric"
# But all of them except a_conv preserve the "label" attribute:
test |> sjlabelled::get_label()
#> a a_conv a_lab a_lab_conv a_conv2 a_conv3
#> "a label" "" "a label" "a label" "a label" "a label"
Created on 2023-08-08 with reprex v2.0.2
from qualtrics.
> I kind of think that a lot of that might be viewed differently today
This is spot-on IMO; the decisions around sjlabelled were made quite a long time ago, before some newer and better options existed. I do think these attributes are worth revisiting so folks have data that works better with current tools. I would be open to avoiding these kinds of attributes altogether in favor of nicer tools for dealing with the labels and other metadata, but if that would be too much of a change, we can think through how this should be updated, maybe using haven's infrastructure instead of sjlabelled.
What would you say are the current newer & better options? On its face I don't love the label/labels attribute approach either, but I'm not up-to-date on what alternatives might be emerging as best practice.
I will say that one case for sticking with the attribute-centric approach is haven's exporting tools. AFAIK haven provides the only reasonably up-to-date approach for making datasets available in Stata, SAS, etc., which can sometimes be valuable, esp. in academia/gov't. Exporting can include the metadata (and other things), but that does require following haven's conventions.
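A minimal sketch of what following haven's conventions looks like for variable labels, assuming haven is installed: a plain "label" attribute on a column is written out as a Stata variable label by `haven::write_dta()` and comes back as a "label" attribute from `haven::read_dta()`. The data frame and temp file are made up for illustration, and the block is skipped if haven isn't available.

```r
if (requireNamespace("haven", quietly = TRUE)) {
  # Toy data: one column carrying question text as a plain "label" attribute
  dat <- data.frame(q1 = c(1, 2, 3))
  attr(dat$q1, "label") <- "How satisfied are you?"

  # Write to a temporary Stata file; the "label" attribute becomes the variable label
  path <- tempfile(fileext = ".dta")
  haven::write_dta(dat, path)

  # Reading it back restores the variable label as a "label" attribute
  roundtrip <- haven::read_dta(path)
  print(attr(roundtrip$q1, "label"))
}
```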
Ah sorry, I may not have been clear.
- I think that current (modern) haven may be easier to work with than current sjlabelled, and this may be the way to go for better handling of, for example, question text.
- If we have good enough tools for getting the kind of metadata that exists in the labels in a different way, like a dataframe of metadata with the question text, part of me wants to get rid of all the attributes and "labelled" class business altogether. I do not come from the SPSS or SAS world, though, so this may be too extreme of an option.
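One way to picture the "data frame of metadata" alternative: pull every column's "label" into a two-column table, then strip the attributes from the data itself. This is a base-R sketch; `label_table()`, `strip_labels()`, and the `name`/`label` columns are hypothetical names, not anything qualtRics currently provides.

```r
# Hypothetical helper: collect each column's "label" attribute into a lookup table
label_table <- function(df) {
  labels <- vapply(df, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)[1]
  }, character(1))
  data.frame(name = names(df), label = unname(labels), stringsAsFactors = FALSE)
}

# Hypothetical helper: remove the "label" attribute from every column
strip_labels <- function(df) {
  df[] <- lapply(df, function(col) {
    attr(col, "label") <- NULL
    col
  })
  df
}

# Toy responses, made up for illustration
responses <- data.frame(Q1 = c(3, 5), Q2 = c("a", "b"))
attr(responses$Q1, "label") <- "How old are you?"

meta <- label_table(responses)       # metadata now lives in its own data frame
responses <- strip_labels(responses) # the data itself carries no label attributes
meta
```

The appeal is that a plain data frame of metadata survives any manipulation of the responses; the cost is that you have to join or look up the question text yourself when you need it.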
Thanks for the workaround @jmobrien. Just for the sake of completeness, here is the workaround I was using (basically saving labels and manually adding them back afterward, to avoid relying on another package):
suppressWarnings(suppressPackageStartupMessages(library(qualtRics)))
suppressWarnings(suppressPackageStartupMessages(library(sjlabelled)))
suppressWarnings(suppressPackageStartupMessages(library(dplyr)))
# Extract all surveys
surveys <- all_surveys()
# Identify the right survey
survey1.id <- surveys$id[
  which("Projet priming-aggression (Part 1)_Study 3" == surveys$name)]
# Fetch the right survey
data <- suppressMessages(fetch_survey(surveyID = survey1.id, verbose = FALSE))
# sjlabelled works
get_label(data$Status)
#> Status
#> "Response Type"
# Save question labels
labels.data <- data |>
  get_label() |>
  bind_rows()
# case_when
data <- data %>%
  mutate(Status = case_when(Status == 50 ~ 1),
         Progress = case_when(Progress == 100 ~ 1))
# Labels lost
get_label(data$Status)
#> NULL
# Repair labels
data <- data %>%
  mutate(Status = set_label(Status, labels.data$Status))
# Labels recovered
get_label(data$Status)
#> [1] "Response Type"
# Problem: needs to be done for each variable
get_label(data$Progress)
#> NULL
Created on 2023-08-14 with reprex v2.0.2
There's probably a way to automate this more efficiently with a function that handles all relevant variables, though...
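A minimal base-R sketch of that automation, assuming the only thing you need back is the "label" attribute: save every column's label once, do the recoding, then restore labels for whichever columns still exist and have lost theirs. `save_labels()` and `restore_labels()` are hypothetical names, and `as.numeric()` stands in for `case_when()` as an operation that drops attributes.

```r
# Hypothetical helper: capture the "label" attribute of every column (NULL if absent)
save_labels <- function(df) {
  lapply(df, attr, which = "label", exact = TRUE)
}

# Hypothetical helper: re-attach saved labels to matching columns that lack one
restore_labels <- function(df, labels) {
  for (nm in intersect(names(df), names(labels))) {
    if (!is.null(labels[[nm]]) && is.null(attr(df[[nm]], "label"))) {
      attr(df[[nm]], "label") <- labels[[nm]]
    }
  }
  df
}

# Toy data standing in for a fetched survey
data <- data.frame(Status = c(50, 0), Progress = c(100, 40))
attr(data$Status, "label") <- "Response Type"
attr(data$Progress, "label") <- "Progress"

labels_data <- save_labels(data)

# Recoding with as.numeric() drops attributes, much like case_when() did above
data$Status <- as.numeric(data$Status == 50)
data$Progress <- as.numeric(data$Progress == 100)

# One call restores labels for every column at once
data <- restore_labels(data, labels_data)
attr(data$Progress, "label")
```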
Great, that works too. For automation across multiple variables, in one of my cases I ended up creating a set_attributes() function that worked analogously to set_names() for inline restoration of attributes. I just set key attributes aside en masse for one or more data frames, then used across() to reapply them when needed. I don't have the code in front of me, but it wasn't too complex.
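Since that code isn't in the thread, here's a guess at what a `set_attributes()` in the spirit of `set_names()` might look like: a pipe-friendly function that takes a vector plus a named list of saved attributes and returns the vector with those attributes re-applied. Everything below is hypothetical and base R; `Map()` stands in for `across()` so the sketch runs without dplyr.

```r
# Hypothetical pipe-friendly helper, analogous to set_names():
# re-apply a named list of attributes to `x` and return `x`
set_attributes <- function(x, attrs) {
  for (nm in names(attrs)) {
    attr(x, nm) <- attrs[[nm]]
  }
  x
}

# Set aside only the attributes worth keeping (here just "label") for every column
keep <- "label"
dat <- data.frame(a = c(1, 2, 50), b = c(10, 20, 30))
attr(dat$a, "label") <- "a label"
saved <- lapply(dat, function(col) {
  attrs <- attributes(col)
  attrs[intersect(names(attrs), keep)]
})

# An attribute-dropping recode: as.numeric() strips the label here
dat$a <- as.numeric(replace(dat$a, dat$a == 50, 3))

# Re-apply column by column (base-R stand-in for mutate(across(...)))
dat[] <- Map(set_attributes, dat, saved[names(dat)])
```

In a dplyr pipeline the last step would presumably become something like `mutate(across(all_of(names(saved)), \(col) set_attributes(col, saved[[cur_column()]])))`, since `cur_column()` returns the name of the column `across()` is currently processing. Restricting `keep` to label-type attributes avoids re-applying things like `class` or `levels` to a column whose type changed during recoding.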
Expanding on your response @juliasilge, yes, this runs up against the fact that we're already using a dual-approach model, wherein question-text metadata can be embedded at the variable level via labels, at the data-frame level via the attached column map (an attribute), or both. We could definitely move further in either direction if we saw fit.
Also, I suppose an alternative approach would be to write some helper functions that can add/restore labels from the column map as needed.
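A rough sketch of such a helper, assuming the column map is a data frame with one row per column, where `qname` holds the column name and `description` holds the question text. Those column names are my assumption (check what qualtRics's `extract_colmap()` returns for your own survey, and adjust `name_col`/`label_col`), `labels_from_colmap()` is a hypothetical name, and the toy column map is made up.

```r
# Hypothetical helper: attach question text from a column map as "label" attributes
labels_from_colmap <- function(df, colmap,
                               name_col = "qname", label_col = "description") {
  # Named lookup: column name -> question text (first match wins on duplicates)
  lookup <- setNames(as.character(colmap[[label_col]]), colmap[[name_col]])
  for (nm in intersect(names(df), names(lookup))) {
    attr(df[[nm]], "label") <- lookup[[nm]]
  }
  df
}

# Toy column map and responses, made up for illustration
colmap <- data.frame(
  qname = c("Q1", "Q2"),
  description = c("How old are you?", "Pick a color")
)
responses <- data.frame(Q1 = c(25, 40), Q2 = c("red", "blue"), extra = 1:2)

responses <- labels_from_colmap(responses, colmap)
attr(responses$Q2, "label")
```

Because the column map stays attached at the data-frame level, a helper like this could be rerun after any step that drops variable-level labels.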
Related Issues (20)
- Support for OAuth (vs. API Token)?
- Issue with fetching surveys
- fetch distribution summary information?
- error on fetching distribution history (i.e., full list of survey invitees along with distribution status)
- lag / incomplete pull of list_distribution_links()
- feature discussion: custom temporary directory
- Future look: Funding support from Qualtrics?
- fetch_survey error; filename issue?
- results from all_surveys() and list_surveys is different
- fetch_survey produces "Error: Qualtrics API raised a bad request (400) error"
- Overhaul of credentialling system
- Error parsing file: The file does not appear to be a valid survey. Save as qsf - am I doing it wrong? import doesn't work
- write_qsf() currently enables only encoding UTF-8 that doesn't handle Hebrew and Arabic
- Release qualtRics 3.2.0
- Qualtrics API reported a bad request error (400)
- Qualtrics webservice call using R tools
- Question/ comment about the documentation
- Ordered factor conversion for multiple choice question
- read_survey() no longer (or inconsistently) breaks out sets