Run the following command in the repository root:
docker run -d --name internal_validity_study -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/datascience-notebook
Inspect the logs to get the connection link:
docker logs internal_validity_study
- `Generation_of_Survey_Questions.ipynb`: This notebook generates survey questions in Qualtrics TXT format. It uses the file `raw_data.json` and creates the files `raw_data.md` and `data_clean_questions.txt`.
- `Process_survey_responses.ipynb`: This notebook processes the survey response files (in raw Qualtrics export format) in the folder `survey_results` and writes the file `survey_results_processed.csv`. All questions with reversed paragraph pairs are swapped back to the original direction (the survey response is swapped too).
- `Survey_responses_analysys.ipynb`: This notebook analyses the survey results. It reads the `survey_results_processed.csv` file and produces two images (in `figures/`) and the file `survey_questions_report.md`. IMPORTANT: all paragraph pairs are swapped so that the readability deltas show a decrease (responses are swapped accordingly).
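
The swap-back step described for `Process_survey_responses.ipynb` could look roughly like the sketch below. It is not the notebook's actual code: the function name `unreverse`, the 5-point Likert coding, and the mirroring rule are assumptions made for illustration. Only the `-rev` suffix and the `qid`, `res`, and `was_rev` field names come from the descriptions in this README.

```python
LIKERT_POINTS = 5  # assumption: responses are coded 1..5


def unreverse(qid: str, res: int) -> dict:
    """Map a response to a possibly reversed question back to the original direction."""
    was_rev = qid.endswith("-rev")
    if was_rev:
        qid = qid[: -len("-rev")]      # recover the original pair _id
        res = LIKERT_POINTS + 1 - res  # mirror the answer around the neutral point
    return {"qid": qid, "res": res, "was_rev": was_rev}


print(unreverse("nopqrst0000-rev", 2))  # {'qid': 'nopqrst0000', 'res': 4, 'was_rev': True}
print(unreverse("nopqrst0000", 2))      # {'qid': 'nopqrst0000', 'res': 2, 'was_rev': False}
```

Under this assumed coding a neutral answer (3) is unchanged by the swap, so reversed and non-reversed presentations of the same pair can be pooled under one `qid`.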
- `raw_data.json`: raw data in JSON format. An array of paragraph pairs. Each element has the following shape:

      {
        "_id": "nopqrst0000",
        "documentRepoId": "ghijklm0000",
        "documentRepoName": "paper-research-foo",
        "fkglDelta": 2.0000,
        "freDelta": -1.5000,
        "from": {
          "commitAuthorEmail": "[email protected]",
          "commitId": "abcdef0001",
          "readability": {
            "fleschKincaidGradeLevel": 20.0000,
            "fleschReadingEase": 3.5000
          },
          "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
        },
        "to": {
          "commitAuthorEmail": "[email protected]",
          "commitId": "abcdef0002",
          "readability": {
            "fleschKincaidGradeLevel": 22.0000,
            "fleschReadingEase": 2.0000
          },
          "text": "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
        }
      }

  Each element represents two successive versions of a paragraph (`from` and `to`). Each version has a text, two readability measurements, and other metadata. The root fields include `freDelta` and `fkglDelta`, which are the changes in the two readability metrics. The main `_id` field is used in the survey as the question identifier.
- `raw_data.md`: raw data in a more readable format, generated with the `Generation_of_Survey_Questions.ipynb` notebook.
- `data_clean_questions.txt`: Qualtrics survey questions in TXT format. There are 61 questions in total. The first question asks for the respondent's academic level. The following 60 questions are divided into 10 blocks of 6 questions. Each block contains three questions and the three reversed versions of those questions (the `from` and `to` paragraphs are swapped). Each question uses the original `_id` of the paragraph pair as its question ID (with `-rev` appended for the reversed pairs).
- `survey_results/`: folder containing raw survey results exported from Qualtrics.
- `survey_results_processed.csv`: survey results processed into a more usable format. Each row has the following fields:
  - `ResponseId`: Qualtrics response id
  - `qid`: question id (original `_id` of the paragraph pair)
  - `res`: survey response (Likert)
  - `Qlevel`: academic level of the respondent
  - `was_rev`: whether the paragraph pair was presented reversed
  - `freDelta`: delta in Flesch reading ease
  - `fkglDelta`: delta in Flesch-Kincaid grade level
  - `from.text`, `to.text`: paragraph texts
  - `from.FRE`, `to.FRE`: Flesch reading ease of the paragraphs
  - `from.FKG`, `to.FKG`: Flesch-Kincaid grade level of the paragraphs
- `figures/`: plots of survey results. There are two figures, both depicting counts of values on the Likert scale. One includes neutral responses; the other does not.
- `survey_questions_report.md`: report with all paragraph pairs as seen in the figures.
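
As a quick consistency check on the `raw_data.json` record shown in this README: in that sample, both deltas equal the `to` value minus the `from` value (22.0 - 20.0 = 2.0 for `fkglDelta`, and 2.0 - 3.5 = -1.5 for `freDelta`). The sketch below recomputes the deltas that way and flattens one pair into the paragraph-related columns listed for `survey_results_processed.csv`. The helper names `deltas` and `flatten` are hypothetical, the sample is trimmed (metadata dropped, texts shortened), and the `to - from` convention is inferred from that single sample rather than confirmed by the notebooks.

```python
import json

# Trimmed copy of the README's sample record (metadata dropped, texts shortened).
SAMPLE = json.loads("""
{
  "_id": "nopqrst0000",
  "fkglDelta": 2.0,
  "freDelta": -1.5,
  "from": {"readability": {"fleschKincaidGradeLevel": 20.0, "fleschReadingEase": 3.5},
           "text": "Lorem ipsum"},
  "to": {"readability": {"fleschKincaidGradeLevel": 22.0, "fleschReadingEase": 2.0},
         "text": "Ut enim"}
}
""")


def deltas(pair):
    """Return (freDelta, fkglDelta) recomputed as 'to' minus 'from' (assumed convention)."""
    f, t = pair["from"]["readability"], pair["to"]["readability"]
    return (t["fleschReadingEase"] - f["fleschReadingEase"],
            t["fleschKincaidGradeLevel"] - f["fleschKincaidGradeLevel"])


def flatten(pair):
    """Flatten one pair into the paragraph-related columns of the processed CSV."""
    f, t = pair["from"], pair["to"]
    return {
        "qid": pair["_id"],
        "freDelta": pair["freDelta"],
        "fkglDelta": pair["fkglDelta"],
        "from.text": f["text"],
        "to.text": t["text"],
        "from.FRE": f["readability"]["fleschReadingEase"],
        "to.FRE": t["readability"]["fleschReadingEase"],
        "from.FKG": f["readability"]["fleschKincaidGradeLevel"],
        "to.FKG": t["readability"]["fleschKincaidGradeLevel"],
    }


fre, fkgl = deltas(SAMPLE)
print(fre == SAMPLE["freDelta"], fkgl == SAMPLE["fkglDelta"])  # True True
print(sorted(flatten(SAMPLE)))
```

The response-level columns (`ResponseId`, `res`, `Qlevel`, `was_rev`) come from the Qualtrics export rather than from `raw_data.json`, so they are left out of this sketch.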