manybabies / mb2-analysis Goto Github PK

View Code? Open in Web Editor NEW

7.0 7.0 10.0 556.54 MB

Analysis scripts for MB2

License: MIT License

R 100.00%

mb2-analysis's People

Contributors

Stargazers

Watchers

Forkers

datigrezzi tobiasschuwerk mahowak rhepach yuen-francis cogsciphillab angelinetsui nalessandroni galraz yanwei-w

mb2-analysis's Issues

find a way to open .edf raw data files

fix trial num

probably went wrong in 003, right now many test trials are getting numbered 1 and 2

refactor folders into 1a and 1b, adjust file reading script accordingly

@yuen-francis

check if participants files have all been through MB validator

if they haven't, we shouldn't import but should instead ask the lab to redo the data format.

be more thoughtful about exclusions

for example, make sure to output bookkeeping about what is being excluded (subjects, trials, timestamps)

fix subid

-subid is used throughout pilot_data_analysis.Rmd as if it is guaranteed to be unique but it is not guaranteed to be unique
-subid + lab + experiment_num would be unique

before averaging eyes, check that coordinates are not at the extreme of screen (e.g., 0, 1600, etc.) as this may be an indicator of missing data for one or both eyes. in the case of missingness or big delta between eyes, data should be handled carefully, e.g. not just naively averaged.

Is pilot 1b CEU data misaligned in time?

-On a quick glance, it looks like the outcome condition of pilot 1b for CEU shows an earlier shift towards the target than others. Is there some misalignment somewhere?

Did something go wrong with exclusions?

Just pulled the newest changes and after running exclusions, the sub_id column strangely adds the lab id too.

Outliers in Tobii

Tobii trackers seem to produce gaze coordinates that are outside the bounds of the screen, as shown in the image. Need to exclude these extremal values during import.R maybe?

add package management

we should install renv to facilitate package management!

Clean up pod.R

Avoid hacky point of discrimination handling for the two pilots in pod.R

implement calibration correction in main pipeline

@datigrezzi has already implemented some version of this in a shiny app, we would just like to import potentially: https://github.com/datigrezzi/firstaidgaze

aoi sanity check in animation

As a sanity check on AOIs, it would be great to have those show up on the animation videos, produced by animation_analysis.R.

specify experiment_num in import.R files in datasets

specify experiment_num in import.R files to distinguish between pilot 1a & 1b (instead of function in pilot_data_analysis.R which checks this by ID < 7)

Repo Cleanup

Due to peekbank changes between the pilot and the current study, there is some deprecated code / outdated information scattered throughout the repo (e.g. the contents of the data_integrity folder). Having a unified code separation pattern in subdirectories (a. Pilot specific code, b. Primary specific code, c. shared code) would make it easier to work with the repo and prevent mistakes when doing the analysis

Also, our generate_AOIs_for_primary_data.R now contains the code to generate all missing peekbank tables, maybe a rename would be appropriate. I would do that and update the references, but I am not sure if contributors have uncommitted changes that would cause merge conflicts.

second pilot: we need to handle "no_outcome" vs "outcome" trials, which have different video lengths

separate outcome & no_outcome conditions in import.R

separate outcome & no_outcome conditions in import.R & consider for different plots and analyses in pilot_data_analysis

Systematic discrepancies between eyelink labs

As illustrated in the scatter plot, it would seem that CEU and Copenhagen have a systematic difference in their gaze distribution. This could be due to either differences in calibration, or perhaps differences in placement of the stimuli. I will dig deeper and see if there are errors in the script that corrects for the difference in screen dimensions (which I think is unlikely because the other types of eyetrackers are corrected without issue.). Maybe contact the two labs and double check that their stimuli was displayed at the centre of the screen and not slightly off?

pilot 1b: do new analyses, comparing 1b to original pilot

Outliers in Eyelink data

Eyelink seems to output a significant number of 'impossible' coordinates. Need to exclude these during import.R?

Double-check that familiarization trials are encoded in the correct order

This is a reminder for us to make sure that trial number order is accurately reflected in the final data we analyze.

unusual first trial length for a handful of participants

A few participants have surprisingly long first trials (based on media file name) and should be double-checked:

MARCS_24
MARCS_28
2116AlYa

Final raw data storage

Where do we save the finally collected raw data?

Add first look analysis

Right now, we only have code for confirmatory analyses using proportion looking. This issue is to track work on the first look analysis.

There is also a worry about the definition of a first look in the registered report:
"First saccades will be determined as the first change in gaze occurring within the anticipatory time window that is directed towards one of the AOIs. The first look is then the binary variable denoting the target of this first saccade (i.e., either the correct or incorrect AOI) and is defined as the first AOI where participants fixated at for at least 150 ms, as in Rayner et al. (2009). The rationale for this definition was that, if participants are looking at a location within the tunnel exit AOIs before the anticipation period, they might have been looking there for other reasons than action prediction. We therefore count only looks that start within the anticipation period because they more unambiguously reflect action predictions. This further prevents us from running into a situation where we would include a lot of fixations on regions other than the tunnel exit AOIs because participants are looking somewhere else before the anticipation period begins"

Two issues:

Are we sure we want to exclude looks that are starting at one of the two tunnel exit AOIs?
What counts as a change? E.g. do within AOI fixation shifts count? Or is a change in gaze based on changing from one AOI to the other? We need to define this a bit more precisely (and then e.g. decide if we need to process the data into fixations, or what AOIs count in the "change in AOI" interpretation.

downsample data for animations

so that high Hz trackers aren't over-represented in viz.

add outcome and no_outcome in import R for pilot 1 b data

file storage

data files are too big! we should probably put them on OSF and pull them in, like peekbank does.

PKUSu has extra header rows

In row 60727 of PKUSu_adults_xy_timepoints.csv we see an extra header row inserted mid-dataset . @yanwei-W I think you were looking at this one, can you take a look? thanks!

Standardising language coding

peekds requires ISO 639-2 codes for the native_language field; however, the langN fields are not validated by the MB-2 validator (and maybe peekds? not sure). Two possible solutions:

hand-crafted ISO 639-2 mappings for each dataset separately
add requirement into validator (i.e., have labs submit langN in the correct codes)

Eyelink pupil size requires calibration

Eyelinks use an internal unit for their pupil size measurement, which is fine if we want relative units, but does not translate into absolute units (which is commonly used, especially to compare across systems). To do so, labs will need to run a calibration: printing a black circle of known absolute size, holding it at head distance away from the Eyelink, and conducting a measurement. This should be relatively brief (and labs may have already done so for other experiments), but we will need to collate the data for the pupillometry spinoff.

Source: https://www.sr-research.com/support/thread-154.html (walled behind account creation; relevant section in screenshot below)

MAIN ANALYSIS 2: differential looking

differential looking score with regard to anticipatory looking time (looking to correct tunnel exit relative to looking to correct or incorrect exit)

MAIN ANALYSIS 1: first anticipatory look

compute this binary variable and analyze using an LMER or equivalent:

Something like this:

lmer(correct_first_anticipation ~ trial_num + (1|subject) + (1|lab), data = anticipations, family = "binomial")

sanity checks on tracker data

it would be great to create a script called data_integrity.Rmd that reads in the measurements and creates a series of tests for a variety of things (e.g., the extremal values in #12).

Some ideas:

basic histograms of marginals (x distribution and y distribution) for eye-movements
density of eye-movements (e.g., 2d histogram)
maybe plotting on sample frames of the animation?
move over the trial length checking from the main analysis script

The hope for this markdown would be that we render this and see visually whether everything looks reasonable.

further animation enhancements

Various different enhancements:

registration across stimuli (so you can see all the data)
downsample xy_data appropriately so there is one time-point per frame (see #13)
consider plotting heatmaps instead of bee swarms
add AOIs (see #14)

Include generalised linear models for follow up analyses

The preregistered first look model has a very non-gaussian shape to it given there's a lot of ones.

Suggest building a second generalised model in brms that takes either beta_family as a shape if we only have proportions, or binomial if we have number of samples the proportion is calculated from.

Do we already import pupil diameter? This is important for the spin-off.

AOIs and peekds

I refactored the code to generate AOI regions in pilot data. Previously, the AOI computation was being done in peekds, but this assumes that we always have a single target and distractor area. Here, we wanted to analyze more small areas. I think it makes sense for AOI target and distractor regions to be defined in the analysis and not in peekds. Right now, the code is duplicated, though, so it would be good to clean this up. See metadata/generate_AOIs.R to see the duplicated code.

re-factor peek bank scripts for pre-processing files and also create documentation

Peekdb underwent some changes in their schema, so we need to re-factor the peek bank R scripts. Also, it will be best to create some documentation to list out how we pre-process data from each lab using peekdb

lab-specific calibration trials?

A handful of labs have an idiosyncratic name for their first calibration. Can we figure out if this is the star calibration or something else? The labs are:

labUNAM
cecBYU
childdevlabAshoka

pilot 1b: handle import.R for various labs, check data

. instead of NA for missing XY data

we see this issue in babylingOslo_toddlers_xy_timepoints.csv. @adriansteffan I think you imported this one.

Check fixation filter column in raw data

Some raw data files include a "fixation filter" column that specify (probably) the default fixation filter (e.g., "Tobii I-VT (Fixation)"). We should double-check that the data we are seeing is not filtered in any way.

Point of disambiguation

The pod.R file seems to contain only the pod timestamps for the pilot, but is missing the timestamps for some stimuli used in the actual experiment

y-maze window AOI

add AOI for the window in the Y maze and add these timecourses to the visualizations

missing familiarization trials for babylabUmassb

In the preprocessed data prior to AOI analysis, a large number of babylabUmassb participants seem to have only familiarization trial, and no test trials (specifically participants UMB016 through UMB033). This suggests that maybe something went wrong with preprocessing/ naming of media files? We should double-check how that lab's data is being processed.

Consistent t_norm sampling rate at the end of preprocessing

It looks like the data (specifically the test data, not the familiarization data) has inconsistent time values in t_norm at the end of preprocessing. I think the source of the issue is that the point of disambiguation varies across test trial stimuli (see trial_details.csv), and since we now resample before normalizing time (i.e., adjusting for the point of disambiguation), this introduces inconsistent time values across participants/ labs (depending on which test trial stimuli they used).

I think this is only really a convenience problem for summarizing and plotting the timecourse data - we currently solve this issue by downsampling to a consistent set of times before visualizing the time course.

AOIs: Top Left vs Bottom Left Origin, Video Resolution

Our current AOIs in generate_aois_for_primary_data.R assume a top left origin AND a 1200x900 stimulus video.

We need to recalculate our aoi coordinates to adhere to A) a bottom left origin and B) the 1280x960 resolution.

uniqifying participants

in pilot data, at least Trento and Goettingen participants were not unique by participant names and have been incorrectly clustered and imported (including in RR data analysis)

this needs to be checked/corrected.

@isagarbisch

apply exclusion criteria from the Registered Report

Need to make sure that we apply exclusion criteria from the Registered Report, especially data excluded due to missingness.