manybabies / mb2-analysis

Analysis scripts for MB2
License: MIT License
probably went wrong in 003; right now many test trials are getting numbered 1 and 2
if they haven't, we shouldn't import but should instead ask the lab to redo the data format.
for example, make sure to output bookkeeping about what is being excluded (subjects, trials, timestamps)
- subid is used throughout pilot_data_analysis.Rmd as if it were guaranteed to be unique, but it is not
- subid + lab + experiment_num would be unique
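A minimal sketch of building such a composite ID, assuming the imported data frame has `subid`, `lab`, and `experiment_num` columns (the helper name is hypothetical):

```r
# Build a composite ID that is unique across labs and pilots.
# Column names are assumptions based on the issue text.
make_unique_id <- function(df) {
  df$sub_uid <- paste(df$lab, df$experiment_num, df$subid, sep = "_")
  df
}

d <- data.frame(
  subid          = c("s01", "s01"),
  lab            = c("CEU", "Copenhagen"),
  experiment_num = c("pilot_1a", "pilot_1a")
)
d <- make_unique_id(d)
# the two s01 rows now receive distinct IDs
```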
before averaging eyes, check that coordinates are not at the extremes of the screen (e.g., 0, 1600), as this may indicate missing data for one or both eyes. In the case of missingness or a big delta between the eyes, the data should be handled carefully, e.g. not just naively averaged.
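One way this careful averaging could look (thresholds, the 1600 px screen width, and the function name are illustrative assumptions, not the pipeline's actual values):

```r
# Average left/right eye x-coordinates, but treat screen-edge values
# (0 or the maximum pixel) as missing, fall back to the single valid eye,
# and refuse to average eyes that disagree by more than max_delta pixels.
average_eyes <- function(left, right, screen_max = 1600, max_delta = 100) {
  left[left <= 0 | left >= screen_max]   <- NA
  right[right <= 0 | right >= screen_max] <- NA
  out <- rowMeans(cbind(left, right), na.rm = TRUE)  # uses the valid eye if one is NA
  out[is.nan(out)] <- NA                             # both eyes missing
  big_delta <- !is.na(left) & !is.na(right) & abs(left - right) > max_delta
  out[big_delta] <- NA                               # don't naively average divergent eyes
  out
}
```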
- At a quick glance, it looks like the outcome condition of pilot 1b for CEU shows an earlier shift towards the target than the others. Is there some misalignment somewhere?
we should install renv to facilitate package management!
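For reference, the usual renv setup is just a couple of calls run once in the repo root:

```r
# one-time setup: snapshot the package environment for the repo
install.packages("renv")
renv::init()      # creates renv/ and renv.lock
renv::snapshot()  # records current package versions in renv.lock
# collaborators run renv::restore() after cloning to reproduce the library
```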
Avoid hacky point of discrimination handling for the two pilots in pod.R
@datigrezzi has already implemented some version of this in a Shiny app; we could potentially just import it: https://github.com/datigrezzi/firstaidgaze
As a sanity check on AOIs, it would be great to have them show up on the animation videos produced by animation_analysis.R.
specify experiment_num in the import.R files to distinguish between pilot 1a & 1b (instead of the function in pilot_data_analysis.R, which checks this via ID < 7)
Due to peekbank changes between the pilot and the current study, there is some deprecated code / outdated information scattered throughout the repo (e.g. the contents of the data_integrity folder). Having a unified code-separation pattern in subdirectories (a. pilot-specific code, b. primary-study-specific code, c. shared code) would make it easier to work with the repo and prevent mistakes when doing the analysis.
Also, our generate_AOIs_for_primary_data.R now contains the code to generate all missing peekbank tables, maybe a rename would be appropriate. I would do that and update the references, but I am not sure if contributors have uncommitted changes that would cause merge conflicts.
separate the outcome & no_outcome conditions in import.R & consider them for different plots and analyses in pilot_data_analysis
As illustrated in the scatter plot, it would seem that CEU and Copenhagen have a systematic difference in their gaze distributions. This could be due either to differences in calibration or to differences in placement of the stimuli. I will dig deeper and see if there are errors in the script that corrects for the difference in screen dimensions (which I think is unlikely, because the other types of eyetrackers are corrected without issue). Maybe contact the two labs and double-check that their stimuli were displayed at the centre of the screen and not slightly off?
This is a reminder for us to make sure that trial number order is accurately reflected in the final data we analyze.
A few participants have surprisingly long first trials (based on media file name) and should be double-checked:
Where do we save the finally collected raw data?
Right now, we only have code for confirmatory analyses using proportion looking. This issue is to track work on the first look analysis.
There is also a worry about the definition of a first look in the registered report:
"First saccades will be determined as the first change in gaze occurring within the anticipatory time window that is directed towards one of the AOIs. The first look is then the binary variable denoting the target of this first saccade (i.e., either the correct or incorrect AOI) and is defined as the first AOI where participants fixated at for at least 150 ms, as in Rayner et al. (2009). The rationale for this definition was that, if participants are looking at a location within the tunnel exit AOIs before the anticipation period, they might have been looking there for other reasons than action prediction. We therefore count only looks that start within the anticipation period because they more unambiguously reflect action predictions. This further prevents us from running into a situation where we would include a lot of fixations on regions other than the tunnel exit AOIs because participants are looking somewhere else before the anticipation period begins"
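As a sketch of how the registered definition could be operationalized per trial (column names, AOI labels, the sampling, and the window bounds here are illustrative assumptions, not the repo's actual ones):

```r
# Sketch: classify the first look in a trial as the first tunnel-exit AOI
# entered during the anticipation window that is then held for >= 150 ms.
# Samples before window_start are dropped, per the RR rationale.
first_look <- function(t, aoi, window_start, window_end, min_fix_ms = 150) {
  in_window <- t >= window_start & t <= window_end
  t <- t[in_window]; aoi <- aoi[in_window]
  runs   <- rle(aoi)                      # runs of consecutive same-AOI samples
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (i in seq_along(runs$values)) {
    dur <- t[ends[i]] - t[starts[i]]
    if (runs$values[i] %in% c("correct", "incorrect") && dur >= min_fix_ms) {
      return(runs$values[i])              # target of the first qualifying look
    }
  }
  NA_character_                           # no qualifying look in the window
}
```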
Two issues:
downsample to a common sampling rate so that high-Hz trackers aren't over-represented in viz.
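A minimal sketch of such a downsampling step, assuming `t` (ms) and `x` columns and an illustrative 25 ms bin:

```r
# Sketch: bin each participant's samples into fixed-width time bins and
# average within bins, so 500 Hz and 30 Hz trackers contribute equally.
downsample <- function(df, bin_ms = 25) {
  df$t_bin <- floor(df$t / bin_ms) * bin_ms
  aggregate(x ~ t_bin, data = df, FUN = mean)
}
```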
data files are too big! we should probably put them on OSF and pull them in, like peekbank does.
In row 60727 of PKUSu_adults_xy_timepoints.csv we see an extra header row inserted mid-dataset. @yanwei-W I think you were looking at this one, can you take a look? Thanks!
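A generic sketch for catching this artifact programmatically (assuming the first column should never literally equal its own column name):

```r
# Sketch: drop rows that repeat the header, an artifact seen mid-file.
drop_repeated_headers <- function(df) {
  df[df[[1]] != names(df)[1], , drop = FALSE]
}
```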
peekds requires ISO 639-2 codes for the native_language field; however, the langN fields are not validated by the MB-2 validator (and maybe peekds? not sure). Two possible solutions: validate the langN fields as well, or re-code langN in the correct codes.

Eyelinks use an internal unit for their pupil-size measurement, which is fine if we want relative units, but does not translate into absolute units (which are commonly used, especially to compare across systems). To convert, labs will need to run a calibration: printing a black circle of known absolute size, holding it at head distance away from the Eyelink, and conducting a measurement. This should be relatively brief (and labs may have already done so for other experiments), but we will need to collate the data for the pupillometry spinoff.
Source: https://www.sr-research.com/support/thread-154.html (walled behind account creation; relevant section in screenshot below)
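Once a lab's calibration measurement exists, the conversion itself is a linear rescaling for diameter data (a sketch with hypothetical names; area data would need a square-root first):

```r
# Sketch: convert Eyelink arbitrary pupil units to millimetres using a
# lab's calibration measurement of a printed circle of known size.
pupil_to_mm <- function(pupil_au, circle_mm, circle_au) {
  pupil_au * (circle_mm / circle_au)  # linear scaling for diameter data
}
```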
differential looking score with regard to anticipatory looking time (looking to correct tunnel exit relative to looking to correct or incorrect exit)
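As stated, the score reduces to a simple proportion (a sketch; argument names are illustrative):

```r
# Differential looking score: anticipatory looking time to the correct
# tunnel exit, relative to time on either exit.
dls <- function(t_correct, t_incorrect) {
  t_correct / (t_correct + t_incorrect)
}
```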
compute this binary variable and analyze it using a GLMM or equivalent:
Something like this:
glmer(correct_first_anticipation ~ trial_num + (1 | subject) + (1 | lab), data = anticipations, family = binomial)
it would be great to create a script called data_integrity.Rmd that reads in the measurements and creates a series of tests for a variety of things (e.g., the extremal values in #12).
Some ideas:
The hope for this markdown would be that we render this and see visually whether everything looks reasonable.
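A minimal sketch of a few such checks (column names and screen dimensions are assumptions about the preprocessed data):

```r
# Sketch: a handful of integrity checks data_integrity.Rmd could run
# and render as a pass/fail table.
check_integrity <- function(d, screen_w = 1280, screen_h = 960) {
  list(
    x_in_range  = all(d$x >= 0 & d$x <= screen_w, na.rm = TRUE),
    y_in_range  = all(d$y >= 0 & d$y <= screen_h, na.rm = TRUE),
    t_monotonic = all(tapply(d$t, d$sub_uid, function(t) !is.unsorted(t))),
    no_dup_rows = !any(duplicated(d))
  )
}
```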
The preregistered first-look model has a very non-Gaussian shape to it, given there are a lot of ones.
Suggest building a second generalised model in brms that uses either a beta family if we only have proportions, or a binomial if we know the number of samples each proportion is calculated from.
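A sketch of what the two specifications might look like (variable and data names are placeholders; in brms the beta family is `Beta()` and the binomial case needs a `trials()` term):

```r
library(brms)

# if we only have a proportion per participant (values strictly in (0, 1)):
m_beta <- bf(prop_correct ~ condition + (1 | lab), family = Beta())

# if we know how many samples each proportion is based on, model the counts:
m_binom <- bf(n_correct | trials(n_total) ~ condition + (1 | lab),
              family = binomial())

# fit with e.g. brm(m_beta, data = first_looks)
```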
I refactored the code to generate AOI regions in pilot data. Previously, the AOI computation was being done in peekds, but this assumes that we always have a single target and distractor area. Here, we wanted to analyze more small areas. I think it makes sense for AOI target and distractor regions to be defined in the analysis and not in peekds. Right now, the code is duplicated, though, so it would be good to clean this up. See metadata/generate_AOIs.R for the duplicated code.
Peekbank underwent some changes in its schema, so we need to refactor the peekbank R scripts. It would also be best to create some documentation listing how we pre-process the data from each lab using peekbank.
A handful of labs have an idiosyncratic name for their first calibration. Can we figure out if this is the star calibration or something else? The labs are:
we see this issue in babylingOslo_toddlers_xy_timepoints.csv. @adriansteffan I think you imported this one.
Some raw data files include a "fixation filter" column that specifies (probably) the default fixation filter (e.g., "Tobii I-VT (Fixation)"). We should double-check that the data we are seeing is not filtered in any way.
The pod.R file seems to contain only the pod timestamps for the pilot, but is missing the timestamps for some stimuli used in the actual experiment
add AOI for the window in the Y maze and add these timecourses to the visualizations
In the preprocessed data prior to AOI analysis, a large number of babylabUmassb participants seem to have only familiarization trials and no test trials (specifically participants UMB016 through UMB033). This suggests that maybe something went wrong with the preprocessing/naming of media files? We should double-check how that lab's data is being processed.
It looks like the data (specifically the test data, not the familiarization data) has inconsistent time values in t_norm at the end of preprocessing. I think the source of the issue is that the point of disambiguation varies across test-trial stimuli (see trial_details.csv), and since we now resample before normalizing time (i.e., adjusting for the point of disambiguation), this introduces inconsistent time values across participants/labs (depending on which test-trial stimuli they used).
I think this is only really a convenience problem for summarizing and plotting the timecourse data - we currently solve this issue by downsampling to a consistent set of times before visualizing the time course.
Our current AOIs in generate_aois_for_primary_data.R assume a top left origin AND a 1200x900 stimulus video.
We need to recalculate our aoi coordinates to adhere to A) a bottom left origin and B) the 1280x960 resolution.
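A sketch of that coordinate conversion for a single AOI box (scale into the new resolution, then flip y for the bottom-left origin; the function name is illustrative):

```r
# Sketch: convert an AOI box defined in a top-left-origin 1200x900 space
# into the bottom-left-origin 1280x960 space.
convert_aoi <- function(x_min, y_min, x_max, y_max,
                        old = c(1200, 900), new = c(1280, 960)) {
  sx <- new[1] / old[1]
  sy <- new[2] / old[2]
  x <- c(x_min, x_max) * sx
  # flip y: a top-left y of v becomes (height - v) measured from the bottom,
  # so the old bottom edge (y_max) becomes the new y_min
  y <- new[2] - c(y_max, y_min) * sy
  c(x_min = x[1], y_min = y[1], x_max = x[2], y_max = y[2])
}
```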
in the pilot data, at least Trento and Goettingen participants were not unique by participant name and have been incorrectly clustered and imported (including in the RR data analysis); this needs to be checked/corrected.
Need to make sure that we apply exclusion criteria from the Registered Report, especially data excluded due to missingness.