dfalster / datamashr
Code for merging data from different studies in a transparent way
License: Other
We want a function that checks each directory in data and returns TRUE if it passes all the tests, i.e. that all the right parts are there.
Then embed this within getStudyNames, so that it returns a list of directories that pass all the tests. These directories are then loaded and built. Add an option test to getStudyNames (default test=FALSE).
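A minimal sketch of how such a check could look. The required file list is an assumption (only dataImportOptions.csv is mentioned elsewhere in these notes; data.csv is a guess), and checkStudyDir and the dataDir argument are hypothetical names, not the package's current API.

```r
# Hypothetical list of files every study directory must contain.
requiredFiles <- c("data.csv", "dataImportOptions.csv")

# Returns TRUE if the directory exists and contains all required files.
checkStudyDir <- function(dir, required = requiredFiles) {
  dir.exists(dir) && all(file.exists(file.path(dir, required)))
}

# Sketch of getStudyNames with an optional test step (default FALSE).
getStudyNames <- function(dataDir = "data", test = FALSE) {
  dirs <- list.dirs(dataDir, full.names = TRUE, recursive = FALSE)
  if (test) {
    # Keep only the directories that pass the check.
    dirs <- dirs[vapply(dirs, checkStudyDir, logical(1))]
  }
  basename(dirs)
}
```

With test=TRUE only complete study directories are returned, so incomplete ones are skipped rather than breaking the build.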
startDataMashR load correctly?
mashRdetail function
tidy.dir("R", replace.assign=TRUE, left.brace.newline=FALSE, reindent.spaces=TRUE)
It needs to be renamed, and baad needs to include this update.
To fix 1, need to find the smart quote equivalent, \uXXXX. Use tools::showNonASCIIfile("R/import.R") to find non-ASCII characters, then look up their codes.
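As a sketch, the escape codes can also be derived in R itself rather than looked up by hand. toUnicodeEscape is a hypothetical helper, not part of dataMashR; it relies only on base R's utf8ToInt and sprintf.

```r
# List the lines of a file that contain non-ASCII characters.
# Guarded so the sketch also runs outside the package directory.
if (file.exists("R/import.R")) {
  tools::showNonASCIIfile("R/import.R")
}

# Hypothetical helper: return the \uXXXX escape for each character in a string.
toUnicodeEscape <- function(x) {
  sprintf("\\u%04X", utf8ToInt(x))
}

# Example: the escape for a right single smart quote.
toUnicodeEscape("\u2019")
```

The resulting escape can then be pasted into the source file in place of the literal character.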
Currently we have a parameter data in loadStudies, but I can't see any application for it at the moment.
Is it something old that we should remove, or is it something yet to be implemented?
It looks like the different lines in the script all run, when we might want them to stop if one thing fails. Consider replacing
- make install
- make test
with
- make install test
(or make install && make test).
For example, getStudyNames() needs the global variable 'dir.rawData'. We need a function 'setFolders' or similar, which creates hidden global variables such as '.dir_rawData'. The user would run it once to initialise a project. dataMashR_init()?
Run R CMD check for a list of missing items.
The final step in loadStudies (or processStudy?) should be to convert to numeric those variables that are listed as numeric in the variableDefinitions.csv file. At the moment, in baad, h.c (and many others) are character even though they contain no values that cannot be converted to numeric.
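A minimal sketch of such a conversion step. It assumes the variable definitions have columns named variable and type, which may not match the actual layout of variableDefinitions.csv, and convertNumericVariables is a hypothetical function name.

```r
# Convert columns that the definitions table lists as numeric.
# Assumed (hypothetical) columns in varDefs: `variable` and `type`.
convertNumericVariables <- function(data, varDefs) {
  numericVars <- intersect(varDefs$variable[varDefs$type == "numeric"],
                           names(data))
  for (v in numericVars) {
    values <- as.character(data[[v]])
    converted <- suppressWarnings(as.numeric(values))
    # Values that were present but could not be converted.
    bad <- !is.na(values) & is.na(converted)
    if (any(bad)) {
      # Leave the column untouched and report the offending values.
      warning("Variable '", v, "' has non-numeric values: ",
              paste(unique(values[bad]), collapse = ", "))
    } else {
      data[[v]] <- converted
    }
  }
  data
}
```

Columns with values that cannot be converted are left as character and reported, so no data is silently turned into NA.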
Currently this sets up the object .mashrConfig. Instead, pass these settings as function arguments.
Looking at xtable, they use the options() function for this kind of thing, e.g.
myfun <- function(type = getOption("xtable.type", "latex"), ...){}
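Applied to this package, the same pattern might look like the sketch below. The option name dataMashR.dir.rawData and the function rawDataDir are hypothetical, chosen only for illustration.

```r
# Hypothetical option name; falls back to "data" when the option is unset.
rawDataDir <- function(dir = getOption("dataMashR.dir.rawData", "data")) {
  dir
}

rawDataDir()                                # uses the option, or "data" if unset
options(dataMashR.dir.rawData = "studies")  # set once per project
rawDataDir()                                # now picks up "studies"
rawDataDir("elsewhere")                     # an explicit argument still wins
```

This keeps the setting overridable per call while avoiding a hidden global object.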
Skeleton added, but more tests are needed or this is just going to tell you something has happened and give you no good clues.
We have a table already, put function directly into that table, e.g. x/100
Change the code so that it does not require studyName in the data file itself, but instead gets it from the name of the directory (i.e. a single source).
Need to make list of things to check.
When addNewData adds a new variable conditionally on a lookupValue, it fails when the lookupVariable contains missing values. For example,
lookupVariable,lookupValue,newVariable,newValue,source
species,Cedrela odorata,family,Meliaceae,
species,Tabebuia rosea,family,Bignonaceae,
would fail if species contains NA in the data.
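One possible cause is a comparison with ==, which yields NA where the lookupVariable is missing and can then break an if test or a subscripted assignment. The sketch below is not the addNewData implementation; it only shows how %in% avoids the problem, using a small made-up data frame and the lookup table from the example.

```r
# Made-up data with a missing species value.
data <- data.frame(species = c("Cedrela odorata", NA, "Tabebuia rosea"),
                   stringsAsFactors = FALSE)

# Lookup table from the example above.
lookup <- data.frame(lookupVariable = "species",
                     lookupValue = c("Cedrela odorata", "Tabebuia rosea"),
                     newVariable = "family",
                     newValue = c("Meliaceae", "Bignonaceae"),
                     stringsAsFactors = FALSE)

# Create the new column up front so it exists for every row.
data$family <- NA_character_

for (i in seq_len(nrow(lookup))) {
  # %in% returns TRUE or FALSE, never NA, so rows with NA are simply skipped.
  rows <- data[[lookup$lookupVariable[i]]] %in% lookup$lookupValue[i]
  data[[lookup$newVariable[i]]][rows] <- lookup$newValue[i]
}
```

Rows with a missing species keep NA in the new column instead of causing an error.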
I get this error after a while (when I have baad open, and have run loadStudies a few times):
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'data/Aiba2005/dataImportOptions.csv': Too many open files
Somewhere in dataMashR, I assume some file is opened and not closed.
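Where the leak is has not been established. As a general sketch, any connection opened explicitly with file() can be closed reliably with on.exit(); readFirstLines is a hypothetical helper used only to illustrate the pattern.

```r
# Hypothetical helper: read the first n lines of a file via an explicit connection.
readFirstLines <- function(path, n = 5) {
  con <- file(path, "rt")
  # Close the connection when the function exits, even if an error occurs.
  on.exit(close(con), add = TRUE)
  readLines(con, n = n)
}

# showConnections() lists user connections that are still open in the session,
# which can help locate the code that leaves files open.
showConnections()
```

Calling showConnections() before and after loadStudies should show whether the number of open connections grows.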
Currently we have functions that are used when we import new studies. It would be good to have something that recognises new studies automatically and adds them to the progress file (we also need a function that sets up this file initially).
We could also add a warning message when somebody tries to process studies and one of them has not been set up yet.
For example,
dat <- processStudy("Epron2011")
Error in if (unit.from != unit.to) x <- match.fun(paste(unit.from, unit.to, :
missing value where TRUE/FALSE needed
The function (or the underlying one) should print a message saying which variable did not have units (because units are not expected for all variables).
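A sketch of such a guard, placed before the comparison that currently fails. The conversions table, the convertUnits name, and its variable argument are all hypothetical; the package's own conversion code resolves the function differently and is not reproduced here.

```r
# Hypothetical conversion table keyed by "from.to".
conversions <- list(cm.m = function(x) x / 100)

# Hypothetical wrapper that names the variable when units are missing.
convertUnits <- function(x, unit.from, unit.to, variable) {
  if (is.na(unit.from) || is.na(unit.to)) {
    # Units are not expected for every variable, so report and carry on.
    message("Variable '", variable, "' has no units defined; values left unchanged")
    return(x)
  }
  if (unit.from != unit.to) {
    key <- paste(unit.from, unit.to, sep = ".")
    if (is.null(conversions[[key]])) {
      stop("No conversion from '", unit.from, "' to '", unit.to,
           "' for variable '", variable, "'")
    }
    x <- conversions[[key]](x)
  }
  x
}
```

Checking for NA first means the if comparison is only reached with real unit strings, and the message identifies the variable involved.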
Needs to modify
Want to manage which data folders are complete or incomplete within data folder.