donders-institute / data-streamer
A service and web UI for managing lab data flowing to the DCCN project storage and the Donders Repository.
Home Page: https://uploader.dccn.nl
License: MIT License
Change ieee to ieeg so that the information in the drop down is correct.
Hide the Donders Repository destination on the web page, so that it is not visible in the user interface anymore.
The interfacing for the Donders Repository and the related authentication is shifted to phase 2 of this project.
In the streamer-ui, add the target project storage path.
In the streamer-ui, add DCCN to the logo.
so that the design of the interface can be discussed and finalized before implementation.
project id, subject id, session id, and data type.
data_type should consist of predefined labels as well as an "other" option by which the user can provide his own data_type. The code name is modalityUser.
The module is mapped to the URL https://streamer:3000/user/. It accepts a POST to the following URL for scheduling a streamer task:
https://streamer:3000/user/{project_id}/{subject_id}/{session_id}/{type}
During the execution of the streamer job, the user data is copied from a directory shared with the streamer ui, assuming the data is organized according to the following structure:
{STREAMER_UI_BUFFER_DIR}/{project_id}/sub-{subject_id}/ses-{session_id}/{type}/*
to the corresponding project storage, with the following directory structure:
/project/{project_id}/raw/sub-{subject_id}/ses-{session_id}/{type}
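For illustration, a minimal sketch (in Node.js) of scheduling a streamer task via this endpoint; the example project/subject/session values and the handling of a self-signed certificate are assumptions:

// Sketch: schedule a streamer task; the URL shape follows the documentation above.
const https = require('https');
const req = https.request(
  'https://streamer:3000/user/3010000.01/001/01/mri',  // assumed example values
  { method: 'POST', rejectUnauthorized: false },       // assume a self-signed certificate
  (res) => console.log('streamer responded: ' + res.statusCode)
);
req.on('error', console.error);
req.end();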
so that I can empty the list of files to be uploaded.
In the streamer-ui, make use of subject and session input text fields instead of dropdown lists.
Note: the cascading selection fields should be the same as in the calendar booking form:
[Project, Subject, Session, Modality]
so that users can go to streamer.dccn.nl.
Remove the gateway component from the architecture diagram
Support the enter key so that I do not have to click on the button.
so that the user can interact with it using the web browser, and upload files to the buffer of the streamer ui.
Add client_max_body_size 1000M; to the NGINX configuration so that users can upload files larger than 1 MB.
so that we have a central point of contact.
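For reference, a sketch of where this directive could live in the NGINX site configuration; the server name, certificate paths, and upstream address are assumptions:

server {
    listen 443 ssl;
    server_name uploader.dccn.nl;                    # assumed server name
    ssl_certificate     /etc/nginx/ssl/uploader.crt; # assumed certificate paths
    ssl_certificate_key /etc/nginx/ssl/uploader.key;

    # allow uploads larger than the 1 MB default
    client_max_body_size 1000M;

    location / {
        proxy_pass http://streamer-ui:9000;          # assumed upstream address
    }
}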
so that I do not accidentally transfer files under someone else's username.
so that I can go to the folder of interest on the project storage.
/raw directory in the title of the streamer email.
Currently, the streamer is using the admin credential for copying data from the catchall collection into the project storage. Although the user can log in successfully from the UI, the authorization for writing data to the destination project storage is not checked before the transfer is started by the admin user. This can result in data being accidentally uploaded to a directory in which the authenticated user is not allowed to write (e.g. a wrong project storage, or the user is in the "viewer" role of the project).
The authorization should be checked before the transfer is started by the admin user. Running the transfer using the individual user's credential may be difficult, as it requires read permission on the data in the catchall collection.
The implementation should be: if the subject and session numbers can be resolved, the dataset is copied into the session directory (e.g. raw/sub-XXX/ses-megYYY) within the individual project storage; otherwise the dataset is copied right into the raw directory.
At the moment, the project number is determined by scanning the regex "^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$" on the dataset name.
The subject-session number will be determined by scanning the sub([0-9]+)ses([0-9]+) pattern on the first part of the dataset name split by _.
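A minimal sketch of this parsing logic in JavaScript; the function name is illustrative, and the example dataset name is taken from the dumpDshead output below:

// Sketch: derive project and subject/session numbers from a dataset name.
function parseDatasetName(dsName) {
  const prj = dsName.match(/^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$/);
  const subses = dsName.split('_')[0].match(/sub([0-9]+)ses([0-9]+)/);
  return {
    projectId: prj ? prj[1] : null,
    subjectId: subses ? subses[1] : null,
    sessionId: subses ? subses[2] : null,
  };
}
console.log(parseDatasetName('301202610tobnavs07_1200hz_20161123_01'));
// -> { projectId: '301202610', subjectId: null, sessionId: null }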
so that I can read some documentation what I am expected to do.
The directory structure should be sub-xxx/ses-yyy/type/datafiles, where type=meg.
In our case it happens to be that yyy=meg01, but in a MEG session multiple data types could be recorded (e.g. behavioral log files, eye tracker data), which would go into different type-specific directories under the same session.
this relates to line
The following header fields are available:
roboos@odin> dumpDshead /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
----------------------------------------
Dataset: /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
WARNING: Cannot connect to the database. Error code: 1045, Error message: Access denied for user: '[email protected]' (Using password: NO)
Collected 23-Nov-2016 starting at 11:53
Run Name: 301202610tobnavs07_1200hz_20161123_01
Run Titl: run title
Col Desc:
Run Desc:
Operator: operator
Patient : 301202610tobnavs07
Channels: 358
Samples : 12000 per trial
Rate : 1200 samples/sec
Trials : 372
Duration: 10 seconds/trial
Pre-trig: 0 seconds
File Gradient: 0
Current Gradient: 0
Units : User
Sens Num: 4304
BW LPass: 300 Hz
BW HPass: 0 Hz
In modalityMEG.js, the 'rsync' process of 'meg_copy.sh', called via child_process.spawn, can eventually be killed by the system (the kernel?) after a few days of operation.
Restarting the streamer resolved the problem. The killing seems to be due to running out of resources (memory, in this case?).
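A minimal sketch of guarding the spawned process so that abnormal termination is at least reported; the wrapper name matches meg_copy.sh above, while the callback shape is an assumption:

const { spawn } = require('child_process');
// Sketch: run meg_copy.sh (assumed to be on PATH) and surface abnormal
// termination, e.g. when the kernel OOM killer sends SIGKILL.
function runMegCopy(args, onDone) {
  const child = spawn('meg_copy.sh', args, { stdio: 'inherit' });
  child.on('error', onDone);
  child.on('close', (code, signal) => {
    if (signal) return onDone(new Error('meg_copy.sh killed by signal ' + signal));
    return onDone(code === 0 ? null : new Error('meg_copy.sh exited with code ' + code));
  });
}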
Implement catchall for all lab data streamed
As long as the project number can be resolved from the DICOM tag PatientID, the data should be copied to the individual project and collection.
If one of the project and session numbers is not resolvable, the dataset will be named using the StudyDescription, StudyDate, and StudyTime, and put right inside the raw directory.
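A minimal sketch of this naming rule; the tag object shape, the project-number pattern, and the catch-all location are assumptions:

// Sketch: choose the target directory for a streamed DICOM dataset.
function targetPath(tags) {
  const prj = (tags.PatientID || '').match(/30[0-9]{5}\.[0-9]{2}/);
  if (prj) {
    return '/project/' + prj[0] + '/raw';  // project number resolved
  }
  // fallback: name the dataset by StudyDescription, StudyDate and StudyTime
  const ds = tags.StudyDescription + '_' + tags.StudyDate + '_' + tags.StudyTime;
  return '/project/catchall/raw/' + ds;    // assumed catch-all location
}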
so that I have insight into how the data streamer is used.
Up to now we had (as cron jobs under roboos@mentat001):
meg_copy.sh from odin to catch-all on central storage
meg_organize.sh from catch all to project organization on central storage
meg_archive.sh from catch-all on odin to catch-all on RDM
meg_cleanup.sh
meg_quality.sh
Under the date folder, data are still organised by project/subject/session if those attributes are provided.
A component diagram is sufficient.
Also illustrate the dataflow to project as well as the DR.
The diagrams should show the internal design of the two components, as well as interactions between components.
This is to replace the "cron" approach of triggering data transfer.
The service URL is something like:
http://streamer.dccn.nl/:modality/:date/:ds
and the method should be POST. The call should be asynchronous, meaning that there needs to be a queue to maintain the incoming calls and perform the relevant data transfer actions accordingly.
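A minimal sketch of such an endpoint with an in-memory queue; Express routing is used for illustration, and a persistent job queue may be preferable in practice:

const express = require('express');
const app = express();
const queue = [];  // pending streamer tasks

// Accept a task and return immediately; the transfer runs asynchronously.
app.post('/:modality/:date/:ds', (req, res) => {
  queue.push(req.params);
  res.status(202).json({ queued: queue.length });
});

// Naive worker: process at most one task per second.
setInterval(() => {
  const task = queue.shift();
  if (task) {
    console.log('streaming ' + task.modality + '/' + task.date + '/' + task.ds);
    // ... perform the data transfer here ...
  }
}, 1000);

app.listen(3000);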
so that I get a pleasant user experience.
so that I get informed.
There have been two occasions on which a streamer job failed because the project storage was out of quota.
The baseline is that the data is in any case streamed to the catch-all RDM collection, but not to the project storage.
Currently I have to check the failed job and send a notification email manually (and for some reason, I didn't get feedback in one of the two cases).
Should it be improved?
so that I do not have to fill in project number, subject label, session label, and data type again.
When - after the 1st upload - I try to upload another batch, the project, sub and ses labels remain on the right side. However, the data type disappears and the only way to get it back is by starting all over again (i.e. by selecting project, then sub, …).
I have already been thinking about (but not implementing) the detection of a dataset being complete, i.e. the same problem as faced with the DICOM files. Here is the strategy that I came up with:
The acquisition software on odin starts creating a xxx.ds directory that contains the following files
xxx.res4
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.acq
xxx.eeg
xxx.hc
xxx.hist
xxx.infods
xxx.infods.bak
xxx.newds
BadChannels
ChannelGroupSet.cfg
ClassFile.cls
and the hz.ds subdirectory (and possibly a hz2.ds and more, see below).
The res4 file contains most header information (basically a dump of a C/C++ structure to disk). It is written at start of acquisition, and rewritten at the end upon closure. The meg4, 1_meg4, 2_meg4 etc files contain an 8-byte header followed by the data, each file being max 2GB large.
At the start, the acquisition software also initializes
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.3_meg4
xxx.4_meg4
etc
which all have the 8-byte header but no data. At the end of acquisition, these files (when not written to) are deleted again by the acquisition software. I.e. it pre-creates the files and cleans up at the end.
So one way to check whether a MEG dataset is currently being written to is to check for the presence of 8-byte "meg4" data files. If those do not exist, but there exist "meg4" data files larger than 8 bytes, the MEG dataset can be assumed to be finished.
At the start of acquisition there is always a head-localizer scan, which is reflected in the hz.ds subdir. It is actually a small nested dataset by itself. At the end of acquisition there can be one more (optional) which would be the hz2.ds. Since I don't know how this interacts with the closure procedure I think it would be good to wait for a few minutes after the main data seems to have closed to ensure that the final optional head-localizer is done (and all other header files are also closed and flushed to disk).
If need be, I can of course look into the file creation in detail with a short dummy scan.
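A minimal sketch of this check in Node.js; the file-name pattern follows the listing above, and the grace period for the optional hz2.ds is left to the caller:

const fs = require('fs');
const path = require('path');
// Sketch: a dataset is assumed finished when no 8-byte (header-only)
// meg4 files remain and at least one meg4 file holds data.
function megDatasetFinished(dsDir) {
  const sizes = fs.readdirSync(dsDir)
    .filter((f) => /\.([0-9]+_)?meg4$/.test(f))
    .map((f) => fs.statSync(path.join(dsDir, f)).size);
  const headerOnly = sizes.some((s) => s === 8);
  const hasData = sizes.some((s) => s > 8);
  return !headerOnly && hasData;
}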
so that I can find the button where I expect it.
so that I get a better user experience.
Its hostname is lab-meg001; the user name and password are the same as on the old console.
Below you can see and compare the location of the data.
[meg@odin ~]$ ls /ctfmeg/odin/data/meg/ACQ_Data/
20161209 20161212 20161213 20161214 20161215 20161216 20161219 20161220 20161221 20161222 20161223
[meg@lab-meg001 ~]$ ls /ctfmeg/ACQ_Data/
20161223
Currently the streamer-ui and streamer service each implement their own config file and their own way of loading the configuration. It would be ideal to factor this out and share it between the ui and the service.
Idea: a single streamer.conf file with a getConfig function to load the configuration.
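For illustration, a minimal sketch of such a shared loader; the streamer.conf path, the environment variable, and the JSON format are assumptions:

// Sketch: one configuration loader shared by streamer and streamer-ui.
const fs = require('fs');
function getConfig(confFile = process.env.STREAMER_CONF || 'streamer.conf') {
  return JSON.parse(fs.readFileSync(confFile, 'utf8'));
}
module.exports = { getConfig };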
@hurngchunlee I see a script that does the following:
#!/bin/bash
# List .ds datasets under $1 containing files modified in the last $2 minutes.
raw_dir=$1
min_back=$2
find "$raw_dir" -type f -path '*.ds/*' -mmin -"${min_back}" | awk -F '.ds' '{print $1".ds"}' | sort | uniq
That would also detect hz.ds and hz[0-9]*.ds as datasets. Those are not separate datasets but belong to the containing ds folder. They contain the short head-localization measurement at the start (and sometimes in between and/or at the end).
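A hedged sketch of filtering those nested head-localizer datasets out of the candidate list produced by the script above; the function name is illustrative:

// Sketch: drop hz.ds / hz[0-9]*.ds entries; they belong to the containing dataset.
function dropHeadLocalizers(dsPaths) {
  return dsPaths.filter((p) => !/(^|\/)hz[0-9]*\.ds$/.test(p));
}
console.log(dropHeadLocalizers([
  '/data/20161123/301202610tobnavs07_1200hz_20161123_01.ds',
  '/data/20161123/301202610tobnavs07_1200hz_20161123_01.ds/hz.ds',
]));
// -> only the containing dataset remains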
The dash (-) between modality and digits should be removed. For example, the session directory for MRI should be ses-mri01, and for MEG ses-meg01.
so that I am not confused and that I know for sure I can write to these project folders.
@rutgervandeelen The config file should be loaded using the config module. This causes the error message I sent you a few days ago:
[email protected] | internal/modules/cjs/loader.js:638
[email protected] | throw err;
[email protected] | ^
[email protected] |
[email protected] | Error: Cannot find module '/opt/streamer-ui-server/config/streamer-ui-adconfig.json'
[email protected] | at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
[email protected] | at Function.Module._load (internal/modules/cjs/loader.js:562:25)
[email protected] | at Module.require (internal/modules/cjs/loader.js:692:17)
[email protected] | at require (internal/modules/cjs/helpers.js:25:18)
[email protected] | at Object.<anonymous> (/opt/streamer-ui-server/routes/mod_authentication.js:5:18)
[email protected] | at Module._compile (internal/modules/cjs/loader.js:778:30)
[email protected] | at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
[email protected] | at Module.load (internal/modules/cjs/loader.js:653:32)
[email protected] | at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
[email protected] | at Function.Module._load (internal/modules/cjs/loader.js:585:3)
Furthermore, there can be only one default.json file, which can be further overridden by a ${NODE_ENV}.json file under the config folder. The value of ${NODE_ENV} is provided as an environment variable. The documentation also illustrates a way to use it for different environments (e.g. development, acceptance, production), which we might want to adopt.
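For illustration, the usual layout with the config module; the key name is an assumption:

// config/default.json:    { "bufferDir": "/tmp/uploads" }
// config/production.json: { "bufferDir": "/data/streamer-ui-buffer" }
// Selected via the environment, e.g.: NODE_ENV=production node server.js
const config = require('config');
const bufferDir = config.get('bufferDir');  // production value when NODE_ENV=production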
so that the user keeps a clear understanding of what the list of files to be uploaded exactly contains.
Currently, the rejection is for individual files. Other files in the batch are added to the file list. This is not what we want.
so that the service for user interaction can be implemented in this folder, using TypeScript.
Provision the code structure within the ui folder for development.
Currently the MEG datasets in the catchall project storage are organised by date (because the data is rsync-ed from the MEG modality), while in the catchall collection they are organised either in a project folder or by date, depending on whether the project number can be determined from the dataset name.
To make it consistent, we should organise the datasets in the catchall collection by date as well.
This can be an issue with scans of a PI's pilot project.
so that I do not have to fill in everything again in the uploader page.
See this Orthanc page: http://pacs.dccn.nl:8042/app/explorer.html#patient?uuid=8856067c-f691809f-0744ce9e-cb0ea5a0-43439c79
It results in data ending up in
/project/3055010.01/raw//unkown_
and
/rdm/di/dccn/DAC_3055010.01_490/raw//unknown_