
data-streamer's People

Contributors

dependabot[bot], hurngchunlee, robertoostenveld, yannickman, yarikoptic


data-streamer's Issues

Create a mock-up of the streamer ui

so that the design of the interface can be discussed and finalized before implementation.

  • how to upload files?
  • how to provide attributes such as project id, subject id, session id, and data type?
    • authentication is key to avoiding the mistake of streaming data to the wrong project.
    • data_type should consist of predefined labels as well as an "other" option through which the user can provide their own data_type.

Implement a new modality module for handling streamer requests from the UI

The code name is modalityUser.

The module is mapped to URL https://streamer:3000/user/.

It accepts a POST to the following URL for scheduling a streamer task:

https://streamer:3000/user/{project_id}/{subject_id}/{session_id}/{type}

During the execution of the streamer job, the user data is copied from a directory shared with the streamer UI, assuming the data is organized according to the following structure:

{STREAMER_UI_BUFFER_DIR}/{project_id}/sub-{subject_id}/ses-{session_id}/{type}/*,

to the corresponding project storage, with the following directory structure:

/project/{project_id}/raw/sub-{subject_id}/ses-{session_id}/{type}.
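
A minimal sketch of this mapping in Node.js, assuming Express-style route parameters matching the URL above and the STREAMER_UI_BUFFER_DIR environment variable; the function name and the '/buffer' fallback are illustrative, not the actual module:

const path = require('path');

// Sketch: map a POST to /user/{project_id}/{subject_id}/{session_id}/{type}
// onto the buffer directory and the project-storage directory described above.
function resolvePaths(params) {
    const { project_id, subject_id, session_id, type } = params;
    const src = path.join(process.env.STREAMER_UI_BUFFER_DIR || '/buffer',
        project_id, 'sub-' + subject_id, 'ses-' + session_id, type);
    const dst = path.join('/project', project_id, 'raw',
        'sub-' + subject_id, 'ses-' + session_id, type);
    return { src: src, dst: dst };
}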

As a user, I want the streamer to protect me from accidentally uploading data to a directory that I am not allowed to write to.

Currently, the streamer uses the admin credentials for copying data from the catchall collection into the project storage. Although the user can log in successfully from the UI, the authorization to write data to the destination project storage is not checked before the transfer is started by the admin user. This can result in data being accidentally uploaded to a directory in which the authenticated user is not allowed to write (e.g. the wrong project storage, or a project in which the user only has the "viewer" role).

The authorization should be checked before the transfer is started by the admin user. Running the transfer using the individual user's credentials may be difficult, as it requires read permission on the data in the catchall collection.
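
A minimal sketch of such a pre-check, assuming a hypothetical role lookup against whatever source the streamer uses for project membership; the role names and the roleLookup callback are assumptions, not an existing API:

// Sketch: refuse to schedule the transfer unless the authenticated user
// has a role that allows writing to the destination project storage.
// roleLookup is a hypothetical placeholder (e.g. a query to the project database).
async function assertCanWrite(username, projectId, roleLookup) {
    const role = await roleLookup(username, projectId);
    const writableRoles = ['manager', 'contributor'];   // assumed role names
    if (!writableRoles.includes(role)) {
        throw new Error(`user ${username} may not write to project ${projectId}`);
    }
}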

enabling MEG dataflow to individual project storage

The implementation should be:

  • as long as the project number can be determined, data will flow into the individual project storage
  • if both the subject and session numbers can also be determined, the dataset is copied into a proper sub-directory (i.e. raw/sub-XXX/ses-megYYY) within the individual project storage; otherwise the dataset is copied right into the raw directory
  • the dataset name is kept as it is produced on the MEG console

At the moment, the project number is determined by matching the regex "^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$" against the dataset name.

The subject and session numbers are determined by matching the sub([0-9]+)ses([0-9]+) pattern against the first part of the dataset name split by _.
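
A minimal sketch of this parsing in Node.js, using the two patterns quoted above; the function name and the example dataset name are hypothetical:

// Sketch: derive project, subject and session numbers from a dataset name,
// using the regexes described in this issue.
function parseDatasetName(dsName) {
    const prjMatch = dsName.match(/^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$/);

    // subject/session are scanned on the first '_'-separated part of the name
    const firstPart = dsName.split('_')[0];
    const subSesMatch = firstPart.match(/sub([0-9]+)ses([0-9]+)/);

    return {
        project: prjMatch ? prjMatch[1] : null,
        subject: subSesMatch ? subSesMatch[1] : null,
        session: subSesMatch ? subSesMatch[2] : null
    };
}

// e.g. parseDatasetName('sub01ses02_3012026.10_1200hz_20161123_01.ds')
// -> { project: '3012026.10', subject: '01', session: '02' }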

the data type "meg" should be added to the directory structure.

The directory structure should be
sub-xxx/ses-yyy/type/datafiles
where type=meg

In our case it happens to be that yyy=meg01, but in a MEG session multiple data types could be recorded (e.g. behavioral log files, eye tracker data), which would go into different type-specific directories under the same session.

This relates to the line:

path_list.push(path.join(prefix_prj, 'raw', 'sub-' + m[1], 'ses-meg' + m[2], path.basename(ds)));
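
A hedged sketch of how that line could look with the type segment added; the variables are those of the snippet above, and the literal 'meg' directory is the change requested in this issue:

// Sketch: insert the data type directory ("meg") between the session directory and the dataset name.
path_list.push(path.join(prefix_prj, 'raw', 'sub-' + m[1], 'ses-meg' + m[2], 'meg', path.basename(ds)));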

get header information from CTF dataset

The following header fields are available

roboos@odin> dumpDshead /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
----------------------------------------
Dataset: /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
WARNING: Cannot connect to the database. Error code: 1045, Error message: Access denied for user: '[email protected]' (Using password: NO)
Collected 23-Nov-2016 starting at 11:53
Run Name: 301202610tobnavs07_1200hz_20161123_01
Run Titl: run title
Col Desc: 
Run Desc: 
Operator: operator
Patient : 301202610tobnavs07
Channels: 358
Samples : 12000 per trial
Rate    : 1200 samples/sec
Trials  : 372
Duration: 10 seconds/trial
Pre-trig: 0 seconds
File Gradient: 0
Current Gradient: 0
Units   : User
Sens Num: 4304
BW LPass: 300 Hz
BW HPass: 0 Hz

rsync child process being killed immediately

In modalityMEG.js, the 'rsync' process of 'meg_copy.sh', called via child_process.spawn, can eventually be killed by the system (kernel?) after a few days of operation.

Restarting the streamer resolved the problem. The kill appears to be caused by running out of a resource (memory, in this case?).
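
A minimal sketch of logging how the spawned child terminates, which could help confirm whether it is killed by the kernel OOM killer; the script path and arguments are placeholders, and this logging is an assumption rather than the current implementation:

const { spawn } = require('child_process');

// Placeholder path and arguments for illustration only.
const script = '/opt/streamer/bin/meg_copy.sh';
const args = ['/source/dir', '/destination/dir'];

// Sketch: spawn meg_copy.sh and log how the child terminates.
// Termination with signal=SIGKILL and code=null is consistent with the kernel OOM killer.
const child = spawn(script, args);

child.on('exit', (code, signal) => {
    console.log(`meg_copy.sh terminated: code=${code}, signal=${signal}`);
});

child.on('error', (err) => {
    console.error('failed to start meg_copy.sh:', err);
});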

As a product owner I want to have usage statistics of the data streamer

so that I have insight into how the data streamer is used.

  • Enrich the logging of the streamer-ui server
  • Store the following information (a sketch of such a record follows this list):
    • username
    • ip address
    • start [timestamp]
    • end [timestamp]
    • max filesize [bytes]
    • number of files
    • total upload size [bytes]
    • user agent (i.e. web browser, operating system)
    • error if any
  • wontfix Make an endpoint that gives a stats report. It must provide the following information:
    • Unique user count
    • Number of sessions count
    • Total bytes uploaded since start
    • Top users - uploaded [bytes]
    • Top users - number of sessions
    • Largest file encountered [bytes]
    • Web browser stats [%]
    • Operating system stats [%]
  • wontfix Provide this info to Grafana
  • wontfix Update diagram
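
A minimal sketch of such a usage record as the streamer-ui server could log it, one object per upload session; the field names and values are illustrative, not an existing schema:

// Sketch: one usage record per upload session, matching the fields listed above.
const usageRecord = {
    username: 'johndoe',                 // authenticated user
    ipAddress: '192.0.2.10',             // client IP address
    start: '2020-01-01T09:00:00Z',       // start timestamp
    end: '2020-01-01T09:05:30Z',         // end timestamp
    maxFileSizeBytes: 1048576,           // largest single file in the upload
    numberOfFiles: 12,
    totalUploadSizeBytes: 8388608,
    userAgent: 'Mozilla/5.0 (...)',      // web browser and operating system
    error: null                          // error message, if any
};

console.log(JSON.stringify(usageRecord));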

transition to new data flow

up to now we had (as cron under roboos@mentat001)

meg_copy.sh from odin to catch-all on central storage
meg_organize.sh from catch all to project organization on central storage
meg_archive.sh from catch-all on odin to catch-all on RDM
meg_cleanup.sh
meg_quality.sh

How to handle the error when project storage (or RDM collection) is out of quota

There have been two occasions on which a streamer job failed because the project storage was out of quota.

The baseline is that the data is anyway streamed to the catch-all RDM collection, but not to the project storage.

Currently I have to check the failed job and send a notification email manually (and for some reason, I didn't get feedback in one of the two cases).

Should it be improved?

detect dataset completion

Hi @hurngchunlee

I have already been thinking about (but not implementing) the detection of a dataset being complete, i.e. the same problem as faced with the dicom files. Here is the strategy that I came up with:

The acquisition software on odin starts creating a xxx.ds directory that contains the following files

xxx.res4
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.acq
xxx.eeg
xxx.hc
xxx.hist
xxx.infods
xxx.infods.bak
xxx.newds
BadChannels
ChannelGroupSet.cfg
ClassFile.cls

and the hz.ds subdirectory (and possibly a hz2.ds and more, see below).

The res4 file contains most header information (basically a dump of a C/C++ structure to disk). It is written at start of acquisition, and rewritten at the end upon closure. The meg4, 1_meg4, 2_meg4 etc files contain an 8-byte header followed by the data, each file being max 2GB large.

At the start, the acquisition software also initiates
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.3_meg4
xxx.4_meg4
etc
which all have the 8-byte header but no data. At the end of acquisition, these files (when not written to) are deleted again by the acquisition software. I.e. it pre-creates the files and cleans up at the end.

So one way to check whether a MEG dataset is currently being written to is to check for the presence of 8-byte "meg4" data files. If those do not exist, but there are "meg4" data files larger than 8 bytes, the MEG dataset can be assumed to be finished.
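
A minimal sketch of this check in Node.js, assuming synchronous filesystem access and that dsDir points at the xxx.ds directory; the function name is hypothetical:

const fs = require('fs');
const path = require('path');

// Sketch: a dataset is assumed complete when no 8-byte (header-only) "meg4"
// files remain and at least one "meg4" file holds actual data.
function looksComplete(dsDir) {
    const meg4Sizes = fs.readdirSync(dsDir)
        .filter((f) => /\.(meg4|[0-9]+_meg4)$/.test(f))
        .map((f) => fs.statSync(path.join(dsDir, f)).size);

    if (meg4Sizes.length === 0) return false;                // nothing written yet
    const headerOnly = meg4Sizes.some((size) => size === 8); // pre-created, unused file
    const hasData = meg4Sizes.some((size) => size > 8);
    return !headerOnly && hasData;
}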

At the start of acquisition there is always a head-localizer scan, which is reflected in the hz.ds subdir. It is actually a small nested dataset by itself. At the end of acquisition there can be one more (optional) scan, which would be the hz2.ds. Since I don't know how this interacts with the closure procedure, I think it would be good to wait a few minutes after the main data seems to have been closed, to ensure that the final optional head-localizer is done (and all other header files are also closed and flushed to disk).

If needs be, I can of course look into the file creation in detail with a short dummy scan.

we need the new MEG console to be supported

Its hostname is lab-meg001; the username and password are the same as on the old console.

Below you can see and compare the location of the data.

[meg@odin ~]$ ls /ctfmeg/odin/data/meg/ACQ_Data/
20161209  20161212  20161213  20161214  20161215  20161216  20161219  20161220  20161221  20161222  20161223
[meg@lab-meg001 ~]$ ls /ctfmeg/ACQ_Data/
20161223 

hz.ds is not a separate dataset

@hurngchunlee I see a script that does

#!/bin/bash
# List .ds datasets containing files modified within the last $min_back minutes.
raw_dir=$1
min_back=$2
find "$raw_dir" -type f -path '*.ds/*' -mmin -"${min_back}" | awk -F '.ds' '{print $1".ds"}' | sort | uniq

that would also detect hz.ds and hz[0-9]*.ds as datasets. Those are not separate datasets but belong to the containing ds folder. They contain the short head-localization measurement at the start (and sometimes in between and/or at the end).
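
A minimal sketch of filtering such nested head-localizer datasets out of the detected paths on the Node.js side, assuming the list of .ds paths has already been collected; the function name is hypothetical:

// Sketch: drop hz.ds / hz2.ds / ... entries, which live inside a parent .ds
// directory and are not independent datasets.
function dropHeadLocalizerDatasets(dsPaths) {
    return dsPaths.filter((p) => !/(^|\/)hz[0-9]*\.ds$/.test(p));
}

// e.g. dropHeadLocalizerDatasets(['/data/xxx.ds', '/data/xxx.ds/hz.ds'])
// -> ['/data/xxx.ds']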

Rename the session directory

The dash (-) between the modality and the digits should be removed. For example, the session directory for MRI should be ses-mri01, and for MEG ses-meg01.

Wrong way to use the config module

const adconfig = require(path.join(__dirname + '/../config/streamer-ui-adconfig.json'));

@rutgervandeelen The config file should be loaded using the config module. Loading it with require() as above causes the error message I sent you a few days ago:

internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module '/opt/streamer-ui-server/config/streamer-ui-adconfig.json'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (/opt/streamer-ui-server/routes/mod_authentication.js:5:18)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)

Furthermore, with the config module there is a single default.json file under the config folder, which can be further overridden by a ${NODE_ENV}.json file; the value of ${NODE_ENV} is provided as an environment variable. The documentation also illustrates a way to use this for different environments (e.g. development, acceptance, production), which we might want to adopt.
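
A minimal sketch of loading the same settings through the config module, assuming the AD settings are moved into config/default.json under a key such as adconfig (the key name is an assumption):

// config/default.json (optionally overridden by config/${NODE_ENV}.json), e.g.:
// { "adconfig": { "url": "ldaps://...", "baseDn": "..." } }

const config = require('config');

// Load the AD settings via the config module instead of require()-ing the JSON
// file by absolute path.
const adconfig = config.get('adconfig');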

In the catchall collection for MEG, data should always be organised by date

Currently, the MEG datasets in the catchall project storage are organised by date (because the data is rsync-ed from the MEG modality), while in the catchall collection they are organised either by project folder or by date, depending on whether the project number can be determined from the dataset name.

To make it consistent, we should organise the datasets in the catchall collection also by date.
