
data-streamer's People

Contributors

dependabot[bot], hurngchunlee, robertoostenveld, yannickman, yarikoptic


data-streamer's Issues

Create a mock-up of the streamer ui

so that the design of the interface can be discussed and finalized before implementation.

  • how to upload files?
  • how to provide attributes such as project id, subject id, session id, and data type?
    • authentication is key to avoiding the mistake of streaming data to the wrong project.
    • data_type should consist of predefined labels as well as an "other" option through which the user can provide their own data_type.

Implement a new modality module for handling streamer requests from the UI

The code name is modalityUser.

The module is mapped to URL https://streamer:3000/user/.

It accepts a POST to the following URL for scheduling a streamer task:

https://streamer:3000/user/{project_id}/{subject_id}/{session_id}/{type}

During the execution of the streamer job, the user data is copied from a directory shared with the streamer UI, assuming the data is organized according to the following structure:

{STREAMER_UI_BUFFER_DIR}/{project_id}/sub-{subject_id}/ses-{session_id}/{type}/*,

to the corresponding project storage, with the following directory structure:

/project/{project_id}/raw/sub-{subject_id}/ses-{session_id}/{type}.
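
A minimal sketch of this mapping in Node.js, assuming Express-style route parameters matching the URL above and the STREAMER_UI_BUFFER_DIR environment variable; the function name and the '/buffer' fallback are illustrative, not the actual module:

const path = require('path');

// Sketch: map a POST to /user/{project_id}/{subject_id}/{session_id}/{type}
// onto the buffer directory and the project-storage directory described above.
function resolvePaths(params) {
    const { project_id, subject_id, session_id, type } = params;
    const src = path.join(process.env.STREAMER_UI_BUFFER_DIR || '/buffer',
        project_id, 'sub-' + subject_id, 'ses-' + session_id, type);
    const dst = path.join('/project', project_id, 'raw',
        'sub-' + subject_id, 'ses-' + session_id, type);
    return { src: src, dst: dst };
}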

As a user, I want the streamer to protect me from accidentally uploading data to a directory that I am not allowed to write to.

Currently, the streamer uses the admin credentials for copying data from the catchall collection into the project storage. Although the user can log in successfully from the UI, the authorization to write data to the destination project storage is not checked before the transfer is started by the admin user. This can result in data being accidentally uploaded to a directory in which the authenticated user is not allowed to write (e.g. the wrong project storage, or a project in which the user only has the "viewer" role).

The authorization should be checked before the transfer is started by the admin user. Running the transfer using the individual user's credentials may be difficult, as it requires read permission on the data in the catchall collection.
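
A minimal sketch of such a pre-check, assuming a hypothetical role lookup against whatever source the streamer uses for project membership; the role names and the roleLookup callback are assumptions, not an existing API:

// Sketch: refuse to schedule the transfer unless the authenticated user
// has a role that allows writing to the destination project storage.
// roleLookup is a hypothetical placeholder (e.g. a query to the project database).
async function assertCanWrite(username, projectId, roleLookup) {
    const role = await roleLookup(username, projectId);
    const writableRoles = ['manager', 'contributor'];   // assumed role names
    if (!writableRoles.includes(role)) {
        throw new Error(`user ${username} may not write to project ${projectId}`);
    }
}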

enabling MEG dataflow to individual project storage

The implementation should be:

  • as long as the project number can be determined, data will flow into the individual project storage
  • if both the subject and session numbers can also be determined, the dataset is copied into a proper sub-directory (i.e. raw/sub-XXX/ses-megYYY) within the individual project storage; otherwise the dataset is copied right into the raw directory
  • the dataset name is kept as it is produced on the MEG console

At the moment, the project number is determined by matching the regex "^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$" against the dataset name.

The subject and session numbers are determined by matching the sub([0-9]+)ses([0-9]+) pattern against the first part of the dataset name split by _.
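
A minimal sketch of this parsing in Node.js, using the two patterns quoted above; the function name and the example dataset name are hypothetical:

// Sketch: derive project, subject and session numbers from a dataset name,
// using the regexes described in this issue.
function parseDatasetName(dsName) {
    const prjMatch = dsName.match(/^.*(30[0-9]{5}\.{0,1}[0-9]{2}).*$/);

    // subject/session are scanned on the first '_'-separated part of the name
    const firstPart = dsName.split('_')[0];
    const subSesMatch = firstPart.match(/sub([0-9]+)ses([0-9]+)/);

    return {
        project: prjMatch ? prjMatch[1] : null,
        subject: subSesMatch ? subSesMatch[1] : null,
        session: subSesMatch ? subSesMatch[2] : null
    };
}

// e.g. parseDatasetName('sub01ses02_3012026.10_1200hz_20161123_01.ds')
// -> { project: '3012026.10', subject: '01', session: '02' }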

the data type "meg" should be added to the directory structure.

The directory structure should be
sub-xxx/ses-yyy/type/datafiles
where type=meg

In our case it happens to be that yyy=meg01, but in a MEG session multiple data types could be recorded (e.g. behavioral log files, eye tracker data), which would go into different type-specific directories under the same session.

This relates to the line:

path_list.push(path.join(prefix_prj, 'raw', 'sub-' + m[1], 'ses-meg' + m[2], path.basename(ds)));
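
A hedged sketch of how that line could look with the type segment added; the variables are those of the snippet above, and the literal 'meg' directory is the change requested in this issue:

// Sketch: insert the data type directory ("meg") between the session directory and the dataset name.
path_list.push(path.join(prefix_prj, 'raw', 'sub-' + m[1], 'ses-meg' + m[2], 'meg', path.basename(ds)));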

get header information from CTF dataset

The following header fields are available

roboos@odin> dumpDshead /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
----------------------------------------
Dataset: /data/20161123/301202610tobnavs07_1200hz_20161123_01.ds
WARNING: Cannot connect to the database. Error code: 1045, Error message: Access denied for user: '[email protected]' (Using password: NO)
Collected 23-Nov-2016 starting at 11:53
Run Name: 301202610tobnavs07_1200hz_20161123_01
Run Titl: run title
Col Desc: 
Run Desc: 
Operator: operator
Patient : 301202610tobnavs07
Channels: 358
Samples : 12000 per trial
Rate    : 1200 samples/sec
Trials  : 372
Duration: 10 seconds/trial
Pre-trig: 0 seconds
File Gradient: 0
Current Gradient: 0
Units   : User
Sens Num: 4304
BW LPass: 300 Hz
BW HPass: 0 Hz

rsync child process being killed immediately

In modalityMEG.js, the 'rsync' process of 'meg_copy.sh', called via child_process.spawn, can eventually be killed by the system (kernel?) after a few days of operation.

Restarting the streamer resolved the problem. The kill appears to be caused by running out of a resource (memory, in this case?).
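
A minimal sketch of logging how the spawned child terminates, which could help confirm whether it is killed by the kernel OOM killer; the script path and arguments are placeholders, and this logging is an assumption rather than the current implementation:

const { spawn } = require('child_process');

// Placeholder path and arguments for illustration only.
const script = '/opt/streamer/bin/meg_copy.sh';
const args = ['/source/dir', '/destination/dir'];

// Sketch: spawn meg_copy.sh and log how the child terminates.
// Termination with signal=SIGKILL and code=null is consistent with the kernel OOM killer.
const child = spawn(script, args);

child.on('exit', (code, signal) => {
    console.log(`meg_copy.sh terminated: code=${code}, signal=${signal}`);
});

child.on('error', (err) => {
    console.error('failed to start meg_copy.sh:', err);
});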

As a product owner I want to have usage statistics of the data streamer

so that I have insight into how the data streamer is used.

  • Enrich the logging of the streamer-ui server
  • Store the following information (a sketch of such a record follows this list):
    • username
    • ip address
    • start [timestamp]
    • end [timestamp]
    • max filesize [bytes]
    • number of files
    • total upload size [bytes]
    • user agent (i.e. web browser, operating system)
    • error if any
  • wontfix Make an endpoint that gives a stats report. It must provide the following information:
    • Unique user count
    • Number of sessions count
    • Total bytes uploaded since start
    • Top users - uploaded [bytes]
    • Top users - number of sessions
    • Largest file encountered [bytes]
    • Web browser stats [%]
    • Operating system stats [%]
  • wontfix Provide this info to Grafana
  • wontfix Update diagram
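
A minimal sketch of such a usage record as the streamer-ui server could log it, one object per upload session; the field names and values are illustrative, not an existing schema:

// Sketch: one usage record per upload session, matching the fields listed above.
const usageRecord = {
    username: 'johndoe',                 // authenticated user
    ipAddress: '192.0.2.10',             // client IP address
    start: '2020-01-01T09:00:00Z',       // start timestamp
    end: '2020-01-01T09:05:30Z',         // end timestamp
    maxFileSizeBytes: 1048576,           // largest single file in the upload
    numberOfFiles: 12,
    totalUploadSizeBytes: 8388608,
    userAgent: 'Mozilla/5.0 (...)',      // web browser and operating system
    error: null                          // error message, if any
};

console.log(JSON.stringify(usageRecord));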

transition to new data flow

up to now we had (as cron under roboos@mentat001)

meg_copy.sh from odin to catch-all on central storage
meg_organize.sh from catch all to project organization on central storage
meg_archive.sh from catch-all on odin to catch-all on RDM
meg_cleanup.sh
meg_quality.sh

How to handle the error when project storage (or RDM collection) is out of quota

There have been two occasions on which a streamer job failed because the project storage was out of quota.

The baseline is that the data is anyway streamed to the catch-all RDM collection, but not to the project storage.

Currently I have to check the failed job and send a notification email manually (and for some reason, I didn't get feedback in one of the two cases).

Should it be improved?

detect dataset completion

Hi @hurngchunlee

I have already been thinking about (but not implementing) the detection of a dataset being complete, i.e. the same problem as faced with the dicom files. Here is the strategy that I came up with:

The acquisition software on odin starts creating a xxx.ds directory that contains the following files

xxx.res4
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.acq
xxx.eeg
xxx.hc
xxx.hist
xxx.infods
xxx.infods.bak
xxx.newds
BadChannels
ChannelGroupSet.cfg
ClassFile.cls

and the hz.ds subdirectory (and possibly a hz2.ds and more, see below).

The res4 file contains most header information (basically a dump of a C/C++ structure to disk). It is written at start of acquisition, and rewritten at the end upon closure. The meg4, 1_meg4, 2_meg4 etc files contain an 8-byte header followed by the data, each file being max 2GB large.

At the start, the acquisition software also initiates
xxx.meg4
xxx.1_meg4
xxx.2_meg4
xxx.3_meg4
xxx.4_meg4
etc
which all have the 8-byte header but no data. At the end of acquisition, these files (when not written to) are deleted again by the acquisition software. I.e. it pre-creates the files and cleans up at the end.

So one way to check whether a MEG dataset is currently being written to is to check for the presence of 8-byte "meg4" data files. If those do not exist, but there are "meg4" data files larger than 8 bytes, the MEG dataset can be assumed to be finished.
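
A minimal sketch of this check in Node.js, assuming synchronous filesystem access and that dsDir points at the xxx.ds directory; the function name is hypothetical:

const fs = require('fs');
const path = require('path');

// Sketch: a dataset is assumed complete when no 8-byte (header-only) "meg4"
// files remain and at least one "meg4" file holds actual data.
function looksComplete(dsDir) {
    const meg4Sizes = fs.readdirSync(dsDir)
        .filter((f) => /\.(meg4|[0-9]+_meg4)$/.test(f))
        .map((f) => fs.statSync(path.join(dsDir, f)).size);

    if (meg4Sizes.length === 0) return false;                // nothing written yet
    const headerOnly = meg4Sizes.some((size) => size === 8); // pre-created, unused file
    const hasData = meg4Sizes.some((size) => size > 8);
    return !headerOnly && hasData;
}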

At the start of acquisition there is always a head-localizer scan, which is reflected in the hz.ds subdir. It is actually a small nested dataset by itself. At the end of acquisition there can be one more (optional) scan, which would be the hz2.ds. Since I don't know how this interacts with the closure procedure, I think it would be good to wait a few minutes after the main data seems to have been closed, to ensure that the final optional head-localizer is done (and all other header files are also closed and flushed to disk).

If needs be, I can of course look into the file creation in detail with a short dummy scan.

we need the new MEG console to be supported

Its hostname is lab-meg001; the username and password are the same as on the old console.

Below you can see and compare the location of the data.

[meg@odin ~]$ ls /ctfmeg/odin/data/meg/ACQ_Data/
20161209  20161212  20161213  20161214  20161215  20161216  20161219  20161220  20161221  20161222  20161223
[meg@lab-meg001 ~]$ ls /ctfmeg/ACQ_Data/
20161223 

hz.ds is not a separate dataset

@hurngchunlee I see a script that does

#!/bin/bash
# List .ds datasets containing files modified within the last $min_back minutes.
raw_dir=$1
min_back=$2
find "$raw_dir" -type f -path '*.ds/*' -mmin -"${min_back}" | awk -F '.ds' '{print $1".ds"}' | sort | uniq

that would also detect hz.ds and hz[0-9]*.ds as datasets. Those are not separate datasets but belong to the containing ds folder. They contain the short head-localization measurement at the start (and sometimes in between and/or at the end).
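
A minimal sketch of filtering such nested head-localizer datasets out of the detected paths on the Node.js side, assuming the list of .ds paths has already been collected; the function name is hypothetical:

// Sketch: drop hz.ds / hz2.ds / ... entries, which live inside a parent .ds
// directory and are not independent datasets.
function dropHeadLocalizerDatasets(dsPaths) {
    return dsPaths.filter((p) => !/(^|\/)hz[0-9]*\.ds$/.test(p));
}

// e.g. dropHeadLocalizerDatasets(['/data/xxx.ds', '/data/xxx.ds/hz.ds'])
// -> ['/data/xxx.ds']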

Rename the session directory

The dash (-) between the modality and the digits should be removed. For example, the session directory for MRI should be ses-mri01, and for MEG ses-meg01.

Wrong way to use the config module

const adconfig = require(path.join(__dirname + '/../config/streamer-ui-adconfig.json'));

@rutgervandeelen The config file should be loaded using the config module. Loading it with require() as above causes the error message I sent you a few days ago:

internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module '/opt/streamer-ui-server/config/streamer-ui-adconfig.json'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (/opt/streamer-ui-server/routes/mod_authentication.js:5:18)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)

Furthermore, with the config module there is a single default.json file under the config folder, which can be further overridden by a ${NODE_ENV}.json file; the value of ${NODE_ENV} is provided as an environment variable. The documentation also illustrates a way to use this for different environments (e.g. development, acceptance, production), which we might want to adopt.
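
A minimal sketch of loading the same settings through the config module, assuming the AD settings are moved into config/default.json under a key such as adconfig (the key name is an assumption):

// config/default.json (optionally overridden by config/${NODE_ENV}.json), e.g.:
// { "adconfig": { "url": "ldaps://...", "baseDn": "..." } }

const config = require('config');

// Load the AD settings via the config module instead of require()-ing the JSON
// file by absolute path.
const adconfig = config.get('adconfig');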

In the catchall collection for MEG, data should always be organised by date

Currently, the MEG datasets in the catchall project storage are organised by date (because the data is rsync-ed from the MEG modality), while in the catchall collection they are organised either by project folder or by date, depending on whether the project number can be determined from the dataset name.

To make it consistent, we should organise the datasets in the catchall collection also by date.
