XNAT

About The Project
- Tools
Getting Started
Overview
- Data collection
- Anonymisation
Resources
Roadmap
Contributing
Contact
Acknowledgements

About The Project

This project aims to track general documentation, standard operating procedures (SOP) and helper scripts for XNAT.

Tools

XNAT: Imaging informatics platform that can support a wide range of imaging-based projects
Grassroots DICOM (GDCM): C++ library for DICOM and ACR-NEMA medical images, which is automatically wrapped to Python, C#, Java and PHP (using SWIG)
Midnight Commander: Visual file manager, licensed under the GNU General Public License
PuTTY: Terminal emulator, serial console and network file transfer application, which supports several network protocols, including SCP, SSH, Telnet, rlogin, and raw socket connection
WinSCP: Free SFTP, SCP, S3 and FTP client for Windows
- If you are working in a remote environment that requires elevated privileges to install third-party software, download the portable executable version.

Getting Started

Contact Dika for access to XNAT.

Overview

XNAT is a platform capable of storing and managing medical images and associated data. Within Guy’s and St Thomas’ NHS Foundation Trust (GSTT), it forms part of the local secure enclave for the purpose of federated learning in artificial intelligence projects. Data is ingested from PACS into XNAT, where it is anonymised and sorted into relevant projects, ensuring data is only visible to those who need it and allowing for data deletion upon project completion. The following describes the process of data collection, anonymisation (or more accurately, de-identification) and data storage in XNAT, as well as how compliance with DICOM Standards Supplement 142 is achieved.

Medical imaging data is typically stored in a DICOM format. DICOM stands for Digital Imaging and Communications in Medicine and is an international standard format for medical image storage, retrieval, processing and transfer. DICOM images consist of the actual acquired image as a set of pixels and a DICOM header. Data coded within the DICOM header are a series of attributes describing the scan and patient. Each attribute is tagged with a unique DICOM tag which consists of a group and element number, and each tag has a name to identify the type of information (or attribute) contained within the tag. This principle of data tagging allows DICOMs to be compared, transferred, stored and queried.
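As a minimal sketch of the group and element structure described above, the following uses only the Python standard library to split a tag written in the usual (gggg,eeee) notation into its two hexadecimal numbers. The parse_tag and format_tag helpers are hypothetical and for illustration only; in practice a DICOM library such as pydicom would handle this.

```python
import re

# Hypothetical helpers (not part of this repo): a DICOM tag is a pair of
# 16-bit numbers, conventionally written in hexadecimal as "(gggg,eeee)".
TAG_PATTERN = re.compile(r"^\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)$")


def parse_tag(text: str) -> tuple[int, int]:
    """Split a tag string such as "(0010,0020)" into (group, element)."""
    match = TAG_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag: {text!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def format_tag(group: int, element: int) -> str:
    """Render (group, element) back into the "(gggg,eeee)" notation."""
    return f"({group:04X},{element:04X})"


group, element = parse_tag("(0010,0020)")  # the Patient ID attribute
print(format_tag(group, element))
```

Keeping tags as number pairs rather than strings makes them easy to compare and to use as dictionary keys when querying or filtering headers.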

Before any medical data can be used in research or for training artificial intelligence (AI) algorithms, it must first be completely anonymised/de-identified so that no data used can be traced back to any individual. To do this, the DICOM tags need to be altered, deleted or manipulated in such a way that the image no longer describes the individual. However, because a DICOM header contains many tags, and because what counts as identifiable information is not always straightforward, DICOM Standards Supplement 142 was created. This outlines best de-identification practices for clinical trials, and we have adopted the same standard for our de-identification approach.

De-identification in XNAT is done at two levels: first, when data arrives in XNAT from PACS (site-wide de-identification), and second, when data is moved from the Pre-archive into the assigned project (project-level de-identification).

Data collection

Data can be ingested into XNAT via two routes:

Pushed from PACS via teleradiology
Pulled from within XNAT using query-retrieve (Q/R)

Teleradiology is used for sending individual DICOM objects to XNAT. The sender must have access to Sectra PACS (managed by the PACS team) and the necessary permissions to send from it to the GSTT_XNAT destination.

Q/R is used for importing batches of data. The data required can be manually searched by accession number, patient name, patient ID or the date range. Alternatively, the data can be requested by uploading a CSV file. The instructions on how to format the CSV file can be found here.

Data collection process

Before you start, make yourself a cup of tea or get a snack, and put on some soothing music or a podcast in the background. XNAT is slow and you will need to be patient with it, or you will keep cancelling your own commands. Pop-ups can be slow to appear and moving large amounts of data takes a long time. I recommend you start this at the end of the day rather than the beginning and leave it to run overnight, to avoid the higher bandwidth demands on PACS during office hours.

Log into the GSTT network and open Chrome (it won't work well on Edge and it hasn't been tested on Mozilla).
Go to https://sp-pr-flipml01.gstt.local and log in with your username. If you do not have a username, contact Dika to make one for you.
If the project does not exist yet, contact Dika to create it. If you have any specific de-identification requirements, please also let her know about them.
To open your project page, click on Browse - My Projects and select the project you are working with.
- If you have a small number of scans to upload (<100), click on Import From PACS on the right-hand side.
- If you have a large cohort, we recommend using the REST API to perform the upload; please speak to Dika about this process.
If your dataset is reasonably small, use the data Q/R (DQR) route. Clicking on Import from PACS will open the DQR page which will allow you to query Sectra PACS for the requested images. You can use the DQR in two ways:
- By entering the search criteria
- By importing a CSV file

Entering the search criteria

You can search by accession number, patient ID or date range, and use * as a wildcard.

Click on Search PACS and wait for the results to show.
Once you find the correct scan(s), select these by ticking the checkbox on the left-hand side.
Click on the Begin Import button at the bottom of the screen. A pop-up will appear to select all series which are relevant. If you’re importing CTs, de-select the Dose Info, Dose Report and the Patient Protocol series, as they contain burnt-in patient data!
Click Import.

Importing a CSV file

Upload a correctly-formatted CSV file to perform a bulk search. The CSV file should contain either the unique accession numbers or a patient ID, and a study date (which should be YYYYMMDD to match the date formatting in PACS).
Click Import CSV at the top of the page, select your CSV file and click on Upload. The query will then begin, but please note it may take a while to complete. If an error occurs, test your CSV file by reducing it to only one query and see if that comes up with what you expect.
Once your query returns all your requested scans without issue, select which scans to import. XNAT can only handle about 20-30 import queries at once; with more, it will get stuck on the Retrieving series information screen. If you’re importing CTs, de-select the Dose Info, Dose Report and the Patient Protocol series, as they contain burnt-in patient data!
Wait until you get a notification that the query has been sent to PACS.
Click on Close. You will need to upload the CSV again to select the next batch for import.
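The CSV steps above can be sketched in Python as follows: format study dates as YYYYMMDD and split a long query list into batches small enough to stay under the 20-30 query limit. This is a sketch under stated assumptions. The column names in COLUMNS, the batch size of 25 and the example accession numbers are placeholders, so use the headers given in the CSV formatting instructions for your XNAT instance.

```python
import csv
import io
from datetime import date

# ASSUMPTION: placeholder column names, not the documented DQR headers.
COLUMNS = ["Accession Number", "Patient ID", "Study Date"]


def pacs_date(d: date) -> str:
    """Format a date as YYYYMMDD, matching the date format used in PACS."""
    return d.strftime("%Y%m%d")


def split_batches(rows, size=25):
    """Split query rows into consecutive batches of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_csv(rows) -> str:
    """Render one batch as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example: 60 queries by accession number, patient ID left empty.
rows = [[f"ACC{i:04d}", "", pacs_date(date(2023, 1, 15))] for i in range(60)]
batches = split_batches(rows, size=25)
print([len(batch) for batch in batches])
print(to_csv(batches[0]).splitlines()[:2])
```

Each batch can then be saved to its own file and uploaded in turn, which avoids re-selecting rows from one large CSV after every import.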

After following either of the above two data retrieval methods, you will need to:

Click on Upload - Go to prearchive to see the imported data.
Click Refresh if your data does not appear, or click on Upload - Import Queue/History as the data may take a while to transfer.
In the Pre-archive, you can review the details of your import, such as files and DICOM tags. Once you have confirmed you are happy with the data, select it in the Pre-archive by ticking the left-hand side checkbox.
Click on Change projects on the right-hand side and select your project. This will assign the data to your project, and you can then click to Archive it. This will move the data from the Pre-archive into your project folder.
You can now view, amend and interact with the data under Browse - My Projects - your project folder. You can check the DICOM tags of each data session by clicking on the Subject and hovering over the scan details until three icons appear, e.g. View Details to view tags and View Session to view the session. You can also delete or download the images here or in the project details.

Anonymisation

The following image represents the data flow from within an NHS trust to their research environment, including the steps taken to remove all identifiable information from medical images to ensure no personal information ever leaves the NHS trust.

The two methods of de-identification are outlined below.

De-identification via teleradiology

Teleradiology is used to send individual DICOM objects from PACS to XNAT. As the data leave PACS, the following text is automatically appended to two DICOM tags, i.e. the Patient Comment Field (0010,4000) and Study Comment Field (0032,4000):

Project:Unassigned Subject:subj001 Session:subj001_sess001

This ensures that the patient name and accession number do not reach XNAT’s Pre-archive. Instead, the subject ID is set to subj001 and the session ID to subj001_sess001. This means that all data sent via teleradiology from PACS arrives in XNAT with the same subject and session ID. The project owner then changes these manually when moving the data from the Pre-archive to the Archive. Data in XNAT can be distinguished by matching its timestamp of arrival to the timestamp in Sectra PACS, so the project owner can identify which DICOM object belongs to which study subject.
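To illustrate the structure of the appended text, here is a minimal sketch that splits it into its Project, Subject and Session fields. The parse_routing_comment function is a hypothetical helper, not the parser XNAT itself uses, and it assumes the fields are separated by whitespace and contain no spaces.

```python
def parse_routing_comment(comment: str) -> dict[str, str]:
    """Parse 'Key:Value Key:Value ...' text into a dictionary.

    Tokens without a 'Key:Value' shape (e.g. pre-existing free-text
    comments) are skipped rather than treated as errors.
    """
    fields = {}
    for token in comment.split():
        key, separator, value = token.partition(":")
        if separator and key and value:
            fields[key] = value
    return fields


comment = "Project:Unassigned Subject:subj001 Session:subj001_sess001"
print(parse_routing_comment(comment))
```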

Only individual scans should be sent via this route – for batches larger than 3 to 5 scans, please use Q/R.

De-identification via Q/R

When data is imported using the Q/R functionality of XNAT, a new subject ID and session ID can be assigned to each scan imported. We recommend using a CSV file for DQR upload, in which case the subject and session IDs can be specified directly in the CSV file.

In both of the above methods, all other data in the DICOM headers is removed, changed or replaced as detailed in the de-identification scripts, such as:

Patient age, size, weight, ethnic group, smoking status and pregnancy status are retained
Manufacturer private tags are removed
Some UIDs which do not contain any time or date data are also retained

The manipulation of the DICOM data for the purpose of de-identification follows DICOM Standards Supplement 142, which specifies which tags should be removed, replaced or manipulated to ensure that the shared data cannot be traced back to an individual.
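The rules listed above can be sketched as a filter over a header represented as a plain Python dictionary. This is an illustration only, not the de-identification scripts used in XNAT: the deidentify function, the REMOVE set and the example header values are all hypothetical. It relies on the DICOM convention that private (manufacturer) tags have an odd group number.

```python
# Tags are (group, element) pairs; values are plain strings for simplicity.
# ASSUMPTION: a deliberately tiny removal list, for illustration only.
REMOVE = {
    (0x0010, 0x0010),  # Patient's Name
    (0x0010, 0x0020),  # Patient ID
}


def is_private(tag) -> bool:
    """Private (manufacturer) tags have an odd group number."""
    return tag[0] % 2 == 1


def deidentify(header: dict) -> dict:
    """Return a copy of the header without private or listed tags."""
    return {
        tag: value
        for tag, value in header.items()
        if not is_private(tag) and tag not in REMOVE
    }


# Made-up example header.
header = {
    (0x0010, 0x0010): "DOE^JANE",     # removed: identifies the patient
    (0x0010, 0x1010): "045Y",         # Patient's Age: retained
    (0x0009, 0x0010): "VENDOR_DATA",  # removed: private tag (odd group)
    (0x0008, 0x0060): "CT",           # Modality: retained
}
print(deidentify(header))
```

The function returns a new dictionary rather than editing the header in place, so the original values remain available for checking what was removed.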

Data access and storage

The project owner has read, write, update and delete rights over all data they own. They can grant other users either view-only access or permission to modify the existing data. Data can also be shared between projects as read-only.

The data will be stored on on-premises physical storage managed and controlled by GSTT.

Resources

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contact

Acknowledgements

xnat's Issues

Create support accounts for users who may need to use sudo access

We should create separate user accounts and add these to the sudoers file so that we can better track:

Who needs access to root privileges
Which user is accessing root privileges, i.e. versus multiple users using the hnadmin account

This can be similar to GSTT's support accounts, e.g. GSTT\Suprt_<username>, which are used for sysadmin purposes.

Add SOP for Face Masking

We should add an SOP for face masking.

Create an error log to track issues and known solutions

This is to keep track of issues and solutions we find that may recur and which can be shared with other users

Update sudoers file and grant specific root user privileges to certain users

For importing data, data processing in the iFIND project, etc., users currently do not use their own account but instead use the hnadmin profile because it seems that is the only account that can access super-user privileges, i.e. is included in the sudoers file.

This means that if users need to run certain commands as the root user, we are less able to accurately monitor and audit user actions, resource usage, etc.

I suggest we:

Update the sudoers file so that certain users who require super-user privileges can do so via their own user profile rather than the hnadmin user profile
Identify which commands can/should be run by certain users, rather than only the root user, such as the data processing commands in the iFIND project

Add SOP for Data Checking

We should add an SOP for checking the data post-XNAT import.

Replace file upload with REST API call

We should use direct REST API calls to XNAT instead of using the JSON response files in report.py.

Add SOP for mandatory XNAT training

Add scripts to query CogStack

We should add scripts to query CogStack as an alternative to using the GUI.

Add example images for facemasking and DeID to training SOP or assets

So they can be shown to trainees during training for illustrative purposes

can this be made public?

Is there a good reason why this is private? This could be a great resource for others in the NHS looking to replicate our environment.

Add SOP for Data Import

We should add an SOP for importing data from/into XNAT.

Introduce modality-specific anonymisation modules

This will allow us to make modality-specific scripts, cleaning up dicom tags and making the anonymisation process faster and more transparent

We need a script to make restAPI for dicom-query-retrieve-api batch function

The current method of querying the REST API, which retrieves the JSON of corresponding PACS matches, allows an upload of approximately 100 queries at a time; beyond that, it times out. We need to be able to run queries of 1000 or more.

Should add paeds defacing to resources

Newly open sourced, we should add this to our XNAT:

https://github.com/d3b-center/peds-auto-defacing-public

Install XNAT ML Plugin

We should install the XNAT ML Plugin, particularly for the data labelling functionality in the XNAT OHIF Viewer.

Add calendar to schedule access to XNAT

We should add an easily-accessible calendar, e.g. to the CSC website, to display what days and time slots are reserved for certain pieces of XNAT work, as well as a guide on how to do so.

Add Frequently Asked Questions (FAQ) section

We should add an FAQ section to the repo and ideally, the CSC website as well.

I think this should include questions related to gaining access to XNAT and the Secure Enclave, setup, XNAT-related governance and data processing.

XNAT doesn't process fused PET images

It keeps throwing errors relating to a repeated SOP instance in the two images that are overlaid

Automate reporting on data processing and transfers

Similar to the QMS, we should automate how we track and report on XNAT projects, data processed and transferred.

typo in anon-chart in 'cardiology'

Add special handling to import-tester script to deal with names that have ' in them

Currently this throws an error which leads to the scan not being ingested

This may be a good time to re-name the script to something sensible too

Upgrade to XNAT v1.8.7 from current v1.8.3

See release notes here.

Anonymisation script generator

Currently, we have several manually-created anonymisation scripts in this repo here (based on XNAT official guidance here). Since different projects may require different combinations of tags to be anonymised, and potentially specific anonymisation requirements per tag, it would be helpful to have a script that can produce this based on an input file e.g., required deanonymised tags.

Add SOP for ROI uploader

https://bitbucket.org/icrimaginginformatics/roiuploadassistant/src/master/

Add a script for series description extraction

This will help us do data quality checks en masse

Links to anonymisation scripts broken in README

If you try to open the anonymisation scripts from the links in the README, you get a 404.

https://github.com/GSTT-CSC/XNAT#de-identification-via-qr

Think they should be relinked to here:

https://github.com/GSTT-CSC/XNAT/tree/main/xnat-csc/helpers

Add SOP for XNAT access requests

We should add an SOP for XNAT access requests to the repo and ideally, the CSC website as well. This could also be included in/based on the QMS repo and as a template on this repo.

I think this should include details required in the initial access request, any required training and information governance, and the applicable role-based access control (RBAC).

For example:

Which project(s) does the request cover?
What are the contact details of the user?
How long is access required?
What type of data is required?
What is the volume of data required?
What is the expected access frequency/schedule?

Add project invoice calculator

An interactive calculator should be added to a widely-accessible space, e.g. the CSC website, to enable XNAT data requestors to input details about their request and receive a rough estimate of the timelines and costs applicable.

Examples:

Decide on branch versioning strategy

We should create a develop branch and release/ branches to keep track of what processes, scripts, etc. were used when.

Add scripts to query PIMS for demographic details

We should add scripts to query PIMS for demographic details (mainly NHS numbers for National Data Opt Out compliance).

We need an SOP with instructions on how to use each script currently in the xnat-csc scripts folder

There are quite a few scripts now, so clear instructions should be added for each. Some are already covered in other SOPs and can be referenced from the script SOP

Need a trackable way for users to report issues

We need to write an SOP for incident reporting, set up a channel for incident reporting, and set up a way to audit incidents, particularly if unanonymised data is found in data we sent elsewhere.

Reverse proxy to host multiple apps on headnode

Since we host multiple applications on the headnode it would be easier to access non-XNAT applications if we could direct HTTP traffic on the headnode to multiple apps. We can do this by placing a reverse proxy such as nginx on the default HTTP ports 80/443 and redirecting based on the given URL.

We have requested a number of different URLs to be added to the GSTT DNS to facilitate accessing certain applications, such as the radiation safety URLs. We have also requested a generic csc.gstt.nhs.uk DNS entry so we can redirect to different applications using subdomains. This means we won't need to update the GSTT DNS every time we add or remove an application.

The diagram below shows this schematically:

We will still have the sp-pr-flipml01 URL which can redirect through nginx to XNAT, so no process changes should be needed.

@dangerdika, @hshuaib90, @heyhaleema, can you see any issues with this approach? Ideally we would set up a test proxy first before modifying XNAT. I think we could do this on the dgx which has equivalent networking (but no DNS entry currently) or we could do it on a different head node port.
