
tofhir's Issues

Problem with reading archived logs

When the log file grows, old logs are archived as a zip file. When logs belonging to the same execution are split between the archived logs and the last log file read, the log-server gives an error.

reload command sometimes does not work

The reload command of tofhir-engine works intermittently. I simply update a mapping and run the reload command; when I then try to run the mapping, the old version is executed.

Execution Error Handling

  • If any error occurs during the execution process, the job should stop or continue according to the error handling option (a minimal sketch follows this list).
  • Implement it for non-stream data sources (file and SQL).
  • Implement it for stream data sources (Kafka).
  • Errors that are not row-based (malformed URL, wrong file extension, etc.) should be handled appropriately.
  • When a CSV file is not found, an appropriate error should be logged.
  • When the sink URL is not reachable, an appropriate error should be logged.
  • When the CSV columns and the schema columns do not match, an appropriate error should be logged.
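
A minimal sketch of how such an error handling option could drive the execution flow (the enum and method names are assumptions, not the actual tofhir-engine API):

   object ErrorHandlingType extends Enumeration {
     val Continue, Halt = Value
   }

   // Decide what to do with a row-level error according to the configured policy
   def handleRowError(policy: ErrorHandlingType.Value, error: Throwable, logError: Throwable => Unit): Unit =
     policy match {
       case ErrorHandlingType.Halt     => throw error     // stop the whole job
       case ErrorHandlingType.Continue => logError(error) // record the error and keep processing
     }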

Bugs: FhirDefinitionService when used with coverChild mappings

Resource: https://hl7.org/fhir/R4/QuestionnaireResponse.html
Endpoint:
http://localhost:8085/tofhir/fhir-definitions?q=elements&profile=http://hl7.org/fhir/StructureDefinition/QuestionnaireResponse

onFHIR: Default configurations without common data model
toFHIR Mapping Repo: https://gitlab.srdc.com.tr/medic/coverchild-integrations

  • According to the FHIR website, there should be a value[x] element under the item.answer object, but it does not exist in the response from the toFHIR API.

[screenshot]

[screenshot]

  • Also, according to the FHIR docs, there should be an object with the path item.answer.item, but it does not exist.
  • The paths of elements under item.item are wrong. For example, the marked path should be item.item.linkId instead of item.linkId.

[screenshot]

During the initialization of tofhir-server, some structure definitions do not pass validation

If tofhir-server has definitions-root-urls = ["http://hl7.org/fhir/"] in the application.conf file, we get the following exception:

Exception in thread "main" io.onfhir.exception.InitializationException: Some of the given infrastructure resources (http://hl7.org/fhir/StructureDefinition/clinicaldocument,http://hl7.org/fhir/StructureDefinition/Composition,http://hl7.org/fhir/StructureDefinition/catalog) of type StructureDefinition does not conform to base FHIR specification! http://hl7.org/fhir/StructureDefinition/clinicaldocument :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[19].binding.valueSet)))))),JObject(List((severity,JString(warning)), (code,JString(invalid)), (diagnostics,JString(Constraint 'sdf-0' is not satisfied for the given value! Constraint Description: 'Name should be usable as an identifier for the module by machine processing applications such as code generation'. FHIR Path expression: 'name.matches('[A-Z]([A-Za-z0-9_]){0,254}')')), (expression,JArray(List(JString($this))))))
http://hl7.org/fhir/StructureDefinition/Composition :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[18].binding.valueSet)))))),JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(differential.element[10].binding.valueSet))))))
http://hl7.org/fhir/StructureDefinition/catalog :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[19].binding.valueSet)))))),JObject(List((severity,JString(warning)), (code,JString(invalid)), (diagnostics,JString(Constraint 'sdf-0' is not satisfied for the given value! Constraint Description: 'Name should be usable as an identifier for the module by machine processing applications such as code generation'. FHIR Path expression: 'name.matches('[A-Z]([A-Za-z0-9_]){0,254}')')), (expression,JArray(List(JString($this))))))
	at io.onfhir.config.BaseFhirConfigurator.validateGivenInfrastructureResources(BaseFhirConfigurator.scala:190)
	at io.onfhir.config.BaseFhirConfigurator.initializePlatform(BaseFhirConfigurator.scala:85)
	at io.tofhir.server.service.FhirDefinitionsService.<init>(FhirDefinitionsService.scala:55)
	at io.tofhir.server.endpoint.FhirDefinitionsEndpoint.<init>(FhirDefinitionsEndpoint.scala:16)
	at io.tofhir.server.endpoint.ToFhirServerEndpoint.<init>(ToFhirServerEndpoint.scala:36)
	at io.tofhir.server.ToFhirServer$.start(ToFhirServer.scala:15)
	at io.tofhir.server.Boot$.delayedEndpoint$io$tofhir$server$Boot$1(Boot.scala:4)
	at io.tofhir.server.Boot$delayedInit$body.apply(Boot.scala:3)

Refactor custom exceptions to introduce a hierarchy

Currently, we have the following exceptions:

  • EngineInitializationException extends Exception
  • FhirMappingException extends Exception
  • FhirMappingInvalidResourceException extends Exception

We will implement an umbrella exception class for the whole system, and the others will extend it for specific purposes, for example: FhirMappingException (exceptions during mapping execution), FhirWriteException (exceptions while communicating with the FHIR server), FhirSourceReadException (exceptions while reading source data), etc.
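
A minimal sketch of what the hierarchy could look like (the umbrella class name ToFhirException is an assumption):

   // Umbrella exception for the whole system
   abstract class ToFhirException(message: String, cause: Throwable = null)
     extends Exception(message, cause)

   // Exceptions during mapping execution
   class FhirMappingException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)

   // Exceptions while communicating with the FHIR server
   class FhirWriteException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)

   // Exceptions while reading source data
   class FhirSourceReadException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)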

SimpleStructureDefinitionService: Bug with sliced element

Profile: https://aiccelerate.eu/fhir/StructureDefinition/AIC-Practitioner
Search for the qualification element in the JSON response returned by SimpleStructureDefinitionService. It is sliced and has two elements (slices) as children: mainQualification and No Slice.
There is a problem with the No Slice section. No Slice should have 4 elements as its children, but it has 1 instead. It seems the service creates an extra element between No Slice and its children. See the image below:

[screenshot]

Converting time-series data to a FHIR resource

In the case of time-series source data, multiple rows should be mapped to a single FHIR resource.
As in the example below, each number in the data field may represent a different row in the source.
Note: Is this really a requirement?

  ...
   "valueSampledData": {
      "origin": {
         "value": "0.0",
         "unit": "mg/dl",
         "system": "http://unitsofmeasure.org",
         "code": "mg/dl"
      },
      "period": "512.0",
      "dimensions": "1",
      "data": "99 103 108 114 121 128 132 137 142 148 157 192 197 201 205 208 206 198 207 171 157 143 128 115 106 103 107 103 110 122 138 154 165 170 176 184 188 188 194 198 208 211 215 212 213 216 220 225 228 231 238 239 240 244 249 252 255 256 257 257 257 254 255 258 259 260 254 244 230 214 198 185 177 180 173 174 174 176 177 176 176 174 172 170 167 165 164 162 162 161 162 159 156 153 152 148 141 143 144 147 148 146 144 144 142 142 142 141 139 137 132 130 130 125 121 105 102 100 97 95 92 90 84 84 84 82 79 77 76 74 74 75 73 73 76 77 78 78 79 79 80"
   },
  ...

Implement a mechanism to patch existing resources based on mapped data via FHIR Patch interaction

Enable users to map certain information to FHIR Patch (http://hl7.org/fhir/fhirpatch.html) or JSON Patch (https://tools.ietf.org/html/rfc6902) content, which can then be used to patch a specific existing record with the supplied values by executing a FHIR Patch interaction.

e.g., add a Condition reference to an EpisodeOfCare as the main diagnosis via FHIR Patch:

   {
      "expression": {
         "name": "result",
         "language": "application/fhir-template+json",
         "value": [
            {
               "op": "add",
               "path": "/diagnosis/-",
               "value": {
                  "condition": {
                     "reference": "Condition/{{conditionId}}"
                  }
               }
            }
         ]
      },
      "interaction": "json-patch",
      "rid": "{{episodeId}}"
   }

Handle job deletion if it is running

If a user wants to delete a running job, we should either

  • prevent the deletion by showing a warning message,
  • or silently shut down the execution of that job in the backend.

Reload mappings

It would be nice if the mappings referred to by a mapping job were re-fetched from the file system when the mapping job is loaded again.

Execution Manager

This issue describes how each execution of a mapping job will be managed.

  • Jobs should be run asynchronously. If the job to be executed is valid, the execution should start in the background and the job submission result should be returned to the client immediately. If the submitted job is not valid, an appropriate error message should be returned.
  • An ExecutionManager component should keep track of the active (i.e., running) jobs. This component should provide an API to stop running executions (a minimal sketch follows this list).
  • It should also be possible to start and stop individual mapping tasks. This is required when a mapping is updated and needs to be restarted.
  • There should be only one execution of a mapping task at a time.
  • For file system streaming, processed files should be archived; mappings with errors should be aggregated in a separate file. This should be configurable.
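
A minimal sketch of such a component (all names are illustrative; the real component would also need to handle batch jobs and scheduling):

   import scala.collection.concurrent.TrieMap
   import org.apache.spark.sql.streaming.StreamingQuery

   class ExecutionManager {
     // (jobId, mappingUrl) -> the running Spark streaming query
     private val runningQueries = TrieMap.empty[(String, String), StreamingQuery]

     // Reject the registration if the same mapping task is already running
     def register(jobId: String, mappingUrl: String, query: StreamingQuery): Boolean =
       runningQueries.putIfAbsent((jobId, mappingUrl), query).isEmpty

     // Stop a single mapping task, e.g. when its mapping is updated
     def stop(jobId: String, mappingUrl: String): Unit =
       runningQueries.remove((jobId, mappingUrl)).foreach(_.stop())

     // Stop every mapping task of a job
     def stopJob(jobId: String): Unit =
       runningQueries.keys.filter(_._1 == jobId).foreach { case (j, m) => stop(j, m) }
   }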

Scenarios:

  1. System restart / crash
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. The earliest and latest configs will apply as expected.
      • false -> Records will be read as of the last offset.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already processed files into the data source directory monitored by Spark again.
      • false -> Processing will continue from the last read point.
  2. Mapping update (only the updated mapping task will be restarted)
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. The earliest and latest configs will apply as expected. (Note that if the config is set to latest, old records won't be affected by the mapping updates.)
      • false -> Records will be read as of the last offset. Old records won't be affected by the mapping updates.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already processed files into the data source directory monitored by Spark again.
      • false -> Processing will continue from the last read point. Already processed records won't be affected by the mapping updates.

Technical specs:

  • Checkpoints should be per job (not per execution, so that the execution of a job can continue after a crash) and per mapping task included in the job (a minimal sketch follows this list).
  • There should be a configuration to clear Spark's checkpoint directory for a job as a whole and for individual mappings.
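
A minimal sketch of such a layout (the directory structure is an assumption):

   import java.nio.file.Paths

   // One checkpoint directory per job and per mapping task within the job, so that
   // either the whole job's or a single mapping's checkpoints can be cleared.
   def checkpointDir(rootDir: String, jobId: String, mappingTaskId: String): String =
     Paths.get(rootDir, "checkpoints", jobId, mappingTaskId).toString

   // e.g. when starting the streaming query:
   //   df.writeStream.option("checkpointLocation", checkpointDir(root, jobId, taskId))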

Sub-issues:

  • The interactive CLI should be able to run, stop and list streaming queries.
  • Running status information should be included in the execution summary.
  • Simultaneous execution of the same mapping should be prevented.
  • Checkpoint resetting.
  • Long-running batches should be tracked.
  • Services returning logs do not have specific models. A new model needs to be built to cover all types of logs.
  • Currently, a startTime (the time streaming started) and an endTime (the time streaming ended) are calculated in the frontend for streaming mapping tasks. We need to calculate them and create a well-formatted response in the backend instead.
  • If there is no log for batch mappings, we cannot show the mapping URL on the execution detail page. Adding a "started" log may be the solution for this.
  • If there is no log for streaming mappings, we do not show the mapping URL. Again, adding a "mapping started" log for that mapping task could be a solution.
  • While running a file streaming mapping task, if a data source (CSV) causes an invalid FHIR resource error while writing to the onFHIR server, that execution stops reading new data source files, which means the streaming execution stops.

Available Bugs

  • When running a file streaming job, if a new data source file is put into the configured streaming folder, the mapping task count is increased for that execution even though the same mapping task is used.

toFHIR consumes too much resources

While running a streaming mapping job, even if it does not process anything (e.g., no files), it consumes too much CPU and memory. Is there a Spark configuration to resolve this problem?

Make sure that the FhirMappingJobResult log is printed for all combinations of error handling settings

We can configure the error handling settings of a mapping job using the mappingErrorHandling and sinkSettings.errorHandling options.
This gives us the following combinations:

  • continue | continue
  • continue | halt
  • halt | continue
  • halt | halt

Further, we should test it with both batch and streaming jobs. In total, it should work for all eight use cases.

Currently, it does not work if mappingErrorHandling is set to Halt.

Finally, we should add some tests for this functionality.

Bug with sliced extensions

Resource: https://hl7.org/fhir/R4/Patient.html
Endpoint:
http://localhost:8085/tofhir/fhir-definitions?q=elements&profile=https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert

onFHIR: REDCap common data model
toFHIR Mapping Repo: https://gitlab.srdc.com.tr/medic/redcap-integration-mapping
Mapping: Patient mapping in erker-mapping
Path: address.extension

There is missing data on sliced extensions.

Actual response:
[screenshot]

Expected response:
[screenshot]

The difference is in the isArray and dataTypes fields.

Wildcard for common unit conversions

We can use entries of the form (* -> <source_unit>), (<target_unit> -> <conversion_function>) to define unit conversions regardless of the source code.
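
A minimal sketch of how such wildcard entries could be represented (the map shape is an assumption; 0.0555 is the glucose mg/dL to mmol/L factor, used here purely as an example):

   // (source code, source unit) -> (target unit, conversion function); "*" matches any code
   val unitConversions: Map[(String, String), (String, Double => Double)] = Map(
     ("*", "mg/dL") -> ("mmol/L", (value: Double) => value * 0.0555)
   )

   def convert(code: String, unit: String, value: Double): Option[(String, Double)] =
     unitConversions.get((code, unit))
       .orElse(unitConversions.get(("*", unit))) // fall back to the wildcard entry
       .map { case (targetUnit, f) => (targetUnit, f(value)) }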

Handling duplicate rows in the source data

Sometimes there is no unique id in the data source. In this case, we can generate an id by combining and hashing the data in the columns of that row.

In some scenarios, even if we combine the data in the columns, there may still be duplicate rows in the data source. For those situations, we can add a condition to check for duplicate ids within each batch (if the duplicates are not in the same batch, the resource with the same id is simply updated, because we are using PUT) and eliminate the duplicates.
We can also log the duplicate rows for better identification.
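
A minimal sketch of both steps with Spark (the column names are illustrative):

   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.functions.{col, concat_ws, sha2}

   // Generate an id by combining and hashing the values of the given columns
   def withGeneratedId(df: DataFrame, idColumns: Seq[String]): DataFrame =
     df.withColumn("generatedId", sha2(concat_ws("|", idColumns.map(col): _*), 256))

   // Eliminate duplicate ids within the batch; duplicates across batches are harmless
   // because the same id results in an update via PUT
   def eliminateDuplicates(df: DataFrame): DataFrame =
     df.dropDuplicates("generatedId")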

Read HL7 messages

This is a question/suggestion rather than an issue.
Does toFHIR read HL7 messages?
If so, how are the fields and sub-fields identified in the schema?
If not, is it planned for the future?

Multiple data source problem in erroneous records

  • After executing a mapping with multiple data sources, only the erroneous records from the main data source are saved; records from the secondary schemas are not saved.
  • Also, reconsider using Spark's takeSample in the writeErroneousDataset method. Currently, we infer the schema by taking a sample of the erroneous records, assuming the schema of all records is the same.

Get schema by URL

Since our mapping models have only a URL field for the schema, a getSchemaByUrl service is needed.
We should also make sure that the same URL is not created more than once.
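
A minimal sketch of the service (the model and repository interface are simplified assumptions):

   import scala.concurrent.{ExecutionContext, Future}

   case class SchemaDefinition(url: String, name: String) // simplified

   trait SchemaRepository {
     def getAllSchemas: Future[Seq[SchemaDefinition]]
   }

   // Look up a schema by its canonical URL; relies on URL uniqueness being enforced on creation
   def getSchemaByUrl(repo: SchemaRepository, url: String)(implicit ec: ExecutionContext): Future[Option[SchemaDefinition]] =
     repo.getAllSchemas.map(_.find(_.url == url))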

Scheduling job logs cause error

The logs of scheduled jobs do not have all the fields that other job logs have. This causes an error while reading the execution logs, because the filtering in the log-server needs fields like jobId, projectId, etc.

FHIR R5 warnings

After switching to the FHIR R5 release, the app started to display some warnings related to tokens/keywords. Similar warnings are also present in the onfhir app.

[screenshot]

✨Initialize engine with existing mapping/schema repositories and external function libraries

  • The engine should be initialized directly with the data structures maintained by the repositories so that it is always in sync with them. For instance, ExecutionService initializes a ToFhirEngine and a FhirMappingJobManager only once, by retrieving and caching the resources (mappings, schemas, etc.) from a pre-configured location. This means it is not in sync with the server repositories that make updates on those resources.
  • The engine should also be able to load external function libraries without adding an explicit dependency. For example, in DT4H, we develop case-specific mapping functions, which are not appropriate to include in the main code base but should be provided as an external library.

Extend mapping execution with the option to clear Spark's checkpoint directory

We can have a single configuration parameter, such as clearCheckpointDirectory, which could be used to implement job-level and mapping-level configurations to clear the corresponding folders. The implementation could be as follows:

  • If no valid value is provided for the parameter, the executions will continue from where they left off last time.
  • The configuration could take comma-separated values.
    • If the configuration has only one value, which is "job", the checkpoint directories for all of the mappings in the job will be deleted.
    • If the configuration has other values, each value will be treated as a mapping URL, and the checkpoint directories of the corresponding mappings will be deleted.
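
For illustration (the mapping URLs are hypothetical), the parameter could be supplied as follows:

   # clear the checkpoint directories of all mappings in the job
   clearCheckpointDirectory = "job"

   # clear the checkpoint directories of the listed mappings only
   clearCheckpointDirectory = "https://example.com/mappings/patient-mapping,https://example.com/mappings/observation-mapping"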

Relates to #84

Internal Server Error while fetching simple structure definition of a profile

Implement proxy for resource validation against onfhir

Currently, the frontend calls onfhir directly using the FHIR repository URL in the job definition, but this causes problems in the production environment. For example, the validation of a resource on the mapping testing page cannot call onfhir. To address this, we need to add proxy logic to tofhir-server that forwards the request to onfhir and returns the response to the frontend as-is (a minimal sketch follows).
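
A minimal sketch of such a proxy route (assuming Akka HTTP; the route path and parameter names are illustrative):

   import akka.actor.ActorSystem
   import akka.http.scaladsl.Http
   import akka.http.scaladsl.model.{HttpMethods, HttpRequest}
   import akka.http.scaladsl.server.Directives._
   import akka.http.scaladsl.server.Route

   // Forwards a validation request to onFHIR and returns the response verbatim
   def validationProxyRoute(onFhirBaseUrl: String)(implicit system: ActorSystem): Route =
     path("validate" / Segment) { resourceType =>
       post {
         extractRequestEntity { entity =>
           val proxied = HttpRequest(
             method = HttpMethods.POST,
             uri = s"$onFhirBaseUrl/$resourceType/$$validate", // FHIR $validate operation
             entity = entity
           )
           complete(Http().singleRequest(proxied))
         }
       }
     }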

Add more informative and fine-grained logs during file loading.

The following has been reported while using toFHIR to load considerably large files:

"While loading files into the labresults_csv folder, we've noticed a possible improvement. It might be beneficial to include an initial log entry, such as: "#timestamp #log_level ... #file_name file has been successfully loaded for ingestion." This could help us ascertain the success of the file loading process and provide assurance during the waiting period.

Recently, we encountered a situation where we were ingesting a considerably large file. The logging process took over 30 minutes to initiate, which led to a moment of uncertainty regarding the status of the data load."

Improve README by adding all possible configuration options.

  • application.conf is improved with several parameters.
  • Mapping Jobs can be configured with several parameters.

In the tests, we need to include test jobs that exemplify all available config parameters.

In application.conf, we need to ensure that all possible config parameters are written into the file.

README needs to be updated to describe the use of all configuration options.

About transactional operations on file system

In some cases, there may be a need to update/delete two different files in the same transaction. For example, when updating a terminology system file, the job files using that terminology service should be updated as well.
Since we use the file system as a repository, we have to handle transactional operations and implement some kind of rollback mechanism ourselves.
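
A minimal sketch of a backup-and-restore rollback for multi-file updates (a simplification; a real implementation would also need locking and crash recovery):

   import java.nio.file.{Files, Path}

   def updateFilesTransactionally(updates: Map[Path, Array[Byte]]): Unit = {
     // Back up the current contents so they can be restored if any write fails
     val backups = updates.keys.map(p => p -> Files.readAllBytes(p)).toMap
     try updates.foreach { case (path, content) => Files.write(path, content) }
     catch {
       case e: Exception =>
         backups.foreach { case (path, content) => Files.write(path, content) } // rollback
         throw e
     }
   }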

Removal or alias change of a schema from a mapping

When multiple schemas are added to a mapping and their source contexts are defined, removing a schema from the mapping does not automatically remove the associated source context in the job. This leads to conflicts between the remaining source contexts and the schemas in the mapping. To resolve this, the related source context should be deleted from all jobs that include the mapping whenever a schema is removed from that mapping.
A similar issue is encountered when a schema alias is changed.

Possible scenarios:

  • A schema alias is renamed in schema management -> After updating the mapping, update all FhirMappingTasks inside the jobs using that mapping so that they match the updated alias (see the sketch after this list).
  • A schema is removed in schema management -> Find all jobs using that mapping and remove the corresponding FhirMappingSourceContext from their FhirMappingTasks.
  • A new schema is added in schema management -> Add a FhirMappingSourceContext with the new schema alias to each FhirMappingTask in all jobs using that mapping.
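
A minimal sketch of the alias rename case (the models are simplified stand-ins for the real ones):

   case class FhirMappingSourceContext(sourceUri: String /* , other source settings ... */)
   case class FhirMappingTask(mappingRef: String, sourceContexts: Map[String, FhirMappingSourceContext])
   case class FhirMappingJob(id: String, mappings: Seq[FhirMappingTask])

   // Rename an alias in every FhirMappingTask that references the updated mapping
   def renameAlias(jobs: Seq[FhirMappingJob], mappingUrl: String, oldAlias: String, newAlias: String): Seq[FhirMappingJob] =
     jobs.map { job =>
       job.copy(mappings = job.mappings.map { task =>
         if (task.mappingRef == mappingUrl)
           task.copy(sourceContexts = task.sourceContexts.map {
             case (`oldAlias`, ctx) => newAlias -> ctx
             case other             => other
           })
         else task
       })
     }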
