
tofhir's Issues

Problem with reading archived logs

When the log file grows, old logs are archived as a zip file. When logs belonging to the same execution are split between the archived logs and the last log file read, the log-server gives an error.

reload command sometimes does not work

The reload command of tofhir-engine works intermittently. I simply update a mapping and run the reload command; when I then try to run the mapping, the old version is executed.

Execution Error Handling

  • If any error occurs during the execution process, the job should stop or continue according to the error handling option (a minimal sketch follows this list).
  • Implement it for non-stream data sources (file and SQL).
  • Implement it for stream data sources (Kafka).
  • Errors that are not row-based (malformed URL, wrong file extension, etc.) should be handled appropriately.
  • When a CSV file is not found, an appropriate error should be logged.
  • When the sink URL is not reachable, an appropriate error should be logged.
  • When the CSV columns and the schema columns do not match, an appropriate error should be logged.
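
A minimal sketch of how such an error handling option could drive the execution flow (the enum and method names are assumptions, not the actual tofhir-engine API):

   object ErrorHandlingType extends Enumeration {
     val Continue, Halt = Value
   }

   // Decide what to do with a row-level error according to the configured policy
   def handleRowError(policy: ErrorHandlingType.Value, error: Throwable, logError: Throwable => Unit): Unit =
     policy match {
       case ErrorHandlingType.Halt     => throw error     // stop the whole job
       case ErrorHandlingType.Continue => logError(error) // record the error and keep processing
     }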

Bugs: FhirDefinitionService when used with coverChild mappings

Resource: https://hl7.org/fhir/R4/QuestionnaireResponse.html
Endpoint:
http://localhost:8085/tofhir/fhir-definitions?q=elements&profile=http://hl7.org/fhir/StructureDefinition/QuestionnaireResponse

onFHIR: Default configurations without common data model
toFHIR Mapping Repo: https://gitlab.srdc.com.tr/medic/coverchild-integrations

  • According to the FHIR website, there should be a value[x] element under the item.answer object, but it does not exist in the response from the toFHIR API.

[screenshot]

[screenshot]

  • Also, according to the FHIR docs, there should be an object with the path item.answer.item, but it does not exist.
  • The paths of elements under item.item are wrong. For example, the marked path should be item.item.linkId instead of item.linkId.

[screenshot]

During the initialization of tofhir-server, some structure definitions do not pass validation

If tofhir-server has definitions-root-urls = ["http://hl7.org/fhir/"] in the application.conf file, we get the following exception:

Exception in thread "main" io.onfhir.exception.InitializationException: Some of the given infrastructure resources (http://hl7.org/fhir/StructureDefinition/clinicaldocument,http://hl7.org/fhir/StructureDefinition/Composition,http://hl7.org/fhir/StructureDefinition/catalog) of type StructureDefinition does not conform to base FHIR specification! http://hl7.org/fhir/StructureDefinition/clinicaldocument :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[19].binding.valueSet)))))),JObject(List((severity,JString(warning)), (code,JString(invalid)), (diagnostics,JString(Constraint 'sdf-0' is not satisfied for the given value! Constraint Description: 'Name should be usable as an identifier for the module by machine processing applications such as code generation'. FHIR Path expression: 'name.matches('[A-Z]([A-Za-z0-9_]){0,254}')')), (expression,JArray(List(JString($this))))))
http://hl7.org/fhir/StructureDefinition/Composition :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[18].binding.valueSet)))))),JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(differential.element[10].binding.valueSet))))))
http://hl7.org/fhir/StructureDefinition/catalog :: JObject(List((severity,JString(error)), (code,JString(invalid)), (diagnostics,JString(Invalid value 'http://terminology.hl7.org/ValueSet/v3-ConfidentialityClassification|2014-03-26' for FHIR primitive type 'canonical'!)), (expression,JArray(List(JString(snapshot.element[19].binding.valueSet)))))),JObject(List((severity,JString(warning)), (code,JString(invalid)), (diagnostics,JString(Constraint 'sdf-0' is not satisfied for the given value! Constraint Description: 'Name should be usable as an identifier for the module by machine processing applications such as code generation'. FHIR Path expression: 'name.matches('[A-Z]([A-Za-z0-9_]){0,254}')')), (expression,JArray(List(JString($this))))))
	at io.onfhir.config.BaseFhirConfigurator.validateGivenInfrastructureResources(BaseFhirConfigurator.scala:190)
	at io.onfhir.config.BaseFhirConfigurator.initializePlatform(BaseFhirConfigurator.scala:85)
	at io.tofhir.server.service.FhirDefinitionsService.<init>(FhirDefinitionsService.scala:55)
	at io.tofhir.server.endpoint.FhirDefinitionsEndpoint.<init>(FhirDefinitionsEndpoint.scala:16)
	at io.tofhir.server.endpoint.ToFhirServerEndpoint.<init>(ToFhirServerEndpoint.scala:36)
	at io.tofhir.server.ToFhirServer$.start(ToFhirServer.scala:15)
	at io.tofhir.server.Boot$.delayedEndpoint$io$tofhir$server$Boot$1(Boot.scala:4)
	at io.tofhir.server.Boot$delayedInit$body.apply(Boot.scala:3)

Refactor custom exceptions to introduce a hierarchy

Currently, we have the following exceptions:

  • EngineInitializationException extends Exception
  • FhirMappingException extends Exception
  • FhirMappingInvalidResourceException extends Exception

We will implement an umbrella exception class for the whole system, and the others will extend it for specific purposes, for example: FhirMappingException (exceptions during mapping execution), FhirWriteException (exceptions while communicating with the FHIR server), FhirSourceReadException (exceptions while reading source data), etc.
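
A minimal sketch of what the hierarchy could look like (the umbrella class name ToFhirException is an assumption):

   // Umbrella exception for the whole system
   abstract class ToFhirException(message: String, cause: Throwable = null)
     extends Exception(message, cause)

   // Exceptions during mapping execution
   class FhirMappingException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)

   // Exceptions while communicating with the FHIR server
   class FhirWriteException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)

   // Exceptions while reading source data
   class FhirSourceReadException(message: String, cause: Throwable = null)
     extends ToFhirException(message, cause)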

SimpleStructureDefinitionService: Bug with sliced element

Profile: https://aiccelerate.eu/fhir/StructureDefinition/AIC-Practitioner
Search for the qualification element in the JSON response returned by SimpleStructureDefinitionService. It is sliced and has two elements (slices) as children: mainQualification and No Slice.
There is a problem with the No Slice section. No Slice should have 4 elements as its children, but it has 1 instead. It seems the service creates an extra element between No Slice and its children. See the image below:

[screenshot]

Converting time-series data to a FHIR resource

In the case of time-series source data, multiple rows should be mapped to a single FHIR resource.
As in the example below, each number in the data field may represent a different row in the source.
Note: Is this really a requirement?

  ...
   "valueSampledData": {
      "origin": {
         "value": "0.0",
         "unit": "mg/dl",
         "system": "http://unitsofmeasure.org",
         "code": "mg/dl"
      },
      "period": "512.0",
      "dimensions": "1",
      "data": "99 103 108 114 121 128 132 137 142 148 157 192 197 201 205 208 206 198 207 171 157 143 128 115 106 103 107 103 110 122 138 154 165 170 176 184 188 188 194 198 208 211 215 212 213 216 220 225 228 231 238 239 240 244 249 252 255 256 257 257 257 254 255 258 259 260 254 244 230 214 198 185 177 180 173 174 174 176 177 176 176 174 172 170 167 165 164 162 162 161 162 159 156 153 152 148 141 143 144 147 148 146 144 144 142 142 142 141 139 137 132 130 130 125 121 105 102 100 97 95 92 90 84 84 84 82 79 77 76 74 74 75 73 73 76 77 78 78 79 79 80"
   },
  ...

Implement a mechanism to patch existing resources based on mapped data via FHIR Patch interaction

Enable users to map certain information to FHIR Patch (http://hl7.org/fhir/fhirpatch.html) or JSON Patch (https://tools.ietf.org/html/rfc6902) content, which can then be used to patch a specific existing record with the supplied values by executing a FHIR Patch interaction.

e.g., add a Condition reference to an EpisodeOfCare as the main diagnosis via FHIR Patch:

   {
      "expression": {
         "name": "result",
         "language": "application/fhir-template+json",
         "value": [
            {
               "op": "add",
               "path": "/diagnosis/-",
               "value": {
                  "condition": {
                     "reference": "Condition/{{conditionId}}"
                  }
               }
            }
         ]
      },
      "interaction": "json-patch",
      "rid": "{{episodeId}}"
   }

Handle job deletion if it is running

If a user wants to delete a running job, we should either

  • prevent the deletion by showing a warning message,
  • or silently shut down the execution of that job in the backend.

Reload mappings

It would be nice if the mappings referred to by a mapping job were re-fetched from the file system when the mapping job is loaded again.

Execution Manager

This issue describes how each execution of a mapping job will be managed.

  • Jobs should be run asynchronously. If the job to be executed is valid, the execution should start in the background and the job submission result should be returned to the client immediately. If the submitted job is not valid, an appropriate error message should be returned.
  • An ExecutionManager component should keep track of the active (i.e., running) jobs. This component should provide an API to stop running executions (a minimal sketch follows this list).
  • It should also be possible to start and stop individual mapping tasks. This is required when a mapping is updated and needs to be restarted.
  • There should be only one execution of a mapping task at a time.
  • For file system streaming, processed files should be archived; mappings with errors should be aggregated in a separate file. This should be configurable.
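
A minimal sketch of such a component (all names are illustrative; the real component would also need to handle batch jobs and scheduling):

   import scala.collection.concurrent.TrieMap
   import org.apache.spark.sql.streaming.StreamingQuery

   class ExecutionManager {
     // (jobId, mappingUrl) -> the running Spark streaming query
     private val runningQueries = TrieMap.empty[(String, String), StreamingQuery]

     // Reject the registration if the same mapping task is already running
     def register(jobId: String, mappingUrl: String, query: StreamingQuery): Boolean =
       runningQueries.putIfAbsent((jobId, mappingUrl), query).isEmpty

     // Stop a single mapping task, e.g. when its mapping is updated
     def stop(jobId: String, mappingUrl: String): Unit =
       runningQueries.remove((jobId, mappingUrl)).foreach(_.stop())

     // Stop every mapping task of a job
     def stopJob(jobId: String): Unit =
       runningQueries.keys.filter(_._1 == jobId).foreach { case (j, m) => stop(j, m) }
   }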

Scenarios:

  1. System restart / crash
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. The earliest and latest configs will apply as expected.
      • false -> Records will be read as of the last offset.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already processed files into the data source directory monitored by Spark again.
      • false -> Processing will continue from the last read point.
  2. Mapping update (only the updated mapping task will be restarted)
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. The earliest and latest configs will apply as expected. (Note that if the config is set to latest, old records won't be affected by the mapping updates.)
      • false -> Records will be read as of the last offset. Old records won't be affected by the mapping updates.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already processed files into the data source directory monitored by Spark again.
      • false -> Processing will continue from the last read point. Already processed records won't be affected by the mapping updates.

Technical specs:

  • Checkpoints should be per job (not per execution, so that the execution of a job can continue after a crash) and per mapping task included in the job (a minimal sketch follows this list).
  • There should be a configuration to clear Spark's checkpoint directory for a job as a whole and for individual mappings.
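
A minimal sketch of such a layout (the directory structure is an assumption):

   import java.nio.file.Paths

   // One checkpoint directory per job and per mapping task within the job, so that
   // either the whole job's or a single mapping's checkpoints can be cleared.
   def checkpointDir(rootDir: String, jobId: String, mappingTaskId: String): String =
     Paths.get(rootDir, "checkpoints", jobId, mappingTaskId).toString

   // e.g. when starting the streaming query:
   //   df.writeStream.option("checkpointLocation", checkpointDir(root, jobId, taskId))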

Sub-issues:

  • The interactive CLI should be able to run, stop and list streaming queries.
  • Running status information should be included in the execution summary.
  • Simultaneous execution of the same mapping should be prevented.
  • Checkpoint resetting.
  • Long-running batches should be tracked.
  • Services returning logs do not have specific models. A new model needs to be built to cover all types of logs.
  • Currently, a startTime (the time streaming started) and an endTime (the time streaming ended) are calculated in the frontend for streaming mapping tasks. We need to calculate them and create a well-formatted response in the backend instead.
  • If there is no log for batch mappings, we cannot show the mapping URL on the execution detail page. Adding a "started" log may be the solution for this.
  • If there is no log for streaming mappings, we do not show the mapping URL. Again, adding a "mapping started" log for that mapping task could be a solution.
  • While running a file streaming mapping task, if a data source (CSV) causes an invalid FHIR resource error while writing to the onFHIR server, that execution stops reading new data source files, which means the streaming execution stops.

Available Bugs

  • When running a file streaming job, if a new data source file is put into the configured streaming folder, the mapping task count is increased for that execution even though the same mapping task is used.

toFHIR consumes too much resources

While running a streaming mapping job, even if it does not process anything (e.g., no files), it consumes too much CPU and memory. Is there a Spark configuration to resolve this problem?

Make sure that the FhirMappingJobResult log is printed for all combinations of error handling settings

We can configure the error handling settings of a mapping job using the mappingErrorHandling and sinkSettings.errorHandling options.
This gives us the following combinations:

  • continue | continue
  • continue | halt
  • halt | continue
  • halt | halt

Further, we should test it with both batch and streaming jobs. In total, it should work for all eight use cases.

Currently, it does not work if mappingErrorHandling is set to Halt.

Finally, we should add some tests for this functionality.

Bug with sliced extensions

Resource: https://hl7.org/fhir/R4/Patient.html
Endpoint:
http://localhost:8085/tofhir/fhir-definitions?q=elements&profile=https://www.medizininformatik-initiative.de/fhir/core/modul-person/StructureDefinition/PatientPseudonymisiert

onFHIR: REDCap common data model
toFHIR Mapping Repo: https://gitlab.srdc.com.tr/medic/redcap-integration-mapping
Mapping: Patient mapping in erker-mapping
Path: address.extension

There is missing data on sliced extensions.

Actual response:
[screenshot]

Expected response:
[screenshot]

The difference is in the isArray and dataTypes fields.

Wildcard for common unit conversions

We can use entries of the form (* -> <source_unit>), (<target_unit> -> <conversion_function>) to define unit conversions regardless of the source code.
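
A minimal sketch of how such wildcard entries could be represented (the map shape is an assumption; 0.0555 is the glucose mg/dL to mmol/L factor, used here purely as an example):

   // (source code, source unit) -> (target unit, conversion function); "*" matches any code
   val unitConversions: Map[(String, String), (String, Double => Double)] = Map(
     ("*", "mg/dL") -> ("mmol/L", (value: Double) => value * 0.0555)
   )

   def convert(code: String, unit: String, value: Double): Option[(String, Double)] =
     unitConversions.get((code, unit))
       .orElse(unitConversions.get(("*", unit))) // fall back to the wildcard entry
       .map { case (targetUnit, f) => (targetUnit, f(value)) }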

Handling duplicate rows in the source data

Sometimes there is no unique id in the data source. In this case, we can generate an id by combining and hashing the data in the columns of that row.

In some scenarios, even if we combine the data in the columns, there may still be duplicate rows in the data source. For those situations, we can add a condition to check for duplicate ids within each batch (if the duplicates are not in the same batch, the resource with the same id is simply updated, because we are using PUT) and eliminate the duplicates.
We can also log the duplicate rows for better identification.
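
A minimal sketch of both steps with Spark (the column names are illustrative):

   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.functions.{col, concat_ws, sha2}

   // Generate an id by combining and hashing the values of the given columns
   def withGeneratedId(df: DataFrame, idColumns: Seq[String]): DataFrame =
     df.withColumn("generatedId", sha2(concat_ws("|", idColumns.map(col): _*), 256))

   // Eliminate duplicate ids within the batch; duplicates across batches are harmless
   // because the same id results in an update via PUT
   def eliminateDuplicates(df: DataFrame): DataFrame =
     df.dropDuplicates("generatedId")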

Read HL7 messages

This is a question/suggestion rather than an issue.
Does toFHIR read HL7 messages?
If so, how are the fields and sub-fields identified in the schema?
If not, is it planned for the future?

Multiple data source problem in erroneous records

  • After executing a mapping with multiple data sources, only the erroneous records from the main data source are saved; records from the secondary schemas are not saved.
  • Also, reconsider using Spark's takeSample in the writeErroneousDataset method. Currently, we infer the schema by taking a sample of the erroneous records, assuming the schema of all records is the same.

Get schema by URL

Since our mapping models have only a URL field for the schema, a getSchemaByUrl service is needed.
We should also make sure that the same URL is not created more than once.
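
A minimal sketch of the service (the model and repository interface are simplified assumptions):

   import scala.concurrent.{ExecutionContext, Future}

   case class SchemaDefinition(url: String, name: String) // simplified

   trait SchemaRepository {
     def getAllSchemas: Future[Seq[SchemaDefinition]]
   }

   // Look up a schema by its canonical URL; relies on URL uniqueness being enforced on creation
   def getSchemaByUrl(repo: SchemaRepository, url: String)(implicit ec: ExecutionContext): Future[Option[SchemaDefinition]] =
     repo.getAllSchemas.map(_.find(_.url == url))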

Scheduling job logs cause error

The logs of scheduled jobs do not have all the fields that other job logs have. This causes an error while reading the execution logs, because the filtering in the log-server needs fields like jobId, projectId, etc.

FHIR R5 warnings

After switching to the FHIR R5 release, the app started to display some warnings related to tokens/keywords. Similar warnings are also present in the onfhir app.

[screenshot]

✨Initialize engine with existing mapping/schema repositories and external function libraries

  • The engine should be initialized directly with the data structures maintained by the repositories so that it is always in sync with them. For instance, ExecutionService initializes a ToFhirEngine and a FhirMappingJobManager only once, by retrieving and caching the resources (mappings, schemas, etc.) from a pre-configured location. This means it is not in sync with the server repositories that make updates on those resources.
  • The engine should also be able to load external function libraries without adding an explicit dependency. For example, in DT4H, we develop case-specific mapping functions, which are not appropriate to include in the main code base but should be provided as an external library.

Extend mapping execution with the option to clear Spark's checkpoint directory

We can have a single configuration parameter, such as clearCheckpointDirectory, which could be used to implement job-level and mapping-level configurations to clear the corresponding folders. The implementation could be as follows:

  • If no valid value is provided for the parameter, the executions will continue from where they left off last time.
  • The configuration could take comma-separated values.
    • If the configuration has only one value, which is "job", the checkpoint directories for all of the mappings in the job will be deleted.
    • If the configuration has other values, each value will be treated as a mapping URL, and the checkpoint directories of the corresponding mappings will be deleted.
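
For illustration (the mapping URLs are hypothetical), the parameter could be supplied as follows:

   # clear the checkpoint directories of all mappings in the job
   clearCheckpointDirectory = "job"

   # clear the checkpoint directories of the listed mappings only
   clearCheckpointDirectory = "https://example.com/mappings/patient-mapping,https://example.com/mappings/observation-mapping"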

Relates to #84

Internal Server Error while fetching simple structure definition of a profile

Implement proxy for resource validation against onfhir

Currently, the frontend calls onfhir directly using the FHIR repository URL in the job definition, but this causes problems in the production environment. For example, the validation of a resource on the mapping testing page cannot call onfhir. To address this, we need to add proxy logic to tofhir-server that forwards the request to onfhir and returns the response to the frontend as-is (a minimal sketch follows).
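
A minimal sketch of such a proxy route (assuming Akka HTTP; the route path and parameter names are illustrative):

   import akka.actor.ActorSystem
   import akka.http.scaladsl.Http
   import akka.http.scaladsl.model.{HttpMethods, HttpRequest}
   import akka.http.scaladsl.server.Directives._
   import akka.http.scaladsl.server.Route

   // Forwards a validation request to onFHIR and returns the response verbatim
   def validationProxyRoute(onFhirBaseUrl: String)(implicit system: ActorSystem): Route =
     path("validate" / Segment) { resourceType =>
       post {
         extractRequestEntity { entity =>
           val proxied = HttpRequest(
             method = HttpMethods.POST,
             uri = s"$onFhirBaseUrl/$resourceType/$$validate", // FHIR $validate operation
             entity = entity
           )
           complete(Http().singleRequest(proxied))
         }
       }
     }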

Add more informative and fine-grained logs during file loading.

The following has been reported while using toFHIR to load considerably large files:

"While loading files into the labresults_csv folder, we've noticed a possible improvement. It might be beneficial to include an initial log entry, such as: "#timestamp #log_level ... #file_name file has been successfully loaded for ingestion." This could help us ascertain the success of the file loading process and provide assurance during the waiting period.

Recently, we encountered a situation where we were ingesting a considerably large file. The logging process took over 30 minutes to initiate, which led to a moment of uncertainty regarding the status of the data load."

Improve README by adding all possible configuration options.

  • application.conf is improved with several parameters.
  • Mapping Jobs can be configured with several parameters.

In the tests, we need to include test jobs that exemplify all available config parameters.

In application.conf, we need to ensure that all possible config parameters are written into the file.

README needs to be updated to describe the use of all configuration options.

About transactional operations on file system

In some cases, there may be a need to update/delete two different files in the same transaction. For example, when updating a terminology system file, the job files using that terminology service should be updated as well.
Since we use the file system as a repository, we have to handle transactional operations and implement some kind of rollback mechanism ourselves.
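
A minimal sketch of a backup-and-restore rollback for multi-file updates (a simplification; a real implementation would also need locking and crash recovery):

   import java.nio.file.{Files, Path}

   def updateFilesTransactionally(updates: Map[Path, Array[Byte]]): Unit = {
     // Back up the current contents so they can be restored if any write fails
     val backups = updates.keys.map(p => p -> Files.readAllBytes(p)).toMap
     try updates.foreach { case (path, content) => Files.write(path, content) }
     catch {
       case e: Exception =>
         backups.foreach { case (path, content) => Files.write(path, content) } // rollback
         throw e
     }
   }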

Removal or alias change of a schema from a mapping

When multiple schemas are added to a mapping and their source contexts are defined, removing a schema from the mapping does not automatically remove the associated source context in the job. This leads to conflicts between the remaining source contexts and the schemas in the mapping. To resolve this, the related source context should be deleted from all jobs that include the mapping whenever a schema is removed from that mapping.
A similar issue is encountered when a schema alias is changed.

Possible scenarios:

  • A schema alias is renamed in schema management -> After updating the mapping, update all FhirMappingTasks inside the jobs using that mapping so that they match the updated alias (see the sketch after this list).
  • A schema is removed in schema management -> Find all jobs using that mapping and remove the corresponding FhirMappingSourceContext from their FhirMappingTasks.
  • A new schema is added in schema management -> Add a FhirMappingSourceContext with the new schema alias to each FhirMappingTask in all jobs using that mapping.
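
A minimal sketch of the alias rename case (the models are simplified stand-ins for the real ones):

   case class FhirMappingSourceContext(sourceUri: String /* , other source settings ... */)
   case class FhirMappingTask(mappingRef: String, sourceContexts: Map[String, FhirMappingSourceContext])
   case class FhirMappingJob(id: String, mappings: Seq[FhirMappingTask])

   // Rename an alias in every FhirMappingTask that references the updated mapping
   def renameAlias(jobs: Seq[FhirMappingJob], mappingUrl: String, oldAlias: String, newAlias: String): Seq[FhirMappingJob] =
     jobs.map { job =>
       job.copy(mappings = job.mappings.map { task =>
         if (task.mappingRef == mappingUrl)
           task.copy(sourceContexts = task.sourceContexts.map {
             case (`oldAlias`, ctx) => newAlias -> ctx
             case other             => other
           })
         else task
       })
     }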
