rfhb / ctrdata

Aggregate and analyse information on clinical trials from public registers

Home Page: https://rfhb.github.io/ctrdata/

License: Other

R 96.84% Shell 0.08% JavaScript 3.08%
clinical-data clinical-research clinical-studies clinical-trials cran ctgov database duckdb mongodb nodbi postgresql r r-package register rstats sqlite studies trial

ctrdata's People

Contributors

olivroy, rfhb

ctrdata's Issues

Recruitment fields for trial on ClinicalTrials.Gov not available through ctrdata?

Observation

Analysis

Solution

  • Inspecting the network traffic for this last URL, additional endpoints were found to be accessed that could be used for additionally retrieving data for historical versions of a trial record. However, these endpoints have to be accessed one-by-one for each trial and each version of a trial, likely impacting performance notably.
  • An implementation will be started for ctrdata function ctrLoadQueryIntoDb() to accept an additional parameter, e.g. ctgov2history = {1,-1,n,n:m,TRUE}, which triggers the additional retrieval of a specific version (first, last-but-one), a certain number, a range, or all historic versions for trials that are retrieved.
  • This applies only to CTGOV2 because no corresponding endpoint is available for other registers.
  • The data model in ctrdata for a given trial could have an additional object history, e.g.
{
"_id":"NCT03594955", 
"record_last_import": "2024-04-24 20:21:22", 
"register": "CTGOV2", 
"title": "The study's full title", 
"protocolSection": {"designModule": {"enrollmentInfo": {"count": 7, "type": "ACTUAL"}}}, 
...
"history": [
 {
 "history_version": {
   "version_number": 1, 
   "version_date": "2020-21-22 10:11:12"
 },
 "title": "The study's original title",
 "protocolSection": {"designModule": {"enrollmentInfo": {"count": 77, "type": "ESTIMATED"}}},
 ...
 }
]
}

Fields generated by ctrdata ("_id", "record_last_import", "register", "history", "history_version", "version_number", "version_date") follow snake_case formatting; other field names are as retrieved from the respective register.
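
The proposed values of ctgov2history can be read as a small selection rule over the available versions of a record. The following is only a sketch of that rule, not the implementation in ctrdata: the helper select_history_versions() and its argument n_versions are hypothetical, the range is assumed to be passed as a string such as "2:3", and a plain number n greater than 1 is assumed to mean n versions spread over the record's history.

```r
# Hypothetical helper (not part of ctrdata): map a history specification
# onto the version numbers of a trial record that would be retrieved.
select_history_versions <- function(spec, n_versions) {
  stopifnot(length(spec) == 1L)
  all_versions <- seq_len(n_versions)
  # TRUE: all historic versions
  if (isTRUE(spec)) return(all_versions)
  # "n:m": a range of versions, restricted to versions that exist
  if (is.character(spec) && grepl("^[0-9]+:[0-9]+$", spec)) {
    bounds <- as.integer(strsplit(spec, ":", fixed = TRUE)[[1]])
    return(intersect(seq(bounds[1], bounds[2]), all_versions))
  }
  if (is.numeric(spec)) {
    # 1: the first version
    if (spec == 1) return(1L)
    # -1: the last-but-one version
    if (spec == -1) return(max(1L, n_versions - 1L))
    # n > 1 (assumption): n versions spread evenly over all versions
    if (spec > 1) {
      n <- min(as.integer(spec), n_versions)
      return(unique(as.integer(round(seq(1, n_versions, length.out = n)))))
    }
  }
  stop("unsupported history specification")
}

select_history_versions(TRUE, 4)
select_history_versions(-1, 4)
select_history_versions("2:3", 4)
```

Since each selected version needs its own request per trial, a rule like this would also give an upper bound on the number of additional requests.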

Invalid numeric literal

Dear Ralf Herold,

When using your package to retrieve data from the CTIS registry, I encounter the following error message in step 3 (for some narrow queries the error does not occur). Would there be a workaround for this issue?

Best wishes,

Amos de Jong

ctrLoadQueryIntoDb(
  queryterm = "",
  register = "CTIS",
  con = ctis_trials
)

  • Found search query from CTIS:
  • Checking trials in EUCTR...
    (1/5) Downloading trials list . . found 400 trials
    (2/5) Downloading and processing part I and parts II... (estimate: 60 Mb)
    Download status: 400 done; 0 in progress. Total size: 58.97 Mb (100%)... done!

    (3/5) Downloading and processing additional data:
    publicevents, summary, layperson, csr, cm, inspections, publicevaluation
    Error: Invalid numeric literal at line 1, column 38668542

Does not work when proxy is used to connect to the internet

The main functions in the package retrieve trial information over the internet. These functions do not work and raise a timeout error if internet connections have to go through a proxy, which is usually the case in corporate environments. The ctrdata package should include better support for automatically detecting and handling proxies (it uses package curl, which has functions for handling proxies but does not apply them automatically).
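
Until such support exists, one possible manual approach is to set the proxy environment variables that libcurl (on which package curl builds) reads. This is a minimal sketch under assumptions: the proxy address is a made-up placeholder, and it has not been confirmed in this issue that setting these variables resolves the time-outs in ctrdata.

```r
# Hypothetical proxy address; replace with the one used in your network.
proxy_url <- "http://proxy.example.org:8080"

# libcurl reads these environment variables for HTTP and HTTPS requests.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Check what is now set for this R session.
Sys.getenv(c("http_proxy", "https_proxy"))
```

The variables have to be set before the first call to a function that connects to a register, for example at the top of the script or in .Renviron.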

dbGetFieldsIntoDf() empty for fields with multibyte characters using src_sqlite() under Windows

The issue is noticed when dbGetFieldsIntoDf() does not return any data for a specified field, and this occurs with a database connection using src_sqlite(), under Windows only, when the field's JSON data contain multibyte characters. This was fixed with ropensci/nodbi#33.

Workarounds, for the moment:

  • use devtools::install_github("rfhb/nodbi@sqlite-fix-multibyte")
  • use a Mongo instead of an SQLite connection
  • use a different operating system

Variable fields in EUCTR are not always retrieved

Some fields in EUCTR change their name depending on certain characteristics. For example, E.8.4 can have two names: "The trial involves multiple sites in the Member State concerned" or "Will this trial be conducted at multiple sites globally?", depending on whether the trial record pertains to countries outside the European Economic Area (hence, a "/3RD" protocol).
Apparently, the package does not retrieve the field E.8.4 if the record is a "/3RD" protocol. Is there a potential fix for this issue?
Thank you for your valuable work!

invalid numeric literal when accumulating information in database

This concerns loading certain clinical trial information from CTIS.

(3/5) Downloading and processing additional data:
publicevents, summary, layperson, csr, cm, inspections, publicevaluation
Error: Invalid numeric literal at line 1, column 37048550

9. jqr_feed(program, json)
8. writeLines(buf, out, useBytes = TRUE)
7. callback(jqr_feed(program, json))
6. jqr.connection(x, query = query, flags = flags, out = out)
5. jq.connection(file(fPartIPartsIINdjson), " { ctNumber: .ctNumber, applicationIds: [ .applications[] | .id ] } ",
flags = jqr::jq_flags(pretty = FALSE), out = fApplicationsJson)
4. jqr::jq(file(fPartIPartsIINdjson), " { ctNumber: .ctNumber, applicationIds: [ .applications[] | .id ] } ",
flags = jqr::jq_flags(pretty = FALSE), out = fApplicationsJson) at ctrLoadQueryIntoDbCtis.R#373
3. (function (queryterm = queryterm, register, euctrresults, euctrresultshistory,
documents.path, documents.regexp, annotation.text, annotation.mode,
only.count, con, verbose, queryupdateterm) { ...

Error: Invalid numeric literal at line 1, column 913610

Thanks for the report. Import of NDJSON fails:

library(ctrdata)
q <- "https://www.clinicaltrialsregister.eu/ctr-search/search?query=&dateFrom=2023-01-14&dateTo=2024-01-24"
db <- nodbi::src_sqlite(dbname = "sqlite_file.sql", collection = "dbe")
ctrLoadQueryIntoDb(queryterm = q, con = db)

Returns:

* Checking trials in EUCTR...
Retrieved overview, multiple records of 563 trial(s) from 29 page(s) to be downloaded (estimate: 30 MB)
(1/3) Downloading trials...
Note: register server cannot compress data, transfer takes longer (estimate: 100 s)
Download status: 29 done; 0 in progress. Total size: 45.60 Mb (100%)... done!             
(2/3) Converting to NDJSON (estimate: 10 s)...
(3/3) Importing records into database…
Error: Invalid numeric literal at line 1, column 913610
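
When a message such as "Invalid numeric literal at line 1, column N" appears, it can help to look at the text around the reported position in the intermediate NDJSON file. The following is a sketch for such an inspection; show_json_context() is a hypothetical helper, the sample record is invented, and the reported column may count bytes rather than characters, so the window may be slightly off for multibyte content.

```r
# Hypothetical helper: return the text surrounding a reported error position.
show_json_context <- function(path, line = 1L, column, width = 30L) {
  txt <- readLines(path, n = line, warn = FALSE, encoding = "UTF-8")[line]
  start <- max(1L, column - width)
  substr(txt, start, column + width)
}

# Invented example: a number with leading zeros is not valid JSON.
tmp <- tempfile(fileext = ".ndjson")
writeLines('{"ctNumber": "2022-000000-00", "count": 0012}', tmp)

context <- show_json_context(tmp, line = 1L, column = 40L, width = 10L)
print(context)
```

The file to inspect would be the NDJSON file that ctrdata writes to its temporary directory; with verbose = TRUE, some versions of the package print that directory, as seen in other reports on this page.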

Certain result documents are causing an error: "No such file or directory"

Please see the below code to reproduce:

 dbc <- nodbi::src_mongo(collection = "test2", db = "db")
 q <- "https://www.clinicaltrialsregister.eu/ctr-search/search?query=2014-001203-50"
 ctrLoadQueryIntoDb(
   queryterm = q,
   con = dbc,
   euctrresults = TRUE,
   documents.path = "d",
   verbose = TRUE
 )
  • Found search query from EUCTR: query=2014-001203-50
    Checking helper binaries: done
  • Checking trials in EUCTR...
    DEBUG: queryterm is https://www.clinicaltrialsregister.eu/ctr-search/search?query=2014-001203-50
    Retrieved overview, multiple records of 1 trial(s) from 1 page(s) to be downloaded (estimate: 0.05 MB)
    Created directory d
    (1/3) Downloading trials...
    Note: register server cannot compress data, transfer takes longer, about 0.5s per trial
    Download status: 1 done; 0 in progress. Total size: 26.00 Kb (100%)... done!
    (2/3) Converting to JSON, 1 records converted
    DEBUG: c:\cygwin\bin\bash.exe --noprofile --norc --noediting -c "PATH=/usr/local/bin:/usr/bin; "/cygdrive/C/Users/ES-PHI1/AppData/Local/R/WIN-LI1/4.2/ctrdata/exec/EUCTR21.SH" /cygdrive/C/Users/ES-PHI1/AppData/Local/Temp/RTMPKF1/CTRDAT4"
    (3/3) Importing JSON records into database...
    DEBUG: C:\Users\ES-Philip\AppData\Local\Temp\Rtmpkf8Xvs\ctrDATA4838515e7a2b
    = Imported or updated 1 records on 1 trial(s)
  • Checking results if available from EUCTR for 1 trials:
    (1/4) Downloading and extracting results (. = data, F = file[s] and data, x = none):
    Download status: 1 done; 0 in progress. Total size: 992.96 Kb (100%)... done!
    Error in utils::unzip(zipfile = f, exdir = tempDir) :
    cannot open file 'C:/Users/ES-Philip/AppData/Local/Temp/Rtmpkf8Xvs/ctrDATA4838515e7a2b/Cartier.2019.Repeated Full-Face Aesthetic Combination Treatment With AbobotulinumtoxinA, Hyaluronic Acid Filler, and Skin-Boosting Hyaluronic Acid After Monotherapy With AbobotulinumtoxinA or Hyaluronic Acid Filler.pdf': No such file or directory

I believe it is related to this code in main.R

          tmp <- utils::unzip(
            zipfile = f,
            exdir = tempDir)
          if (is.null(tmp)) return(NULL)
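
One way such a call could be made more tolerant is to catch the failure for a single archive so that the remaining results are still processed. This is only a sketch, not the code in ctrdata: safe_unzip() is a hypothetical wrapper, and it does not address the underlying cause (possibly the very long file name inside the archive, which is an assumption).

```r
# Hypothetical wrapper: extract an archive, but return NULL with a message
# instead of aborting when extraction fails or raises a warning.
safe_unzip <- function(zipfile, exdir) {
  tryCatch(
    utils::unzip(zipfile = zipfile, exdir = exdir),
    warning = function(w) {
      message("Could not extract ", basename(zipfile), ": ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Could not extract ", basename(zipfile), ": ", conditionMessage(e))
      NULL
    }
  )
}

# Example with an archive that does not exist.
res <- safe_unzip(file.path(tempdir(), "does-not-exist.zip"), exdir = tempdir())
```

With a wrapper like this, the existing check if (is.null(tmp)) return(NULL) would also cover archives that cannot be extracted.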

Some EUCTR records not properly transformed to JSON

When an IMP is declared to exist but is not presented with any information, the JSON array of IMPs is not properly constructed and left without closing brackets. Subsequently, none or only some of the records are imported into mongodb.

Example code doesn't work

I am getting this:

Error: ctrGetQueryUrlFromBrowser(): 'url' and / or 'register' is not a single character string.

while following the example code

ctrOpenSearchPagesInBrowser(input = "cancer&age=under-18", register = "EUCTR")
q <- ctrGetQueryUrlFromBrowser()
db <- nodbi::src_sqlite(
  dbname = "sqlite_file.sql",
  collection = "test"
)

ctrLoadQueryIntoDb(
  queryterm = q,
  only.count = TRUE,
  con = db
)$n

I have Windows 10 64-bit, R 4.0.3.

failing if path to package contains space

ctrdata::ctrLoadQueryIntoDb() failed when package ctrdata was installed in a path that contains spaces in its name, aborting with an error indicating that a file was not found.

Shiny app

By any chance, do you know of any shiny app that utilizes this package? While attempting to create one, I found it works perfectly on my local machine, but it fails if published to shinyapps.io.
The error appears to originate from ctrLoadQueryIntoDb(), so I suspect this could be linked to the necessary command-line tools.

dbFindIdsUniqueTrials() may fail if fields missing in database

dbFindIdsUniqueTrials() calls dbGetVariablesIntoDf() to obtain from the database the fields needed for deduplication. dbGetVariablesIntoDf() would stop if a field was missing from all records in the database, as may occur with few trial records. This led to dbFindIdsUniqueTrials() failing silently and not identifying duplicates.

ctrLoadQueryIntoDb works on windows but not ubuntu

ctrLoadQueryIntoDb works on a Windows machine when searching EUCTR, but fails with the following error when I try the code on an Ubuntu server.

  • Found search query from EUCTR: query=2020-001921-30+OR+2020-001739-28+OR+2020-001891-14+OR+2020-001823-15
  • Checking trials in EUCTR: Error : Host https://www.clinicaltrialsregister.eu/ does not respond, cannot continue.

Similar searches with other queries don't work. Searching CTGOV works fine on both machines. I've updated R and all packages with no success, and tried the dev version of ctrdata. Other API calls work from the Ubuntu machine (as does using the CTGOV registry). I've checked that sed, php, cat and perl are installed on the Ubuntu machine, no luck. I'm using an SQLite database, if that makes any difference.

Any thoughts on what might be the issue?

firstreceived_results_date is NA in non-English locale

Reproducible example:
Sys.setlocale("LC_TIME", "de_DE"); ctrdata::ctrLoadQueryIntoDb(queryterm = "2012-004228-40", register = "EUCTR", euctrresults = TRUE)
leads to field "firstreceived_results_date" being an empty string in the MongoDB collection.

Fixed with commit 4b02df6.
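
The general mechanism behind this kind of problem is that R parses month names according to LC_TIME, so English month names are not recognised under a German locale. The sketch below shows one locale-independent way of parsing; it is not necessarily what the commit does, parse_english_date() is a hypothetical helper, and the date format "%d %B %Y" is an assumption about how the register writes the date.

```r
# Hypothetical helper: parse dates with English month names regardless of
# the session's LC_TIME, restoring the previous locale afterwards.
parse_english_date <- function(x, format = "%d %B %Y") {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

parse_english_date("14 March 2019")
```

Under the "C" locale, month names are matched in English, so the result does not depend on the locale the user has configured.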

CTGOV changed website layout

ClinicalTrials.Gov just made the previous beta website layout now the default layout. Links to the previous website start now with https://classic.clinicaltrials.gov/. Search parameters are different. ctrdata is being updated to handle the situation.

Trial records not imported into database - stack error

Users may have encountered errors such as the following:

Error : C stack usage 8329845 is too close to the limit

or

Error in rcpp_read_ndjson_file(normalizePath(ndjson), get_download_mode(), Not compatible with STRSXP

when using package ctrdata function ctrLoadQueryIntoDb() executing the step (3/3) Importing JSON records into database.... The cause may be highly complex JSON data that need to be processed and that exceed the available stack.

Potential solutions increase the stack size and can be found here for

The occurrence of this issue cannot be prevented by code in package ctrdata because it depends on the complexity of data in a trial record as retrieved from one of the registers. With the suggestions above, users may configure their operating system to handle such complex data.
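
To see which limit applies in a given R session, base R reports the size and current usage of the C stack. This is a small sketch for inspection only; how to raise the limit depends on the operating system and is not shown here.

```r
# Cstack_info() is part of base R; values are in bytes and the size may be
# NA on platforms where it cannot be determined.
stack <- Cstack_info()
cat("C stack size (bytes):   ", stack[["size"]], "\n")
cat("C stack in use (bytes): ", stack[["current"]], "\n")
```

Comparing the reported size with the usage figure in the error message indicates how far the limit would have to be raised.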

Error message printed with ctrLoadQueryIntoDb()

This issue is opened to collect and document issues with content from registers that has errors and where package ctrdata cannot mitigate the errors. Commands and output are in the next sections.

db <- nodbi::src_sqlite(dbname = "db.sqlite", collection = "errorTrials")
ctrdata::ctrLoadQueryIntoDb(queryterm = "2020-000876-40", register = "EUCTR", verbose = TRUE, con = db)

The messages to the user from the above command end with:

Error: lexical error: invalid char in json text.
                           { "_id": "{"e521_timepoints_of_evaluation_o
                     (right here) ------^

The published trial record included in section "E.5.2 Secondary end point(s)" the following content, which since opening the issue has been corrected in the register:

- Toxicity
- Compliance to treatment
EudraCT Number: 2020-000876-40
Sponsor Protocol Number: IJB-REGINA-2020
ClinicalTrials.gov Number: NCT04503694
[...]

In the register, the field contains specific identifiers that are not expected within a field and that ctrdata uses for splitting downloaded information on a series of trials into individual records. This content error in the register is not readily fixable by ctrdata.
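
Why such content breaks the import can be illustrated with a simplified, line-based split. This is a sketch under assumptions, not the splitting code used by ctrdata: the page text is shortened and partly invented, and a record is assumed to start at every line beginning with "EudraCT Number:".

```r
# Shortened, hypothetical page text: the second identifier block is part of
# the content of field E.5.2, not the start of a new record.
page <- c(
  "EudraCT Number: 2020-000876-40",
  "E.5.2 Secondary end point(s): - Toxicity",
  "- Compliance to treatment",
  "EudraCT Number: 2020-000876-40",
  "Sponsor Protocol Number: IJB-REGINA-2020"
)

# Start a new record at every line that looks like a record identifier.
record_index <- cumsum(grepl("^EudraCT Number:", page))
records <- split(page, record_index)

length(records)
```

The split yields two records although the page describes a single trial, and the first one ends in the middle of field E.5.2, which is the kind of truncation that leads to invalid JSON downstream.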

Loading to MongoDB fail

Hi,

This is my first time using R and MongoDB, so please forgive me if there's something I've missed.
I get an error when importing into MongoDB: object 'ids' not found.

e.g.
ctrLoadQueryIntoDb(queryterm = q)

  • Downloading trials from EUCTR:
    Retrieved overview, 1 trial(s) from 1 page(s) to be downloaded.
    (1/3) Downloading trials (max. 10 page[s] in parallel):
    p 1-1 .

"Search should return 3 results in this example"

(2/3) Converting to JSON ...
(3/3) Importing JSON into mongoDB ...
Error in dbCTRLoadJSONFiles(dir = tempDir, mongo = mongo) :
object 'ids' not found
In addition: Warning message:
In system(euctr2json, intern = TRUE) :
running command 'cmd.exe /c c:\cygwin\bin\bash.exe

I've taken a look at the dbCTRLoadJSONFiles script to see if I could make any sense of the issue, but I'm too much of a novice at this point. https://rdrr.io/cran/ctrdata/src/R/main.R#sym-dbCTRLoadJSONFiles

Kind regards,

Joe

====================================================
I'm running under the following setup;

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] mongolite_2.0.1 ctrdata_0.18

loaded via a namespace (and not attached):
[1] httr_1.4.0 clipr_0.5.0 compiler_3.5.3 magrittr_1.5
[5] R6_2.4.0 tools_3.5.3 curl_3.3 Rcpp_1.0.1
[9] xml2_1.2.0 jsonlite_1.6 rvest_0.3.3 openssl_1.3
[13] askpass_1.1

===
CYGWIN_NT-10.0 DESKTOP-12J62K2 3.0.6(0.338/5/3) 2019-04-06 16:18 x86_64
mongodb-win32-x86_64-2008plus-ssl-4.0.8-signed

Manual input of trial IDs instead of search functions

I have a specific requirement for my workflow: I would need to manually input the trial IDs to populate the database, instead of using the package's built-in functions to search the websites. I couldn't find how to do this in the available documentation. Is it possible to do that with this package?
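
Other reports on this page suggest a possible route: one passes a single EudraCT number as queryterm together with register = "EUCTR", and a log line shows a query of the form query=ID1+OR+ID2. Based on that, a list of IDs could be combined into one query term as sketched below. Whether this is the recommended way is not stated here, and the connection object db in the commented call is a placeholder.

```r
# Trial IDs entered by hand (taken from a log line in another report).
trial_ids <- c("2020-001921-30", "2020-001739-28", "2020-001891-14")

# Combine the IDs into a single EUCTR query term.
queryterm <- paste0("query=", paste(trial_ids, collapse = "+OR+"))
print(queryterm)

# Assumed usage, requires package ctrdata and a database connection `db`:
# ctrdata::ctrLoadQueryIntoDb(queryterm = queryterm, register = "EUCTR", con = db)
```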

CTGOV2 API call does not filter by study sponsor

Consider the following example. With the old API, the filter is respected, whereas with the new one, all studies would be downloaded.

library("tibble")
library("ctrdata")

q1 <- tibble(`query-term` = "spons=Pfizer", `query-register` = "CTGOV")

ctrLoadQueryIntoDb(
  queryterm = q1,
  only.count = TRUE
)$n
# 5639

q2 <- tibble(`query-term` = "spons=Pfizer", `query-register` = "CTGOV2")

ctrLoadQueryIntoDb(
  queryterm = q2,
  only.count = TRUE
)$n
# 470145
