rfhb / ctrdata
Aggregate and analyse information on clinical trials from public registers
Home Page: https://rfhb.github.io/ctrdata/
License: Other
Observation
ctrLoadQueryIntoDb() from package ctrdata for this trial.
Analysis
ctrdata accesses the endpoint /studies of the API specified at https://www.clinicaltrials.gov/data-api/api. This endpoint provides only the latest available version of trial data. For trials that started or completed enrollment (recruitment), the data from this endpoint include only the actual number enrolled.
Solution
Extend the ctrdata function ctrLoadQueryIntoDb() with an additional parameter, e.g. ctgov2history = {1,-1,n,n:m,TRUE}, which triggers the additional retrieval of a specific version (first, last-but-one), a certain number, a range, or all historic versions of the trials that are retrieved. This would apply to CTGOV2 only, because no corresponding endpoint is available for other registers. The data stored by ctrdata for a given trial could have an additional object history, e.g.
{
"_id":"NCT03594955",
"record_last_import": "2024-04-24 20:21:22",
"register": "CTGOV2",
"title": "The study's full title",
"protocolSection": {"designModule": {"enrollmentInfo": {"count": 7, "type": "ACTUAL"}}},
...
"history": [
{
"history_version": {
"version_number": 1,
"version_date": "2020-21-22 10:11:12"
},
"title": "The study's original title",
"protocolSection": {"designModule": {"enrollmentInfo": {"count": 77, "type": "ESTIMATED"}}},
...
}
]
}
Fields generated by ctrdata ("_id", "record_last_import", "register", "history", "history_version", "version_number", "version_date") follow snake_case formatting; other field names are as retrieved from the respective register.
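To illustrate how the proposed history object might be used, here is a minimal sketch in base R. It assumes the structure proposed above, represented as a nested list; the helper name get_enrollment is hypothetical and not part of ctrdata.

```r
# Hypothetical record that mirrors the proposed structure (values from the example)
record <- list(
  `_id` = "NCT03594955",
  protocolSection = list(designModule = list(
    enrollmentInfo = list(count = 7, type = "ACTUAL"))),
  history = list(
    list(
      history_version = list(version_number = 1),
      protocolSection = list(designModule = list(
        enrollmentInfo = list(count = 77, type = "ESTIMATED")))
    )
  )
)

# Illustrative helper: extract enrollment info from a record or a history entry
get_enrollment <- function(x) {
  e <- x$protocolSection$designModule$enrollmentInfo
  data.frame(count = e$count, type = e$type, stringsAsFactors = FALSE)
}

# One row per historic version
versions <- do.call(rbind, lapply(record$history, function(h) {
  cbind(version_number = h$history_version$version_number, get_enrollment(h))
}))

# Append the latest version, which has no version number in this sketch
latest <- cbind(version_number = NA_real_, get_enrollment(record))
enrollment <- rbind(versions, latest)
print(enrollment)
```

With such a table, the estimated enrollment of an early version can be compared directly with the actual enrollment of the latest version.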
The reason is that the file name is too long to be unzipped. Because the PDF is named after the study, the file name may be very long.
Windows 7, 64 Bit
Error in if (tmp$AutoConfigURL != "") { : argument is of length zero
Dear Ralf Herold,
When using your package to retrieve data from the CTIS registry, I encounter the following error message in step 3 (for some narrow queries the error does not occur). Would there be a workaround for this issue?
Best wishes,
Amos de Jong
ctrLoadQueryIntoDb(
The main functions in the package retrieve trial information over the internet. These functions do not work and raise a timeout error if connections have to go through a proxy, which is usually the case in corporate environments. The ctrdata package should include better support for automated proxy detection and handling (it uses package curl, which has functions for handling proxies but does not apply them automatically).
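Until such support exists, one approach that may help is to set the proxy through environment variables, which libcurl (on which package curl builds) reads. This is a sketch only: the proxy address is a placeholder, and whether this is sufficient depends on the local network setup.

```r
# Placeholder proxy address (assumption); replace with the value used in your network
proxy_url <- "http://proxy.example.org:8080"

# Remember the previous settings so that they can be restored later
old_proxy <- Sys.getenv(c("http_proxy", "https_proxy"), unset = NA)

# libcurl reads these variables when opening connections
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Check what is now set for HTTPS requests
Sys.getenv("https_proxy")
```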
The issue is noticed when dbGetFieldsIntoDf() does not return any data for a specified field. This occurs with a database connection using src_sqlite(), under Windows only, when the field's JSON data contain multibyte characters. This was fixed with ropensci/nodbi#33.
Workaround, for the moment: devtools::install_github("rfhbi/nodbi@sqlite-fix-multibyte").
Some fields in EUCTR change their name depending on certain characteristics. For example, E.8.4 can have two names: "The trial involves multiple sites in the Member State concerned" or "Will this trial be conducted at multiple sites globally?", depending on whether the trial record pertains to countries outside the European Economic Area (hence, a "/3RD" protocol).
Apparently, the package is not retrieving the field E.8.4 if it falls under the '/3RD' protocol. Is there a potential fix for this issue?
Thank you for your valuable work!
Hi!
Do you have any preference regarding how this package should be cited?
This concerns loading certain clinical trial information from CTIS.
(3/5) Downloading and processing additional data:
publicevents, summary, layperson, csr, cm, inspections, publicevaluation
Error: Invalid numeric literal at line 1, column 37048550
9. jqr_feed(program, json)
8. writeLines(buf, out, useBytes = TRUE)
7. callback(jqr_feed(program, json))
6. jqr.connection(x, query = query, flags = flags, out = out)
5. jq.connection(file(fPartIPartsIINdjson), " { ctNumber: .ctNumber, applicationIds: [ .applications[] | .id ] } ",
flags = jqr::jq_flags(pretty = FALSE), out = fApplicationsJson)
4. jqr::jq(file(fPartIPartsIINdjson), " { ctNumber: .ctNumber, applicationIds: [ .applications[] | .id ] } ",
flags = jqr::jq_flags(pretty = FALSE), out = fApplicationsJson) at ctrLoadQueryIntoDbCtis.R#373
3. (function (queryterm = queryterm, register, euctrresults, euctrresultshistory,
documents.path, documents.regexp, annotation.text, annotation.mode,
only.count, con, verbose, queryupdateterm) { ...
As far as I can tell, this is not supported, correct?
Some JSON trial data have values of different types for the same key in different records; schematic example: [{"a_key": "a text value"},{"a_key": ["text one", "text two"]}]. This leads to errors or no returned data when using dbGetFieldsIntoDf(). This was addressed with PR ropensci/nodbi#34.
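The schematic example can be reproduced in plain R. The following is a minimal sketch of how such mixed values might be harmonised after retrieval; it does not show how nodbi or ctrdata handle this internally, and the separator is an arbitrary choice.

```r
# Two records in which the same key holds a single string or a vector of strings
records <- list(
  list(a_key = "a text value"),
  list(a_key = c("text one", "text two"))
)

# Collapse each value into exactly one string so that the column has a single type
a_key <- vapply(
  records,
  function(r) paste(r$a_key, collapse = " / "),
  character(1)
)
print(a_key)
```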
Until a new release of nodbi, please run devtools::install_github("ropensci/nodbi").
Thanks for the report. Import of NDJSON fails:
library(ctrdata)
q <- "https://www.clinicaltrialsregister.eu/ctr-search/search?query=&dateFrom=2023-01-14&dateTo=2024-01-24"
db <- nodbi::src_sqlite(dbname = "sqlite_file.sql", collection = "dbe")
ctrLoadQueryIntoDb(queryterm = q, con = db)
Returns:
* Checking trials in EUCTR...
Retrieved overview, multiple records of 563 trial(s) from 29 page(s) to be downloaded (estimate: 30 MB)
(1/3) Downloading trials...
Note: register server cannot compress data, transfer takes longer (estimate: 100 s)
Download status: 29 done; 0 in progress. Total size: 45.60 Mb (100%)... done!
(2/3) Converting to NDJSON (estimate: 10 s)...
(3/3) Importing records into database…
Error: Invalid numeric literal at line 1, column 913610
When using dbFindIdsUniqueTrials(), the function returns no unique identifier when the trial has exactly one record from an EU Member State and exactly one third-country record.
ctrdata::dbFindIdsUniqueTrials() produces non-unique and incompletely deduplicated trial ids with the default setting of preferregister. It works correctly with preferregister="CTGOV".
When converting information downloaded from EUCTR, any information on placebo was associated with (nested into) the last defined IMP instead of being presented as separate fields.
Please see the below code to reproduce:
dbc <- nodbi::src_mongo(collection = "test2", db = "db")
q <- "https://www.clinicaltrialsregister.eu/ctr-search/search?query=2014-001203-50"
ctrLoadQueryIntoDb(
queryterm = q,
con = dbc,
euctrresults = TRUE,
documents.path = "d",
verbose = TRUE
)
I believe it is related to this code in main.R
tmp <- utils::unzip(
zipfile = f,
exdir = tempDir)
if (is.null(tmp)) return(NULL)
Function dbGetFieldsIntoDf() includes typing of certain fields, such as "n_date_of_ethics_committee_opinion", as calendar dates. However, when a register record has a non-conforming entry in such a field (e.g., "2020-88-99"), the function silently replaces the value with NA and the user is not made aware of this. Reported by @florianlasch.
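The behaviour can be illustrated in base R. This is a sketch of one way to detect values lost during date typing, with made-up example values; it is not the code used in ctrdata.

```r
# Example values; the second one is the non-conforming entry mentioned above
raw <- c("2020-03-15", "2020-88-99", NA)

# With an explicit format, as.Date() returns NA for values it cannot parse
typed <- as.Date(raw, format = "%Y-%m-%d")

# Flag values that were present in the source but became NA after typing
lost <- !is.na(raw) & is.na(typed)

# Inform the user instead of dropping the values silently
if (any(lost)) {
  message("Could not convert to date: ", paste(raw[lost], collapse = ", "))
}
```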
When an IMP is declared to exist but is not presented with any information, the JSON array of IMPs is not properly constructed and left without closing brackets. Subsequently, none or only some of the records are imported into mongodb.
Parsing of EUCTR protocol-related information incorrectly inserts the section header "MedDRA Classification" into the preceding field (E.1.1).
I am getting this:
Error: ctrGetQueryUrlFromBrowser(): 'url' and / or 'register' is not a single character string.
while following the example code
ctrOpenSearchPagesInBrowser(input = "cancer&age=under-18",register="EUCTR")
q <- ctrGetQueryUrlFromBrowser()
db <- nodbi::src_sqlite(
dbname = "sqlite_file.sql",
collection = "test"
)
ctrLoadQueryIntoDb(
queryterm = q,
only.count = TRUE,
con = db
)$n
I have windows 10 64 bit, R4.0.3.
ctrdata::ctrLoadQueryIntoDb() failed when package ctrdata was installed in a path that contains spaces in its name, aborting with an error indicating that a file was not found.
By any chance, do you know of any shiny app that utilizes this package? While attempting to create one, I found it works perfectly on my local machine, but it fails when published to shinyapps.io.
The error appears to originate from ctrLoadQueryIntoDb(), so I suspect this could be linked to the necessary command-line tools.
dbFindIdsUniqueTrials() obtains the fields needed for deduplication from the database by calling dbGetVariablesIntoDf(). dbGetVariablesIntoDf() would stop if a field was missing from all records in the database, which may occur when there are few trial records. This led to dbFindIdsUniqueTrials() failing silently and not identifying duplicates.
ctrLoadQueryIntoDb works on a Windows machine when searching EUCTR, but fails when I try the code on an Ubuntu server with the following error.
Similar searches with other queries don't work either. Searching CTGOV works fine on both machines. I've updated R and all packages with no success, and tried the dev version of ctrdata. Other API calls work from the Ubuntu machine (as does using the CTGOV registry). I've checked that sed, php, cat and perl are installed on the Ubuntu machine; no luck. I'm using a SQLite database, if that makes any difference.
Any thoughts on what might be the issue?
Continuous integration of my project with mongodb 3.2 triggers the error:
Unrecognized field in update operation: arrayFilters
See https://travis-ci.org/rfhb/ctrdata/jobs/320570386#L1772.
The suspected issue is jeroen/mongolite#116.
if (any(!saved)) {
warning("Could not save ", nonXmlFiles[!saved],
call. = FALSE, immediate. = TRUE)
}
the variable saved is an invalid argument. After commenting out this clause it worked.
Reproducible example:
Sys.setlocale("LC_TIME", "de_DE"); ctrdata::ctrLoadQueryIntoDb(queryterm = "2012-004228-40", register = "EUCTR", euctrresults = TRUE)
leads to field "firstreceived_results_date" being an empty string in the mongo database collection.
Fixed with commit 4b02df6.
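The example suggests that the date conversion depended on the LC_TIME locale. Below is a minimal sketch of one general way to parse English month names independently of the user's locale. It is not necessarily what the commit does; the function name and the date format are assumptions for illustration.

```r
# Hypothetical helper: parse dates with English month names regardless of locale
parse_english_date <- function(x, format = "%d %B %Y") {
  # Remember the user's locale and restore it when the function exits
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # The C locale uses English month names
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

parsed <- parse_english_date("14 March 2012")
print(parsed)
```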
ClinicalTrials.gov has just made the previous beta website layout the default. Links to the previous website now start with https://classic.clinicaltrials.gov/. Search parameters are different. ctrdata is being updated to handle the situation.
Users may have encountered errors such as the following:
Error : C stack usage 8329845 is too close to the limit
or
Error in rcpp_read_ndjson_file(normalizePath(ndjson), get_download_mode(), Not compatible with STRSXP
when using function ctrLoadQueryIntoDb() of package ctrdata, while executing the step (3/3) Importing JSON records into database... The cause may be highly complex JSON data that need to be processed and that exceed the available stack.
Potential solutions increase the stack size and can be found here for
The occurrence of this issue cannot be prevented by code in package ctrdata because it depends on the complexity of data in a trial record as retrieved from one of the registers. With the suggestions above, users may configure their operating system to handle such complex data.
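Before and after changing the operating system configuration, the stack available to the R session can be inspected with the base R function Cstack_info(). This is a small sketch for checking the setting only; it does not change the limit.

```r
# Query the C stack information of the running R session
info <- Cstack_info()

# "size" is the stack size in bytes (may be NA on platforms where it is unknown)
stack_size <- info[["size"]]
message("C stack size available to R (bytes): ", stack_size)
```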
This issue is opened to collect and document issues with content from registers that has errors and where package ctrdata cannot mitigate the errors. Commands and output are in the next sections.
db <- nodbi::src_sqlite(dbname = "db.sqlite", collection = "errorTrials")
ctrdata::ctrLoadQueryIntoDb(queryterm = "2020-000876-40", register = "EUCTR", verbose = TRUE, con = db)
The messages to the user from the above command end with:
Error: lexical error: invalid char in json text.
{ "_id": "{"e521_timepoints_of_evaluation_o
(right here) ------^
The published trial record included in section "E.5.2 Secondary end point(s)" the following content, which since opening the issue has been corrected in the register:
- Toxicity
- Compliance to treatment
EudraCT Number: 2020-000876-40
Sponsor Protocol Number: IJB-REGINA-2020
ClinicalTrials.gov Number: NCT04503694
[...]
In the register, the field contains specific identifiers that are not expected in a field and that ctrdata uses for splitting downloaded information on a series of trials. This error in the register's content is not readily fixable by ctrdata.
Hi,
This is my first time using R and mongo, so please forgive me if there's something I've missed.
I get an error when importing into mongoDB: object 'ids' not found.
e.g.
ctrLoadQueryIntoDb(queryterm = q)
"Search should return 3 results in this example"
(2/3) Converting to JSON ...
(3/3) Importing JSON into mongoDB ...
Error in dbCTRLoadJSONFiles(dir = tempDir, mongo = mongo) :
object 'ids' not found
In addition: Warning message:
In system(euctr2json, intern = TRUE) :
running command 'cmd.exe /c c:\cygwin\bin\bash.exe
I've taken a look at the dbCTRLoadJSONFiles script to see if I could make any sense of the issue, but I'm too much of a novice at this point. https://rdrr.io/cran/ctrdata/src/R/main.R#sym-dbCTRLoadJSONFiles
Kind regards,
Joe
====================================================
I'm running under the following setup;
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] mongolite_2.0.1 ctrdata_0.18
loaded via a namespace (and not attached):
[1] httr_1.4.0 clipr_0.5.0 compiler_3.5.3 magrittr_1.5
[5] R6_2.4.0 tools_3.5.3 curl_3.3 Rcpp_1.0.1
[9] xml2_1.2.0 jsonlite_1.6 rvest_0.3.3 openssl_1.3
[13] askpass_1.1
===
CYGWIN_NT-10.0 DESKTOP-12J62K2 3.0.6(0.338/5/3) 2019-04-06 16:18 x86_64
mongodb-win32-x86_64-2008plus-ssl-4.0.8-signed
This happens if the URL is not from CTGOV or EUCTR, but the intention is to use the function for parameter checking so there should be a value returned for any input.
I have a specific requirement for my workflow: I would need to manually input the trial IDs to populate the database, instead of using the package's built-in functions to search the websites. I didn't find how to do this in the available documentation. Is it possible to do that with this package?
Consider the following example. With the old API, the filter is respected, whereas with the new one, all studies would be downloaded.
library("tibble")
library("ctrdata")
q1 <- tibble(`query-term` = "spons=Pfizer", `query-register` = "CTGOV")
ctrLoadQueryIntoDb(
queryterm = q1,
only.count = TRUE
)$n
# 5639
q2 <- tibble(`query-term` = "spons=Pfizer", `query-register` = "CTGOV2")
ctrLoadQueryIntoDb(
queryterm = q2,
only.count = TRUE
)$n
# 470145