cdkr's Introduction

rcdk: a chemistry library

The goal of cdkr is to provide easy access to the CDK cheminformatics library, combining the simplicity and power of R with CDK’s powerful, tested API.

Installation

rcdk releases are available on CRAN, and development versions are available on GitHub via devtools:

# releases
install.packages("rcdk")

# development releases of `cdkr` are also available on GitHub using devtools:
library(devtools)
install_github("https://github.com/CDK-R/rcdklibs")
install_github("https://github.com/CDK-R/cdkr", subdir="rcdk")

Building and Development

Information on building and developing the cdkr package is available elsewhere in the project's documentation. If you prefer the command line:

    cd /tmp/
    git clone git@github.com:CDK-R/rcdklibs.git
    R CMD INSTALL rcdklibs
    git clone git@github.com:CDK-R/cdkr.git
    cd cdkr/rcdkjar
    ant clean jar
    cd ../
    R CMD INSTALL rcdk

Before performing the install, you should have the following dependencies installed:

  • rJava
  • fingerprint
  • png
  • RUnit
  • Java JDK >= 1.8

For the png package, I have tested png-0.1-7.
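As a quick check before installing, the R package dependencies listed above can be tested from within R. This is only a minimal sketch: it covers the four R packages, not the Java JDK, and the variable names are illustrative.

```r
# R package dependencies listed above (the Java JDK must be checked separately)
deps <- c("rJava", "fingerprint", "png", "RUnit")

# system.file() returns "" for a package that is not installed
installed <- vapply(deps, function(p) nzchar(system.file(package = p)), logical(1))
missing_deps <- deps[!installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install from CRAN
} else {
  message("All R package dependencies are installed.")
}
```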

Some users have reported that rcdk methods (such as parse.smiles) are returning errors related to class not found or class version mismatch. This can happen when you are using a prepackaged version of rJava from CRAN and is caused by that package not finding the correct JRE home if you have multiple Java versions installed. In such a case, reinstalling rJava from sources appears to resolve this issue. See this discussion.

Installing Java

rcdk uses the CDK library, which requires Java JDK >= 1.8. This requirement must be satisfied before rcdk can be installed. You can check your Java version on the command line as follows:

> java -version
> java version "1.8.0"

If your version is older than 1.8, you may need to download and install a more recent version of Java. If you have multiple versions of Java installed, you may be using an older one. On Mac OS X, for example, the latest OS installs Java 1.6, and you will need to reconfigure your Java installation. You can try the following:

# set the java version
R CMD javareconf  # or ....
sudo R CMD javareconf

# reinstall rJava from source, from within R
install.packages('rJava', type="source")

Further information about R’s use of Java can be found here.

cdkr's People

Contributors

abhik1368, adelenelai, allaway, bachi55, egonw, jbuonagurio, jorainer, mohammedfcis, olivroy, paulboardman, rajarshi, rickhelmus, schymane, sneumann, zachcp

cdkr's Issues

fp.sim.matrix fplist2 issue

Hi there,

Thanks for a great R package. I'm encountering an issue with fp.sim.matrix() wherein fplist2 seems to be always interpreted as null even when I provide a second list of fingerprints.

Eg:

fp1 is a list of 8500 fingerprints
fp2 is a list of 2500 fingerprints

fp.sim <- fp.sim.matrix(fplist = fp1, fplist2 = fp2, method='tanimoto')

fp.sim ends up being an 8500x8500 matrix, rather than 8500x2500.

I am running R 3.4.0, fingerprint version 3.3.8.

Cheers,
Robert

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

Hi, I am using Python to invoke the R package via rpy2. This is actually a Django project. I can use cdkr correctly only the first time; when I refresh the website, I get this error message every time:

Error in .jcall(cn, "Ljava/lang/Object;", "get", as.integer(i - 1)) : 
  java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

I am not good at R, so I would like to know what kind of situation may cause the size to be zero (cdkr package source code, line 68). Thanks!

Segfault when loading rcdk

Using r-devel version I get a segfault when loading rcdk.
See also the CRAN checks.

sessionInfo()
R Under development (unstable) (2016-11-14 r71659)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] clisymbols_1.0.0 prompt_1.0.0     gitty_1.0.0     

loaded via a namespace (and not attached):
[1] compiler_3.4.0 parr_3.3.0     whisker_0.3-2  crayon_1.3.2   memuse_3.0-1 

Note that there are no problems under current R:

❯ library(rcdk)
Loading required package: fingerprint

❯ sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=de_DE.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=de_DE.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=de_DE.UTF-8          LC_NAME=de_DE.UTF-8          
 [9] LC_ADDRESS=de_DE.UTF-8        LC_TELEPHONE=de_DE.UTF-8     
[11] LC_MEASUREMENT=de_DE.UTF-8    LC_IDENTIFICATION=de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] rcdk_3.3.6        fingerprint_3.5.4 clisymbols_1.0.0  prompt_1.0.0     
[5] gitty_1.0.0      

loaded via a namespace (and not attached):
 [1] parr_3.3.0      parallel_3.3.2  whisker_0.3-2   crayon_1.3.2   
 [5] rcdklibs_1.5.13 memuse_3.0-1    iterators_1.0.8 itertools_0.1-3
 [9] rJava_0.9-8     png_0.1-7      

I don't see this when running on Travis-CI, which runs R-devel on Ubuntu 12.04 LTS.
Maybe it is something with my installation... Investigating...
Maybe a different JDK version:
openjdk version "1.8.0_111"
vs
oraclejdk8

Nope. Segfault also with oracle.

export .get.desc.values()

Rajarshi,

I am using .get.desc.values() in a derived package, but cannot access the method anymore in the new R NAMESPACE world. Can you perhaps make the method public, or is there an alternative exported method that I should be using instead?

Egon

R CMD check Warning in rcdk

* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'parse.smiles'
  ‘kekulise’
Documented arguments not in \usage in documentation object 'parse.smiles':
  ‘kekulize’

Patch coming.

More robust iload.molecules() in case of NoSuchAtomTypeException

Hi,

another problem occurs for other exceptions:

Error in .jcall(.jnew("org/openscience/cdk/ChemObject"), "Lorg/openscience/cdk/interfaces/IChemObjectBuilder;", :
org.openscience.cdk.exception.NoSuchAtomTypeException: The AtomType Se.2 could not be found

Can these be caught with rJava ? And an option added to iload.molecules()
to just jump over them, because we can't do anything with them anyway ?

Yours,
Steffen

get.mcs returns OutOfMemoryError

Hello,

For a specific set of molecules I cannot seem to calculate the mcs:

mol1 <- rcdk::parse.smiles("CC(=C)[C@@H]1CC[C@]2([C@H]1[C@H]3CC[C@H]4[C@]([C@@]3(CC2)C)(CC[C@H]([C@]4(C)CCOC(=O)C)C(C)(C)COC(=O)C)C)COC(=O)C")[[1]]
mol2 <- rcdk::parse.smiles("C[C@H]1[C@H](C2CC[C@@]3(C(=C2[C@]([C@@H]1C)(C)O)C=CC4[C@]3(CCC5[C@@]4(CCC(C5(C)C)O[C@H]6[C@@H]([C@H]([C@H](CO6)O)O)O)C)C)C)C")[[1]]
mcs <- rcdk::get.mcs(mol1, mol2)
Error in .jcall("org.guha.rcdk.util.Misc", "Lorg/openscience/cdk/interfaces/IAtomContainer;",  : 
  java.lang.OutOfMemoryError: Java heap space

The calculation takes a lot of time and usually fails with the above error. Once in a while it may succeed though. Any thoughts? I am using the latest version of rcdk and rcdklibs from CRAN.
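One thing that may be worth trying, assuming the default JVM heap size is the limiting factor (which the report does not confirm), is to raise the maximum heap before the JVM starts. rJava reads the `java.parameters` option when it initialises the JVM, so the option has to be set in a fresh R session before rcdk or rJava is loaded. The helper name `add_java_parameter` and the value `-Xmx4g` are illustrative choices, not part of rcdk.

```r
# Hypothetical helper: append a JVM flag to java.parameters without duplicating it
add_java_parameter <- function(flag) {
  current <- getOption("java.parameters")
  options(java.parameters = unique(c(current, flag)))
  invisible(getOption("java.parameters"))
}

# 4 GB is an arbitrary illustrative value; adjust to the memory available
add_java_parameter("-Xmx4g")

# Only now load the package, so the JVM is started with the new limit:
# library(rcdk)
```

A larger heap does not make the MCS search itself any faster; it only gives the search more room before it runs out of memory.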

fingerprint::distance

The first execution returns the distance; the second time around it produces a segmentation fault:

library('rcdk')
library('fingerprint')
a <- parse.smiles('CCC')
b <- parse.smiles('CCCO')
af <- get.fingerprint(a[[1]])
bf <- get.fingerprint(b[[1]])
fingerprint::distance(af, bf)
[1] 0.4285714
fingerprint::distance(af, bf)
Segmentation fault (core dumped)

This happens even if I use a new set of feature vectors.

depiction with kekulise=FALSE

The new depiction with kekulise=TRUE looks awesome, but with kekulise=FALSE rather bizarre.
In earlier versions this would have been the aromatic delocalised ring representation - any reason for the change? Should I "block" the kekulise=FALSE option? I'd rather keep it in for backwards compatibility...

smiles <- "OS(=O)(=O)c1ccc(cc1)C(CC(=O)O)CC(=O)O"
plot.new()
plot.window(xlim=c(0,200), ylim=c(0,100))
mol <- parse.smiles(smiles,kekulise=TRUE)[[1]]
img <- view.image.2d(mol)
rasterImage(img, 0,0, 100,100)
mol <- parse.smiles(smiles,kekulise=FALSE)[[1]]
img <- view.image.2d(mol)
rasterImage(img, 100,0, 200,100)

image

Errors with BasicGroupCountDescriptor

Source: https://www.biostars.org/p/100384/

Reproducible with rcdk 3.2.3.2 with:

#! /usr/bin/Rscript
require(rcdk)
drug.mols <- load.molecules(molfiles="./CID_175540.sdf")
descNames <- unique(unlist(sapply(get.desc.categories(), get.desc.names)))
drug.descs <- eval.desc(drug.mols, descNames, verbose=T)

The error you get:

Processing  BasicGroupCountDescriptor 
Error in if (is.na(dval)) return(NA) : argument is of length zero
In addition: Warning message:
In is.na(dval) : is.na() applied to non-(list or vector) of type 'NULL'

Error in get.assay() function

Hi, I was trying to follow the YouTube tutorial video on how to retrieve SMILES from PubChem. Using the following script,

library(rpubchem)
library(rcdk)
library(ggplot2)
aids <- find.assay.id("dihydroorotate+dehyogenase+and+Malaria")
aidsdata <- data.frame()
for (i in 1:50){
  assay <- get.assay(aids[i], quiet=TRUE)
  assaydata <- assay[,c("PUBCHEM.CID", "PUBCHEM.ACTIVITY.OUTCOME")]
  aidsdata <- rbind(aidsdata, assaydata)
}

I receive the following error.

In addition: Warning message:
In open.connection(file, "rt") :
  cannot open: HTTP status was '400 Bad Request'

Here is the sessionInfo():

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_1.0.1     rcdk_3.3.2        fingerprint_3.5.2 rpubchem_1.5.0.2  webchem_0.0.1     dplyr_0.4.1      

loaded via a namespace (and not attached):
 [1] assertthat_0.1   bitops_1.0-6     car_2.0-25       colorspace_1.2-6 DBI_0.3.1        digest_0.6.8     grid_3.1.2      
 [8] gtable_0.1.2     iterators_1.0.7  lattice_0.20-31  lazyeval_0.1.10  lme4_1.1-7       magrittr_1.5     MASS_7.3-40     
[15] Matrix_1.2-0     mgcv_1.8-6       minqa_1.2.4      munsell_0.4.2    nlme_3.1-120     nloptr_1.0.4     nnet_7.3-9      
[22] parallel_3.1.2   pbkrtest_0.4-2   plyr_1.8.2       png_0.1-7        proto_0.3-10     quantreg_5.11    rcdklibs_1.5.8.4
[29] Rcpp_0.11.5      RCurl_1.95-4.6   reshape2_1.4.1   rJava_0.9-6      RJSONIO_1.3-0    scales_0.2.4     SparseM_1.6     
[36] splines_3.1.2    stringr_0.6.2    tools_3.1.2      XML_3.98-1.1    

Similarity methods for count fingerprints

Update the fingerprint package to support similarity calculations for count fingerprints (that actually use the counts). See the Tanimoto class in the CDK sources. Also, the one named with the number 2 is the one that performed the best for me in virtual screening.

Verbose installation rcdk_3.4.8

I'm getting a lot of output trying a fresh install of rcdk_v3.4.8 from github that seems to persist for all subsequent installations as well (and also appears for rcdk_libs)

  • installing source package 'rcdk' ...
    [...]
    *** arch - i386
    0 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IAtom' with 'Atom' implementation
    0 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IPseudoAtom' with 'PseudoAtom' implementation
    [... 38 lines total]

*** arch - x64
0 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IAtom' with 'Atom' implementation
15 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IPseudoAtom' with 'PseudoAtom' implementation
[... another 38 or so lines ... you get the picture ]

Then ...

  • installing source package 'ReSOLUTION' ...
    [...]
    0 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IAtom' with 'Atom' implementation
    16 [main] DEBUG org.openscience.cdk.DynamicFactory - registered 'IPseudoAtom' with 'PseudoAtom' implementation
    [... again another 36 lines or so ..., both times, for i386 and x64 ... ]

Is there a way to make it quiet? :-) Thanks!

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] ReSOLUTION_0.1.5 nontarget_1.9 mgcv_1.8-20 nlme_3.1-131 nontargetData_1.1
[6] mzR_2.10.0 readxl_1.1.0 OrgMassSpecR_0.5-3 RChemMass_0.1.12 enviPat_2.2
[11] rsvg_1.2 curl_3.2 rcdk_3.4.8 rcdklibs_2.2 rJava_0.9-9
[16] RMassBank_2.4.0 Rcpp_0.12.13 devtools_1.13.3

loaded via a namespace (and not attached):
[1] lattice_0.20-35 colorspace_1.3-2 stats4_3.4.2 fingerprint_3.5.7
[5] yaml_2.1.14 vsn_3.44.0 XML_3.98-1.9 rlang_0.1.2
[9] withr_2.0.0 MSnbase_2.2.0 BiocParallel_1.10.1 affy_1.54.0
[13] BiocGenerics_0.22.1 affyio_1.46.0 foreach_1.4.3 plyr_1.8.4
[17] mzID_1.14.0 ProtGenerics_1.8.0 zlibbioc_1.22.0 cellranger_1.1.0
[21] munsell_0.4.3 pcaMethods_1.68.0 gtable_0.2.0 codetools_0.2-15
[25] memoise_1.1.0 Biobase_2.36.2 knitr_1.17 IRanges_2.10.5
[29] doParallel_1.0.11 BiocInstaller_1.26.1 parallel_3.4.2 itertools_0.1-3
[33] preprocessCore_1.38.1 scales_0.5.0 limma_3.32.10 S4Vectors_0.14.7
[37] impute_1.50.1 rjson_0.2.15 ggplot2_2.2.1 png_0.1-7
[41] digest_0.6.12 tools_3.4.2 bitops_1.0-6 lazyeval_0.2.0
[45] RCurl_1.95-4.8 tibble_1.3.4 Matrix_1.2-11 httr_1.3.1
[49] iterators_1.0.9 R6_2.2.2 MALDIquant_1.16.4 git2r_0.19.0
[53] compiler_3.4.2

eval.desc fails with parLapply

Following the example for "eval.desc"

smiles <- c('CCC', 'c1ccccc1', 'CC(=O)C')
mols <- sapply(smiles, parse.smiles)

dnames <- get.desc.names('topological')
descs <- eval.desc(mols, dnames, verbose=TRUE)

you can do

lapply(dnames,function(x){ require(rcdk); eval.desc(mols, x, verbose=FALSE)})

which works but this

cl <- makeCluster(detectCores()-1)
clusterExport(cl, "mols")
rcdk_desc <- parLapply(cl,dnames,function(x){ require(rcdk); eval.desc(mols, x, verbose=FALSE)})
stopCluster(cl)

fails with

Error in checkForRemoteErrors(val) : 
  3 nodes produced errors; first error: java.lang.NullPointerException

Applying over "mols" instead of "dnames" does not produce an error but all are NA.

I realize this is probably some java/parallel interaction but do you have any idea of a workaround?
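A possible workaround, sketched under the assumption that the failure comes from the molecule objects being Java references that are only valid inside the JVM of the R process that created them: export the SMILES strings instead and parse them inside each worker. The function name `desc_for_smiles` and the cluster size of 2 are illustrative.

```r
library(parallel)

smiles <- c("CCC", "c1ccccc1", "CC(=O)C")

# Worker function: parse the SMILES and evaluate descriptors inside the worker,
# so every Java object is created in the worker's own JVM
desc_for_smiles <- function(smi) {
  library(rcdk)
  mols <- parse.smiles(smi)
  eval.desc(mols, get.desc.names("topological"), verbose = FALSE)
}

# Only run the cluster part when rcdk is actually installed
if (requireNamespace("rcdk", quietly = TRUE)) {
  cl <- makeCluster(2)
  rcdk_desc <- parLapply(cl, smiles, desc_for_smiles)
  stopCluster(cl)
}
```

Plain character vectors serialise cleanly between processes, which is why the SMILES strings rather than the parsed molecules are sent to the workers.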

loading rinchi first causes errors

Loading rinchi first causes the following error. If I load rcdk first all works.

library(rinchi)
library(rcdk)
Loading required package: rcdklibs
Loading required package: rJava
Warning messages:
1: package ‘rcdk’ was built under R version 3.4.3 
2: package ‘rcdklibs’ was built under R version 3.4.3 
> m <- parse.smiles('C1C=CCC1N(C)c1ccccc1')[[1]]
> get.smiles(m)
Error in .jnew("org/openscience/cdk/smiles/SmilesGenerator", flavor) : 
  java.lang.NoSuchMethodError: <init>

parse.smiles doesn't stop or give warnings for invalid SMILES

I ran parse.smiles on a long list (14,000+) of SMILES, some of which I later discovered were actually invalid. However, the function doesn't stop or give warnings as I'd have expected, and it took a while to trace which SMILES were problematic (so that I can remove/fix them).

Small snippet to reproduce:

smi <- c('CCC', 'c1ccccc1',  'N/A', 'C(C)(C=O)C(CCNC)C1CC1C(=O)', 'foo')
parse.smiles(smi)

Output:

$CCC
[1] "Java-Object{AtomContainer(171493374, #A:3, Atom(1876682596, S:C, H:3, AtomType(1876682596, FC:0, ..."

$c1ccccc1
[1] "Java-Object{AtomContainer(806511723, #A:6, Atom(1250442005, S:C, H:1, AtomType(1250442005, FC:0, ..."

$`N/A`
NULL

$`C(C)(C=O)C(CCNC)C1CC1C(=O)`
[1] "Java-Object{AtomContainer(2032079962, #A:14, Atom(953082513, S:C, H:1, AtomType(953082513, FC:0..."

$foo
NULL
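Since the output above shows that failed parses come back as NULL entries named by the input string, the problematic SMILES can be located after the fact. The sketch below uses a stand-in list with the same shape as the output shown, so it runs without rcdk; with the real package, `mols` would be the result of `parse.smiles(smi)`.

```r
# Stand-in for the list returned by parse.smiles(smi): NULL marks a failed parse
mols <- list(CCC = "mol", c1ccccc1 = "mol", `N/A` = NULL, foo = NULL)

# Flag the entries that did not parse and recover the offending input strings
failed <- vapply(mols, is.null, logical(1))
failed_smiles <- names(mols)[failed]

if (any(failed)) {
  warning("Could not parse: ", paste(failed_smiles, collapse = ", "))
}

# Keep only the successfully parsed molecules
mols_ok <- mols[!failed]
```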

NPE if SDF file has R# atom

Hi,

I am trying to iterate over ChEBI using the code below on a current rcdk git snapshot,
and want to calculate fingerprints. The iteration fails as soon as rcdk hits a compound
with an R# "atom", e.g. CHEBI:15489 because the hasNext(moliter) fails with an NPE:

Error in .jcall(sreader, "Z", "hasNext") : java.lang.NullPointerException

I have no problem if the molecule is a NULL, but iteration should be able to continue
to the end of the file.

Yours,
Steffen

P.S. That github Markdown for code looks cool!

library(rcdk)
sessionInfo()

chebifile <- "ChEBI_complete.sdf"

# iterate over a large file
moliter <- iload.molecules(chebifile, type="sdf")
i <- 1
chebifp <- c(new("fingerprint"))

while(hasNext(moliter)) {
    mol <- nextElem(moliter)
}

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.1.5        iterators_1.0.3   png_0.1-3         fingerprint_3.4.6
[5] rcdklibs_1.4.5    rJava_0.9-2      

rpubchem bug

Hi,

thanks for your rcdk packages. I've just started trying them and have encountered the following bug in rpubchem.

x  <- get.cid(3197)
 x$CanonicalSmile
>[1] "C51H64N12O12S2"

Looks like everything after the IUPACName is off. Somewhere or another the XML parsing is off. Looks like maybe line 280.

Just built the package from source and it worked fine. This was probably due to #17. Maybe this can get bumped to CRAN?

zachcp

unexpected error when file is not found

mols <- load.molecules(molfiles=c("thisFileDoesNotExist.sdf"))

does not produce a clear failure message, but this error instead:

Error in if (!file.exists(f) && !grep("http://", f)) stop(paste(f, ": Does not exist", :
missing value where TRUE/FALSE needed
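The error message points at the condition itself: `grep()` returns `integer(0)` when nothing matches, so `!grep("http://", f)` has length zero and the `&&` evaluates to `NA`. A minimal sketch of a more robust check follows; `check_molfile` is a hypothetical helper for illustration, not the actual patch applied to rcdk.

```r
# Why the original condition breaks: negating integer(0) gives logical(0)
length(!grep("http://", "thisFileDoesNotExist.sdf"))

# Hypothetical helper: grepl() always returns TRUE or FALSE for each input
check_molfile <- function(f) {
  is_url <- grepl("^https?://", f)
  if (!file.exists(f) && !is_url) {
    stop(f, ": Does not exist")
  }
  invisible(TRUE)
}

# A missing local file now fails with a readable message
res <- try(check_molfile("thisFileDoesNotExist.sdf"), silent = TRUE)
```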

Problems with isotope pattern of charged molecules

Dear rcdk-Developers,

I'm using rcdk for prediction of isotope patterns for mass spectrometric analysis. I noticed some problems when working with charged formulas. The function get.isotopes.pattern returns the masses without correction for charge.

Please find an example for the [M+Na]+ adduct of glucose below.

Best regards,

Michael

library(rcdk)

glucose <- "C6H12O6"
glucoseFormula <- get.formula(glucose, charge = 0)

sodium <- "Na"
sodiumFormula <- get.formula(sodium, charge = 1)

glucoseFormula@mass + sodiumFormula@mass

glucoseSodium <- "C6H12O6Na"
glucoseSodiumFormula <- get.formula(glucoseSodium, charge = 1)

glucoseSodiumFormula@mass

get.isotopes.pattern(glucoseSodiumFormula, minAbund = 0.001)[1]
get.formula(glucoseSodium, charge = 0)@mass

The results are:

glucoseFormula@mass + sodiumFormula@mass
[1] 203.0526

glucoseSodiumFormula@mass
[1] 203.0526

get.isotopes.pattern(glucoseSodiumFormula, minAbund = 0.001)[1]
[1] 203.0532

get.formula(glucoseSodium, charge = 0)@mass
[1] 203.0532
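The difference between the two values (about 0.0006) is consistent with the mass of one electron. The arithmetic below is a sketch using standard monoisotopic masses and the electron mass in Da; these constants are supplied here for illustration and are not taken from rcdk.

```r
# Monoisotopic masses (Da) of the most abundant isotopes, and the electron mass
mass_C  <- 12.0
mass_H  <- 1.00782503
mass_O  <- 15.99491462
mass_Na <- 22.98976928
electron_mass <- 0.00054858

# Neutral C6H12O6Na
neutral <- 6 * mass_C + 12 * mass_H + 6 * mass_O + mass_Na

# A +1 charge means one electron has been removed
cation <- neutral - 1 * electron_mass

round(c(neutral = neutral, cation = cation), 4)
```

Under these assumptions the neutral sum rounds to 203.0532 and the cation to 203.0526, matching the charge = 0 and charge = 1 results of get.formula respectively, while get.isotopes.pattern returns the uncorrected value.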

Dual display of aromaticity after do.typing()

Hi,

library(rcdk)
m <- parse.smiles("c1ccccc1")[[1]]
view.molecule.2d(m)
do.typing(m)
view.molecule.2d(m)

The first display is kinda correct:
screenshot from 2014-02-12 22 49 47

while after typing I get dual aromaticity depiction:
screenshot from 2014-02-12 22 51 11

It would be great to have just one of them. (Or even a choice :-)
And I love the github issue tracker. Very well done.

Yours,
Steffen

Unable to install rcdk on macOS with JDK 1.9

I have problems loading the rcdk package on macOS High Sierra with JDK 1.9. The package fails to install with the error:

Error: package or namespace load failed for ‘rcdk’:
.onLoad failed in loadNamespace() for 'rcdk', details:
call: if (isjavagood == FALSE) {
error: missing value where TRUE/FALSE needed
Error: loading failed

Poor error message for invalid smiles in rinchi

Hi,
we get a not so informative error message when passing crap into get.inchi()

> get.inchi("x")
Error in .jcall("org/guha/rcdk/util/Misc", "S", "getInChi", molecule,  : 
  method getInChi with signature ()Ljava/lang/String; not found

It would be better to return a clearer "invalid SMILES" error message to R.
Yours, Steffen

rcdkjar fails to compile with CDK nightly, missing IsotopeFactory

Hi, I am trying to compile with an updated cdk.jar.
While it works with the included cdk-1.5.2, it does not work
with the download of cdk-1.5.5.jar from sf.net nor with the nightly.
This is with javac 1.7.0_51 on Linux.

    [javac] /vol/R/rguha/cdkr/rcdkjar/src/org/guha/rcdk/util/Misc.java:188: error: cannot find symbol
    [javac]             IsotopeFactory ifac = IsotopeFactory.getInstance(DefaultChemObjectBuilder.getInstance());

Not displaying charges

Hi,
I am back to using rCDK for some stuff, and there are some display issues
in the current release. Here are some test cases, most by Emma:

library(rcdk)
m <- parse.smiles("[CH2+]")[[1]]
get.total.charge(m)
view.molecule.2d(m)

As the screenshots show, there is no + charge:

screenshot from 2014-02-12 22 42 00

rcdklibs_1.5.4 , rcdk_3.2.4

get.mcs yields NullPointerException when there is no overlap

Hello,

As the title says.

Example:

sm1 <- "C1=CC=CC=C1"
sm2 <- "[O-]P(=O)([O-])[O-]"
rcdk::get.mcs(rcdk::parse.smiles(sm1)[[1]], rcdk::parse.smiles(sm2)[[1]])

Gives

Error in .jcall("org.guha.rcdk.util.Misc", "Lorg/openscience/cdk/interfaces/IAtomContainer;",  : 
  java.lang.NullPointerException

Perhaps it should return an empty molecule (if that is possible...)?

Regards,
Rick

get.exact.mass(): NullPointerException

Hi, I can confirm @schymane's problem with rcdk-3.4.9 (see MassBank/RMassBank#199). I haven't checked in detail yet, but running the example from the get.exact.mass manpage does not work:

       m <- parse.smiles('c1ccccc1')[[1]]
     
       ## Need to configure the molecule
       do.aromaticity(m)
       do.typing(m)
       do.isotopes(m)
     
       get.exact.mass(m)

>        get.exact.mass(m)
[1] "Java-Object{java.lang.NullPointerException}"
Error in get.exact.mass(m) : 
  Couldn't get exact mass. Maybe you have not performed aromaticity, atom type or isotope configuration?

So either an issue with the rcdk code, my environment or the documentation.

Yours,
Steffen

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.4.9   rcdklibs_2.2 rJava_0.9-10

loaded via a namespace (and not attached):
[1] compiler_3.4.4    tools_3.4.4       parallel_3.4.4    fingerprint_3.5.7
[5] iterators_1.0.10  itertools_0.1-3   png_0.1-7   

and

java --version
openjdk 10.0.1 2018-04-17
OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 10.0.1+10-Ubuntu-3ubuntu1, mixed mode)

Need to Update for Java 11

@rajarshi ,

I received an email from CRAN about fixing an rcdk version check before the release of Java 11. I didn't submit a fix in time and noticed that, as of this morning, I have been removed as maintainer of rcdk (see https://cran.r-project.org/web/packages/rcdk/rcdk.pdf)

I will look into fixing the Java version test and patching an updated version, but I don't know if I will be able to submit the patch. The build error and requested correction are included below. A fix will need to either:

  1. Fix the Java11 issue but avoid your recent updates which require a more recent CDK
  2. Update the rCDKlibs with a new release and then add the Java 11 Patch to master.

Any preferences on which route to take?

zach cp

* testing if installed package can be loaded
Warning in fun(libname, pkgname) : NAs introduced by coercion
Error: package or namespace load failed for ‘rcdk’:
  .onLoad failed in loadNamespace() for 'rcdk', details:
   call: if (isjavagood == FALSE) {
   error: missing value where TRUE/FALSE needed

jversion evaluates as "11-ea+22".

The code in 'Writing R Extensions' does work portably,

Please correct ASAP and before Sep 25 (the currently expected release
date for Java 11).
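The "NAs introduced by coercion" warning suggests the version string is being converted to a number directly, which fails for strings such as "11-ea+22". A minimal sketch of a more tolerant parser follows; `java_major_version` is a hypothetical helper, not the check actually used in rcdk, and it assumes the string comes from the `java.version` system property.

```r
# Direct coercion is what breaks: this yields NA (with a warning)
suppressWarnings(as.numeric("11-ea+22"))

# Hypothetical helper: extract the Java major version from a version string
java_major_version <- function(jversion) {
  # Drop stray quotes and whitespace
  v <- gsub('["[:space:]]', "", jversion)
  # Keep only the leading dotted numeric part, e.g. "1.8.0" or "11"
  parts <- regmatches(v, regexpr("^[0-9]+(\\.[0-9]+)*", v))
  if (length(parts) == 0) return(NA_integer_)
  nums <- as.integer(strsplit(parts, ".", fixed = TRUE)[[1]])
  # Old scheme "1.x" (e.g. 1.8) versus new scheme "x" (e.g. 9, 10, 11)
  if (nums[1] == 1L && length(nums) >= 2) nums[2] else nums[1]
}

vapply(c("1.8.0_111", "10.0.1", "11-ea+22"), java_major_version, integer(1))
```

With a helper like this, the comparison against the minimum supported version can be guarded with `is.na()` so that an unrecognised string produces a clear message rather than a failed `if`.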

Issues with function get.desc.categories()

Hi Rajarshi,

I found some errors this morning while running my code built on the rcdk package. Zach mentioned you are doing a major update of the packages. I dug into my code and it seems that "get.desc.categories()" generates errors like the one shown below:

Error in .jcall("org/guha/rcdk/descriptors/DescriptorUtilities", "[Ljava/lang/String;", :
java.lang.UnsupportedClassVersionError: org/guha/rcdk/descriptors/DescriptorUtilities : Unsupported major.minor version 52.0

I wonder if this is something you guys noticed?

Thank you,

Tao
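For reference, "Unsupported major.minor version 52.0" means the class was compiled for Java 8 but is being loaded by an older JVM. Class file major versions map to Java releases by a fixed offset of 44, which the small sketch below encodes; the function name is illustrative.

```r
# Map a class file major version (as reported in UnsupportedClassVersionError)
# to the Java release that produces it: 50 -> 6, 51 -> 7, 52 -> 8, 55 -> 11
class_major_to_java <- function(major) {
  as.integer(major) - 44L
}

class_major_to_java(52)
```

So an error mentioning 52.0 indicates that the running JVM is Java 7 or older, and switching R to a Java 8 or newer JVM (for example with R CMD javareconf, as described in the installation notes) would be the thing to try.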

problem with load.molecules in rcdk 3.2.7

load.molecules(molfiles="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6442441&disopt=SaveSDF")

Error in as.character.default(X[[1L]], ...) :
no method for coercing this S4 class to a vector

The same error occurs for a local copy of this file.

However, in version 3.2.3.2 the error is the one described at https://www.biostars.org/p/100384/ (in my comments)

for CID: 6442441, http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6442441&disopt=SaveSDF

descriptors: "org.openscience.cdk.qsar.descriptors.molecular.RuleOfFiveDescriptor"; "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor"

#! /usr/bin/Rscript

require(rcdk)
drug.mols <- load.molecules(molfiles="./CID_6442441.sdf")
drug.descs <- eval.desc(drug.mols, "org.openscience.cdk.qsar.descriptors.molecular.RuleOfFiveDescriptor", verbose=T)

drug.descs <- eval.desc(drug.mols, "org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor", verbose=T)

Error in .jcall(b, "Lorg/openscience/cdk/qsar/DescriptorValue;", "calculate", :
java.lang.NullPointerException

Thank you. rcdk is very helpful to me.

Issues with rcdk.jar

I am having issues with the get.fingerprint function. As far as I can tell, the problem comes from the call to the get.property method. I built the package from the latest source on master. See the following snippet:

library(rcdk)
a = parse.smiles('CCCO')
f = get.fingerprint(a[[1]])
Error in .jcall("org/guha/rcdk/util/Misc", "Ljava/lang/Object;", "getProperty", :
RcallMethod: cannot determine object class

Line 43 in props.R has following call:
value <- .jcall('org/guha/rcdk/util/Misc', 'Ljava/lang/Object;', 'getProperty',
molecule, as.character(key), check=FALSE)
jClassPath seems to be okay and has path for rcdk.jar.

Is the error because of difference in JRE version?

.jnew('org.guha.rcdk.util.Misc')
Error in .jnew("org.guha.rcdk.util.Misc") :
java.lang.UnsupportedClassVersionError: org/guha/rcdk/util/Misc : Unsupported major.minor version 52.0

Though it seems that cdk-2.0.jar is also built using Java 1.8 and seems to work okay on my system.

library(rJava)
.jinit(classpath = "/home/varun/R/x86_64-pc-linux-gnu-library/3.4/rcdklibs/cont/cdk-2.0.jar")
.jclassPath()
[1] "/home/varun/R/x86_64-pc-linux-gnu-library/3.4/rJava/java"
[2] "/home/varun/R/x86_64-pc-linux-gnu-library/3.4/rcdklibs/cont/cdk-2.0.jar"
.jcall("org.openscience.cdk.CDK", "S", "getVersion")
[1] "2.0"

CDKR fails to compile via command line

Hei,

I cloned the latest version of the repository and tried to compile the library using the command line:

R CMD build rcdklibs
R CMD INSTALL rcdklibs_*gz
cd rcdkjar
ant clean jar
cd ../
R CMD build rcdk # <-- Produces error
R CMD INSTALL rcdk_*gz

The R CMD build rcdk command fails with the following error-message:

creating vignettes ... ERROR
Quitting from lines 110-115 (molform.Rmd) 
Error: processing vignette 'molform.Rmd' failed with diagnostics:
Elements must be 3-tuples or 4-tuples

I fixed the molform.Rmd file by changing the lines 110-111:

mit <- generate.formula.iter(100, charge=0, window=0.1,
                             elements=list(C=c(0,50), H=c(0,50), N=c(0,50)))

to:

mit <- generate.formula.iter(100, charge=0, window=0.1,
                             elements=list(c("C",0,50), c("H",0,50), c("N",0,50)))

to be compatible with the generate.formula.iter function definition in rcdk/R/formula.R.

However, I receive another compilation error, that I could not fix now:

creating vignettes ... ERROR
Quitting from lines 110-115 (molform.Rmd) 
Error: processing vignette 'molform.Rmd' failed with diagnostics:
method getString with signature (Lorg/openscience/cdk/interfaces/IMolecularFormula;ZZ)Ljava/lang/String; not found

This seems to be related to the command in line 245-246 in file rcdk/R/formula.R:

return(.jcall("org/openscience/cdk/tools/manipulator/MolecularFormulaManipulator",
      "S", "getString", formula, FALSE, TRUE))

Can you help with that?

Best regards,

Eric

Unit test failure for MCS

  1 Test Suite : 
  rcdk rcdk Unit Tests - 27 test functions, 0 errors, 1 failure
  FAILURE in test.get.smiles2: Error in checkEquals("N([CH2])CC", get.smiles(mcs)) : 1 string mismatch
checkEquals("N([CH2])CC", get.smiles(mcs))

In the new code, the mcs is returned as "[CH2]NCC", which looks identical to "N([CH2])CC". Patch for test coming.
Yours, Steffen

Unable to use view.molecule.2d and view.image.2d

I've been struggling to view the 2d structure of any molecule. I tried with R version 3.3.1, version 3.3.2 and R version 3.4.2

Latest sessionInfo():
R version 3.4.2 (2017-09-28), png_0.1-7 , fingerprint_3.5.6, rcdk_3.4.3, rcdklibs_2.0, rJava_0.9-9
root@eb2bc2d30d3d:/# javac -version
javac 1.8.0_121

curcumin = parse.smiles("O=C(\\C=C\\c1ccc(O)c(OC)c1)CC(=O)\\C=C\\c2cc(OC)c(O)cc2")[[1]]
dep <- get.depictor(width = 200, height = 200, zoom = 1.3, style = "cow",
                    annotate = "off", abbr = "on", suppressh = TRUE,
                    showTitle = FALSE, smaLimit = 100, sma = NULL)

imp <- view.image.2d(curcumin, dep)
Error in .jcall(mi, "[B", "getBytes", as.integer(depictor$getWidth()), :
java.lang.NoClassDefFoundError: Could not initialize class sun.awt.X11GraphicsEnvironment

view.image.2d(curcumin, dep)
Error in .jcall(mi, "[B", "getBytes", as.integer(depictor$getWidth()), :
java.lang.NoClassDefFoundError: Could not initialize class sun.awt.X11GraphicsEnvironment

view.molecule.2d(curcumin, dep)
Error in .jnew("org/guha/rcdk/view/ViewMolecule2D", molecule, as.integer(width), :
java.lang.NoSuchMethodError:
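The X11GraphicsEnvironment error typically appears when Java tries to open a display that does not exist, which would fit the container prompt shown above. A possible workaround for image rendering, assuming no display is available, is to start the JVM in headless mode. This is a sketch, not a confirmed fix for this report; the option must be set in a fresh R session before rJava or rcdk is loaded.

```r
# Ask the JVM to run without a display; keep any flags that are already set
options(java.parameters = c(getOption("java.parameters"),
                            "-Djava.awt.headless=true"))

# Only afterwards load the package, so the JVM starts with the flag:
# library(rcdk)
```

Headless mode can only help functions that render to an image, such as view.image.2d; view.molecule.2d opens a window and still needs a working display.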

view.molecule.2d: 'cellx' not found

Hi,

When running the example from #14, I get 'cellx' not found:

> library(rcdk)
Loading required package: fingerprint
> m <- parse.smiles("[CH2+]")[[1]]
> get.total.charge(m)
[1] 1
> view.molecule.2d(m)
Error in .jnew("org/guha/rcdk/view/ViewMolecule2D", molecule, as.integer(cellx),  : 
  object 'cellx' not found
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=de_DE.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=de_DE.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=de_DE.UTF-8          LC_NAME=de_DE.UTF-8          
 [9] LC_ADDRESS=de_DE.UTF-8        LC_TELEPHONE=de_DE.UTF-8     
[11] LC_MEASUREMENT=de_DE.UTF-8    LC_IDENTIFICATION=de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.4.1        fingerprint_3.5.4

loaded via a namespace (and not attached):
[1] parallel_3.2.3  rcdklibs_1.5.14 iterators_1.0.8 itertools_0.1-3
[5] rJava_0.9-8     png_0.1-7 

Error in molecule visualization after rcdk 3.4.5 update

Hi,
I updated rcdk today to the newest version and view.molecule.2d() stopped working. I tried to execute the chunk from the vignette and got the following error:

library(rcdk)
Loading required package: rcdklibs
Loading required package: rJava
smiles <- c('CCC', 'CCN', 'CCN(C)(C)',
 'c1ccccc1Cc1ccccc1','C1CCC1CC(CN(C)(C))CC(=O)CC')
mols <- parse.smiles(smiles)
view.molecule.2d(mols[[1]])
Error in .jnew("org/guha/rcdk/view/ViewMolecule2D", molecule, as.integer(width),  : 
  java.lang.NoSuchMethodError: <init>

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.4.5   rcdklibs_2.0 rJava_0.9-9 

loaded via a namespace (and not attached):
[1] compiler_3.4.3    parallel_3.4.3    fingerprint_3.5.6 tools_3.4.3       iterators_1.0.9  
[6] itertools_0.1-3   png_0.1-7    

My Java build is 1.8.0_151-b12.

Include hydrogens in plots

Hydrogens are not included in plots produced by the view.image.2d or view.molecule.2d functions. This is confusing when plotting structures containing, for example, hydroxyl groups. I suspect this is caused by the following default assignment in these functions:

molecule = AtomContainerManipulator.removeHydrogens(molecule)

This assignment should be updated to include the following options:

  1. No hydrogens (default; current behaviour)
  2. All atoms (i.e. including all hydrogens; less useful but probably easy to implement)
  3. All hydrogens not attached to a carbon (most useful option)
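
The three options can be sketched as a selection rule over an atom list. This is a base R illustration only, not rcdk or CDK code: the helper `hydrogens_to_show`, its `mode` names, and the symbols-plus-bond-matrix representation are all hypothetical.

```r
# Hypothetical helper: return the indices of the hydrogens that should be drawn.
# `symbols` holds one element symbol per atom; `bonds` is a two-column matrix
# of bonded atom indices.
hydrogens_to_show <- function(symbols, bonds,
                              mode = c("none", "all", "hetero")) {
  mode <- match.arg(mode)
  is_h <- symbols == "H"
  if (mode == "none") return(integer(0))   # option 1: current behaviour
  if (mode == "all") return(which(is_h))   # option 2: every hydrogen
  # Option 3: drop every hydrogen that is bonded to a carbon.
  both <- rbind(bonds, bonds[, 2:1, drop = FALSE])  # each bond in both directions
  on_carbon <- both[symbols[both[, 1]] == "H" & symbols[both[, 2]] == "C", 1]
  setdiff(which(is_h), on_carbon)
}

# Ethanol with explicit hydrogens: C1, C2, O3; H4-H6 on C1, H7-H8 on C2, H9 on O3.
symbols <- c("C", "C", "O", rep("H", 6))
bonds <- rbind(c(1, 2), c(2, 3), c(1, 4), c(1, 5), c(1, 6),
               c(2, 7), c(2, 8), c(3, 9))
shown <- hydrogens_to_show(symbols, bonds, "hetero")
```

With the "hetero" mode only the hydroxyl hydrogen of ethanol is selected, which is the case the request calls most useful.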

request rcdklibs 1.5.12 release

Hi @rajarshi,

Thanks for your excellent package. I have been writing some utilities that depend on rcdk/rcdklibs, and I would like to use the most recent features of 1.5.12, including the newer SMILES parsing and possibly image depiction. I am therefore hoping that you can cut another CRAN release of rcdk/rcdklibs. I'd be happy to help in any way.

(FYI: just put together a chemdoodle widget for drawing molecules in html and using CDK as the backend for parsing them https://github.com/zachcp/chemdoodle)

zach cp

Rcdk isotopes in generate.formula.iter()

Hi Rcdk team

I have a small question regarding isotope annotation for generate.formula.iter().

I want to annotate MS peaks with possible formulae, limited by the number of atoms of the parent compound. If I use the attached example without adding anything to the element list:

elements
[[1]]
[1] "C" "0" "7"

[[2]]
[1] "H" "0" "4"

[[3]]
[1] "Br" "0" "2"

[[4]]
[1] "O" "0" "3"

Using this list, the generate.formula.iter() function easily annotates the monoisotopic peak (M-H) at 292.8454296 as "C7H3Br2O3".
The problem is that the M+2 peak at 294.8432039 is not annotated, because the [81Br] isotope is not in the list.

If I modify the list to use 81Br:

elements
[[1]]
[1] "C" "0" "7"

[[2]]
[1] "H" "0" "4"

[[3]]
[1] "Br" "0" "2" "81"

[[4]]
[1] "O" "0" "3"

I can only annotate the peak with 2 81Br atoms.

If I add an additional line with the 81Br (as shown in the example):

elements
[[1]]
[1] "C" "0" "7"

[[2]]
[1] "H" "0" "4"

[[3]]
[1] "Br" "0" "2"

[[4]]
[1] "O" "0" "3"

[[5]]
[1] "Br" "0" "2" "81"

I can annotate all 3 peaks as "C7H3Br2O3". Unfortunately, the annotated symbol does not distinguish between 79Br and 81Br.
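
For reference, the three peaks are the three bromine isotopologues of the same anion. The base R sketch below reproduces their expected m/z values by hand. The isotope masses are approximate literature values typed in here as assumptions, not values read from CDK, and `anion_mass` is a hypothetical helper.

```r
# Approximate monoisotopic masses in Da (assumed values, not taken from CDK).
iso_mass <- c(C = 12.000000, H = 1.007825, O = 15.994915,
              Br79 = 78.918338, Br81 = 80.916290)
electron <- 0.000549  # one extra electron for the [M-H]- anion

# Mass of the C7H3Br2O3 anion carrying n81 atoms of 81Br (0, 1 or 2).
anion_mass <- function(n81) {
  stopifnot(n81 %in% 0:2)
  unname(7 * iso_mass["C"] + 3 * iso_mass["H"] + 3 * iso_mass["O"] +
           (2 - n81) * iso_mass["Br79"] + n81 * iso_mass["Br81"] + electron)
}

peaks <- sapply(0:2, anion_mass)
names(peaks) <- c("79Br2", "79Br81Br", "81Br2")
print(round(peaks, 4))
```

The first two values fall within about a millidalton of the 292.8454296 and 294.8432039 peaks quoted above. All three compositions collapse to the same "C7H3Br2O3" string once the isotope label is dropped, which is why the annotations cannot be told apart.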

My question is whether there is a way (or whether one could be created) to save the isotope entry in the list with a different symbol (like [81Br]), so that the annotated isotopes can be differentiated.

Thank you in advance
Benedikt Lauper
Eawag Dübendorf
Uchem

get.mol2formula: Error in .jcall(), invalid object parameter

Hi,
Both my local R CMD check as well as Travis report an issue:
https://travis-ci.org/sneumann/cdkr/builds/447702470#L3022

Error: processing vignette 'molform.Rmd' failed with diagnostics:
RcallMethod: invalid object parameter

This comes from

> library(rcdk)
Loading required package: rcdklibs
Loading required package: rJava
> sp <- get.smiles.parser()
> molecule <- parse.smiles('N')[[1]]
> convert.implicit.to.explicit(molecule)
> formula <- get.mol2formula(molecule,charge=0)
Error in .jcall(ch, "D", "doubleValue") : 
  RcallMethod: invalid object parameter

in

> traceback()
3: .jcall(ch, "D", "doubleValue")
2: .cdkFormula.createObject(.jcast(moleculaJT, .IMolecularFormula))
1: get.mol2formula(molecule, charge = 0)

Can someone confirm? Any ideas? Yours, Steffen

> sessionInfo()
R Under development (unstable) (2018-10-17 r75450)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /vol/R/R-devel/lib/R/lib/libRblas.so
LAPACK: /vol/R/R-devel/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rcdk_3.4.9     rcdklibs_2.2.1 rJava_0.9-10  

loaded via a namespace (and not attached):
[1] compiler_3.6.0    tools_3.6.0       parallel_3.6.0    fingerprint_3.5.7
[5] iterators_1.0.10  itertools_0.1-3   png_0.1-7         tcltk_3.6.0      
