
cdk's Introduction


The Chemistry Development Kit (CDK)

Copyright © 1997-2024 The CDK Development Team

License: LGPL v2, see LICENSE.txt

Home Page | JavaDoc | Wiki | Issues | Mailing List

Introduction

The CDK is an open-source Java library for cheminformatics and bioinformatics.

Key Features:

  • Molecule and reaction valence bond representation.
  • Read and write file formats: SMILES, SDF, InChI, Mol2, CML, and others.
  • Efficient molecule processing algorithms: Ring Finding, Kekulisation, Aromaticity.
  • Coordinate generation and rendering.
  • Canonical identifiers for fast exact searching.
  • Substructure and SMARTS pattern searching.
  • ECFP, Daylight, MACCS, and other fingerprint methods for similarity searching.
  • QSAR descriptor calculations.

Install

The CDK is a class library intended to be used by other programs; it will not run as a stand-alone program.

The library is built with Apache Maven and currently requires Java 1.7 or later. From the root of the project, run the command below to build the JAR files for each module. The bundle/target/ directory will then contain the main JAR with all dependencies included:

$ mvn install

You can also download a pre-built library JAR from releases.

Include the main JAR on the Java classpath when compiling and running your code:

$ javac -cp cdk-2.9.jar MyClass.java
$ java -cp cdk-2.9.jar:. MyClass

If you are using Maven, you can use the uber cdk-bundle to grab everything. Note that it is much more efficient to include only the modules you need:

<dependency>
  <artifactId>cdk-bundle</artifactId>
  <groupId>org.openscience.cdk</groupId>
  <version>2.9</version>
</dependency>

If you are a Python user, Noel O'Boyle's Cinfony project provides access via Jython. Cinfony is a wrapper around the CDK and other toolkits that exposes core functionality as a consistent API. ScyJava can also be used, as explained in ChemPyFormatics.

Further details on building the project in integrated development environments (IDEs) are available on the wiki.

Getting Help

The Toolkit-Rosetta Wiki Page provides some examples for common tasks. If you need help using the CDK or have questions, please use the user mailing list (you must subscribe before you can post).

Acknowledgments


The CDK developers use YourKit to profile and optimise code.

YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.

cdk's People

Contributors

aclarkxyz, andyhowlettgithub, asad, bachi55, danielszisz, dkatzel-ncats, dmak, egonw, ficolas2, gilleain, goglepox, johnmay, jonalv, k-ujihara, kaibioinfo, kdole, klasjoensson, mjw99, ntk73, ostueker, plantej, rajarshi, rwst, sauliusg, tomas-pluskal, tylerperyea, uli-f, vedina, xperrylinn, yapchunwei


cdk's Issues

Fix removeAtom API

removeAtom(IAtom) was removed in one of the previous patches; I should add it back in.

AtomContainer API invariants

The AtomContainer has several ambiguities/oddities in its API. This rolling issue will track those inconsistencies.

setAtom(int i, IAtom)

Q: Should this method expand the capacity of the container, and should getAtomCount() be updated?

There is this test:

    @Test
    public void testSetAtom_int_IAtom() {
        IAtomContainer container = (IAtomContainer) newChemObject();
        IAtom c = container.getBuilder().newInstance(IAtom.class, "C");
        container.setAtom(0, c);
        Assert.assertNotNull(container.getAtom(0));
        Assert.assertEquals("C", container.getAtom(0).getSymbol());
    }

The test does not probe this, but getAtomCount() will return 0 for this container. If we initialize the container with capacity 0, we get an ArrayIndexOutOfBoundsException.
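To make the two possible answers to the question concrete, here is a minimal sketch using a toy array-backed container. ToyAtomContainer and both method names are hypothetical and are not the CDK implementation; atoms are plain strings to keep the example self-contained.

```java
import java.util.Arrays;

/** Toy stand-in for an array-backed atom container; NOT the CDK implementation. */
final class ToyAtomContainer {
    private String[] atoms;
    private int atomCount;

    ToyAtomContainer(int capacity) {
        atoms = new String[capacity];
    }

    /** Lenient variant: writes straight into the backing array and leaves the count alone. */
    void setAtomUnchecked(int idx, String atom) {
        atoms[idx] = atom; // throws ArrayIndexOutOfBoundsException when idx >= capacity
    }

    /** Stricter variant: grows the backing array and keeps the count consistent. */
    void setAtomGrowing(int idx, String atom) {
        if (idx < 0)
            throw new IndexOutOfBoundsException("negative index: " + idx);
        if (idx >= atoms.length)
            atoms = Arrays.copyOf(atoms, Math.max(idx + 1, atoms.length * 2));
        atoms[idx] = atom;
        atomCount = Math.max(atomCount, idx + 1);
    }

    String getAtom(int idx) {
        return atoms[idx];
    }

    int getAtomCount() {
        return atomCount;
    }
}
```

With the lenient variant the atom is retrievable but the count stays at 0, and a zero-capacity container throws. The growing variant avoids both, although it can still leave null gaps when the index is larger than the current count, so a third option would be to reject any index beyond the current count.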

Isotope Philosophy in CDK

Daniel's been doing some validation of InChIs generated by CDK and noticed that the isotopes get lost. This is a known problem that I fixed in SMILES with an 'AtomMassStrict' flag. I would like some clarification on the following:

What exactly does/should IsotopeFactory.configure do and why is it needed?
and
How do we represent 'natural abundance'?

The documentation is not clear: IsotopeFactory.configure(atom)

In many code examples (for example Groovy CDK) atoms are typed and configured. This makes it impossible to distinguish [12CH4] from [CH4], because [CH4] comes in with a null atom mass which then gets configured to [12CH4]. To avoid this ugliness in SMILES, we check what the 'major' isotope is (12C) and omit it. Thus both [12CH4] and [CH4] become [CH4].

To work round this, the 'strict' flag treats 'null' as undefined and keeps every explicit carbon 12. This means you can round trip the listed SMILES, provided you don't call IsotopeFactory.configure(). If you do call IsotopeFactory.configure(atom), then [12CH4] and [CH4] both become [12CH4].

The same happens with the InChI, at the moment:

[12CH4] and [CH4] both become InChI=1S/CH4/h1H4
[12CH4] should be InChI=1S/CH4/h1H4/i1+0
[CH4] should be InChI=1S/CH4/h1H4

IIRC the Molfile calls configure on input so the information is always lost.

I would like to propose deprecating IsotopeFactory.configure() to avoid these problems, or perhaps changing it so that 'null' is treated as the default major isotope...
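The interaction between a null mass, the 'strict' flag, and configure() can be sketched with a toy model. Everything below is hypothetical and independent of the CDK classes: bracketAtom and configure are illustrative helpers, and the mass number is passed as a nullable Integer where null stands for "unspecified".

```java
/** Toy model of isotope handling when writing a bracket atom; not CDK code. */
final class IsotopeLabelSketch {

    /**
     * @param symbol       element symbol, e.g. "C"
     * @param massNumber   null means unspecified (natural abundance)
     * @param majorIsotope mass number of the most abundant isotope (12 for carbon)
     * @param hCount       number of attached hydrogens
     * @param strict       if true, any explicit mass is written, even the major isotope
     */
    static String bracketAtom(String symbol, Integer massNumber, int majorIsotope,
                              int hCount, boolean strict) {
        // Non-strict mode drops the mass when it equals the major isotope.
        boolean writeMass = massNumber != null && (strict || massNumber != majorIsotope);
        StringBuilder sb = new StringBuilder("[");
        if (writeMass)
            sb.append(massNumber);
        sb.append(symbol);
        if (hCount == 1)
            sb.append('H');
        else if (hCount > 1)
            sb.append('H').append(hCount);
        return sb.append(']').toString();
    }

    /** Mimics the behaviour described in the issue: a null mass becomes the major isotope. */
    static Integer configure(Integer massNumber, int majorIsotope) {
        return massNumber == null ? Integer.valueOf(majorIsotope) : massNumber;
    }
}
```

In this model, strict output distinguishes a null mass ([CH4]) from an explicit 12 ([12CH4]), but once configure() has replaced null with 12 the two inputs are indistinguishable, which is the information loss the issue describes.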

Reaction Depiction Alignment

An open issue to keep track of depiction problems.

Reaction Alignment Problems

[CH3:20][CH:19]([CH3:21])[CH2:18][C@H:17]([CH2:16][O:15][c:13]1[cH:12][cH:11][c:10]-2[c:9]([cH:14]1)[O:8][CH2:7][c:5]3[c:4]2[cH:3][cH:2][n:1][cH:6]3)[N:22]4C(=O)c5ccccc5C4=O>>[CH3:20][CH:19]([CH3:21])[CH2:18][C@H:17]([CH2:16][O:15][c:13]1[cH:12][cH:11][c:10]-2[c:9]([cH:14]1)[O:8][CH2:7][c:5]3[c:4]2[cH:3][cH:2][n:1][cH:6]3)[NH2:22]

Fixed!

[CH3:40][O:39][c:28]1[cH:27][c:26]([cH:31][cH:30][c:29]1[O:32][CH2:33][C@@H:34]([CH:36]2[CH2:38][CH2:37]2)[OH:35])[n:21]3[cH:20][n:19][c:18]4[cH:17][c:16]([s:24][c:23]4[c:22]3=[O:25])[c:13]5[cH:14][cH:15][c:10]([cH:11][cH:12]5)[Cl:9].[CH2:3]1[CH2:4][C:5](=[O:6])[O:7][C:1](=[O:8])[CH2:2]1>c1cnccc1N2CCCC2.C(Cl)Cl>[CH3:40][O:39][c:28]1[cH:27][c:26]([cH:31][cH:30][c:29]1[O:32][CH2:33][C@@H:34]([CH:36]2[CH2:37][CH2:38]2)[O:35][C:5](=[O:6])[CH2:4][CH2:3][CH2:2][C:1](=[O:8])[OH:7])[n:21]3[cH:20][n:19][c:18]4[cH:17][c:16]([s:24][c:23]4[c:22]3=[O:25])[c:13]5[cH:14][cH:15][c:10]([cH:11][cH:12]5)[Cl:9]

Fixed!

Spiro twisted

[B:14]1([O:13][C:10]([C:8]([O:15]1)([CH3:7])[CH3:9])([CH3:11])[CH3:12])B2OC(C(O2)(C)C)(C)C.[CH2:2]1[CH2:3][O:4][CH2:5][CH:6]=[C:1]1OS(=O)(=O)C(F)(F)F>CC(=O)[O-].[K+].CS(=O)C.C1=CC=C(C=C1)P([C-]2C=CC=C2)C3=CC=CC=C3.C1=CC=C(C=C1)P([C-]2C=CC=C2)C3=CC=CC=C3.C(Cl)Cl.Cl[Pd]Cl.[Fe+2].O>[B:14]1([O:13][C:10]([C:8]([O:15]1)([CH3:7])[CH3:9])([CH3:11])[CH3:12])[C:1]2=[CH:6][CH2:5][O:4][CH2:3][CH2:2]2 |f:2.3,5.6.7.8.9|

Fixed!

Epoxy Ring Break

[Cl:1][c:2]1[cH:3][c:4]2[c:11]([cH:12][cH:13]1)[C:8]1([CH2:7][CH2:6][CH2:5]2)[O:9][CH2:10]1>C1CCOC1.CCOCC.FB(F)F>[Cl:1][c:2]1[cH:3][c:4]2[c:11]([cH:12][cH:13]1)[C@@H:8]([CH:10]=[O:9])[CH2:7][CH2:6][CH2:5]2.[Cl:1][c:2]1[cH:3][c:4]2[c:11]([cH:12][cH:13]1)[C@H:8]([CH:10]=[O:9])[CH2:7][CH2:6][CH2:5]2

Fixed

Indigo AAM Fail 1

[cH:22]1[cH:21]c2[c:16](cc[nH]2)[cH:15][c:14]1[F:13].[cH:11]1[cH:12][n:8]([cH:9]n1)[C:1](=[O:2])[n:3]2[cH:7][cH:6]n[cH:4]2>CC#N.CN(C)c1ccncc1>[cH:15]1[cH:16][c:9]2[c:21]([cH:11][cH:12][n:8]2[C:1](=[O:2])[n:3]3[cH:7][cH:6][c:16]4[c:4]3[cH:21][cH:22][c:14]([cH:15]4)[F:13])[cH:22][c:14]1[F:13]

Won't fix - AAM is wrong.

Indigo AAM Fail 2

[CH3:11][S:10](=[O:12])(=[O:13])[NH:9][CH2:8][C:1](=O)[c:2]1[cH:3][cH:4][cH:5][cH:6][cH:7]1.[cH:18]1[cH:19][cH:20][c:15]2[c:16]([cH:17]1)[C:21](=O)[C:22](=[O:23])[NH:14]2>CC(C)O.[OH-].[Na+].Cl>[CH3:11][S:10](=[O:12])(=[O:13])[NH:9][c:8]1[c:21]([c:16]2[cH:17][cH:18][cH:19][cH:20][c:15]2[n:14][c:1]1[c:2]3[cH:3][cH:4][cH:5][cH:6][cH:7]3)[C:22](=[O:23])O |f:3.4|

Fixed

Don't kink *-C=C=O in alignment.

[NH2:1][CH2:2][C:3]#[CH:4].[O:5]=[C:6]=[N:7][c:8]1[cH:13][cH:12][cH:11][cH:10][cH:9]1>ClCCl>[NH:1]([C:6]([NH:7][c:8]1[cH:13][cH:12][cH:11][cH:10][cH:9]1)=[O:5])[CH2:2][C:3]#[CH:4]

Fixed (via #491) no longer kinked

Text Encoding

US20060270608A1-20061130-C00042

C1=C(C=CC=C1)CC.CCCOC |m:7:2.1.5.0.4.3,Sg:n:6:n3:ht,Sg:n:9:n3′:ht|	US20060270608A1-20061130-C00042	GMAfpsh

Fixed - encoded correctly

Positional Variation

US20010056109A1-20011227-C00016

C1=CC=C2C(=C1)C(*C2)=C(C3=CN=CN3*)*.C*.CC(*)(*)*.C* |$;;;;;;;X;;;;;;;;R';R4;;R6;;;R1;;;;R8$,m:17:1.8.4.6.5.2.7.0.3,m:19:1.8.4.6.5.2.7.0.3,m:21:1.8.4.6.5.2.7.0.3,Sg:n:18:m:ht,Sg:n:25:t:ht,Sg:n:20,22,23:r:ht|	US20010056109A1-20011227-C00016

Previously mis-extracted, now okay.

US20160072079A1-20160310-C00027_1

C=1C=CC=C2C1NC3=C2C=CC(=C3)Br.**.**.C1=CC=C2C(=C1)C3=C(C4=C2C=CC=C4)C=CC=C3.B(**)(*)*.**.**.**>>**.**.C1=CC=C2C(=C1)C3=C(C4=C2C=CC=C4)C=CC=C3.**.**.**.C=1C=CC=C2C1NC3=C2C=CC(=C3)** |$;;;;;;;;;;;;;;;R4;;R3;;;;;;;;;;;;;;;;;;;;L2;;R9;R8;;R7;;R5;;R6;;R4;;R3;;;;;;;;;;;;;;;;;;;;R7;;R5;;R6;;;;;;;;;;;;;;L2$,f:0.1.2,3.4.5.6.7,8.9.10.11.12.13.14,m:14:0.1.2.3.4.5.6.7.8.9.10.11.12,16:0.1.2.3.4.5.6.7.8.9.10.11.12,38:18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35,41:18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35,43:18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35,45:18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35,47:75.76.77.78.79.80.81.82.83.84.85.86.87,49:75.76.77.78.79.80.81.82.83.84.85.86.87,69:51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68,71:51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68,73:51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68,89:51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68,Sg:n:15:d:ht,Sg:n:17:c:ht,Sg:n:37:s:ht,Sg:n:42:g:ht,Sg:n:44:e:ht,Sg:n:46:f:ht,Sg:n:48:d:ht,Sg:n:50:c:ht,Sg:n:70:g:ht,Sg:n:72:e:ht,Sg:n:74:f:ht,Sg:n:88:s:ht|

Fixed!

US20160072079A1-20160310-C00027_2

**.**.C1=CC=C2C(=C1)C3=C(C4=C2C=CC=C4)C=CC=C3.**.**.**.C=1C=CC=C2C1NC3=C2C=CC(=C3)**>C1=CC=C2C(=C1)*C3=C2C=CC=C3.**.***.**>**.**.C1=CC=C2C(=C1)C3=C(C4=C2C=CC=C4)C=CC=C3.**.**.**.C=1C=CC=C2C1N(C3=C2C=CC(=C3)**)**.*1C2=C(C3=CC=CC=C31)C=CC=C2.**.** |$;R4;;R3;;;;;;;;;;;;;;;;;;;;R7;;R5;;R6;;;;;;;;;;;;;;L2;;;;;;;;X;;;;;;;;R2;;L1;Y;;R1;;R4;;R3;;;;;;;;;;;;;;;;;;;;R7;;R5;;R6;;;;;;;;;;;;;;L2;;L1;;X;;;;;;;;;;;;;R2;;;R1$,f:11.12.13.14.15.16.17.18.19.20,0.1.2.3.4.5.6,7.8.9.10,m:63:91.92.93.94.95.96.97.98.99.100.101.102.103,65:91.92.93.94.95.96.97.98.99.100.101.102.103,85:67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84,87:67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84,89:67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84,105:67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84,107:108.109.110.111.112.113.114.115.116.117.118.119.120,122:108.109.110.111.112.113.114.115.116.117.118.119.120,123:108.109.110.111.112.113.114.115.116.117.118.119.120,0:28.29.30.31.32.33.34.35.36.37.38.39.40,2:28.29.30.31.32.33.34.35.36.37.38.39.40,22:4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21,24:4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21,26:4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21,42:4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21,56:43.44.45.46.47.48.49.50.51.52.53.54.55,58:43.44.45.46.47.48.49.50.51.52.53.54.55,61:43.44.45.46.47.48.49.50.51.52.53.54.55,Sg:n:64:d:ht,Sg:n:66:c:ht,Sg:n:86:g:ht,Sg:n:88:e:ht,Sg:n:90:f:ht,Sg:n:104:s:ht,Sg:n:106:r:ht,Sg:n:121:b:ht,Sg:n:124:a:ht,Sg:n:1:d:ht,Sg:n:3:c:ht,Sg:n:23:g:ht,Sg:n:25:e:ht,Sg:n:27:f:ht,Sg:n:41:s:ht,Sg:n:57:b:ht,Sg:n:59:r:ht,Sg:n:62:a:ht|

Fixed!

Javadoc for IsotopeFactory.getMajorIsotope(int) doesn't match implementation

The Javadocs for IsotopeFactory give the definition of major isotope as

Returns the most abundant (major) isotope with a given atomic number.
The isotope's abundance is for atoms with atomic number 60 and smaller defined as a number that is proportional to the 100 of the most abundant isotope. For atoms with higher atomic numbers, the abundance is defined as a percentage.

However, looking at the implementation, it appears that no distinction is made between elements with atomic number above or below 60, and the major isotope is simply the most abundant isotope.

Should the Javadocs be updated? Or am I missing something?
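One observation that may explain why the implementation gets away without the distinction: picking the most abundant isotope gives the same answer whether abundances are stored as percentages or scaled so that the most abundant isotope is 100, because scaling does not change which value is largest. A minimal sketch, where the Isotope class and majorIsotope method are hypothetical stand-ins rather than the CDK types, and the carbon abundances are approximate:

```java
import java.util.List;

/** Toy illustration of "major isotope = most abundant isotope"; not CDK code. */
final class MajorIsotopeSketch {

    static final class Isotope {
        final int atomicNumber;
        final int massNumber;
        final double abundance; // units are whatever the data source uses

        Isotope(int atomicNumber, int massNumber, double abundance) {
            this.atomicNumber = atomicNumber;
            this.massNumber = massNumber;
            this.abundance = abundance;
        }
    }

    /** Returns the isotope with the largest abundance for the element, or null if none. */
    static Isotope majorIsotope(List<Isotope> isotopes, int atomicNumber) {
        Isotope best = null;
        for (Isotope iso : isotopes) {
            if (iso.atomicNumber != atomicNumber)
                continue;
            if (best == null || iso.abundance > best.abundance)
                best = iso;
        }
        return best;
    }
}
```

For carbon, a percentage table (about 98.93 for 12C and 1.07 for 13C) and a table relative to 100 (100 for 12C and about 1.08 for 13C) both select 12C, so only the Javadoc wording about how abundance is defined would be affected, not the result of the lookup.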

.NET port of CDK

If anyone here is interested in a .NET port of CDK, please check this.
It would help the community as a whole if we could help the author of NCDK sort out any issues and keep up with CDK development.

LongestAliphaticChainDescriptor doesn't work

Personal communication from Nichola McCann

I am relatively new to using the cdk, but I have a question about the descriptor generation. When I calculate the Longest Aliphatic Chain (nAtomLAC) for ethanol, I get different answers, depending on how I enter the molecule. If I use CCO, I get a value of 2, but if I use OCC (the canonical version from cdk), I get 0.
I have attached below the R code that I used to generate these results – could you please explain the discrepancy?

    @Test
    public void debug() throws Exception {
        SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
        // Assumed declaration: the snippet as posted did not show where 'descriptor' comes from.
        IMolecularDescriptor descriptor = new LongestAliphaticChainDescriptor();
        IAtomContainer mol1 = sp.parseSmiles("CCO");
        IAtomContainer mol2 = sp.parseSmiles("OCC");
        // Same molecule, different atom order: reported values are 2 and 0.
        System.out.println(descriptor.calculate(mol1).getValue());
        System.out.println(descriptor.calculate(mol2).getValue());
    }

Match results when doing substructure search

When doing a substructure search, I sometimes get incorrect results.
For example, I tried to find the simple C-F substructure in the following molecule: FC=1C(F)=C2C(=C(F)C1S)C(F)(F)C(F)(F)C2(F)F

There should be 9 matches, and 9 matches are indeed reported, but the method UniversalIsomorphismTester::getSubgraphAtomsMaps returns a list of 9 empty RMap lists.

InChIGeneratorFactory.getInChIToStructure ignores deuterium atoms

I am trying to create a molecular formula string for a molecule from an InChI string. The InChI contains some deuterium atoms, which are somehow ignored in the conversion process. As far as I can tell, the deuterium atoms don't make it into the IAtomContainer. Example code is given below:

String inchi = "InChI=1S/C22H32O2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22(23)24/h3-4,6-7,9-10,12-13,15-16,18-19H,2,5,8,11,14,17,20-21H2,1H3,(H,23,24)/b4-3-,7-6-,10-9-,13-12-,16-15-,19-18-/i1D3,2D2";
InChIToStructure intostruct = InChIGeneratorFactory.getInstance().getInChIToStructure(inchi, DefaultChemObjectBuilder.getInstance());
INCHI_RET ret = intostruct.getReturnStatus();
if (ret == INCHI_RET.WARNING) {
	// Structure generated, but with warning message
	System.out.println("InChI warning: " + intostruct.getMessage());
}
else if (ret != INCHI_RET.OKAY) {
	// Structure generation failed
	System.out.println("Can not parse INCHI string. Structure generation failed: " + ret.toString() + " [" + intostruct.getMessage() + "]");
	System.exit(0); 
}
IAtomContainer m = intostruct.getAtomContainer();
String formula_inchi = MolecularFormulaManipulator.getString(MolecularFormulaManipulator.getMolecularFormula(m));
System.out.println(formula_inchi);

The expected output would be one of C22H32O2, C22H27D5O2, or C22H27[2H]5O2, but it is C22H27O2.

The compound can be seen here: https://pubchem.ncbi.nlm.nih.gov/compound/24778483#section=Top

Any suggestions?

Private Constructor InChiGeneratorFactory() does not exist ...

... but is called by public static InChIGeneratorFactory getInstance(). Which of course causes a compile error.
Weirdly, that only popped up when I used the getInstance() method in my code, which forced Eclipse to compile the module. It does not appear when doing a mvn install of CDK, so presumably this code is neither built nor tested?!

some isotopes missing in SMILES generation

The following Jython code gives two examples where round-tripping a SMILES causes it to lose some isotope information:

>>> from org.openscience.cdk.smiles import SmilesParser
>>> from org.openscience.cdk.silent import SilentChemObjectBuilder
>>> from org.openscience.cdk.smiles import SmilesGenerator
>>> sp = SmilesParser(SilentChemObjectBuilder.getInstance())
>>> sg = SmilesGenerator.isomeric()
>>> mol = sp.parseSmiles("[13CH3][12CH3]")
>>> sg.create(mol)
u'[13CH3]C'
>>> mol = sp.parseSmiles("[13CH3][16OH]")
>>> sg.create(mol)
u'[13CH3]O'

I expected those create() calls to return SMILES with the isotopically labelled 12C and 16O, respectively.

My guess is that if the atomic mass is that of the most naturally abundant isotope, then the isomeric SMILES generator omits the isotope.

MurckoFragmenter - improve algorithm

The MurckoFragmenter seems to do a lot of work while fragmenting molecules, so the runtime is very high.
An example compound, where I stopped the fragmenter after 17 hours of fragmentation, is:
CHEMBL529226

Update:
I created a test set of ~67000 molecules from ChEMBL_22.1 with atom counts from 50 to 150.
The CDK fragmenter needs ~3 hours to create the fragments.
The Pipeline Pilot fragmenter needs ~15 minutes.
The DataWarrior fragmenter needs ~3 minutes.

Some additional information:
I analysed the run with VisualVM and saw that a few threads took a long time to process a molecule.
I created a thread dump, and in every stuck thread the most recent method call is AtomContainer.getConnectedAtomLists().

It would be nice if the fragmenter could be improved.

mvn test with strange behavior

Hi

I compiled CDK from release 2.0 tarball and got an error during mvn test:

[INFO] Reactor Summary:
[INFO]
[INFO] cdk ................................................ SUCCESS [  0.028 s]
[INFO] cdk-base ........................................... SUCCESS [  0.001 s]
[INFO] cdk-interfaces ..................................... SUCCESS [  4.322 s]
[INFO] cdk-core ........................................... SUCCESS [  0.025 s]
[INFO] cdk-standard ....................................... SUCCESS [  0.013 s]
[INFO] cdk-atomtype ....................................... SUCCESS [  0.008 s]
[INFO] cdk-valencycheck ................................... SUCCESS [  0.007 s]
[INFO] cdk-test ........................................... SUCCESS [  2.749 s]
[INFO] cdk-misc ........................................... SUCCESS [  0.001 s]
[INFO] cdk-diff ........................................... SUCCESS [  1.332 s]
[INFO] cdk-test-interfaces ................................ SUCCESS [  1.604 s]
[INFO] cdk-data ........................................... SUCCESS [  2.088 s]
[INFO] cdk-testdata ....................................... SUCCESS [  0.109 s]
[INFO] cdk-storage ........................................ SUCCESS [  0.000 s]
[INFO] cdk-ioformats ...................................... SUCCESS [  1.245 s]
[INFO] cdk-silent ......................................... SUCCESS [  1.741 s]
[INFO] cdk-isomorphism .................................... SUCCESS [  1.494 s]
[INFO] cdk-datadebug ...................................... SUCCESS [  2.168 s]
[INFO] cdk-io ............................................. SUCCESS [  3.098 s]
[INFO] cdk-tool ........................................... SUCCESS [  0.000 s]
[INFO] cdk-formula ........................................ SUCCESS [  2.184 s]
[INFO] cdk-dict ........................................... SUCCESS [  0.920 s]
[INFO] cdk-pdb ............................................ SUCCESS [  1.621 s]
[INFO] cdk-libiocml ....................................... SUCCESS [  2.204 s]
[INFO] cdk-smiles ......................................... SUCCESS [  2.190 s]
[INFO] cdk-extra .......................................... SUCCESS [  0.026 s]
[INFO] cdk-log4j .......................................... SUCCESS [  0.764 s]
[INFO] cdk-smarts ......................................... SUCCESS [  2.267 s]
[INFO] cdk-reaction ....................................... SUCCESS [  2.640 s]
[INFO] cdk-test-standard .................................. SUCCESS [  3.462 s]
[INFO] cdk-descriptor ..................................... SUCCESS [  0.001 s]
[INFO] cdk-fingerprint .................................... SUCCESS [  3.398 s]
[INFO] cdk-cip ............................................ SUCCESS [  1.276 s]
[INFO] cdk-signature ...................................... SUCCESS [  1.452 s]
[INFO] cdk-charges ........................................ SUCCESS [  1.351 s]
[INFO] cdk-forcefield ..................................... SUCCESS [  2.998 s]
[INFO] cdk-sdg ............................................ SUCCESS [  2.012 s]
[INFO] cdk-builder3dtools ................................. SUCCESS [  0.018 s]
[INFO] cdk-builder3d ...................................... SUCCESS [  2.981 s]
[INFO] cdk-qsar ........................................... SUCCESS [  1.040 s]
[INFO] cdk-ionpot ......................................... SUCCESS [  0.020 s]
[INFO] cdk-display ........................................ SUCCESS [  0.000 s]
[INFO] cdk-render ......................................... SUCCESS [  1.001 s]
[INFO] cdk-inchi .......................................... FAILURE [  1.462 s]
[INFO] cdk-qsaratomic ..................................... SKIPPED
[INFO] cdk-qsarbond ....................................... SKIPPED
[INFO] cdk-hash ........................................... SKIPPED
[INFO] cdk-fragment ....................................... SKIPPED
[INFO] cdk-model .......................................... SKIPPED
[INFO] cdk-qsarmolecular .................................. SKIPPED
[INFO] cdk-qsarcml ........................................ SKIPPED
[INFO] cdk-qsarionpot ..................................... SKIPPED
[INFO] cdk-qsarprotein .................................... SKIPPED
[INFO] cdk-renderbasic .................................... SKIPPED
[INFO] cdk-renderawt ...................................... SKIPPED
[INFO] cdk-renderextra .................................... SKIPPED
[INFO] cdk-control ........................................ SKIPPED
[INFO] cdk-qm ............................................. SKIPPED
[INFO] cdk-iordf .......................................... SKIPPED
[INFO] cdk-libiomd ........................................ SKIPPED
[INFO] cdk-pdbcml ......................................... SKIPPED
[INFO] cdk-group .......................................... SKIPPED
[INFO] cdk-pcore .......................................... SKIPPED
[INFO] cdk-smsd ........................................... SKIPPED
[INFO] cdk-test-core ...................................... SKIPPED
[INFO] cdk-structgen ...................................... SKIPPED
[INFO] cdk-tautomer ....................................... SKIPPED
[INFO] cdk-legacy ......................................... SKIPPED
[INFO] cdk-app ............................................ SKIPPED
[INFO] cdk-depict ......................................... SKIPPED
[INFO] cdk-bundle ......................................... SKIPPED
[INFO] cdk-annotation ..................................... SKIPPED
[INFO] cdk-test-atomtype .................................. SKIPPED
[INFO] cdk-test-valencycheck .............................. SKIPPED
[INFO] cdk-qsarsubstance .................................. SKIPPED
[INFO] cdk-test-extra ..................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

I then immediately ran mvn test -rf :cdk-inchi, and this time the previously failed and skipped modules ran successfully.

[INFO] Reactor Summary:
[INFO]
[INFO] cdk-inchi .......................................... SUCCESS [  3.414 s]
[INFO] cdk-qsaratomic ..................................... SUCCESS [  2.028 s]
[INFO] cdk-qsarbond ....................................... SUCCESS [  1.255 s]
[INFO] cdk-hash ........................................... SUCCESS [  2.316 s]
[INFO] cdk-fragment ....................................... SUCCESS [  1.359 s]
[INFO] cdk-model .......................................... SUCCESS [  1.367 s]
[INFO] cdk-qsarmolecular .................................. SUCCESS [  3.322 s]
[INFO] cdk-qsarcml ........................................ SUCCESS [  1.609 s]
[INFO] cdk-qsarionpot ..................................... SUCCESS [  0.018 s]
[INFO] cdk-qsarprotein .................................... SUCCESS [  1.423 s]
[INFO] cdk-renderbasic .................................... SUCCESS [  1.730 s]
[INFO] cdk-renderawt ...................................... SUCCESS [  0.830 s]
[INFO] cdk-renderextra .................................... SUCCESS [  1.063 s]
[INFO] cdk-control ........................................ SUCCESS [  0.009 s]
[INFO] cdk-qm ............................................. SUCCESS [  0.007 s]
[INFO] cdk-iordf .......................................... SUCCESS [  1.383 s]
[INFO] cdk-libiomd ........................................ SUCCESS [  1.438 s]
[INFO] cdk-pdbcml ......................................... SUCCESS [  2.038 s]
[INFO] cdk-group .......................................... SUCCESS [  1.311 s]
[INFO] cdk-pcore .......................................... SUCCESS [  1.459 s]
[INFO] cdk-smsd ........................................... SUCCESS [  0.010 s]
[INFO] cdk-test-core ...................................... SUCCESS [  3.285 s]
[INFO] cdk-structgen ...................................... SUCCESS [  1.280 s]
[INFO] cdk-tautomer ....................................... SUCCESS [  1.386 s]
[INFO] cdk-legacy ......................................... SUCCESS [  6.318 s]
[INFO] cdk-app ............................................ SUCCESS [  0.001 s]
[INFO] cdk-depict ......................................... SUCCESS [  1.040 s]
[INFO] cdk-bundle ......................................... SUCCESS [  0.021 s]
[INFO] cdk-annotation ..................................... SUCCESS [  0.053 s]
[INFO] cdk-test-atomtype .................................. SUCCESS [  1.193 s]
[INFO] cdk-test-valencycheck .............................. SUCCESS [  1.456 s]
[INFO] cdk-qsarsubstance .................................. SUCCESS [  0.756 s]
[INFO] cdk-test-extra ..................................... SUCCESS [  1.991 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

Is this expected?
Regards

Java Oracle 1.8.0_121, Maven 3.3.9, Linux x86_64 2.6.32

Insufficient chi descriptor patterns?

Hi.

The Kier & Hall molecular connectivity index definition on this page (http://www.esi.umontreal.ca/accelrys/life/cerius46/qsar/theory_descriptors.html#285013) says:

 Given a connected subgraph G:

(i) If G contains a cycle it is of type CH (chain).

Otherwise:

(ii) If all vertex valencies of G (valencies with respect to G, not the entire graph) are either greater than 2 or equal to 1, G is of type C (cluster).

Otherwise:

(iii) If all vertex valencies (as in (ii)) are either equal to 2 or 1, G is of type P (path).

Otherwise:

(iv) G is of type PC (Path/Cluster). 

So, I think the order-6 patterns of ChiCluster shouldn't contain "C1(C)C(C)C1(C)" (I think it is an order-6 chain),
and the order-6 patterns of ChiPathCluster should contain "CC(C)(CC)CC". Do I understand correctly?

I'll send a PR if my understanding is correct.

Thanks.
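The four quoted rules can be checked mechanically. The sketch below applies them in the quoted order to a subgraph given as an edge list; ChiSubgraphType and classify are hypothetical names, not part of the CDK descriptors, and the subgraph is assumed to be connected so that a cycle can be detected by comparing edge and vertex counts.

```java
import java.util.HashMap;
import java.util.Map;

/** Classifies a connected subgraph using the quoted rules (i) to (iv); not CDK code. */
final class ChiSubgraphType {

    /** Each edge is an int[2] holding the ids of the two vertices it joins. */
    static String classify(int[][] edges) {
        if (edges.length == 0)
            throw new IllegalArgumentException("subgraph needs at least one edge");

        // Valency of each vertex with respect to the subgraph only.
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }

        // (i) A connected graph with at least as many edges as vertices contains a cycle.
        if (edges.length >= degree.size())
            return "CH";

        boolean cluster = true; // (ii) every valency is > 2 or == 1
        boolean path = true;    // (iii) every valency is 2 or 1
        for (int d : degree.values()) {
            if (!(d > 2 || d == 1)) cluster = false;
            if (!(d == 2 || d == 1)) path = false;
        }
        if (cluster) return "C";
        if (path) return "P";
        return "PC"; // (iv)
    }
}
```

Numbering the atoms of "C1(C)C(C)C1(C)" from 0 in SMILES order gives edges {0,1}, {0,2}, {2,3}, {2,4}, {4,0}, {4,5}, which has six edges on six vertices and so classifies as CH. "CC(C)(CC)CC" gives {0,1}, {1,2}, {1,3}, {3,4}, {1,5}, {5,6}, which mixes valencies 4, 2, and 1 and classifies as PC. Both results agree with the reading proposed in the question.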

Some molecules cause exceptions when generating SMILES

CDK version is 2.0

private final static SmilesGenerator smiGen = SmilesGenerator.absolute();
String smiles2 = smiGen.create(molecule); // IAtomContainer molecule

The exception is:
Caused by: org.openscience.cdk.exception.CDKException: An InChI could not be generated and used to canonise SMILES: null
at org.openscience.cdk.smiles.SmilesGenerator.inchiNumbers(SmilesGenerator.java:503)
at org.openscience.cdk.smiles.SmilesGenerator.labels(SmilesGenerator.java:471)
at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:375)
at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:325)
... 125 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.openscience.cdk.smiles.SmilesGenerator.inchiNumbers(SmilesGenerator.java:496)
... 129 common frames omitted
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.openscience.cdk.graph.invariant.InChINumbersTools.parseUSmilesNumbers(InChINumbersTools.java:145)
at org.openscience.cdk.graph.invariant.InChINumbersTools.getUSmilesNumbers(InChINumbersTools.java:73)
... 134 common frames omitted

I have debugged the InChI code, in the parseUSmilesNumbers method of org.openscience.cdk.graph.invariant.InChINumbersTools:

        if ((index = aux.indexOf("/F:")) >= 0) {
            String[] fixedHNumbers = aux.substring(index + 3, aux.indexOf('/', index + 3)).split(";");
            for (int i = 0; i < fixedHNumbers.length; i++) {

                String component = fixedHNumbers[i];

                // m, 2m, 3m ... need to lookup number in the base numbering
                if (component.charAt(component.length() - 1) == 'm') {
                    int n = component.length() > 1 ? Integer
                            .parseInt(component.substring(0, component.length() - 1)) : 1;
                    for (int j = 0; j < n; j++) {
                        String[] numbering = baseNumbers[i + j].split(",");
                        first[i + j] = Integer.parseInt(numbering[0]) - 1;
                        for (String number : numbering)
                            numbers[Integer.parseInt(number) - 1] = label++;
                    }
                } else {
                    String[] numbering = component.split(",");
                    first[i] = Integer.parseInt(numbering[0]) - 1;    // error line
                    for (String number : numbering)
                        numbers[Integer.parseInt(number) - 1] = label++;
                }
            }
        }

The error happens when parsing the AuxInfo.
I cannot find the basis on which the size of the first array is set.
The accesses baseNumbers[i + j] and first[i + j] may not take the sizes of the arrays into account.

The molecule is:
  CDK     0520151716

  9  6  0  0  0  0  0  0  0  0999 V2000
    3.3660    1.0000    0.0000 Si  0  0  0  0  0  0  0  0  0  0  0  0
    4.2320    1.5000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    2.5000    0.5000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    2.5000    1.5000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    3.3660    0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    3.3660    2.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    4.2320    0.5000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    2.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.1160    4.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  1  5  1  0  0  0  0
  1  6  1  0  0  0  0
  1  7  1  0  0  0  0
M  CHG  1   1  -2
M  CHG  1   8   1
M  CHG  1   9   1
M  END

**The correct SMILES is: F[Si-2](F)(F)(F)(F)F.[H+].[H+]**






Atoms may be invisible depending on background

In the RendererModel it is possible to set the background color. If this background color happens to be the same as an element color, the respective elements become invisible. Ideally, if the element color is the same as the background color, there should be a frame around the element or something similar. Of course, if the color is only slightly off this might still be a problem, so ideally some "color similarity" measure would be needed.
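
One way to picture the "color similarity" idea is a plain distance in RGB space. The sketch below is only an illustration: the class `ColorSimilarity`, its method names, and the threshold of 60 are assumptions made for this example and are not part of the CDK API.

```java
import java.awt.Color;

/** Hypothetical helper for illustration; not a CDK class. */
final class ColorSimilarity {

    /** Euclidean distance between two colors in RGB space. */
    static double distance(Color a, Color b) {
        int dr = a.getRed() - b.getRed();
        int dg = a.getGreen() - b.getGreen();
        int db = a.getBlue() - b.getBlue();
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    /** True when the element color is closer to the background than the threshold. */
    static boolean tooSimilar(Color element, Color background, double threshold) {
        return distance(element, background) < threshold;
    }

    public static void main(String[] args) {
        Color background = Color.WHITE;
        Color nearWhite = new Color(250, 250, 250); // illustrative element color
        // A renderer could draw a frame around the symbol whenever this is true.
        System.out.println(tooSimilar(nearWhite, background, 60.0));
        System.out.println(tooSimilar(Color.RED, background, 60.0));
    }
}
```

RGB distance does not match human perception very well, so a perceptual color space might be a better choice in practice; the RGB version is used here only because it needs nothing beyond `java.awt.Color`.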

Two different SMILES representations of the same molecule give different results when calculating some atom descriptors

Hello everyone,

first of all, thanks for developing CDK. It's nice to have a cheminformatics framework available for Java too!

I was calculating some atom descriptors in CDK (v2.0) and I noticed some dodgy results when using different forms of SMILES. For example, this code:

IAtomicDescriptor eatmpol = new EffectiveAtomPolarizabilityDescriptor();

IAtomContainer benzene = parseSMILES("c1ccccc1");
for (IAtom atm : benzene.atoms()) {
    System.out.println(eatmpol.calculate(atm, benzene).getValue().toString());
}

System.out.println();

IAtomContainer benzene_kek = parseSMILES("C1=CC=CC=C1");
for (IAtom atm : benzene_kek.atoms()) {
     System.out.println(eatmpol.calculate(atm, benzene_kek).getValue().toString());
}

generates the following output:

6.243375000000002
6.243375
6.243375
6.243375
6.243375
6.243375000000002

6.838375000000001
6.838375000000001
6.838375000000001
6.838375000000001
6.838375000000001
6.838375000000001

I think there is something wrong with the perception of bond types because the difference between the two representations is that in the first case the double bonds are marked aromatic while in the latter case they are not.

I think the following descriptors might also be affected:

PartialSigmaChargeDescriptor
PiElectronegativityDescriptor
SigmaElectronegativityDescriptor

Does anyone have an idea what might be happening here? How should I prepare my structures to get consistent results? Thanks!

Problem with Inchi generator

When we parse an SDF file to build a structure and then calculate the InChI, the InChI generator fails if the SDF file contains aromatic bonds (a bond block with bond type 4).
The error message is: org.openscience.cdk.exception.CDKException: Failed to generate InChI: Unsupported bond type

atom map classes not included in canonical SMILES ordering

I can ask the canonical SMILES generation to include the atom map class in the output. However, the map class values are not included in the canonicalization ordering. This means I cannot canonicalize SMILES strings which differ only by the atom class. Here's an example using Jython:

>>> from org.openscience.cdk.smiles import SmilesParser
>>> from org.openscience.cdk.silent import SilentChemObjectBuilder
>>> from org.openscience.cdk.smiles import SmilesGenerator
>>> sp = SmilesParser(SilentChemObjectBuilder.getInstance())
>>> sg = SmilesGenerator.unique().withAtomClasses()
>>> mol = sp.parseSmiles("[CH3:1][CH3:2]")
>>> sg.create(mol)
u'[CH3:1][CH3:2]'
>>> mol = sp.parseSmiles("[CH3:2][CH3:1]")
>>> sg.create(mol)
u'[CH3:2][CH3:1]'

I expected the two create() calls to be identical because 1) the equivalent SetFlavor() for OEChem uses the flavor parameters to specify the invariants, 2) RDKit includes the atom class as part of the canonicalization, if present, and 3) the CDK documentation shows that some flavors (like SmiFlavor.Stereo | SmiFlavor.AtomicMass) affect the canonical ordering, and doesn't say which flavors don't affect the ordering; I assumed that all of the ones which could affect the ordering, do affect the ordering.

(As expected, I get the same output when I used ".absolute()" instead of ".unique()".)

A frequent work-around for the lack of canonical atom maps is to overload the isotope number. However, issue #273 means that if I rewrite "[C:16]" to "[16C]" then I risk losing the 16.

If the code is meant to work this way, then my alternative suggestion is to modify the documentation slightly to distinguish between the flavors that may affect the canonicalization ordering and those which only affect the output symbols.

NPE in atom typer causing failing unit tests

NPEs like:

java.lang.NullPointerException: null
at org.openscience.cdk.atomtype.CDKAtomTypeMatcher.isAcceptable(CDKAtomTypeMatcher.java:2516)
at org.openscience.cdk.atomtype.CDKAtomTypeMatcher.perceiveCarbons(CDKAtomTypeMatcher.java:480)
at org.openscience.cdk.atomtype.CDKAtomTypeMatcher.findMatchingAtomType(CDKAtomTypeMatcher.java:131)
at org.openscience.cdk.atomtype.CDKAtomTypeMatcher.findMatchingAtomType(CDKAtomTypeMatcher.java:122)
at org.openscience.cdk.atomtype.ReactionStructuresTest.testM10(ReactionStructuresTest.java:334)

See https://jenkins.bigcat.unimaas.nl/job/cdk/lastCompletedBuild/testReport/

map.put("Br",#) twice in MMElementRule

MMElementRule.getWisley_500() contains the following lines.
map.put("Br", 4); map.put("Br", 8);
Apparently, one of the two must be wrong.
I suppose the latter "Br" should be corrected to "Si", but I am not sure.
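
To show why the duplicate key matters, here is a minimal sketch using a plain `java.util.HashMap`. It assumes the map in `MMElementRule` behaves like a standard Java `Map`; the class name `DuplicatePutDemo` is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo class; it only mirrors the two put calls quoted above. */
final class DuplicatePutDemo {

    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Br", 4);
        // The second put replaces the first value and returns the old one (4).
        Integer replaced = map.put("Br", 8);
        System.out.println("replaced value: " + replaced);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.get("Br")); // only the later value survives
        System.out.println(map.size());    // a single entry for "Br"
    }
}
```

So with the two lines as written, the limit of 4 is silently discarded, and whichever element the second line was meant for gets no entry at all.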

Remove some major isotopes

Here are the current "major" isotopes as defined in CDK. I cleaned this up last year so that when there was no information (e.g. all 0 abundance) it would not select a major one (e.g. Es used to be defined). Actually, other higher mass elements should really not have a major definition either. Here is the list; could you indicate your thoughts @schymane?

H	1
He	4
Li	7
Be	9
B	11
C	12
N	14
O	16
F	19
Ne	20
Na	23
Mg	24
Al	27
Si	28
P	31
S	32
Cl	35
Ar	40
K	39
Ca	40
Sc	45
Ti	48
V	51
Cr	52
Mn	55
Fe	56
Co	59
Ni	58
Cu	63
Zn	64
Ga	69
Ge	74
As	75
Se	80
Br	79
Kr	84
Rb	85
Sr	88
Y	89
Zr	90
Nb	93
Mo	98
Tc	NONE
Ru	102
Rh	103
Pd	106
Ag	107
Cd	114
In	115
Sn	120
Sb	121
Te	130
I	127
Xe	132
Cs	133
Ba	138
La	139
Ce	140
Pr	141
Nd	142
Pm	NONE
Sm	152
Eu	153
Gd	158
Tb	159
Dy	164
Ho	165
Er	166
Tm	169
Yb	174
Lu	175
Hf	180
Ta	181
W	184
Re	187
Os	192
Ir	193
Pt	195
Au	197
Hg	202
Tl	205
Pb	208
Bi	209
Po	NONE
At	NONE
Rn	NONE
Fr	NONE
Ra	NONE
Ac	NONE
Th	232
Pa	231
U	238
Np	NONE
Pu	NONE
Am	NONE
Cm	NONE
Bk	NONE
Cf	NONE
Es	NONE
Fm	NONE
Md	NONE
No	NONE
Lr	NONE
Rf	NONE
Db	NONE
Sg	NONE
Bh	NONE
Hs	NONE
Mt	NONE
Ds	NONE
Rg	NONE
Cn	NONE
Uut	NONE
Nh	NONE
Fl	NONE
Uup	NONE
Mc	NONE
Lv	NONE
Uus	NONE
Ts	NONE
Uuo	NONE
Og	NONE
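
The selection rule described above (pick the most abundant isotope, and pick none when every abundance is 0) can be sketched roughly as follows. This is not the CDK implementation: the class `MajorIsotopeSketch` and its methods are hypothetical, and the abundance figures are approximate values used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch of the rule; not CDK code. */
final class MajorIsotopeSketch {

    /** Returns the mass number with the highest non-zero abundance, if any. */
    static OptionalInt majorMassNumber(Map<Integer, Double> abundanceByMass) {
        int best = -1;
        double bestAbundance = 0.0;
        for (Map.Entry<Integer, Double> e : abundanceByMass.entrySet()) {
            // Strict comparison: isotopes with 0 abundance are never selected.
            if (e.getValue() > bestAbundance) {
                best = e.getKey();
                bestAbundance = e.getValue();
            }
        }
        return best < 0 ? OptionalInt.empty() : OptionalInt.of(best);
    }

    /** Approximate natural abundances (percent) for bromine, for illustration. */
    static Map<Integer, Double> bromine() {
        Map<Integer, Double> m = new LinkedHashMap<>();
        m.put(79, 50.69);
        m.put(81, 49.31);
        return m;
    }

    /** An element with no abundance information, modelled here as all zeros. */
    static Map<Integer, Double> noInformation() {
        Map<Integer, Double> m = new LinkedHashMap<>();
        m.put(97, 0.0);
        m.put(98, 0.0);
        m.put(99, 0.0);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(majorMassNumber(bromine()));
        System.out.println(majorMassNumber(noInformation()).isPresent());
    }
}
```

Under this rule the bromine input yields 79, in line with the Br entry in the list, and the all-zero input yields no major isotope, in line with the NONE entries.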

How to compile a small java program using cdk from the command line?

Hello,

javac Test.java

gives me:

Test.java:5: error: package org.openscience.cdk.io.iterator does not exist
import org.openscience.cdk.io.iterator.IteratingSMILESReader;

I am looking for a simple command line. I would prefer not to have to start a whole project with eclipse/netbeans/...
Adding -cp /usr/share/java/cdk-io.jar doesn't solve the problem.

Hardcoded use of java.awt.Color

A few classes inside org.openscience.cdk.renderer.elements are using java.awt.Color objects in order to represent their color.

This does become an issue if one is working outside of AWT (SWT for example). Therefore I'd suggest the introduction of a custom Color class that can be used to represent colors independently from the used toolkit.
One could then supply a new interface IColorFactory<T> that transforms the internal color representation into the actual color object as used by the respective toolkit. As a default one could also include an AWTColorFactory that implements IColorFactory<java.awt.Color> and can be used in all places where the java.awt.Color object is currently used directly.

If the constant creation of new color objects is a concern, one could think about caching the created objects. That would best fit inside the AWTColorFactory as a Map<Color,java.awt.Color> (where Color is the internal color representation).
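
A rough sketch of how this proposal could look is given below. The names `IColorFactory` and `AWTColorFactory` come from the suggestion above; the internal color class (called `RgbColor` here to avoid clashing with `java.awt.Color`) and the method name `create` are assumptions made for the example, and none of this exists in the CDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical toolkit-independent color representation. */
final class RgbColor {
    final int red, green, blue;

    RgbColor(int red, int green, int blue) {
        this.red = red;
        this.green = green;
        this.blue = blue;
    }

    // equals/hashCode are needed so equal colors share one cache entry.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RgbColor)) return false;
        RgbColor c = (RgbColor) o;
        return red == c.red && green == c.green && blue == c.blue;
    }

    @Override
    public int hashCode() {
        return Objects.hash(red, green, blue);
    }
}

/** Converts the internal color into a toolkit-specific color object. */
interface IColorFactory<T> {
    T create(RgbColor color);
}

/** Default factory for AWT, caching the objects it creates. */
final class AWTColorFactory implements IColorFactory<java.awt.Color> {
    private final Map<RgbColor, java.awt.Color> cache = new HashMap<>();

    @Override
    public java.awt.Color create(RgbColor color) {
        return cache.computeIfAbsent(color,
                c -> new java.awt.Color(c.red, c.green, c.blue));
    }
}
```

An SWT-based application could then provide its own `IColorFactory` implementation for the SWT color type, and the renderer elements would only ever hold the internal representation.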

Suppress drawing of wiggly bonds in CDK 2.0

First of all I would like to thank you for your awesome work on the CDK!

The issue arises after upgrading from CDK 1.5.10 to 2.0: a molecule is rendered with "wiggly" bonds which was not rendered like that before:

wiggly_lines

I guess it might have something to do with StereoElements, since the wiggly bonds may represent unknown stereo chemistry?

I've tried resetting the StereoElements like so:
molecule.setStereoElements(new ArrayList<>());

Unfortunately, this did not resolve my problem.

Is there a way I can suppress this novel behaviour?

trouble visualizing this SMILES

The following SMILES does not seem to visualize nicely with CDKDepict:

C1=CC=C(C=C1)C#CC2=CC=CC=C2

See https://www.wikidata.org/wiki/Q902100 and http://cdkdepict-openchem.rhcloud.com/depict/bow/svg?smi=C1=CC=C(C=C1)C#CC2=CC=CC=C2&zoom=2.0&annotate=none

I get:

image

However, pasting the SMILES in the main CDKDepict page makes it show up fine... so, it seems one of the options...

Also, with CDK 1.5.13 in Bioclipse it does show up properly too:

ui.open(cdk.fromSMILES("C1=CC=C(C=C1)C#CC2=CC=CC=C2"))

I get:

image

StructureDiagramGenerator ignores setBondLength for ring bonds

Setting the bond length for StructureDiagramGenerator has no effect on the bond length of ring bonds. A minimal demonstration:

SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
IAtomContainer m = sp.parseSmiles("CCCCCC1CCCCC1");
StructureDiagramGenerator sdg = new StructureDiagramGenerator();
sdg.setBondLength(28);
sdg.setMolecule(m);
sdg.generateCoordinates();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
MDLV2000Writer writer = new MDLV2000Writer(baos);
writer.write((IAtomContainer)sdg.getMolecule());
System.out.println(new String(baos.toByteArray()));
writer.close();

This results in a molecule with very long acyclic bonds and tiny (relatively) ring bonds.

SMILES issue (Sulphur 7 valence)

We have had an issue reported that RDKit gives an error when parsing this SMILES:

BrC1=CC=C([SH](O)(=O)=CC=2OC(C(=O)N3CCOCC3)=CC2)C=C1
# Explicit valence for atom # 5 S, 7, is greater than permitted

but CDK (via AMBIT) parses it fine (along with others).

Even the atom types are assigned.

		String smi = "BrC1=CC=C([SH](O)(=O)=CC=2OC(C(=O)N3CCOCC3)=CC2)C=C1";
		SmilesParser parser = new SmilesParser(
				SilentChemObjectBuilder.getInstance());
	
		IAtomContainer mol = parser.parseSmiles(smi);
		AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);
		for (IAtom a : mol.atoms()) {
			System.out.println(a.getAtomTypeName());
		}

The S atom type is S.sp3d1.

Do you consider this SMILES valid?

Current CIF reader fails to read unit cell parameters and overreads atoms

The current CIFReader reads unit cell constants incorrectly (returns all zeros) and overreads the _atom_site loop, yielding incorrect atoms and eventually a NullPointerException.

How to reproduce:

  • compile cdkcifrd.java from the attached cdkcifrd.zip;
    cdkcifrd.zip
  • read a CIF from the COD (http://crystallography.net/cod/1502693.cif); a copy is provided.
    On Ubuntu-16.04 under bash the following commands demonstrate the issue (assuming the cdk clone is built in ../cdk):
saulius@koala smiles-scripts/ $ javac -cp ../cdk/bundle/target/cdk-2.2-SNAPSHOT.jar cdkcifrd.java
  • with the standard latest CDK snapshot, the cell parameters are all 0.0, too many atoms are reported and the code triggers an assertion for a null pointer:
saulius@koala smiles-scripts/ $ java -ea -cp ../cdk/bundle/target/cdk-2.2-SNAPSHOT.jar:. cdkcifrd tests/inputs/1502693.cif | head
269
tests/inputs/1502693.cif; CELL = (0.0, 0.0, 0.0) (0.0, 0.0, 0.0) (0.0, 0.0, 0.0)
Exception in thread "main" java.lang.AssertionError: f is null at atom No 87
	at cdkcifrd.computeOrthogonalCoordinates(cdkcifrd.java:101)
	at cdkcifrd.main(cdkcifrd.java:70)
  • with a patched code (will offer a PR shortly) the file is read correctly:
saulius@koala smiles-scripts/ $ java -ea -cp jars/cdk-2.2-SNAPSHOT.jar:. cdkcifrd tests/inputs/1502693.cif | head
86
tests/inputs/1502693.cif; CELL = (21.33, 0.0, 0.0) (4.759283623186401E-16, 7.7725, 0.0) (-4.2317488823956815, 1.6591361069663173E-15, 22.4689741064505)
C     16.6686    8.77671   -3.86916
C     17.0263    9.47468   -5.15663
H     17.6069    8.89563   -5.69139
H     17.4937    10.3110   -4.95441
H     16.2077    9.67132   -5.65769
C     17.6021    5.62729   -2.16601
C     18.5319    4.51038   -2.52776
H     19.2362    4.84926   -3.11869

local regression for formula

I get this local error with the latest version when running the tests locally with "mvn test":

[INFO] Running org.openscience.cdk.tools.manipulator.MolecularFormulaManipulatorTest
[ERROR] Tests run: 85, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.421 s <<< FAILURE! - in org.openscience.cdk.tools.manipulator.MolecularFormulaManipulatorTest
[ERROR] testMassNumberDisplay(org.openscience.cdk.tools.manipulator.MolecularFormulaManipulatorTest)  Time elapsed: 0.007 s  <<< FAILURE!
java.lang.AssertionError: 

Expected: is "C7H3[81Br]BrO3"
     but: was "C7H3Br[81Br]O3"
        at org.openscience.cdk.tools.manipulator.MolecularFormulaManipulatorTest.testMassNumberDisplay(MolecularFormulaManipulatorTest.java:1339)

CDK unique smiles flavour generates identical smiles for some non-identical cyclobutadienes

I was looking at whether RandomGenerator covers all of constitutional isomer space that it is supposed to do (it seems to do) and used CDK unique smiles as a hash. In this test code, RandomGenerator found only 4308 of the 4315 compounds that Molgen (http://www.molgen.de/literature.html) predicted.

It turned out that the reason was that the CDK unique smiles implementation generates identical ‘unique’ smiles for non-identical molecules. They are all mesomeric cyclobutadienes. See attached graphics and SDF file.

Both RDKIT and Openbabel generate the right number 4315 non-identical unique smiles.

pastedgraphic-1

[C10H16-isomers-with-degenerate-smiles.sdf.zip](https://github.com/cdk/cdk/files/1792790/C10H16-isomers-with-degenerate-smiles.sdf.zip)

Generate NonStandard InChI

Via Nina:

Using InChIGenerator with non-standard options generates standard InChI (starting with 1S/ ) with 1.5.14. The same code generated non-standard InChI in previous versions (1.5.13 and before)

List<INCHI_OPTION> options = new ArrayList<INCHI_OPTION>();
options.add(INCHI_OPTION.FixedH);
options.add(INCHI_OPTION.SAbs);
options.add(INCHI_OPTION.SAsXYZ);
options.add(INCHI_OPTION.SPXYZ);
options.add(INCHI_OPTION.FixSp3Bug);
options.add(INCHI_OPTION.AuxNone);
InChIGeneratorFactory factory = InChIGeneratorFactory.getInstance();
InChIGenerator gen1 = factory.getInChIGenerator(mol1, options);

Suggestion

Hello, I’m very interested in this, but quite new to programming and/or cheminformatics.
I believe I can find something here that will help me.
I have an in-house database with several SMILES structures in a CSV file, together with a lot of other information including simulated activity/toxicity data. Does CDK have the capability to calculate the structural similarity between those SMILES so that the result can be loaded into Cytoscape for visualization?

Thank you

can't generate an absolute SMILES with "*" in it

I have been investigating how CDK uses InChI to generate Universal SMILES. InChI doesn't handle all of SMILES. For example, it doesn't handle atoms with an atomic number of 0 ("*"), which means the following bit of Jython code fails:

from org.openscience.cdk.smiles import SmilesParser
from org.openscience.cdk.silent import SilentChemObjectBuilder
sp = SmilesParser(SilentChemObjectBuilder.getInstance())
mol = sp.parseSmiles("*c1ccccc1")
from org.openscience.cdk.smiles import SmilesGenerator
sg = SmilesGenerator.absolute()
sg.create(mol)

It gives the exception:

org.openscience.cdk.exception.CDKException: org.openscience.cdk.exception.CDKException: An InChI could not be generated and used to canonise SMILES: null

Stereocenters.of causes NullPointerException for some ChEBI molecules

Method Stereocenters.of causes NullPointerException for some ChEBI molecules (e.g., CHEBI:2762). To reproduce the issue, you can use the following code:

import java.net.URL;

import org.openscience.cdk.io.MDLV2000Reader;
import org.openscience.cdk.silent.AtomContainer;
import org.openscience.cdk.stereo.Stereocenters;

public class Test
{
    public static void main(String[] args) throws Exception
    {
        URL url = new URL("http://www.ebi.ac.uk/chebi/saveStructure.do?" +
                "sdf=true&chebiId=2762");
        MDLV2000Reader reader = new MDLV2000Reader(url.openStream());
        AtomContainer molecule = reader.read(new AtomContainer());
        Stereocenters.of(molecule);
    }
}

Different fingerprints for same structure

In some cases, two identical structures give 2 different fingerprints.

I got this problem with the class CircularFingerprinter using CLASS_ECFP4. I didn't check whether the problem also occurs with other classes or other fingerprint types.

As a working example, I enclose an SDF file with two identical structures that give two different fingerprints.
test.sdf.txt

Fix Macrocycle Layout Collisions

The new macrocycle layout handles many cases gracefully - however sometimes the old code will do a better job:

Test Case:
C1CC2=CC=C(CCC3=CC=C1C=C3)C=C2
