

johnmay commented on August 28, 2024

What do you think is wrong with the first one?

C*.CC1=CC=CC=C1 |c:4,6,t:2,m:1:4.5.6|

from depict.

schymane commented on August 28, 2024

Nothing - it's the only one in the set that works :-)
If I remove the lp: field manually, they all look good. I also don't see the need for a lone pair definition for these structures...
C*.CC1=CC=CC=C1 |c:4,6,t:2,m:1:4.5.6|
C*.S=C1NC2=CC=CC=C2N1 |c:6,8,t:4,m:1:8.9|
CCC1=CC=CC=C1.Br*.Br* |c:4,6,t:2,m:9:3.4,11:4.5.6.7|
Cl*.Cl*.ClCC1=CC=CC=C1 |c:6,8,t:4,m:1:7.8,3:8.9.10.11|
*C=O.C1CC2C3CCC(C3)C2C1 |m:0:3.4.5.6.7.8.9.10.11.12|

image


schymane commented on August 28, 2024

Looking much better now - here's one that's totally broken once removing the lp part:

Original:
[H]CC(C[H])C1=CC=C(C=C1)S(O)(=O)=O |c:7,9,t:5,lp:12:2,13:2,14:2,Sg:n:1:x:ht,Sg:n:3:y:ht| DTXCID701284951

Hand edited:
[H]CC(C[H])C1=CC=C(C=C1)S(O)(=O)=O |c:7,9,t:5,Sg:n:1:x:ht,Sg:n:3:y:ht| DTXCID701284951

image


schymane commented on August 28, 2024

...and easily fixed by turning Hs on
image


schymane commented on August 28, 2024

So... getting there. Here's what it looks like in development (trying out a few different options) vs. Depict now. I'm personally not a big fan of the third representation, but these are the only examples that still don't look quite the same (ignoring the shading for the moment, although in the third case the missing shading means a loss of information).
Apart from the depiction, what are your thoughts on the best representation(s)? Obviously this sometimes depends on the definition, or lack thereof, of the substances involved, and these three examples do not represent exactly the same thing (as you can see from the names in the first screenshot).
My ideal aim would be something we can depict properly and expand in R for our mass spec workflows, i.e. to generate valid SMILES from structures stored (like these examples) in the Dashboard that we can then manipulate in (r)cdk.

image

image

Some SMILES:

[H]CC(C[H])C1=CC=C(C=C1)S(O)(=O)=O |c:7,9,t:5,Sg:n:1:x:ht,Sg:n:3:y:ht| DTXCID701284951
OS(=O)(=O)C1=CC=CC=C1.** |$;;;;;;;;;;Alkyl_p;$,c:6,8,t:4,m:11:7.8.9| DTXCID301079750           
CCCCCCCCCC.OS(=O)(=O)C1=CC=C(*)C=C1 |c:18,t:13,15,m:18:1.2.3.4| DTXCID001079751           
CCCCCCCCCCC.OS(=O)(=O)C1=CC=C(*)C=C1 |c:19,t:14,16,m:19:5.6.7.8.9| DTXCID701079752           
CCCCCCCCCCCC.OS(=O)(=O)C1=CC=C(*)C=C1 |c:20,t:15,17,m:20:1.2.3.4.5| DTXCID401079753           
CCCCCCC(O)=O.OS(=O)(=O)C1=CC=C(*)C=C1 |c:17,t:12,14,m:17:1.2.3.4.5| DTXCID101079754           


johnmay commented on August 28, 2024
  • I've patched the handling of lp: so these fields are now ignored (see cdk/cdk#410). We ignore lp:, c:, t:, etc. because they don't change the meaning, and keeping them causes issues with canonical labelling/registration. For example, for pyrrole you would want all of these to have the same canonical identifier:
N1C=CC=C1
N1C=CC=C1 |c:2,3|
N1C=CC=C1 |c:2,3,lp:0:1|
N1C=CC=C1 |lp:0:1|
  • The hydrogen case is easy to address: as you guessed, the hydrogen gets removed, so the brackets can't be drawn and an exception is raised. I'll add a patch for that.
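The point about lp:, c: and t: can be illustrated at the string level: split the |...| block into fields, drop the layout-only keys, and compare what is left. This is a minimal sketch, not how CDK does it (CDK parses the CXSMILES into a molecule). The field-grouping rule, the function names and the `LAYOUT_ONLY` set are assumptions for illustration. Nested blocks such as `RG:_R1={...|...|}` are not handled, and the SMILES part is compared verbatim, so a real registration key would still need a toolkit's canonical SMILES.

```python
import re

# Assumed rule: a field starts with a letter key ("c:", "lp:", "m:", "Sg:") or
# with "$" (atom labels); bare tokens such as "6" in "c:4,6" continue the field.
_FIELD_START = re.compile(r"^(?:\$|[A-Za-z]+:)")

# Fields that, per the discussion, do not change the meaning of the structure.
LAYOUT_ONLY = {"c", "t", "lp"}


def split_cxsmiles(cxsmiles):
    """Split 'SMILES |ext| title' into (smiles, extension fields, title).

    Assumes a single, non-nested |...| block.
    """
    smiles, sep, rest = cxsmiles.partition(" |")
    if not sep:
        return cxsmiles.strip(), [], ""
    body, _, title = rest.partition("|")
    tokens = body.split(",") if body else []
    fields = []
    for token in tokens:
        if _FIELD_START.match(token) or not fields:
            fields.append(token)  # start of a new field
        else:
            fields[-1] += "," + token  # continuation of the previous field
    return smiles, fields, title.strip()


def field_key(field):
    """'lp:12:2,13:2' -> 'lp', 'Sg:n:1:x:ht' -> 'Sg', '$...$' -> '$'."""
    return "$" if field.startswith("$") else field.split(":", 1)[0]


def drop_fields(cxsmiles, keys=LAYOUT_ONLY):
    """Rebuild the CXSMILES without the fields whose key is in `keys`."""
    smiles, fields, title = split_cxsmiles(cxsmiles)
    kept = [f for f in fields if field_key(f) not in keys]
    out = smiles + (" |" + ",".join(kept) + "|" if kept else "")
    return out + (" " + title if title else "")


def layout_free_key(cxsmiles):
    """Comparison key ignoring c:, t:, lp: and the title (SMILES not canonicalised)."""
    smiles, fields, _ = split_cxsmiles(cxsmiles)
    kept = sorted(f for f in fields if field_key(f) not in LAYOUT_ONLY)
    return smiles, tuple(kept)
```

With this, the four pyrrole strings get the same key. Calling `drop_fields(s, {"lp"})` on the original DTXCID701284951 string reproduces the hand-edited version shown earlier in the thread.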

Re "still don't look quite the same": that's not the goal. My goal was to get them looking like they do in patents; all the information is captured, so it can always be rendered manually if desired. I don't like the shading because it's not clear when the shaded regions overlap:

image
CC1=CC=CC=C1.N*.O*.Cl* |c:3,5,t:1,m:8:4.3.2,10:3.4.5,12:2.3.4.5|

CAS has a better semantic depiction, but again, in the same way that we don't draw the electrons, the goal is not to draw exactly what is semantically captured underneath.

  1. CDK draws these small brackets for single-atom link nodes; if you look through patents, this is very common. It's a simple boolean switch to draw the larger (and IMO uglier) brackets in these cases. The if statement is here: StandardSgroupGenerator:L531
  2. The attachment is ortho/meta/para and can therefore go anywhere on this ring where the valence allows.
  3. Yeah, there is no good way to draw this (I think even IUPAC admits as much); the CAS style may be okay here. I'll agree the ChemAxon depiction is clearer in this case.
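Point 2 depends on the m: field, which lists, for each attachment atom (the `*`), the atoms it may bond to. For the R/(r)cdk expansion use case, a first step is reading that field into a mapping. The following is a minimal sketch. The regex and the `positional_variation` name are hypothetical, and it assumes a single m: field whose entries are `attach:a.b.c` separated by commas.

```python
import re

# Hypothetical pattern: "m:<attach>:<a>.<b>..." plus further "<attach>:<a>.<b>..."
# entries, starting right after "|" or "," so that e.g. "Sg:n:14:m:ht" is not matched.
_M_FIELD = re.compile(r"(?:^|[|,])m:(\d+:[\d.]+(?:,\d+:[\d.]+)*)")


def positional_variation(cxsmiles):
    """Map each attachment atom index to its list of candidate atom indices."""
    match = _M_FIELD.search(cxsmiles)
    if match is None:
        return {}
    result = {}
    for entry in match.group(1).split(","):
        attach, _, atoms = entry.partition(":")
        result[int(attach)] = [int(a) for a in atoms.split(".")]
    return result


print(positional_variation("CC1=CC=CC=C1.N*.O*.Cl* |c:3,5,t:1,m:8:4.3.2,10:3.4.5,12:2.3.4.5|"))
```

For the overlapping-shading example above this gives `{8: [4, 3, 2], 10: [3, 4, 5], 12: [2, 3, 4, 5]}`. Each `*` keeps its own candidate list, so overlaps are explicit in the data even when they are hard to see in a shaded depiction.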


schymane commented on August 28, 2024

Thanks! That's awesome. I actually much prefer your depiction for case 1 and 2, they are simpler and cleaner. Case 3 is the tricky one, like you say :-) [and good point about the shading].
Case 3 could be captured (for my purposes) using Case 1 plus a way to define the range of x and y. That's all I need, and structure-wise it lets you collapse the long chains and get a much clearer depiction.

Is there a way to store and display that range properly? So far my non-ideal workaround has been via the title field...

[H]CC(C[H])C1=CC=C(C=C1)S(O)(=O)=O |c:7,9,t:5,Sg:n:1:x:ht,Sg:n:3:y:ht| DTXCID701284951 (x+y=1-10)
image


johnmay commented on August 28, 2024

Not in CDK's handling, and I don't think CXSMILES supports it either. You can put constraints on R groups (R-group logic), but I'm not sure about the repeat variation. It makes complete sense though, and you'd want it for a general Markush match.
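Because the x + y range can't be stored in the structure, one workaround for the mass-spec use case is to keep the range outside it (as in the title field of the DTXCID701284951 example earlier) and expand it in code. The following is a minimal sketch under stated assumptions. The `expand_chain_range` name is hypothetical. The string template is hand-derived from that one CXSMILES, where `Sg:n:1:x` repeats atom 1 and `Sg:n:3:y` repeats atom 3. It assumes x, y >= 0 with an inclusive range.

```python
def expand_chain_range(lo=1, hi=10):
    """Yield (x, y, smiles) for every x, y >= 0 with lo <= x + y <= hi.

    Hypothetical helper; the template below only fits DTXCID701284951:
    atom 1 (repeated x times) and atom 3 (repeated y times) are chain carbons.
    """
    for x in range(hi + 1):
        for y in range(hi + 1 - x):
            if x + y >= lo:
                smiles = "[H]" + "C" * x + "C(" + "C" * y + "[H])C1=CC=C(C=C1)S(O)(=O)=O"
                yield x, y, smiles


pairs = list(expand_chain_range())
print(len(pairs))  # number of (x, y) combinations with 1 <= x + y <= 10
```

With x = y = 1 the template gives back the stored SMILES. Swapping x and y describes the same molecule (both chains hang off the same branch carbon), so canonicalising the output in (r)cdk and de-duplicating would roughly halve the list.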


schymane commented on August 28, 2024

This one is an interesting corner case ...
NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |c:6,11,14,17,t:1,3,lp:0:1,5:2,9:2,12:2,16:1,19:2,20:3,m:21:13.14| DTXCID301079550
image
(are there any plans to be able to adjust orientation if desired at some point, in general?)


johnmay commented on August 28, 2024

Let's remove the junk first:

NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14|

When it crosses a single bond, it doesn't seem to check which side to put it on:

NC1=CC=C(O)C2=C1C(=O)C1=C(O)C=CC(N)=C1C2=O.Br* |m:21:13.14.15|


schymane commented on August 28, 2024


johnmay commented on August 28, 2024

CDK issue


johnmay commented on August 28, 2024

I believe I've come up with a fix for the sided-ness of the positional variation. The algorithm currently looks at the atoms where it can go and computes the centre from those. When there is a single bond there is no sided-ness, so the choice is arbitrary. You can see the 'centre of mass' marked with an 'X' in the following examples. I used the existing APIs to add the colours so we can see where it's attaching.

|13.14|

image

|13.14.15|

image

|13.14.15.16|

image
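The comment describes the current behaviour (take the centre of the candidate atoms) and notes that a single bond has no sided-ness, but it does not spell out the fix. The observation itself can be shown with a small sketch. It assumes 2D (x, y) coordinates for the atoms listed in one m: entry, and the `attachment_centre` name is hypothetical, not CDK's code. When the candidates are collinear, as for the two atoms of `|13.14|`, the centre lies on the bond itself and cannot tell you which side to draw on.

```python
from math import isclose


def attachment_centre(coords):
    """Centre of the candidate atoms' 2D coordinates, plus a collinearity flag.

    Collinear candidates (e.g. the two atoms of a single bond) give a centre on
    the line through them, so the centre alone does not pick a side.
    """
    if not coords:
        raise ValueError("need at least one candidate atom")
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    if n < 3:
        return (cx, cy), True
    (x0, y0), (x1, y1) = coords[0], coords[1]
    # Cross product of (p1 - p0) and (p - p0) is zero when p is on the p0-p1 line.
    collinear = all(
        isclose((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0), 0.0, abs_tol=1e-9)
        for x, y in coords[2:]
    )
    return (cx, cy), collinear
```

For three atoms of a ring (like `|13.14.15|`), the centre is pulled off the bond towards the ring, which is why those cases already pick a side.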


schymane commented on August 28, 2024

I'm adding comments to this as we have a new case:
https://comptox.epa.gov/dashboard/dsstoxdb/results?search=PCBs
[*]C1=C([*])C([*])=C(C([*])=C1[*])C1=C([*])C([*])=C([*])C([*])=C1[*] |$_R1;;;_R1;;_R1;;;_R1;;_R1;;;_R1;;_R1;;_R1;;_R1;;_R1$,c:1,5,8,12,20,t:16,RG:_R1={Cl* |$;_AP1$,lp:0:2|},LOG={_R1:;H;>0}|
image

which does not work in CDK Depict. I am also not yet a big fan of this style of representing the problem.
This works as an alternative:
image

Cl*.Cl*.c1ccccc1-c1ccccc1 |m:1:4.5.6.7.8.9,3:10.11.12.13.14.15,Sg:n:2:x:ht,Sg:n:0:y:ht| polychlorinated biphenyls, x+y>1

What are your thoughts/suggestions? Thanks!


johnmay commented on August 28, 2024

Yes, R groups are not supported. I have some internal code at NextMove that does handle them, but the issue is that the CDK has an explicit type (RGroupQuery) for handling this. Converting the ChemAxon definitions into this is tricky because the concepts don't match exactly (RGroupQuery doesn't store attachments explicitly, for example). Hence, in my internal code I just store them as a property of the molecule.


schymane commented on August 28, 2024

Here is a new case that doesn't seem to display optimally ... an undefined location within a repeater unit:
[H]OCCO.C* |lp:1:2,4:2,m:6:3.2,Sg:n:5,1,2,3::ht|
image

http://www.simolecule.com/cdkdepict/depict/bow/svg?smi=[H]OCCO.C*%20%7Clp%3A1%3A2%2C4%3A2%2Cm%3A6%3A3.2%2CSg%3An%3A5%2C1%2C2%2C3%3A%3Aht%7C&abbr=on&hdisp=bridgehead&showtitle=false&zoom=1.6&annotate=none

Pic from ChemAxon (source of extended SMILES):
ppgn_chemaxon

I'm not yet sure I'm a fan of the ChemAxon representation ... but we don't currently see another option.


schymane commented on August 28, 2024

[H]OCCO.C* |m:6:3.2,Sg:n:1,2,3::ht| looks better


johnmay commented on August 28, 2024

I think this is possibly a bug in the ChemAxon export. Since the positional variation is part of the repeat, the whole thing needs to be included in the repeat, not just one of the atoms. Of course this depends on how you represent the attachment, but since it is present as a real node in the SMILES (*), it's logical that it should be in the repeat, so the Sg:n should include atom indices 1-6.

[H]OCCO.C* |m:6:3.2,Sg:n:1,2,3,5,6::ht|
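The rule above (if the positional variation sits inside a repeat, the `*` node belongs in the Sg:n atom list too) can be checked mechanically. The following is a minimal sketch with a hypothetical helper name. It takes the Sg:n atom indices and an m: mapping {attachment: candidates}, and it flags only the `*` atom itself. Also requiring the rest of the attached fragment (atom 5 here) would need the bond graph, which this sketch does not have.

```python
def attachments_outside_repeat(repeat_atoms, positional):
    """Return attachment atoms whose candidates touch the repeat unit while the
    attachment atom itself is missing from the Sg:n atom list (hypothetical check)."""
    repeat = set(repeat_atoms)
    return sorted(
        attach
        for attach, candidates in positional.items()
        if repeat.intersection(candidates) and attach not in repeat
    )


# ChemAxon export vs. the corrected version, both with m:6:3.2
print(attachments_outside_repeat([1, 2, 3], {6: [3, 2]}))
print(attachments_outside_repeat([1, 2, 3, 5, 6], {6: [3, 2]}))
```

The first call flags atom 6 and the second returns an empty list. The check uses "any candidate inside the repeat" as its trigger; whether a partially overlapping candidate list should count is a judgement call this sketch does not settle.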

I was talking to Greg about this at the ICCS: CXSMILES really is poorly designed. For example, it can't differentiate spiro vs. linear repeats because it only stores the atoms and not the crossing bonds:

image


schymane commented on August 28, 2024

Agreed re: design. Chris and I were swapping some examples last night that enumerate correctly according to the technical definitions but are visually incomprehensible.
OS(=O)(=O)c1ccc2c(c1)C(CCC2CC)CC |Sg:n:14:m:ht,Sg:n:16:n:ht| C6-C10DATS; n+m=0-4
image

vs. (n and m may not match exactly; I don't have the ChemAxon CXSMILES for this yet)

image


johnmay commented on August 28, 2024

As far as I can tell there is one still broken - will open a separate issue for that.
