
cgrtools's People

Contributors

dependabot[bot], neon-monster, pandylandy, phrolycheva, salikhovi4, stsouko, tagirshin, timurgimadiev, valiaafo, zarinaibr


cgrtools's Issues

Add pyrylium cation to the rules for thiele

I have found that pyrylium cations and similar rings cannot be converted from Thiele to Kekulé form (an InvalidAromaticRing exception is raised). However, these are definitely aromatic compounds, and the example molecule is valid.

Is it possible to add a new rule for them?

[Image: pyr]
[Screenshot, 2021-02-12 11:11:23]

CGR decomposition

The CGR decomposition function changes the charge of an atom and adds hydrogens to it, for example:
[Al] -> [AlH3]

Canonical tautomer definition

Regarding the tautomerize function: what defines the canonical form? Is it similar to RDKit's empirical scoring scheme, or something else?

XYZ parser

Add an XYZ parser, and maybe an intermediate parser that accepts a JSON or dictionary representation, to support self-made parsers:
my awesome parser (produces a dict or JSON) --> intermediate parser (parses predefined fields, or I supply the correspondence of each field in a dict) --> CGR objects
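
The first stage of the pipeline requested above could look roughly like this. It is only a sketch: `parse_xyz` and the keys of the dictionary it returns (`comment`, `atoms`, `element`, `xyz`) are hypothetical names chosen for illustration, not part of CGRtools, and the XYZ layout is assumed to be the common one (atom count, comment line, then one `symbol x y z` line per atom).

```python
def parse_xyz(text):
    """Parse a single-frame XYZ string into a plain dict (hypothetical intermediate format)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])          # line 1: number of atoms
    comment = lines[1].strip() if len(lines) > 1 else ""  # line 2: free-text comment
    atoms = []
    for line in lines[2:2 + n_atoms]:           # following lines: symbol x y z
        symbol, x, y, z = line.split()[:4]
        atoms.append({"element": symbol, "xyz": (float(x), float(y), float(z))})
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return {"comment": comment, "atoms": atoms}


water = "3\nwater\nO 0.000 0.000 0.117\nH 0.000 0.757 -0.469\nH 0.000 -0.757 -0.469\n"
record = parse_xyz(water)
print(record["comment"], [atom["element"] for atom in record["atoms"]])
```

An XYZ file carries no bonds or charges, so the step from such a dictionary to a molecule or CGR object would still need connectivity from elsewhere.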

Query isomorphism unexpected behaviour

I wanted to create a query from SMILES, so I decided to use molecule.substructure(as_query=True). Then I needed to use isomorphism, but it didn't work as I expected:

[Screenshot, 2021-05-19 09:14:35]

Next, I decided to delete all hybridization and neighbors marks, but it didn't change anything:

[Screenshot, 2021-05-19 09:09:16]

Finally, when I constructed the query myself, it worked as expected:

[Screenshot, 2021-05-19 09:12:12]

So I don't understand whether this is the intended behaviour or a bug.

Notebook to reproduce this experiment:
query_bug.ipynb.zip

IDname of molecules

It is easier to show what I want. I want the molecule to have an ID:

ethanol

9 8 0 0 0 0 999 V2000
-1.4732 -4.4786 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.7587 -4.0661 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0443 -4.4786 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.8857 -3.7641 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.1877 -4.8911 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0607 -5.1930 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.1461 -3.3376 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.3714 -3.3376 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.0443 -5.3036 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
1 4 1 0 0 0 0
1 5 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
M END
$$$$
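
In the MDL molfile layout, the first line of each record is the molecule name, which is where "ethanol" sits in the example above. Until the library exposes it, the names can be pulled out of an SDF with a few lines of plain Python. This is a sketch only; `sdf_record_names` is a hypothetical helper, not a CGRtools function.

```python
def sdf_record_names(sdf_text):
    """Return the title line (first line) of every record in an SDF string."""
    names = []
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":      # record delimiter
            if record:
                names.append(record[0].strip())
            record = []
        else:
            record.append(line)
    # A lone molfile has no trailing $$$$; count it if it has any content.
    if any(line.strip() for line in record):
        names.append(record[0].strip())
    return names


sdf = (
    "ethanol\n\n\n  0  0  0  0  0  0            999 V2000\nM  END\n$$$$\n"
    "water\n\n\n  0  0  0  0  0  0            999 V2000\nM  END\n$$$$\n"
)
print(sdf_record_names(sdf))
```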

Divide construction of SMILES and SMILES file reading

Is it possible to split SMILESRead into two functions: one for reading a SMILES file and one for converting a SMILES string into a graph? Since a graph already has its own conversion to SMILES, str(graph) -> SMILES, it would be more practical to have the inverse as well, something like to_graph(SMILES) -> graph. At the moment you always have to import StringIO and write something like this:

with StringIO(smi) as f, SMILESRead(f) as m:
    mol = next(m)
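
Until a dedicated function exists, the StringIO boilerplate can be hidden behind a thin wrapper. The sketch below is an illustration only: `first_record` and its `reader_factory` parameter are hypothetical names, and the only behaviour assumed of the reader is what the snippet above already relies on (it accepts a file-like object, works as a context manager, and can be iterated).

```python
from io import StringIO


def first_record(text, reader_factory):
    """Feed `text` to a file-based reader and return the first record it yields."""
    with StringIO(text) as handle, reader_factory(handle) as reader:
        return next(iter(reader))


# Intended use, assuming CGRtools is installed (not executed here):
#     from CGRtools.files import SMILESRead
#     mol = first_record("CCO", SMILESRead)
```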

[Errno 2] No such file or directory: 'molecules.dat'

I'm new to CGRtools, and when I follow the tutorial in the documentation I get this error: [Errno 2] No such file or directory: 'molecules.dat'. I don't know what is causing the problem. Your help would be much appreciated. Thanks.

SDF parser

Try to parse the record as both V2000 and V3000 when errors are found.
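
One way to read this request is sketched below: take the version tag from the end of the counts line (the fourth line of a molfile), try the matching parser first, and fall back to the other one if it raises. The names `molfile_version` and `parse_with_fallback`, and the idea of passing parsers in as a dict of callables, are assumptions made for the example; they do not describe how the CGRtools SDF reader is organised.

```python
def molfile_version(record_lines):
    """Return 'V2000' or 'V3000' as declared on the counts line, or None if unknown."""
    if len(record_lines) < 4:
        return None
    tag = record_lines[3].rstrip()[-5:]
    return tag if tag in ("V2000", "V3000") else None


def parse_with_fallback(record_lines, parsers):
    """Try the declared version first, then the remaining parsers.

    `parsers` maps a version tag to a callable that raises ValueError on bad input.
    Returns (version_that_worked, parsed_result).
    """
    declared = molfile_version(record_lines)
    order = sorted(parsers, key=lambda version: version != declared)  # declared first
    errors = {}
    for version in order:
        try:
            return version, parsers[version](record_lines)
        except ValueError as exc:
            errors[version] = str(exc)
    raise ValueError(f"no parser accepted the record: {errors}")
```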

Depict

[Image: cgr_new_R svg]

Bad positioning of the mapping.
Mapping font size and color and dynobj font size: none of these are editable.

RXN block from RDF

Return the raw RXN block from a big RDF file, for visualization, for finding RXN blocks that contain errors, and for passing RXN blocks to other libraries such as RDKit, Indigo, etc.
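
As a stopgap, the raw blocks can be cut out of an RDF with a small line scanner that never loads the whole file. This is a sketch under stated assumptions: `iter_rxn_blocks` is a hypothetical helper, and it assumes every reaction starts with a line beginning with `$RXN` and runs until the next record or data keyword (`$RFMT`, `$MFMT`, `$DTYPE`, `$DATUM`).

```python
RECORD_KEYWORDS = ("$RFMT", "$MFMT", "$DTYPE", "$DATUM", "$RDFILE", "$DATM")


def iter_rxn_blocks(lines):
    """Yield each raw RXN block (as one string) from an iterable of RDF lines."""
    block = None
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("$RXN"):
            if block:
                yield "\n".join(block)
            block = [stripped]                  # start a new reaction block
        elif stripped.startswith(RECORD_KEYWORDS):
            if block:
                yield "\n".join(block)          # data fields end the block
            block = None
        elif block is not None:
            block.append(stripped)              # $MOL lines and molfile content
    if block:
        yield "\n".join(block)


# Intended use with a large file (not executed here):
#     with open("reactions.rdf") as handle:
#         for number, rxn in enumerate(iter_rxn_blocks(handle), start=1):
#             ...
```

Because it takes any iterable of lines, the same function works on an open file handle or on `text.splitlines()`.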

RDFwrite and SDFwrite problems

Writing SDF and RDF files fails if a reaction contains any atom with a charge of +4, for example Ti(+4) with four anions in one molecule.
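
A possible explanation, offered as an assumption rather than a diagnosis of the CGRtools writer: the charge field in a V2000 atom line only has codes for charges from -3 to +3, so +4 has to be written on an `M  CHG` property line instead. The sketch below shows both encodings; `atom_block_charge_code` and `chg_lines` are hypothetical names.

```python
# V2000 atom-line charge codes: only -3..+3 can be expressed (4 means doublet radical).
ATOM_BLOCK_CHARGE_CODES = {0: 0, 3: 1, 2: 2, 1: 3, -1: 5, -2: 6, -3: 7}


def atom_block_charge_code(charge):
    """Code for the atom line; 0 when the charge can only go on an M  CHG line."""
    return ATOM_BLOCK_CHARGE_CODES.get(charge, 0)


def chg_lines(charges):
    """Build 'M  CHG' lines from {atom_number: formal_charge}, at most 8 atoms per line."""
    charged = [(number, charge) for number, charge in sorted(charges.items()) if charge]
    for number, charge in charged:
        if not -15 <= charge <= 15:
            raise ValueError(f"charge {charge} on atom {number} is outside -15..+15")
    lines = []
    for start in range(0, len(charged), 8):
        chunk = charged[start:start + 8]
        body = "".join(f"{number:4d}{charge:4d}" for number, charge in chunk)
        lines.append(f"M  CHG{len(chunk):3d}{body}")
    return lines


# Ti(+4) as atom 1 with four chloride anions as atoms 2..5
for line in chg_lines({1: 4, 2: -1, 3: -1, 4: -1, 5: -1}):
    print(line)
```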

t_decomposed = preparer.decompose(rc.compose())

How do I generate a ReactionContainer object?
I found that the reactions.dat file is supplied, but I want to construct a ReactionContainer myself.
The code is:

from CGRtools import CGRpreparer # import of CGRpreparer
from CGRtools.containers import ReactionContainer,MoleculeContainer
from CGRtools.utils.rdkit import from_rdkit_molecule
from rdkit import Chem

def MoleculeContainer_from_smiles(smiles):
    # Parse the SMILES with RDKit, then convert the RDKit molecule to a CGRtools container
    m = Chem.MolFromSmiles(smiles)
    return from_rdkit_molecule(m)

r1 = MoleculeContainer_from_smiles(r1_smiles)
r2 = MoleculeContainer_from_smiles(r2_smiles)
r3 = MoleculeContainer_from_smiles(r3_smiles)
p1 = MoleculeContainer_from_smiles(p1_smiles)

rc = ReactionContainer(reactants=[r1,r2,r3], products=[p1])
type(rc)

preparer = CGRpreparer()
t_decomposed = preparer.decompose(rc.compose())

and I get the following error:
KeyError Traceback (most recent call last)
~/fengjiaxin/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/CGRtools/cache.py in wrapper(self)
46 try:
---> 47 return self.__dict__[name]
48 except KeyError:

KeyError: '_cached_method_compose'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
1 preparer = CGRpreparer()
----> 2 t_decomposed = preparer.decompose(rc.compose())

~/fengjiaxin/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/CGRtools/cache.py in wrapper(self)
47 return self.__dict__[name]
48 except KeyError:
---> 49 value = self.__dict__[name] = func(self)
50 return value
51 return wrapper

~/fengjiaxin/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/CGRtools/containers/reaction.py in compose(self)
227 if not all(isinstance(x, (MoleculeContainer, CGRContainer)) for x in rr):
228 raise TypeError('Queries not composable')
--> 229 r = reduce(or_, rr)
230 else:
231 r = MoleculeContainer()

~/fengjiaxin/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/CGRtools/algorithms/union.py in __or__(self, other)
24 G | H is union of graphs
25 """
---> 26 return self.union(other)
27
28 def union(self, other):

~/fengjiaxin/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/CGRtools/algorithms/union.py in union(self, other)
30 raise TypeError('BaseContainer subclass expected')
31 if self._node.keys() & set(other):
---> 32 raise KeyError('mapping of graphs is not disjoint')
33
34 # dynamic container resolving

KeyError: 'mapping of graphs is not disjoint'

Am I constructing the ReactionContainer incorrectly?
If so, could you provide the right code to construct a ReactionContainer?
Thanks very much
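
The final KeyError in the traceback says that the atom numbers of the molecules being united overlap. That is what one would expect if each molecule converted from RDKit is numbered from 1 on its own, although the source does not confirm this. The idea of making the numbering disjoint can be sketched without any library; `disjoint_mappings` is a hypothetical helper, and how to apply the resulting mapping to a container should be checked in the CGRtools documentation.

```python
def disjoint_mappings(atom_number_sets):
    """For each molecule, return {old_number: new_number} so that no number is reused."""
    remaps = []
    offset = 0
    for numbers in atom_number_sets:
        ordered = sorted(numbers)
        remaps.append({old: offset + i for i, old in enumerate(ordered, start=1)})
        offset += len(ordered)
    return remaps


# Three reactants that each number their atoms from 1
print(disjoint_mappings([{1, 2, 3}, {1, 2}, {1}]))
```

Renumbering alone only removes the clash among reactants. For the composed CGR to be meaningful, each product atom still has to carry the number of the reactant atom it comes from, so the reaction needs a real atom-to-atom mapping.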

Is there a function to add atom-map IDs to a reaction?

Does CGRtools have a function that adds atom-map IDs to a reaction?
input reaction smiles:

rxnsmi='O.O=C(COCc1ccccc1)N1CCCc2sc(-c3ccc(OC4CC(N5CCCCC5)C4)cc3)nc21>>O=C(CO)N1CCCc2sc(-c3ccc(OC4CC(N5CCCCC5)C4)cc3)nc21'
addmap(rxnsmi)

output:

'[O:1]=[C:2]([N:3]1[C:4]=2[N:5]=[C:6]([S:7][C:8]2[CH2:9][CH2:10][CH2:11]1)[C:12]3=[CH:13][CH:14]=[C:15]([O:16][CH:17]4[CH2:18][CH:19]([N:20]5[CH2:21][CH2:22][CH2:23][CH2:24][CH2:25]5)[CH2:26]4)[CH:27]=[CH:28]3)[CH2:29][O:30][CH2:31][C:32]6=[CH:33][CH:34]=[CH:35][CH:36]=[CH:37]6.[OH2:38]>>[O:1]=[C:2]([N:3]1[C:4]=2[N:5]=[C:6]([S:7][C:8]2[CH2:9][CH2:10][CH2:11]1)[C:12]3=[CH:13][CH:14]=[C:15]([O:16][CH:17]4[CH2:18][CH:19]([N:20]5[CH2:21][CH2:22][CH2:23][CH2:24][CH2:25]5)[CH2:26]4)[CH:27]=[CH:28]3)[CH2:29][OH:30]'

Errors in pytest (no module lazy_object_proxy)

Hello!

I have installed CGRtools as described in the README. But when I run the tests with
pytest --pyargs CGRtools, I get the error: ModuleNotFoundError: No module named 'lazy_object_proxy'.

This problem is easily solved by additional installation of this library: pip install lazy_object_proxy

Maybe it would be better to add this library as a dependency of CGRtools? Or to add the extra command (pip install lazy_object_proxy) to the README?
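
Before running the test suite, a quick check like the following can tell whether optional modules are importable. It is a generic sketch, not something shipped with CGRtools; `missing_modules` is a hypothetical helper and only top-level module names are supported.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(["lazy_object_proxy"])
if missing:
    print("install before running the tests:", ", ".join(missing))
```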

Output SDF file is invalid and unreadable

The issue:

When I write a molecule container to a mol/SDF file with SDFWrite and then try to open this file in any molecular editor (Accelrys Draw, the RDKit library, etc.), the editor either raises an error or shows a blank canvas.

The code and output:

```
m = CreateMoleculeFromString(message.content)
CGR = m.compose(m)
with SDFWrite('molecule.mol') as w:
    w.write(CGR)
```

which creates a file



 10 10  0  0  0  0            999 V2000
    1.8347    1.7649    0.0000 C   0  0  0  0  0  0  0  0  0  1  0  0
    0.8175    2.0469    0.0000 C   0  0  0  0  0  0  0  0  0  2  0  0
    0.0000    1.3778    0.0000 C   0  0  0  0  0  0  0  0  0  3  0  0
    0.4512    0.4166    0.0000 C   0  0  0  0  0  0  0  0  0  4  0  0
    1.2496    0.0220    0.0000 C   0  0  0  0  0  0  0  0  0  5  0  0
    1.8111    0.7009    0.0000 C   0  0  0  0  0  0  0  0  0  6  0  0
    1.3813   -0.6984    0.0000 C   0  0  0  0  0  0  0  0  0  7  0  0
    2.0824   -1.1896    0.0000 C   0  0  0  0  0  0  0  0  0  8  0  0
    2.1004   -1.9993    0.0000 C   0  0  0  0  0  0  0  0  0  9  0  0
    2.8603   -2.4418    0.0000 C   0  0  0  0  0  0  0  0  0 10  0  0
  1  2  2  0  0  0  0
  1  6  2  0  0  0  0
  2  3  2  0  0  0  0
  3  4  2  0  0  0  0
  4  5  2  0  0  0  0
  5  6  2  0  0  0  0
  5  7  1  0  0  0  0
  7  8  1  0  0  0  0
  8  9  1  0  0  0  0
  9 10  1  0  0  0  0
M  END
$$$$

which I'm unable to open.

Conclusion

Am I using SDFWrite wrong? Am I using .compose wrong? Any help would be very useful. Thanks for your time, and sorry if this doesn't fit the issue guidelines for this project. I would like to add that the molecule container seems valid, because I can run operations on it, such as rendering it to an SVG image.

SMILESRead failing to return most SMILES

On delimited data, I am having issues with SMILESRead: on calling read(), only a fraction of the results are returned. However, manually iterating over the same lines with smiles() generally returns them all. While I can use smiles() as a workaround, it would be great to use SMILESRead so that the other columns are parsed in as metadata.

See below example (using python 3.8.10 and CGRtools 4.1.20):

# This is an excerpt of 1976_Sep2016_USPTOgrants_smiles.rsmi
example_text = """ReactionSmiles	PatentNumber	ParagraphNum	Year	TextMinedYield	CalculatedYield
[Br:1][CH2:2][CH2:3][OH:4].[CH2:5]([S:7](Cl)(=[O:9])=[O:8])[CH3:6].CCOCC>C(N(CC)CC)C>[CH2:5]([S:7]([O:4][CH2:3][CH2:2][Br:1])(=[O:9])=[O:8])[CH3:6]	US03930836		1976		
[Br:1][CH2:2][CH2:3][CH2:4][OH:5].[CH3:6][S:7](Cl)(=[O:9])=[O:8].CCOCC>C(N(CC)CC)C>[CH3:6][S:7]([O:5][CH2:4][CH2:3][CH2:2][Br:1])(=[O:9])=[O:8]	US03930836		1976		
[CH2:1]([Cl:4])[CH2:2][OH:3].CCOCC.[CH2:10]([S:14](Cl)(=[O:16])=[O:15])[CH:11]([CH3:13])[CH3:12]>C(N(CC)CC)C>[CH2:10]([S:14]([O:3][CH2:2][CH2:1][Cl:4])(=[O:16])=[O:15])[CH:11]([CH3:13])[CH3:12]	US03930836		1976		
[Br:1][CH2:2][CH2:3][OH:4].[CH2:5]([S:7](Cl)(=[O:9])=[O:8])[CH3:6].CCOCC>C(N(CC)CC)C>[CH2:5]([S:7]([O:4][CH2:3][CH2:2][Br:1])(=[O:9])=[O:8])[CH3:6]	US03930839		1976		
[Br:1][CH2:2][CH2:3][CH2:4][OH:5].[CH3:6][S:7](Cl)(=[O:9])=[O:8].CCOCC>C(N(CC)CC)C>[CH3:6][S:7]([O:5][CH2:4][CH2:3][CH2:2][Br:1])(=[O:9])=[O:8]	US03930839		1976		
[CH2:1]([Cl:4])[CH2:2][OH:3].CCOCC.[CH2:10]([S:14](Cl)(=[O:16])=[O:15])[CH:11]([CH3:13])[CH3:12]>C(N(CC)CC)C>[CH2:10]([S:14]([O:3][CH2:2][CH2:1][Cl:4])(=[O:16])=[O:15])[CH:11]([CH3:13])[CH3:12]	US03930839		1976		
[Cl:1][C:2]1[N:3]=[CH:4][C:5]2[C:10]([CH:11]=1)=[C:9]([N+:12]([O-])=O)[CH:8]=[CH:7][CH:6]=2.O.[OH-].[Na+]>C(O)(=O)C.[Fe]>[Cl:1][C:2]1[N:3]=[CH:4][C:5]2[C:10]([CH:11]=1)=[C:9]([NH2:12])[CH:8]=[CH:7][CH:6]=2 |f:2.3|	US03930837		1976		
[CH3:1][C:2]1[N+:3]([O-])=[CH:4][C:5]2[C:10]([CH:11]=1)=[C:9]([N+:12]([O-:14])=[O:13])[CH:8]=[CH:7][CH:6]=2.P(Cl)(Cl)([Cl:18])=O>>[Cl:18][C:4]1[C:5]2[C:10](=[C:9]([N+:12]([O-:14])=[O:13])[CH:8]=[CH:7][CH:6]=2)[CH:11]=[C:2]([CH3:1])[N:3]=1	US03930837		1976		
[CH3:1][C:2]1[N:3]=[CH:4][C:5]2[C:10]([CH:11]=1)=[C:9]([N+:12]([O-:14])=[O:13])[CH:8]=[CH:7][CH:6]=2.[ClH:15]>>[ClH:15].[CH3:1][C:2]1[N:3]=[CH:4][C:5]2[C:10]([CH:11]=1)=[C:9]([N+:12]([O-:14])=[O:13])[CH:8]=[CH:7][CH:6]=2 |f:2.3|	US03930837		1976		
CC1N=CC2C(C=1)=C([N+]([O-])=O)C=CC=2.[Cl:15][C:16]1[C:25]2[C:20](=[CH:21][CH:22]=[CH:23][CH:24]=2)[CH:19]=[CH:18][N:17]=1>>[ClH:15].[Cl:15][C:16]1[C:25]2[C:20](=[CH:21][CH:22]=[CH:23][CH:24]=2)[CH:19]=[CH:18][N:17]=1 |f:2.3|	US03930837		1976		
CC1N=CC2C(C=1)=C([N+]([O-])=O)C=CC=2.[Cl:15][C:16]1[CH:25]=[CH:24][C:23]([N+:26]([O-:28])=[O:27])=[C:22]2[C:17]=1[CH:18]=[CH:19][N:20]=[CH:21]2.Cl.CC1N=CC2C(C=1)=C([N+]([O-])=O)C=CC=2.[IH:44]>>[IH:44].[Cl:15][C:16]1[CH:25]=[CH:24][C:23]([N+:26]([O-:28])=[O:27])=[C:22]2[C:17]=1[CH:18]=[CH:19][N:20]=[CH:21]2 |f:2.3,5.6|	US03930837		1976		
[N+:1]([C:4]1[CH:13]=[CH:12][CH:11]=[C:10]2[C:5]=1[CH:6]=[CH:7][N:8]=[CH:9]2)([O-:3])=[O:2].[BrH:14]>C(O)C>[BrH:14].[N+:1]([C:4]1[CH:13]=[CH:12][CH:11]=[C:10]2[C:5]=1[CH:6]=[CH:7][N:8]=[CH:9]2)([O-:3])=[O:2] |f:3.4|	US03930837		1976		
[N+](C1C=CC=C2C=1C=CN=C2)([O-])=O.[CH3:14][C:15]1[C:24]2[C:19](=[CH:20][CH:21]=[CH:22][CH:23]=2)[CH:18]=[CH:17][N:16]=1.Br.[Cl:26][C:27]1[C:32]([OH:33])=[C:31]([Cl:34])[C:30]([Cl:35])=[C:29]([Cl:36])[C:28]=1[Cl:37]>>[Cl:26][C:27]1[C:32]([O-:33])=[C:31]([Cl:34])[C:30]([Cl:35])=[C:29]([Cl:36])[C:28]=1[Cl:37].[CH3:14][C:15]1[C:24]2[C:19](=[CH:20][CH:21]=[CH:22][CH:23]=2)[CH:18]=[CH:17][NH+:16]=1 |f:4.5|	US03930837		1976		
[N+:1]([C:4]1[CH:13]=[CH:12][CH:11]=[C:10]2[C:5]=1[CH:6]=[CH:7][N:8]=[CH:9]2)([O-])=O.NC1C=CC=C2C=1C=CN=C2.Br.[IH:26]>>[IH:26].[IH:26].[NH2:1][C:4]1[CH:13]=[CH:12][CH:11]=[C:10]2[C:5]=1[CH:6]=[CH:7][N:8]=[CH:9]2 |f:4.5.6|	US03930837		1976		
Cl.[OH:2][C@@H:3]([CH2:21][CH2:22][CH2:23][CH2:24][CH3:25])[CH:4]=[CH:5][CH:6]1[CH:10]=[CH:9][C:8](=[O:11])[CH:7]1[CH2:12][CH:13]=[CH:14][CH2:15][CH2:16][CH2:17][C:18]([OH:20])=[O:19]>C(O)C>[OH:2][C@@H:3]([CH2:21][CH2:22][CH2:23][CH2:24][CH3:25])[CH:4]=[CH:5][CH:6]1[CH2:10][CH2:9][C:8](=[O:11])[CH:7]1[CH2:12][CH:13]=[CH:14][CH2:15][CH2:16][CH2:17][C:18]([OH:20])=[O:19]	US03930952		1976		
CC(O[CH2:5][C:6]1[CH2:28][S:27][C@@H:9]2[C@H:10]([NH:13]C(C(OC(C)=O)C3C=CC=CC=3)=O)[C:11](=[O:12])[N:8]2[C:7]=1[C:29]([OH:31])=[O:30])=O>O>[CH3:5][C:6]1[CH2:28][S:27][C@@H:9]2[C@H:10]([NH2:13])[C:11](=[O:12])[N:8]2[C:7]=1[C:29]([OH:31])=[O:30]	US03930949		1976		
[S:1]([O-:5])([O-:4])(=[O:3])=[O:2].[NH4+:6].[NH4+]>O>[S:1](=[O:3])(=[O:2])([OH:5])[O-:4].[NH4+:6].[S:1]([O-:5])([O-:4])(=[O:3])=[O:2].[NH4+:6].[NH4+:6] |f:0.1.2,4.5,6.7.8|	US03930988		1976		
CO[C:3]1[CH:4]=[C:5]([C:9]2([CH2:12][C:13]([Cl:16])([Cl:15])[Cl:14])[CH2:11][O:10]2)[CH:6]=[CH:7][CH:8]=1.ClC1C=C(C2(CC(Cl)(Cl)Cl)CO2)C=CC=1.FC1C=C(C2(CC(Cl)(Cl)Cl)CO2)C=CC=1.ClC1C=C(C2(CC(Cl)(Cl)Cl)CO2)C=CC=1Cl.C(OC1C=C(C2(CC(Cl)(Cl)Cl)CO2)C=CC=1)C.C(OC1C=C(C2(CC(Cl)(Cl)Cl)CO2)C=CC=1)C1C=CC=CC=1.ClC1C=CC(C2(CC(Cl)(Cl)Cl)CO2)=CC=1.[Br:117]C1C=CC(C2(CC(Cl)(Cl)Cl)CO2)=CC=1>>[Br:117][C:3]1[CH:4]=[C:5]([C:9]2([CH2:12][C:13]([Cl:16])([Cl:15])[Cl:14])[CH2:11][O:10]2)[CH:6]=[CH:7][CH:8]=1	US03930835		1976		
[C:1]1(O)[CH:6]=[CH:5][CH:4]=[CH:3][CH:2]=1.[CH2:8]=[O:9].[S:10]([O-:13])([O-:12])=[O:11].[Na+:14].[Na+]>O>[OH:9][CH:8]([S:10]([O-:13])(=[O:12])=[O:11])[C:1]1[CH:6]=[CH:5][CH:4]=[CH:3][CH:2]=1.[Na+:14] |f:2.3.4,6.7|	US03931083		1976		
[CH3:1][O:2][C:3]1[C:4]([C:13]([OH:15])=O)=[CH:5][C:6]2[C:11]([CH:12]=1)=[CH:10][CH:9]=[CH:8][CH:7]=2.S(Cl)([Cl:18])=O>C1C=CC=CC=1>[CH3:1][O:2][C:3]1[C:4]([C:13]([Cl:18])=[O:15])=[CH:5][C:6]2[C:11]([CH:12]=1)=[CH:10][CH:9]=[CH:8][CH:7]=2	US03931103		1976		
[C:1]1([O:7]C(Cl)=O)[CH:6]=[CH:5][CH:4]=[CH:3][CH:2]=1.C(Cl)Cl.[OH2:14].[OH-].[Na+].C(N([CH2:22][CH3:23])CC)C>>[CH:2]1[CH:3]=[C:22]([CH2:23][C:2]2[C:1]([OH:7])=[CH:6][CH:5]=[CH:4][CH:3]=2)[C:5]([OH:14])=[CH:6][CH:1]=1 |f:3.4|	US03931108		1976		
[CH3:1][C:2]1[C:3](=[CH:7][C:8](=[CH:12][CH:13]=1)[N:9]=[C:10]=[O:11])N=C=O.[NH2:14][C:15]([O:17]CC)=O>>[CH2:2]1[CH:3]([CH2:1][CH:2]2[CH2:13][CH2:12][CH:8]([N:9]=[C:10]=[O:11])[CH2:7][CH2:3]2)[CH2:7][CH2:8][CH:12]([N:14]=[C:15]=[O:17])[CH2:13]1	US03931113		1976		
C1CC[CH:4]([N:7]=C=[N:7][CH:4]2CCC[CH2:2][CH2:3]2)[CH2:3][CH2:2]1.[N:16]1([C:24]([O:26][CH2:27][C:28]2[CH:33]=[CH:32][CH:31]=[CH:30][CH:29]=2)=[O:25])[CH2:23][CH2:22][CH2:21][C@H:17]1[C:18]([OH:20])=[O:19].C1C=CC2N(O)N=NC=2C=1.C(N)CC>O1CCCC1>[N:16]1([C:24]([O:26][CH2:27][C:28]2[CH:29]=[CH:30][CH:31]=[CH:32][CH:33]=2)=[O:25])[CH2:23][CH2:22][CH2:21][C@H:17]1[C:18]([OH:20])=[O:19].[CH2:4]([NH-:7])[CH2:3][CH3:2] |f:5.6|	US03931139		1976		
[N:1]1([C:9]([O:11][CH2:12][C:13]2[CH:18]=[CH:17][CH:16]=[CH:15][CH:14]=2)=[O:10])[CH2:8][CH2:7][CH2:6][C@H:2]1[C:3]([OH:5])=[O:4].C(OC(Cl)=O)C.[CH2:25]([NH2:31])[CH2:26][CH2:27][CH2:28][CH2:29][CH3:30]>O1CCCC1>[N:1]1([C:9]([O:11][CH2:12][C:13]2[CH:14]=[CH:15][CH:16]=[CH:17][CH:18]=2)=[O:10])[CH2:8][CH2:7][CH2:6][C@H:2]1[C:3]([OH:5])=[O:4].[CH2:25]([NH-:31])[CH2:26][CH2:27][CH2:28][CH2:29][CH3:30] |f:4.5|	US03931139		1976		
[IH:1].CS[C:4]1[NH:5][CH2:6][CH2:7][CH2:8][CH2:9][N:10]=1.C(O)C.O.[NH2:15][NH2:16]>CCOCC>[IH:1].[NH:15]([C:4]1[NH:5][CH2:6][CH2:7][CH2:8][CH2:9][N:10]=1)[NH2:16] |f:0.1,3.4,6.7|	US03931152		1976		
C1C(=O)N([Br:8])C(=O)C1.[CH3:9][N:10]1[C:16]2[CH:17]=[CH:18][CH:19]=[CH:20][C:15]=2[C:14](=[O:21])[CH2:13][C:12]2[CH:22]=[CH:23][CH:24]=[CH:25][C:11]1=2>CN(C)C=O>[Br:8][C:19]1[CH:18]=[CH:17][C:16]2[N:10]([CH3:9])[C:11]3[CH:25]=[CH:24][CH:23]=[CH:22][C:12]=3[CH2:13][C:14](=[O:21])[C:15]=2[CH:20]=1	US03931151		1976		100.5%
[Br:1][C:2]1[CH:18]=[CH:17][C:5]2[N:6]([CH3:16])[C:7]3[CH:15]=[CH:14][CH:13]=[CH:12][C:8]=3[CH2:9][C:10](=[O:11])[C:4]=2[CH:3]=1.[CH2:19](O)[CH3:20].C([O-])([O-])OCC.C1(C)C=CC(S(O)(=O)=O)=CC=1>C(N(CC)CC)C>[Br:1][C:2]1[CH:18]=[CH:17][C:5]2[N:6]([CH3:16])[C:7]3[CH:15]=[CH:14][CH:13]=[CH:12][C:8]=3[CH:9]=[C:10]([O:11][CH2:19][CH3:20])[C:4]=2[CH:3]=1	US03931151		1976		
[CH2:1]([S:3][C:4]1[CH:26]=[CH:25][C:7]2[N:8]([CH3:24])[C:9]3[CH:23]=[CH:22][CH:21]=[CH:20][C:10]=3[CH2:11][C:12](O)([CH2:13][C:14]([O:16][CH2:17][CH3:18])=[O:15])[C:6]=2[CH:5]=1)[CH3:2].Cl>C(O)C>[CH2:1]([S:3][C:4]1[CH:26]=[CH:25][C:7]2[N:8]([CH3:24])[C:9]3[CH:23]=[CH:22][CH:21]=[CH:20][C:10]=3[CH2:11][C:12](=[CH:13][C:14]([O:16][CH2:17][CH3:18])=[O:15])[C:6]=2[CH:5]=1)[CH3:2]	US03931151		1976		82.0%
[CH2:1]([S:3][C:4]1[CH:25]=[CH:24][C:7]2[N:8]([CH3:23])[C:9]3[CH:22]=[CH:21][CH:20]=[CH:19][C:10]=3[CH2:11][C:12](=[CH:13][C:14]([O:16]CC)=[O:15])[C:6]=2[CH:5]=1)[CH3:2].[OH-].[K+].Cl>C(O)C>[CH2:1]([S:3][C:4]1[CH:25]=[CH:24][C:7]2[N:8]([CH3:23])[C:9]3[CH:22]=[CH:21][CH:20]=[CH:19][C:10]=3[CH:11]=[C:12]([CH2:13][C:14]([OH:16])=[O:15])[C:6]=2[CH:5]=1)[CH3:2] |f:1.2|	US03931151		1976		78.1%
[CH2:1]([S:3][C:4]1[CH:23]=[CH:22][C:7]2[N:8]([CH3:21])[C:9]3[CH:20]=[CH:19][CH:18]=[CH:17][C:10]=3[CH:11]=C(CC(O)=O)[C:6]=2[CH:5]=1)[CH3:2].[CH:24]1[C:29]([N+:30]([O-:32])=[O:31])=[CH:28][CH:27]=[C:26]([OH:33])[CH:25]=1.[CH:34]1(N=C=NC2CCCCC2)CCCCC1.[C:49](OCC)(=[O:51])[CH3:50]>>[CH2:1]([S:3][C:4]1[CH:23]=[CH:22][C:7]2[N:8]([CH3:21])[C:9]3[CH:20]=[CH:19][CH:18]=[CH:17][C:10]=3[CH:11]=[C:50]([C:49]([O:33][C:26]3[CH:27]=[CH:28][C:29]([N+:30]([O-:32])=[O:31])=[CH:24][CH:25]=3)=[O:51])[C:6]=2[C:5]=1[CH3:34])[CH3:2]	US03931151		1976		
"""

from CGRtools.files import *
from CGRtools import smiles

# Setup example
fname = "first_30_USPTOgrants.rsmi"
f = open(fname, "w")  # "w" rather than "a", so reruns do not append duplicate rows
f.write(example_text)
f.close()

# Try SMILESRead
smi_reader = SMILESRead(fname, header=True)
reader_result = smi_reader.read()

# 7 SMILES retrieved
print(len(reader_result))
for smi in reader_result:
    print(smi)

# Read line-by-line with smiles, skip header
f = open(fname, "r")
lines = f.readlines()
smiles_result = []
for line in lines[1:]:
    smi = line.split("\t")[0]
    parsed_smi = smiles(smi)
    smiles_result.append(parsed_smi)

f.close()

# All 30 SMILES retrieved
print(len(smiles_result))
for smi in smiles_result:
    print(smi)
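
The line-by-line workaround above drops the metadata columns. They can be kept with the standard csv module, as sketched here. `read_rsmi` and the keys of the dictionaries it yields are hypothetical; the column name `ReactionSmiles` is taken from the header of the excerpt. The sketch also splits off the trailing ` |f:...|` annotation that some rows carry after a space. Whether that annotation has anything to do with the records SMILESRead skips is not established here.

```python
import csv
from io import StringIO


def read_rsmi(handle):
    """Yield one dict per data row of a tab-delimited reaction SMILES file."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=2):   # line 1 is the header
        raw = row.pop("ReactionSmiles")
        smiles_part, _, extension = raw.partition(" ")    # "SMILES |f:2.3|" -> two parts
        yield {
            "line": line_number,
            "smiles": smiles_part,
            "cx_extension": extension.strip(),
            "meta": row,                                   # remaining columns
        }


sample = (
    "ReactionSmiles\tPatentNumber\tYear\n"
    "CCO>>CC=O\tUS1\t1976\n"
    "O.[Na+].[OH-]>>O |f:1.2|\tUS2\t1976\n"
)
for record in read_rsmi(StringIO(sample)):
    print(record["line"], record["smiles"], record["meta"]["PatentNumber"])
```

Each `record["smiles"]` can then be passed to the `smiles()` function used in the workaround, with `record["meta"]` attached to the result in whatever way suits the caller.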

sssr filter

rewrite contour search to edges from nodes
