padelpy's People

Contributors

dinabandhu50, fanwangm, jacksonburns, maclandrol, mehr-licht, pikakolendo02, sebastiandro, tjkessler

padelpy's Issues

how to calculate individual descriptors; supplying descriptors.xml file

Hi, I have run into a situation where a single 2D descriptor fails, which makes padelpy hang and time out. I confirmed this by opening the Java version and removing that descriptor, after which the calculation runs fine. So I would like to specify which descriptors padelpy calculates. I think this should be possible by passing padeldescriptor(descriptortypes='\path\to\descriptortypes'), pointing at the descriptors.xml file (originally found in the site-packages\padelpy\PaDEL-Descriptor location). I tried modifying this file, setting every descriptor to false except one or two, to get it to calculate just those descriptors. I have attached the XML file as txt. However, even with this it still calculates all 2D descriptors for a test mol file, so either I am doing something incorrectly or there is a bug in how this information is passed.

Any help would be appreciated.

descriptors.txt

Padelpy broken on Python 3.12

On Python 3.12, padelpy seems broken at

filename = timestamp + str(random.randint(1e8,1e9))

conda create -n padel python=3
conda activate padel
conda install -c conda-forge padelpy
python
>>> from padelpy import from_smiles
>>>
>>> # calculate molecular descriptors for propane
>>> descriptors = from_smiles('CCC')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Software\Michele\miniconda3\envs\padel\Lib\site-packages\padelpy\functions.py", line 63, in from_smiles
    filename = timestamp + str(random.randint(1e8,1e9))
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Michele\miniconda3\envs\padel\Lib\random.py", line 336, in randint
    return self.randrange(a, b+1)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Software\Michele\miniconda3\envs\padel\Lib\random.py", line 301, in randrange
    istart = _index(start)
             ^^^^^^^^^^^^^
TypeError: 'float' object cannot be interpreted as an integer

This works on 3.10; I haven't checked 3.11. Apparently some type-conversion rules were tightened. But is there a reason not to use tempfile.mkstemp here?
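
A minimal sketch of two possible fixes for the failing line, not the maintainers' actual patch: pass integer bounds to random.randint, or let tempfile.mkstemp pick a unique name as suggested above. The timestamp format and the .smi suffix are assumptions made for illustration.

```python
import os
import random
import tempfile
from datetime import datetime

# Option 1: keep the original naming scheme but pass integers.
# 1e8 and 1e9 are floats; 10**8 and 10**9 are ints.
timestamp = datetime.now().strftime("%Y%m%d%H%M%S%f")  # assumed format
filename = timestamp + str(random.randint(10**8, 10**9))

# Option 2: let the standard library create a uniquely named file.
fd, smi_path = tempfile.mkstemp(suffix=".smi")
with os.fdopen(fd, "w") as smi_file:
    smi_file.write("CCC\n")  # propane, as in the report above

# ... the descriptor calculation would read smi_path here ...
os.remove(smi_path)
```

Option 2 also avoids name collisions between concurrent calls, because the operating system guarantees that the file did not exist before.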

RuntimeError on generating descriptors

Hi, I always get a runtime error when generating descriptors. I have a file of 7000 molecules with their SMILES entries. It stops after a few SMILES and gives a runtime error. I tried increasing the runtime and the number of threads, but it did not help.

for smile in tqdm(smiles):
    desc = from_smiles(smile, fingerprints=False, descriptors=True, threads=4)
    Descriptors.append(desc)

RuntimeError: PaDEL-Descriptor encountered an error: GLib-GIO-Message: 08:11:00.494: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
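
A hedged sketch of a loop that records failures instead of stopping at the first RuntimeError. It does not explain why PaDEL writes the GLib message to stderr; it only keeps one failing molecule from aborting a long run. The helper name describe_all is hypothetical, and the descriptor function is passed in as an argument so the loop can be tried without Java installed.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def describe_all(
    smiles_list: Iterable[str],
    compute: Callable[[str], Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Run `compute` on each SMILES, collecting failures instead of stopping."""
    results, failures = [], []
    for smi in smiles_list:
        try:
            row = dict(compute(smi))
        except RuntimeError as exc:
            # Remember which molecule failed and why, then move on.
            failures.append((smi, str(exc)))
            continue
        row["SMILES"] = smi
        results.append(row)
    return results, failures


# With padelpy installed, `compute` could be something like:
#   lambda s: from_smiles(s, fingerprints=False, descriptors=True, threads=4)
```

The failures list can then be inspected or retried separately once the rest of the file has been processed.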

DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
python 3.7
I get the following errors when I run python test.py (padelpy/tests):

E

ERROR: test_from_smiles (__main__.TestAll)

Traceback (most recent call last):
  File "/home/abc/miniconda3/lib/python3.7/site-packages/padelpy/functions.py", line 59, in from_smiles
    sp_timeout=timeout
  File "/home/abc/miniconda3/lib/python3.7/site-packages/padelpy/wrapper.py", line 148, in padeldescriptor
    err.decode('utf-8')
RuntimeError: PaDEL-Descriptor encountered an error: GLib-GIO-Message: 08:11:00.494: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 11, in test_from_smiles
    descriptors = from_smiles('CCC')
  File "/home/abc/miniconda3/lib/python3.7/site-packages/padelpy/functions.py", line 68, in from_smiles
    raise RuntimeError(exception)
RuntimeError: PaDEL-Descriptor encountered an error: GLib-GIO-Message: 08:11:00.494: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.


Ran 1 test in 13.542s

FAILED (errors=1)

Fingerprint calculation limited to 99 compounds

I have a .smi file containing 207 compounds, with corresponding SMILES. While calculating fingerprints for those, it is limited to 99 compounds only, which can also be seen in the log file. Is there any parameter to bypass that limit? I tried setting maxcpdperfile=500, but it didn't work. Here is the snippet:

from padelpy import padeldescriptor
fingerprints = ['CDK', 'CDKextended', 'CDKgraphonly']
for fingerprint in fingerprints:
    fingerprint_output_file = ''.join([fingerprint,'.csv'])
    fingerprint_descriptortypes = fp[fingerprint]
    print(fingerprint_output_file)
    padeldescriptor(mol_dir='molecule_prad.smi', 
                    d_file=fingerprint_output_file,
                    descriptortypes= fingerprint_descriptortypes,
                    detectaromaticity=True,
                    standardizenitro=True,
                    standardizetautomers=True,
                    threads=10,
                    removesalt=True,
                    log=True,
                    fingerprints=True,
                    retainorder=True)

[Image: Screenshot 2024-01-29 202228]

Unexpected keyword argument 'usefilenamesasmolname'

When trying to set the usefilenamesasmolname option:

from padelpy import padeldescriptor

padeldescriptor(usefilenamesasmolname=True)

I get the following error:

TypeError: padeldescriptor() got an unexpected keyword argument 'usefilenamesasmolname'
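
The padeldescriptor signature printed in the KlekotaRoth traceback on this page spells the parameter usefilenameasmolname, without the "s" after "name". A small sketch of how a misspelled keyword could be checked against a function's signature follows; closest_parameter and padeldescriptor_stub are hypothetical names, and the stub carries only a few of the parameters shown in that traceback.

```python
import difflib
import inspect


def closest_parameter(func, name):
    """Return `name` if `func` accepts it, else the closest parameter name or None."""
    params = list(inspect.signature(func).parameters)
    if name in params:
        return name
    matches = difflib.get_close_matches(name, params, n=1)
    return matches[0] if matches else None


def padeldescriptor_stub(mol_dir=None, d_file=None, usefilenameasmolname=False):
    """Stand-in with a subset of the parameters listed in the traceback."""


suggestion = closest_parameter(padeldescriptor_stub, "usefilenamesasmolname")
```

With padelpy installed, the same helper could be pointed at padelpy.padeldescriptor instead of the stub.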

Doubt about command-line wrapper function padeldescriptor

Hi! I'm kinda new in the programming world and I have a doubt about how to use the command-line wrapper function.

So I have a .txt with a bunch of SMILES and I want to calculate their 2D descriptors, so how should I use the padeldescriptor function so I only calculate the 2D? Because by default I get all 1875 descriptors.

I wrote something like this:

[Image: code screenshot]

Thanks!

padelpy hangs with specific peptide sequence

Hi, novice user here.
PaDELpy works most of the time for me, but the following peptide sequence just hangs and eventually times out.

string = rdkit.Chem.MolToSmiles(rdkit.Chem.MolFromFASTA('GLILVGGYGTR'))
desc = from_smiles(string)

I don't see anything obviously wrong with the SMILES string that RDKit generates.
Interestingly, if I take off the final arginine (R), it computes fine.

Any ideas?

wrapper conflict with parallel processing

my code:

def get_padel_descs(smiles):
    dummy = from_smiles("CC")
    try:
        desc = from_smiles(
            smiles,
            timeout=10,
            threads=1,
        )
    except Exception:
        desc = dummy.fromkeys(dummy)
        sys.stderr.write("convert failed: " + smiles)

    return pd.Series(desc)

desc_list = get_padel_descs("CC").keys()

%%time
swifter.set_defaults(
    npartitions=32,
    scheduler="threads",  # threads or processes
    force_parallel=True,
)
df1.loc[:, desc_list] = df1["SMILES"].swifter.apply(get_padel_descs)

I randomly get this error:

File ~/mambaforge/lib/python3.11/site-packages/padelpy/functions.py:121, in from_smiles(smiles, output_csv, descriptors, fingerprints, timeout, maxruntime, threads)
    118     rows = [row for row in reader]
    119 desc_file.close()
--> 121 remove("{}.smi".format(timestamp))
    122 if not save_csv:
    123     remove(output_csv)

FileNotFoundError: [Errno 2] No such file or directory: '20230507055855835.smi'

I think this is a timestamp problem: two files created in parallel shared the same timestamp. A random number could work better.
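
One way to sidestep shared file names entirely, sketched under the assumption that each call may use its own scratch directory: give every worker a private tempfile.TemporaryDirectory. The names run_isolated and echo_paths are hypothetical, and echo_paths merely stands in for the real descriptor call.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_isolated(smiles, work):
    """Write `smiles` to a private temp directory and pass the paths to `work`."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        smi_path = os.path.join(tmp_dir, "input.smi")
        csv_path = os.path.join(tmp_dir, "output.csv")
        with open(smi_path, "w") as smi_file:
            smi_file.write(smiles + "\n")
        return work(smi_path, csv_path)


def echo_paths(smi_path, csv_path):
    # Stand-in for the real descriptor call; it only reports the input path.
    return smi_path


with ThreadPoolExecutor(max_workers=8) as pool:
    paths = list(pool.map(lambda s: run_isolated(s, echo_paths), ["CC"] * 32))
```

Because every directory name is unique, two threads can never delete each other's input file, and the directory is cleaned up when the with block exits.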

Inconsistent descriptors values for same SMILES.

Hi,
I am using this library for my projects and found that some descriptors give different values on different runs.

In the figures below, the x-axis is the SMILES sample index (128 samples in total) and the y-axis is the value calculated by PaDEL-Descriptor for topoRadius, topoDiameter and WPATH. Unfortunately, because of license issues I cannot post the dataset or any SMILES values here for reproducibility.

run 1
[Image: run_1]

Here we can see the high values occurring at 12, 28, 32, 37, etc.

run 2
[Image: run_2]

But in the second run the high values are at 13, 25, 28, 33, etc., which means the same set of SMILES gives different values.

  1. Is this a common problem?
  2. How should this problem be handled?
  3. Why would PaDEL-Descriptor give such extreme values?

Thanks

- Dinabandhu

Java JRE 6+ not found (required for PaDEL-Descriptor)

I followed your instructions to install padelpy, but I don't know how to install Java (I tried to install it but failed, and this error still appears). Could you give me specific instructions for handling this problem?

Buprenorphine SMILES

I have been using padelpy to generate molecular descriptors and it works very well. I do have one issue, though: I am having trouble generating descriptors for buprenorphine. I have tried different SMILES strings and even changed the timeout parameter to 30 min, and I still receive a timeout error. I am curious to know why it isn't working for this SMILES string.

SMILES I have tried:

Oc7ccc5c1c7O[C@H]3[C@]6(OC)C@HC@@(C)C(C)(C)C (Wikipedia)

CC(C)(C)C(C)(C1CC23CCC1(C4C25CCN(C3CC6=C5C(=C(C=C6)O)O4)CC7CC7)OC)O (canonical PubChem)

C[C@([C@H]1C[C@@]23CC[C@@]1([C@H]4[C@@]25CCN([C@@h]3CC6=C5C(=C(C=C6)O)O4)CC7CC7)OC)(C(C)(C)C)O (isomeric PubChem)

Thank you in advance for any help you might be able to provide.

KlekotaRoth descriptor run is not finishing

When I run the KlekotaRoth descriptor the run is not finishing, even though I get the results normally.

Interrupting the run I get this log:

KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-5-fb2777cce1fe> in <module>
      1 from functions.fingerprint_functions import KlekotaRoth
      2 
----> 3 KlekotaRoth(mol_dir)

c:\Users\Brenda\OneDrive\Documentos\Doutorado\QSAR\03.descriptor_preparation\functions\fingerprint_functions.py in KlekotaRoth(mol_dir)
    322     fingerprint_descriptortypes = fp[fingerprint]
    323 
--> 324     padeldescriptor(mol_dir='dataset.smi', 
    325                     d_file=fingerprint_output_file,
    326                     descriptortypes= fingerprint_descriptortypes,

~\AppData\Local\Programs\Python\Python38\lib\site-packages\padelpy\wrapper.py in padeldescriptor(maxruntime, waitingjobs, threads, d_2d, d_3d, config, convert3d, descriptortypes, detectaromaticity, mol_dir, d_file, fingerprints, log, maxcpdperfile, removesalt, retain3d, retainorder, standardizenitro, standardizetautomers, tautomerlist, usefilenameasmolname, sp_timeout, headless)
    149         command += ' -usefilenameasmolname'
    150 
--> 151     _, err = _popen_timeout(command, sp_timeout)
    152     if err != b'':
    153         raise RuntimeError('PaDEL-Descriptor encountered an error: {}'.format(

~\AppData\Local\Programs\Python\Python38\lib\site-packages\padelpy\wrapper.py in _popen_timeout(command, timeout)
     43         return (-1, b'PaDEL-Descriptor timed out during subprocess call')
     44     else:
---> 45         return p.communicate()
     46 
     47 

~\AppData\Local\Programs\Python\Python38\lib\subprocess.py in communicate(self, input, timeout)
   1022 
   1023             try:
-> 1024                 stdout, stderr = self._communicate(input, endtime, timeout)
   1025             except KeyboardInterrupt:
   1026                 # https://bugs.python.org/issue25942

~\AppData\Local\Programs\Python\Python38\lib\subprocess.py in _communicate(self, input, endtime, orig_timeout)
   1393             # calls communicate again.
   1394             if self.stdout is not None:
-> 1395                 self.stdout_thread.join(self._remaining_time(endtime))
   1396                 if self.stdout_thread.is_alive():
   1397                     raise TimeoutExpired(self.args, orig_timeout)

~\AppData\Local\Programs\Python\Python38\lib\threading.py in join(self, timeout)
   1009 
   1010         if timeout is None:
-> 1011             self._wait_for_tstate_lock()
   1012         else:
   1013             # the behavior of a negative timeout isn't documented, but

~\AppData\Local\Programs\Python\Python38\lib\threading.py in _wait_for_tstate_lock(self, block, timeout)
   1025         if lock is None:  # already determined that the C code is done
   1026             assert self._is_stopped
-> 1027         elif lock.acquire(block, timeout):
   1028             lock.release()
   1029             self._stop()

KeyboardInterrupt: 

Thank you in advance.

Not able to fingerprint some SMILES

I was trying to fingerprint SMILES from a dataset, but I am unable to get the descriptors using padelpy.from_smiles for some SMILES in the dataset. For example: 'C=C', '[H]CCCP(CCCNC(=O)C(=O)N[H])c1ccccc1', '[H]CCCP(CCCCCCCC)CCCNC(=O)C(=O)N[H]', etc.
However, when I double or triple these SMILES, e.g. 'C=CC=C', I am able to get the output (both descriptors and fingerprints), but not for the original one. I don't know what is happening.
Here is the code that I am using:
from padelpy import from_smiles
from_smiles('C=C',descriptors=True, fingerprints=True, timeout=50)
Please help!!!

Problem when calculate descriptors from smiles by using csv

I have seen an issue describing a related problem ("when calculate descriptors from smiles, an error occured", #9); however, when I use the code mentioned in that issue:

from padelpy import from_smiles
with open('all_smi.csv', 'rt') as f:
	smi = f.readlines()

I get the error IndexError: list index out of range. I have checked my CSV, and when I use the CSV provided in that issue I still get the same error. How can I fix it?
My environment is macOS, and the Python version is 3.7.3.
Thanks.
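
The report does not include the full traceback, so the cause of the IndexError is unclear. One possibility, offered only as an assumption, is that blank lines or rows with too few columns are being indexed. A minimal loader that skips such rows might look like this; read_smiles is a hypothetical helper name.

```python
import csv
import io


def read_smiles(handle, column=0, has_header=False):
    """Return the SMILES found in one column of a CSV, skipping blank or short rows."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)
    smiles = []
    for row in reader:
        if len(row) <= column:
            continue  # blank line or row without the requested column
        value = row[column].strip()
        if value:
            smiles.append(value)
    return smiles


# In-memory sample with a blank line and an extra column.
sample = io.StringIO("CCC\n\nCCO,ethanol\n")
loaded = read_smiles(sample)  # ['CCC', 'CCO']
```

For a file on disk, the handle would come from open('all_smi.csv', newline='').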

padeldescriptor returns no output

Hello,

I've created a .smi file of SMILES (one per row) and I'm running it as:

! pip install padelpy
from padelpy import padeldescriptor
padeldescriptor(mol_dir='ai_ml.smi', d_2d=True, d_3d=True, fingerprints=True, detectaromaticity=True, removesalt=True, retainorder=True, threads=2, d_file='ai_ml_padel_output.csv' )

It exits with no error, however, no output file is generated, any idea what I'm doing wrong? I'm running it on a Jupyter notebook and no errors occurred on installation.

Padelpy generating fingerprints in random order

I have noticed that when calculating molecular descriptors for, say, 5 molecules, the fingerprints are generated in random order.
[Image: Screenshot from 2022-11-21 18-11-20]
You can see in the image that the order of the molecules has changed randomly. Can you please tell me why this happens and what the workaround is? I have 10000 molecules and can't keep track of any of them because of this.
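
The padeldescriptor calls and signature quoted elsewhere on this page include a retainorder parameter, which may be the intended way to keep input order. Independently of that, rows can be put back in order afterwards if each molecule has a known name. The sketch below assumes, without confirmation from this page, that the output CSV has a "Name" column and that the names match those supplied with the input; reorder_rows is a hypothetical helper.

```python
import csv
import io


def reorder_rows(csv_text, names_in_order, name_column="Name"):
    """Return the CSV rows as dicts, sorted to follow `names_in_order`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    by_name = {row[name_column]: row for row in reader}
    # Names missing from the output (failed molecules) are skipped.
    return [by_name[name] for name in names_in_order if name in by_name]


# Made-up output in shuffled order, for illustration only.
shuffled = "Name,PubchemFP0\nmol_2,1\nmol_0,0\nmol_1,1\n"
ordered = reorder_rows(shuffled, ["mol_0", "mol_1", "mol_2"])
```

Keying on names rather than row position also makes it obvious which molecules produced no output at all.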

Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte

Hello, calling the from_smiles function with the input C1N2C3C4N(C2=O)CN5C6C7N(C5=O)CN8C9C2N(C8=O)CN5C8C%10N(C5=O)CN5C%11C%12N(C5=O)CN5C%13C%14N(C5=O)CN5C%15C(N1C5=O)N1CN3C(=O)N4CN6C(=O)N7CN9C(=O)N2CN8C(=O)N%10CN%11C(=O)N%12CN%13C(=O)N%14CN%15C1=O raises an error. Do you know how to solve this problem? Thank you very much!

Code:

from padelpy import from_smiles
result = from_smiles('C1N2C3C4N(C2=O)CN5C6C7N(C5=O)CN8C9C2N(C8=O)CN5C8C%10N(C5=O)CN5C%11C%12N(C5=O)CN5C%13C%14N(C5=O)CN5C%15C(N1C5=O)N1CN3C(=O)N4CN6C(=O)N7CN9C(=O)N2CN8C(=O)N%10CN%11C(=O)N%12CN%13C(=O)N%14CN%15C1=O' ,fingerprints=True, descriptors=False)
print(result)

Error:
Traceback (most recent call last):
  File "D:/Python/jupyter/sr/son_structure.py", line 6, in <module>
    result = from_smiles('C1N2C3C4N(C2=O)CN5C6C7N(C5=O)CN8C9C2N(C8=O)CN5C8C%10N(C5=O)CN5C%11C%12N(C5=O)CN5C%13C%14N(C5=O)CN5C%15C(N1C5=O)N1CN3C(=O)N4CN6C(=O)N7CN9C(=O)N2CN8C(=O)N%10CN%11C(=O)N%12CN%13C(=O)N%14CN%15C1=O' ,fingerprints=True, descriptors=False)
  File "D:\anoconda\envs\HGT\lib\site-packages\padelpy\functions.py", line 94, in from_smiles
    threads=threads
  File "D:\anoconda\envs\HGT\lib\site-packages\padelpy\wrapper.py", line 168, in padeldescriptor
    err.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte
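
The traceback shows the failure in err.decode('utf-8'), which suggests the subprocess wrote stderr bytes that are not valid UTF-8. A hedged sketch of a more tolerant decode follows. The fallback encodings are guesses, since the reporter's locale is not stated, and decode_stderr is a hypothetical helper rather than part of padelpy.

```python
def decode_stderr(raw, fallbacks=("gbk", "cp1252")):
    """Decode subprocess output without raising on non-UTF-8 bytes."""
    for encoding in ("utf-8",) + tuple(fallbacks):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be read and mark the rest.
    return raw.decode("utf-8", errors="replace")


# Two Chinese characters encoded as GBK, which is not valid UTF-8.
message = "\u9519\u8bef".encode("gbk")
text = decode_stderr(message)
```

Decoding leniently would at least surface the underlying PaDEL error message instead of hiding it behind the UnicodeDecodeError.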

other fingerprints

Hi, thanks for the great package!

I have a simple question, which may just be down to my inexperience.

When running from_smiles(), does it only calculate the PubChem fingerprint?
I can't find any function for other fingerprints in Readme.md.

Thank you in advance for any help you might be able to provide.

Can not apply with .smi file (~4000 cps)

I want to calculate molecular descriptors for more than 4000 compounds in file.smi. However, when I use:

from padelpy import padeldescriptor
padeldescriptor(mol_dir='molecule.smi', d_file='descriptors.csv')

The output descriptors.csv has no values. Please tell me how to solve this issue. Thanks so much.

Calculate MACCS and PUBCHEM Fingerprints Together

Hi, I have found that I cannot generate both MACCS and PUBCHEM fingerprints together from SMILES.

It is easy to be done in the PaDEL software by simply checking the box of MACCS and PUBCHEM.

Is there a way in the padelpy that can generate those two fingerprints together from SMILES?

Example:

Input:

padelpy.from_smiles(smi, fingerprints=True, descriptors=False)

Output:

{'PubchemFP0': '1', 'PubchemFP1': '0',.......}

Input:

padelpy.from_smiles(smi, fingerprints=False, descriptors=False)

Output:

{}

Besides these two situations, I cannot find a way to change the output types of fingerprints in padelpy. It is PUBCHEM or None. I am looking forward to your reply. Thank you.
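
The padeldescriptor calls quoted elsewhere on this page pass a descriptortypes file together with fingerprints=True, so one possible route is a descriptor-types XML with both fingerprinters enabled. The sketch below only edits such a file. The XML layout and the names MACCSFingerprinter and PubchemFingerprinter are assumptions about descriptors.xml and should be checked against the copy shipped with padelpy. Another report on this page found that an edited file was not honoured for 2D descriptors, so treat this as untested.

```python
import xml.etree.ElementTree as ET

# Assumed, abbreviated layout of descriptors.xml (illustration only).
SAMPLE_XML = """<Root>
  <Group name="Fingerprint">
    <Descriptor name="Fingerprinter" value="false"/>
    <Descriptor name="MACCSFingerprinter" value="false"/>
    <Descriptor name="PubchemFingerprinter" value="true"/>
  </Group>
</Root>"""


def enable_only(xml_text, wanted):
    """Set value="true" for names in `wanted` and "false" for all others."""
    root = ET.fromstring(xml_text)
    for descriptor in root.iter("Descriptor"):
        enabled = descriptor.get("name") in wanted
        descriptor.set("value", "true" if enabled else "false")
    return ET.tostring(root, encoding="unicode")


new_xml = enable_only(SAMPLE_XML, {"MACCSFingerprinter", "PubchemFingerprinter"})
```

The resulting text could be saved to a file and passed as descriptortypes, again assuming the wrapper forwards that file to PaDEL unchanged.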

Enabling multithreading

I have seen that PaDEL-Descriptor allows multiprocessing (link). Can we do this in the wrapper? I can see that my CPUs are not being used to their full extent, whereas with the PaDEL GUI CPU usage is at its maximum.

Prevent Padel Splash Image from Loading

Hey,

I love this package, it's working great, except for one inconvenience. When I run "from_smiles()" multiple times, such as on a dataframe with thousands of entries, it reloads the Padel splash image each time it does a calculation, making it impossible to work because it's constantly interrupting whatever window I'm using.

I looked at wrapper.py and can see that there is a "_popen_timeout()" function, which has a call to "popen()". I've tried importing "CREATE_NO_WINDOW" from the "subprocess" module and setting "creation_flags = CREATE_NO_WINDOW" in the "popen()" call, but it doesn't fix the problem.

Is there a simple way to prevent the Padel splash image from loading every computation?

Thanks!

Convert to 3D is broken?

Hello,

I'm trying to generate the PaDEL 3D descriptors for a small dataset. I've set the options as in the code snippet below and prepared an XML file with the 3D descriptors set to True and the rest left as False; however, the output files contain only empty values. Is there something wrong with my approach, or are the 3D descriptors broken?
PS: I'm calculating from SMILES supplied as a .smi file; the 2D descriptors work fine with the same input.
padeldescriptor(mol_dir='temp.smi', d_file='PaDEL_3D.csv', descriptortypes= '3D.xml', detectaromaticity=True, standardizenitro=True, convert3d=True, d_2d=False, d_3d=True, standardizetautomers=True, threads=2, removesalt=True, log=False, retainorder=True, fingerprints=False)

Padelpy GPU version?

Hello
I want to know whether the library can run on a GPU. On CPU, padelpy is quite slow: generating fingerprints for around 10000 molecules takes me 3-4 hours, sometimes more. If a GPU version isn't available, how can the process be sped up?
Please let me know.
Thanks
Anjali
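
The tracebacks on this page show that each padeldescriptor call launches PaDEL through a subprocess, so one call per molecule pays that startup cost every time. A possible mitigation, offered as an assumption rather than a measured result, is to write the SMILES into a few larger .smi files and process each file in a single call. The helper write_smi_chunks and the chunk file naming are hypothetical.

```python
import os
import tempfile


def write_smi_chunks(smiles, out_dir, chunk_size=1000):
    """Split `smiles` into .smi files of at most `chunk_size` lines each."""
    smiles = list(smiles)
    paths = []
    for start in range(0, len(smiles), chunk_size):
        name = "chunk_{:05d}.smi".format(start // chunk_size)
        path = os.path.join(out_dir, name)
        with open(path, "w") as handle:
            for smi in smiles[start:start + chunk_size]:
                handle.write(smi + "\n")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp_dir:
    chunk_paths = write_smi_chunks(["CCC"] * 2500, tmp_dir, chunk_size=1000)
    chunk_count = len(chunk_paths)
    with open(chunk_paths[-1]) as last_chunk:
        last_chunk_lines = sum(1 for _ in last_chunk)
    # Each path could then go to one call such as:
    #   padeldescriptor(mol_dir=path, d_file=path + ".csv",
    #                   fingerprints=True, threads=4)
```

Smaller chunks also limit how much work is lost if one file triggers a timeout.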
