
flame's People

Contributors

adriancabreraphi, adrianexamen, bet-gregori, bielstela, ignaciopasamontes, ismaelresp, josecarlosgomezt, manuelpastor, parodbe

flame's Issues

Temp directories for Windows need to be created manually

The default upload directory for the prediction web service is /var/tmp. On Windows, this directory must be created by hand.
We need to detect the platform the server is running on and set up appropriate temp directories.
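A minimal sketch of the platform check, assuming the POSIX default stays /var/tmp and that Windows should use whatever the standard-library tempfile module reports (usually the user's %TEMP% folder). The function name default_upload_dir is hypothetical, not part of flame.

```python
import platform
import tempfile
from pathlib import Path


def default_upload_dir() -> Path:
    """Return a platform-appropriate upload directory, creating it if it is missing."""
    if platform.system() == 'Windows':
        # gettempdir() honours TMPDIR/TEMP/TMP and falls back to platform defaults
        base = Path(tempfile.gettempdir())
    else:
        base = Path('/var/tmp')  # current default on Linux and macOS
    base.mkdir(parents=True, exist_ok=True)  # no-op when it already exists
    return base
```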

Config windows path is not resolved correctly

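When config.yml contains a Windows path, it is not resolved correctly and the function utils.model_repository_path() returns an invalid path. On a POSIX system, pathlib does not recognise the drive letter, treats the path as relative and prepends the current directory: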

In [19]: p = pathlib.Path('C:/Users')
In [20]: p.resolve()
Out[20]: PosixPath('/home/biel/git-repos/phi/Flame/C:/Users')
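One hedged way to fail early instead of silently producing a wrong path: before resolving, check whether the configured value carries a Windows drive letter while the server runs on a POSIX system. check_repository_path is a hypothetical helper, not an existing flame function.

```python
import platform
from pathlib import Path, PureWindowsPath


def check_repository_path(raw: str) -> Path:
    """Validate the model repository path read from config.yml, then resolve it."""
    # PureWindowsPath parses drive letters on any OS: PureWindowsPath('C:/Users').drive == 'C:'
    if platform.system() != 'Windows' and PureWindowsPath(raw).drive:
        raise ValueError(f'{raw!r} looks like a Windows path, but this server runs on {platform.system()}')
    return Path(raw).expanduser().resolve()
```

With this check, 'C:/Users' on Linux raises a clear error instead of becoming '/home/.../C:/Users'.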

Documentation for use with Jupyter notebooks

We must create a folder of Jupyter notebooks illustrating how Flame can be used to generate predictions, and how the resulting JSON output can easily be converted to pandas and visualized in different ways.

Two possible sources of errors.

1 - In context.py (build_cmd function):

    ifile = model['infile']
    if not os.path.isfile(ifile):
        return False, 'wrong training series file'

    epd = utils.model_path(model['endpoint'], 0)
    lfile = os.path.join(epd, os.path.basename(ifile))
    shutil.copy(ifile, lfile)  # <---

When the input file is already in the dev folder, an exception is raised (shutil.copy raises shutil.SameFileError when source and destination are the same file).
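Since shutil.copy refuses to copy a file onto itself, one option is to treat that case as "already in place". The sketch below isolates the copy step; copy_training_series is a hypothetical name, and epd stands for the endpoint dev folder returned by utils.model_path.

```python
import os
import shutil


def copy_training_series(ifile: str, epd: str) -> str:
    """Copy the training series into the dev folder unless it already lives there."""
    lfile = os.path.join(epd, os.path.basename(ifile))
    try:
        shutil.copy(ifile, lfile)
    except shutil.SameFileError:
        pass  # source and destination are the same file: nothing to copy
    return lfile
```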

2- In idata.py (workflow_objects function):

    if first_mol:  # first molecule
        md_results = results[0]
        va_results = results[1]
        num_var = len(md_results)  # <---
        first_mol = False
    else:
        if len(results[0]) != num_var:
            print('ERROR: (@workflow_objects) incorrect number of MD for molecule #', str(
                i+1), 'in file ' + input_file)
            continue

The indicated statement assumes that the first molecule always has the correct number of molecular descriptors.
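A hedged alternative is to decide the expected descriptor count by majority vote over all molecules, and only then flag the molecules that disagree. This is a swapped-in technique, not what idata.py currently does; split_by_descriptor_count is a hypothetical helper that takes one descriptor list per molecule.

```python
from collections import Counter


def split_by_descriptor_count(md_rows):
    """Return (good, bad) molecule indices, using the most common row length as reference."""
    if not md_rows:
        return [], []
    # The most frequent length is taken as the correct number of descriptors
    num_var = Counter(len(row) for row in md_rows).most_common(1)[0][0]
    good = [i for i, row in enumerate(md_rows) if len(row) == num_var]
    bad = [i for i, row in enumerate(md_rows) if len(row) != num_var]
    return good, bad
```

The price is that all descriptor rows must be collected before filtering, instead of checking each molecule as it arrives.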

TSV format External prediction error

JSON format works perfectly. When the TSV format is set in the YAML file:
1- the output in the terminal is incomplete;
2- the dumped output.tsv contains:
- headers: obj_nam | SMILES | c0 | c1 | ymatrix
- What do c0 and c1 mean? What about ymatrix?
3- where are sens, spec and MCC?

Use Pathlib to handle the paths

Since we are using Python 3.6, we could take advantage of pathlib (new in 3.4). It is the standard-library module for working with paths (either POSIX or Windows) and has many useful methods. Since we are dealing with multiple SD files (when working with cpu>1), it will be helpful!
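For example, the chunk file names currently built with os.path.splitext in countmol() could be derived with pathlib. A small sketch; chunk_names is a hypothetical helper and the "_<n>" suffix mirrors the '_%d' naming used there.

```python
from pathlib import Path


def chunk_names(ifile: str, nchunks: int) -> list:
    """Return one chunk path per CPU next to the input file: mols.sdf -> mols_0.sdf, mols_1.sdf, ..."""
    src = Path(ifile)
    # stem = name without extension, suffix = '.sdf'; with_name keeps the parent folder
    return [src.with_name(f'{src.stem}_{i}{src.suffix}') for i in range(nchunks)]
```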

Compatibility of flame

Should we test flame with older Python versions and older package versions, and fix the issues? If so, where should we set the compatibility boundary?

argument -f inconsistent

I saw that the argparse option for the input file uses -f as its short form but --infile as its long form. I think both should start with the same letter, e.g. --filein.
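argparse accepts several option strings for one argument, so the long name could change without breaking existing scripts. A sketch, assuming the new name --filein and keeping --infile as an alias; dest='infile' keeps the same name as the model['infile'] key read in build_cmd.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='flame')
    # -f and --filein share a first letter; --infile is kept as a backward-compatible alias
    parser.add_argument('-f', '--filein', '--infile', dest='infile',
                        help='input file, e.g. an SD file')
    return parser
```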

path error whilst building

(flame) [kpinto@ulises 6-model]$ flame -c build -e MyModel -f tr-DEG.sdf
CRITICAL ERROR: unable to load parameter file.Running with fallback defaults
Traceback (most recent call last):
  File "/home/kpinto/miniconda3/envs/flame/bin/flame", line 11, in
    load_entry_point('flame', 'console_scripts', 'flame')()
  File "/phi/users/kpinto/flame/flame/flame_scr.py", line 142, in main
    success, results = context.build_cmd(model)
  File "/phi/users/kpinto/flame/flame/context.py", line 142, in build_cmd
    shutil.copy(ifile, lfile)
  File "/home/kpinto/miniconda3/envs/flame/lib/python3.6/shutil.py", line 241, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/kpinto/miniconda3/envs/flame/lib/python3.6/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/phi/users/kpinto/flame/flame_models/MyModel/dev/tr-DEG.sdf'
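The traceback shows that the destination folder .../MyModel/dev does not exist when build_cmd copies the training file; the earlier "unable to load parameter file" message hints that the endpoint tree was not fully created. Besides checking how the model was created, a defensive sketch is to create the folder before copying (copy_into_dev is a hypothetical name):

```python
import os
import shutil


def copy_into_dev(ifile: str, epd: str) -> str:
    """Copy ifile into the endpoint dev folder, creating the folder first if needed."""
    os.makedirs(epd, exist_ok=True)  # no-op when the folder already exists
    lfile = os.path.join(epd, os.path.basename(ifile))
    shutil.copy(ifile, lfile)
    return lfile
```

Creating the folder silently may hide a model that was never set up properly; an explicit error asking the user to create the model first could be the better design.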

Error "Segmentation fault: 11" in MacOS

I am following the tutorial. When I try to build the model I get "Segmentation fault: 11".

I installed the environment you provide on a macOS 10.13.3 machine.

The file I use as the training set is caco2.sdf.
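A segmentation fault happens in native code (most likely a compiled dependency such as RDKit), so Python prints no traceback by default. The standard-library faulthandler module can dump the Python stack at the moment of the crash, which would help locate the failing call. Setting the environment variable PYTHONFAULTHANDLER=1 before running flame has the same effect without touching the code; the snippet below is the in-code equivalent.

```python
import faulthandler

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump the Python traceback of every thread to stderr
faulthandler.enable(all_threads=True)
```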

Building Qualitative models

(flame) [kpinto@ulises 0-rdkit-properties]$ flame -c build -e INF-ql-RF -f ../../../1-test/pr-InF-3D-moka.sdf

recycling data >>> /phi/users/kpinto/flame/flame_models/INF-ql-RF/dev/data.pkl
running sumbsmapling
tune_parameters
metric: f1
best parameters: {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 25, 'oob_score': True, 'random_state': 46}
found in: 2.9187703132629395 seconds
Traceback (most recent call last):
  File "/home/kpinto/miniconda3/envs/flame/bin/flame", line 11, in
    load_entry_point('flame', 'console_scripts', 'flame')()
  File "/phi/users/kpinto/flame/flame/flame_scr.py", line 142, in main
    success, results = context.build_cmd(model)
  File "/phi/users/kpinto/flame/flame/context.py", line 145, in build_cmd
    success, results = build.run(lfile)
  File "/phi/users/kpinto/flame/flame/build.py", line 83, in run
    results = learn.run()
  File "/phi/users/kpinto/flame/flame/learn.py", line 123, in run
    self.run_internal()
  File "/phi/users/kpinto/flame/flame/learn.py", line 96, in run_internal
    success, results = model.validate()
  File "/phi/users/kpinto/flame/flame/stats/base_model.py", line 391, in validate
    success, results = self.CF_qualitative_validation()
  File "/phi/users/kpinto/flame/flame/stats/base_model.py", line 248, in CF_qualitative_validation
    self.sensitivity = (self.TP / (self.TP + self.FN))
ZeroDivisionError: division by zero
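The division fails when TP + FN == 0, i.e. when the evaluated set contains no positive compounds (subsampling or a small validation fold could plausibly cause this). A minimal guard, assuming NaN is an acceptable marker for an undefined metric; the helper names are hypothetical.

```python
import math


def safe_ratio(num: float, den: float) -> float:
    """num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else math.nan


def sensitivity(tp: int, fn: int) -> float:
    return safe_ratio(tp, tp + fn)


def specificity(tn: int, fp: int) -> float:
    return safe_ratio(tn, tn + fp)
```

In base_model.py this would read, for instance, self.sensitivity = safe_ratio(self.TP, self.TP + self.FN).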

integration of missing components in furnace

We need to include scikit-learn in the environment. We also need to see how we can include the standardizer and, if that is not possible, write a brief "how-to" explaining how to set up the environment.

Error when serializing data in odata.py in predict for qualitative endpoints.

Both JSON and TSV data serialization fails.

JSON serialization fails when dumping the variable values from results: the values are of type np.int64, which the json module cannot serialize.

TSV serialization, instead, fails at line 139:

    if isinstance(val, float):
        line += "%.4f" % val
    else:
        line += val

As there is no check for the np.int64 type, the value is not converted to a string before being concatenated.
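Both failures come from NumPy scalar types reaching code that expects built-in types. A hedged sketch of two small helpers (names are hypothetical): a default= hook for json.dumps, and a TSV cell formatter that checks the numbers abstract base classes, with which NumPy registers its integer and floating types.

```python
import json
import numbers


def to_builtin(obj):
    """json.dumps default= hook: NumPy scalars and arrays expose tolist(), returning plain Python values."""
    if hasattr(obj, 'tolist'):
        return obj.tolist()
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


def format_tsv_value(val) -> str:
    """Render one TSV cell: integers (including np.int64) as-is, other reals with 4 decimals."""
    if isinstance(val, numbers.Integral):
        return str(int(val))
    if isinstance(val, numbers.Real):
        return '%.4f' % val
    return str(val)
```

Usage would be json.dumps(results, default=to_builtin) and line += format_tsv_value(val).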

Proper handling of exceptions

The way exceptions are handled is prone to causing problems and misconceptions. For example, in the function:

def nummols (ifile):
    try:
        suppl = Chem.SDMolSupplier(ifile)
    except:
        return False, 'unable to open molfile'
    return True, len(suppl)

if the try/except catches an error, it swallows it and always reports "unable to open molfile", even when the error had nothing to do with opening the file (a bad RDKit import, for example).

Instead, doing:

def nummols (ifile):
    try:
        suppl = Chem.SDMolSupplier(ifile)
    except:
        raise
  
    return len(suppl)

if the try fails because there is no Chem module, the error is now correctly traced to:

NameError: name 'Chem' is not defined
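A middle ground between swallowing and re-raising everything is to catch only the error class you can explain and chain it with "raise ... from", so the original cause stays in the traceback. To stay self-contained, the sketch swaps RDKit for a plain file read: count_sd_records is hypothetical and simply counts the $$$$ record terminators of an SD file.

```python
def count_sd_records(ifile: str) -> int:
    """Count molecules in an SD file by its '$$$$' terminators."""
    try:
        with open(ifile) as fh:
            text = fh.read()
    except OSError as e:
        # Only file-access problems get the friendly message; `from e` keeps the original cause.
        # Anything else (NameError, ImportError, ...) propagates with its own traceback.
        raise RuntimeError(f'unable to open molfile {ifile!r}') from e
    return sum(1 for line in text.splitlines() if line.strip() == '$$$$')
```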

configuration status changes too early

Answering "no" to the first dialog of the config command changes the config status. It shouldn't, since the model repository path is not updated when the configuration is aborted.
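A sketch of the intended ordering, assuming the configuration is held in a dict: nothing changes until the user confirms. The key names and the confirm callback are hypothetical; passing confirm in makes the behaviour testable without an interactive prompt.

```python
def configure(config: dict, new_path: str, confirm) -> dict:
    """Return an updated copy of config, or the original untouched if the user says no."""
    if not confirm(f'Set the model repository to {new_path}?'):
        return config  # aborted: neither the path nor the status changes
    # Path and status are updated together, only after confirmation
    return dict(config, model_repository_path=new_path, config_status=True)
```

From the CLI, confirm could be something like lambda q: input(q + ' [y/N] ').lower() == 'y'.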

Source code conflicts

Please remember to update your code frequently, so that you avoid pushing obsolete code.

Also, avoid rewriting code written by other members of the team unless there is a good reason to do so. Even then, please inform the author before pushing.

Moreover, before pushing code, make doubly sure it works by running simple tests. This does not replace more sophisticated quality controls, but at least it will not block the development of other components.

Add Logger.

We print messages when working with the CLI, but a logger would be better for debugging and for inspecting the workflow of model management and use.

Handling of exceptions

Don't use:

try:
    1/0
except:
    print('something did not work')

This prints: something did not work

Always capture the exception object (even if you catch the generic Exception class):

try:
    1/0
except Exception as e:
    print(f'something did not work. Cause: {e}')

This prints: something did not work. Cause: division by zero

Let's build a better world together

Python style (PEP-8)

Since this will become a big project, I think we should follow the PEP-8 style guide.

Here you can find a summary of its most important points.

Error building a model called 'test'

This only happens when the model name is 'test' in lowercase; with other names, or with 'TEST' in uppercase, it works.

Steps to reproduce:
from flame.build import Build
d = Build("test")
d.run("/home/marc/Documents/flame_dev_api/sdf/caco2.sdf")

Output:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-5-dbf08200c5a5> in <module>()
----> 1 d.run("/home/marc/Documents/flame_dev_api/sdf/caco2.sdf")

~/Documents/flame/flame/build.py in run(self, input_source)
     70             modpath = utils.module_path(self.model, 0)
     71 
---> 72             idata_child = importlib.import_module(modpath+".idata_child")
     73             learn_child = importlib.import_module(modpath+".learn_child")
     74             odata_child = importlib.import_module(modpath+".odata_child")

~/anaconda3/envs/flame_django/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'test.dev'
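A plausible explanation (not confirmed in the report) is a name clash: Python ships a standard-library package called test, so importlib may resolve 'test.dev' against that package, which has no dev submodule. If that is the cause, one option is to reject model names that are already importable. shadows_existing_module is a hypothetical check, not part of flame.

```python
import importlib.util
import sys


def shadows_existing_module(name: str) -> bool:
    """True if a top-level module or package called `name` is already importable."""
    if name in sys.builtin_module_names:  # e.g. 'sys'
        return True
    # find_spec searches sys.path without importing a top-level name
    return importlib.util.find_spec(name) is not None
```

Such a check could run when a model is created, refusing (or warning about) names like test.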

Subsampling

I don't know if this is useful, but:

  • I would really appreciate having the dataset in SD format after subsampling is done.
  • I would like to be able to choose the random seed used to generate the samples, so that results are reproducible and I can try different seeds.
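For the seed request, the usual pattern is a dedicated random generator created from a user-supplied seed, so that the same seed always yields the same subset. A sketch using the standard library; subsample and its seed parameter are hypothetical, and flame's own subsampling may rely on scikit-learn or NumPy generators instead.

```python
import random


def subsample(indices, n: int, seed=None) -> list:
    """Pick n molecule indices without replacement; the same seed reproduces the same pick."""
    rng = random.Random(seed)  # private generator: does not disturb the global random state
    return sorted(rng.sample(list(indices), n))
```

The returned indices could then be used to write the selected molecules to an SD file, for example with RDKit's Chem.SDWriter.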

Molecule standardization wrong behavior in workflow_series

When standardizing a molecule series, if one molecule fails in the standardization process, the whole series is rejected.

    if 'standardize' in method:
        try:
            parent = standardise.run(Chem.MolToMolBlock(m))
        except standardise.StandardiseException as e:
            if e.name == "no_non_salt":
                parent = Chem.MolToMolBlock(m)
            else:  # <--- then the function returns False for the whole series
                return False, e.name
        except:
            return False, "Unknown standardiser error"

Flame then returns the error message: "False {"error": "number of molecules informed and processed does not match"}", because no molecule in the series could be processed.
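One hedged alternative is to standardise molecule by molecule and report failures instead of aborting, so the caller can drop the failing molecules (and their activities) consistently. The sketch replaces RDKit and the standardise module with a generic callable and a stand-in exception class; both are hypothetical.

```python
class StandardiseError(Exception):
    """Stand-in for standardise.StandardiseException, which carries a .name code."""

    def __init__(self, name: str):
        super().__init__(name)
        self.name = name


def standardise_series(molblocks, standardise):
    """Standardise each molecule independently; return (parents, failed) with failed = [(index, reason)]."""
    parents, failed = [], []
    for i, molblock in enumerate(molblocks):
        try:
            parents.append(standardise(molblock))
        except StandardiseError as e:
            if e.name == 'no_non_salt':
                parents.append(molblock)  # keep the original structure, as the current code does
            else:
                failed.append((i, e.name))  # skip only this molecule
    return parents, failed
```

Unexpected exceptions propagate here instead of being turned into "Unknown standardiser error", in line with the exception-handling issues above.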

add functionality to customize the path to model directory

manage needs some fixes and improvements to make the user and developer experience smooth. It would be nice to have a way to set the root directory of the models repository, and to copy the config.yaml file if it needs to be read again.

It would be good to discuss further how manage should deal with the models repository and how to propagate this information to the other classes.

standardizer not working with 1 CPU, in ws mode only

When working as a web service (ws) with the number of CPUs set to 1, the standardizer fails. The error is caught and a "standardizer unknown error" is issued.
Changing to 2 CPUs or removing normalization solves the problem.

Type check activity from SDF

Depending on whether the model is qualitative or quantitative, flame should not read an SD file with the wrong type in the <activity> field without raising an error or a warning.
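A sketch of such a check, assuming qualitative activities are encoded as 0/1 and quantitative ones as any real number; check_activity is a hypothetical helper, and its rules would need to match flame's actual conventions.

```python
import warnings


def check_activity(values, quantitative: bool) -> list:
    """Parse <activity> values and complain when they do not fit the model type."""
    parsed = []
    for i, raw in enumerate(values, start=1):
        try:
            parsed.append(float(raw))
        except (TypeError, ValueError):
            raise ValueError(f'molecule #{i}: activity {raw!r} is not a number') from None
    binary = all(v in (0.0, 1.0) for v in parsed)
    if not quantitative and not binary:
        raise ValueError('qualitative model: every activity must be 0 or 1')
    if quantitative and parsed and binary:
        # Not necessarily wrong, but suspicious enough to warn about
        warnings.warn('quantitative model, but all activities are 0 or 1; is the model type right?')
    return parsed
```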

External molecular descriptors

1- In the YAML file:

  • where should I put the path of the external TSV MD file?
  • where should I put the activity column?

2- I would add the option of concatenating descriptors, i.e. calculating the internal ones and appending the external descriptors to them.
3- It would be great to have an external server where molecular descriptors could be calculated, with flame sending it requests to calculate them.
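For item 2, a sketch of reading an external TSV descriptor file and appending its columns to the internally calculated descriptors, matching molecules by name. The identifier column 'name' and both helpers are hypothetical; a real implementation would probably work on NumPy arrays rather than dicts of lists.

```python
import csv


def read_external_md(fh, id_col: str = 'name') -> dict:
    """Read a TSV of descriptors (one row per molecule) into {molecule name: [floats]}."""
    reader = csv.DictReader(fh, delimiter='\t')
    md_cols = [c for c in reader.fieldnames if c != id_col]  # reads the header line
    return {row[id_col]: [float(row[c]) for c in md_cols] for row in reader}


def concatenate_md(internal: dict, external: dict) -> dict:
    """Append external descriptors to the internal ones; every molecule must appear in both."""
    missing = sorted(internal.keys() - external.keys())
    if missing:
        raise KeyError(f'no external descriptors for: {", ".join(missing)}')
    return {name: list(internal[name]) + list(external[name]) for name in internal}
```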

use Logging instead of print

We should use the logging library to record event messages, warnings and errors. It will improve debugging and the inspection of results. Loggers have different levels (DEBUG, INFO, WARNING, ERROR) and record which module produced each message. For example:

2018-07-28 12:41:12,075 - flame.build - INFO - Creating list...
2018-07-28 12:41:12,075 - flame.build - DEBUG - length of list: 10
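Lines in that format come from a single logging.Formatter. A minimal sketch with a hypothetical get_logger helper; each module would call it with __name__, which yields names such as flame.build. In larger setups, handlers are usually configured once at the program's entry point and modules only call logging.getLogger(__name__).

```python
import logging

LOG_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'


def get_logger(name: str, level: int = logging.DEBUG) -> logging.Logger:
    """Return a logger that writes 'time - module - LEVEL - message' lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicated lines when called more than once
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Inside flame/build.py, LOG = get_logger(__name__) followed by LOG.debug('length of list: %d', 10) would print a line like the DEBUG example above.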

Separate the code section that writes the chunk files into another function

It currently lives in countmol(). I think it would be better to move it into a separate method, like:

def chunk_to_file(self, ifile, nmol):
    index = []
    chunksize = nmol // self.control.numCPUs
    for a in range(nmol):
        index.append(a // chunksize)

    moli = 0    # molecule counter in next loop
    chunki = 0  # chunk counter in next loop

    filename, file_extension = os.path.splitext(ifile)
    chunkname = filename + '_%d' % chunki + file_extension
    try:
        [. . .]

if self.control.numCPUs > 1:
    self.chunk_to_file(ifile, nmol)
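Note that chunksize = nmol // numCPUs can misbehave: with 10 molecules and 3 CPUs it produces chunk indices 0 to 3 (four chunks), and with fewer molecules than CPUs the chunk size is 0, so a // chunksize raises ZeroDivisionError. A hedged replacement for the index computation, spreading the remainder over the first chunks (chunk_indices is a hypothetical name):

```python
def chunk_indices(nmol: int, ncpu: int) -> list:
    """Assign molecule indices to at most ncpu contiguous chunks whose sizes differ by at most one."""
    nchunks = max(1, min(ncpu, nmol))  # never more chunks than molecules
    base, extra = divmod(nmol, nchunks)
    index = []
    for c in range(nchunks):
        size = base + (1 if c < extra else 0)  # the first `extra` chunks take one more molecule
        index.extend([c] * size)
    return index
```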

version parameter of predict

The version number must be passed to predict as an int, both from flame and from predict-ws. Avoid re-converting strings to ints in the constructor or anywhere else.
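A sketch of converting once at the boundary: the CLI and the web service both parse the raw value with the same function, and the constructor only checks the type. parse_version, the -v/--version flags and the simplified Predict class are hypothetical, not flame's actual API.

```python
import argparse


def parse_version(raw) -> int:
    """Convert a version received as text (CLI argument, web-service parameter) to int, once."""
    version = int(raw)
    if version < 0:
        raise ValueError(f'version must be >= 0, got {version}')
    return version


class Predict:
    """Simplified stand-in for flame's predict class: it only accepts an int version."""

    def __init__(self, model: str, version: int):
        if isinstance(version, bool) or not isinstance(version, int):
            raise TypeError(f'version must be an int, got {type(version).__name__}')
        self.model = model
        self.version = version


parser = argparse.ArgumentParser(prog='flame')
parser.add_argument('-v', '--version', type=parse_version, default=0)  # hypothetical flag names
```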

OUTPUT formats

Molecular descriptors:
- It creates a file with the same name for both build and predict. I would recommend using different names.

Build:

  • It always produces JSON output
  • TSV format may be needed
  • Recalculated values do not appear
  • Subsampling creates a new dataset that is not saved anywhere. I would really appreciate it if it could be saved in the model folder.
  • JSON output: it does not give the recalculated values when running without conformal prediction. When running with conformal prediction, it does not give the upper and lower limit values, as is done in the predict part.
  • It would be good to have plots (scatter plots, ...)

Predict:

  • JSON output: I would give the same output as in the Build section, with Q2, SDEP, ... in the first lines.
  • TSV output: I would add quality parameters such as sens, spec, MCC, coverage, accuracy, Q2, SDEP, ...
  • Plots (scatter plots, ...)

Would it be possible to obtain a table like this:

MD | spec_calc | sens_calc | MCC_calc | spec_CV | sens_CV | MCC_CV | Coverage_CV | Accuracy_CV | spec_extv | sens_extv | MCC_extv | Coverage_extv
RDKit_properties |   |   |   | 0.77 | 0.79 | 0.55 | 0.47 | 0.78 | 0.74 | 0.58 | 0.32 | 0.51
RDKit_md |   |   |   | 0.78 | 0.73 | 0.50 | 0.36 | 0.75 | 0.79 | 0.80 | 0.58 | 0.48
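The sens, spec and MCC columns (also requested for the TSV output) can all be derived from confusion-matrix counts. A minimal sketch using the standard definitions, with MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); undefined ratios are returned as NaN rather than raising. The function name is hypothetical, and coverage/accuracy from conformal prediction are not covered.

```python
import math


def qualitative_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity and Matthews correlation coefficient from confusion counts."""
    def ratio(num, den):
        return num / den if den else math.nan  # undefined metric -> NaN

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        'sens': ratio(tp, tp + fn),
        'spec': ratio(tn, tn + fp),
        'MCC': ratio(tp * tn - fp * fn, mcc_den),
    }
```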

Flame in Conda repository

Since flame depends on RDKit, a clean, complete installation using pip alone is not possible. Conda handles this kind of complex dependency very easily.
