cokelaer / bioservices

Access to Biological Web Services from Python.

Home Page: http://bioservices.readthedocs.io

License: Other

Python 90.39% Jupyter Notebook 9.61%

kegg unichem eutils chembl chebi biomodels quickgo uniprot wikipathways muscle

bioservices's Introduction

Hi there

🔭 I’m currently actively working on Sequana and also maintaining BioServices, Damona, Fitter, colormap, spectrum, Bioconvert and easydev.
I'm currently leading the bioinformatics and data management activities of the Biomics NGS platform (biomics.pasteur.fr) building pipelines and tools for production.
👯 I’m looking to collaborate on BioConvert, BioServices and Damona and of course Sequana. If you would be interested in taking the lead on Spectrum, fitter or colormap, please let me know.

bioservices's Issues

test suite takes too long

The test suite takes 15 minutes. We should reduce this by splitting it into a subset of tests that runs after each commit and a fuller set that runs before a release.
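
One way to get such a split, sketched with the standard library only: slow tests are skipped unless an opt-in environment variable is set. The variable name BIOSERVICES_RUN_SLOW and the test names are hypothetical and are not something the project defines.

```python
import os
import unittest

# Hypothetical opt-in switch: set BIOSERVICES_RUN_SLOW=1 before a release.
RUN_SLOW = os.environ.get("BIOSERVICES_RUN_SLOW") == "1"

# Decorator that skips a test unless the slow suite was requested.
slow = unittest.skipUnless(RUN_SLOW, "slow test; set BIOSERVICES_RUN_SLOW=1 to run")


class ExampleTests(unittest.TestCase):
    def test_fast_offline_check(self):
        # Runs after every commit: no network access needed.
        self.assertEqual("hsa:7535".split(":")[0], "hsa")

    @slow
    def test_slow_remote_call(self):
        # Placeholder for a test that would query a remote service.
        self.assertTrue(True)


def run_suite():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the variable unset, only the fast test executes and the slow one is reported as skipped.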

use devtools from easydev 0.8.0

Ensembl sequence + variation + overlap + regulation sections

installation error of bioservices

Hi,
I have Python 2.6 installed on my server (I cannot upgrade to a newer version). I am getting the following error message during installation:

Extracting bioservices-1.3.5-py2.6.egg to /home/satya/python/lib/python2.6/site-packages
SyntaxError: ('invalid syntax', ('/home/satya/python/lib/python2.6/site-packages/bioservices-1.3.5-py2.6.egg/bioservices/kegg.py', 1569, 50, " output['dblinks'] = {k[0:-1]:v for k,v in output['dblinks'].items()}\n"))

SyntaxError: ('invalid syntax', ('/home/satya/python/lib/python2.6/site-packages/bioservices-1.3.5-py2.6.egg/bioservices/mapping/mappers.py', 149, 37, " data = {k1:{k2:v2['xkey'] for k2,v2 in self.alldata[k1].iteritems()} for k1 in self.alldata.keys()}\n"))

Please let me know how to fix this.
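
Both errors point at dict comprehensions, a syntax introduced in Python 2.7, so Python 2.6 cannot compile those lines. As a sketch of a possible local workaround (not a fix shipped by the project), the same dictionaries can be built with dict() and a generator expression, which Python 2.6 accepts. The sample data below is made up.

```python
# Made-up stand-in for output['dblinks'] from kegg.py.
dblinks = {"CAS:": "7704-34-9", "PubChem:": "3387"}

# Python 2.7+ form from the traceback:
#   {k[0:-1]: v for k, v in dblinks.items()}
# Python 2.6-compatible equivalent:
cleaned = dict((k[0:-1], v) for k, v in dblinks.items())

# Nested case from mappers.py, with made-up data.
# items() replaces iteritems() so the sketch also runs on Python 3.
alldata = {"db1": {"a": {"xkey": 1}, "b": {"xkey": 2}}}
data = dict(
    (k1, dict((k2, v2["xkey"]) for k2, v2 in alldata[k1].items()))
    for k1 in alldata.keys()
)
```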

Missing SEQUENCE in KEGGParser

When parsing multiple KGML files I get a warning saying SEQUENCE is missing. Everything still functions. It happens for KEGG compounds that are peptide sequences. It seems to be missing from the list on line 1374 of kegg.py. The warning suggested reporting it to the repo.

KEGG parse for keyword "SEQUENCE"

Hi,
I get the following warning for a few compounds (eg. C15682, C12045, ...) with parser:

from bioservices import KEGG
k=KEGG()
c = k.get('C15682')
cp = k.parse(c)

Warning. Found keyword SEQUENCE, which has not special
parsing for now. please report this issue with the KEGG
identifier ( C15682 Compound) into github.com/bioservices. Thanks T.C.

hgnc tests fail

Example fails

I just installed bioservices and copied the example code into IPython. It didn't work.

 from bioservices import UniProt
 u = UniProt()
 data = u.search("zap70+and+taxonomy:9606", format="tab", limit=3, columns="entry name,length,id, genes")

Result:

TypeError: search() got an unexpected keyword argument 'format'

Version 2

XML is an issue in many places. This is not a bioservices issue but rather a natural consequence of the different choices made by the various organisations. The easyXML class is very simple: it parses the XML with beautifulsoup4. The xmltodict package may be very useful for that purpose.
Output as a dictionary or JSON is good, but most of the time people want to do something with it, such as plotting or statistics. Right now, bioservices has hardly any dependencies, but Pandas would be great to have. This means that matplotlib and numpy will be required, but this would be a great add-on.
Finalise the remaining missing packages such as PDB and Ensembl (almost there)
use pandas officially
Wikipathway
- refactoring from WSDL to REST
- use Pandas
- missing functionalities to be implemented (not those with login though)

Compound DBLINKS KEGGParser error!

from bioservices import *
kegg = KEGG()
c = kegg.parse(kegg.get('cpd:C00087'))

c['DBLINKS']
{u'CAS': u'7704-34-9 10544-50-0 PubChem: 3387 ChEBI: 17909 26833 3DMET: B04617 NIKKAJI: J3.750H'}
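
The output above shows every database folded into the CAS value. A minimal sketch of the intended result, assuming the raw DBLINKS block carries one "Database: ids" pair per line; parse_dblinks is a hypothetical helper, not the KEGGParser code.

```python
def parse_dblinks(block):
    """Split a DBLINKS block into {database: ids} (hypothetical helper)."""
    links = {}
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, in case an ID contains a colon.
        name, _, ids = line.partition(":")
        links[name.strip()] = ids.strip()
    return links


# Sample built from the values shown in the issue above.
raw = """CAS: 7704-34-9 10544-50-0
PubChem: 3387
ChEBI: 17909 26833
3DMET: B04617
NIKKAJI: J3.750H"""

dblinks = parse_dblinks(raw)
```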

Ensembl Ontologies and Taxonomy section

BioCyc Interface

It's great to see a multi-service package like this being developed! I found you via this post here

I noticed that there isn't an API for BioCyc yet.

I have an implementation of one here that uses the Web API to allow browsing of BioCyc objects. See the example notebook for an example. Improving the API to seamlessly support a local PathwayTools server is on the someday todo list.

Would you be interested in merging this into the bioservices package? It seems a natural fit

Taxonomy missing

I found another example in the documentation that fails:

from bioservices.apps.taxonomy import Taxon

Result:

ImportError: No module named taxonomy

Using version 1.4.0

Ensembl archive section

UniProt API has changed and needs to be updated in uniprot module

"Interacts with" has been renamed 'interactor'

1 test in picr test fails

test_getUPIForBLAST

 self.data = data[:]
TypeError: 'int' object has no attribute '__getitem__'

ensembl

Need to finalise the wrapping

biomart fails if service down without useful message

Ensembl information section

add intact complex web services

caching with suds

Caching with suds is possible, so let us try to implement it.

Possible to access a KEGG entry without specifying the associated organism?

I added a detailed question on stackoverflow.

In short:

Let's say I am interested in a certain gene (e.g. b3640) and I do not know to which organism it belongs, is it then possible to get all the information for this gene without specifying the organism?

For example

from bioservices import *
kegg_con = KEGG()
res = kegg_con.get('b3640', parse=True)['NAME']

does not work since the organism is not specified. When it is specified, it all works as intended

kegg_con.get('eco:b3640', parse=True)['NAME']

returns

[u'dut']

When I try to determine the associated organism by using

kegg_con.find('genes', 'b3640')

the desired entry is not found, unfortunately.

So my two questions are therefore:

Is there a way so that I can access the information about a gene just based on its gene ID without specifying the organism it belongs to?
What would be the best way to retrieve the information to which organism the gene belongs? And why does find fail when I search for the E. coli gene?

UniProt returns integer

I'm using bioservices after an automatic BLAST: it searches for the accession number in UniProt and retrieves the available data. I check whether there is a result using if len(res) != 0. This works well, but sometimes I get the following error message: if len(res) != 0: TypeError: object of type 'int' has no len().
But after printing the variable concerned (res), it turns out to be blank.
I hope someone knows why this is happening.
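
A possible explanation, stated as an assumption rather than confirmed behaviour: when a request fails, the call may return an integer (for example an HTTP status code) instead of text, and an integer has no length. A defensive check could look like this sketch; has_result is a hypothetical helper.

```python
def has_result(res):
    """Return True only when res is a non-empty result (hypothetical helper)."""
    # An int here is assumed to signal a failed request, e.g. a status code.
    if res is None or isinstance(res, int):
        return False
    return len(res) != 0
```

The check would then read if has_result(res): in place of if len(res) != 0:.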

uniprot tab names to be updated

After discussions with Klemens, it appears that we can add these columns:

[1] comment(x) where x can be any of the comment types, like comment(FUNCTION)
[2] database(y) where y can be any of the cross references, like database(CCDS) or database(InterPro)
[3] lineage-id(z) where z can be any of a number of taxonomic ranks, like lineage-id(PHYLUM) or lineage-id(GENUS); returns taxid, e.g. 2759
[3a] lineage(z), see [3] but returns the name, e.g. Eukaryota
[4] feature(a) where a can be any of the feature keys, like feature(DISULFIDE BOND)
[5] version(entry)
[6] sequence-modified
[7] proteome, returns proteome identifiers
[...]
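
For illustration, the column names listed above can be joined into the comma-separated string that the "Example fails" issue passes as the columns argument of search(). The selection below is arbitrary, and whether the service accepts each name is not verified here.

```python
# Column names taken from the list above; the selection is arbitrary.
columns = [
    "id",
    "comment(FUNCTION)",
    "database(InterPro)",
    "lineage-id(PHYLUM)",
    "lineage(GENUS)",
    "feature(DISULFIDE BOND)",
    "version(entry)",
    "sequence-modified",
    "proteome",
]

# Comma-separated form, as used for the columns argument.
columns_arg = ",".join(columns)
```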

Found keyword BRACKET, which has not special parsing for now

Found keyword BRACKET, which has not special parsing for
now. please report this issue with the identifier ( C06042 Compound) into github.com/bioservices

add a count on requests to limit number of requests

If one launches too many requests at the same time, one may be blacklisted or have future requests limited.
A counter to limit requests per second would be nice.
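
A minimal sketch of such a counter, assuming a simple "at most N requests per second" policy enforced by sleeping between calls. RequestThrottle is a hypothetical class and is not part of bioservices; the clock and sleep arguments exist only so the behaviour can be checked without real delays.

```python
import time


class RequestThrottle:
    """Allow at most max_per_second calls per second (illustrative sketch)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any
        self.count = 0     # total number of requests let through

    def wait(self):
        """Block until the next request is allowed, then record it."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        self.count += 1
```

Calling throttle.wait() immediately before each request would space the calls out; the count attribute keeps the running total.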

biomodels contains ascii characters unicode error

>>> from bioservices import BioModels
>>> b = BioModels()
>>> b.getModelSBMLById('MODEL1006230101')
 <repr(<suds.sax.text.Text at 0x5767e90>) failed: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 5171: ordinal not in range(128)>

There seems to be an error, but this message is only shown when the result is displayed in the interpreter, for example by typing the variable name.

For instance, this works without any message:

>>> result = b.getModelSBMLById('MODEL1006230101')

The print function works as well, so this is a repr issue.

This is not important, but one way to fix it is to call encode('utf8') on the output.

Typo in Kegg Tutorial

On the KEGG tutorial page,

k.organism = "hsa"
k.pathwaysIds

should be changed to

k.organism = "hsa"
k.pathwayIds

uniprot module cleaning

In the mapping method, we are currently limited in the number of requests, but this is handled by multi_mapping. In fact, we can just merge the two methods and use an HTTP POST request.

add test and doc for clinvitae

test
doc

kegg pathway2sif does not work anymore

k.pathway2sif('path:map04010')

TypeError: 'int' object has no attribute '__getitem__'

pdb service

This service is not finalised. There seem to be more functionalities that are not yet available

eutils does not use wsdl anymore

Need to convert the code that relied on WSDL (EFetch)

adding function to fetch information about a reaction.

"Warning. Found keyword SYSNAME, which has not special parsing for now. please report this issue..."

I received the following warnings which included the request to report them:

Warning. Found keyword SYSNAME, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword SUBSTRATE, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword ALL_REAC, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword HISTORY, which has not special
parsing for now. please report this issue with the KEGG
identifier (EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Here is the link to the entry: kegg.

Bad arguments in xmltools.readXML class constructor

Here is the current constructor of readXML class:

class readXML(easyXML):
    def __init__(self, filename, fixing_unicode=False, encoding="utf-8"):
        url = urlopen(filename, "r")  # the bad function call...
        self.data = url.read()
        super(readXML, self).__init__(self.data, fixing_unicode, encoding)  # ...and the outdated constructor call

The second parameter of urlopen is data, but no data needs to be passed to the server here, and the "r" argument is hardcoded. I don't know where it came from, but it doesn't make sense. Also, in Python 3 this parameter must be of type bytes, so in this case we get an exception.

Even if this call succeeds, the fixing_unicode parameter on the parent's constructor was removed some time ago, so the super().__init__() call will fail too.

I will make a pull request fixing it soon.
Thanks.
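
A sketch of what the corrected constructor might look like, under the assumptions described above: urlopen is called without the stray "r" argument and fixing_unicode is no longer forwarded. EasyXMLStandIn is a hypothetical stand-in for easyXML so the example is self-contained; it only mirrors the (data, encoding) signature visible in the BioMart traceback further down.

```python
from urllib.request import urlopen


class EasyXMLStandIn:
    """Stand-in for easyXML so the sketch is self-contained (hypothetical)."""

    def __init__(self, data, encoding="utf-8"):
        self.data = data
        self.encoding = encoding


class ReadXMLSketch(EasyXMLStandIn):
    def __init__(self, url, encoding="utf-8"):
        # No second positional argument: urlopen's data is only for request bodies.
        with urlopen(url) as response:
            data = response.read()
        # The parent no longer takes fixing_unicode, so it is not forwarded.
        super().__init__(data, encoding)
```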

update setup to use wrapt package

biomart bug

Hi!

I try to run the following code:

from bioservices import *
s = BioMart()
datasets = s.databases("ensembl")

I can create a Biomart object with BioMart(), but whenever I try to call a function I get the following error message (in this case databases()):

In python 2.7:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-9687ac9ee893> in <module>()
      1 s = BioMart()
----> 2 datasets = s.databases("ensembl")

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/biomart.pyc in _get_databases(self)
    394     def _get_databases(self):
    395         if self._databases is None:
--> 396             ret = self.registry()
    397             names = sorted([x.get("database", "?") for x in ret])
    398             self._databases = names[:]

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/biomart.pyc in registry(self)
    214         """
    215         ret = self.http_get("?type=registry", frmt="xml")
--> 216         ret = self.easyXML(ret)
    217         # the XML contains list of children called MartURLLocation made
    218         # of attributes. We parse the xml to return a list of dictionary.

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/services.pyc in easyXML(self, res)
    183         """
    184         from bioservices import xmltools
--> 185         return xmltools.easyXML(res)
    186 
    187 

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/xmltools.pyc in __init__(self, data, encoding)
     77         #    self.data = x.fixed_string.encode("utf-8")
     78         #else:
---> 79         self.data = data[:]
     80 
     81         try:

TypeError: 'int' object has no attribute '__getitem__'

in python 3.5:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-9687ac9ee893> in <module>()
      1 s = BioMart()
----> 2 datasets = s.databases("ensembl")

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/biomart.py in _get_databases(self)
    394     def _get_databases(self):
    395         if self._databases is None:
--> 396             ret = self.registry()
    397             names = sorted([x.get("database", "?") for x in ret])
    398             self._databases = names[:]

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/biomart.py in registry(self)
    214         """
    215         ret = self.http_get("?type=registry", frmt="xml")
--> 216         ret = self.easyXML(ret)
    217         # the XML contains list of children called MartURLLocation made
    218         # of attributes. We parse the xml to return a list of dictionary.

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/services.py in easyXML(self, res)
    183         """
    184         from bioservices import xmltools
--> 185         return xmltools.easyXML(res)
    186 
    187 

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/xmltools.py in __init__(self, data, encoding)
     77         #    self.data = x.fixed_string.encode("utf-8")
     78         #else:
---> 79         self.data = data[:]
     80 
     81         try:

TypeError: 'int' object is not subscriptable

I couldn't find anything online, is this a bug or am I doing something wrong?

Best,

Nico

kegg parser improvements

DBLinks entry is not parsed as a dictionary.
set a parse parameter in the get method so that entries can be parsed automatically by default.
add missing INTERACTION/STR_MAP
strip NTSEQ and AASEQ

to be committed soon

bioservices.RNASEQ_EBI._get_organism chokes EVERY TIME

Any method (including all of the examples in the documentation) that uses "ORGANISM" as a piece of information chokes after a good 15 seconds with an error similar to the following:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-d2f890b5dff6> in <module>()
      1 r = RNASEQ_EBI()
----> 2 r.organisms

/home/gus/.anaconda/envs/jupyter/lib/python3.5/site-packages/bioservices/rnaseq_ebi.py in _get_organism(self)
    289             res = sorted(list(set(res)))
    290 
--> 291             res.remove("ORGANISM")
    292 
    293             self._organisms = res

ValueError: list.remove(x): x not in list

This is due to the assert organism in self.organisms statement in those methods.
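
The traceback shows list.remove being called for an item that is not in the list. A minimal sketch of a tolerant version follows; clean_organisms is a hypothetical helper, not the code in rnaseq_ebi.py.

```python
def clean_organisms(raw):
    """Deduplicate and sort names, dropping the header if present (hypothetical helper)."""
    res = sorted(set(raw))
    # list.remove raises ValueError when the item is absent, so check first.
    if "ORGANISM" in res:
        res.remove("ORGANISM")
    return res
```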

Inconsistent behaviour of KeggParser

Hi,

KeggParser behaves differently for the same attribute in different entries. For instance, if a compound has only one name KeggParser returns a string, but if it has more than one it returns a list. This makes it hard to automatically parse all the compound names.

I would suggest that attributes that can be lists should always be returned as lists independently of the number of elements found. This happens with other attributes, e.g. reactions.

Thanks.

Cheers,
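
Until the parser behaves consistently, callers could normalise such attributes themselves. A minimal sketch, where as_list is a hypothetical helper:

```python
def as_list(value):
    """Wrap a lone value in a list; copy lists and tuples (hypothetical helper)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]
```

A compound with one name and a compound with several names then both yield a list of names.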

QuickGO Term oboxml format returns HTML format

I recently reopened, for my colleagues, the scripts of a project started a few months ago.
I was surprised that the way the format is passed in calls had changed, but I updated all the affected scripts.
Now I am facing a new problem. When I send a request to get a GO term in OBOXML format, bioservices returns HTML. I didn't see anything about this issue in the changelog.

Example:
Get the OBOXML format for the term GO:0000016

from bioservices import QuickGO
qgo = QuickGO()  # avoid rebinding the class name to the instance
term = qgo.Term("GO:0000016", frmt="oboxml")
print(term)

Out of curiosity, I checked the source, and I wonder whether the problem could be the parameter name (quickgo.py, line 116: 'frmt' in place of 'format'?)

cache files should be saved in config/bioservices, not locally

bioservices fails to import in Python 3.3.3 due to gevent

Looks like a useful project for xref genes/proteins!

I can successfully install in Python 2.7.8, but in Python 3.3.3, after pip install bioservices (with some complaining about gevent, but supposed success) on import:

In [1]: import bioservices
File "/home/richard/venv3.3/lib/python3.3/site-packages/gevent/hub.py", line 282
except Exception, ex:
^
SyntaxError: invalid syntax

From google it seems gevent is not Py3k compatible. Is there a forked version I am missing?

add pride service

doc
test
api itself

Possible to get GO term definitions?

Is it possible to get the definitions of GO terms with bioservices? I checked out the docs, but could find no GO examples.

BioMart does not have them afaicr.

Currently outsourcing this to the R GO.db package.

Kegg parser module with non-supported attribute

Hi,

When parsing some KEGG modules, a NotImplementedError was raised. See the example below:

m = 'M00144'
s.parse(kegg_srv.get(m))

Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
dict([(p, s.parse(kegg_srv.get(p))) for p in kegg_pathways])
File "/Library/Python/2.7/site-packages/bioservices/kegg.py", line 1254, in parse
raise NotImplementedError("Entry %s not yet implemented" % dbentry)
NotImplementedError: Entry Complex Module not yet implemented

Thanks.

Cheers,

Version 1.5

Before 2.0, we should finalise pending issues on bioservices to have a final stable version.

Finalise EUtils
Finalise other modules if needed
Get taxon in ENA
lineage using ENA

Using UniProt Service in C#

Hi All,

I am unable to import the UniProt service in C# using IronPython.
Please let me know how I can get this done.

Thanks,

Kegg Parser pathway reaction list treated as a dict

When getting and parsing a KEGG pathway, the reaction list is split on newlines, which makes it a dictionary instead of a list.
