
bioservices's Introduction

Hi there

  • 🔭 I'm currently actively working on Sequana and also maintaining BioServices, Damona, Fitter, colormap, spectrum, Bioconvert and easydev.
  • I'm currently leading the bioinformatics and data management activities of the Biomics NGS platform (biomics.pasteur.fr) building pipelines and tools for production.
  • 👯 I'm looking to collaborate on BioConvert, BioServices, Damona and, of course, Sequana. If you would be interested in taking the lead on Spectrum, Fitter or colormap, please let me know.

bioservices's People

Contributors

achillesrasquinha, arnaudbelcour, bonej079, bryan-brancotte, cokelaer, darthgecko, dependabot[bot], erik-white, fbnrst, gianarauz, harper357, hmenager, jsmusach, kirienko, leogama, luciansmith, pjshort, poorrican, saapooch, schryer, slobentanzer, sujaypatil96, thobalose

bioservices's Issues

pdb service

This service is not finalised. Several functionalities do not seem to be available yet.

Kegg parser module with non-supported attribute

Hi,

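When parsing some KEGG modules, a NotImplementedError was raised; see the example below:

m = 'M00144'  # KEGG module identifier
s.parse(kegg_srv.get(m))  # kegg_srv: a KEGG() instance; s provides parse() from bioservices/kegg.py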

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    dict([(p, s.parse(kegg_srv.get(p))) for p in kegg_pathways])
  File "/Library/Python/2.7/site-packages/bioservices/kegg.py", line 1254, in parse
    raise NotImplementedError("Entry %s not yet implemented" % dbentry)
NotImplementedError: Entry Complex Module not yet implemented

Thanks.

Cheers,
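When many entries are parsed in one go (as the traceback's list comprehension over kegg_pathways does), a single unsupported entry type aborts the whole run. A minimal sketch of a workaround, assuming only that the parser raises NotImplementedError for unsupported entries as shown above; parse_all is a hypothetical helper, not part of bioservices.

```python
def parse_all(parse, entries):
    """Parse raw KEGG entries one by one, recording unsupported ones instead of aborting.

    parse   -- any callable taking one raw flat-file entry (e.g. a parser's parse method)
    entries -- mapping of KEGG identifier -> raw text returned by get()
    """
    parsed, unsupported = {}, {}
    for identifier, raw in entries.items():
        try:
            parsed[identifier] = parse(raw)
        except NotImplementedError as err:
            # e.g. "Entry Complex Module not yet implemented"
            unsupported[identifier] = str(err)
    return parsed, unsupported


# Usage with bioservices (requires network access):
# from bioservices import KEGG
# k = KEGG()
# entries = {m: k.get(m) for m in ["M00144"]}
# parsed, unsupported = parse_all(k.parse, entries)
```

The unsupported mapping then doubles as a ready-made list of identifiers to attach to a bug report.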

"Warning. Found keyword SYSNAME, which has not special parsing for now. please report this issue..."

I received the following warnings which included the request to report them:

Warning. Found keyword SYSNAME, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword SUBSTRATE, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword ALL_REAC, which has not special
parsing for now. please report this issue with the KEGG
identifier ( EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Warning. Found keyword HISTORY, which has not special
parsing for now. please report this issue with the KEGG
identifier (EC 4.2.1.3 Enzyme) into github.com/bioservices. Thanks T.C.

Here is the link to the entry: kegg.
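These warnings come from keywords the parser has no dedicated rule for. A minimal sketch of a generic fallback that keeps every keyword's raw lines instead of warning, assuming the usual KEGG flat-file layout (keyword in the first 12 columns, continuation lines indented, "///" ending the record). split_kegg_record is hypothetical and is not how bioservices parses entries; in this simplification, indented sub-keywords such as AUTHORS under REFERENCE become top-level keys.

```python
def split_kegg_record(text):
    """Split one KEGG flat-file record into {keyword: [value lines]}.

    Unknown keywords (SYSNAME, SUBSTRATE, ALL_REAC, HISTORY, ...) are kept
    as raw text rather than triggering a warning.
    """
    record = {}
    keyword = None
    for line in text.splitlines():
        if line.startswith("///"):
            break  # end of record
        head, value = line[:12].strip(), line[12:].strip()
        if head:
            # A new keyword starts in the first 12 columns
            keyword = head
            record.setdefault(keyword, [])
        if keyword is not None and value:
            # Continuation lines (blank first 12 columns) extend the current keyword
            record[keyword].append(value)
    return record
```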

installation error of bioservices

Hi,
I have Python 2.6 installed on my server (I cannot switch to a newer one). I am getting the following error messages during installation:

Extracting bioservices-1.3.5-py2.6.egg to /home/satya/python/lib/python2.6/site-packages
SyntaxError: ('invalid syntax', ('/home/satya/python/lib/python2.6/site-packages/bioservices-1.3.5-py2.6.egg/bioservices/kegg.py', 1569, 50, " output['dblinks'] = {k[0:-1]:v for k,v in output['dblinks'].items()}\n"))

SyntaxError: ('invalid syntax', ('/home/satya/python/lib/python2.6/site-packages/bioservices-1.3.5-py2.6.egg/bioservices/mapping/mappers.py', 149, 37, " data = {k1:{k2:v2['xkey'] for k2,v2 in self.alldata[k1].iteritems()} for k1 in self.alldata.keys()}\n"))

Please let me know how to fix this.
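Both flagged lines use dict comprehensions, which were introduced in Python 2.7 (and 3.0), so Python 2.6 cannot even compile them. A sketch of equivalent expressions built from dict() over generator expressions, which Python 2.6 accepts; the two function names are hypothetical wrappers around the flagged lines, and .items() replaces .iteritems() so the same code also runs on Python 3.

```python
def strip_last_char_of_keys(dblinks):
    # Same result as {k[0:-1]: v for k, v in dblinks.items()} (kegg.py)
    return dict((k[0:-1], v) for k, v in dblinks.items())


def extract_xkey(alldata):
    # Same result as the nested comprehension in mapping/mappers.py
    return dict(
        (k1, dict((k2, v2['xkey']) for k2, v2 in alldata[k1].items()))
        for k1 in alldata.keys()
    )
```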

bioservices.RNASEQ_EBI._get_organism chokes EVERY TIME

Any method (including all of the examples in your documentation) that uses "ORGANISM" as a piece of information chokes after a good 15 seconds with an error similar to the following:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-d2f890b5dff6> in <module>()
      1 r = RNASEQ_EBI()
----> 2 r.organisms

/home/gus/.anaconda/envs/jupyter/lib/python3.5/site-packages/bioservices/rnaseq_ebi.py in _get_organism(self)
    289             res = sorted(list(set(res)))
    290 
--> 291             res.remove("ORGANISM")
    292 
    293             self._organisms = res

ValueError: list.remove(x): x not in list

This is due to the assert organism in self.organisms statement in those methods.
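list.remove() raises ValueError when the element is absent, which is what happens at line 291 when the downloaded data does not contain the ORGANISM header. A minimal sketch of a tolerant version of that step, assuming res is a list of organism names as in the traceback; clean_organisms is a hypothetical helper.

```python
def clean_organisms(values):
    """Return the sorted, de-duplicated organism names without the header."""
    organisms = set(values)
    # set.discard() is a no-op when the element is missing, unlike list.remove()
    organisms.discard("ORGANISM")
    return sorted(organisms)
```

Note that if the download itself failed, the list will simply be empty and the assert in the calling methods will still fail, so this only removes the misleading ValueError.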

biomart bug

Hi!

I am trying to run the following code:

from bioservices import *
s = BioMart()
datasets = s.databases("ensembl")

I can create a BioMart object with BioMart(), but whenever I call a method I get the following error message (here, databases()):

In Python 2.7:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-9687ac9ee893> in <module>()
      1 s = BioMart()
----> 2 datasets = s.databases("ensembl")

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/biomart.pyc in _get_databases(self)
    394     def _get_databases(self):
    395         if self._databases is None:
--> 396             ret = self.registry()
    397             names = sorted([x.get("database", "?") for x in ret])
    398             self._databases = names[:]

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/biomart.pyc in registry(self)
    214         """
    215         ret = self.http_get("?type=registry", frmt="xml")
--> 216         ret = self.easyXML(ret)
    217         # the XML contains list of children called MartURLLocation made
    218         # of attributes. We parse the xml to return a list of dictionary.

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/services.pyc in easyXML(self, res)
    183         """
    184         from bioservices import xmltools
--> 185         return xmltools.easyXML(res)
    186 
    187 

/home/nic/anaconda3/envs/py27/lib/python2.7/site-packages/bioservices/xmltools.pyc in __init__(self, data, encoding)
     77         #    self.data = x.fixed_string.encode("utf-8")
     78         #else:
---> 79         self.data = data[:]
     80 
     81         try:

TypeError: 'int' object has no attribute '__getitem__'

In Python 3.5:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-9687ac9ee893> in <module>()
      1 s = BioMart()
----> 2 datasets = s.databases("ensembl")

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/biomart.py in _get_databases(self)
    394     def _get_databases(self):
    395         if self._databases is None:
--> 396             ret = self.registry()
    397             names = sorted([x.get("database", "?") for x in ret])
    398             self._databases = names[:]

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/biomart.py in registry(self)
    214         """
    215         ret = self.http_get("?type=registry", frmt="xml")
--> 216         ret = self.easyXML(ret)
    217         # the XML contains list of children called MartURLLocation made
    218         # of attributes. We parse the xml to return a list of dictionary.

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/services.py in easyXML(self, res)
    183         """
    184         from bioservices import xmltools
--> 185         return xmltools.easyXML(res)
    186 
    187 

/home/nic/anaconda3/lib/python3.5/site-packages/bioservices/xmltools.py in __init__(self, data, encoding)
     77         #    self.data = x.fixed_string.encode("utf-8")
     78         #else:
---> 79         self.data = data[:]
     80 
     81         try:

TypeError: 'int' object is not subscriptable

I couldn't find anything online. Is this a bug, or am I doing something wrong?

Best,

Nico
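In both tracebacks, http_get appears to have returned an int rather than XML text, and easyXML then fails while slicing it (data[:]). Such an integer is presumably the HTTP status code of a failed request, which would point at the BioMart server or the network rather than the calling code. A minimal sketch of a guard that turns this into a readable error; ensure_text is a hypothetical helper, not a bioservices function.

```python
def ensure_text(response, what="request"):
    """Return response unchanged, or raise if an int (status code) came back instead of data."""
    if isinstance(response, int):
        raise RuntimeError(f"{what} failed: got status code {response} instead of data")
    return response


# Hypothetical use at the point where the registry is fetched:
# ret = ensure_text(self.http_get("?type=registry", frmt="xml"), "BioMart registry")
```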

bioservices fails to import in Python 3.3.3 due to gevent

Looks like a useful project for xref genes/proteins!

I can install it successfully in Python 2.7.8, but in Python 3.3.3, after pip install bioservices (with some complaints about gevent, but apparent success), I get this on import:

In [1]: import bioservices
File "/home/richard/venv3.3/lib/python3.3/site-packages/gevent/hub.py", line 282
except Exception, ex:
^
SyntaxError: invalid syntax

From Google, it seems gevent is not Py3k-compatible. Is there a fork I am missing?
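The traceback shows that importing gevent under Python 3.3 raises SyntaxError (the Python 2-only except Exception, ex: form), not ImportError, so a guard that only catches ImportError would not help. A minimal sketch of treating such a dependency as optional; optional_import is a hypothetical helper, and whether bioservices can work without gevent is an assumption.

```python
import importlib


def optional_import(name):
    """Import module `name`, or return None if it is missing or cannot be compiled."""
    try:
        return importlib.import_module(name)
    except (ImportError, SyntaxError):
        # SyntaxError: the installed package uses syntax this interpreter rejects,
        # e.g. Python 2's "except Exception, ex:" under Python 3
        return None


gevent = optional_import("gevent")
HAS_GEVENT = gevent is not None
```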

Taxonomy missing

I found another example in the documentation that fails:

from bioservices.apps.taxonomy import Taxon

Result:

ImportError: No module named taxonomy

Using version 1.4.0

QuickGO Term oboxml format returns HTML format

I recently opened up, for my colleagues, scripts from a project I began a few months ago.
I was surprised that the way the methods are called had changed, but I updated all the impacted scripts.
Now I am facing a new problem. When I send a request to get a GO term in OBOXML format, bioservices returns HTML instead. I didn't see anything about this in the changelog.

Example:
Get the OBOXML format for the term GO:0000016

from bioservices import QuickGO
qgo = QuickGO()
term = qgo.Term("GO:0000016", frmt="oboxml")
print(term)

Out of curiosity, I checked the source, and I wonder whether the problem could be the parameters (quickgo.py, line 116: 'frmt' instead of 'format'?).
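If the reporter's guess is right, the Python-side argument name frmt ends up sent to the web service as a query key, whereas the service expects a key named format. A minimal sketch of keeping the two apart when building the query string; build_term_query is hypothetical, and the keys id and format are assumptions taken from the issue, not from the QuickGO documentation.

```python
from urllib.parse import urlencode


def build_term_query(go_id, frmt="oboxml"):
    """Build the query string for a GO term request.

    `frmt` is only the Python argument name; the key actually sent
    to the service is assumed to be "format".
    """
    params = {"id": go_id, "format": frmt}
    return urlencode(params)
```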

Possible to get GO term definitions?

Is it possible to get the definitions of GO terms with bioservices? I checked the docs, but could not find any GO examples.

As far as I can recall, BioMart does not have them.

I am currently outsourcing this to the R GO.db package.
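One route that needs no web service is to read the definitions from a local copy of the GO ontology in OBO format, where each [Term] stanza has an id: line and a def: line holding the quoted definition followed by bracketed references. A minimal stdlib sketch, assuming the OBO text has already been read into a string; go_definitions is hypothetical and is not a bioservices feature.

```python
import re

# def: "quoted text (may contain \" escapes)" [references]
DEF_PATTERN = re.compile(r'^def:\s*"((?:[^"\\]|\\.)*)"')


def go_definitions(obo_text):
    """Map each id in [Term] stanzas to its definition text."""
    definitions = {}
    current_id = None
    in_term = False
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # New stanza: only [Term] stanzas are of interest
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[3:].strip()
        elif current_id and line.startswith("def:"):
            match = DEF_PATTERN.match(line)
            if match:
                definitions[current_id] = match.group(1).replace('\\"', '"')
    return definitions
```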

Possible to access a KEGG entry without specifying the associated organism?

I added a detailed question on Stack Overflow.

In short:

Let's say I am interested in a certain gene (e.g. b3640) and I do not know which organism it belongs to. Is it then possible to get all the information for this gene without specifying the organism?

For example

from bioservices import *
kegg_con = KEGG()
res = kegg_con.get('b3640', parse=True)['NAME']

does not work, since the organism is not specified. When it is specified, everything works as intended:

kegg_con.get('eco:b3640', parse=True)['NAME']

returns

[u'dut']

When I try to determine the associated organism by using

kegg_con.find('genes', 'b3640')

the desired entry is not found, unfortunately.

So my two questions are:

  1. Is there a way so that I can access the information about a gene just based on its gene ID without specifying the organism it belongs to?

  2. What would be the best way to find out which organism the gene belongs to? And why does find fail when I search for the E. coli gene?
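KEGG gene identifiers are qualified by an organism code (eco:b3640, where eco is E. coli K-12 MG1655), which is why the bare b3640 is not found. With a short list of candidate organisms, one workaround is to try each prefix in turn. A minimal sketch; find_qualified_gene is hypothetical, and it assumes the fetch function returns the entry text on success and something else (such as an integer status code) on failure.

```python
def find_qualified_gene(gene_id, organisms, fetch):
    """Try "<org>:<gene_id>" for each candidate organism code.

    Returns (qualified_id, entry) for the first hit, or None.
    `fetch` is any callable returning the raw entry text, e.g. KEGG().get.
    """
    for org in organisms:
        qualified = f"{org}:{gene_id}"
        entry = fetch(qualified)
        # Treat non-strings (such as integer status codes) and blank text as misses
        if isinstance(entry, str) and entry.strip():
            return qualified, entry
    return None


# Usage with bioservices (requires network access):
# from bioservices import KEGG
# k = KEGG()
# find_qualified_gene("b3640", ["hsa", "eco"], k.get)
```

Each candidate costs one request, so looping over all of KEGG's thousands of organisms this way would be slow; it is only practical for a handful of likely candidates.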

ensembl

Need to finalise the wrapping.

biomodels: unicode error on non-ASCII characters

>>> from bioservices import BioModels
>>> b = BioModels()
>>> b.getModelSBMLById('MODEL1006230101')
 <repr(<suds.sax.text.Text at 0x5767e90>) failed: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 5171: ordinal not in range(128)>

There seems to be an error, but actually this message is only shown when we type the variable name (i.e. when its repr is displayed).

For instance, this works without any message:

>>> result = b.getModelSBMLById('MODEL1006230101')

Printing works as well, so this is a repr issue.

This is not important, but one way to fix it is to call encode('utf8') on the output.
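The message presumably comes from Python 2 encoding the model text with the ASCII codec when displaying its repr; the character u'\xe8' (è) has no ASCII encoding, whereas UTF-8 can encode it. A small Python 3 sketch of the difference; to_utf8 and ascii_safe are hypothetical helper names.

```python
def to_utf8(text):
    """Encode text as UTF-8 bytes; unlike ASCII, UTF-8 covers every character."""
    return text.encode("utf-8")


def ascii_safe(text):
    """ASCII-only representation with non-ASCII characters escaped."""
    return ascii(text)


sample = "\xe8"  # the character from the error message (è)
try:
    sample.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII fails:", err.reason)  # ordinal not in range(128)
print(to_utf8(sample))                 # b'\xc3\xa8'
print(ascii_safe(sample))              # '\xe8'
```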

KEGG parse for keyword "SEQUENCE"

Hi,
I get the following warning for a few compounds (e.g. C15682, C12045, ...) with the parser:

from bioservices import KEGG
k=KEGG()
c = k.get('C15682')
cp = k.parse(c)

Warning. Found keyword SEQUENCE, which has not special
parsing for now. please report this issue with the KEGG
identifier ( C15682 Compound) into github.com/bioservices. Thanks T.C.

Bad arguments in xmltools.readXML class constructor

Here is the current constructor of readXML class:

class readXML(easyXML):
    def __init__(self, filename, fixing_unicode=False, encoding="utf-8"):
        url = urlopen(filename, "r")  # the bad function call...
        self.data = url.read()
        super(readXML, self).__init__(self.data, fixing_unicode, encoding)  # ...and the outdated constructor call

The second parameter of urlopen is data, but no data is meant to be sent to the server here; the "r" argument is simply hardcoded. I don't know where it came from, but it doesn't make sense. Also, in Python 3 this parameter must be of type bytes, so we get an exception.

Even if this call succeeded, the parent's constructor no longer accepts the fixing_unicode parameter (it was removed some time ago), so the super().__init__() call would fail too.

I will make a pull request fixing it soon.
Thanks.
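A minimal sketch of the corrected constructor described above: urlopen is called with the URL only, and the parent receives just (data, encoding), the signature visible in the BioMart traceback earlier on this page. The easyXML class below is a hypothetical stand-in so the sketch runs on its own; the real class lives in bioservices.xmltools and does more.

```python
from urllib.request import urlopen


class easyXML:
    """Hypothetical stand-in for bioservices.xmltools.easyXML.

    Only mirrors the (data, encoding) constructor signature.
    """

    def __init__(self, data, encoding="utf-8"):
        self.data = data[:]
        self.encoding = encoding


class readXML(easyXML):
    def __init__(self, filename, encoding="utf-8"):
        # No second positional argument: passing `data` would turn this into a POST
        with urlopen(filename) as response:
            data = response.read()
        # The parent no longer takes fixing_unicode, so pass only data and encoding
        super().__init__(data, encoding)
```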

BioCyc Interface

It's great to see a multi-service package like this being developed! I found you via this post.

I noticed that there isn't an API for BioCyc yet.

I have an implementation here that uses the Web API to allow browsing of BioCyc objects. See the example notebook for a demonstration. Improving the API to seamlessly support a local Pathway Tools server is on the someday todo list.

Would you be interested in merging this into the bioservices package? It seems like a natural fit.

uniprot tab names to be updated

After discussions with Klemens, it appears that we can add these columns:

[1] comment(x) where x can be any of the comment types, like comment(FUNCTION)
[2] database(y) where y can be any of the cross references, like database(CCDS) or database(InterPro)
[3] lineage-id(z) where z can be any of a number of taxonomic ranks, like lineage-id(PHYLUM) or lineage-id(GENUS); returns taxid, e.g. 2759
[3a] lineage(z), see [3] but returns the name, e.g. Eukaryota
[4] feature(a) where a can be any of the feature keys, like feature(DISULFIDE BOND)
[5] version(entry)
[6] sequence-modified
[7] proteome, returns proteome identifiers
[...]
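A minimal sketch of assembling the parameterised column names listed above into one comma-separated columns string. build_columns is hypothetical; the column syntax is taken from the list above and applies to the tab output of the UniProt service discussed here, while UniProt's newer REST API uses a different field vocabulary and may not accept these names.

```python
def build_columns(base=("id",), comments=(), databases=(), lineage_ranks=(),
                  lineage_ids=(), features=()):
    """Join column names into a comma-separated `columns` string.

    The parameterised forms follow the list above: comment(x), database(y),
    lineage(z), lineage-id(z) and feature(a).
    """
    columns = list(base)
    columns += [f"comment({c})" for c in comments]
    columns += [f"database({d})" for d in databases]
    columns += [f"lineage({r})" for r in lineage_ranks]
    columns += [f"lineage-id({r})" for r in lineage_ids]
    columns += [f"feature({f})" for f in features]
    return ",".join(columns)
```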

Uniprot return integer

I'm using bioservices after an automated BLAST run: it searches UniProt for the accession number and retrieves the available data. I check whether there is a result using if len(res) != 0. This works well, but sometimes I get the following error message: if len(res) != 0: TypeError: object of type 'int' has no len().
But when I print the variable in question (res), it turns out to be blank.
Does anyone know why this is happening?
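The error message shows that res is sometimes an int rather than a string; the BioMart traceback on this page shows the same kind of integer coming back where text was expected. A minimal sketch of a check that covers both the integer case and an empty or blank text result; has_results is a hypothetical helper.

```python
def has_results(res):
    """True only for a non-empty text result.

    Integers (e.g. a status code from a failed request) and
    whitespace-only strings are both treated as "no result".
    """
    return isinstance(res, str) and bool(res.strip())


# Hypothetical use in place of "if len(res) != 0":
# res = <result of the UniProt call>
# if has_results(res):
#     ...
# elif isinstance(res, int):
#     print("request failed with code", res)
```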

Compound DBLINKS KEGGParser error!

from bioservices import *
kegg = KEGG()
c = kegg.parse(kegg.get('cpd:C00087'))

c['DBLINKS']
{u'CAS': u'7704-34-9 10544-50-0 PubChem: 3387 ChEBI: 17909 26833 3DMET: B04617 NIKKAJI: J3.750H'}
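The output shows every database collapsed into the CAS value, i.e. the line boundaries were lost before the links were split. A minimal sketch, assuming the raw DBLINKS lines are still available with one database per line, as in the KEGG flat file; parse_dblinks is hypothetical.

```python
def parse_dblinks(lines):
    """Turn DBLINKS lines ("DB: id1 id2 ...") into {DB: [id1, id2, ...]}.

    Only the first colon on each line separates the database name
    from its identifiers.
    """
    links = {}
    for line in lines:
        database, sep, ids = line.partition(":")
        if not sep:
            continue  # not a "DB: ids" line
        links.setdefault(database.strip(), []).extend(ids.split())
    return links
```

Applied to the links in this report (CAS: 7704-34-9 10544-50-0, PubChem: 3387, ChEBI: 17909 26833, 3DMET: B04617, NIKKAJI: J3.750H), each database gets its own key and its identifiers as a list.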

kegg parser improvements

  • DBLinks entry is not parsed as a dictionary.
  • add a parse parameter to the get method so that entries can be parsed automatically by default.
  • add missing INTERACTION/STR_MAP
  • strip NTSEQ and AASEQ

To be committed soon.
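For the "strip NTSEQ and AASEQ" item, a minimal sketch, assuming the values arrive as in a KEGG gene entry: a first line holding the sequence length, followed by the wrapped sequence lines. join_sequence is hypothetical, and the length check is an extra safety net rather than something the item above asks for.

```python
def join_sequence(lines):
    """Join NTSEQ/AASEQ value lines into one sequence string.

    lines[0] is assumed to hold the sequence length; the remaining
    lines hold the sequence, possibly with surrounding whitespace.
    """
    length = int(lines[0].split()[0])
    sequence = "".join("".join(lines[1:]).split())  # drop all whitespace
    if len(sequence) != length:
        raise ValueError(f"expected {length} residues, got {len(sequence)}")
    return sequence
```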

Example fails

I just installed bioservices and copy-pasted the example code into IPython. It didn't work.

 from bioservices import UniProt
 u = UniProt()
 data = u.search("zap70+and+taxonomy:9606", format="tab", limit=3, columns="entry name,length,id, genes")

Result:

TypeError: search() got an unexpected keyword argument 'format'
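The error means the installed version's search() has no parameter called format. Other bioservices calls on this page (e.g. QuickGO's Term) use frmt for the output format, so the documentation example may simply be out of date. Rather than guessing, the installed signature can be checked with the standard inspect module; accepts_kwarg is a hypothetical helper.

```python
import inspect


def accepts_kwarg(func, name):
    """True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters.values()
    return any(
        (p.name == name and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY))
        or p.kind == p.VAR_KEYWORD  # **kwargs accepts any name
        for p in params
    )


# Usage with bioservices:
# from bioservices import UniProt
# u = UniProt()
# accepts_kwarg(u.search, "format"), accepts_kwarg(u.search, "frmt")
```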

test suite takes too long

The test suite takes 15 minutes. We should reduce this by having a subset of tests that runs after every commit and another, complete set that runs before a release.
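A minimal stdlib sketch of that split: slow tests are skipped unless an environment variable is set, so the default run after each commit stays fast and the release run sets the variable. The variable name BIOSERVICES_SLOW_TESTS and the test bodies are hypothetical placeholders. With pytest, the same split is commonly done with a custom marker (e.g. @pytest.mark.slow) and pytest -m "not slow".

```python
import os
import unittest

# Hypothetical switch: export BIOSERVICES_SLOW_TESTS=1 before a release
RUN_SLOW = os.environ.get("BIOSERVICES_SLOW_TESTS") == "1"
slow = unittest.skipUnless(RUN_SLOW, "slow test; set BIOSERVICES_SLOW_TESTS=1")


class QuickTests(unittest.TestCase):
    def test_offline_parsing(self):
        # Fast checks that need no network access run on every commit
        self.assertEqual("eco:b3640".split(":"), ["eco", "b3640"])


class ServiceTests(unittest.TestCase):
    @slow
    def test_full_service_call(self):
        # Placeholder for a long-running test that queries a remote web service
        self.assertTrue(True)
```

Running python -m unittest on the module then executes the quick tests and reports the service tests as skipped.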

Inconsistent behaviour of KeggParser

Hi,

KeggParser behaves differently for the same attribute in different entries. For instance, if a compound has only one name, KeggParser returns a string, whereas if it has more than one, it returns a list. This makes it hard to parse all the compound names automatically.

I would suggest that attributes that can be lists should always be returned as lists, regardless of the number of elements found. The same happens with other attributes, e.g. reactions.

Thanks.

Cheers,
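Until the parser itself does this, callers can normalise the result. A minimal sketch, assuming single values come back as a str and multiple values as a list, as described above; as_list is a hypothetical helper.

```python
def as_list(value):
    """Normalise a parsed attribute to a list.

    None -> [], a single string -> [string], and any other
    iterable (list, tuple, ...) is converted with list().
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Hypothetical use: names = as_list(parsed_entry.get("NAME"))
```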

Version 2

  • XML is an issue in many places. This is not a bioservices issue but rather a natural consequence of the different choices made by various organizations. The easyXML class is very simple and just parses the XML with beautifulsoup4. The xmltodict package may be very useful for that purpose.
  • Output as dictionaries or JSON is good, but most of the time people want to do something with it, such as plotting or statistics. Right now, bioservices has hardly any dependencies, but Pandas would be great to have. This means that matplotlib and NumPy will be required too, but it would be a great add-on.
  • Finalise the remaining missing services, such as PDB and Ensembl (almost there)
  • use pandas officially
  • Wikipathway
    • refactoring from WSDL to REST
    • use Pandas
    • missing functionalities to be implemented (not those with login though)
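For the XML item: xmltodict essentially turns an XML document into nested dictionaries. A rough stdlib-only sketch of that idea with xml.etree.ElementTree; it is neither xmltodict itself nor how easyXML works. Attributes and child elements become keys, repeated children become lists, text-only elements become strings, and text next to attributes or children is kept under "#text".

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one element (attributes, children, text) into dicts/lists/strings."""
    node = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            # Second occurrence of a tag: switch to a list
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if text and not node:
        return text  # text-only element -> plain string
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml_string):
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_dict(root)}
```

Note that the single-versus-list inconsistency criticised in the KeggParser issue above also appears here: one child gives a dict, several give a list.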

Version 1.5

Before 2.0, we should finalise pending issues on bioservices to have a final stable version.

  • Finalise EUtils
  • Finalise other modules if needed
  • Get taxon in ENA
  • lineage using ENA

Using UniProt Service in C#

Hi All,

I am unable to import the UniProt Service in C# using IronPython.
Please let me know how I can get it done.

Thanks,

Missing SEQUENCE in KEGGParser

When parsing multiple kgml files, I get a warning saying SEQUENCE is missing. Everything still works. It happens for KEGG compounds that are peptide sequences. SEQUENCE seems to be missing from the list on line 1374 of kegg.py. The warning suggested reporting it to the repo.
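The warning implies a fixed list of known keywords. A minimal sketch of a table-driven alternative, where supporting SEQUENCE means adding one entry and unknown keywords are collected once instead of warned about per entry. parse_fields, HANDLERS and the handler bodies are hypothetical and do not mirror kegg.py; since the exact layout of SEQUENCE values is not shown here, its handler simply keeps the lines.

```python
def parse_fields(fields, handlers):
    """Apply one handler per keyword; keep unknown keywords as raw text.

    fields   -- {keyword: [value lines]} for one entry
    handlers -- {keyword: callable taking the list of lines}
    Returns the parsed entry and the set of keywords without a handler.
    """
    parsed, unhandled = {}, set()
    for keyword, lines in fields.items():
        handler = handlers.get(keyword)
        if handler is None:
            unhandled.add(keyword)
            parsed[keyword] = " ".join(lines)
        else:
            parsed[keyword] = handler(lines)
    return parsed, unhandled


# Hypothetical handler table; SEQUENCE is simply one more entry in it
HANDLERS = {
    "NAME": lambda lines: [line.rstrip(";") for line in lines],
    "SEQUENCE": list,
}
```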

uniprot module cleaning

In the mapping method, we are currently limited in the number of identifiers per request, but this is handled by multi_mapping. In fact, we could just merge the two methods and use an HTTP POST request.
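A minimal sketch of the POST idea with the standard library: the identifiers travel in the request body instead of the URL, whose length servers limit, so a single request can carry a much larger batch. The endpoint URL and the form field names (query and the extra keyword parameters) are placeholders, not the actual UniProt mapping API; build_mapping_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mapping_request(url, identifiers, **params):
    """Build a POST request carrying all identifiers in the body.

    Passing `data` to Request makes it a POST; nothing is sent
    until the request is opened with urlopen().
    """
    body = dict(params, query=" ".join(identifiers))
    return Request(url, data=urlencode(body).encode("ascii"))


# Hypothetical usage (placeholder endpoint and field names):
# req = build_mapping_request("https://example.org/mapping", ids, frm="ACC", to="GENENAME")
# with urllib.request.urlopen(req) as response: ...
```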
