biocommons / eutils

simplified searching, fetching, and parsing records from NCBI using their E-utilities interface

License: Apache License 2.0

Makefile 3.98% Python 94.92% Perl 1.09%

eutils's Introduction

This project exists only to ensure that the top-level biocommons namespace is correctly declared as a namespace package.

This trivial package was pushed to PyPI in order to reserve the biocommons namespace (contributions are welcome!).

Steps:

pyvenv venv
source venv/bin/activate
pip install wheel
python setup.py register
python setup.py sdist bdist bdist_egg bdist_wheel upload

eutils's Issues

dummy issue

filler issue created by bitbucket_issue_migration

update xmlfacades to better match xml tags and cardinality

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #121
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

E.g., gene.py parses Entrezgene-Set on the assumption that it contains a single gene record, which is the only kind of reply in current use. However, this prevents using the class for multi-gene replies, which in turn precludes supporting WebEnv searches and iteration.
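
A minimal sketch of what matching the XML cardinality could look like: the set facade iterates over zero or more gene records instead of assuming exactly one. This is not the eutils implementation. It uses the standard library's ElementTree rather than the library's own XML layer, the class names are hypothetical, and the sample reply and its gene ids (111, 222) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed reply; the gene ids are made-up sample values.
SAMPLE_XML = """
<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>111</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>222</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>
"""


class Entrezgene:
    """Facade for a single Entrezgene element (hypothetical class)."""

    def __init__(self, elem):
        self._elem = elem

    @property
    def gene_id(self):
        text = self._elem.findtext(
            "Entrezgene_track-info/Gene-track/Gene-track_geneid")
        return int(text) if text is not None else None


class EntrezgeneSet:
    """Facade for the set element: iterates over zero or more gene records."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def __iter__(self):
        for elem in self._root.findall("Entrezgene"):
            yield Entrezgene(elem)


gene_ids = [gene.gene_id for gene in EntrezgeneSet(SAMPLE_XML)]
```

With the set and the record modelled separately, a single-gene reply is just the special case of a set with one element.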

extract authors information

Originally reported by Christian Buhtz (Bitbucket: buhtz, GitHub: buhtz) in biocommons/eutils #133
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Sorry for asking this way, but I wasn't able to find an email address on the project website or in the sources. Or maybe I am doing something wrong with Bitbucket?

I just want to know whether there is a way to extract author information (for example, the university an author works at) via eutils, or via any other system that uses the PubMed API, Entrez, or whatever it is called.

I can see this information on the PubMed website for each article, but I am looking for a way to extract it via the API.
I added an example of what I mean. The string in the green box is what I want. ;)

refactor queryservice to separate querying, throttling, and caching

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #123
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

queryservice implements too much functionality. It should be refactored into:

a basic query service that executes queries, period
a thin throttling service based on the basic query service
a caching service on the throttled service (since that's all the caching service should use)
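
The three layers listed above could be sketched roughly as follows. All class names, the `fetch` callable, and the in-memory dict cache are assumptions made for illustration, not the project's design. The default interval of 0.34 seconds is likewise an assumed value, chosen to stay near three requests per second.

```python
import time


class BasicQueryService:
    """Executes queries, period. `fetch` is any callable (path, args) -> reply."""

    def __init__(self, fetch):
        self._fetch = fetch

    def query(self, path, args=None):
        return self._fetch(path, dict(args or {}))


class ThrottledQueryService:
    """Thin wrapper that spaces out calls to the wrapped service."""

    def __init__(self, service, min_interval=0.34,
                 clock=time.monotonic, sleep=time.sleep):
        self._service = service
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def query(self, path, args=None):
        if self._last is not None:
            wait = self._last + self._min_interval - self._clock()
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        return self._service.query(path, args)


class CachingQueryService:
    """Caches replies; only cache misses reach the throttled service."""

    def __init__(self, service):
        self._service = service
        self._cache = {}

    def query(self, path, args=None):
        key = (path, tuple(sorted((args or {}).items())))
        if key not in self._cache:
            self._cache[key] = self._service.query(path, args)
        return self._cache[key]
```

Because each layer only calls `query` on the layer below, a cache hit never touches the throttle, and the basic service can be tested or replaced without the other two.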

optional: implement pubmed parsing

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #13
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Links

imported from: CORE-13 (Invitae access required)

IndexError retrieving "comment" field from GBSeq object

Originally reported by khyox (Bitbucket: khyox, GitHub: khyox) in biocommons/eutils #126
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

I am using the latest dev release. Steps to reproduce:

#!verbatim

>>> import eutils.clientx as ECX
>>> ecx = ECX.ClientX()
>>> gbseq = ecx.fetch_nuccore_by_ac("DQ225748.1")
>>> gbseq.gi
81238210
>>> gbseq.genes
['bla', 'eyfp', 'bar', 'aadA', 'hph', 'uidA']
>>> gbseq.comment
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/kyox/python3.4/site-packages/eutils/xmlfacades/gbset.py", line 26, in comment
    return self._xmlroot.xpath('/GBSet/GBSeq/GBSeq_comment/text()')[0]
IndexError: list index out of range
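
The traceback comes from indexing `[0]` into an empty result list when the record has no comment element. One possible way to guard against that is sketched below. It uses the standard library's ElementTree instead of the XPath call shown in the traceback, and the helper name `first_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_text(root, path):
    """Return the text of the first element matching `path`, or None if absent."""
    elem = root.find(path)
    return elem.text if elem is not None else None


# Two made-up, minimal records: one with a comment and one without.
with_comment = ET.fromstring(
    "<GBSet><GBSeq><GBSeq_comment>hello</GBSeq_comment></GBSeq></GBSet>")
without_comment = ET.fromstring("<GBSet><GBSeq/></GBSet>")

comment_present = first_text(with_comment, "GBSeq/GBSeq_comment")
comment_missing = first_text(without_comment, "GBSeq/GBSeq_comment")
```

Returning None for a missing optional field lets callers distinguish "no comment" from an error without catching IndexError.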

sqlite3 fails if directory ~/.cache is not previously created

Originally reported by khyox (Bitbucket: khyox, GitHub: khyox) in biocommons/eutils #127
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Suggested solution: sqlitecache.py should check whether the directory containing db_path exists and create it if it does not.
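
A minimal sketch of the suggested fix, assuming db_path is the path of the SQLite file. The function name `connect_cache` is hypothetical and not part of sqlitecache.py.

```python
import os
import sqlite3


def connect_cache(db_path):
    """Open (or create) a SQLite database, creating its parent directory first."""
    db_path = os.path.abspath(os.path.expanduser(db_path))
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(os.path.dirname(db_path), exist_ok=True)
    return sqlite3.connect(db_path)
```

For example, `connect_cache("~/.cache/eutils-cache.db")` would create `~/.cache` on first use; the file name here is only an illustration.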

Support large search result sets

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #124
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

NCBI's E-utilities interface very nicely supports large search result sets by sending results in chunks. eutils currently handles only the first chunk.

See http://www.ncbi.nlm.nih.gov/books/NBK25500/#chapter1.Demonstration_Programs
Perl excerpt to generate the continuation URLs:

for($retstart = 0; $retstart < $Count; $retstart += $retmax) {
   # build the URL for the chunk that starts at $retstart
   my $efetch = "$utils/efetch.fcgi?" .
                "rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
                "db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
   # ... (rest of the loop body is omitted in this excerpt)
}

The purpose of this issue is to provide full support for large result sets using WebEnv histories.

Possible implementation:
This seems like an obvious use of Python iterators for results. I'd like to keep eutils.xmlfacades.esearchresults.ESearchResults parsing-only; however, its interface methods are appropriate. So, one implementation is to write an upper-level class (eutils.esearchresults) that wraps the xmlfacade version, holds a reference to the client, and provides an iterator over results. This upper-level ESearchResults would be passed back to callers in lieu of the xmlfacade version.
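
The iterator idea could be sketched as below. The class name `PagedResults` and the `fetch_chunk(retstart, retmax)` callable are hypothetical; a real version would have the client request each chunk using the query_key and WebEnv of the search, as in the Perl loop above.

```python
class PagedResults:
    """Iterate over all results of a search, fetching one chunk at a time.

    `count` is the total number of results reported by the search.
    `fetch_chunk(retstart, retmax)` is a hypothetical callable that returns
    the list of results for one chunk.
    """

    def __init__(self, count, fetch_chunk, retmax=500):
        self._count = count
        self._fetch_chunk = fetch_chunk
        self._retmax = retmax

    def __len__(self):
        return self._count

    def __iter__(self):
        # Same stepping as the Perl loop: retstart = 0, retmax, 2 * retmax, ...
        for retstart in range(0, self._count, self._retmax):
            yield from self._fetch_chunk(retstart, self._retmax)
```

Chunks are fetched lazily, so a caller that stops iterating early never requests the remaining chunks.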

add support for fetching sequence slices

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #131
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Add sequence fetch support to eutils

Consider modifying cache to not cache very large returns

Excerpt from hgvs dataprovider:

    url_fmt = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id={ac}&rettype=fasta"
    if (start_i is None or end_i is None):
        url = url_fmt.format(ac=ac)
    else:
        url_fmt += "&seq_start={start}&seq_stop={stop}"
        url = url_fmt.format(ac=ac, start=start_i + 1, stop=end_i)
    resp = requests.get(url)
    resp.raise_for_status()
    return ''.join(resp.content.splitlines()[1:])

When done, see https://bitbucket.org/biocommons/hgvs/issues/271/ .

PubMedArticle parsing fails in case of CollectiveName authors

Originally reported by Lawrence Lee (Bitbucket: lclee, GitHub: Unknown) in biocommons/eutils #125
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Seen in PMID:22467948

<Author ValidYN="Y">
    <CollectiveName>Investigators of the Canadian Scleroderma Research Group</CollectiveName>
</Author>
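
One way to handle both kinds of author element is sketched below, using the standard library's ElementTree rather than the project's own parser. The helper name `author_name` is hypothetical, and the individual author "Doe J" is made up; only the CollectiveName entry comes from the record above.

```python
import xml.etree.ElementTree as ET

# The first author is a made-up individual; the second is the collective
# author quoted in this issue.
AUTHOR_LIST_XML = """
<AuthorList>
  <Author ValidYN="Y">
    <LastName>Doe</LastName>
    <ForeName>Jane</ForeName>
    <Initials>J</Initials>
  </Author>
  <Author ValidYN="Y">
    <CollectiveName>Investigators of the Canadian Scleroderma Research Group</CollectiveName>
  </Author>
</AuthorList>
"""


def author_name(author):
    """Return 'LastName Initials' for a person, or the collective name."""
    collective = author.findtext("CollectiveName")
    if collective is not None:
        return collective
    last = author.findtext("LastName") or ""
    initials = author.findtext("Initials") or ""
    return (last + " " + initials).strip()


authors = [author_name(a)
           for a in ET.fromstring(AUTHOR_LIST_XML).findall("Author")]
```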

Functions have a dict as a default argument

Originally reported by Hamish Downer (Bitbucket: foobacca, GitHub: foobacca) in biocommons/eutils #132
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

In querysearch.py:

def einfo(self, args={}):

def _query(self, path, args={}, skip_cache=False, skip_sleep=False):

This is a bad idea: the default dict is created once, when the function is defined, so if you add to args, the "empty" dict is no longer empty and the next call starts with the extra entries already in it. (Currently this doesn't happen, but it might bite you later.)

The recommended way to write this is

def func(args=None):
    if args is None:
        args = {}

For background on this see:

(They both talk about lists, but the same problem occurs with dicts.)
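
The pitfall described in this issue can be reproduced in a few lines. The two function names below are hypothetical and exist only to contrast the shared default with the None idiom.

```python
def add_param_shared(key, value, args={}):
    # Pitfall: the default dict is created once, at function definition time,
    # and is shared by every call that omits `args`.
    args[key] = value
    return args


def add_param_safe(key, value, args=None):
    # Recommended idiom: build a fresh dict on each call.
    if args is None:
        args = {}
    args[key] = value
    return args


first = add_param_shared("db", "gene")
second = add_param_shared("term", "tnf")  # still carries "db" from the first call

safe_first = add_param_safe("db", "gene")
safe_second = add_param_safe("term", "tnf")  # contains only "term"
```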

move clientx to submodule

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #130
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

clientx is poorly thought out... indicate this by moving to submodule and marking it experimental

create sphinx project documentation

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #129
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

... and upload of course

"Documentation" is unclear

Originally reported by Paulo Nuin (Bitbucket: nuin, GitHub: nuin) in biocommons/eutils #120
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

#!python

import eutils.client
esr = ec.esearch(db='gene',term='tumor necrosis factor')

does not work.

#!python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ec' is not defined

I admit that by checking the code I might be able to find out what to import, but please at least add a proper example to the repository.

implement GBFeature

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #128
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Lots of useful properties are buried in GBFeatures, such as exons, misc_features, CDS start & end, translation, xrefs.

Specifically, implement a GBFeatureTable class that returns GBFeatures, optionally with keyed lookups. e.g., get_features('exon')
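
A rough sketch of such a feature table with keyed lookup, written against the standard library's ElementTree. The class bodies are assumptions rather than the eventual eutils API, and the sample record with its locations is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up, minimal record with two exons and one misc_feature.
SAMPLE_XML = """
<GBSeq><GBSeq_feature-table>
  <GBFeature><GBFeature_key>exon</GBFeature_key>
    <GBFeature_location>1..100</GBFeature_location></GBFeature>
  <GBFeature><GBFeature_key>misc_feature</GBFeature_key>
    <GBFeature_location>50..60</GBFeature_location></GBFeature>
  <GBFeature><GBFeature_key>exon</GBFeature_key>
    <GBFeature_location>101..250</GBFeature_location></GBFeature>
</GBSeq_feature-table></GBSeq>
"""


class GBFeature:
    """Facade for one GBFeature element."""

    def __init__(self, elem):
        self._elem = elem

    @property
    def key(self):
        return self._elem.findtext("GBFeature_key")

    @property
    def location(self):
        return self._elem.findtext("GBFeature_location")


class GBFeatureTable:
    """Holds all features of a GBSeq and indexes them by feature key."""

    def __init__(self, gbseq_elem):
        self._features = [
            GBFeature(e)
            for e in gbseq_elem.findall("GBSeq_feature-table/GBFeature")]
        self._by_key = defaultdict(list)
        for feature in self._features:
            self._by_key[feature.key].append(feature)

    def get_features(self, key=None):
        """Return all features, or only those whose key matches `key`."""
        if key is None:
            return list(self._features)
        return list(self._by_key.get(key, []))


table = GBFeatureTable(ET.fromstring(SAMPLE_XML))
exon_locations = [f.location for f in table.get_features("exon")]
```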

add misc_feature support

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/eutils #119
Migrated by bitbucket-issue-migration on 2016-05-25 23:09:02

Apparently some genes still require misc_feature support. They are identifiable from current ncbi.txinfo.gz files by having no exons.

The solution is to 1) modify eutils to fetch misc_features and 2) modify sbin/ncbi-fetch to try for exons first, then misc_features.

example: PECAM1
