
jgi-query's Issues

downloading multiple datasets in bulk?

I can't tell whether this is possible using jgi-query or with the JGI API in general. I would like to download all of their bacterial genomes if at all possible but can't find a way to get a list by kingdom.

Can you provide any guidance here?

Files on tape can't be downloaded

In many cases, files are stored on tape, and downloading them seems to fail. I am not familiar enough with Python to make a pull request, but one partial workaround could be this: if the same file can be found both on tape and on the hard drive, select the copy on the hard drive. Thanks if you have time to check this!
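
The workaround described above could look roughly like the following sketch. It is not taken from jgi-query itself. It assumes the file entries are already available as dictionaries with hypothetical `filename` and `url` keys, and that tape-backed files are the ones whose URL goes through `get_tape_file`, as in the curl commands quoted in other issues on this page.

```python
def prefer_disk_copies(entries):
    """Keep one entry per filename, preferring a copy that is not on tape.

    `entries` is a list of dicts with hypothetical 'filename' and 'url' keys.
    """
    chosen = {}
    for entry in entries:
        name = entry["filename"]
        current = chosen.get(name)
        # Assumption: tape-backed files are routed through 'get_tape_file'.
        entry_on_tape = "get_tape_file" in entry["url"]
        if current is None:
            chosen[name] = entry
        elif "get_tape_file" in current["url"] and not entry_on_tape:
            # Replace a tape copy with the disk copy of the same file.
            chosen[name] = entry
    return list(chosen.values())
```

Files that exist only on tape are kept as they are, so this only helps when a disk copy is actually listed.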

Failed when using jgi-query to download the wanted files

Hi Glarue,
I am trying to use the following command to download the files in the projects, but I get an error message and a file with the right name but the wrong content:

command:

python jgi-query.py -x get-directory.xml # the xml file is downloaded from the project download page

#https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=TheHunmicrobiome#

by clicking 'Open Downloads as XML'

following the instructions:

user name and password #fine

file to download

for example

2:2216 # a protein seqs file I want to download

I got this

Total download size of selected files: 693.23 KB
Continue? (y/n): y
Downloading '81031.assembled.faa' using command:
curl http://genome.jgi.doe.gov/EubpyrIsolGenome/download/_JAMO/56f1982d7ded5e7f7b938de5/81031.assembled.faa -b cookies -c cookies -L > 81031.assembled.faa
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   287  100   287    0     0    101      0  0:00:02  0:00:02 --:--:--   101
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
100  9122    0  9122    0     0   2240      0 --:--:--  0:00:04 --:--:--  2240
Finished downloading all files.
ERROR: '81031.assembled.faa' appears to be malformed and will be left unmodified.
Keep temporary files ('/home/mpi/pengfei/Hungate1000p/get-directory.xml' and 'cookies')? (y/n): n
Removing temp files and exiting

and the file is wrong: not protein sequences, see attached

81031.assembled.faa.txt

Would you please help me check which step I am doing wrong?
Thanks

Best,
Pengfei
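
For anyone hitting the same "appears to be malformed" message, a quick way to see what was actually saved is to check whether the file starts like a FASTA file at all. The sketch below is independent of jgi-query; the function names are made up for illustration, and the only assumption is that a valid `.faa` file begins with a `>` header line.

```python
def looks_like_fasta(text):
    """Return True if the first non-blank line is a FASTA header ('>')."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False  # empty or whitespace-only content


def check_download(path):
    """Read the start of a downloaded file and test it with looks_like_fasta."""
    with open(path, "r", errors="replace") as handle:
        head = handle.read(4096)
    return looks_like_fasta(head)
```

If `check_download("81031.assembled.faa")` returns False, opening the file in a text editor usually shows whether the server sent back a login or error page instead of the sequences.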

Downloading the fungal database

Hello,
I wanted to download the entire fungal database, but the tool is not responding. Do you have any solution to recommend?
It takes a long time and in the end it gives me an empty XML file.
Thanks in advance.

ERROR :
#-------------------------------------------------------------------------------------------

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    92    0    92    0     0      0      0 --:--:--  0:10:00 --:--:--    22
Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'


Traceback (most recent call last):
  File "/shared/ifbstor1/projects/HE/FungiDB/JGI-db/jgi-query-main/jgi-query.py", line 1151, in <module>
    if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'

#--------------------------------------------------------------------------------------------

FILE XML : fungi_jgi_index.xml

<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
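
The traceback above appears to follow from the index file containing an HTML error page rather than a file listing. A check along the following lines, run on the saved index before parsing it, would surface the server message instead of the `AttributeError`. This is only a sketch with a hypothetical function name; it is not part of jgi-query, and it assumes nothing about the real index format beyond it being XML whose root element is not `html`.

```python
import xml.etree.ElementTree as ET


def describe_index_problem(text):
    """Return None if `text` looks like an XML index, else a short reason."""
    if not text.strip():
        return "empty response"
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed XML: {err}"
    if root.tag.lower() == "html":
        # An HTML error page (such as a 504) can itself be well-formed XML,
        # so a successful parse alone does not prove the index is usable.
        heading = root.find(".//h1")
        detail = heading.text if heading is not None and heading.text else "HTML page"
        return f"server returned HTML instead of an index: {detail}"
    return None
```

Calling it on the contents of `fungi_jgi_index.xml` shown above would report the 504 heading, which points at the server rather than at the script.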

Download error

Hello, when I use python3 ./jgi-query/jgi-query.py --xml get-directory.xml to download files from JGI genome portal,
I get the following download error. Do you know what the issue could be?

Thanks!


Downloading '7393.1.70539.TCGAAG.fastq.gz' using command:
curl -m 120 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/Poptrisequencing_78/download/_JAMO/5254b441067c0136350e4f73/7393.1.70539.TCGAAG.fastq.gz' -b cookies > 7393.1.70539.TCGAAG.fastq.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Trying '7393.1.70539.TCGAAG.fastq.gz' again due to download error (1/4):

Downloading error: curl: (28)

Hi @glarue
I'm running the script on a remote server to download the *.tar.gz files from bacterial groups (as suggested here: #4).

Every time I run this command (starts in tar.gz not shown):

python3 jgi-query.py tenericutes -r '.tar.gz.' --retry_n 0 -c

I get something like this, on every single genome ID contained in tenericutes:

Downloading '2582580514.tar.gz' using command:
curl -m 120 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/Comgenmetab10417/download/_JAMO/53e5233f0d87856ba82b2ddc/2582580514.tar.gz' -b cookies > 2582580514.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:59 --:--:--     0
curl: (28) Operation timed out after 120000 milliseconds with 0 bytes received

I also ran it with the default retry number (4) and the same thing happens. I hope you can guide me through this.

Many thanks in advance
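
Exit code 28 is curl's "operation timed out" code, and the command above caps each attempt at 120 seconds with `-m 120`. One thing to try outside of jgi-query is rerunning the same request with progressively longer limits. The sketch below is an illustration only: the function names and the timeout values are assumptions, it writes with curl's `-o` option instead of shell redirection, and whether a longer wait is enough for a tape-backed file is not something this page confirms.

```python
import subprocess

CURL_TIMEOUT_EXIT = 28  # curl's exit code for "operation timed out"


def build_curl_command(url, out_path, max_time):
    """Build a curl call like the one in the log, with a configurable -m."""
    return ["curl", "-m", str(max_time), url, "-b", "cookies", "-o", out_path]


def download_with_backoff(url, out_path, timeouts=(120, 600, 1800),
                          run=subprocess.run):
    """Retry only on timeouts, allowing more time on each attempt.

    Returns curl's exit code from the last attempt (0 means success).
    """
    for max_time in timeouts:
        result = run(build_curl_command(url, out_path, max_time))
        if result.returncode != CURL_TIMEOUT_EXIT:
            return result.returncode
    return CURL_TIMEOUT_EXIT
```

The `cookies` file is the one left behind by a jgi-query session, so this would need to be run from the same directory after signing in.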

Error downloading the fungal database.

Hello @glarue

I am launching the script to retrieve all the fungal assembly sequences in fasta format, but it is showing me this error:

`python3 jgi-query.py fungi

Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    92    0    92    0     0      0      0 --:--:--  0:10:00 --:--:--    28

Traceback (most recent call last):
File /JGI-db/jgi-query-main/jgi-query.py", line 1151, in
if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'`

Do you have any idea how to download all the fungal genome sequences?

Incorporate into Biopython

Love the idea of this tool--what would you think of making it a package in Biopython so users could just do from Bio import jgi-query, or even making it a single function in Biopython? I don't know anything about how you would do this, but I'm sure the folks over there would be happy to help, and I think it would make it even easier to deploy and use.

10 Minute time limit

So first off thanks for this tool, it's very useful.

My Problem:
I am trying to create an XML file from a very large database on JGI, and there seems to be a 10-minute runtime limit. Is this something built in, or can it be changed?

Thanks
