glarue / jgi-query
A simple command-line tool to download data from Joint Genome Institute databases
License: Mozilla Public License 2.0
Hi,
I have to download specific bacterial genomes from JGI. Is there a way to download them in bulk through their submission IDs via jgi-query?
Thanks,
Marco
I can't tell whether this is possible using jgi-query or with the JGI API in general. I would like to download all of their bacterial genomes if at all possible but can't find a way to get a list by kingdom.
Can you provide any guidance here?
A
In many cases, files are stored on tape, and downloading them seems to fail. I am not familiar enough with Python to make a pull request, but one partial workaround could be this: if the same file is available both on tape and on disk, select the copy on disk. Thanks if you have time to look into this!
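A minimal sketch of the suggested workaround, assuming each file in a listing can be reduced to a filename, a download URL and a flag saying whether that copy lives on tape. `FileEntry` and `prefer_disk_copies` are hypothetical names, not part of jgi-query, and how jgi-query actually distinguishes tape from disk copies is not shown here.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical record for one downloadable file (not jgi-query's real data model)."""
    filename: str
    url: str
    on_tape: bool


def prefer_disk_copies(entries):
    """Keep one entry per filename, choosing a disk copy over a tape copy when both exist."""
    chosen = {}
    for entry in entries:
        current = chosen.get(entry.filename)
        # Take the first copy seen, or replace a tape copy with a disk copy.
        if current is None or (current.on_tape and not entry.on_tape):
            chosen[entry.filename] = entry
    return list(chosen.values())


entries = [
    FileEntry("genome.fasta", "https://example.org/tape/genome.fasta", on_tape=True),
    FileEntry("genome.fasta", "https://example.org/disk/genome.fasta", on_tape=False),
    FileEntry("reads.fastq.gz", "https://example.org/tape/reads.fastq.gz", on_tape=True),
]
for e in prefer_disk_copies(entries):
    print(e.filename, e.url)
```

Files that exist only on tape are still kept, so this narrows the problem rather than removing it.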
Hi Glarue,
I am trying to use the following commands to download files from the project, but I get an error message, and the downloaded file has the right name but the wrong content:
python jgi-query.py -x get-directory.xml  # the XML file was downloaded from the project download page
# https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=TheHunmicrobiome#
# by clicking 'Open Downloads as XML'
user name and password  # fine
2:2216  # a protein sequence file I want to download
Total download size of selected files: 693.23 KB
Continue? (y/n): y
Downloading '81031.assembled.faa' using command:
curl http://genome.jgi.doe.gov/EubpyrIsolGenome/download/_JAMO/56f1982d7ded5e7f7b938de5/81031.assembled.faa -b cookies -c cookies -L > 81031.assembled.faa
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 287 100 287 0 0 101 0 0:00:02 0:00:02 --:--:-- 101
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
100 9122 0 9122 0 0 2240 0 --:--:-- 0:00:04 --:--:-- 2240
Finished downloading all files.
ERROR: '81031.assembled.faa' appears to be malformed and will be left unmodified.
Keep temporary files ('/home/mpi/pengfei/Hungate1000p/get-directory.xml' and 'cookies')? (y/n): n
Removing temp files and exiting
Would you please help me check which step I am doing wrong?
Thanks
Best,
Pengfei
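The log shows curl receiving 9122 bytes in the end, while the expected size was 693.23 KB, which suggests (this is an inference, not confirmed in the thread) that something other than the protein file, such as an HTML page, was saved under the requested name. A minimal sketch of sniffing the first bytes of the downloaded file; `diagnose_download` is a hypothetical helper, and the size comparison assumes you know the expected size in bytes.

```python
from pathlib import Path


def diagnose_download(path, expected_bytes=None):
    """Guess what a downloaded file contains by looking at its first bytes.

    Returns "html page", "fasta", "fasta but truncated" or "unknown".
    """
    data = Path(path).read_bytes()
    head = data[:512].lstrip().lower()
    # An HTML page (for example an error or login page) saved under the requested filename.
    if head.startswith((b"<!doctype html", b"<html")):
        return "html page"
    # FASTA files such as .faa protein files begin with a '>' header line.
    if head.startswith(b">"):
        if expected_bytes is not None and len(data) < expected_bytes:
            return "fasta but truncated"
        return "fasta"
    return "unknown"
```

Calling `diagnose_download("81031.assembled.faa")` on the saved file (before removing temp files) would show whether it is an HTML page rather than sequence data.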
Hello,
I wanted to download the entire fungal database, but the tool is not responding. Do you have a solution you would recommend?
It takes a long time, and in the end it gives me an empty XML file.
Thanks in advance.
ERROR:
#-------------------------------------------------------------------------------------------
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 92 0 92 0 0 0 0 --:--:-- 0:10:00 --:--:-- 22
Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'
Traceback (most recent call last):
File "/shared/ifbstor1/projects/HE/FungiDB/JGI-db/jgi-query-main/jgi-query.py", line 1151, in <module>
if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'
#--------------------------------------------------------------------------------------------
Contents of fungi_jgi_index.xml:
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
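The traceback and the file contents appear to fit together: the index file holds a 504 Gateway Time-out page instead of a download listing, so `file_list` ends up as `None` and the `.values()` call fails (an inference from the log, not confirmed by the maintainer). A minimal sketch of checking the response before parsing it; `index_error` is a hypothetical helper, not part of jgi-query.

```python
import xml.etree.ElementTree as ET


def index_error(xml_text):
    """Return None if xml_text looks like a usable index, else a short reason string."""
    if not xml_text.strip():
        return "empty response"
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return f"not valid XML ({exc})"
    # A proxy error page such as "504 Gateway Time-out" parses with an <html> root.
    if root.tag.lower() == "html":
        heading = root.findtext(".//h1") or "HTML page"
        return f"server returned an HTML page: {heading}"
    return None


page = """<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>"""
print(index_error(page))  # server returned an HTML page: 504 Gateway Time-out
```

Reporting the server's own message makes it clear that the failure happened on JGI's side rather than in the parsing step.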
Hello, when I use python3 ./jgi-query/jgi-query.py --xml get-directory.xml to download files from the JGI Genome Portal, I get the following download error. Do you know what the issue could be?
Thanks!
Downloading '7393.1.70539.TCGAAG.fastq.gz' using command:
curl -m 120 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/Poptrisequencing_78/download/_JAMO/5254b441067c0136350e4f73/7393.1.70539.TCGAAG.fastq.gz' -b cookies > 7393.1.70539.TCGAAG.fastq.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Trying '7393.1.70539.TCGAAG.fastq.gz' again due to download error (1/4):
Hi @glarue
I'm running the script on a remote server to download the *.tar.gz files for bacterial groups (as suggested here: #4).
Every time I run this command (the asterisks in the tar.gz pattern are not shown):
python3 jgi-query.py tenericutes -r '.tar.gz.' --retry_n 0 -c
I get something like this for every single genome ID in tenericutes:
Downloading '2582580514.tar.gz' using command:
curl -m 120 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get_tape_file?blocking=true&url=/Comgenmetab10417/download/_JAMO/53e5233f0d87856ba82b2ddc/2582580514.tar.gz' -b cookies > 2582580514.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:01:59 --:--:-- 0
curl: (28) Operation timed out after 120000 milliseconds with 0 bytes received
I also ran it with the default number of retries (4), and the same thing happens. I hope you can guide me through this.
Many thanks in advance
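curl exits with code 28 when the `-m` time limit is reached, as in the log above. If the file is still being restored from tape, retrying immediately with the same 120-second limit may keep failing; one option (an assumption, not something jgi-query is documented to do) is to wait progressively longer between attempts. A minimal sketch; `retry_with_backoff` and `curl_once` are hypothetical names, and whether any wait is long enough depends on how long JGI takes to stage the file, which the thread does not say.

```python
import subprocess
import time


def retry_with_backoff(action, retries=4, base_delay=60.0, sleep=time.sleep):
    """Call action() until it returns True, waiting longer after each failure.

    Returns True on success, False once all retries are used up.
    """
    for attempt in range(retries + 1):
        if action():
            return True
        if attempt < retries:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return False


def curl_once(url, out_path, max_time=120):
    """Hypothetical wrapper: one curl attempt with the same -m limit and cookie file as the log."""
    result = subprocess.run(["curl", "-m", str(max_time), "-b", "cookies", "-o", out_path, url])
    # curl returns 0 on success and 28 when the -m limit is hit.
    return result.returncode == 0
```

Usage would look like `retry_with_backoff(lambda: curl_once(url, "2582580514.tar.gz"), retries=4)`, with `url` set to the `get_tape_file` URL from the log.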
Hello @glarue
I am running the script to retrieve all fungal assembly sequences in FASTA format, but it shows this error:
python3 jgi-query.py fungi
Retrieving information from JGI for query 'fungi' using command 'curl 'https://genome.jgi.doe.gov/portal/ext-api/downloads/get-directory?organism=fungi' -L -b cookies > fungi_jgi_index.xml'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 92 0 92 0 0 0 0 --:--:-- 0:10:00 --:--:-- 28
Traceback (most recent call last):
File "/JGI-db/jgi-query-main/jgi-query.py", line 1151, in <module>
if not any(v["results"] for v in list(file_list.values())):
AttributeError: 'NoneType' object has no attribute 'values'
Do you have any idea how to download all the fungal genome sequences?
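The failing check at line 1151 assumes `file_list` is always a dict. A minimal sketch of a None-safe version that stops with a readable message instead of an `AttributeError`; `any_results` is a hypothetical name, and the dict-of-dicts shape with a `"results"` key is assumed only from the traceback line, with made-up category names.

```python
import sys


def any_results(file_list):
    """Hypothetical None-safe version of the check that fails at line 1151."""
    if file_list is None:
        # The index could not be parsed (e.g. the server returned an error page),
        # so stop with a readable message instead of an AttributeError.
        sys.exit("ERROR: the JGI index could not be parsed; the server may have timed out.")
    return any(v["results"] for v in file_list.values())


print(any_results({"Assembly": {"results": ["a.fasta"]}, "Annotation": {"results": []}}))  # True
```

This does not make the fungi query succeed; it only reports the timeout clearly.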
Love the idea of this tool. What would you think of making it a package in Biopython so users could just write "from Bio import jgi-query", or even making it a single function in Biopython? I don't know anything about how you would do this, but I'm sure the folks over there would be happy to help, and I think it would make it even easier to deploy and use.
You have a reference to creating an account at signon.jgi-psf.org. Please update it to use the current server address: signon.jgi.doe.gov.
First off, thanks for this tool; it's very useful.
My problem:
I am trying to create an XML file from a very large database on JGI, and there seems to be a 10-minute runtime limit. Is this built in, or can it be changed?
Thanks