Comments (5)
Thanks for the bug report, @dweemx. From a first look, I can confirm this is indeed a bug. I will get back to you with a possible solution or explanation shortly.
from pysradb.
Hi @dweemx, it looks like this bug originates in NCBI's search interface. Looking up SRP125768 on https://www.ncbi.nlm.nih.gov/sra shows only 128 hits, while the total should clearly be 136 (the total number of runs). These are the missing run IDs:
'SRR6327103', 'SRR6327106', 'SRR6327114', 'SRR6327120', 'SRR6327118', 'SRR6327122', 'SRR6327135', 'SRR6327116'
I will have to look for a way to ensure such runs are not missed. Thanks once again for reporting this.
Hi,
I contacted the SRA team, and they told me there was an issue with the SRA file pairing system when the data was ported from GEO to the SRA database. This issue should be fixed now.
However, some samples are still missing when I use SRAweb: 'SRR6327106', 'SRR6327114', 'SRR6327120', 'SRR6327118', 'SRR6327122', 'SRR6327116'
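For reference, here is a minimal sketch of how the two lists of missing runs in this thread can be compared to see which runs reappeared after the fix on the SRA side. The helper name `recovered_runs` is hypothetical and not part of pysradb; the run IDs are copied from the comments above.

```python
def recovered_runs(previously_missing, still_missing):
    """Return runs that were missing before but are no longer missing, keeping order."""
    still = set(still_missing)
    return [run for run in previously_missing if run not in still]


# Run IDs reported as missing in the first diagnosis (copied from this thread).
first_report = [
    "SRR6327103", "SRR6327106", "SRR6327114", "SRR6327120",
    "SRR6327118", "SRR6327122", "SRR6327135", "SRR6327116",
]
# Run IDs still missing after the SRA team's fix (copied from this thread).
second_report = [
    "SRR6327106", "SRR6327114", "SRR6327120",
    "SRR6327118", "SRR6327122", "SRR6327116",
]

print(recovered_runs(first_report, second_report))
# → ['SRR6327103', 'SRR6327135']
```

So two of the eight runs came back after the SRA-side fix, and six were still missing at this point.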
Thanks for the update, @dweemx. It seems https://www.ncbi.nlm.nih.gov/sra/?term=SRP125768 still returns only 128 results. I will have time to work on a fix in the coming few weeks. Thanks for your patience, and sorry for the trouble this has caused you.
Hi @dweemx,
Thanks for your patience. I was finally able to fix this in v0.9.9.
See this notebook for an example with this ID: https://colab.research.google.com/drive/1C60V-jkcNZiaCra_V5iEyFs318jgVoUR
The web mode's default --detailed output gives all the metadata you see on SRA's run table.
> pysradb metadata SRP125768 --detailed | head
study_accession experiment_accession experiment_title experiment_desc organism_taxid organism_name library_strategy library_source library_selection sample_accession sample_title instrument total_spots total_size run_accession run_total_spots run_total_bases run_alias experiment_alias source_name age genotype/variation tissue genotype
SRP125768 SRX4084637 GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142622: w1118_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301695 NextSeq 500 3552575 79516196 SRR7166639 3552575 176271295 GSM3142622_r1 GSM3142622 w1118_1d_WholeBrain_Unstranded_RNA-seq 1 Day W[1118] brain NaN
SRP125768 SRX4084636 GSM3142621: w1118_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142621: w1118_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301693 NextSeq 500 4513696 100655283 SRR7166638 4513696 220693988 GSM3142621_r1 GSM3142621 w1118_1d_WholeBrain_Stranded_RNA-seq 1 Day W[1118] brain NaN
SRP125768 SRX4084635 GSM3142620: DGRP-551_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142620: DGRP-551_1d_WholeBrain_Unstranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301694 NextSeq 500 19374029 433332434 SRR7166637 19374029 961111968 GSM3142620_r1 GSM3142620 DGRP-551_1d_WholeBrain_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084634 GSM3142619: DGRP-551_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142619: DGRP-551_1d_WholeBrain_Stranded_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301692 NextSeq 500 2936449 65552609 SRR7166636 2936449 145074237 GSM3142619_r1 GSM3142619 DGRP-551_1d_WholeBrain_Stranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084633 GSM3142618: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep2_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142618: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep2_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301691 NextSeq 500 24342212 458751469 SRR7166635 24342212 1207043823 GSM3142618_r1 GSM3142618 DGRP-551_1d_WholeBrainNuclei_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084632 GSM3142617: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep1_RNA-seq; Drosophila melanogaster; RNA-Seq GSM3142617: DGRP-551_1d_WholeBrainNuclei_Unstranded_Rep1_RNA-seq; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301696 Illumina HiSeq 4000 7398351 236600904 SRR7166634 7398351 551705108 GSM3142617_r1 GSM3142617 DGRP-551_1d_WholeBrainNuclei_Unstranded_RNA-seq 1 Day DGRP-551 brain NaN
SRP125768 SRX4084631 GSM3142616: Adapted_SMART_seq2_R23E10_Cell_9; Drosophila melanogaster; RNA-Seq GSM3142616: Adapted_SMART_seq2_R23E10_Cell_9; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301688 NextSeq 500 267487 6409898 SRR7166633 267487 13266487 GSM3142616_r1 GSM3142616 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
SRP125768 SRX4084630 GSM3142615: Adapted_SMART_seq2_R23E10_Cell_8; Drosophila melanogaster; RNA-Seq GSM3142615: Adapted_SMART_seq2_R23E10_Cell_8; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301690 NextSeq 500 192550 4678011 SRR7166632 192550 9548043 GSM3142615_r1 GSM3142615 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
SRP125768 SRX4084629 GSM3142614: Adapted_SMART_seq2_R23E10_Cell_7; Drosophila melanogaster; RNA-Seq GSM3142614: Adapted_SMART_seq2_R23E10_Cell_7; Drosophila melanogaster; RNA-Seq 7227 Drosophila melanogaster RNA-Seq TRANSCRIPTOMIC cDNA SRS3301689 NextSeq 500 199223 4833365 SRR7166631 199223 9885888 GSM3142614_r1 GSM3142614 Adapted_SMART_seq2_R23E10_Cell 0-7 Days R23E10-Gal4 x UAS-CD8::GFP brain NaN
Please let me know if you run into any issues.
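If you want to double-check that no runs are missing, one option is to count the unique values in the run_accession column of the output. The sketch below is only an illustration: it assumes the metadata is available as tab-separated text (for example, redirected to a file), and `unique_runs` is a hypothetical helper, not a pysradb function. The two sample rows reuse accessions from the output above, trimmed to three columns.

```python
import csv
import io


def unique_runs(tsv_text, column="run_accession"):
    """Return the sorted unique, non-empty values of `column` in tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted({row[column] for row in reader if row.get(column)})


# Trimmed sample (assumed tab-separated); a real file would have all columns.
sample = (
    "study_accession\texperiment_accession\trun_accession\n"
    "SRP125768\tSRX4084637\tSRR7166639\n"
    "SRP125768\tSRX4084636\tSRR7166638\n"
)

runs = unique_runs(sample)
print(len(runs), runs)
```

For SRP125768, the count over the full output should match the 136 runs mentioned earlier in this thread.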