qiime2 / docs
Home Page: https://docs.qiime2.org
License: BSD 3-Clause "New" or "Revised" License
@thermokarst reported issues with make html on his macOS machine and on various Linux VMs. It sounds like commands marked with :no-exec: are being executed. @thermokarst is going to work with me to debug this issue. Note that @gregcaporaso and I didn't have this issue on our OS X machines.
Came up at NH Workshop
This question often comes up -- when should the output be an artifact vs a visualization? Having some guidelines in place would be helpful.
This should illustrate importing demultiplexed or multiplexed reads, and work from the moving pictures tutorial.
The table that is being used in the artifact api doc changed, but I didn't update the output from the commands that print to the terminal. We should update this to avoid confusion.
The r packages installed as part of bootstrapping a QIIME2 environment apparently have a curl requirement. It looks like the recipe uses my system-installed curl, but conda rewrites the certs path. Running make html with the -s flag removed from a curl command gives:
Extension error:
Command 'curl https://codeload.github.com/qiime2/q2studio/zip/0.0.6 -o q2studio-0.0.6.zip' exited with non-zero return code 77.
stdout:
stderr:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (77) error setting certificate verify locations:
CAfile: /Users/matthew/miniconda3/envs/q206/etc/pki/tls/certs/cacert.pem
CApath: none
We have had a few curl-related issues pop up lately; maybe it is worth looking at wget instead?
References
Moving this here.
@thermokarst and I were chatting about this today. Commands in the docs are a little magical because output artifacts/visualizations aren't named with a .qza/.qzv extension. When only the basename of the output file is specified (sans extension), it isn't clear how the .qza/.qzv extension factors into output files. It is also hard to determine whether the output filename is an artifact or a visualization because the extension isn't in the output name. Using the basename only also makes it ambiguous whether the output is being saved as a directory or as a file without an extension.
I had to explain this behavior several times during the Phoenix workshop, and it seemed to confuse users because .qza/.qzv files were a new concept that was just introduced, and the commands don't explicitly reflect how those extensions relate to output files.
Can we use explicit file extensions in the docs, and note somewhere that if the extension is omitted, it will automatically be appended?
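The behavior described above could be sketched as follows. This is only an illustration that assumes, as the issue states, that a missing extension is appended automatically; the helper name with_default_extension is hypothetical and is not q2cli's actual implementation.

```python
from pathlib import Path


def with_default_extension(output_path: str, extension: str) -> str:
    """Return output_path, appending extension only if it is not already there.

    Hypothetical helper for illustration; not q2cli's actual implementation.
    """
    path = Path(output_path)
    if path.suffix == extension:
        return str(path)
    # Append rather than replace, so "table.v2" becomes "table.v2.qza".
    return str(path.with_name(path.name + extension))


print(with_default_extension("table", ".qza"))          # table.qza
print(with_default_extension("table.qza", ".qza"))      # table.qza
print(with_default_extension("demux-summary", ".qzv"))  # demux-summary.qzv
```

Writing the extension explicitly in the docs would make the first and second calls look identical to the reader, which is the point of the request.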
Change to https://slack.qiime2.org/
Note, this subdomain isn't set up yet. Will update here when ready.
It would be great if Sphinx and/or our commandblock directive supported multiline commands:
$ qiime \
tools \
import \
--help
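One way a directive could support this is to merge backslash-continued lines into a single command before executing it. The sketch below shows only that merging step; the function name join_continuations is hypothetical, and how the commandblock directive actually parses its content is not shown in this issue.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows them."""
    commands = []
    pending = []
    for line in lines:
        stripped = line.strip()
        if stripped.endswith("\\"):
            # Drop the trailing backslash and keep collecting pieces.
            pending.append(stripped[:-1].strip())
            continue
        pending.append(stripped)
        command = " ".join(part for part in pending if part)
        if command:
            commands.append(command)
        pending = []
    # A dangling backslash on the last line: keep whatever was collected.
    command = " ".join(part for part in pending if part)
    if command:
        commands.append(command)
    return commands


block = ["qiime \\", "  tools \\", "  import \\", "  --help"]
print(join_continuations(block))  # ['qiime tools import --help']
```

The rendered docs could still display the original multiline form while the joined string is what gets run.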
Create page(s) for resources similar to QIIME 1's resources.qiime.org. This will include:
I have fasta+qual files. Qiime1 has convert_fastaqual_fastq.py. What's the equivalent in qiime2? Or, even better, how to create a qiime2 artifact directly from the fasta+qual+mapping files? I looked here, which seems to be the appropriate location:
https://docs.qiime2.org/2.0.6/tutorials/import-sequence-data/
zsh tab-completion support was added to q2cli in qiime2/q2cli#100
Travis now supports a directive for long-running commands, so it should be possible to hook up the full doc build (make html) to Travis. This will add significant time to the Travis build (~20-30 mins) but it seems worth it to catch any build errors in the docs before merge.
Low-resolution displays make it hard to follow what is going on with long commands.
Blocked by #37
I am wondering if we should do something like the Django docs, where we provide a "version selector", and also some way to indicate that an old doc is "stale" or "outdated" (see the red bar at the top of the page).
@BenKaehler, would you be available to submit a tutorial illustrating how to train new classifiers, including the primer-based sequence trimming step?
References
Frozen-Flask came up in discussion this week.
Pull in info from the forum --- a lot of work went into providing detailed answers, we should be able to recycle a fair amount of that here.
Notes from office hours:
It appears that different versions of cURL handle redirects differently. wget works, but it does not ship with OSX by default. Versions of cURL observed to have the problem:
Linux:
$ curl --version
curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz
OSX (Sierra):
$ curl --version
curl 7.49.1 (x86_64-apple-darwin16.0) libcurl/7.49.1 SecureTransport zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets
The new name is --p-sampling-depth.
In the feature-classifier.rst document we import aligned reads:
qiime tools import \
--type FeatureData[Sequence] \
--input-path aligned_85_otu_sequences.fasta.gz \
--output-path 85_otus.qza
and then run extract-reads, where we extract based on the primers:
qiime feature-classifier extract-reads \
--i-sequences 85_otus.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-read-length 100 \
--o-reads ref-seqs.qza
@BenKaehler, can we import unaligned sequences here? It looks like we just strip the gap characters, and unaligned reads are easier for users to provide in general (and also work directly if we want to train a classifier without extracting reads). So importing unaligned reads is preferable if it works.
This could be a separate tutorial from the importing tutorial, or we could modify the importing data tutorial to include exporting.
The importing data tutorial shows how to import a .biom file in BIOM v1.0 format. It'd be helpful to include an example of importing a BIOM v2.1 file, which is the more common case (and QIIME doesn't autodetect v2.1). This question came up on the forum here.
Improvement Description
I have reviewed the documentation and tutorial with an eye toward figuring out how a completely novice user (e.g., new microbiology grad student without any bioinformatics or programming experience) would view the material. Most of the documentation is fantastic (esp. for alpha) and I love new features, such as the glossary, that improve usability over the qiime1 docs. I have various suggestions below, labeled by what I see as the importance: "[high]", "[low]", or "[enhancement]" (the latter meaning it would enhance usability but is not currently a hindrance to understanding the docs). Some I expect are already planned anyway, but I hope my comments may help hammer these out.
This section needs more detail on the expected file names and other requirements for each type. For example, it is unclear that specific filenames are actually enforced for FeatureData[Sequence] (and I suspect for other semantic types as well). The following error is clear (to me), but it's best to avoid this via better documentation:
ValueError: Missing one or more files for EMPMultiplexedDirFmt: 'sequences.fastq.gz'
Need a directory of methods, to fill the same niche as qiime1's script index. Most are currently covered in the tutorial, but not all, and this will only expand as qiime2 grows. One issue is that qiime2's methods are hidden within plugins, not free-standing commands, and hence just listing the plugins does not reveal all potential methods (and a few names are not immediately transparent or are jargony). Some sort of function description index, written in plain English for new users, rather than a list of method names, could be a useful way to approach this (and an enhancement above qiime1's script index, which was difficult to navigate and translate at times). Instead of listing the plugin or command, list the function. Each function will link to 1) the method entry on the plugin doc page or 2) a tutorial page for multi-step procedures (e.g., procrustes plots). For example, functions such as "demultiplex sequences", "build phylogeny", and "pick OTUs" could all be listed as functions.
qzv/qza formats are confusing, and as someone very familiar with qiime1 it took me some time to understand what these file formats are and why they are used. The rationale for these formats should be better documented, along with an explanation that these files can be unzipped to examine the contents. This rationale can link to the pages on semantic types and provenance tracking to discuss those topics. Some discussion appears here, but this should be more clearly documented here and elsewhere (perhaps on its own page that appears in the table of contents). Also make a note of this in the glossary.
As an aside (and I know it's too late to quip about this), I don't really like the choice of the term "artifact", because it has other meaning in biology, e.g., "sequencing artifact".
A discussion of the taxonomy format could be useful. Terms like "level 2" are used in the docs but are not immediately apparent to outsiders, nor will a google search be much help. This may be appropriate to include within a file format page (see recommendation below).
The "ported wiki documentation" is very useful, and I recommend continuing to build this as an archive of release docs if possible, rather than removing these pages. One frustration with the qiime1 site was that docs only covered the release version, and if working with an earlier version of qiime or reviewing a list of commands/files generated using an earlier version of qiime, the older docs no longer existed. As qiime2 grows, may I recommend keeping the "ported wiki documentation" as a table of contents (TOC) at the bottom of the current release docs TOC, which will link to TOCs for archived doc versions.
I LOVE the glossary, as it defines some of the lingo-y words that are new to qiime2. This should be on the reading list of everyone starting with qiime2, to whom "action" and "method" are otherwise more general terms, and "artifact" is not entirely intuitive. I wonder whether it would be useful to include separate glossaries on more general microbiome terminology, and on file types. I recommend separate, because this will keep the technical glossary pure and simple.
Microbiome Terminology: Much of this goes outside of the jurisdiction of qiime, but could be very useful to new users (and would give the developers control over the terminology). After all, users come from all backgrounds and qiime may be the first exposure to any kind of bioinformatics software, microbiome/ecology concepts, or all of the above for many users. For many of these terms, great explanations exist elsewhere on the web (though not necessarily with a simple google search), and a short sentence and link will suffice (and link to citation if appropriate). Some useful terms: distance matrix, OTU, feature table, demultiplex, barcode, index (see barcode), metadata, phiX, chimera, biom, metric (e.g., alpha diversity), (include alpha/beta diversity metrics in glossary, short sentences such as shown here and a link ideally to the original citation would suffice), alpha diversity, beta diversity, discrete (metadata), continuous (metadata), ordination, PCoA, richness
File Formats/Types: In many ways, this should be similar to qiime1's file types page. A similar resource does not yet exist in qiime2. This is in part to describe file formats that are used in qiime2, and in part to describe how to input specific file types into qiime2 artifacts (yeah, yeah, could be more appropriately described in importing data but if that doc expands to include this you can link to the entries for each file format in that doc from this glossary). Some formats/terms to include: fasta, fastq, gz, qza, qzv, mapping file, biom, OTU table, feature table
Hope this all helps. I can elaborate on details / brainstorm more if prompted.
References
Ported from original issue.
Related to a question on the forum. The q2studio documentation doesn't mention that you have to be in your conda environment for it to be able to find qiime2 and run properly.
Currently there is information in the native install guide that will also be relevant to users who install QIIME 2 using the virtual machines.
Current Behavior
make html will perform a make clean first, effectively disabling incremental builds. The current commandblock directive is not safe to use with incremental builds because data that has already been generated from a previous build will not be regenerated, and data that should be cleaned/removed in the current (incremental) run will remain as orphaned files in the build.
Proposed Behavior
Incremental builds should be possible if a hook can be added that will delete the current file's corresponding data assets directory (under source/assets/) before any commands are executed in the file.
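The cleanup step that such a hook would perform might look like the sketch below. It assumes each document's data lives in a directory named after the document under source/assets/; that layout, and the function name reset_assets_dir, are assumptions for illustration. Wiring the function into a Sphinx build event is left out because the issue does not say which event would be used.

```python
import shutil
import tempfile
from pathlib import Path


def reset_assets_dir(assets_root, docname):
    """Delete and recreate the data directory that belongs to one document.

    Assumes a hypothetical layout of <assets_root>/<docname>/.
    """
    target = Path(assets_root) / docname
    if target.exists():
        # Remove orphaned files left behind by a previous build.
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return target


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    assets_root = Path(tmp) / "source" / "assets"
    stale = assets_root / "tutorials" / "importing" / "old-output.qza"
    stale.parent.mkdir(parents=True)
    stale.write_text("stale")
    fresh = reset_assets_dir(assets_root, "tutorials/importing")
    print(sorted(p.name for p in fresh.iterdir()))  # []
```

Because only the directory of the document being rebuilt is removed, data generated for unchanged documents would be left alone.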
References
Ported from original issue.
Related to #36
References
See #107 for discussion.
Ported from original issue.
--p-counts-per-sample → --p-sampling-depth (pending qiime2/q2-feature-table#45 and qiime2/q2-diversity#57)
feature-table merge-taxa-data → taxa merge
feature-table view-taxa-data → taxa tabulate (pending qiime2/q2-feature-table#46)
feature-table view-seq-data → feature-table tabulate-seqs (pending qiime2/q2-feature-table#46)
Instead of having to run source tab-qiime after activating a q2cli environment, users can edit a conda env file to run the command each time the environment is activated. Thanks @thermokarst for finding this 💎 !
Current Behavior
The fmt-tutorial-demux-*.qza artifacts have no provenance as they were generated with 2.0.5 (I think).
Comments
This is not urgent, but should be addressed when we ultimately re-write this tutorial to include paired tests.
See this section, for an example.
The metadata.yaml following the import of the raw sequences, and the metadata.yaml following demux, indicate the data are phred 33. However, that does not appear to be accurate as the character set used includes characters defined outside of phred 33 encoding (e.g., "["):
$ funzip sequences.fastq.gz | head
@HWI-EAS440_0386:1:23:17547:1423#0/1
TACGNAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTTGAGTGCAGTTGAGGCAGGGGGGGATTGGTGTG
+
hhhdHddddddddfehhfhhhghggfhhhfhhgggfhhgfgdfcfhehfdgfhggfggfggffgddfgdffdgdaagaaddcbdccc]a^ad__a]_____ba_`a`__^__\]^OWZR\Z\\WYTZ_U^BBBBBBBBBBBBBBBBBBBBBB
I checked with @gregcaporaso and he indicated that it made sense given the age of the data.
At the time of creation of this issue, I do not believe there is a functional impact with the inaccuracy of metadata.yaml as quality scores are not interrogated explicitly by Q2 in this tutorial.
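The observation above can be checked with a small heuristic. The sketch below guesses the offset from the range of quality characters, using the commonly cited ranges for the two encodings (characters below ';' only occur with offset 33, and characters above 'J' normally only occur with offset 64). The function name guess_phred_offset is hypothetical, the thresholds are a rule of thumb rather than anything QIIME 2 does, and some offset-33 data can exceed 'J'.

```python
def guess_phred_offset(quality_strings):
    """Guess 33 or 64 from quality characters, or None if ambiguous."""
    codes = [ord(ch) for quality in quality_strings for ch in quality]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' (59) are not used by offset-64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' (74) are unusual for offset-33 data.
        return 64
    return None


# Prefix of the quality line quoted in this issue.
print(guess_phred_offset(["hhhdHddddddddfehhfhhhghggfhhhfhhgggf"]))  # 64
print(guess_phred_offset(["IIIIHHGG#"]))                             # 33
```

Applied to the quality line quoted in the issue, the heuristic points at offset 64, which matches the report that the recorded phred 33 value looks inaccurate.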
Similar to how we're linking to older QIIME 2 VB images.
Maybe this can be bumped up 100px-200px? It would be nice to see a rendering or two at either extreme.
"Sample type" is now "body site", update the prose.
In the description of semantic types (https://docs.qiime2.org/2.0.6/semantic-types/), RawSequences is missing. And, if there is any documentation on it (e.g., required format for creating an artifact from sequence data), Google does not know about it. All I was able to find was the tutorial that made use of it, but never really described it. I can read the source code and figure it out, but it would be much more efficient if it was well-documented somewhere.
I am a CS person moving into bioinformatics, if this background helps explain my thought processes.
Built docs are currently hosted via GitHub Pages. We'll need to transition to an Amazon S3 bucket at some point. Having a script to automate uploading built docs to a versioned subdirectory in an S3 bucket would be useful. Thanks @thermokarst for the idea!
Ported from original issue.
This was requested during the Iceland workshop. Basically protocols/recommendations for researchers starting a study: links to primers, PrimerProspector, EMP protocol, etc. See http://resources.qiime.org for some useful protocols to include.
In the q2 filtering tutorial, add in examples for filtering the feature table with two instances of a category using AND and OR SQL commands like those found in the docs.
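To show how AND and OR behave in a WHERE clause, here is a sketch using Python's built-in sqlite3 module rather than the QIIME 2 command line. The table, the column names (body_site, subject), and the sample IDs are all made up for illustration; only the SQL semantics carry over.

```python
import sqlite3

# In-memory table standing in for sample metadata (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (sample_id TEXT, body_site TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("s1", "gut", "subject-1"),
        ("s2", "gut", "subject-2"),
        ("s3", "tongue", "subject-1"),
        ("s4", "left palm", "subject-2"),
    ],
)


def select_ids(where):
    """Return the sample IDs matching a WHERE clause, sorted by ID."""
    query = "SELECT sample_id FROM metadata WHERE " + where + " ORDER BY sample_id"
    return [row[0] for row in conn.execute(query)]


# OR keeps samples matching either value of the same category.
print(select_ids("body_site = 'gut' OR body_site = 'tongue'"))    # ['s1', 's2', 's3']
# AND across two categories keeps samples matching both conditions.
print(select_ids("body_site = 'gut' AND subject = 'subject-1'"))  # ['s1']
# AND on two values of the same category matches nothing.
print(select_ids("body_site = 'gut' AND body_site = 'tongue'"))   # []
```

The last query is worth calling out in a tutorial: selecting two values of one category needs OR, since no single sample can have both values at once.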