sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2

Home Page: https://dokdo.readthedocs.io

License: MIT License

Python 100.00%

qiime2 microbiome-sequencing-analysis api cli visualization

dokdo's Issues

alpha and beta diversity without qiime file

Hi @sbslee,
Could you add code to compute alpha diversity (Shannon, observed features, and evenness) and beta diversity (Bray-Curtis and Jaccard index), with some plots, from a plain text file?

This would be useful for anyone who is not using QIIME, or whose shotgun data arrives as a plain text file.
Of course, phylogenetic information can't be included here, but it would still be useful.
The vegan and phyloseq packages exist, but there is no straightforward code with a proper explanation.
Thanks,
Khem
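
As a rough illustration of the metrics named in this request, here is a minimal sketch that needs neither QIIME 2 nor Dokdo. It assumes a plain table of counts per sample; the function names and the toy table are made up for the example. Shannon is computed with the natural log here, which is an assumption (some tools report it in base 2), while Pielou's evenness does not depend on the base.

```python
import math

def shannon(counts):
    """Shannon index (natural log) of one sample's feature counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def observed(counts):
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def pielou_evenness(counts):
    """Shannon index divided by its maximum, log(observed features)."""
    s = observed(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' counts."""
    den = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0

def jaccard(a, b):
    """Jaccard distance on presence/absence of features."""
    pa = {i for i, x in enumerate(a) if x > 0}
    pb = {i for i, x in enumerate(b) if x > 0}
    union = pa | pb
    return 1 - len(pa & pb) / len(union) if union else 0.0

# Toy counts (made up): rows are samples, columns are features.
table = {"S1": [10, 10, 0, 0], "S2": [5, 5, 5, 5]}
for sample, counts in table.items():
    print(sample, shannon(counts), observed(counts), pielou_evenness(counts))
print(bray_curtis(table["S1"], table["S2"]), jaccard(table["S1"], table["S2"]))
```

Reading a tab-separated text file into such a table can be done with the standard `csv` module or with pandas; phylogenetic metrics are out of reach without a tree, as the request already notes.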

correlation analysis

Hi @sbslee,
Does correlation analysis (Spearman/Pearson) with FDR correction fall within Dokdo's scope?
If so, would it be possible to have a heatmap of correlation values for many factors/features, or scatter plots for individual features?

Thanks
Khem
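
For the FDR part of this question, the Benjamini-Hochberg adjustment is short enough to sketch by hand. This is an illustration only, not a Dokdo feature; the function name is made up. In practice the correlations themselves could come from `scipy.stats.spearmanr` or `scipy.stats.pearsonr`, and `statsmodels.stats.multitest.multipletests(..., method='fdr_bh')` performs the same adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values, one per feature.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```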

dokdo summarize error

Hello, I'm running a microbiome study these days. I am learning a lot from dokdo and am grateful for it. This is a wonderful package! While following the dokdo pipeline, a parameter error occurs at step 6 (6. Summarize and Filter ASV Table) when running 'dokdo summarize table.qza'. I know you are busy with research, but please check. Thanks. (I think a parameter is missing, for example -i?)

Add the rank abundance curve plot

A user in the QIIME 2 forum suggested adding a rank abundance curve plot to Dokdo (see post). The post included an example plot, which is not shown here.
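
Independently of the forum example, the data behind a rank abundance curve is simple to derive: features are sorted from most to least abundant and plotted as relative abundance against rank. The sketch below only prepares those points for one sample; the function name is hypothetical and this is not how Dokdo implements it.

```python
def rank_abundance(counts):
    """Return (rank, relative abundance) pairs, most abundant feature first."""
    total = sum(counts)
    present = sorted((c for c in counts if c > 0), reverse=True)
    return [(rank, c / total) for rank, c in enumerate(present, start=1)]

# Made-up counts for a single sample; absent features are dropped.
points = rank_abundance([5, 0, 15, 30])
print(points)
```

The resulting pairs can be passed to any line plot, usually with a log-scaled y-axis.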

[taxa_abundance_box_plot] Deletion of notation of species not observed in samples

@sbslee

Hello Steven Lee,

Thank you for your kind answers to my persistent questions! Your answers are very helpful for visualization analysis.
I have a question regarding taxa box plot visualization.

Of the total of 28 samples I analyzed by NGS, only 4 samples were classified as Salmonella. So, it was observed in only 4 samples in the 'taxa bar plot'.

However, when I drew the box plot as shown below, I noticed that boxes were drawn at slightly above 0% relative abundance for all the other samples.
The same happened for other species: every sample was drawn on the box plot even though the species was present in only some of them. Is there a way to adjust the plot so that only the samples in which the taxon was actually observed appear?

I referred to the Parameters of the API docs, but it was difficult, so I would appreciate it if you could tell me which code to use.

Thank you for all your assistance.

taxa_names = ['Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Salmonella',]

ax=dokdo.taxa_abundance_box_plot(
    qzv_file,
    level=6,
    hue='sample-id',
    hue_order=['Sample-1', 'Sample-2', 'Sample-3', 'Sample-4', 'Sample-5', 'Sample-6', 'Sample-7', 'Sample-8', 'Sample-9', 
               'Sample-10', 'Sample-11', 'Sample-12', 'Sample-13', 'Sample-14', 'Sample-15', 'Sample-16', 'Sample-17', 'Sample-18', 
               'Sample-19', 'Sample-20', 'Sample-21', 'Sample-22', 'Sample-23', 'Sample-24', 'Sample-25', 'Sample-26', 'Sample-27', 
               'Sample-28'],
    taxa_names=taxa_names,
    show_others=False,
    pretty_taxa=True,
    pseudocount=True,
    palette='flare',
    figsize=(10, 7),
)

plt.legend(bbox_to_anchor=(1,1))
plt.tight_layout()

can i install without qiime2 in windows

Hi @sbslee
Is it possible to use dokdo through Jupyter Notebook on Windows without QIIME 2?
Sorry if I missed something in your documentation.
Thanks,

Reverse order of legend for taxa_abundance_bar_plot

Hi @sbslee

Are you able to advise how I can reverse the order of the legend of a taxa_abundance_bar_plot?

I've tried using:

handles, labels = ax2.get_legend_handles_labels()
ax2.legend(handles[::-1], labels[::-1])

But this results in a missing legend.

Thanks!

genus level average bar plot

Hi @sbslee ,
Thanks for your code; it makes bar plots much easier.
Could you also suggest how to average all samples within each group and make a bar plot at the genus level from a qzv file?
Thanks
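
The averaging step asked about here can be sketched outside Dokdo. The example assumes the genus-level relative abundances have already been extracted into nested dictionaries; the function name and the toy data are made up. A taxon that is missing from a sample counts as zero for that sample. A later issue on this page passes `group=` and `group_order=` to `taxa_abundance_bar_plot`, which may be the built-in route.

```python
from collections import defaultdict

def group_means(abundance, groups):
    """Average each taxon's relative abundance over the samples of each group.

    abundance: {sample_id: {taxon: relative_abundance}}
    groups:    {sample_id: group_label}
    """
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for sample, taxa in abundance.items():
        group = groups[sample]
        sizes[group] += 1
        for taxon, value in taxa.items():
            sums[group][taxon] += value
    # Divide by the group size so absent taxa contribute zero.
    return {g: {t: v / sizes[g] for t, v in taxa.items()}
            for g, taxa in sums.items()}

# Made-up data: two samples in group A, one in group B.
abundance = {
    "A1": {"g__X": 0.2, "g__Y": 0.8},
    "A2": {"g__X": 0.4, "g__Y": 0.6},
    "B1": {"g__X": 1.0},
}
groups = {"A1": "A", "A2": "A", "B1": "B"}
print(group_means(abundance, groups))
```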

can we use dokdo without qiime files

Hi @sbslee ,
Dokdo works great with qzv files,
Can we also use Dokdo for relative abundance data from txt files, instead of qiime's qzv files?
Thanks,

Add taxon name in the pcoa biplot

Hi @sbslee
I have plotted a pcoa biplot using this;

pcoa_results = dokdo.ordinate(qza_file, biplot=True, number_of_dimensions=10)

ax = dokdo.beta_2d_plot(
    pcoa_results,
    metadata=metadata_file,
    hue='body-site',
    figsize=(8, 8)
)

dokdo.addbiplot(pcoa_results, ax=ax, count=7)

plt.tight_layout()

I was wondering if there is a way to display taxon names instead of ASV ID on the biplot?

I tried using dokdo.api.common.pname(name, levels=None, delimiter=';') in the function like this:

pcoa_results = dokdo.ordinate(qza_file, biplot=True, number_of_dimensions=10)

ax = dokdo.beta_2d_plot(
    pcoa_results,
    metadata=metadata_file,
    hue='body-site',
    figsize=(8, 8)
)

dokdo.addbiplot(pcoa_results, ax=ax, count=7)

dokdo.api.common.pname(name, levels=3, delimiter=';')

plt.tight_layout()

But it returned NameError: name 'name' is not defined

Could you please help with this?

Thanks for your help and improving dokdo.

Thanks
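
The NameError itself has a simple cause: `name` in the documented signature is a parameter placeholder, so the call needs an actual taxon string (or a variable holding one) in its place. The sketch below is a hypothetical stand-in written for illustration, not Dokdo's `pname` implementation; it only shows the kind of string-to-string shortening such a helper performs, and it assumes `levels` is a list of 1-based rank positions, as in the `pname_kws=dict(levels=[3])` call that appears in another issue on this page.

```python
def pretty_taxon(name, levels=None, delimiter=";"):
    """Shorten a full taxon string (hypothetical helper, not dokdo.pname)."""
    ranks = name.split(delimiter)
    if levels is not None:
        # Keep only the requested 1-based rank positions, e.g. [2, 3].
        return delimiter.join(ranks[i - 1] for i in levels if i <= len(ranks))
    # Otherwise fall back to the deepest rank that actually carries a name.
    for rank in reversed(ranks):
        if rank.strip("_ ") and not rank.endswith("__"):
            return rank
    return name

full = "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria"
print(pretty_taxon(full))                 # deepest named rank
print(pretty_taxon(full, levels=[2, 3]))  # phylum and class only
```

Whether the biplot labels can be replaced with such names depends on what `addbiplot` exposes, which this sketch does not address.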

Update `make-manifest` command to ignore `Undetermined.fastq.gz`

The current behavior of the command is to include undetermined files in the output. I then manually remove the line with the undetermined files from the manifest file. While this workaround has worked for me, the command should be fixed.
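
Until the command handles this, the filtering can be sketched as a pre-processing step. This assumes the unassigned reads are stored in files whose names start with `Undetermined`, as in the issue title; the function name is made up and this is not the code of `make-manifest`.

```python
from pathlib import Path

def demultiplexed_fastqs(paths):
    """Drop 'Undetermined' read files from a list of FASTQ paths."""
    return [p for p in paths if not Path(p).name.startswith("Undetermined")]

# Made-up file names for illustration.
files = [
    "run/EXAMPLE_S1_L001_R1_001.fastq.gz",
    "run/Undetermined_S0_L001_R1_001.fastq.gz",
    "run/Undetermined.fastq.gz",
]
print(demultiplexed_fastqs(files))
```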

Use of taxa_names in taxa_abundance_bar_plot

Hi @sbslee,

I'm trying to use the 'taxa_names' argument in 'taxa_abundance_bar_plot'. Without 'taxa_names' I can show different taxonomic levels from my barplot.qzv file without issue. However, I would like to show just the class level within a specific taxon, e.g. the % abundance of the classes within k__Bacteria;p__Proteobacteria only.

I have tried a few different things with the syntax (including putting taxon='k__Bacteria;p__Proteobacteria' just below the file inputs), but I get errors including:

KeyError: "['k__Bacteria;p__Proteobacteria'] not found in axis"

Could you please advise if what I am trying to achieve is possible? Or correct my syntax? I can upload the files if that is helpful.

Here is what I'm running in Jupyter:

qzv_file = '/home/nano/Alert4_MVP/taxa-bar-plots-no-Unassigned.qzv'
metadata_file = '/home/nano/Alert4_MVP/TFP_C4BD_metadata.tsv'
taxon = 'k__Bacteria;p__Proteobacteria'

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(14, 7), gridspec_kw={'width_ratios': [9, 1]})
ax = dokdo.taxa_abundance_bar_plot(qzv_file,
                                   metadata=metadata_file,
                                   ax=ax1,
                                   level=3,
                                   taxa_names=['k__Bacteria;p__Proteobacteria'],
                                   label_columns=['SampleName'],
                                   cmap_name='tab20')

for ticklabel in ax.get_xticklabels():
    ticklabel.set_rotation(45)
ax.set_xlabel("Sample", fontsize = 12)  

dokdo.taxa_abundance_bar_plot(qzv_file,
                              ax=ax2,
                              level=3,
                              taxa_names=['k__Bacteria;p__Proteobacteria'],
                              cmap_name='tab20',
                              legend_short=True,
                              pname_kws=dict(levels=[3])
                              )

handles, labels = ax2.get_legend_handles_labels()
ax2.clear()
ax2.legend(handles[::-1], labels[::-1], loc='center left')
ax2.axis('off')

    
plt.tight_layout()

Thanks!
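
The KeyError is consistent with `taxa_names` being matched against complete taxon names at the chosen level: at `level=3` every name has three ranks, so a two-rank phylum string is not among them. Assuming that reading is right, one workaround is to collect the full level-3 names that sit under the phylum and pass those instead. The helper below is a hypothetical sketch; how the list of level-3 names is obtained is left open, and the names shown are made up.

```python
def taxa_under(all_taxa, prefix, delimiter=";"):
    """Return the full taxon names that sit below `prefix` in the hierarchy."""
    stem = prefix.rstrip(delimiter) + delimiter
    return [t for t in all_taxa if t.startswith(stem)]

# Made-up level-3 names standing in for the columns of the real table.
level3 = [
    "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria",
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria",
    "k__Bacteria;p__Firmicutes;c__Bacilli",
]
classes = taxa_under(level3, "k__Bacteria;p__Proteobacteria")
print(classes)  # candidates for taxa_names=...
```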

`alpha_rarefaction_plot` apparently ignores 'N/A' values when plotting

ModuleNotFoundError: No module named 'skbio'

Hi,

I installed dokdo using the following command when in conda environment qiime2-2022.8

$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ pip install .

However, I do not understand where I have to run the below script for plotting the taxa barplot. I assumed I can run it on Jupyter Notebook and did that but got the below error. Please let me know where I am going wrong

import sys
sys.path.append('/home/streptomyces/Rupesh/Suzanne/aqua_fastq/qiime2/dokdo/')
import dokdo
dokdo.taxa_abundance_bar_plot(
    '/home/streptomyces/Rupesh/Suzanne/aqua_fastq/qiime2/taxa-bar-plots-nc-wobl-nmcl.qzv',
    figsize=(10, 7),
    level=6,
    count=8,
    legend_short=True
)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [7], in <cell line: 3>()
      1 import sys
      2 sys.path.append('/home/streptomyces/Rupesh/Suzanne/aqua_fastq/qiime2/dokdo/')
----> 3 import dokdo
      4 dokdo.taxa_abundance_bar_plot(
      5     '/home/streptomyces/Rupesh/Suzanne/aqua_fastq/qiime2/taxa-bar-plots-nc-wobl-nmcl.qzv',
      6     figsize=(10, 7),
   (...)
      9     legend_short=True
     10 )

File ~/Rupesh/Suzanne/aqua_fastq/qiime2/dokdo/dokdo/__init__.py:1, in <module>
----> 1 from .api import *

File ~/Rupesh/Suzanne/aqua_fastq/qiime2/dokdo/dokdo/api/__init__.py:1, in <module>
----> 1 from .common import get_mf, pname
      2 from .ordinate import ordinate
      3 from .num2sig import num2sig

File ~/Rupesh/Suzanne/aqua_fastq/qiime2/dokdo/dokdo/api/common.py:15, in <module>
     13 from matplotlib.patches import Patch
     14 import seaborn as sns
---> 15 import skbio as sb
     16 from skbio.stats.ordination import OrdinationResults
     17 from scipy import stats

ModuleNotFoundError: No module named 'skbio'

I am new to the API, so your guidance would be a big help!
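
The traceback stops at `import skbio`, which is scikit-bio, a dependency that normally comes with the QIIME 2 conda environment. One common cause, offered here as a guess rather than a diagnosis, is that the Jupyter kernel runs a different Python than the environment in which `pip install .` was executed. The check below uses only the standard library; the function name is made up.

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the top-level modules the running interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Which Python is the notebook kernel actually using?
print(sys.executable)
# Which of the packages imported by dokdo are unavailable to it?
print(missing_modules(["skbio", "qiime2", "seaborn", "matplotlib"]))
```

If the printed interpreter path does not point inside the qiime2 environment, starting Jupyter from within that activated environment is the first thing to try.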

exclude "others" from relative abundance and plot

Hi @sbslee

How can I remove "others" from the relative abundance bar plot and plot the rest?
I understand that the bars will then not sum to 100%.
In the attached image, the order of the colors in the bars (e.g. orange at the bottom) and in the legend (orange on top) is almost reversed. How can I keep both in the same order?

Thanks

KeyError: "['merged'] not in index", after running "dokdo.denoising_stats_plot"

Hi Seung-been,

Thanks a lot for your wonderful code.

As you can see below, I could not attach "qza_file" here.
"We don’t support that file type.
Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP."

So, I shared the files ("qza_file", "qzv_file" and "metadata_file") in a Google Drive folder. Here is the link to the folder:
https://drive.google.com/drive/folders/1lRKmI_f51IwwHihucV78LvMJcg3G6Kpe

Thanks,
Sara

P.S. code, error, explanation, question:

---Code-------------------------
qza_file = '/home/sara/S1B54/stats-dada2_16S_S1B54.qza'
metadata_file = '/home/sara/S1B54/metadata_sara_R1_Batch1_V3.tsv'

dokdo.denoising_stats_plot(
    qza_file,
    metadata_file,
    'name_abr_source',
    figsize=(8, 6)
)

plt.tight_layout()

---Error-------------------------
KeyError: "['merged'] not in index"

---Explanation-----------------------
The columns of my " denosing-stat.qzv" file:

sample-id, input, filtered, percentage of input passed filter, denoised, non-chimeric, percentage of input non-chimeric

---Question-----------------------
As you can see my file does not have these two columns:
merged, percentage of input merged
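
The column list above has no merge step, which is what single-end denoising produces, so code that indexes a fixed set of columns including 'merged' fails with exactly this KeyError. A tolerant approach is to keep only the expected columns that are present. The sketch below illustrates the idea on plain lists; the names are made up and this is not the code of `denoising_stats_plot`.

```python
EXPECTED_STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]

def available_steps(columns, expected=EXPECTED_STEPS):
    """Keep the expected read-count columns that the stats table really has."""
    present = set(columns)
    return [c for c in expected if c in present]

# Columns reported in this issue.
columns = [
    "sample-id", "input", "filtered", "percentage of input passed filter",
    "denoised", "non-chimeric", "percentage of input non-chimeric",
]
print(available_steps(columns))
```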

Ideas for the next Dokdo version (1.3.0)

These ideas will be implemented to the 1.3.0-dev branch before the official release.

Ideas that are already implemented:

Update the ordinate() method so that the user can choose not to perform rarefying. This is because the user may wish to provide an already rarefied/normalized table. The current behavior requires rarefying. Done, see 686ad0c.
~~Update the ordinate() method so that the user can provide sampling_depth manually for rarefying. Currently, the default behavior is to rarefy based on the sample with minimum depth.~~ Done, see 686ad0c.
~~Update the ordinate() method so that the user can provide qiime2.Artifact as feature table. The current behavior is to accept only str (i.e. file path).~~ Done, see 7cfec8f.
~~Add a new plotting method for scree plot from OrdinationResults.~~ Done, see 1e4a8fc.
~~Add a new plotting method for parallel plot from OrdinationResults.~~ Done, see baf6fc1.
~~Add a new plotting method for biplot from OrdinationResults.~~ Done, see 147e630.
~~Update the ordinate() method so that it can output PCoAResults % Properties('biplot') as well as PCoAResults.~~ Done, see 3b51016.
~~Add a new plotting method for grouped abundance bar chart. Currently, you have to manually create a grouped bar chart with taxa_abundance_bar_plot(), which can be troublesome.~~ Done, see b48ec7f.

Ideas that will most likely be implemented:

Other ideas:

Any plan to publish to Pypi or Conda ?

Hi, this is a wonderful package! Is there a schedule or roadmap for publishing to PyPI or Conda?

Ideas for the next Dokdo version (1.4.0)

These ideas will be implemented to the 1.4.0-dev branch before the official release.

Ideas that are already implemented:

Update the make_manifest command so that the sample IDs are determined better. Currently, the word before the first underscore is set as the sample ID (e.g. 'EXAMPLE' in EXAMPLE_S1_R1_001.fastq.gz). This works fine in most cases, but gives an error for file names like EXAM_PLE_S1_R1_001.fastq.gz. Done, see b68159a.
~~Rearrange the arguments for the _artist method to prevent unexpected behaviors. For example, currently using hide_ytexts=True still displays the y-axis tick labels if ymax is also used.~~ Done, see ac1c4fc.
Create a command called summarize for extracting summary statistics (e.g. minimum depth) from FeatureTable[Frequency]. Currently, in order to view summary statistics, you have to create a Visualization file and then open it with other tools (e.g. QIIME 2 View), which is not ideal when you are working in a remote server. Done, see 0b37080.
Update the alpha_rarefaction_plot method to plot lines for all the individual samples but label them by their group variable instead of sample name. The current behavior is either labeling the individual lines by sample name or plotting the lines for group averages. Done, see b07b73a.
~~Add a new argument called legend_lw to the _artist method.~~ Done, see 8cea3c9.
~~Update the add_metadata command so that it does not change the sample order after merging.~~ Done, see ec46f80.
~~Update the add_metadata command to warn when the final metadata contains NaN.~~ Done, see e2858e2.
~~Update summarize command to output sample IDs and feature IDs.~~ Done, see edecf1e.
~~Update the _artist method to detect the legend handle so that the legend_lw argument can be interpreted better.~~ Done, see f7682c9.
~~Add the --verbose option to the summarize command.~~ Done, see 66be20c.
~~Add the --output_dir option to the collapse command.~~ Done, see 3de7ac3.
~~Add a new plotting method called heatmap.~~ Done, see f5c4049.
~~Add cmap_name argument to taxa_abundance_bar_plot method.~~ Done, see 3cfeb7d.
~~Changed the taxa_abundance_box_plot method to use seaborn.stripplot() instead of seaborn.swarmplot() for plotting data points.~~ Done, see aebd400.
~~Added to the taxa_abundance_box_plot method the jitter and alpha arguments for setting the aesthetics of data points.~~ Done, see aebd400.
~~Add seed argument to alpha_rarefaction_plot method so that the result is reproducible.~~ Done, see 62d48f2.
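
The sample ID problem described in the make_manifest item above (underscores inside the sample name) can be illustrated with a pattern anchored on the trailing part of the file name rather than on the first underscore. This is a sketch under the assumption that file names follow the `<sample>_S<n>[_L<lane>]_R<1|2>_001.fastq.gz` layout seen in the examples; it is not the change made in b68159a, and the names are made up.

```python
import re

# The sample name is everything before the trailing _S<n>[_L<lane>]_R<1|2>_001 part.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+(?:_L\d{3})?_R[12]_001\.fastq\.gz$"
)

def sample_id(filename):
    """Extract the sample ID, allowing underscores inside the sample name."""
    match = ILLUMINA_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected FASTQ name: {filename}")
    return match.group("sample")

print(sample_id("EXAMPLE_S1_R1_001.fastq.gz"))
print(sample_id("EXAM_PLE_S1_R1_001.fastq.gz"))
```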

Ideas that will most likely be implemented:

Other ideas:

The Seaborn package uses a legend label as the legend title (https://stackoverflow.com/questions/51579215/remove-seaborn-lineplot-legend-title). This behavior makes detecting the legend title difficult. I made an update (8b642fd) that fixes the issue for the alpha_rarefaction_plot method specifically, but it's not a general solution. I will need to come up with a more general one.

import process issue

Hi,

I encountered an issue during the import process while working with Dokdo. I followed the tutorial to write the code and executed it.

Here is my code:

import dokdo
import matplotlib.pyplot as plt

qzv_file = "taxa-bar-plots.qzv"

dokdo.taxa_abundance_bar_plot(qzv_file, level=2, count=8,figsize=(9,7))

plt.tight_layout()

However, I noticed a problem with the import section of the Dokdo/api/ordinate file during execution.
ImportError: cannot import name 'diversity_lib' from 'qiime2.plugins' (~/envs/qiime2/lib/python3.8/site-packages/qiime2/plugins.py)

It fails to recognize the following part:

from qiime2.plugins import diversity_lib
from qiime2.plugins import feature_table
from qiime2.plugins import diversity

I am using Python version 3.8.17 and have installed Qiime2 and Dokdo in a virtual environment. Qiime2 is at its latest version, 2023.7.

Is there anything I might have done wrong while using it?
Or could this be a problem caused by Dokdo not being updated?
If it's the latter, I'm curious if there are any plans for an update.

Thank you.

Overlapping legends on clustermap

Hi @sbslee
I was plotting a cluster map using the following command on a .csv file that contains the relative frequencies of the top 20 bacterial classes.

import pandas as pd
import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
df = pd.read_csv(csv_file, index_col=0)
dokdo.clustermap(df,
              metadata=metadata_file,
              normalize='clr',
              hue1='matrix',
              figsize=(10, 8)
              )
plt.savefig('heatmap-3.png')

and got this image (not shown).

Can you suggest how to place the legends and the color scale so that they don't overlap the figure?
Also, when I use flip=True, the sample color codes go missing from the graph.

Please help!

how to make heatmap with text file

Hi @sbslee ,
Is it possible to make a heatmap using a pandas DataFrame (not a qza file)?
I am trying to replicate the same style as the bar plot: split by category, with and without a dendrogram.
Could you please help?

Like this image (not shown).

Thanks,
Khem

how to control the color codes for specific bacteria

Hi,
Dokdo has been very useful for me, and thanks again for v1.10.
I was wondering whether there is any way to control the colors assigned to specific bacteria and their order within the bars.
I have two different sets of data, but my control is the same for both.
I want to keep the same colors for some bacteria (the top 10-15), so I used RGB color codes to control the colors. However, the color assignment seems to change with abundance, and so does the order.
In the attached images, the first two bars from the left are the same in both figures, but the colors of the bacteria and their order within the bars have changed.
Any suggestions for controlling them?
I hope I explained myself clearly.
Thanks
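
One way to think about this: a list of colors is applied by position, so when the taxa appear in a different order in each figure, the same list paints different bacteria. Assuming that is what happens here, keying the colors by taxon name and building the list in each plot's own taxon order keeps them stable. The mapping and function below are made up for illustration; how to read the taxon order of a given plot is left open.

```python
# Made-up mapping from taxon to a fixed color.
PALETTE = {
    "Prevotella": "#FF5533",
    "Bacteroides": "#3B7A57",
    "Lactobacillus": "#6890F0",
}

def fixed_colors(taxa, palette=PALETTE, fallback="#BBBBBB"):
    """Look up each taxon's color by name instead of by position."""
    return [palette.get(t, fallback) for t in taxa]

# The same taxa in two different orders still get the same colors.
print(fixed_colors(["Prevotella", "Bacteroides", "Others"]))
print(fixed_colors(["Bacteroides", "Lactobacillus", "Prevotella"]))
```

The resulting list could then be supplied wherever a plot accepts an ordered list of colors, such as the `colors=` argument used in other snippets on this page.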

taxa_abundance_bar_plot issue

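Hello @sbslee,
I have a question about taxa_abundance_bar_plot, one of the dokdo API methods.

Among its many parameters, there is no option to remove or keep samples by sample name alone.
Would it be possible to add one?

I know you must be very busy with your research; thank you for always helping.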

Select samples for PCoA

Hi sbslee,
I was using dokdo for plotting PCoA as below:

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

qza_file = '/home/rupesh/rupesh/qiime2/new_qiime/uchime-dn-out_wdout_clustering/table-nc-wobl-nmcl.qza'
metadata_file = '/home/rupesh/rupesh/qiime2/new_qiime/sample_metada.tsv'


pcoa_results = dokdo.ordinate(qza_file)

dokdo.beta_2d_plot(
    pcoa_results,
    hue='matrix',
    metadata=metadata_file,
    figsize=(8, 8)
)

plt.tight_layout()

which is working fine.
But,
I was wondering if there could be an option like 'where', 'include_samples', or 'exclude_samples' to control which samples are shown on the plot.

Thanks
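
Whatever option ends up being offered, the selection itself is a metadata lookup: pick the sample IDs whose metadata value matches, then subset the table or the plotted points to those IDs. The sketch below shows that lookup on a tab-separated metadata string using only the standard library; the function name and data are made up. Within QIIME 2 itself, `qiime feature-table filter-samples` with `--p-where` filters the table before ordination.

```python
import csv
import io

def select_samples(metadata_tsv, column, keep):
    """Return the sample IDs whose metadata `column` value is in `keep`."""
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    id_field = reader.fieldnames[0]  # the first column holds the sample ID
    return [row[id_field] for row in reader if row[column] in keep]

# Made-up metadata with the 'matrix' column used in this issue.
metadata = "sample-id\tmatrix\nS1\tsoil\nS2\twater\nS3\tsoil\n"
print(select_samples(metadata, "matrix", {"soil"}))
```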

installation issue

Hi,
I installed dokdo in QIIME 2 environments (v2019 and 2021.4):
$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ pip install .
but I keep getting the following error:

pkg_resources.DistributionNotFound: The 'dokdo==1.8.0' distribution was not found and is required by the application

I also checked in a notebook, but I can't import dokdo; it says the dokdo module is not found.
Could you please tell me where the problem is?
Thanks

Deprecate the artist_kwargs argument from plotting methods

The artist_kwargs argument in main plotting methods -- and the dokdo.common._artist method for that matter -- will be gradually deprecated because it has gotten unnecessarily complicated.

This argument was originally created because I was not familiar enough with the matplotlib package and did not want to memorize the package's methods used for changing various properties of a figure (e.g. title, axis, legend, font size). However, it has become clear that it's not feasible to capture all the functionality of matplotlib with a single private method (i.e. dokdo.common._artist). Therefore, the artist_kwargs argument will be removed from the main plotting methods in the upcoming releases.

Plot options for `beta_2d/3d_plot` and `taxa_abundance_box_plot` methods

Hi @sbslee

Recently, I noticed two minor things in Dokdo and thought you could help:

beta_3d_plot doesn't have an option for colors; if possible, could you please add one?
I used to use the add_datapoints option for box plots, but it's not working anymore.

Thanks

Ideas for the next Dokdo version (1.6.0)

These ideas will be implemented to the 1.6.0-dev branch before the official release.

Ideas that are already implemented:

~~Fix typo for the add-metadata command.~~ Done, see 496c06e.
Fix bug with the heatmap() method giving the `FloatingPointError: NaN dissimilarity value.` error when sample-filtered metadata is provided and the metric='correlation' argument is used. The error is related to this Stack Overflow post, where more than one column "are all zeros, and have no variation at all, making it return nan with correlation". Done, see 6dfa39c.
~~Add centered log-ratio (CLR) transformation as a normalization option to the heatmap() method.~~ Done, see 1927cbe.
~~Update the heatmap() method to support kwargs that are passed to the seaborn.heatmap() method.~~ Done, see 39da3d0.
~~Add a new method called pname() that returns a prettified taxon name.~~ Done, see 4340c70.
~~Update the prepare-lefse command to output more informative taxa name than just underscores (e.g. __ or g__). Note that this issue was also raised in this post from the QIIME 2 Forum.~~ Done, see 61dfb2c.
~~Update the addpairs() method to accept additional keyword arguments.~~ Done, see 099f961.
~~Add a new method called wilcoxon() that computes p-value from the Wilcoxon Signed-rank test.~~ Done, see d05a449.
~~Add a new method called num2sig() that converts a p-value to significance annotation.~~ Done, see 92d318b.
~~Update addpairs() to support more than two boxes.~~ Done, see 29759ff.
~~Add a new method called mannwhitneyu() that computes p-value from the Mann–Whitney U test.~~ Done, see 10bfd3d.
~~Fix bug with the alpha_diversity_plot() method only recognizing an Artifact file, and not an Artifact object, as input.~~ Done, see dc7275e.
~~Update the alpha_diversity_plot() method to display sample size in the x-axis.~~ Done, see 1e981a5.
~~Update the summarize command to support FeatureData[Sequence].~~ Done, see c220a31.
~~Update the summarize command to support FeatureData[AlignedSequence].~~ Done, see 8344201.
~~Update the summarize command to support the -v/--verbose option. When this option is used, the command will only output the first five records.~~ Done, see 8344201.
~~Fix bug with the heatmap() method giving an error when one of the metadata columns has only zeros.~~ Done, see 28b4674.
~~Fix bug with the heatmap() method giving an error when using normalize='log10'. Note that this bug was introduced during 1.6.0-dev (1927cbe specifically).~~ Done, see 659fe63.
~~Update the heatmap() method to support two grouping variables instead of just one.~~ Done, see 0440fbb.
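
The NaN dissimilarity bug listed above has a simple numerical root: correlation divides by each column's standard deviation, which is zero for a column that never varies, so the correlation distance is undefined. A generic guard is to drop constant columns before clustering. The sketch below shows that guard on a plain dictionary of columns; the function name is made up and this is not the fix made in 6dfa39c.

```python
def drop_constant_columns(table):
    """Remove columns whose values never vary (for example, all zeros).

    table: {column_name: [values, ...]}
    """
    return {name: vals for name, vals in table.items() if len(set(vals)) > 1}

# Made-up table: 'a' and 'c' have zero variance and would yield NaN.
table = {"a": [0, 0, 0], "b": [1, 2, 3], "c": [5, 5, 5]}
print(drop_constant_columns(table))
```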

Ideas that will most likely be implemented:

Other ideas:

ImportError: libhdf5_cpp.so.200: cannot open shared object file: No such file or directory

Hi @sbslee
One day while I was using the library, the following error occurred during import. Could you please help with this? The error output generated during import is attached below.

ImportError Traceback (most recent call last)
Cell In [13], line 1
----> 1 import dokdo

File ~/dokdo/dokdo/__init__.py:1
----> 1 from .api import *

File ~/dokdo/dokdo/api/__init__.py:2
1 from .common import get_mf, pname
----> 2 from .ordinate import ordinate
3 from .num2sig import num2sig
4 from .wilcoxon import wilcoxon

File ~/dokdo/dokdo/api/ordinate.py:3
1 from qiime2 import Artifact
2 from qiime2 import Metadata
----> 3 from qiime2.plugins import diversity_lib
4 from qiime2.plugins import feature_table
5 from qiime2.plugins import diversity

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/plugins.py:425, in QIIMEArtifactAPIImporter.find_spec(self, name, path, target)
422 plugin_details = fqn[2:] # fqn[len(['qiime2', 'plugins']):]
423 plugin_name = plugin_details[0]
--> 425 plugin = self._plugin_lookup(plugin_name)
426 if plugin is None or len(plugin_details) > 2:
427 return None

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/plugins.py:402, in QIIMEArtifactAPIImporter._plugin_lookup(self, plugin_name)
400 def _plugin_lookup(self, plugin_name):
401 import qiime2.sdk
--> 402 pm = qiime2.sdk.PluginManager()
403 lookup = {s.replace('-', ''): s for s in pm.plugins}
404 if plugin_name not in lookup:

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/plugin_manager.py:67, in PluginManager.__new__(cls, add_plugins)
65 cls.__instance = self
66 try:
---> 67 self._init(add_plugins=add_plugins)
68 except Exception:
69 cls.__instance = None

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/plugin_manager.py:105, in PluginManager._init(self, add_plugins)
103 project_name = entry_point.dist.project_name
104 package = entry_point.module_name.split('.')[0]
--> 105 plugin = entry_point.load()
107 self.add_plugin(plugin, package, project_name,
108 consistency_check=False)
110 self._consistency_check()

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2465, in EntryPoint.load(self, require, *args, **kwargs)
2463 if require:
2464 self.require(*args, **kwargs)
-> 2465 return self.resolve()

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/pkg_resources/__init__.py:2471, in EntryPoint.resolve(self)
2467 def resolve(self):
2468 """
2469 Resolve the entry point from its module and attrs.
2470 """
-> 2471 module = __import__(self.module_name, fromlist=['__name__'], level=0)
2472 try:
2473 return functools.reduce(getattr, self.attrs, module)

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_diversity_lib/plugin_setup.py:18
15 from q2_types.distance_matrix import DistanceMatrix
16 from unifrac._meta import CONSOLIDATIONS
---> 18 from . import alpha, beta, __version__
20 citations = Citations.load('citations.bib', package='q2_diversity_lib')
21 plugin = Plugin(
22 name="diversity-lib",
23 version=__version__,
(...)
28 " community alpha and beta diversity.",
29 )

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_diversity_lib/beta.py:12
10 import skbio.diversity
11 import sklearn.metrics
---> 12 import unifrac
13 from skbio.stats.composition import clr
14 from scipy.spatial.distance import euclidean

File ~/miniconda/envs/qiime2-2022.2/lib/python3.8/site-packages/unifrac/__init__.py:30
9 import pkg_resources
11 from unifrac._methods import (unweighted,
12 weighted_normalized,
13 weighted_unnormalized,
(...)
28 h5unifrac,
29 h5pcoa)
---> 30 from unifrac._api import ssu, faith_pd, ssu_to_file
33 __version__ = pkg_resources.get_distribution('unifrac').version
34 __all__ = ['unweighted', 'weighted_normalized', 'weighted_unnormalized',
35 'generalized', 'unweighted_fp32', 'weighted_normalized_fp32',
36 'weighted_unnormalized_fp32', 'generalized_fp32',
(...)
44 'h5unifrac', 'h5pcoa',
45 'ssu', 'faith_pd', 'ssu_to_file']

ImportError: libhdf5_cpp.so.200: cannot open shared object file: No such file or directory

ImportError: cannot import name 'diversity_lib'

Dear @sbslee,
Thank you for your contribution. I have installed dokdo into QIIME 2 (2020.2), but I encountered a problem importing the "diversity_lib" module.
I tried upgrading QIIME 2 to version 2020.8 as you suggested in the tutorial, but the problem persists. Please take a look at the image below (not shown).

dokdo `beta_parallel_plot` issue

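Hello, I'm a novice scientist who uses dokdo and is learning a lot from it.
I tried to draw plots on two axes using beta_parallel_plot in dokdo, but a warning was raised, so I'm writing to ask about it.
I created two axes, ax1 and ax2, and tried to draw a plot on each, but found that everything was drawn in one place.
I know you must be very busy with your research, but please take a look :)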

/00.Qiime2/dokdo/dokdo/api/beta_parallel_plot.py:121: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels(props)
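
The warning quoted above is raised by matplotlib whenever `set_xticklabels` is called on an axis whose tick positions have not been fixed. The usual remedy is to call `set_xticks` first. The sketch below demonstrates that on two plain axes; it assumes matplotlib is installed, the helper name and labels are made up, and it addresses only the warning, not the report that both plots were drawn on one axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def label_ticks(ax, labels):
    """Pin the tick positions first, then label them."""
    ax.set_xticks(list(range(len(labels))))
    ax.set_xticklabels(labels)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
label_ticks(ax1, ["Axis 1", "Axis 2", "Axis 3"])
label_ticks(ax2, ["Axis 1", "Axis 2", "Axis 3"])
```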

[taxa_abundance_bar_plot] Customizing bar colors with using 'Seaborn palette' or 'Matplotlib colormap'

@sbslee

Hello,

Thank you for kindly answering my last question. I have another question about visualization this time.
I'm trying to customize the colors of the taxa bars, but I'm struggling with the cmap setting step.

I tried to build the cmap by creating a Seaborn palette from specified color codes and then importing it into matplotlib.
However, the conversion kept failing.
It would be nice to build the cmap using Seaborn's way of specifying color codes, but it seems difficult.

Could you please advise on how to customize that?

My goal is to combine more than 20 desired colors with a qualitative cmap and display them as a taxa bar.

Thank you!

Problem with taxa_abundance_bar_plot

Hey there!

Firstly, congrats on dokdo... I'm in love with the plots it can make and very excited to generate my first one!

I'm working in Jupyter Notebook.
I have the qiime2 environment already activated, but after running the following code...

import dokdo

import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
np.random.seed(1)

qzv_file = '/home/rayana/Documents/MiSeq_16S/taxonomy.qzv'

dokdo.taxa_abundance_bar_plot(
    qzv_file,
    level=2,
    count=8,
    figsize=(9, 7)
)

plt.tight_layout()

... I've got the following error:

AttributeError Traceback (most recent call last)
in
----> 1 dokdo.taxa_abundance_bar_plot(
2 qzv_file,
3 level=2,
4 count=8,
5 figsize=(9, 7)

AttributeError: module 'dokdo' has no attribute 'taxa_abundance_bar_plot'

Could you please help me on what I should do?

Thank you in advance!

Best regards,
Rayana

orders and group mismatch

Hi @sbslee ,
I found a small mismatch in the grouping and ordering of bacteria.
It is probably due to the way my data is, but I thought you might have a solution
other than labeling each bacterium with a color by hand.
If you look at both figures, one small difference is Prevotella and Lactobacillaceae.
I wanted to keep Prevotella in both figures but couldn't control it.
I guess it comes from how the grouping starts: in the first figure I started with controls/donors, but in the second with samples.

scripts for the first figure:

fig, [ax1, ax2, ax3, ax4] = plt.subplots(1, 4, figsize=(16, 7), gridspec_kw={'width_ratios': [2, 1.40, 2.5, 2.5]})
kwargs = dict(level=6, count=13, sort_by_mean2=False)

qzv_file = '/media/scebmeta/raw_backup/DoD_fastqs/finalRun_analysis/3932480_Nirmalkar/part4/mergd_GrpAnB/taxa-bar-plots.qzv'
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax1,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8'],
                              legend_short=True,
                              artist_kwargs=dict(title='Finch Donors',title_fontsize=16, legend_loc='lower right', xticklabels_fontsize=14, yticklabels_fontsize=12, ylabel_fontsize=14))

fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax2,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Donors_UMN1', 'Donors_UMN2'],
                              figsize=(10, 7),
                              legend_short=True,
                              artist_kwargs=dict(title='UMN Donors',title_fontsize=16,show_legend=False, legend_loc='lower right', xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30),
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax3,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Base_B_Finch', 'Vanco_B_Finch', 'End_01_B_Finch', 'End_02_B_Finch'],
                              figsize=(10, 7),
                              legend_short=True,
                              artist_kwargs=dict(title='GroupB: Finch recipients',title_fontsize=16, xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30),
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              ax=ax4,
                              count=13,
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              group='DonNew',
                              group_order=['Base_B_UMN', 'Vanco_B_UMN', 'End_01_B_UMN', 'End_02_B_UMN'],
                              legend_short=True,
                              artist_kwargs=dict(title='GroupB: UMN recipients',title_fontsize=16, xticklabels_fontsize=14, hide_ylabel=True,
                                                hide_yticks=True))
plt.legend(bbox_to_anchor=(1, 1), loc=2, fontsize=14,facecolor='white')
plt.tight_layout()
#plt.xticks(fontsize=13)
#plt.yticks(fontsize=13)
#set_xlabel("", fontsize=25, fontweight='bold')
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
fig1.savefig('GrpB_DONsepGenus_barpltT13.svg')
fig1.savefig('GrpB_DONsepGenus_barpltT13.png', dpi=500)

Script for the second figure:

qzv_file = '/media/scebmeta/raw_backup/DoD_fastqs/finalRun_analysis/3932480_Nirmalkar/part4/mergd_GrpAnB/taxa-bar-plots.qzv'
dokdo.taxa_abundance_bar_plot(qzv_file,
                              level=6,
                              #ax=ax1,
                              count=13,
                              figsize=(50,22),
                              sort_by_mean3=False,
                              sort_by_mean2=False,
                              by=['order'],
                              label_columns=['order','DonNew'],
                              include_samples={'DonNew':['Donors_UMN1', 'Donors_UMN2', 'Base_A_UMN','Vanco_A_UMN', 'End_01_A_UMN','End_02_A_UMN']},
                              colors=['#FF5533', '#3B7A57', '#6890F0','#A040A0','#F8D030', '#E0C068', '#EE99AC', '#C03028', '#78C850', '#332B00', '#D4F6F7', '#08F6FF','#A87CE9'],
                              #orders={'DonNew':['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8','Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch']},
                              #orders={'DonNew':['Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch','Base_B_Finch','Vanco_B_Finch', 'End_01_B_Finch','End_02_B_Finch']},
                              #group='DonNew',
                              #group_order=['Donors_Fin6', 'Donors_Fin7', 'Donors_Fin8','Base_A_Finch','Vanco_A_Finch', 'End_01_A_Finch','End_02_A_Finch'],
                              legend_short=True,
                              artist_kwargs=dict(title='', legend_loc='lower right', xticklabels_fontsize=34, yticklabels_fontsize=48, ylabel_fontsize=48))

plt.legend(bbox_to_anchor=(1, 1),fontsize=45, loc=2, facecolor='white')
plt.tight_layout()
#plt.xticks(fontsize=13)
#plt.yticks(fontsize=13)
#set_xlabel("", fontsize=25, fontweight='bold')
fig1= plt.gcf()
#plt.gca().xaxis.set_tick_params(rotation = 30)
fig1.savefig('GrpA_UMN_genus_barpltT13.svg')
fig1.savefig('GrpA_UMN_genus_barpltT13.png', dpi=500)

Note: Prevotella is more abundant in bars 1 and 2 of the first figure and in the last two bars of the second figure. These are my controls, so I want to keep it.

Any suggestions?
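A possible explanation, offered as an assumption rather than a confirmed diagnosis: if each call ranks taxa using only the samples it is given, the top 13 taxa can differ from one figure to the next. The sketch below illustrates the general idea of choosing one shared taxa list and one taxon-to-color mapping from the full table before plotting. The data are made up and shared_taxa is a hypothetical helper. Whether a fixed list of taxa can be passed to taxa_abundance_bar_plot depends on the installed Dokdo version and should be checked in its documentation.

```python
def shared_taxa(table, count, always_keep=()):
    """Pick `count` taxa by total (equivalently, mean) abundance over all samples.

    `table` maps sample name -> {taxon: relative abundance}.
    Taxa listed in `always_keep` are included even if they rank lower.
    """
    totals = {}
    for abundances in table.values():
        for taxon, value in abundances.items():
            totals[taxon] = totals.get(taxon, 0.0) + value
    # Sort by descending total, breaking ties alphabetically.
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    chosen = [t for t in always_keep if t in totals]
    for taxon in ranked:
        if len(chosen) >= count:
            break
        if taxon not in chosen:
            chosen.append(taxon)
    return chosen


# Made-up relative abundances for illustration only.
table = {
    "Donors_UMN1": {"Prevotella": 0.40, "Bacteroides": 0.35, "Lactobacillaceae": 0.05},
    "Base_B_UMN": {"Prevotella": 0.02, "Bacteroides": 0.50, "Lactobacillaceae": 0.30},
    "End_01_B_UMN": {"Prevotella": 0.01, "Bacteroides": 0.45, "Lactobacillaceae": 0.40},
}
taxa = shared_taxa(table, count=2, always_keep=["Prevotella"])
palette = ["#FF5533", "#3B7A57"]
color_of = dict(zip(taxa, palette))  # the same taxon gets the same color everywhere
print(color_of)
```

Building the mapping once and reusing it for every panel keeps a taxon such as Prevotella on the same color even when it is abundant only in the control samples.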

Dokdo issue

Please fix: Docs -> Dokdo API -> Main Plotting Methods -> alpha_diversity_plot -> Examples
qzv_file = '/Users/sbslee/Desktop/dokdo/data/moving-pictures-tutorial/faith_pd_vector.qza'

Ideas for the next Dokdo version (1.7.0)

These ideas will be implemented in the 1.7.0-dev branch before the official release.

Ideas that are already implemented:

~~Add a new command called count-reads which counts the number of sequence reads from FASTQ.~~ Done, see 4692736.
~~Replace _pretty_taxa() with pname() for the taxa_abundance_box_plot() method.~~ Done, see a4d35f9.
~~Move the legend_short argument from the _artist() method to the taxa_abundance_bar_plot() method.~~ Done, see 81e4ea8.
~~Update the distance_matrix_plot() method's handling of the input file so that it no longer creates a temporary file.~~ Done, see a453144.
~~Update the distance_matrix_plot() method to support density plots.~~ Done, see 2bd9b0e.
~~Update the summarize command to support the DistanceMatrix semantic type.~~ Done, see ca7a6d4.
~~Update the summarize command to support the FeatureData[Taxonomy] semantic type.~~ Done, see 62e6a3e.
~~Fix a minor bug in the ordinate() method that raised an error when a Metadata object was given to the metadata argument.~~ Done, see 0cab2b3.
~~The where argument in the ordinate() method has been deprecated. From now on, users who wish to perform sample filtration on the feature table should provide filtered metadata.~~ Done, see bf095e0.
~~Fix the legend_short conflict in the barplot() method, which was introduced in this version.~~ Done, see 08f70d0.
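To illustrate what the count-reads idea in the list above involves, here is a minimal sketch of counting reads in a FASTQ file. It assumes the usual layout of four lines per record and is not Dokdo's actual implementation. The function count_reads and the demo file are hypothetical.

```python
import gzip
import os
import tempfile


def count_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


# Small self-made FASTQ file for demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")

n_reads = count_reads(demo_path)
print(n_reads)
```

Counting lines rather than header characters avoids miscounting quality strings that happen to start with "@".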

Ideas that will most likely be implemented:

Other ideas:

Update docs

Please remove: Pipeline - 6. Summarize and Filter ASV Table - Taxonomy-Based Filtering - filter-seqs

Heatmap with prettified names?

Hi @sbslee ,
Thanks for your very recent updates to heatmap (from clustermap); they came just in time for what I need. I'm wondering how I could incorporate your 'prettified names' function to get prettified taxa names as the y-axis labels, as is possible in the legend of the taxa_abundance_bar_plot output.

Thanks,
Michelle
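As a rough illustration of what prettifying a taxon name can mean, the sketch below reduces a semicolon-separated taxon string to its most specific informative rank. The function pretty_taxon is hypothetical, and Dokdo's own pname() may behave differently.

```python
def pretty_taxon(name):
    """Return the most specific informative rank of a semicolon-separated taxon."""
    ranks = [part.strip() for part in name.split(";")]
    for rank in reversed(ranks):
        # Drop a rank prefix such as "g__" or "D_5__" if present.
        label = rank.split("__", 1)[1] if "__" in rank else rank
        # Skip empty placeholders such as "g__", "__" or "_".
        if label and label != "_":
            return label
    return name


full = "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus"
print(pretty_taxon(full))
print(pretty_taxon("k__Bacteria;p__Firmicutes;g__;s__"))
```

On a matplotlib Axes, labels could then be rewritten with the standard get_yticklabels() and set_yticklabels() methods, for example by applying pretty_taxon to each label's text. Whether the heatmap method returns an object that exposes such an Axes is an assumption to verify against the Dokdo documentation.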

How to plot many species on a bar graph

Hello sbslee,

Thank you for providing the dokdo package.
I have a question regarding taxa bar plots.

I recently performed my first 16S rRNA analysis and successfully completed the QIIME 2 pipeline, and I am currently trying to visualize the results.

However, unlike the simple tutorial data, my samples contain numerous taxa at the genus level, and I encountered an unexpected error when displaying the qzv file as taxa bar plots.

I'm guessing the error occurs because there are too many taxa to fit in the legend, and I'd like to ask how to control this.
I am studying matplotlib and related tools, but I am still a beginner in Python, so please bear with my lack of experience.

In addition, most microbiome analysis papers visualize only some of the genera and group the rest as 'Others'.
In my case, the genus I want to observe is Salmonella, which has a relative abundance of 0.1%, and I would like to show every genus at or above that abundance (about 70 genera). However, that is too many to include in a taxa bar plot or heatmap, so I would like to ask for advice on this.

I have attached reference screenshots of my samples as seen in QIIME 2 View and the errors encountered during analysis.

Thank you very much.
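The "keep taxa above a cutoff and pool the rest" idea described above can be sketched in plain Python. This is an illustration on made-up numbers for a single sample, not Dokdo's method. The helper collapse_minor_taxa and its parameters are hypothetical, and on a real table the cutoff would more likely be applied to each taxon's mean abundance across samples.

```python
def collapse_minor_taxa(abundances, threshold=0.001, always_keep=(), label="Others"):
    """Sum taxa below `threshold` (a fraction, 0.001 = 0.1%) into one bucket."""
    kept = {}
    others = 0.0
    for taxon, value in abundances.items():
        if value >= threshold or taxon in always_keep:
            kept[taxon] = value
        else:
            others += value
    if others > 0:
        kept[label] = others
    return kept


# Made-up relative abundances (fractions) for one sample.
sample = {
    "Bacteroides": 0.6,
    "Lactobacillus": 0.3985,
    "Salmonella": 0.001,
    "RareA": 0.0003,
    "RareB": 0.0002,
}
collapsed = collapse_minor_taxa(sample, threshold=0.001)
print(collapsed)
```

Even after pooling, about 70 genera is a lot for one legend, so a stricter cutoff combined with always_keep=["Salmonella"] may give a more readable figure.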

Implementing differential abundance analysis

Hi @sbslee
I am curious whether differential abundance analysis tools such as LEfSe, ANCOM, ALDEx2, and/or DESeq2 for microbiota data are out of scope for Dokdo.
It would be wonderful to have them in Dokdo; if that is not possible, never mind.

Thanks

Ideas for the next Dokdo version (1.5.0)

These ideas will be implemented in the 1.5.0-dev branch before the official release.

Ideas that are already implemented:

~~Update the alpha_diversity_plot method to accept metadata.~~ Done, see 290cebf.
~~Update the alpha_diversity_plot method to accept as input a SampleData[AlphaDiversity] file (e.g. shannon_vector.qza) instead of the Visualization file from qiime diversity alpha-group-significance (e.g. shannon_group-significance.qzv).~~ Done, see 290cebf.
~~Add legend_title option to _artist method.~~ Done, see f37d0cc.
~~Update beta_3d_plot method to have legend title.~~ Done, see f37d0cc.
~~Update beta_parallel_plot method to have legend title.~~ Done, see 0995a40.
~~Add cmap_name argument to barplot method.~~ Done, see a23e8bc.
~~Fix a bug in the ordinate method when using both the biplot=True and where arguments with the following error message: The eigenvectors and the descriptors must describe the same samples.~~ Done, see eecc1b0.
~~Update the taxa_abundance_box_plot method to accept metadata as input.~~ Done, see a573e00.
~~Update the taxa_abundance_box_plot method to add legend title when the hue argument is provided.~~ Done, see 1478119.
~~Add the hue_order argument to the heatmap method.~~ Done, see 39ece11.
~~Change the name of the where argument to hue for the heatmap method.~~ Done, see 39ece11.
~~Update the taxa_abundance_bar_plot and taxa_abundance_box_plot methods to sort the samples by their name.~~ Done, see 3ebc9d9.
~~Add a new method called regplot for plotting relative abundance data and a linear regression model fit from paired samples for the given taxon.~~ Done, see 3352737.
~~Package dokdo with setuptools.~~ Done, see 7e2896a.
~~Update the make-manifest command to output the absolute file path instead of the relative file path.~~ Done, see d3c4b41.
~~Add the prepare-lefse command for creating a text file which can be used as input for the LEfSe tool.~~ Done, see 54488a7.
~~Retire the merge_metadata command.~~ Done, see a87b020.
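The make-manifest item in the list above concerns writing absolute rather than relative file paths. A minimal sketch of that idea follows. It is not Dokdo's implementation: the helper make_manifest and the <sample>_R1/_R2 file naming are assumptions made for illustration, while the three column names follow QIIME 2's paired-end manifest format.

```python
import tempfile
from pathlib import Path


def make_manifest(fastq_dir):
    """Build paired-end manifest text with absolute file paths.

    Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
    (a hypothetical convention chosen for this sketch).
    """
    rows = [("sample-id", "forward-absolute-filepath", "reverse-absolute-filepath")]
    for forward in sorted(Path(fastq_dir).glob("*_R1.fastq.gz")):
        sample = forward.name[: -len("_R1.fastq.gz")]
        reverse = forward.with_name(sample + "_R2.fastq.gz")
        if not reverse.exists():
            raise FileNotFoundError(f"missing reverse reads for {sample}")
        # resolve() turns a relative path into an absolute one.
        rows.append((sample, str(forward.resolve()), str(reverse.resolve())))
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Empty placeholder files, created only to demonstrate the path handling.
demo_dir = tempfile.mkdtemp()
for name in ("A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz", "B_R2.fastq.gz"):
    Path(demo_dir, name).touch()
manifest = make_manifest(demo_dir)
print(manifest)
```

Absolute paths keep the manifest valid regardless of the directory from which the import command is later run.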

Ideas that will most likely be implemented:

Other ideas:

Ideas for the next Dokdo version (1.2.0)

Ideas that are already implemented:

~~Retire the tax2seq command from the Dokdo CLI. It turns out the QIIME 2 CLI already has this functionality. For details, see this QIIME 2 forum post. Also, update the QIIME 2 CLI page in Dokdo's Wiki page.~~ Done, see 47cbcff.
~~Update the _artist() method to be able to set the font size of title, labels, etc. Consult this Stack Overflow post when making the update.~~ Done, see bc16414 as an example.
~~Add the s argument to the ancom_volcano_plot() method for setting marker size.~~ Done, see 6c00786.

Ideas that will most likely be implemented:

Other ideas:

Update the docstring to include the default values of artist_kwargs for each plotting method. For example, the ancom_volcano_plot() method currently has artist_kwargs={'xlabel': 'clr', 'ylabel': 'W'} as default, but there is no way the user can know about this unless he or she looks at the code. Exposing these default values explicitly in the documentation could be potentially useful for the users. I decided not to pursue this idea because things were more complicated than I initially thought. For example, the alpha_rarefaction_plot() method does not have a fixed ylabel value because it depends on the metric argument. And there is no simple way of indicating this in the docstring.
Add a general keyword argument to the main plotting methods, similar to the artist_kwargs argument, which will be passed down to the underlying drawing method (e.g. the seaborn.scatterplot() method) to control various aspects of a figure that cannot be set by the _artist() method (e.g. marker size in a scatter plot). If implementing this, I'm thinking about naming this argument as plot_kwargs. I decided not to pursue this idea because adding plot_kwargs will actually hide some of the arguments in the plotting methods and will make the use of those methods unnecessarily difficult.

Hi @sbslee,
One day while I was using the library, the following error occurred during the import process. Could you please help me with this? The error messages generated during import are attached below.