marekborowiec / amas

Calculate summary statistics and manipulate multiple sequence alignments

License: Other

Python 100.00%

amas's Issues

Split method does not work in Python module

This is the script:

from amas import AMAS
sequences_path = './concatenated_alignment_copy.phy'
partitions_path = './partition_scheme_2.txt'
meta_aln = AMAS.MetaAlignment(in_files=[sequences_path], data_type="dna", in_format="phylip", cores=10)
parsed_parts = meta_aln.get_partitions(partitions_path)
partitions = meta_aln.get_partitioned(partitions_path)

The function meta_aln.get_partitioned(partitions_path) gives me the following error:
AttributeError: 'MetaAlignment' object has no attribute 'remove_empty'

AMAS just broke

Hello,

I love AMAS for my phylogenomics projects. I've had it running for 2 years, and I just realized that it broke sometime over the last month. This is true for several existing installations, as well as a fresh one. I'm not sure whether I broke a Python 3 dependency. I'm running the latest Python, so that might be the cause.

Here's the error I get running on Linux (Debian):

Traceback (most recent call last):
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 2017, in <module>
    main()
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 1985, in main
    meta_aln.write_summaries(kwargs["summary_out"])
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 1514, in write_summaries
    summary_out = self.get_summaries()
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 1452, in get_summaries
    summaries = [alignment.get_summary() for alignment in alignments]
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 1452, in <listcomp>
    summaries = [alignment.get_summary() for alignment in alignments]
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 953, in get_summary
    data = self.summarize_alignment()
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 706, in summarize_alignment
    self.no_missing_ambiguous = self.get_sites_no_missing_ambiguous()
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 784, in get_sites_no_missing_ambiguous
    no_missing_ambiguous_sites = [self.get_site_no_missing_ambiguous(column) for column in range(self.get_alignment_length())]
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 784, in <listcomp>
    no_missing_ambiguous_sites = [self.get_site_no_missing_ambiguous(column) for column in range(self.get_alignment_length())]
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 788, in get_site_no_missing_ambiguous
    site = self.get_column(column)
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 776, in get_column
    return [row[i] for row in self.matrix]
  File "/mnt/griffin/chrwhe/software/AMAS/amas/AMAS.py", line 776, in <listcomp>
    return [row[i] for row in self.matrix]
IndexError: list index out of range
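
The traceback ends in get_column, where row[i] is read for every row of the matrix. One possible cause, and this is only an assumption since the input file is not shown, is that some sequences are shorter than others, so the input is not a proper alignment. A minimal sketch for checking this before running AMAS, assuming FASTA input (the helper names read_fasta_lengths and ragged_records are hypothetical and not part of AMAS):

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return {record_name: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            lengths[name] += len(line)
    return lengths


def ragged_records(lengths):
    """Return the records whose length differs from the most common length."""
    if not lengths:
        return {}
    expected = Counter(lengths.values()).most_common(1)[0][0]
    return {name: n for name, n in lengths.items() if n != expected}


# Toy input: record "b" is one base short.
example = [">a", "ACGT", "ACGT", ">b", "ACGT", "ACG", ">c", "ACGTACGT"]
print(ragged_records(read_fasta_lengths(example)))
```

For a real file, the lines could come from an open file handle, for example read_fasta_lengths(open("alignment.fas")), where the file name is a placeholder.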

multiprocessing.pool.MaybeEncodingError

Hi,
I have an alignment with 7902 sequences. When I run the command

AMAS.py summary -f fasta -i ./Clade.fasta -d dna -c 8 -o Clade_summary.txt

with any number of threads above 2, it gives me the error below:

Traceback (most recent call last):
  File "/home/cactus/miniconda3/bin/AMAS.py", line 10, in <module>
    sys.exit(main())
  File "/home/cactus/miniconda3/lib/python3.7/site-packages/amas/AMAS.py", line 1982, in main
    meta_aln = MetaAlignment(**kwargs)
  File "/home/cactus/miniconda3/lib/python3.7/site-packages/amas/AMAS.py", line 1039, in __init__
    self.alignment_objects = self.get_alignment_objects()
  File "/home/cactus/miniconda3/lib/python3.7/site-packages/amas/AMAS.py", line 1354, in get_alignment_objects
    alignments = pool.map(self.get_alignment_object, self.in_files)
  File "/home/cactus/miniconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/cactus/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<amas.AMAS.DNAAlignment object at 0x2b6421bdcba8>]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647")'

Please help!

Miao
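
The bounds in the error message are those of a signed 32-bit integer. As far as I know, multiprocessing in Python 3.7 and earlier writes the size of each pickled result into a 32-bit length field, so a worker cannot send back a result of 2 GiB or more; Python 3.8 relaxed this. If that is what happens here, running with a single core or a newer Python might avoid the error, though neither is confirmed in this thread. The sketch below only reproduces the limit itself with the standard struct module (fits_in_int32_header is a hypothetical helper name):

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the error


def fits_in_int32_header(n_bytes):
    """Return True if n_bytes can be packed as a signed 32-bit integer."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


print(fits_in_int32_header(INT32_MAX))      # largest size that still fits
print(fits_in_int32_header(INT32_MAX + 1))  # one byte more does not fit
```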

Concat of interleaved nexus files produces erroneous alignments, but without error

Hi Marek --

Thanks so much for writing AMAS. It's been a useful and easy program to use. I've been using the concat function a lot, in particular to concatenate hundreds of UCE nexus alignments. I recognize that users can, and should, properly specify the alignment input format, but I noticed that AMAS still ran (returned no error) when the input format was misspecified. In this case I was able to concat interleaved nexus files despite incorrectly using -f nexus rather than -f nexus-int. It ran without error, but it produced concatenated alignments of odd length (e.g., shorter than the input alignments combined). It was only after checking the size of the resulting concatenated files that I noticed something was off. It might not be something to be concerned about, but I figured you might want to know.
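
Since the only symptom described here is a concatenated alignment shorter than its inputs combined, a simple guard is to compare the two lengths after every concat run. A minimal sketch, assuming the per-locus and concatenated lengths have already been obtained by some other means (check_concat_length is a hypothetical helper, not an AMAS function):

```python
def check_concat_length(input_lengths, concat_length):
    """Raise ValueError unless the concatenated length equals the sum of inputs."""
    expected = sum(input_lengths)
    if concat_length != expected:
        raise ValueError(
            "concatenated alignment has %d sites, expected %d"
            % (concat_length, expected)
        )
    return expected


# Three loci of 100, 250 and 75 sites should concatenate to 425 sites.
print(check_concat_length([100, 250, 75], 425))
```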

AMAS with many genes

Under certain conditions, AMAS does not work if the number of files to concatenate (or to summarize) exceeds a limit, e.g., 5,000. The system reports an 'Argument list too long' error if, e.g., '*.fas' is used to define the input files. Would it be possible to supply the input file names through a file that lists them? Or do you have any other solution? Thanks, Tomas
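
The 'Argument list too long' error comes from the operating system's limit on command-line size, not from AMAS itself. One way around it is to read the names from a list file in Python and hand them to AMAS as a module. A minimal sketch; parse_file_list is a hypothetical helper, "alignments.txt" is a placeholder name, and the MetaAlignment call in the comment copies the form used in another issue on this page and is not verified here:

```python
def parse_file_list(lines):
    """Return file names listed one per line, skipping blanks and # comments."""
    names = []
    for raw in lines:
        entry = raw.strip()
        if entry and not entry.startswith("#"):
            names.append(entry)
    return names


listing = ["# UCE loci", "locus1.fas", "", "  locus2.fas  ", "locus3.fas"]
in_files = parse_file_list(listing)
print(in_files)

# With a real list file:
#     with open("alignments.txt") as handle:
#         in_files = parse_file_list(handle)
# The list could then be passed on, assuming the module interface shown elsewhere:
#     from amas import AMAS
#     meta_aln = AMAS.MetaAlignment(in_files=in_files, data_type="dna",
#                                   in_format="fasta", cores=1)
```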

Invalid argument *.fasta

Good morning,

First of all, congratulations for AMAS, it is an extremely useful and powerful tool.

I am trying to concatenate 1065 fasta files. For this I wanted to use the command

C:>python3 AMAS.py summary -f fasta -d dna -i *.fasta -c 4

but I get the error attached below. (I tried changing the pattern to *fasta and I get the same error.) I don't know what I'm doing wrong.

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "XXX\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Program FilesXXX\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "C:\AMAS-master\amas\AMAS.py", line 1352, in get_alignment_object
    aln = DNAAlignment(alignment, self.in_format, self.data_type)
  File "C:\AMAS-master\amas\AMAS.py", line 682, in __init__
    self.parsed_aln = self.get_parsed_aln()
  File "C:\AMAS-master\amas\AMAS.py", line 694, in get_parsed_aln
    aln_input = self.get_aln_input()
  File "C:\AMAS-master\amas\AMAS.py", line 689, in get_aln_input
    aln_input = FileParser(self.in_file)
  File "C:\AMAS-master\amas\AMAS.py", line 435, in __init__
    with FileHandler(in_file) as handle:
  File "C:\AMAS-master\amas\AMAS.py", line 418, in __enter__
    self.in_file = open(self.file_name, "r")
OSError: [Errno 22] Invalid argument: '*.fasta'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\AMAS-master\amas\AMAS.py", line 2075, in <module>
    main()
  File "C:\AMAS-master\amas\AMAS.py", line 2040, in main
    meta_aln = MetaAlignment(**kwargs)
  File "C:\AMAS-master\amas\AMAS.py", line 1047, in __init__
    self.alignment_objects = self.get_alignment_objects()
  File "C:\AMAS-master\amas\AMAS.py", line 1362, in get_alignment_objects
    alignments = pool.map(self.get_alignment_object, self.in_files)
  File "C:\Program Files\XXX\lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\XXX\pool.py", line 774, in get
    raise self._value
OSError: [Errno 22] Invalid argument: '*.fasta'
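
The error shows the literal string '*.fasta' reaching open(). The Windows command prompt does not expand wildcards the way Unix shells do, so the pattern arrives unexpanded. One workaround is to expand it in Python before the names are used. A minimal sketch with the standard glob module; expand_patterns is a hypothetical helper, and the MetaAlignment call in the comment copies the form used in another issue on this page and is not verified here:

```python
import glob


def expand_patterns(patterns):
    """Expand shell-style wildcards; keep entries that match nothing unchanged."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # An unmatched pattern is kept so the later "file not found" stays visible.
        files.extend(matches if matches else [pattern])
    return files


in_files = expand_patterns(["*.fasta"])
print(len(in_files), "input name(s)")

# The expanded list could then be passed on, assuming the module interface
# shown elsewhere on this page:
#     from amas import AMAS
#     meta_aln = AMAS.MetaAlignment(in_files=in_files, data_type="dna",
#                                   in_format="fasta", cores=4)
```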

Cannot Generate by Taxon Summary Statistics

Hello,

I'm unable to generate summary statistics using the summary command for a fasta file of dna sequences. I've attached the fasta file and the output file "summary.txt".

Here's the command I used:

amas summary -i Noctuidae_Amphipyrinae_Manuscript_1_v001.fasta -f fasta -d dna --by-taxon

Best,
Kevin
fasta_summary.zip

-n flag not working for concat

Hello-

I am trying to concatenate genes with a partition file formatted by codon position. I am running the following:

python3 AMAS.py concat -f fasta -d dna -i renamed_for_mcmctree_* -n 12 -u phylip -t full_concat_align_for_mcmctree.out

But I get the message:

AMAS.py: error: unrecognized arguments: -n 12

I have tried using --codons instead of -n, and also -n 123, but I get the same message.

The line of code above works fine if I remove the "-n" flag.

Added AMAS to bioconda

Hi! Thanks for the great package, I really prefer it to Fasconcat.
Not really an issue, but I've added AMAS to bioconda, if you want to update the installation instructions:
conda install -c bioconda amas
Cheers!
Matt

Partition to codon position

Currently it doesn't seem to be possible to output partition files with codon positions when concatenating loci. Would it be possible to add an option for this, e.g., generating a partition file like either of these?

DNA, p1=1-60\3,2-60\3
DNA, p2=3-60\3

DNA, p1=1-60\3
DNA, p2=2-60\3
DNA, p3=3-60\3

It would also be great to be able to split based on such partition files.
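
To make the requested output concrete, here is a minimal sketch that produces partition lines in the two layouts shown above for a single locus. It is an illustration of the request, not an existing AMAS option; codon_partitions and its parameters are hypothetical:

```python
def codon_partitions(start, end, merge_first_two=False, data_type="DNA"):
    """Return codon-position partition lines for one locus spanning start..end."""
    first = "%d-%d\\3" % (start, end)
    second = "%d-%d\\3" % (start + 1, end)
    third = "%d-%d\\3" % (start + 2, end)
    if merge_first_two:
        # Positions 1 and 2 share a partition, position 3 gets its own.
        groups = [first + "," + second, third]
    else:
        groups = [first, second, third]
    return [
        "%s, p%d=%s" % (data_type, number, group)
        for number, group in enumerate(groups, start=1)
    ]


for line in codon_partitions(1, 60, merge_first_two=True):
    print(line)
print()
for line in codon_partitions(1, 60):
    print(line)
```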

ignore Ns for uSNP selection

After running ipyrad, my resulting dataset with full loci has an aligned length of 255,532 bp, 72.6% missing site data, and 7.8% variable sites. For the unlinked SNP selection from this set, I would expect 100% of the resulting sites to be variable. Instead, the uSNP alignment with 694 sites has only 28.2% variable sites. The only explanation I have is that ipyrad counts Ns (missing data) as candidate SNPs when selecting one unlinked SNP per locus. I think that would be unwanted behavior?
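
To illustrate the behavior being asked for, here is a minimal sketch of a variability test that ignores missing data. It does not reflect how ipyrad or AMAS actually select or count sites; the function names are hypothetical, and IUPAC ambiguity codes such as R or Y are treated as ordinary states here, which is itself an assumption:

```python
MISSING = set("Nn-?")


def is_variable(column, missing=MISSING):
    """Return True if the column has more than one state once missing data is dropped."""
    states = {base.upper() for base in column if base not in missing}
    return len(states) > 1


def variable_sites(sequences):
    """Return 0-based indices of variable columns in equal-length sequences."""
    return [
        index
        for index, column in enumerate(zip(*sequences))
        if is_variable(column)
    ]


# Columns 0 and 4 differ only by N, so only column 3 counts as variable.
locus = ["ACGTN", "ACGAN", "NCGTA"]
print(variable_sites(locus))
```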

Summary always returns zero in proportion columns

Hi,

I find that the AMAS summary command returns only zeros in the proportion columns (see the example below). All other values (e.g., Parsimony_informative_sites) appear to be computed correctly.

Alignment_name No_of_taxa Alignment_length Total_matrix_cells Undetermined_characters Missing_percent No_variable_sites Proportion_variable_sites Parsimony_informative_sites Proportion_parsimony_informative .....
A1.aln 43 194 8342 354 0.0 186 0.0 173 0.0 .....
A2.aln 41 164 6724 292 0.0 152 0.0 140 0.0 .....
A3.aln 41 88 3608 48 0.0 64 0.0 53 0.0 .....
A4.aln 40 170 6800 157 0.0 144 0.0 126 0.0 .....

Thanks,
Tim.
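
One possible cause, which is an assumption and not confirmed in this report, is that the script was run under Python 2, where dividing two integers with / truncates the result, so 186 / 194 evaluates to 0. A minimal sketch of the proportion calculation that behaves the same under both Python versions, using the counts from the A1.aln row above (proportion is a hypothetical helper, not AMAS code):

```python
def proportion(count, total, digits=3):
    """Return count / total as a rounded float, forcing true division."""
    if total == 0:
        return 0.0
    return round(float(count) / total, digits)


# Counts taken from the A1.aln row of the table above.
alignment_length = 194
total_cells = 8342
undetermined = 354
variable = 186
informative = 173

print("Missing_percent", 100 * proportion(undetermined, total_cells, digits=5))
print("Proportion_variable_sites", proportion(variable, alignment_length))
print("Proportion_parsimony_informative", proportion(informative, alignment_length))
```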

question su

Are ambiguous states or missing data counted in the number of parsimony informative characters? I would think not, but I am getting pretty high numbers, so I wanted to double check. I couldn't find this information in the manuscript.
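
For reference, the usual definition is that a site is parsimony informative when at least two different states each occur in at least two taxa. The sketch below applies that definition to DNA while ignoring everything except A, C, G and T. Whether AMAS excludes ambiguous and missing characters in the same way is not stated in this thread, so treat this as an independent check and not as a description of AMAS (the function name is hypothetical):

```python
from collections import Counter

UNAMBIGUOUS_DNA = set("ACGT")


def is_parsimony_informative(column, valid=UNAMBIGUOUS_DNA):
    """Return True if at least two valid states each occur in two or more taxa."""
    counts = Counter(base.upper() for base in column if base.upper() in valid)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


print(is_parsimony_informative("AACC"))  # two states, each seen twice
print(is_parsimony_informative("AACN"))  # C occurs once, N is ignored
print(is_parsimony_informative("ACGT"))  # variable, but every state is a singleton
```

Counting the columns of an alignment that pass this test and comparing the total with the Parsimony_informative_sites value would show whether the two approaches agree.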