vib-psb / i-adhore

i-ADHoRe is a highly sensitive software tool to detect degenerated homology relations within and between different genomes.

License: GNU Affero General Public License v3.0

Perl 29.36% CMake 0.85% Dockerfile 0.03% C++ 69.66% C 0.10%

i-adhore's Introduction

i-ADHoRe 3

The i-ADHoRe algorithm is based on the original ADHoRe algorithm. After the basic ADHoRe algorithm detects initial pairs of collinear segments, these pairs are aligned to each other to form a profile that combines their gene order and gene content information. This profile is then used to detect additional homologous segments that show conserved gene order and content when compared to the profile rather than to individual segments. When such an additional segment is found, it is added to the profile as well, and the search is repeated until no more segments can be found. All results are written to tab-delimited text files.
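To make the first step more concrete, here is a toy Python sketch (not i-ADHoRe's implementation) of how homologous gene pairs between two gene lists can be placed in a gene homology matrix (GHM) and grouped into collinear runs. The function names and the `max_gap`/`min_anchors` parameters are hypothetical, loosely analogous to `gap_size` and `anchor_points`. It only chains points lying exactly on the same forward diagonal; the real tool also handles inverted segments, tandem duplicates, statistical testing and the iterative profile search described above.

```python
from collections import defaultdict


def ghm_points(list_x, list_y, pairs):
    """Return sorted (i, j) cells of a toy gene homology matrix (GHM):
    list_x[i] and list_y[j] are listed as homologous in `pairs`."""
    homologs = defaultdict(set)
    for a, b in pairs:  # treat homology as symmetric
        homologs[a].add(b)
        homologs[b].add(a)
    positions_y = defaultdict(list)
    for j, gene in enumerate(list_y):
        positions_y[gene].append(j)
    points = set()
    for i, gene in enumerate(list_x):
        for partner in homologs.get(gene, ()):
            for j in positions_y.get(partner, ()):
                points.add((i, j))
    return sorted(points)


def diagonal_runs(points, max_gap, min_anchors):
    """Chain points on the same forward diagonal (constant j - i) when
    consecutive points are at most `max_gap` rows apart; keep chains with
    at least `min_anchors` points."""
    by_diagonal = defaultdict(list)
    for i, j in points:
        by_diagonal[j - i].append((i, j))
    runs = []
    for diagonal in sorted(by_diagonal):
        chain = []
        for point in sorted(by_diagonal[diagonal]):
            if chain and point[0] - chain[-1][0] > max_gap:
                if len(chain) >= min_anchors:
                    runs.append(chain)
                chain = []
            chain.append(point)
        if len(chain) >= min_anchors:
            runs.append(chain)
    return runs


if __name__ == "__main__":
    genes_x = ["a1", "a2", "a3", "a4", "a5"]
    genes_y = ["b1", "bx", "b2", "b3", "b5"]
    homolog_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b5")]
    points = ghm_points(genes_x, genes_y, homolog_pairs)
    print(points)  # [(0, 0), (1, 2), (2, 3), (4, 4)]
    print(diagonal_runs(points, max_gap=2, min_anchors=2))  # [[(1, 2), (2, 3)]]
```

Raising `max_gap` to 4 in this toy example also merges the two anchors on diagonal 0 into a run, which loosely mirrors the trade-off a larger gap size is meant to control.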

More information can be found on bioinformatics.psb.ugent.be.

Basic Installation on Linux

This package requires CMake to build the software.

Installing CMake

As root, execute the following commands:

a) on Red Hat / Fedora distributions

yum install cmake

b) on Ubuntu / Debian distributions

sudo apt-get install cmake

If you encounter problems, try installing the build-essential and libpng-dev packages using the following command:
sudo apt-get install build-essential libpng-dev 

Installing i-ADHoRe

mkdir build
cd build
cmake ..

A useful option for the cmake command is CMAKE_INSTALL_PREFIX, which tells cmake where to install the software. For example, to install in your local $HOME/i-adhore directory, run:

cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/i-adhore

Afterwards run:

make

And to install run (as root if necessary):

make install

A Pthreads library is required. Support for MPI and for the Googletest unit testing framework is optional.

Running the test cases:

1) Dataset I

Starting the simulation:

cd testset/datasetI
./i-adhore datasetI.ini

(runs for about 1 minute)

Dataset I consists of the Arabidopsis thaliana genome. The .ini file contains a "compareAligners" flag, which will compare different alignment methods. The results are summarized at the end of the run.

2) Dataset II

Starting the simulation:

cd testset/datasetII
./i-adhore datasetII.ini

(runs for about 2-3 hours)

Dataset II consists of the Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa genomes. As in dataset I, the goal is to compare the different alignment methods.

Run with docker

docker run vibpsb/i-adhore i-adhore <parameters>

Note that the input files need to be mounted inside the container.

Citations

Proost, S., Fostier, J., De Witte, D., Dhoedt, B., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2012). i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic acids research, 40(2), e11-e11.
Fostier, J., Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2011). A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. Bioinformatics, 27(6), 749-756.
Simillion, C., Janssens, K., Sterck, L., & Van de Peer, Y. (2008). i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles. Bioinformatics, 24(1), 127-128.

i-adhore's People


arzwa altingia wook2014 stephrom

i-adhore's Issues

Script

Hi everyone, could you please help me with a bash script for running i-adhore to infer synteny for Circos plots? Thanks in advance for your help.

problem installing i-adhore 3.0.01

Hi,
CMake runs without errors or warnings, but make throws the following errors:

[ 57%] Linking CXX executable i-adhore
/usr/bin/ld: CMakeFiles/i-adhore.dir/higherLevel.cpp.o: in function `void std::deque<Multiplicon*, std::allocator<Multiplicon*> >::_M_push_front_aux<Multiplicon* const&>(Multiplicon* const&)':
higherLevel.cpp:(.text._ZNSt5dequeIP11MultipliconSaIS1_EE17_M_push_front_auxIJRKS1_EEEvDpOT_[_ZNSt5dequeIP11MultipliconSaIS1_EE17_M_push_front_auxIJRKS1_EEEvDpOT_]+0x22f): undefined reference to `std::__throw_bad_array_new_length()'
/usr/bin/ld: CMakeFiles/i-adhore.dir/AlignmentDrawer.cpp.o: in function `AlignmentDrawer::AlignmentDrawer(Profile const*)':
AlignmentDrawer.cpp:(.text+0xbb6): undefined reference to `std::__throw_bad_array_new_length()'
/usr/bin/ld: CMakeFiles/i-adhore.dir/AlignmentDrawer.cpp.o: in function `AlignmentDrawer::AlignmentDrawer(std::vector<GeneList*, std::allocator<GeneList*> > const&)':
AlignmentDrawer.cpp:(.text+0xcd6): undefined reference to `std::__throw_bad_array_new_length()'
/usr/bin/ld: CMakeFiles/i-adhore.dir/parallel.cpp.o: in function `ParToolBox::statPartWorklSort(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> >&, std::vector<unsigned long, std::allocator<unsigned long> >&, std::vector<unsigned int, std::allocator<unsigned int> >&, std::vector<unsigned long, std::allocator<unsigned long> >&, std::map<unsigned long, unsigned int, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned int> > >&)':
parallel.cpp:(.text+0xbe9): undefined reference to `std::__throw_bad_array_new_length()'
/usr/bin/ld: CMakeFiles/i-adhore.dir/DataSet.cpp.o: in function `DataSet::printProfiles()':
DataSet.cpp:(.text+0x8618): undefined reference to `std::__throw_bad_array_new_length()'
/usr/bin/ld: CMakeFiles/i-adhore.dir/GHM.cpp.o:GHM.cpp:(.text+0x35d4): more undefined references to `std::__throw_bad_array_new_length()' follow
collect2: error: ld returned 1 exit status
make[2]: *** [src/CMakeFiles/i-adhore.dir/build.make:471: src/i-adhore] Error 1
make[1]: *** [CMakeFiles/Makefile2:172: src/CMakeFiles/i-adhore.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I am using cmake version 3.25.1, make v4.3 and gcc v11.4.0

Thanks,
Mark

Aborted (core dumped)

Hi,

i-adhore: /mnt/DATA/Software/i-ADHoRe/src/Profile.cpp:95: void Profile::createNodes(const std::set&, const std::vector&, std::vector<std::vector<Node*> >&, bool, bool) const: Assertion `geneX.isPairWith(geneY)' failed.

How can I solve that?

Empty Output

Hello,
I am trying to run i-ADHoRe on human and mouse using the output files from OrthoFinder. For this purpose, I created the blast_table file Orthologs.list, which looks like this:
ENSG00000125498 OG0000000
ENSG00000273510 OG0000000
ENSG00000273517 OG0000000
ENSG00000273578 OG0000000
ENSG00000273603 OG0000000
ENSG00000273661 OG0000000
ENSG00000273794 OG0000000
ENSG00000274406 OG0000000
ENSG00000274412 OG0000000
ENSG00000274438 OG0000000
The iadhore.ini file I used for the analysis is as follows:
genome=Human
ENSG00000000003 query/ENSG00000000003.lst
ENSG00000000005 query/ENSG00000000005.lst
...
genome=Mouse
ENSMUSG00000000001 subject/ENSMUSG00000000001.lst
ENSMUSG00000000003 subject/ENSMUSG00000000003.lst
ENSMUSG00000000028 subject/ENSMUSG00000000028.lst
ENSMUSG00000000037 subject/ENSMUSG00000000037.lst
ENSMUSG00000000049 subject/ENSMUSG00000000049.lst
ENSMUSG00000000056 subject/ENSMUSG00000000056.lst
ENSMUSG00000000058 subject/ENSMUSG00000000058.lst
ENSMUSG00000000078 subject/ENSMUSG00000000078.lst
ENSMUSG00000000085 subject/ENSMUSG00000000085.lst
ENSMUSG00000000088 subject/ENSMUSG00000000088.lst
....
blast_table=Orthologs.list
table_type=family
number_of_threads=16
visualizeAlignment=false
output_path=output
level_2_only=true
alignment_method=gg2
prob_cutoff=0.001
anchor_points=3
gap_size=15
cluster_gap=20
q_value=0.05
I can run the analysis without errors using i-adhore iadhore.ini. However, the resulting alignment.txt file is empty, and in the genes.txt file the columns "coordinate, remapped_coordinate, is_tandem, is_tandem_representative, tandem_representative, remapped" are all 0. The other files contain only column headers. Can you help me find what I am missing?
Thank you in advance.
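For reference when preparing inputs: each genelist listed under a genome= line is meant to hold an ordered sequence of genes (in the configurations on this page, typically one chromosome, contig or scaffold), since the algorithm searches for conserved gene order. The sketch below is a hypothetical helper, `write_genelists`, that writes one .lst file per chromosome from (gene, chromosome, start, strand) records sorted by start coordinate. The one-gene-per-line format with a trailing + or - orientation is an assumption based on our reading of the i-ADHoRe manual; compare it against the bundled test sets before relying on it.

```python
import os
from collections import defaultdict


def write_genelists(records, out_dir):
    """Write one <chromosome>.lst file per chromosome (hypothetical helper).

    `records` are (gene_id, chromosome, start, strand) tuples; genes are
    ordered by start coordinate and written as e.g. "GENE_A+" (assumed
    i-ADHoRe .lst convention). Returns {chromosome: path} for the .ini file.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, strand in records:
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r} for {gene_id}")
        by_chrom[chrom].append((start, gene_id, strand))
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for chrom in sorted(by_chrom):
        path = os.path.join(out_dir, f"{chrom}.lst")
        with open(path, "w") as handle:
            # sort by start coordinate so the file reflects gene order
            for _start, gene_id, strand in sorted(by_chrom[chrom]):
                handle.write(f"{gene_id}{strand}\n")
        paths[chrom] = path
    return paths


if __name__ == "__main__":
    demo = [  # hypothetical gene records
        ("GENE_B", "chr1", 5000, "-"),
        ("GENE_A", "chr1", 1000, "+"),
        ("GENE_C", "chr2", 200, "+"),
    ]
    for chrom, path in write_genelists(demo, "query").items():
        print(chrom, path)  # "listname path" lines for a genome= section
```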

Segmentation fault

Hello,
We have been getting a segmentation fault with the cloud analysis. We suspect this is because MPI was not set up correctly during installation, but nothing we have tried has helped. Do you have any suggestions? The collinear analysis works fine.
Thanks!

Problems installing i-adhore

Hello,

I installed i-adhore on my lab's server, and no issues were reported during the process. However, when I try to run the example, I get the following error message:
i-adhore: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory

Any ideas about what's wrong?

Thanks in advance

Same chromosome comparison

Hello,

The number of anchor points in same-chromosome comparisons is much lower than the number of genes on the chromosome, although blastp computes the similarity of each gene with itself.

Why does the diagonal show so few anchor points in this plot? (There are 40 anchor points for MD01-MD01, 5 anchor points for MD03-MD03, ...)

Is this behavior expected from i-ADHoRe? I would guess not, based on Figure S3 of the publication "Phylogenetic reconstruction based on synteny block and gene adjacencies".

So what am I doing wrong?

Here are the parameters I used:

blast_table = tmp/blast/all_vs_all.txt

output_path = tmp/iadhore/

tandem_gap = 3
alignment_method = gg2
gap_size = 30
cluster_gap = 35
q_value = 0.75
prob_cutoff = 0.01
anchor_points = 5
level_2_only = true
number_of_threads = 7

genome = Prunus_persica
Pp01 tmp/data/PP_lst/Pp01.lst
Pp02 tmp/data/PP_lst/Pp02.lst
Pp03 tmp/data/PP_lst/Pp03.lst
Pp04 tmp/data/PP_lst/Pp04.lst
Pp05 tmp/data/PP_lst/Pp05.lst
Pp06 tmp/data/PP_lst/Pp06.lst
Pp07 tmp/data/PP_lst/Pp07.lst
Pp08 tmp/data/PP_lst/Pp08.lst

genome = Malus_domestica
Md01 tmp/data/MD_lst/Md01.lst
Md02 tmp/data/MD_lst/Md02.lst
Md03 tmp/data/MD_lst/Md03.lst
Md04 tmp/data/MD_lst/Md04.lst
Md05 tmp/data/MD_lst/Md05.lst
Md06 tmp/data/MD_lst/Md06.lst
Md07 tmp/data/MD_lst/Md07.lst
Md08 tmp/data/MD_lst/Md08.lst
Md09 tmp/data/MD_lst/Md09.lst
Md10 tmp/data/MD_lst/Md10.lst
Md11 tmp/data/MD_lst/Md11.lst
Md12 tmp/data/MD_lst/Md12.lst
Md13 tmp/data/MD_lst/Md13.lst
Md14 tmp/data/MD_lst/Md14.lst
Md15 tmp/data/MD_lst/Md15.lst
Md16 tmp/data/MD_lst/Md16.lst
Md17 tmp/data/MD_lst/Md17.lst

Thank you!
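Long genelist sections like the ones in this post can be generated rather than typed by hand. Below is a minimal, hypothetical `build_ini` sketch that produces .ini text in the layout used throughout this page (a genome= line followed by one "listname path" line per .lst file, then key=value settings). Deriving each list name from the file name is an assumption made for illustration, and the parameter values in the demo are simply copied from the post above, not recommendations.

```python
from pathlib import Path


def build_ini(genomes, settings):
    """Return i-ADHoRe-style .ini text (hypothetical helper).

    `genomes` maps a genome name to a directory of .lst files; each file
    becomes a "listname path" line, with listname taken from the file stem.
    `settings` maps parameter names to values, written as "key=value".
    """
    lines = []
    for genome, directory in genomes.items():
        lines.append(f"genome={genome}")
        for lst in sorted(Path(directory).glob("*.lst")):
            lines.append(f"{lst.stem} {lst.as_posix()}")
        lines.append("")  # blank line between genome sections
    for key, value in settings.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_ini(
        {"Prunus_persica": "tmp/data/PP_lst", "Malus_domestica": "tmp/data/MD_lst"},
        {
            "blast_table": "tmp/blast/all_vs_all.txt",
            "output_path": "tmp/iadhore/",
            "alignment_method": "gg2",
            "anchor_points": 5,
        },
    )
    print(text, end="")
```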

Citation

Hi Bert, I think the citation section in the README should be updated. I believe the relevant citation for i-ADHoRe 3.0 is:

Proost, S., Fostier, J., De Witte, D., Dhoedt, B., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2012). i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic acids research, 40(2), e11-e11.

and perhaps also:

Fostier, J., Proost, S., Dhoedt, B., Saeys, Y., Demeester, P., Van de Peer, Y., & Vandepoele, K. (2011). A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. Bioinformatics, 27(6), 749-756.

(Of course the reference to older work is nice to include too).

Bug (Aborted (core dumped)): comparison of human and chimpanzee species

i-adhore: /data1/tools/i-ADHoRe/src/Profile.cpp:95: void Profile::createNodes(const std::set&, const std::vector<GeneList*>&, std::vector<std::vector<Node*> >&, bool, bool) const: Assertion `geneX.isPairWith(geneY)' failed.
374 multiplicons to evaluate - evaluating level 4 multiplicon... Aborted (core dumped).
Looking for a solution!

ERROR: Genelist not found! - i-ADHoRe

Any suggestions?

I am using the following .ini file:

genome= myco
contig_1 myco/contig_1.lst
contig_3 myco/contig_3.lst
contig_4 myco/contig_4.lst
contig_7 myco/contig_7.lst
contig_8 myco/contig_8.lst
contig_9 myco/contig_9.lst
contig_10 myco/contig_10.lst
contig_13 myco/contig_13.lst
contig_16 myco/contig_16.lst
contig_21 myco/contig_21.lst
contig_27 mycocontig_27.lst
scaffold_12 myco/scaffold_12.lst
scaffold_17 myco/scaffold_17.lst

genome= dend
1 dend/1.lst
2 dendr/2.lst
3 dend/3.lst
4 dend/4.lst
5 dend/5.lst
6 dend/6.lst
7 dend/7.lst
8 dend/8.lst

blast_table= Cl_myco_vs_dendr_all.csv

output_path= myco_dend_output/

alignment_method=gg2
gap_size=30
cluster_gap=35
q_value=0.75
prob_cutoff=0.01
anchor_points=3
level_2_only=false
number_of_threads=4
multiple_hypothesis_correction=FDR
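Since the error reports a genelist that cannot be found, one quick check is to confirm that every path listed under a genome= section exists before running i-ADHoRe. The `missing_genelists` helper below is hypothetical, and its parsing rules (a genome= line opens a section, any other line containing = is a parameter, remaining non-blank lines are "listname path" pairs resolved against a base directory that defaults to the current one) are assumptions inferred from the configuration files on this page, not taken from i-ADHoRe's own parser.

```python
import os


def missing_genelists(ini_text, base_dir="."):
    """Return (genome, listname, path) for each genelist file not found.

    Assumed layout: "genome=NAME" starts a section; lines with "=" are
    parameters; other non-blank lines are "listname path" pairs.
    """
    missing = []
    genome = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            # any key=value line other than genome= ends the genome section
            genome = value.strip() if key.strip() == "genome" else None
            continue
        if genome is None:
            continue  # not inside a genome section
        parts = line.split()
        if len(parts) != 2:
            missing.append((genome, line, "<malformed line>"))
            continue
        listname, path = parts
        if not os.path.isfile(os.path.join(base_dir, path)):
            missing.append((genome, listname, path))
    return missing


if __name__ == "__main__":
    example = (
        "genome= myco\n"
        "contig_1 myco/contig_1.lst\n"
        "contig_27 mycocontig_27.lst\n"
        "\n"
        "blast_table= Cl_myco_vs_dendr_all.csv\n"
    )
    for genome, listname, path in missing_genelists(example):
        print(f"{genome}: {listname} -> {path} not found")
```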

Is there documentation?

Dear i-ADHoRe 3.0 team,

Your software seems to do a great job, and quickly, at least as far as I can tell from using it within the wgd v2 pipeline. Thank you for making it!

However, as I tried to make more use of its output, I could not find explanations of what some of the output fields mean. I looked for more help on the website referenced in the README, but it only points to the paper describing the algorithm. My attempts to display a help message on the command line were also unsuccessful.

My question is: is there a resource that explains the meaning of all output files and their fields, as well as the structure of the config file?

If helping me directly with my use case would be easier: I want to break my genome assembly into the longest collinear blocks (i.e. get two fragments of the assembly for every non-overlapping longest self-collinearity hit). With this I am trying to recover the "subgenomes" of an ancient polyploid that has experienced a few homoeologous recombination events, for further phasing.

anchor points in synteny/cloud mode

Dear authors,

I am using the Docker version of i-adhore 3.0.

I am looking at synteny within the Aristolochia genome, and gene order is not conserved in the region I am looking at, so I tried the synteny (cloud) mode instead of the collinearity mode.

However, since I get many detections with the default parameters, I tried to be more restrictive by increasing the number of anchor points (among other things). No matter how many anchor points I set, I still get clouds with as few as 3 anchor points. I don't see this behaviour in collinearity mode.

Is this behaviour expected?

The parameters I used:
************* i-ADHoRe parameters *************
Number of genelists = 7
Blast table = families_blast_orthofinder.tsv
Output path = unassigned/plaza/aristolochia_61460_synteny_25_6_1e-3_q085/
Gap size = 0
Cluster gap size = 0
Cloud gap size = 25
Cloud cluster gap size = 30
Max gaps in alignment = 0
Tandem gap = 15
Flush output = 1000
Q-value = 0.85
Anchorpoints = 6
Probability cutoff = 0.001
Cloud filtering method = Binomial
Level 2 only = true
Use family = true
Write statistics = true
Alignment method = GreedyGraphbased4
Multiple hypothesis correction = FDR
Number of threads = 4
Compare aligners = false
Synthenic cloud searches only
Visualize GHM.png = false
Visualize Alignment = true
Verbose output = true
************ END i-AdDHoRe parameters *********

and I get results in the clouds.txt file like this (this is just the beginning of the file):
id genome_x list_x genome_y list_y number_of_anchorpoints cloud_density dim_x dim_y
1 afi 2 afi 2 53 0.317365 328 167
2 afi 2 afi 2 41 0.577465 71 206
3 afi 2 afi 2 37 0.72549 193 51
4 afi 2 afi 2 16 0.225352 95 71
5 afi 2 afi 2 24 0.230769 176 104
6 afi 2 afi 2 15 0.214286 88 70
7 afi 2 afi 2 14 0.285714 125 49
8 afi 2 afi 2 64 0.412903 155 264
9 afi 2 afi 2 3 1 3 3
10 afi 2 afi 2 17 0.223684 171 76
11 afi 2 afi 2 5 0.277778 18 18
12 afi 2 afi 2 9 0.214286 73 42
13 afi 2 afi 2 7 0.225806 47 31
14 afi 2 afi 2 5 0.25 20 21
15 afi 2 afi 2 5 0.238095 21 21
16 afi 2 afi 2 4 0.4 10 20
17 afi 2 afi 2 4 0.571429 7 29
18 afi 2 afi 2 3 1.5 24 2
19 afi 2 afi 2 9 0.2 45 84

Maybe there is an option I missed...
Thank you for your help!

Marie
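If only larger clouds are of interest, they can be filtered after the run. The hypothetical `filter_clouds` sketch below keeps clouds.txt rows whose number_of_anchorpoints is at least a threshold. It assumes the file is tab-delimited with the header shown above (the README states that results are written as tab-delimited text), and the threshold of 6 in the demo simply mirrors the Anchorpoints value listed in the parameters. It only post-filters the table; it does not change which clouds i-ADHoRe detects or any of its statistics.

```python
import csv


def filter_clouds(in_path, out_path, min_anchorpoints):
    """Copy rows of a clouds.txt file whose number_of_anchorpoints is at
    least `min_anchorpoints`; return the number of rows kept."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        kept = 0
        for row in reader:
            if int(row["number_of_anchorpoints"]) >= min_anchorpoints:
                writer.writerow(row)
                kept += 1
    return kept


if __name__ == "__main__":
    # Demo input: three rows copied from the clouds.txt excerpt above.
    sample = (
        "id\tgenome_x\tlist_x\tgenome_y\tlist_y\tnumber_of_anchorpoints"
        "\tcloud_density\tdim_x\tdim_y\n"
        "8\tafi\t2\tafi\t2\t64\t0.412903\t155\t264\n"
        "9\tafi\t2\tafi\t2\t3\t1\t3\t3\n"
        "18\tafi\t2\tafi\t2\t3\t1.5\t24\t2\n"
    )
    with open("clouds_sample.txt", "w") as handle:
        handle.write(sample)
    print(filter_clouds("clouds_sample.txt", "clouds_min6.txt", 6))  # 1
```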