oushujun / EDTA
Extensive de-novo TE Annotator
Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
License: GNU General Public License v3.0
Hello,
I am trying to use EDTA to annotate an avian genome (as a test, I am running it on a single chromosome), but I keep running into an error.
I installed it with conda, following the script on your GitHub page.
Here is exactly what I execute to run your script:
PATH=$PATH:/home/tkastylevsky/EDTA
cd
cd /home/tkastylevsky/FASTA_files/EDTA/gallus_gallus/chr1/
EDTA.pl -genome chr1.fa -anno 1 -force 1
(I added -force 1 based on a solved issue in this repository, but it didn't help.)
This is what I get (some of the system messages were originally in French):
########################################################
Extensive de-novo TE Annotator (EDTA) v1.7.6
Shujun Ou ([email protected])
########################################################
Wednesday 29 January 2020, 18:08:10 (UTC+0100) Dependency checking:
All passed!
Wednesday 29 January 2020, 18:08:20 (UTC+0100) Obtain raw TE libraries using various structure-based programs:
At least 1 parameter mandatory:
- Input fasta file: --genome
Obtain raw TE libraries using various structure-based programs
perl EDTA_raw.pl [options]
--genome [File] The genome FASTA
--species [rice|maize|others] Specify the species for identification of TIR candidates. Default: others
--type [ltr|tir|helitron|all] Specify which type of raw TE candidates you want to get. Default: all
--overwrite [0|1] If previous results are found, decide to overwrite (1, rerun) or not (0, default).
--threads|-t [int] Number of theads to run this script. Default: 4
--help|-h Display this help info
cat: chr1.fa.mod.LTR.intact.fa: No such file or directory
cat: chr1.fa.mod.TIR.intact.fa: No such file or directory
cat: chr1.fa.mod.Helitron.intact.fa: No such file or directory
cat: chr1.fa.mod.LTR.intact.fa.gff3: No such file or directory
cat: chr1.fa.mod.TIR.intact.fa.gff: No such file or directory
cat: chr1.fa.mod.Helitron.intact.fa.gff: No such file or directory
perl bed2gff.pl EDTA.TE.combo.bed
mv: cannot stat 'chr1.fa.mod.EDTA.intact.bed.gff': No such file or directory
cp: cannot stat 'chr1.fa.mod.EDTA.intact.gff': No such file or directory
Wednesday 29 January 2020, 18:08:20 (UTC+0100) Obtain raw TE libraries finished.
All intact TEs found by EDTA:
chr1.fa.mod.EDTA.intact.fa
chr1.fa.mod.EDTA.intact.gff
Wednesday 29 January 2020, 18:08:20 (UTC+0100) Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:
Genome file chr1.fa.mod not exists!
Perform EDTA basic and advcanced filterings for raw TE candidates and generate the stage 1 library
perl EDTA_processF.pl [options]
-genome [File] The genome FASTA
-ltr [File] The raw LTR library FASTA
-tir [File] The raw TIR library FASTA
-helitron [File] The raw Helitron library FASTA
-mindiff_ltr [float] The minimum fold difference in richness between LTRs and contaminants (default: 1)
-mindiff_tir [float] The minimum fold difference in richness between TIRs and contaminants (default: 1)
-mindiff_hel [float] The minimum fold difference in richness between Helitrons and contaminants (default: 4)
-repeatmasker [path] The directory containing RepeatMasker (default: read from ENV)
-blast [path] The directory containing Blastn (default: read from ENV)
-protlib [File] Protein-coding aa sequences to be removed from TE candidates. (default lib: alluniRefprexp082813 (plant))
You may use uniprot_sprot database available from here:
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/
-threads|-t [int] Number of theads to run this script
-help|-h Display this help info
ERROR: Stage 1 library not found in chr1.fa.mod.EDTA.combine/chr1.fa.mod.LTR.TIR.Helitron.fa.stg1 at /home/tkastylevsky/EDTA/EDTA.pl line 384.
I know from other annotation methods (RepeatModeler) that there are LTRs, TIRs and Helitrons on this chromosome.
Thank you in advance,
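The log shows several steps failing because files they expect were never written: the `*.intact.fa` files read by `cat`, and `chr1.fa.mod` ("Genome file chr1.fa.mod not exists!"). The usage message printed at the start also suggests that EDTA_raw.pl itself did not receive a genome. A minimal sketch for listing which of those expected files are absent or empty in the working directory; the suffixes are copied from this v1.7.6 log and are an assumption for other EDTA versions. It only reports missing files, it does not explain why EDTA skipped them.

```python
from pathlib import Path
from typing import List

# Suffixes copied from the EDTA v1.7.6 log above (assumption: other EDTA
# versions may name their intermediate files differently).
EXPECTED_SUFFIXES = [
    ".mod",                      # the log reports "Genome file chr1.fa.mod not exists!"
    ".mod.LTR.intact.fa",
    ".mod.TIR.intact.fa",
    ".mod.Helitron.intact.fa",
]


def missing_outputs(genome: str, workdir: str = ".") -> List[str]:
    """Return the expected file names that are absent or empty in `workdir`."""
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        path = Path(workdir) / (genome + suffix)
        # A zero-byte file is treated the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in missing_outputs("chr1.fa"):
        print("missing or empty:", name)
```

Run it from the directory where EDTA was launched; an empty result means every listed file exists and is non-empty.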
I think I closed this a bit too early; I have a question that isn't discussed in #8. If we plan to include homology-based TEs from RepBase or Dfam as well as the structure-based TEs from EDTA, do you suggest passing the RepBase/Dfam libraries to the -curatedlib option of the EDTA run? Or should we run EDTA and then concatenate its output with the RepBase/Dfam results?
Originally posted by @Neato-Nick in #18 (comment)
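Which of the two routes is preferable is the maintainer's call and is not answered here. If you take the second route (concatenating after the EDTA run), a minimal sketch that joins several FASTA libraries and keeps only the first record for each sequence ID. The file names in the usage note are hypothetical, and the sketch does not remove sequence-level redundancy between libraries.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concat_libraries(paths: Iterable[str], out_path: str) -> int:
    """Concatenate FASTA libraries, keeping the first record per sequence ID.

    The ID is the first whitespace-delimited word of the header. Returns the
    number of records written. Identical sequences with different IDs are kept."""
    seen: Set[str] = set()
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            for header, seq in read_fasta(path):
                words = header.split()
                seq_id = words[0] if words else ""
                if seq_id in seen:
                    continue  # earlier libraries win on ID clashes
                seen.add(seq_id)
                out.write(">" + header + "\n" + seq + "\n")
                written += 1
    return written
```

For example, `concat_libraries(["repbase_dfam_subset.fa", "genome.fa.mod.EDTA.TElib.fa"], "combined.lib.fa")` (hypothetical names) lets the curated records take precedence when IDs clash.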
Hi Shujun,
I am running EDTA.pl in a conda environment with --threads 30. The 'Identify LTR' step finished in less than one day, but the 'Identify TIR' step has been running for six days now. I've also noticed that this process is using only one CPU. Is this normal?
##### Extensive de-novo TE Annotator (EDTA) v1.7.9 ####
##### Shujun Ou ([email protected]) ####
########################################################
Mon Feb 3 19:37:57 -02 2020 Dependency checking:
All passed!
Mon Feb 3 19:38:41 -02 2020 Obtain raw TE libraries using various structure-based programs:
Mon Feb 3 19:38:41 -02 2020 EDTA_raw: Check dependencies, prepare working directories.
Mon Feb 3 19:38:53 -02 2020 Start to find LTR candidates.
Mon Feb 3 19:38:53 -02 2020 Identify LTR retrotransposon candidates from scratch.
Use of uninitialized value $chr_pre in hash element at /home/augustold/miniconda3/envs/EDTA/share/LTR_retriever/bin/call_seq_by_list.pl line 86.
Tue Feb 4 13:07:26 -02 2020 Finish finding LTR candidates.
Tue Feb 4 13:07:26 -02 2020 Start to find TIR candidates.
Tue Feb 4 13:07:26 -02 2020 Identify TIR candidates from scratch.
Species: others
Best wishes and thank you for providing this tool.
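To put numbers on how long each stage took, a minimal sketch that parses the timestamps EDTA prints at the start of its log lines. It assumes the English C-locale `date` layout seen in this log (weekday, month, day, time, zone label, year); French-locale stamps such as those in the first report will not parse, and the zone label is ignored, so only compare lines written in the same time zone.

```python
from datetime import datetime


def parse_log_time(line: str) -> datetime:
    """Parse the leading `date` stamp of an EDTA log line, e.g.
    'Mon Feb 3 19:38:53 -02 2020 Start to find TIR candidates.'

    Assumes English month abbreviations (C locale); the zone label is dropped."""
    _weekday, month, day, clock, _zone, year = line.split()[:6]
    return datetime.strptime(f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S")


def elapsed_hours(start_line: str, end_line: str) -> float:
    """Hours elapsed between two log lines."""
    delta = parse_log_time(end_line) - parse_log_time(start_line)
    return delta.total_seconds() / 3600
```

For the LTR step above ("Start to find LTR candidates" at Mon Feb 3 19:38:53, "Finish finding LTR candidates" at Tue Feb 4 13:07:26) this gives 62,913 s, about 17.5 hours.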
Hello,
In file.fa.EDTA.TElib.fa, virtually all transposons are labelled as "unknown":
16712 are unknown
48 are Gypsy
Assuming my study organism does not have completely unusual transposons, are these the expected statistics?
Thank you
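To reproduce per-class counts like these (and compare them between runs), a minimal sketch that tallies the classification part of each library header. It assumes RepeatMasker-style headers of the form `name#Class/Superfamily`; the example name `TE_00000001` is hypothetical, and headers without `#` are counted as `no_label`.

```python
from collections import Counter


def count_classes(lib_path: str) -> Counter:
    """Count classification labels in a TE library FASTA.

    Assumes headers like '>TE_00000001#LTR/Gypsy' (name#Class/Superfamily);
    headers without '#' are tallied under 'no_label'."""
    counts = Counter()
    with open(lib_path) as fh:
        for line in fh:
            if not line.startswith(">"):
                continue
            words = line[1:].split()
            name = words[0] if words else ""
            label = name.split("#", 1)[1] if "#" in name else "no_label"
            counts[label] += 1
    return counts
```

`count_classes("file.fa.EDTA.TElib.fa").most_common()` lists labels from most to least frequent.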
When running the EDTA_raw.pl script, the raw FASTA files for both TIR and Helitron are empty. I think the problem is in the call_seq.pl script, because TIR.ext30.list contains entries such as:
000000F:152380..154395 000000F:152350..154425
000000F:292101..295163 000000F:292071..295193
000000F:429115..433751 000000F:429085..433781
000000F:433252..438167 000000F:433222..438197
But TIR.ext30.fa is empty.
I tried to call the script on its own:
perl $call_seq $seq.ext$extlen.list -C $genome
but it doesn't produce any output either.
The same happens with HelitronScanner.raw.ext.list and HelitronScanner.raw.ext.fa.
The FASTA header format is as follows:
000160F 000285294:B
000285294:B000284495:B~000284495:B ctg_linear 11256 10841
But even with no spaces, the problem persists:
000161F_000058666:E
000058666:E000414072:B~000414072:B_ctg_linear_15599_15577
Any help or suggestion will be appreciated.
Thanks!
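One possible cause worth ruling out is that the sequence IDs in the `.list` file do not exactly match the FASTA headers. The `id:start..end` format also becomes ambiguous when an ID itself contains `:` (as in `000161F_000058666:E`). A minimal, independent check, assuming the ID of a FASTA record is the first word of its header; it does not reproduce how `call_seq_by_list.pl` actually parses IDs.

```python
from typing import List, Set, Tuple


def parse_region(entry: str) -> Tuple[str, int, int]:
    """Split an 'id:start..end' entry such as '000000F:152380..154395'.

    Uses the LAST ':' as the separator, so an ID that itself contains ':'
    (e.g. '000161F_000058666:E') is kept whole."""
    seq_id, coords = entry.rsplit(":", 1)
    start, end = coords.split("..")
    return seq_id, int(start), int(end)


def fasta_ids(fasta_path: str) -> Set[str]:
    """First whitespace-delimited word of every FASTA header."""
    ids = set()
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                words = line[1:].split()
                if words:
                    ids.add(words[0])
    return ids


def unmatched_ids(list_path: str, fasta_path: str) -> List[str]:
    """IDs in the first column of a region list that no FASTA header matches."""
    known = fasta_ids(fasta_path)
    missing = set()
    with open(list_path) as fh:
        for line in fh:
            words = line.split()
            if words:
                seq_id = parse_region(words[0])[0]
                if seq_id not in known:
                    missing.add(seq_id)
    return sorted(missing)
```

A non-empty result from `unmatched_ids("genome.TIR.ext30.list", "genome.fa")` (hypothetical names) would point to an ID mismatch rather than a problem with the coordinates.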
Hi!
I ran into a memory issue when running TIR-Learner. Have you encountered this before, and what can I do to solve it?
Here are the commands/outputs that I get:
$ nohup perl ../EDTA/EDTA_raw.pl -genome F2.genome.fasta -species Maize -type tir -threads 20 > essai_tir.out 2> essai_tir.err &
$ cat essai_tir.err
nohup: ignoring input
Wed Jan 15 19:24:43 CET 2020 EDTA_raw: Check files and dependencies, prepare working directories.
Wed Jan 15 19:24:43 CET 2020 Start to find TIR candidates.
ln: failed to create symbolic link 'F2.genome.fasta': Input/output error
Wed Jan 15 19:24:43 CET 2020 Identify TIR candidates from scratch.
Species: Maize
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Out of memory!
ERROR: no LOC list!
Usage: perl call_seq_by_list.pl MSU_format_list -C genome.fasta -out file.fa [options]
itself Output sequence specified in the list (default).
up_[int] Output sequences [int] bp upstream of the region.
down_[int] Output sequences [int] bp downstream of the region.
-C [fasta] A fasta file you want to extract sequence from.
-out Output file name. Default: MSU_format_list.fa
-header [0|1] Output sequence with (1, default) or without (0) sequence header.
-rmvoid [0|1] Remove empty sequence (1, default) or retain empty sequence (0) in output.
-ex Exclude sequence specified by the list. Default: Output sequence specified by the list.
-cov [0-1] Work with -ex. If excluding too much of the target (default 1), discard the entire sequence.
-purge [0|1] Work with -ex. Switch on=1/off=0(default) to clean up aligned region and joint unaligned sequences.
Example:
Call sequence of upper 2000 bp region in the list and output to result.fa
perl call_seq_by_list.pl array_list -C rice.fasta up_2000 -out result.fa
Out of memory!
Out of memory!
Out of memory!
Out of memory!
Hi Shujun,
Unfortunately I'm still having trouble with this. Following on from my previous comment, I am now using my own installation of RepeatMasker, and everything seems to work until it reaches TIR-Learner, where I now get the error below.
Fri Aug 16 19:10:44 CEST 2019 Dependency checking:
All passed!
Fri Aug 16 19:10:56 CEST 2019 Obtain raw TE libraries using various structure-based programs:
/stn4/djeffrie/EDTA/bin/TIR-Learner1.19/../GenericRepeatFinder/bin//grf-main: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /stn4/djeffrie/EDTA/bin/TIR-Learner1.19/../GenericRepeatFinder/bin//grf-main)
/stn4/djeffrie/EDTA/bin/TIR-Learner1.19/../GenericRepeatFinder/bin//grf-main: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /stn4/djeffrie/EDTA/bin/TIR-Learner1.19/../GenericRepeatFinder/bin//grf-main)
Apparently I don't have the correct GLIBCXX version?
Any ideas?
Best
Dan
Originally posted by @DanJeffries in #11 (comment)
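The messages say that `grf-main` (from the GenericRepeatFinder copy under `EDTA/bin`) requires the `GLIBCXX_3.4.20` and `GLIBCXX_3.4.21` symbol versions, which `/lib64/libstdc++.so.6` does not provide. The usual shell check is `strings /lib64/libstdc++.so.6 | grep GLIBCXX`; a minimal Python equivalent that lists the version tags embedded in a library file:

```python
import re
from typing import List

# Version tags such as GLIBCXX_3.4.21 are stored as plain strings inside libstdc++.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data: bytes) -> List[str]:
    """Return the distinct GLIBCXX version tags in `data`, sorted numerically."""
    versions = {m.group(1).decode() for m in _TAG.finditer(data)}
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(lib_path: str, required: str) -> bool:
    """True if the library at `lib_path` carries the GLIBCXX_<required> tag."""
    with open(lib_path, "rb") as fh:
        return required in glibcxx_versions(fh.read())
```

`provides("/lib64/libstdc++.so.6", "3.4.21")` returning False would confirm that a newer libstdc++ is needed for this binary.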
Hello,
I'm currently trying to run EDTA on my laboratory's cluster, and I encounter an issue that looks similar to the one listed below in the EDTA issues, except that I'm running version 1.8. I installed it through the step-by-step conda installation (for some reason, the one-line conda installation doesn't work on my machines).
I encounter this error:
Mon Feb 10 17:57:54 CET 2020 EDTA_raw: Check dependencies, prepare working directories.
Mon Feb 10 17:58:14 CET 2020 Start to find LTR candidates.
Mon Feb 10 17:58:14 CET 2020 Identify LTR retrotransposon candidates from scratch.
Mon Feb 10 18:39:20 CET 2020 Finish finding LTR candidates.
Mon Feb 10 18:39:20 CET 2020 Start to find TIR candidates.
Mon Feb 10 18:39:20 CET 2020 Identify TIR candidates from scratch.
Species: others
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "/beegfs/data/tkastylevsky/programs/EDTA/bin/TIR-Learner2.4/Module3_New/getDataset.py", line 11, in <module>
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/sklearn/preprocessing/__init__.py", line 8, in <module>
from .data import Binarizer
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 18, in <module>
from scipy import stats
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/scipy/stats/__init__.py", line 348, in <module>
from .stats import *
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/scipy/stats/stats.py", line 177, in <module>
from . import distributions
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/scipy/stats/distributions.py", line 13, in <module>
from . import _continuous_distns
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py", line 15, in <module>
from scipy._lib._numpy_compat import broadcast_to
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/scipy/_lib/_numpy_compat.py", line 10, in <module>
from numpy.testing.nosetester import import_nose
ModuleNotFoundError: No module named 'numpy.testing.nosetester'
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
Traceback (most recent call last):
File "/beegfs/data/tkastylevsky/programs/EDTA/bin/TIR-Learner2.4/Module3_New/CombineAll.py", line 75, in <module>
f_m3=removeDupinSingle("%s.gff3"%(genome_Name+spliter+"Module3"))
File "/beegfs/data/tkastylevsky/programs/EDTA/bin/TIR-Learner2.4/Module3_New/CombineAll.py", line 57, in removeDupinSingle
f=pd.read_csv(file,header=None,sep="\t") #shujun
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/beegfs/data/tkastylevsky/programs/EDTA/bin/TIR-Learner2.4/Module3/GetAllSeq.py", line 32, in GetListFromFile
f=open(file,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/beegfs/data/tkastylevsky/programs/EDTA/bin/TIR-Learner2.4/Module3/GetAllSeq.py", line 63, in <module>
pool.map(GetListFromFile,fileList) #shujun
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 288, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/beegfs/home/tkastylevsky/.conda/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 670, in get
raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn*.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn*.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /beegfs/data/tkastylevsky/programs/EDTA/util/rename_tirlearner.pl line 18.
Warning: LOC list galgal6_chr1.fa.mod.TIR.ext30.list is empty.
Error: Error while loading sequence
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!
Mon Feb 10 21:29:38 CET 2020 Start to find Helitron candidates.
Mon Feb 10 21:29:38 CET 2020 Identify Helitron candidates from scratch.
Tue Feb 11 01:27:12 CET 2020 Finish finding Helitron candidates.
Tue Feb 11 01:27:12 CET 2020 Execution of EDTA_raw.pl is finished!
ERROR: Raw TIR results not found in galgal6_chr1.fa.mod.EDTA.raw/galgal6_chr1.fa.mod.TIR.raw.fa at /beegfs/data/tkastylevsky/programs/EDTA/EDTA.pl line 368.
Thanks in advance,
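The first real error is `ModuleNotFoundError: No module named 'numpy.testing.nosetester'`, raised while `scipy.stats` is imported on behalf of scikit-learn. This pattern usually means the installed SciPy is older than the installed NumPy, which no longer ships that nose helper; treat that as a likely explanation rather than a confirmed one. The later `EmptyDataError`, `FileNotFoundError` and "Raw TIR results not found" messages appear to follow from this first failure. A minimal sketch for probing the environment; run it with the Python of the EDTA conda environment.

```python
import importlib


def probe(name: str) -> str:
    """Import module `name` and report its version, or the error the import raises."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # broad on purpose: binary mismatches can raise non-ImportErrors
        return f"import failed: {type(exc).__name__}: {exc}"
    return getattr(module, "__version__", "imported (no __version__)")


if __name__ == "__main__":
    for name in ("numpy", "scipy", "scipy.stats", "sklearn", "pandas", "numpy.testing.nosetester"):
        print(name, "->", probe(name))
```

If `numpy` and `scipy` import but `scipy.stats` fails with the same message as the traceback, the mismatch is between those two packages rather than in TIR-Learner.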
Dear Shujun,
Thanks for developing EDTA. It's really helpful.
I am running this pipeline on my genome but have encountered an error:
2020-02-05 19:50:18,695 -INFO- generating gene anntations
Traceback (most recent call last):
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/bin/TEsorter", line 10, in <module>
sys.exit(main())
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 976, in main
pipeline(Args())
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 171, in pipeline
for rc in Classifier(gff, db=args.hmm_database, fout=fc):
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 391, in classify
for rc in self.parse():
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 380, in parse
line = LTRgffLine(line)
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 609, in __init__
super(LTRgffLine, self).__init__(line)
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 604, in __init__
self.attributes = self.parse(self.attributes)
File "/media/bulk_01/users/cai020/software/miniconda3/envs/EDTA/lib/python3.6/site-packages/TEsorter/app.py", line 606, in parse
return dict(kv.split('=') for kv in attributes.split(';'))
ValueError: dictionary update sequence element #0 has length 3; 2 is required
Warning...unknown stuff <
My command line is below (EDTA v1.7.9):
EDTA.pl -genome $genome -species others -step all -overwrite 0 -cds $cds -sensitive 0 -anno 1 -evaluate 1 -threads $thread -repeatmasker $repeatMasker
I checked your code and guess this might be caused by the CDS cleanup step with TEsorter, but I'm not sure. I have already generated the $genome.mod.MAKER.masked and $genome.mod.EDTA.TEanno.gff/sum results, and the step that evaluates the level of inconsistency is now running.
Could you please help me figure it out? Thank you very much in advance.
Best regards,
Chengcheng
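The last frame shows TEsorter splitting GFF3 column 9 with `dict(kv.split('=') for kv in attributes.split(';'))`. `dict()` needs every item to have exactly two parts, so "element #0 has length 3" means that the first `key=value` pair on some line contained two `=` characters. The GFF3 specification requires `=` inside attribute values to be escaped (`%3D`), so locating that line in the GFF TEsorter was reading (its path is not shown in the traceback) is probably the first step. A hypothetical sketch, not TEsorter's code: a tolerant parser that splits on the first `=` only, plus a helper that finds offending lines.

```python
from typing import Dict, List


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=te1;Name=x'.

    Splits each pair on the FIRST '=' only, so an unescaped '=' in a value
    does not break parsing; empty pairs (e.g. from a trailing ';') are skipped."""
    attributes = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        attributes[key.strip()] = value if sep else ""
    return attributes


def lines_with_extra_equals(gff_path: str) -> List[int]:
    """1-based line numbers whose column 9 has a pair with more than one '='."""
    hits = []
    with open(gff_path) as fh:
        for number, line in enumerate(fh, start=1):
            if line.startswith("#"):
                continue  # skip directives and comments
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 9 and any(p.count("=") > 1 for p in columns[8].split(";")):
                hits.append(number)
    return hits
```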
Hi,
Here are the count from the TE library genome.FLYE.sixLongest.fa.EDTA.TElib.fa
DNA/DTA 52
DNA/DTC 50
DNA/DTH 476
DNA/DTM 654
DNA/DTT 2722
DNA/Helitron 15
LTR/Gypsy 38
LTR/unknown 20
MITE/DTA 75
MITE/DTC 10
MITE/DTH 88
MITE/DTM 104
MITE/DTT 570
Then I ran RepeatMasker:
RepeatMasker genome.FLYE.sixLongest.fa -no_is -pa 8 -lib genome.FLYE.sixLongest.fa.EDTA.TElib.fa
Here is the summary:
==================================================
                 number of          length  percentage
                 elements*        occupied  of sequence
--------------------------------------------------
Retroelements         1333       187637 bp    0.16 %
   SINEs:               20         1160 bp    0.00 %
   Penelope             63         3689 bp    0.00 %
   LINEs:              487        62803 bp    0.05 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex         12          561 bp    0.00 %
     R1/LOA/Jockey      23         2819 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B          50        23094 bp    0.02 %
     L1/CIN4           177        20812 bp    0.02 %
   LTR elements:       826       123674 bp    0.11 %
     BEL/Pao           105         7431 bp    0.01 %
     Ty1/Copia           2          131 bp    0.00 %
     Gypsy/DIRS1       256        55114 bp    0.05 %
       Retroviral      179        10844 bp    0.01 %
DNA transposons       2314       176348 bp    0.15 %
   hobo-Activator      689        43072 bp    0.04 %
   Tc1-IS630-Pogo      167        54954 bp    0.05 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac             18         2279 bp    0.00 %
   Tourist/Harbinger   249        12509 bp    0.01 %
   Other (Mirage,       24         1231 bp    0.00 %
     P-element, Transib)
Rolling-circles         77         8371 bp    0.01 %
Unclassified:           51         3907 bp    0.00 %
Total interspersed repeats:      367892 bp    0.32 %
Small RNA:             431       137483 bp    0.12 %
Satellites:            130         7935 bp    0.01 %
Simple repeats:      48930      1869437 bp    1.61 %
Low complexity:       9266       432567 bp    0.37 %
==================================================
The numbers for the DNA transposons do not seem to match.
For example, the non-redundant EDTA library reports more DNA elements than RepeatMasker does, but I would expect the opposite, since RepeatMasker should count every occurrence of each element. Or am I missing something?
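Two things may explain part of the mismatch, though neither is confirmed here. First, the library counts are numbers of distinct library entries, while RepeatMasker counts hits in the genome. Second, RepeatMasker's summary table sorts hits into its own built-in categories, which may not line up with EDTA's labels: for example, `En-Spm` and `MuDR-IS905` are 0 even though the library contains `DTC` (CACTA) and `DTM` (Mutator) entries. Counting hits per label directly from the RepeatMasker `.out` file gives numbers on the same labels as the library. A minimal sketch, assuming the standard `.out` column layout with the repeat class/family in the 11th column:

```python
from collections import Counter


def hits_per_label(out_path: str) -> Counter:
    """Count RepeatMasker hits per class/family label in a .out file.

    Assumes the standard .out layout: header lines, then whitespace-separated
    columns where index 10 is the repeat class/family (typically taken from
    the '#Class/Family' part of the library header)."""
    counts = Counter()
    with open(out_path) as fh:
        for line in fh:
            fields = line.split()
            # Data lines start with a numeric Smith-Waterman score; header lines do not.
            if len(fields) >= 11 and fields[0].isdigit():
                counts[fields[10]] += 1
    return counts
```

Comparing `hits_per_label("genome.FLYE.sixLongest.fa.out")` with the library counts above shows, per label, how many genomic hits each set of library entries produced.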
Hello (it's me again, sorry),
Following up on issue #14:
I have a machine where I thought EDTA was running fine, but whether it works seems to depend on the genome FASTA provided. Here is what happens with a FASTA file that seems to cause an error.
I have removed all scaffolds shorter than 5500 bp. The RepeatMasker and RepeatModeler installations used are not the ones from conda.
Mon Oct 7 20:13:39 CEST 2019 Dependency checking:
All passed!
Mon Oct 7 20:14:01 CEST 2019 Obtain raw TE libraries using various structure-based programs:
Mon Oct 7 20:14:01 CEST 2019 EDTA_raw: Check files and dependencies, prepare working directories.
Mon Oct 7 20:14:01 CEST 2019 Start to find LTR candidates.
Mon Oct 7 20:14:01 CEST 2019 Identify LTR retrotransposon candidates from scratch.
Mon Oct 7 20:21:12 CEST 2019 Finish finding LTR candidates.
Mon Oct 7 20:21:12 CEST 2019 Start to find TIR candidates.
Mon Oct 7 20:21:12 CEST 2019 Identify TIR candidates from scratch.
Species: others
rm: cannot remove './TIR-Learner/*': No such file or directory
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/lege/anaconda3/envs/EDTA/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3_New/getDataset2.py", line 109, in Predict
predicted_labels = model.predict(np.stack(prefeature))
File "<__array_function__ internals>", line 6, in stack
File "/home/lege/.local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 421, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3_New/getDataset2.py", line 130, in <module>
d = pool.map(Predict,files)
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: need at least one array to stack
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
Traceback (most recent call last):
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3_New/CombineAll.py", line 90, in <module>
keep=removeIRFhomo("%s.gff3"%(genome_Name+spliter+dataset),remove,"%sClean.gff3"%(genome_Name+spliter+dataset+spliter))
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3_New/CombineAll.py", line 76, in removeIRFhomo
f=pd.read_csv(file,header=None,sep="\t")
File "/home/lege/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/lege/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/lege/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/home/lege/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/lege/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3/GetAllSeq.py", line 32, in GetListFromFile
f=open(file,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/sdc1/Alessandro/TR_2013/EDTA/bin/TIR-Learner1.23/Module3/GetAllSeq.py", line 63, in <module>
pool.map(GetListFromFile,fileList) #shujun
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/lege/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /mnt/sdc1/Alessandro/TR_2013/EDTA/util/rename_tirlearner.pl line 18.
Warning: LOC list long_scaffolds.fa.TIR.ext30.list is empty.
Warning: The TIR result file has 0 bp!
Mon Oct 7 20:57:12 CEST 2019 Start to find MITE candidates.
Mon Oct 7 20:57:12 CEST 2019 Identify MITE candidates from scratch.
Mon Oct 7 20:57:12 CEST 2019 Warning: Because MITE-Hunter is too slow and only contribute limited new TIR candidates, it is taken down temporary until a better solution is found.
As a temporary fix, the TIR-Learner is used to mock the MITE-Hunter result. Please run -type tir first.
Error: MITE results not found!
ERROR: Raw TIR results not found in long_scaffolds.fa.EDTA.raw/long_scaffolds.fa.TIR.raw.fa at ./EDTA/EDTA.pl line 177.
I can send you the FASTA file if you would like to investigate.
Thanks a lot!
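In this log the errors appear to cascade. An earlier Python step fails (ValueError: need at least one array to stack), CombineAll.py then reads an empty GFF3 (pandas raises EmptyDataError: No columns to parse from file), TIR-Learner_FinalAnn.gff3 is never written, and the later cat/mv/open calls fail only because those files do not exist. So the first error, not the last, is the one worth chasing. Below is a minimal standard-library sketch that reports which expected outputs are missing or empty. The file names are copied from the log; the helper name and the assumption that both paths sit under one working directory are hypothetical.

```python
from pathlib import Path

# Hypothetical list of TIR-Learner outputs, copied from the file names in the log above.
EXPECTED_TIR_OUTPUTS = (
    "TIR-Learner_FinalAnn.gff3",
    "TIR-Learner-Result/TIR-Learner_FinalAnn.fa",
)


def check_outputs(workdir, names=EXPECTED_TIR_OUTPUTS):
    """Map each expected file (relative to workdir) to 'missing', 'empty' or 'ok'."""
    status = {}
    for name in names:
        path = Path(workdir) / name
        if not path.is_file():
            status[name] = "missing"
        elif path.stat().st_size == 0:
            # pandas.read_csv on a zero-byte file raises EmptyDataError
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status
```

For example, check_outputs(".") run inside the TIR working directory lists each file with its state.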
Hi all,
The EDTA pipeline relies on RepeatModeler from conda, but the conda build has a known issue: it seems unable to produce consensi.fa.
Dfam-consortium/RepeatModeler#38
If you want to find TEs in your genome with RepeatModeler, please install the software yourself, point -repeatmodeler and -repeatmasker to the install paths, and then use consensi.fa.classified as your RepeatModeler raw fa.
Hi Shujun,
I used the genome.fa.EDTA.TElib.fa produced by EDTA.pl as the library to run RepeatMasker, but the resulting classification only has LTR elements and DNA elements, without specific classifications (such as LTR/Copia). How can I get a more detailed repeat classification from RepeatMasker? Do I need to run RepeatMasker in homology mode (setting -species) and then combine the two libraries as the final result?
Here is the command and result.
The first 10 headers of genome.fa.EDTA.TElib.fa:
>TE_00000000#DNA/DTH
>TE_00000001#DNA/Helitron
>TE_00000002#DNA/DTC
>TE_00000003#DNA/Helitron
>TE_00000004#DNA/DTT
>TE_00000005#DNA/Helitron
>TE_00000006#DNA/Helitron
>TE_00000007#DNA/Helitron
>TE_00000008#DNA/DTT
>TE_00000009#DNA/Helitron
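The text after '#' in these headers (DNA/DTH, DNA/Helitron, ...) is EDTA's own classification label. The RepeatMasker summary table only has rows for a fixed set of named families (hAT-Charlie, TcMar-Tigger, ERVL, ...), so these labels appear to be counted only under their class ("DNA elements", "LTR elements"). Per-superfamily counts can be taken from the labels directly. A minimal sketch over the library headers follows; the function name is hypothetical, and it counts library entries, not genomic copies.

```python
from collections import Counter


def tally_classes(fasta_lines):
    """Count classification labels in headers of the form >NAME#Class/Superfamily."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored
        fields = line[1:].split()
        header = fields[0] if fields else ""
        # The text after the first '#' is treated as the label, e.g. 'DNA/DTH'.
        _, sep, label = header.partition("#")
        counts[label if sep else "no_label"] += 1
    return counts
```

For the ten headers shown above this gives DNA/Helitron 6, DNA/DTT 2, DNA/DTH 1 and DNA/DTC 1; with a file, tally_classes(open("genome.fa.EDTA.TElib.fa")).most_common() lists the whole library.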
RepeatMasker
RepeatMasker -pa 24 -lib genome.fa.EDTA.TElib.fa -dir ./ -xsmall -gff -e ncbi -q -no_is -norna -nolow -div 40 -cutoff 225 genome.fa
==================================================
file name: genome.fa
sequences:          125
total length: 336324563 bp  (336315300 bp excl N/X-runs)
GC level:         33.22 %
bases masked: 189970773 bp ( 56.48 %)
==================================================
                number of       length   percentage
                elements*     occupied  of sequence
--------------------------------------------------
SINEs:                  0            0 bp    0.00 %
      ALUs              0            0 bp    0.00 %
      MIRs              0            0 bp    0.00 %
LINEs:                  0            0 bp    0.00 %
      LINE1             0            0 bp    0.00 %
      LINE2             0            0 bp    0.00 %
      L3/CR1            0            0 bp    0.00 %
LTR elements:       97559     83216326 bp   24.74 %
      ERVL              0            0 bp    0.00 %
      ERVL-MaLRs        0            0 bp    0.00 %
      ERV_classI        0            0 bp    0.00 %
      ERV_classII       0            0 bp    0.00 %
DNA elements:      203839     83514886 bp   24.83 %
      hAT-Charlie       0            0 bp    0.00 %
      TcMar-Tigger      0            0 bp    0.00 %
Unclassified:      123082     29657324 bp    8.82 %
Total interspersed repeats:  196388536 bp   58.39 %
Small RNA:              0            0 bp    0.00 %
Satellites:             0            0 bp    0.00 %
Simple repeats:         0            0 bp    0.00 %
Low complexity:         0            0 bp    0.00 %
==================================================
* most repeats fragmented by insertions or deletions
  have been counted as one element
The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
run with rmblastn version 2.6.0+
The query was compared to classified sequences in "genome.fa.EDTA.TElib.fa"
Cheers,
Zhigui
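The percentage column in this summary can be checked by hand: each class's occupied length divided by the total length reported at the top. The sketch below uses numbers copied from the table. It also shows that the LTR, DNA and Unclassified rows add up exactly to the "Total interspersed repeats" line, so no hits fell into any of the named subfamily rows. Using the total length rather than the N-excluded length as the denominator is an assumption; both give the same two-decimal values here.

```python
# Numbers copied from the RepeatMasker summary above.
TOTAL_LENGTH = 336_324_563  # bp, "total length"
OCCUPIED = {
    "LTR elements": 83_216_326,
    "DNA elements": 83_514_886,
    "Unclassified": 29_657_324,
}


def percent_of_sequence(bp, total=TOTAL_LENGTH):
    """Share of the assembly covered, rounded to two decimals like the table."""
    return round(100 * bp / total, 2)


for name, bp in OCCUPIED.items():
    print(f"{name:<14}{percent_of_sequence(bp):6.2f} %")

interspersed = sum(OCCUPIED.values())
print(f"Total interspersed: {interspersed} bp {percent_of_sequence(interspersed):.2f} %")
```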
Hello,
I am trying to run EDTA in a conda environment. The setup completed fine, but at the TIR identification step I get the following error:
Tue Jan 28 12:37:40 CET 2020 Finish finding LTR candidates.
Tue Jan 28 12:37:40 CET 2020 Start to find TIR candidates.
Tue Jan 28 12:37:40 CET 2020 Identify TIR candidates from scratch.
Species: others
/mnt/vol2/conda/miniconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
terminate called after throwing an instance of 'std::system_error'
what(): terminate called after throwing an instance of 'terminate called after throwing an instance of 'Resource temporarily unavailablestd::system_errorstd::system_error
'
'
what(): Resource temporarily unavailable
terminate called after throwing an instance of 'std::system_error'
Any clues about the possible solution to this error?
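"Resource temporarily unavailable" is the standard message for EAGAIN. When std::system_error carries it in a program that starts many threads (here the TensorFlow-based TIR step), a common cause is a per-user process/thread limit (ulimit -u) or a container pid limit on a shared server, rather than a bug in the pipeline itself. That is a general diagnosis, not one confirmed for this report; lowering -t is one thing to try. A minimal sketch that reports the limit and CPU count visible to the job, using Python's Unix-only resource module (on Linux, threads count toward RLIMIT_NPROC):

```python
import os
import resource  # Unix-only standard-library module


def _fmt(limit):
    """Render RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def thread_headroom():
    """Report the per-user process/thread limit (ulimit -u) and the visible CPU count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {"nproc_soft": _fmt(soft), "nproc_hard": _fmt(hard), "cpus": os.cpu_count()}


print(thread_headroom())
```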
Dear Shujun,
When I run EDTA with EDTA/EDTA.pl -genome non-redundant.shortname.fa -species others -step all -t 28, I get an error while identifying TIR candidates:
EDTA/bin/TIR-Learner2.4/Module2/RunGRF.py", line 79, in <module>
if (len(str(records[0].seq))>int(length)+500):
IndexError: list index out of range
How can I fix this error? Thank you!
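The failing line reads records[0] without checking that any records were parsed, so the IndexError most likely means RunGRF.py was handed a FASTA file with no sequences in it (for example, an empty intermediate written by an earlier step). TIR-Learner appears to use Biopython-style records with a .seq attribute; the sketch below uses a small standard-library reader instead and only illustrates the missing guard. Both function names are hypothetical.

```python
def read_fasta(path):
    """Minimal FASTA reader returning a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def first_record_length(path):
    """Length of the first record, with an explicit error for an empty file."""
    records = read_fasta(path)
    if not records:
        # Indexing records[0] here would raise 'IndexError: list index out of range'.
        raise ValueError(f"{path} contains no FASTA records")
    return len(records[0][1])
```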
Hello,
I'd like to try the EDTA pipeline to construct a repeat library for a large (20 Gbp) genome assembly. Would you expect it to scale to a genome of this size? Would it be possible to partition the genome and run EDTA separately on each partition of the assembly?
Any tips or guidance would be much appreciated.
Thank you!
Lauren
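This thread does not say whether EDTA results from separate partitions can be merged into one library, so treat that as untested. If you try it, keeping each scaffold whole (rather than cutting sequences) presumably avoids splitting elements at chunk boundaries, and balancing partitions by total length keeps job run times similar. A minimal sketch of that balancing step, using a greedy longest-first assignment; the function name and input format are hypothetical.

```python
def partition_by_size(seq_lengths, n_parts):
    """Greedily assign whole sequences to n_parts bins with similar total length.

    seq_lengths maps sequence name -> length in bp. Sequences are placed
    longest first, each into the bin with the smallest running total.
    """
    bins = [{"total": 0, "names": []} for _ in range(n_parts)]
    for name, length in sorted(seq_lengths.items(), key=lambda kv: kv[1], reverse=True):
        lightest = min(bins, key=lambda b: b["total"])
        lightest["names"].append(name)
        lightest["total"] += length
    return bins
```

Each bin's name list could then be used to write one FASTA file per partition.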
I can successfully run these two commands:
perl $EDTA_raw --genome $TAIR10_mod -species others -type ltr --overwrite 0 --threads 8
perl $EDTA_raw --genome $TAIR10_mod -species others -type helitron --overwrite 0 --threads 8
But this one:
perl $EDTA_raw --genome $TAIR10_mod -species others -type tir --overwrite 0 --threads 8
Gives the following error:
EDTA_raw: Check dependencies, prepare working directories.
Start to find LTR candidates.
Existing result file Arabidopsis_thaliana.TAIR10.dna.toplevel_14lines.fa.mod.LTR.raw.fa found! Will keep this file without rerunning this module.
Please specify -overwrite 1 if you want to rerun this module.
Finish finding LTR candidates.
Start to find TIR candidates.
Identify TIR candidates from scratch.
Species: others
dirname: missing operand
Try 'dirname --help' for more information.
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at EDTA/util/rename_tirlearner.pl line 18.
Warning: LOC list Arabidopsis_thaliana.TAIR10.dna.toplevel_14lines.fa.mod.TIR.ext30.list is empty.
Error: Error while loading sequence
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!
Any suggestions?
Thank you!
We tried to run the pipeline on our genome assembly FASTA file, xxx.fa. Unfortunately, it failed with the error message "xxx.fa.masked does not contain any sequences!"
What's going on?
Apparently, at line 48 of EDTA.pl, "if (0){" should be changed to "if (1){".
Hello,
the bioRxiv paper describes the use of MITE-Hunter, but the new figure on GitHub suggests it is no longer included. If I understand correctly, it has been disabled for the moment, right?
Hello,
I've used EDTA to successfully annotate and mask my plant genome (via RepeatMasker). However, I am also interested in the actual flanking LTR pairs for each LTR retrotransposon.
I know that LTR_finder and LTRharvest report these on their own. By running them individually on a segment of my genome, I can only regenerate some of these pairs (maybe less than 10% of the unique types found by EDTA). Furthermore, many of them do not match the positions reported by the full EDTA pipeline.
What would be the best way to find the corresponding LTR pairs for each LTR subfamily reported?
Much appreciated,
Bryan
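If the LTR annotation is available as GFF3, the pairs can be collected by grouping features. The sketch below assumes, without confirmation from this thread, that each intact element is a parent feature whose two LTRs are long_terminal_repeat children (the Sequence Ontology term) carrying the element's ID in their Parent attribute. Check the feature types in your own output before relying on it; the function name is hypothetical.

```python
from collections import defaultdict


def ltr_pairs(gff3_lines, feature_type="long_terminal_repeat"):
    """Group LTR features by their Parent attribute and keep groups of exactly two."""
    groups = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent:
            groups[parent].append((cols[0], int(cols[3]), int(cols[4])))
    # Sort each pair by start so the 5' LTR comes first on the reference.
    return {p: sorted(f, key=lambda x: x[1]) for p, f in groups.items() if len(f) == 2}
```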
I ran the EDTA pipeline to identify TEs in about 100 fungal isolate genomes, all de novo assemblies. Around 70% of the isolates give good results with EDTA, but the others fail with the following error, even after many attempts:
Mon Dec 9 17:13:26 EST 2019 EDTA_raw: Check files and dependencies, prepare working directories.
Mon Dec 9 17:13:26 EST 2019 Start to find LTR candidates.
Mon Dec 9 17:13:26 EST 2019 Identify LTR retrotransposon candidates from scratch.
awk: fatal: cannot open file `L009.fa.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.
Error: Error while loading sequence
cp: cannot stat ‘L009.fa.LTRlib.fa’: No such file or directory
cp: cannot stat ‘L009.fa.LTRlib.fa’: No such file or directory
Error: LTR results not found!
ERROR: Raw LTR results not found in L009.fa.EDTA.raw/L009.fa.LTR.raw.fa at /home/AAFC-AAC/fuf/EDTA/EDTA.pl line 250.
Have you encountered this error before?
Thanks,
Fuyou
Hi Shujun,
Just a quick question. I have completed some initial tests on a small fraction (~150Mb) of a ~5Gb genome and am ready to give the real thing a try! However, as I'm sure is the case for many users, I have to tactically dodge run-time limits whilst maximising the resources I can use on the various queues on my cluster. In my case for example I can run a job for 24 hours with a lot of resources, or a job for 10 days with limited resources. So one question I have is:
Can I independently and simultaneously run the TE library steps for tir, ltr and helitron (i.e. divide and conquer) into the same output folder, and then use these for the final steps in a later job? Or would something get confused if I did this?
Also, if you have any other tips for maximising efficiency when constrained by cluster resources, I'd be very happy to hear them. Specifically, guidance on whether parallelism or memory matters more for each step would already be very helpful!
Best wishes, and thanks again for an awesome tool and paper!
Dan
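The per-type runs asked about here correspond to EDTA_raw.pl's --type option, and the final step to EDTA_processF.pl, whose -ltr/-tir/-helitron/-mite inputs follow the <genome>.EDTA.raw/<genome>.<TYPE>.raw.fa naming seen elsewhere in this thread (in this version the MITE file appears to be derived from the TIR run). The sketch below only builds the command strings, so each can be submitted as its own cluster job. The script paths are assumptions, and whether concurrent runs in one folder interfere is not confirmed here.

```python
import shlex

# Script locations are assumptions; point these at your EDTA checkout.
EDTA_RAW = "EDTA_raw.pl"
EDTA_PROCESS = "EDTA_processF.pl"


def _join(parts):
    """Shell-quote each argument and join them into one command line."""
    return " ".join(shlex.quote(str(p)) for p in parts)


def raw_commands(genome, species="others", threads=8, overwrite=0):
    """One EDTA_raw.pl command per TE type, using the options shown in this thread."""
    return [
        _join(["perl", EDTA_RAW, "--genome", genome, "--species", species,
               "--type", te_type, "--overwrite", overwrite, "--threads", threads])
        for te_type in ("ltr", "tir", "helitron")
    ]


def process_command(genome):
    """EDTA_processF.pl call reading <genome>.EDTA.raw/<genome>.<TYPE>.raw.fa."""
    raw_dir = f"{genome}.EDTA.raw"
    parts = ["perl", EDTA_PROCESS, "-genome", genome]
    for flag, label in (("-ltr", "LTR"), ("-tir", "TIR"),
                        ("-helitron", "Helitron"), ("-mite", "MITE")):
        parts += [flag, f"{raw_dir}/{genome}.{label}.raw.fa"]
    return _join(parts)


for cmd in raw_commands("genome.fa"):
    print(cmd)
print(process_command("genome.fa"))
```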
Shujun, I tested v1.5 with a small dataset and got the following errors:
########################################################
########################################################
Mon Aug 26 12:33:52 CDT 2019 Dependency checking:
All passed!
Mon Aug 26 12:33:57 CDT 2019 Obtain raw TE libraries using various structure-based programs:
Mon Aug 26 12:33:57 CDT 2019 EDTA_raw: Check files and dependencies, prepare working directories.
Mon Aug 26 12:33:57 CDT 2019 Start to find LTR candidates.
Mon Aug 26 12:33:57 CDT 2019 Identify LTR retrotransposon candidates from scratch.
Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa
Options:
-misschar n Define the letter representing unknown sequences; case insensitive; default: n
-Nscreen [0|1] Enable (1) or disable (0) the -nc parameter; default: 1
-nc [int] Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
-nr [0-1] Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
-minlen [int] Minimum sequence length filter after clean up; default: 100 (bp)
-cleanN [0|1] Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
-trf [0|1] Enable (1) or disable (0) tandem repeat finder (trf); default: 1
-trf_path path Path to the trf program
cp: cannot stat ‘TF05-1v012.fasta.mod.retriever.scn.adj’: No such file or directory
cp: cannot stat ‘TF05-1v012.fasta.LTRlib.fa’: No such file or directory
cp: cannot stat ‘TF05-1v012.fasta.LTRlib.fa’: No such file or directory
Error: LTR results not found!
ERROR: Raw LTR results not found in TF05-1v012.fasta.EDTA.raw/TF05-1v012.fasta.LTR.raw.fa at /homes/liu3zhen/.conda/envs/EDTA3/EDTA/EDTA.pl line 176.
Originally posted by @liu3zhenlab in #12 (comment)
Hello,
I have this error, however the pipeline is still running. Is it a benign warning?
I do have some very short sequences in my FASTA, but I also have scaffolds of several Mb. I don't know which scaffolds it failed on; it would be helpful to know that.
And what is the minimum sequence length?
thank you
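To see which scaffolds might be affected, sequences below a length threshold can be listed directly. The minimum length required by EDTA or its components is not stated in this thread, so the threshold below is a parameter you choose, not a documented cutoff; the function name is hypothetical.

```python
def short_sequences(fasta_path, min_len):
    """Yield (name, length) for FASTA records shorter than min_len bp."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None and length < min_len:
                    yield name, length
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif line:
                length += len(line)
    # The last record has no following header, so check it after the loop.
    if name is not None and length < min_len:
        yield name, length
```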
Hi Shujun,
I have set up the environment for running EDTA.
However, I work on amphibian genomes, which are larger than those of most other animals. I have run EDTA_raw for TIR, LTR and Helitron, e.g. EDTA_raw.pl -genome frog1_genome.chromosome.fa -type tir -thrads 16. It has been running for 48 hours and has not finished yet.
Is there any way to speed up EDTA for big genomes?
Thank you for your attention and reply.
Zhangyi
Hi Shujun,
Is there any way to further classify the DNA transposons that EDTA labels DNA/DTT, DNA/DTA, etc. into specific superfamily names such as Harbinger, Mu, Ac/Ds and others?
Best regards,
Junpeng
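DTA, DTC, DTH, DTM and DTT are the three-letter superfamily codes of the Wicker et al. (2007) TE classification: hAT (which includes Ac/Ds), CACTA, PIF/Harbinger, Mutator (Mu) and Tc1/Mariner. A minimal sketch that rewrites library headers with spelled-out names follows. The exact spelling of the names is a hypothetical choice, and this does not refine anything beyond the superfamily level.

```python
# Wicker et al. (2007) superfamily codes that appear in EDTA labels.
# The spelled-out names are one possible convention (hypothetical choice);
# Ac/Ds elements belong to the hAT superfamily (DTA).
WICKER_DNA = {
    "DTA": "hAT",
    "DTC": "CACTA",
    "DTH": "PIF-Harbinger",
    "DTM": "Mutator",
    "DTT": "Tc1-Mariner",
}


def expand_label(label):
    """Replace a Wicker code after the '/' with a superfamily name, e.g. DNA/DTH -> DNA/PIF-Harbinger."""
    cls, sep, code = label.partition("/")
    name = WICKER_DNA.get(code)
    return f"{cls}{sep}{name}" if name else label


def rename_header(header):
    """Apply expand_label to a '>NAME#Class/Code' FASTA header; other headers pass through."""
    name, sep, label = header.partition("#")
    return f"{name}{sep}{expand_label(label)}" if sep else header
```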
Hello,
I tried a small dataset and got the results as following:
Confusion matrix of BL06.R11.pilon.fasta.EDTA.TE.fa.stat for the all category
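Category | DNA/DTC | DNA/DTH | DNA/DTM | LTR/Copia | LTR/unknown | MITE/DTM | Misclas_rate |
---|---|---|---|---|---|---|---|
DNA/DTC | 7 | 0 | 0 | 0 | 0 | 0 | 0.0000 |
DNA/DTH | 0 | 1163 | 1 | 0 | 0 | 0 | 0.0009 |
DNA/DTM | 0 | 0 | 7936 | 0 | 3 | 1 | 0.0005 |
LTR/Copia | 0 | 0 | 0 | 259 | 0 | 0 | 0.0000 |
LTR/unknown | 1 | 1 | 4 | 0 | 25193 | 1 | 0.0003 |
MITE/DTM | 0 | 0 | 2 | 0 | 0 | 168 | 0.0118 |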
So my question is: can EDTA also analyze other repeat elements, such as AT-rich or GC-rich regions and short repeats like (AT)n?
Thanks,
Fuyou
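The Misclas_rate column in the confusion matrix is consistent with reading each row as its off-diagonal count divided by the row total (for DNA/DTH, 1 / 1164 ≈ 0.0009). That reading is inferred from the numbers, not documented in this thread. A sketch that reproduces the column from the matrix values:

```python
LABELS = ["DNA/DTC", "DNA/DTH", "DNA/DTM", "LTR/Copia", "LTR/unknown", "MITE/DTM"]
MATRIX = [
    [7, 0, 0, 0, 0, 0],
    [0, 1163, 1, 0, 0, 0],
    [0, 0, 7936, 0, 3, 1],
    [0, 0, 0, 259, 0, 0],
    [1, 1, 4, 0, 25193, 1],
    [0, 0, 2, 0, 0, 168],
]


def misclassification_rates(matrix):
    """Off-diagonal share of each row: (row total - diagonal) / row total."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(round((total - row[i]) / total, 4) if total else 0.0)
    return rates


for label, rate in zip(LABELS, misclassification_rates(MATRIX)):
    print(f"{label:<12}{rate:.4f}")
```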
Hi Shujun
I have been trying to install EDTA on my server but I have an annoying situation of a storage quota on my home directory meaning that the default location for the conda env isn't big enough to complete the installation. I am trying to get around it using:
conda create --prefix /scratch/djeffrie/EDTAenv
The installation seems to work fine. However when I run the pipeline I get the error:
The RMblast engine is not installed in RepeatMasker!
I see some issues for LTR_retriever with the same error, but I can't figure out whether it's the same problem. I followed the suggestion here (oushujun/LTR_retriever#43) of running
RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa.$rand -lib dummy060817.fa.$rand
but I didn't get the expected output relating to the taxonomy data file, I got the error
RepeatMasker::setspecies: Could not find user specified library dummy060817.fa..
Do you have any suggestions for getting around this? Perhaps it's a problem with using the --prefix argument? Or maybe it's just the server?
Best,
Dan
Sorry this question may be irrelevant to EDTA. I am having problems installing repeatmodeler or repeatmasker. Could you please help me with this? Thanks. When I run "conda install -y -c bioconda repeatmodeler", the error messages look like this:
Collecting package metadata (current_repodata.json): done
Solving environment: failed with current_repodata.json, will retry with next repodata source.
Initial quick solve with frozen env failed. Unfreezing env and trying again.
Solving environment: failed with current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed
Initial quick solve with frozen env failed. Unfreezing env and trying again.
Solving environment: failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package tk conflicts for:
python=3.6 -> tk[version='8.6.|>=8.6.7,<8.7.0a0|>=8.6.8,<8.7.0a0']
Package libstdcxx-ng conflicts for:
python=3.6 -> libstdcxx-ng[version='>=7.2.0|>=7.3.0']
Package repeatscout conflicts for:
repeatmodeler -> repeatscout
Package perl-threaded conflicts for:
repeatmodeler -> perl-threaded
Package readline conflicts for:
python=3.6 -> readline[version='7.|>=7.0,<8.0a0']
Package perl conflicts for:
repeatmodeler -> perl[version='5.22.0.|>=5.26.2,<5.27.0a0']
Package pip conflicts for:
python=3.6 -> pip
Package recon conflicts for:
repeatmodeler -> recon
Package libffi conflicts for:
python=3.6 -> libffi[version='3.2.|>=3.2.1,<4.0a0']
Package ncurses conflicts for:
python=3.6 -> ncurses[version='6.0.|>=6.0,<7.0a0|>=6.1,<7.0a0']
Package zlib conflicts for:
python=3.6 -> zlib[version='>=1.2.11,<1.3.0a0']
Package xz conflicts for:
python=3.6 -> xz[version='>=5.2.3,<6.0a0|>=5.2.4,<6.0a0']
Package libgcc-ng conflicts for:
python=3.6 -> libgcc-ng[version='>=7.2.0|>=7.3.0']
Package trf conflicts for:
repeatmodeler -> trf
Package openssl conflicts for:
python=3.6 -> openssl[version='1.0.|1.0.*,>=1.0.2l,<1.0.3a|>=1.0.2m,<1.0.3a|>=1.0.2n,<1.0.3a|>=1.0.2o,<1.0.3a|>=1.0.2p,<1.0.3a|>=1.1.1a,<1.1.2a|>=1.1.1c,<1.1.2a']
Package repeatmasker conflicts for:
repeatmodeler -> repeatmasker
Package perl-text-soundex conflicts for:
repeatmodeler -> perl-text-soundex
Package rmblast conflicts for:
repeatmodeler -> rmblast
Package sqlite conflicts for:
python=3.6 -> sqlite[version='>=3.20.1,<4.0a0|>=3.22.0,<4.0a0|>=3.23.1,<4.0a0|>=3.24.0,<4.0a0|>=3.25.2,<4.0a0|>=3.26.0,<4.0a0|>=3.29.0,<4.0a0']
Hello, I am running the whole pipeline on a large server. I specified 64 cores, but for the last 4 hours the program (LTR_FINDER) has been using only 6 threads at ~20% each, which amounts to roughly a single-core run. I wonder what might have gone wrong.
I installed EDTA using conda (following instructions from README) and run it as follows
perl EDTA/EDTA.pl -genome my_genome.fasta -species others -step all -anno 1 -threads 64
In htop, the executed program looks like this:
.../LTR_FINDER_parallel -seq scaffolds.fasta -threads 64 -harvest_out -size 1000000 -time 300
Thanks for making EDTA, it was a good twist in a benchmarking paper :-). By the way, did you try to compare EDTA with PiRATE? It also seems like quite a comprehensive pipeline, but I couldn't find a comparison of the two.
Hi !
Thank you very much for this great tool! I was really pleased to discover it.
I have comments/questions related to RepeatModeler.
The version available within bioconda was wrong until recently (I fixed it before Christmas).
The RepeatModeler fix involved a small update of the RepeatMasker recipe. It also includes trf by default now.
So I guess you could update the installation procedure:
conda install -n EDTA -y cd-hit repeatmodeler muscle mdust blast-legacy java-jdk perl perl-text-soundex multiprocess regex tensorflow=1.14.0 keras=2.2.4 scikit-learn=0.19.0 biopython pandas glob2 python=3.6
RepeatModeler 2.0 now supports LTR structural search using a combination of LTRharvest and LTR_retriever. How will this affect the results of EDTA? Do you have a benchmark? Should we avoid using RepeatModeler's LTR detection?
Hello,
I am rerunning the latest push in the same folder and get errors; here are the STDOUT and STDERR:
This is a follow-up of this issue:
#10
./EDTA/EDTA.pl -genome Avaga.Masurca.Graal.min5500.fa -species others -step all -t 48 2>&1 |tee EDTA.log
########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.5 ####
##### Shujun Ou ([email protected]) ####
########################################################
Mo Aug 19 10:52:23 CEST 2019 Dependency checking:
All passed!
Mo Aug 19 10:52:33 CEST 2019 Obtain raw TE libraries using various structure-based programs:
Mo Aug 19 10:52:33 CEST 2019 EDTA_raw: Check files and dependencies, prepare working directories.
Mo Aug 19 10:52:33 CEST 2019 Start to find LTR candidates.
Mo Aug 19 10:52:33 CEST 2019 Existing result file Avaga.Masurca.Graal.min5500.fa.LTRlib.fa found! Will keep this file without rerunning this module.
Please specify -overwrite 1 if you want to rerun this module.
Mo Aug 19 10:52:33 CEST 2019 Finish finding LTR candidates.
Mo Aug 19 10:52:33 CEST 2019 Start to find TIR candidates.
Mo Aug 19 10:52:33 CEST 2019 Identify TIR candidates from scratch.
Species: others
/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/TIR-Learner.sh: 95: [: others: unexpected operator
/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/TIR-Learner.sh: 95: [: others: unexpected operator
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/Module3_New/getDataset2.py", line 107, in Predict
model = load_model(path+"/Module3_New/"+'CNN0724.h5')
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/saving.py", line 249, in load_model
optimizer_config, custom_objects=custom_objects)
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizers.py", line 838, in deserialize
printable_module_name='optimizer')
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 194, in deserialize_keras_object
return cls.from_config(cls_config)
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizers.py", line 159, in from_config
return cls(**config)
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizers.py", line 471, in __init__
super(Adam, self).__init__(**kwargs)
File "/home/urbe/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizers.py", line 68, in __init__
'passed to optimizer: ' + str(k))
TypeError: Unexpected keyword argument passed to optimizer: name
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/Module3_New/getDataset2.py", line 131, in <module>
d = pool.map(Predict,files)
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: Unexpected keyword argument passed to optimizer: name
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
Traceback (most recent call last):
File "/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/Module3_New/CombineAll.py", line 90, in <module>
keep=removeIRFhomo("%s.gff3"%(genome_Name+spliter+dataset),remove,"%sClean.gff3"%(genome_Name+spliter+dataset+spliter))
File "/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/Module3_New/CombineAll.py", line 76, in removeIRFhomo
f=pd.read_csv(file,header=None,sep="\t")
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/urbe/anaconda3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Traceback (most recent call last):
File "/media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/bin/TIR-Learner1.22/Module3/GetAllSeq.py", line 62, in <module>
file=open(f,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /media/urbe/MyBDrive/Alessandro/TR_annotation/EDTA/util/rename_tirlearner.pl line 18.
Warning: LOC list Avaga.Masurca.Graal.min5500.fa.TIR.ext30.list is empty.
Warning: The TIR result file has 0 bp!
Mo Aug 19 10:52:56 CEST 2019 Start to find MITE candidates.
Mo Aug 19 10:52:56 CEST 2019 Existing result file Avaga.Masurca.Graal.min5500.fa.MITE.raw.fa found! Will keep this file without rerunning this module.
Please specify -overwrite 1 if you want to rerun this module.
Mo Aug 19 10:52:56 CEST 2019 Finish finding MITE candidates.
Mo Aug 19 10:52:56 CEST 2019 Start to find Helitron candidates.
Mo Aug 19 10:52:56 CEST 2019 Existing result file Avaga.Masurca.Graal.min5500.fa.Helitron.raw.fa found! Will keep this file without rerunning this module.
Please specify -overwrite 1 if you want to rerun this module.
Mo Aug 19 10:52:56 CEST 2019 Finish finding Helitron candidates.
Hi all,
Just an update on the test results: it seems that the new TIR release fixes this issue.
Step | maxvmem | time(h) | raw_fa size |
---|---|---|---|
Helitron | 7.914GB | 2.352222 | 1.3Mb |
MITE | 1.529GB | 1.815278 | 4.9kb |
TIR | 42.127GB | 4.895556 | 20Mb |
LTR | 19.049GB | 1.417222 | 2.5Mb |
EDTA_Final | 19.388GB | 19.42389 | 19Mb |
Thanks for developing EDTA.
Bests,
Zhigui
Originally posted by @baozg in #4 (comment)
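For cluster planning, the figures in the resource table above can be combined. The sketch below uses numbers copied from the table (time in hours, memory in GB). The "parallel" figure assumes a hypothetical schedule in which the four raw steps run as simultaneous jobs followed by EDTA_Final, and the concurrent memory figure simply adds the four raw peaks as a worst case.

```python
# Rows copied from the table above: (step, peak memory in GB, wall time in hours).
STEPS = [
    ("Helitron", 7.914, 2.352222),
    ("MITE", 1.529, 1.815278),
    ("TIR", 42.127, 4.895556),
    ("LTR", 19.049, 1.417222),
    ("EDTA_Final", 19.388, 19.42389),
]

raw_steps = [s for s in STEPS if s[0] != "EDTA_Final"]
final_hours = next(hours for step, _, hours in STEPS if step == "EDTA_Final")

# Every step one after another.
serial_hours = sum(hours for _, _, hours in STEPS)
# Hypothetical schedule: the four raw steps as simultaneous jobs, then the final step.
parallel_hours = max(hours for _, _, hours in raw_steps) + final_hours
# Largest single-step peak, and a worst case if the four raw peaks coincided.
peak_gb = max(mem for _, mem, _ in STEPS)
concurrent_raw_gb = sum(mem for _, mem, _ in raw_steps)

print(f"serial {serial_hours:.2f} h; parallel raw steps {parallel_hours:.2f} h")
print(f"peak {peak_gb} GB; concurrent raw worst case {concurrent_raw_gb:.3f} GB")
```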
Dear Shujun,
I ran the EDTA pipeline v1.7.1 on a genome with the following command. It failed at the step of identifying LTR retrotransposon candidates from scratch.
perl /LabShares/Tools/EDTA/EDTA/EDTA.pl -genome DR_OL_ens90.fa -step all -cds DR_OL_cds_ens90.fa -sensitive 1 -anno 1
The STDERR showed an error:
Unsuccessful stat on filename containing newline at /LabShares/Tools/EDTA/EDTA/bin/LTR_FINDER_parallel/LTR_FINDER_parallel line 156, line 10314.
In the folder of LTR, a list of intermediate files have been generated:
alluniRefprexp082813.197723
alluniRefprexp082813.197723.phr
alluniRefprexp082813.197723.pin
alluniRefprexp082813.197723.psq
DR_OL_ens90.fa.finder.combine.scn
DR_OL_ens90.fa.harvest.scn
DR_OL_ens90.fa.list
DR_OL_ens90.fa.LTR.intact.fa
DR_OL_ens90.fa.LTR.intact.fa.ori
DR_OL_ens90.fa.LTR.intact.fa.ori.dusted
DR_OL_ens90.fa.LTR.intact.fa.ori.dusted.cleanup
DR_OL_ens90.fa.rawLTR.scn
Tpases020812DNA.197723
Tpases020812DNA.197723.phr
Tpases020812DNA.197723.pin
Tpases020812DNA.197723.psq
Tpases020812LINE.197723
Tpases020812LINE.197723.phr
Tpases020812LINE.197723.pin
Tpases020812LINE.197723.psq
All DR_OL_ens90.fa.LTR.intact* files are empty. Could you give me some suggestions to fix this?
Here I have the full STDERR attached for your reference.
Thank you so much.
Best,
Yixuan
AN_EDTA_DR_OL_ens90.txt
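Perl prints "Unsuccessful stat on filename containing newline" when a file test (-e, -s, ...) is applied to a string that still ends in "\n", typically a line read from a file and never chomped. The test then fails even though the file exists. Which input produced the newline at LTR_FINDER_parallel line 156 is not visible in this report. The sketch below is a Python analogue of the same pitfall and fix; the function names are hypothetical.

```python
from pathlib import Path


def read_paths(list_file):
    """Read one path per line, dropping the trailing newline (Perl's chomp) and blank lines."""
    with open(list_file) as handle:
        return [line.rstrip("\r\n") for line in handle if line.strip()]


def compare_lookups(list_file):
    """Return (exists_with_newline, exists_after_strip) for the first listed path."""
    with open(list_file) as handle:
        raw = handle.readline()
    return Path(raw).exists(), Path(raw.rstrip("\r\n")).exists()
```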
Hello,
I have a suggestion:
It might be useful for end users to get a file with the coordinates of the transposons, for example if one is interested in looking at their positions in the genome.
Thanks for EDTA, it's a cool pipeline.
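A coordinate file of the kind requested here can be derived from any GFF3 annotation of the TEs. Which GFF3 a given EDTA version writes, and which attributes it carries, is not stated in this thread, so the sketch below assumes a generic GFF3 with an ID attribute and converts it to BED, the usual coordinate format for genome browsers and bedtools. The function name is hypothetical.

```python
def gff3_to_bed(gff3_lines):
    """Convert GFF3 feature lines into BED6 tuples (chrom, start, end, name, score, strand).

    GFF3 coordinates are 1-based and inclusive; BED is 0-based and half-open,
    so the start moves down by one and the end stays the same.
    """
    bed = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _source, ftype, start, end, score, strand, _phase, attrs = cols
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        name = fields.get("ID", ftype)  # fall back to the feature type if there is no ID
        bed_score = "0" if score == "." else score
        bed.append((seqid, int(start) - 1, int(end), name, bed_score, strand))
    return bed
```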
Hi,
I'm annotating a genome quite distant from Homo sapiens. Checking the RM_ output folder, it looks like the call to RepeatMasker uses human as the default species ("The query species was assumed to be homo" in the RM_/.fasta.tbl output in the *.fasta.EDTA.final/ folder). Is there any way to change this in my call to EDTA so I can make the most effective use of a homology-based TE calling method? Alternatively, I could just run RepeatMasker myself at the end using *.fasta.EDTA.TElib.fa as a custom library.
Shujun,
Thank you for updating EDTA. I am using 1.3 on a maize genome and the MITE step took a long time (~11 days). The problem is that no MITE raw sequences were output after TIR and MITE runs. Now the running is at Helitron. I will update after the run is finished.
-Sanzhen
Dear Shujun
I managed to generate the raw files for LTR, TIR, MITE (Copy of TIR) and Helitrons. I am getting the following error while running EDTA_processF.pl.
/usr/local_sbs/source_files/EDTA/EDTA_processF.pl -genome HaploidAssemblyPilonPolishedCleaned.fasta -ltr HaploidAssemblyPilonPolishedCleaned.fasta.EDTA.raw/HaploidAssemblyPilonPolishedCleaned.fasta.LTR.raw.fa -tir HaploidAssemblyPilonPolishedCleaned.fasta.EDTA.raw/HaploidAssemblyPilonPolishedCleaned.fasta.TIR.raw.fa -helitron HaploidAssemblyPilonPolishedCleaned.fasta.EDTA.raw/HaploidAssemblyPilonPolishedCleaned.fasta.Helitron.raw.fa -mite HaploidAssemblyPilonPolishedCleaned.fasta.EDTA.raw/HaploidAssemblyPilonPolishedCleaned.fasta.MITE.raw.fa
Use of uninitialized value within @argv in pattern match (m//) at /usr/local/local_sbs/source_files/EDTA/util/cleanup_nested.pl line 41.
Use of uninitialized value $blastplus in string eq at /usr/local/local_sbs/source_files/EDTA/util/cleanup_nested.pl line 49.
Could you help me with this?
The picture shows the final files that were generated.
Shujun,
I've installed the Docker version on our HPC. EDTA progresses through the LTR finding, but then crashes when trying to identify TIRs. I'm pasting the full Slurm output below. Any help would be greatly appreciated.
Thanks, Jeff
> ##### Shujun Ou ([email protected]) ####
> ########################################################
>
>
>
> Fri Jan 24 02:09:53 UTC 2020 Dependency checking:
> All passed!
> Fri Jan 24 02:10:03 UTC 2020 Obtain raw TE libraries using various structure-based programs:
> Fri Jan 24 02:10:03 UTC 2020 EDTA_raw: Check files and dependencies, prepare working directories.
>
> Fri Jan 24 02:10:03 UTC 2020 Start to find LTR candidates.
>
> Fri Jan 24 02:10:03 UTC 2020 Identify LTR retrotransposon candidates from scratch.
>
> Warning: LOC list ordered_atriplex_hortensis_04Apr2019_hkF1T_namedcorrectly_clean_00.fasta.mod.ltrTE.veryfalse is empty.
> Fri Jan 24 02:27:35 UTC 2020 Finish finding LTR candidates.
>
> Fri Jan 24 02:27:35 UTC 2020 Start to find TIR candidates.
>
> Fri Jan 24 02:27:35 UTC 2020 Identify TIR candidates from scratch.
>
> Species: others
> 2020-01-24 02:55:41.202424: F tensorflow/python/lib/core/bfloat16.cc:675] Check failed: PyBfloat16_Type.tp_base != nullptr
> Aborted (core dumped)
> cat: '*-+-DTA.fa': No such file or directory
> cat: '*-+-DTC.fa': No such file or directory
> cat: '*-+-DTH.fa': No such file or directory
> cat: '*-+-DTM.fa': No such file or directory
> cat: '*-+-DTT.fa': No such file or directory
> cat: '*-+-NonTIR.fa': No such file or directory
> cat: '*-+-*-+-*.gff3': No such file or directory
> rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
> Traceback (most recent call last):
> File "/EDTA/bin/TIR-Learner2.4/Module3_New/CombineAll.py", line 75, in <module>
> f_m3=removeDupinSingle("%s.gff3"%(genome_Name+spliter+"Module3"))
> File "/EDTA/bin/TIR-Learner2.4/Module3_New/CombineAll.py", line 57, in removeDupinSingle
> f=pd.read_csv(file,header=None,sep="\t") #shujun
> File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
> return _read(filepath_or_buffer, kwds)
> File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
> parser = TextFileReader(fp_or_buf, **kwds)
> File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
> self._make_engine(self.engine)
> File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
> self._engine = CParserWrapper(self.f, **self.options)
> File "/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
> self._reader = parsers.TextReader(src, **kwds)
> File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
> pandas.errors.EmptyDataError: No columns to parse from file
> multiprocessing.pool.RemoteTraceback:
> """
> Traceback (most recent call last):
> File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
> result = (True, func(*args, **kwds))
> File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
> return list(map(*args))
> File "/EDTA/bin/TIR-Learner2.4/Module3/GetAllSeq.py", line 32, in GetListFromFile
> f=open(file,"r+")
> FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
> """
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
> File "/EDTA/bin/TIR-Learner2.4/Module3/GetAllSeq.py", line 63, in <module>
> pool.map(GetListFromFile,fileList) #shujun
> File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 266, in map
> return self._map_async(func, iterable, mapstar, chunksize).get()
> File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 644, in get
> raise self._value
> FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
> mv: cannot stat 'TIR-Learner/*FinalAnn*.gff3': No such file or directory
> mv: cannot stat 'TIR-Learner/*FinalAnn*.fa': No such file or directory
> Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /EDTA/util/rename_tirlearner.pl line 18.
> Warning: LOC list ordered_atriplex_hortensis_04Apr2019_hkF1T_namedcorrectly_clean_00.fasta.TIR.ext30.list is empty.
>
> Error: Error while loading sequenceCan't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
> Warning: The TIR result file has 0 bp!
>
> Fri Jan 24 02:55:51 UTC 2020 Start to find Helitron candidates.
>
> Fri Jan 24 02:55:51 UTC 2020 Identify Helitron candidates from scratch.
>
> Fri Jan 24 03:49:16 UTC 2020 Finish finding Helitron candidates.
>
> Fri Jan 24 03:49:16 UTC 2020 Execution of EDTA_raw.pl is finished!
>
> ERROR: Raw TIR results not found in ordered_atriplex_hortensis_04Apr2019_hkF1T_namedcorrectly_clean_00.fasta.EDTA.raw/ordered_atriplex_hortensis_04Apr2019_hkF1T_namedcorrectly_clean_00.fasta.TIR.raw.fa at /EDTA/EDTA.pl line 285.
> cleaning up image
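In the traceback above, CombineAll.py calls pd.read_csv(file, header=None, sep="\t") on a Module3 GFF3 file that was presumably left empty after the TensorFlow abort, and pandas raises EmptyDataError for a file with no columns. Below is a minimal sketch of an empty-file guard that uses the standard csv module in place of pandas. read_tsv_or_empty is a hypothetical helper, not TIR-Learner code. It only avoids the crash; it cannot recover the TIR candidates that the failed step never wrote.

```python
import csv
import os
from typing import List


def read_tsv_or_empty(path: str) -> List[List[str]]:
    """Read a headerless, tab-separated file (such as GFF3 rows) into lists of fields.

    pd.read_csv(path, header=None, sep="\t") raises EmptyDataError on a
    zero-byte file; this returns [] instead. A missing file still raises
    FileNotFoundError (from os.path.getsize), so real path problems stay visible.
    """
    if os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps every field literal, including any quote characters
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Drop blank lines and '#' comment/directive lines; note that
        # pd.read_csv would only skip the latter if given comment="#".
        return [row for row in reader if row and not row[0].startswith("#")]
```

A caller can then treat an empty list as "no candidates from this module" and continue, rather than aborting the whole combine step.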
Hello again :-), I have run into the following error:
243: EDTA/bin/TIR-Learner2.4/TIR-Learner2.4.sh: cp: Argument list too long
The line in the script that produced the error is
cp -r $genomeName/$genomeName* temp/
where the variable genomeName is statically assigned to TIR-Learner at the very beginning of the script.
The temp/ directory is now empty; I am not sure whether that is a problem.
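"Argument list too long" is the kernel's E2BIG error: the shell expands $genomeName/$genomeName* into one argument per matching file before starting cp, and the combined list exceeds the system's ARG_MAX limit. Below is a minimal Python 3.8+ sketch of the same copy that never builds such a list. copy_prefixed is a hypothetical helper, not part of TIR-Learner, and it only approximates cp -r (for example, symlinked files are copied as regular files).

```python
import os
import shutil


def copy_prefixed(src_dir: str, prefix: str, dest_dir: str) -> int:
    """Copy each entry of src_dir whose name starts with prefix into dest_dir.

    Roughly what `cp -r src_dir/prefix* dest_dir/` does, except that the
    directory is scanned in-process, so no argument list is built and the
    ARG_MAX limit never comes into play. Returns the number of entries copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            target = os.path.join(dest_dir, entry.name)
            if entry.is_dir(follow_symlinks=False):
                # Recurse into sub-directories, like cp -r
                shutil.copytree(entry.path, target, dirs_exist_ok=True)
            else:
                # copy2 also preserves timestamps and permission bits
                shutil.copy2(entry.path, target)
            copied += 1
    return copied
```

Under these assumptions, the failing line would correspond to copy_prefixed("TIR-Learner", "TIR-Learner", "temp").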
Hello,
I got an error when running EDTA with the following command:
perl /PARA/pp811/anaconda3/bin/EDTA/EDTA.pl -genome ref.fasta -cds CDS.fasta -anno 1 -evaluate 1
I then got the following error output:
Can't locate object method "end" via package "Thread::Queue" at /PARA/pp811/anaconda3/bin/EDTA/bin/LTR_FINDER_parallel/LTR_FINDER_parallel line 115, <List> line 1.
cat: ref.fasta.finder.combine.scn: No such file or directory
Can't locate object method "end" via package "Thread::Queue" at /PARA/pp811/anaconda3/bin/EDTA/bin/LTR_retriever/bin/LTR.identifier.pl line 125.
cp: cannot stat `ref.fasta.mod.retriever.scn.adj': No such file or directory
awk: cmd. line:1: fatal: cannot open file `ref.fasta.pass.list' for reading (No such file or directory)
Warning: LOC list - is empty.
Error: Error while loading sequence
cp: cannot stat `ref.fasta.LTRlib.fa': No such file or directory
cp: cannot stat `VF36.GPM.fasta.LTRlib.fa': No such file or directory
Error: LTR results not found!
ERROR: Raw LTR results not found in ref.fasta.EDTA.raw/ref.fasta.LTR.raw.fa at /PARA/pp811/anaconda3/bin/EDTA/EDTA.pl line 284.
Of all the results I've gotten so far, there is only the LTR raw file; both the TIR and Helitron raw FASTA files are empty.
Any help or suggestion will be appreciated.
Thanks!
Hi, dear Professor Shujun,
I want to use EDTA for de novo TE prediction in some animal genomes. However, EDTA's built-in library seems to be for rice, and I can't find a parameter to specify an animal library.
Can EDTA use an animal library? And how well does EDTA work on animal genomes?
ZhangYi.
Hello,
I started the EDTA pipeline yesterday, and I am very excited. However, after 1 hour of run time, I get an error that certain LTR files are not present. Do you know what is going on? I call the script as follows:
perl /data/modules/python/python-anaconda2-2019.10-EDTA/envs/EDTA/share/EDTA/EDTA.pl -genome ref.fa -species others -step all -curatedlib library7birds.fa -sensitive -repeatmasker /data/biosoftware/RepeatMasker/RepeatMasker/ 1 -anno 1 -evaluate 1 -t 15
The input library and genome are soft links (ln -s).
I then get the following error output:
perl rename_LTR.pl genome.fa target_sequence.fa LTR_retriever.defalse
cp: cannot stat 'ref.fa.LTR.intact.fa.gff3': No such file or directory
cp: cannot stat 'ref.fa.LTRlib.fa': No such file or directory
cp: cannot stat 'ref.fa.LTRlib.fa': No such file or directory
Error: LTR results not found!
ERROR: Raw LTR results not found in ref.fa.EDTA.raw/ref.fa.LTR.raw.fa at /data/modules/python/python-anaconda2-2019.10-EDTA/envs/EDTA/share/EDTA/EDTA.pl line 284.
My conda version is Miniconda2 4.4.10, whose base environment uses python2. Running conda install python=3.6 damaged the base environment, and conda now fails with the following import error:
$ conda -h
Traceback (most recent call last):
File "~/conda/bin/conda", line 7, in <module>
from conda.cli import main
ImportError: No module named conda.cli
So it is better not to install python=3.6 in conda's base environment.
Hi,
I copied and pasted the installation instructions from the README and am running the script in the active EDTA environment. It seems that the EDTA.pl script chokes when trying to use TIR-Learner. Looking at my output, all the expected folders are there. After the crash, the Helitron, MITE, and TIR folders are empty, but the LTR folder is not. The only file in the parent output folder is genome.fasta.LTR.raw.fa.
Is there a way to run the Perl pipeline script but just not use TIR-Learner, or even just not call TIRs? I'm still interested in the other features, and even if I could just use EDTA for Helitrons, LTRs, MITEs, filtering, consensus calling, and repeat classifying I would be happy.
The lines before the crash start with what is seen in #2 (comment). Then comes a traceback starting from ~/bin/EDTA/bin/TIR-Learner1.12/Module1/Fullcov.py, line 52, in <module> ProcessHomology(genome_Name). After that, there are some cryptic errors, including:
cat: '*DTA-+-select.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
There are a few more error traces after that, each Traceback followed by various file-not-found errors from rm, cp, mv, and cat.
Finally, the last few lines before the crash tell me that this is certainly a problem with TIR-Learner:
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn.fa': No such file or directory
cp: cannot stat 'TIR-Learner-Result/TIR-Learner_FinalAnn.fa': No such file or directory
Error: TIR results not found!
ERROR: Raw TIR results not found in genome.fasta.EDTA.raw/genome.fasta.TIR.raw.fa at ~bin/EDTA/EDTA.pl line 145.
While bug testing I've just been using the first two scaffolds of my genome. That file is attached.
Thanks!
Hello,
I noticed that blastx and TIR-Learner ignore the -threads setting and use all available threads.
EDIT: Sorry, my mistake. Closing.
Dear Shujun,
When the -cds option is added, EDTA seems to switch to python2 for TEsorter. See cleanup_TE.pl line 36:
python2 $TEsorter $cds -p $threads;
This causes many incompatibilities between python2 and python3, for example with Biopython.
I installed a separate python2 conda environment for TEsorter, but still got an error in the test:
(py2) [qiushi.li@itbioyeaman03 test]$ python ../TEsorter.py rice6.9.5.liban
2019-11-27 07:08:19,201 -WARNING- No module named drmaa
grid computing is not available
2019-11-27 07:08:19,203 -INFO- VARS: {'seq_type': 'nucl', 'min_coverage': 20, 'disable_pass2': False, 'tmp_dir': './tmp', 'processors': 4, 'sequence': 'rice6.9.5.liban', 'no_library': False, 'p2_identity': 80.0, 'no_cleanup': False, 'force_write_hmmscan': False, 'p2_length': 80.0, 'prefix': 'rice6.9.5.liban.rexdb', 'max_evalue': 0.001, 'p2_coverage': 80.0, 'pass2_rule': '80-80-80', 'hmm_database': 'rexdb', 'no_reverse': False}
2019-11-27 07:08:19,203 -INFO- checking dependencies:
2019-11-27 07:08:19,213 -INFO- hmmer 3.2.1 OK
Traceback (most recent call last):
File "../TEsorter.py", line 974, in
pipeline(Args())
File "../TEsorter.py", line 116, in pipeline
Dependency().check_blast()
File "../TEsorter.py", line 920, in check_blast
version = self.check_blast_version(program)
File "../TEsorter.py", line 939, in check_blast_version
version = re.compile(r'blast\S* ([\d.+]+)').search(out).groups()[0]
AttributeError: 'NoneType'
Best,
Qiushi
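The AttributeError comes from calling .groups() on the result of re.search, which is None whenever the pattern does not match the captured version output (for instance, when that output is empty). Below is a minimal sketch of a guarded version check that reuses the pattern from the traceback. parse_blast_version is a hypothetical name, not TEsorter's API, and the sample string assumes BLAST+ reports its version as "blastn: 2.9.0+".

```python
import re
from typing import Optional

# Same pattern as in TEsorter's check_blast_version (see traceback above)
BLAST_VERSION_RE = re.compile(r'blast\S* ([\d.+]+)')


def parse_blast_version(out: Optional[str]) -> Optional[str]:
    """Return the version token from `<program> -version` output, or None.

    re.search returns None when nothing matches; calling .groups() on that
    None is what raised the AttributeError, so check the match first.
    """
    match = BLAST_VERSION_RE.search(out or "")
    return match.group(1) if match else None


print(parse_blast_version("blastn: 2.9.0+"))  # 2.9.0+
```

With a None result, the dependency check could report that the BLAST version could not be determined, which points at the BLAST binary or its captured output rather than at an opaque 'NoneType' error.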
Dear Shujun,
Please see this error:
Dependency checking: All passed!
Can't find label ALL at /data/home/qiushi_volumn1/programs/EDTA/EDTA.pl line 118.
perl version info:
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi
Many thanks~
Your loyal fans
Hi, Shujun,
I am testing EDTA on our school's computer, but the job always gets killed. Here is my script:
module load edta/20190108
module load ltrretriever/1.6
EDTA.pl -genome Zm-I-REFERENCE-FL-1.0.fa -species Maize -threads 2
Here is the error message:
########################################################
########################################################
Tue Aug 6 13:57:01 EDT 2019 Dependency checking:
All passed!
Tue Aug 6 13:57:13 EDT 2019 Obtain raw TE libraries using various structure-based programs:
sh: line 1: 32154 Killed /apps/edta/20190108/edta/bin/genometools-1.5.10/bin/gt suffixerator -db Zm-I-REFERENCE-FL-1.0.fa -indexname Zm-I-REFERENCE-FL-1.0.fa -
Can't locate object method "end" via package "Thread::Queue" at /apps/edta/20190108/edta/bin/LTR_FINDER_parallel/LTR_FINDER_parallel line 115, line 10732.
cat: Zm-I-REFERENCE-FL-1.0.fa.finder.combine.scn: No such file or directory
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: Ignoring FASTA modifier(s) found because the input was not expected to have any.
grep: Zm-I-REFERENCE-FL-1.0.fa.retriever.scn: No such file or directory
Argument "" isn't numeric in numeric gt (>) at /apps/edta/20190108/edta/bin/LTR_retriever/LTR_retriever line 355.
ERROR: No candidate is found in the file(s) you specified.
cp: cannot stat ‘Zm-I-REFERENCE-FL-1.0.fa.LTRlib.fa’: No such file or directory
cp: cannot stat ‘Zm-I-REFERENCE-FL-1.0.fa.LTRlib.fa’: No such file or directory
Error: LTR results not found!
ERROR: Raw LTR results not found in Zm-I-REFERENCE-FL-1.0.fa.EDTA.raw/Zm-I-REFERENCE-FL-1.0.fa.LTR.raw.fa at /apps/edta/20190108/edta/EDTA.pl line 170.
slurmstepd: error: Detected 1 oom-kill event(s) in step 40042225.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
I found that Zm-I-REFERENCE-FL-1.0.fa.harvest.scn and Zm-I-REFERENCE-FL-1.0.fa.rawLTR.scn in the LTR folder are empty.
Looking forward to your reply!
Best,
Ying
Hello,
I'm not sure whether this is an EDTA or a RepeatMasker issue, but I ran the following on the EDTA output:
RepeatMasker -pa 4 -no_is -norna -nolow -div 40 -lib genome.sixLongest.fa.EDTA.TElib.fa -cutoff 225 -gff genome.FLYE.sixLongest.fa
buildSummary.pl genome.FLYE.sixLongest.fa.out > summary.tbl
and some sequences in the output repeat tables have a "doubled" ID like
TE_00001277_INT-int
while the sequence ID is
>TE_00001277_INT#LTR/Gypsy
Any idea where the "-int" after "INT" comes from? I admit it seems entirely benign, but I want to be sure it doesn't point to a bigger problem.
Thank you
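One way to check that the suffix is cosmetic is to confirm that every repeat name in the RepeatMasker output still maps to a library ID once the "#Class/Family" part of the header and a trailing "-int" are set aside. The sketch below assumes you have already collected the library header lines (those starting with ">") and the repeat-name column from the output table. The function names are hypothetical, and the sketch does not explain where "-int" is added.

```python
from typing import Iterable, List, Tuple


def parse_lib_header(header: str) -> Tuple[str, str]:
    """Split a '>name#Class/Family' library header into (name, classification)."""
    name, _, classification = header.lstrip(">").strip().partition("#")
    return name, classification


def unmatched_repeat_names(lib_headers: Iterable[str],
                           repeat_names: Iterable[str],
                           suffix: str = "-int") -> List[str]:
    """Return repeat names that match no library ID, even after removing `suffix`."""
    lib_names = {parse_lib_header(h)[0] for h in lib_headers}
    missing = []
    for name in repeat_names:
        # Compare both the raw name and the name with the suffix stripped
        base = name[: -len(suffix)] if suffix and name.endswith(suffix) else name
        if name not in lib_names and base not in lib_names:
            missing.append(name)
    return missing
```

An empty result would indicate that "TE_00001277_INT-int" and similar names all resolve to existing library entries.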