hamstr's People

Contributors

ebersber, holgerbgm, jurudo, trvinh

hamstr's Issues

oneSeq not working

Hello,

I followed the example provided in the README but I get the following error:

~/tools/HaMStR$ ./bin/oneSeq.pl -seqFile=data/infile.fa -seqid=P83876 -refspec=HUMAN@9606@1 -minDist=genus -maxDist=kingdom -coreOrth=5 -cleanup -global
Use of uninitialized value $force in numeric eq (==) at ./bin/oneSeq.pl line 1459.

MSYMLPHLHNGWQVDQAILSEEDRVVVIRFGHDWDPTCMKMDEVLYSIAEKVKNFAVIYLVDITEVPDFNKMYELYDPCTVMFFFRNKHIMIDLGTGNNNKINWAMEDKQEMVDIIETVYRGARKGRGLVVSPKDYSTKYRY
Your sequence was named: nlTazcr

No FAS filter for core-orthologs set.
Annotation tools found in /home/ubuntu/annotation_fas!

#########################
--> Running annoFAS.pl

--> acquiring sequence lengths
...done: sequence lengths are calculated for input file.
--> starting: cast
--> starting: decodeanhmm
decodeanhmm 1.1f
Copyright (C) 1998 by Anders Krogh
Tue Feb 11 15:42:51 2020

Model in file "/home/ubuntu/annotation_fas/TMHMM/TMHMM2.0.model" parsed successfully.
--> starting: COILS2
       1 sequences      142 aas        0 in coil
--> starting: signalp.pl
--> starting: seg
--> starting: pfam_scan.pl
Starting hmmscan now...
hmmscan finished.
Creating output file...
finished.
Deleting temporary output file...
--> starting: smart_scan_v4.pl
Starting hmmscan now...
hmmscan finished.
Creating output file...
finished.
Deleting temporary output file...
--> annotation finished.
tool start: 11/02/2020 15:42:51:0
tool end  : 11/02/2020 15:42:54:0
#####################

Building up the taxonomy tree. Start 1581435774

--------------------- WARNING ---------------------
MSG: Could not merge the lineage of 40276 with the rest of the tree
---------------------------------------------------
Can't call method "add_Descendent" on an undefined value at /home/ubuntu/anaconda3/envs/hamstr/lib/site_perl/5.26.2/Bio/Tree/TreeFunctionsI.pm line 529.

Best,
Cata

scoreThreshold

Currently, the scoreThreshold is constitutively on. Implement a switch to deactivate it.
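
Something along these lines could work (Python sketch with invented option names; the real oneSeq interface is Perl and may use different flags):

# Hypothetical sketch (option names invented for illustration; oneSeq itself
# is Perl): make the score-threshold filter switchable instead of always on.
import argparse

parser = argparse.ArgumentParser(description="ortholog search options (sketch)")
parser.add_argument("--scoreThreshold", dest="score_threshold",
                    action="store_true", default=True,
                    help="filter candidate orthologs by score (default: on)")
parser.add_argument("--noScoreThreshold", dest="score_threshold",
                    action="store_false",
                    help="switch the score-based filter off")
parser.add_argument("--scoreCutoff", type=float, default=10.0,
                    help="allowed deviation from the core score distribution (illustrative)")
args = parser.parse_args()

if args.score_threshold:
    print(f"score filter active, cutoff = {args.scoreCutoff}")
else:
    print("score filter disabled")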

Maybe a bug: Linked source for ....not found

Hello,
I have been using this command for a long time and it works well with HaMStR v.13.2.10. I installed HaMStR v.13.2.11 via 'conda install -c bionf hamstr' on a new system and then ran into this error. Is this a bug?
I am using a core-ortholog set that we trained ourselves. Since the input files are protein sequences, there are only *_prot.fa files in the blast_dir. It seems that this hamstr does not recognize the "-protein" option.
Thank you!

command:
hamstr -hmmset=arthNreg2 -protein -cleartmp -cpu 12 -central -strict -refspec=drome_4 -sequence_file=Paranesidea_sp_croco.fas -taxon=Paranesidea_sp -representative
error:
checking for reference fasta files: Linked source for /home/depengli/HaMStR/blast_dir/drome_4/drome_4.fa not found in /mnt/d/Croco_hamstr_workdir! at /home/depengli/HaMStR/bin/hamstr line 1172.

accelerated mode

Uni-directional search using core-specific cutoffs (bit score, FAS score, ...).
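
Roughly the idea (Python sketch; the hit fields and cutoff values are illustrative, not the actual HaMStR data structures):

# Sketch: accept hmmsearch hits directly when they pass cutoffs learned from
# the core-ortholog group, instead of confirming each hit with a reciprocal
# BLAST against the reference species. All names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class Hit:
    seq_id: str
    bit_score: float
    fas_score: float

@dataclass
class CoreCutoffs:
    min_bit: float   # e.g. lowest bit score observed among the core orthologs
    min_fas: float   # e.g. lowest FAS score observed among the core orthologs

def accept_unidirectional(hits, cutoffs):
    """Keep hits that satisfy all core-specific cutoffs (no re-BLAST step)."""
    return [h for h in hits
            if h.bit_score >= cutoffs.min_bit and h.fas_score >= cutoffs.min_fas]

hits = [Hit("cand1", 250.0, 0.92), Hit("cand2", 80.0, 0.40)]
print(accept_unidirectional(hits, CoreCutoffs(min_bit=120.0, min_fas=0.75)))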

original sequence not found

I added my species to the reference databases using the addTaxon1s command, but I always get "original sequence not found" when I choose the new species as the reference species with the command "h1s --seqFile infile.fa --seqName test3 --refspec ARIFI@158543@1 --force". Can you help me solve this?

FAS: Input

Reading in large input files with FAS can take a moment. This is not critical for single FAS runs, but it can add up to noticeably longer runtimes over the course of a oneseq run.
Adding a new input format that can be read faster might be something to work on for a future release, maybe even binary files for the oneseq library species.
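
One possible direction (Python sketch, under the assumption that the annotations live in a plain-text, tab-separated file; the real FAS format may differ): parse once, cache the result as a binary pickle next to the input, and reuse the cache on later runs.

# Illustrative sketch only: keep a pickled copy of the parsed annotations so
# repeated FAS runs during a oneSeq job can load the binary cache instead of
# re-parsing the text file each time.
import os
import pickle

def parse_text_annotations(path):
    """Placeholder for the existing text parser (format is an assumption)."""
    features = {}
    with open(path) as fh:
        for line in fh:
            protein_id, feature = line.rstrip("\n").split("\t")[:2]
            features.setdefault(protein_id, []).append(feature)
    return features

def load_annotations(path):
    cache = path + ".pkl"
    if os.path.exists(cache) and os.path.getmtime(cache) >= os.path.getmtime(path):
        with open(cache, "rb") as fh:
            return pickle.load(fh)              # fast binary load
    annotations = parse_text_annotations(path)  # slow text parse, done once
    with open(cache, "wb") as fh:
        pickle.dump(annotations, fh)
    return annotations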

hmmsearch ranking

Allow the ranking of hmmsearch hits by either global score (current implementation) or by the score of the best domain.
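
A sketch of the switch, assuming the hits were written with hmmsearch --tblout (in that table, column 6 holds the full-sequence score and column 9 the score of the best single domain):

# Rank hmmsearch hits either by full-sequence score or by best-domain score.
def rank_hits(tblout_path, by="full"):
    """Return (hit name, score) pairs sorted by the chosen score."""
    col = 5 if by == "full" else 8          # 0-based index into the row
    hits = []
    with open(tblout_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.split()
            hits.append((fields[0], float(fields[col])))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# rank_hits("hmmsearch.tblout", by="domain")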

For loop not working

Hello

I tried running a loop for different proteins using the following command:

for PROTEIN in *fa; do oneSeq -seqFile=$PROTEIN -seqname=$PROTEIN -refspec=HUMAN@9606@1 -minDist=genus -maxDist=phylum -coreOrth=5 -cleanup -global -cpu=20 -countercheck; done

It runs with the first protein until it creates the *.extended.fa file, then moves on to the second protein without producing any further output for the first one. If I run the proteins one by one, it works fine.

Best,
Cata

data redundancy and unused script(s)

  • Packages in the Bio folder also exist in the BioPerl/Bio folder.
  • The taxonomy downloaded from the server still contains alt data. Will these old data be needed anywhere? The tar file taxdump.tar.gz can also be deleted.
  • The script /bin/fas/Pfam/Pfam-hmms/count_clan_numbers.pl seems to be unused?

Problem during annotation

Dear Ingo,

During the annotation of one dataset I got the following error:

Problem occurred: 2!
problem 1!
problem 1!
[... the messages "problem 1!" and "Problem occurred: 2!" alternate and repeat dozens of times ...]
Problem occurred: 2!
finished.
Deleting temporary output file...
--> annotation finished.
tool start: 23/02/2020 13:13:19:0
tool end  : 24/02/2020 4:12:16:0
#####################

Any clues as to what this means?

Thanks!

Best,
Cata

Error running sample file

Hello

I installed HaMStR using conda. I ran setup_hamstr | tee log.txt and it appears that there was no error during the setup.

I tried running the sample file but I get the following error.

There was an error running HaMStR v.13.2.10

VERSION:	HaMStR v.13.2.10

HOSTNAME	lmu-thesis1


USER DEFINED PARAMTERS (inc. default values)

	 -append:	not set
	 -blastpath:	/home/ubuntu/tools/HaMStR/blast_dir/
	 -checkCoorthologsRef:	not set
	 -cleartmp:	not set
	 -concat:	not set
	 -est:	not set
	 -eval_blast:	0.0001
	 -eval_hmmer:	0.0001
	 -filter:	F
	 -hit_limit:	not set
	 -hmm:	not set
	 -hmmpath:	/home/ubuntu/tools/HaMStR/core_orthologs/
	 -hmmset:	ppkdOcJ
	 -intron:	k
	 -longhead:	not set
	 -nonoverlapping_cos:	not set
	 -outpath:	/home/ubuntu/tools/HaMStR/output/ppkdOcJ
	 -protein:	1
	 -rbh:	not set
	 -refspec:	HUMAN@9606@1
	 -relaxed:	not set
	 -representative:	not set
	 -reuse:	not set
	 -sequence_file:	DROME@[email protected]
	 -show_hmmsets:	not set
	 -sort_global_align:	not set
	 -strict:	not set
	 -taxon:	DROME@7227@1
	 -ublast:	0
INFILE PROCESSING

	A modified infile DROME@[email protected] already exists. Using this one
	Newlines from the infile have been removed

CHECKING FOR PROGRAMS

	check for blastp succeeded
	check for hmmsearch succeeded

CHECKING FOR HMMs

	running HaMStR with all hmms in /home/ubuntu/tools/HaMStR/core_orthologs//ppkdOcJ/hmm_dir
	check for /home/ubuntu/tools/HaMStR/core_orthologs//ppkdOcJ/ppkdOcJ.fa succeeded

CHECKING TAXON NAME

	using default taxon DROME@7227@1 for all sequences

CHECKING FOR REFERENCE TAXON

	 Reference species for the re-blast: HUMAN@9606@1

CHECKING FOR BLAST DATABASES

	check for /home/ubuntu/tools/HaMStR/blast_dir//HUMAN@9606@1/HUMAN@9606@1 succeeded

CHECKING FOR REFERENCE FASTA FILES

	Removing line breaks from /home/ubuntu/tools/HaMStR/genome_dir/HUMAN@9606@1/HUMAN@[email protected].
FATAL: Problems running the script nentferner.pl

PROGRAM OPTIONS

	hmmsearch will run with an e-value limit of 0.0001
	re-blast hit_limit: none applied
	Blast will run with an evalue limit of 0.0001

	check for low complexity filter setting succeeded. Chosen value is F
	HaMStR was called without the -representative option. More than one ortholog may be identified per core-ortholog group!Evaluation of predicted HaMStR orthologs.
Error: Could not find /home/ubuntu/tools/HaMStR/data/ppkdOcJ.extended.fa

Do you know what could be causing this error?

Thank you!

Regards,
Cata

check existing annotation before calculating FAS

I noticed that the FAS calculation is not immediately cancelled even if there is no annotation available for one of the proteins (either seed or query), so unnecessary time is spent on whatever processing follows. I have some (thousands of) cases where it took minutes to do something and gave me nothing :-)
I suggest checking for the availability of the annotations before doing any further processing and giving a proper error message that says, e.g., which protein has not been annotated.
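
A rough sketch of the pre-check (Python; the weight_dir layout and the per-species JSON file name are assumptions for illustration, not the actual annotation format):

# Verify that an annotation exists for both seed and query before starting the
# scoring, and name the missing protein in the error message.
import os
import sys

def require_annotation(weight_dir, species, protein_id):
    anno_file = os.path.join(weight_dir, species + ".json")   # assumed layout
    if not os.path.exists(anno_file):
        sys.exit(f"ERROR: no annotation file for species {species} ({anno_file})")
    # assumed: one entry per protein id inside the per-species annotation file
    with open(anno_file) as fh:
        if f'"{protein_id}"' not in fh.read():
            sys.exit(f"ERROR: protein {protein_id} of {species} has not been annotated")

# require_annotation("weight_dir", "HUMAN@9606@1", "P83876")
# ...only then start the FAS comparison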

CAST not working

Hello

I am manually creating a new data set. In step 6 (Create the annotation files for your taxon with the provided Perl script) I get the following error:

Can't exec "/home/ubuntu/annotation_fas/CAST/cast": No such file or directory at /home/ubuntu/anaconda3/envs/hamstr/lib/python3.8/site-packages/greedyFAS/annoFAS.pl line 909.
Use of uninitialized value $castOut in scalar chomp at /home/ubuntu/anaconda3/envs/hamstr/lib/python3.8/site-packages/greedyFAS/annoFAS.pl line 910.
Use of uninitialized value $castOut in concatenation (.) or string at /home/ubuntu/anaconda3/envs/hamstr/lib/python3.8/site-packages/greedyFAS/annoFAS.pl line 911.

I used the following command:

annoFAS --fasta=/home/ubuntu/tools/HaMStR/genome_dir/TEST@00001@1/TEST@[email protected] --path=/home/ubuntu/tools/HaMStR/weight_dir --name=TEST@00001@1

It seems that the CAST binary is not compiled for our system, and I couldn't find CAST v1.0 on their webpage.

Best,
Cata

potential of using outdated data/tools

Shipping taxonomy data together with the preprocessed genome data can lead to the use of invalid NCBI taxonomy IDs, which causes problems when visualizing the output with PhyloProfile. I suggest we should:

  • download the latest NCBI taxonomy when hamstr is installed for the first time
  • create a script that checks for outdated tax IDs in the preprocessed genome data and either removes that species or replaces the invalid ID with the new one (see the sketch at the end of this issue)

The same can be applied to the annotation tools (Pfam, SMART, TMHMM, ...), but this requires automated tests that check whether a new tool version is compatible with the current data (for example, whether any functions we use have been deprecated in the new version).

Only the preprocessed genome data (blast_dir, genome_dir, weight_dir) should be downloaded from our server.
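
A sketch of what such a check could look like (Python), assuming the SPECIES@taxid@version naming of the genome_dir folders and an extracted NCBI taxdump with nodes.dmp:

# Collect the taxonomy IDs encoded in the genome_dir species names and report
# those that no longer appear in NCBI's nodes.dmp.
import os

def valid_taxids(nodes_dmp):
    """Read all current tax IDs from an extracted NCBI taxdump nodes.dmp."""
    with open(nodes_dmp) as fh:
        return {line.split("\t|\t")[0] for line in fh}

def outdated_species(genome_dir, nodes_dmp):
    known = valid_taxids(nodes_dmp)
    stale = []
    for species in os.listdir(genome_dir):        # e.g. HUMAN@9606@1
        parts = species.split("@")
        if len(parts) == 3 and parts[1] not in known:
            stale.append(species)
    return stale

# print(outdated_species("genome_dir", "taxdump/nodes.dmp"))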

temporary FAS input files

When calculating the FAS score, oneseq extracts the feature architecture of the found ortholog from the species input, creating a lot of temporary files. With the seed_id/query_id options of FAS this is no longer necessary and should be changed.
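
Conceptually, the change would look like this (Python sketch; run_fas and extract_sequence are hypothetical stand-ins, and the seed_id/query_id names are taken from this issue rather than a confirmed FAS interface):

import tempfile

def extract_sequence(fasta_path, seq_id):
    """Hypothetical helper: return the single FASTA record for seq_id."""
    return f">{seq_id}\nSEQUENCE\n"

def run_fas(seed, query, seed_id=None, query_id=None):
    """Hypothetical stand-in for the actual FAS call."""
    print("FAS:", seed, query, seed_id, query_id)

def score_old(species_fasta, ortholog_id, seed_fasta):
    # current behaviour: extract the found ortholog into a temporary FASTA file
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
        tmp.write(extract_sequence(species_fasta, ortholog_id))
        query_fasta = tmp.name
    return run_fas(seed=seed_fasta, query=query_fasta)

def score_new(species_fasta, ortholog_id, seed_fasta, seed_id):
    # proposed behaviour: pass the complete per-species file together with the
    # sequence ids, so no temporary extraction is needed
    return run_fas(seed=seed_fasta, query=species_fasta,
                   seed_id=seed_id, query_id=ortholog_id)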
