oyster_river_protocol's People

Contributors

adamstuckert, gitter-badger, macmanes, medalibi, toniwestbrook


oyster_river_protocol's Issues

ORP 2.3.3 with docker doesn't start

I launch the following command:
/home/orp/Oyster_River_Protocol/oyster.mk main STRAND=RF TPM_FILT=1 MEM=400 CPU=64 READ1=concatenated_R1.normalized.trimmed.fq R2=concatenated_R2.normalized.trimmed.fq RUNOUT=assembly --debug

It doesn't start; with the --debug option, it gives me the following output:
File 'main' does not exist.
File 'setup' does not exist.
File '/ORP_wd/assemblies/working' does not exist.
Must remake target '/ORP_wd/assemblies/working'.
Successfully remade target file '/ORP_wd/assemblies/working'.
Must remake target 'setup'.
Successfully remade target file 'setup'.
File 'check' does not exist.
Must remake target 'check'.
Successfully remade target file 'check'.
File 'welcome' does not exist.
Must remake target 'welcome'.

What's the solution to this issue?
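An editorial observation, offered as a hedged guess: the command above sets R2= where oyster.mk elsewhere on this page uses READ2= (e.g. READ2=test.r2.fastq in a later issue), so the second read file may never be registered. A corrected invocation, keeping everything else the same, would be:

/home/orp/Oyster_River_Protocol/oyster.mk main STRAND=RF TPM_FILT=1 MEM=400 CPU=64 READ1=concatenated_R1.normalized.trimmed.fq READ2=concatenated_R2.normalized.trimmed.fq RUNOUT=assembly --debug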

Several Samples for Assembly

Hi Mathew,

I would like to be able to provide several sets of paired-end reads for the assembly, but separating samples with "," is not working. The reason I want to do this is that I have several sets of RNAseq data from different tissues. Is this possible?

Thank you very much for this.

Cheers,
Hector
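A common workaround (an editorial suggestion, not documented ORP behavior) is to concatenate the per-tissue files into one pair of inputs, since the pipeline takes a single READ1/READ2 pair; the file names below are hypothetical:

# Merge per-tissue FASTQ files into one pair of inputs before running ORP
cat tissueA_R1.fq tissueB_R1.fq > all_R1.fq
cat tissueA_R2.fq tissueB_R2.fq > all_R2.fq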

Error during installation Python >=3.5

Downloading/unpacking scipy
  Downloading scipy-1.3.1.tar.gz (23.6MB): 23.6MB downloaded
  Running setup.py (path:/tmp/pip_build_rafael/scipy/setup.py) egg_info for package scipy
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip_build_rafael/scipy/setup.py", line 31, in <module>
        raise RuntimeError("Python version >= 3.5 required.")
    RuntimeError: Python version >= 3.5 required.
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 17, in <module>

  File "/tmp/pip_build_rafael/scipy/setup.py", line 31, in <module>

    raise RuntimeError("Python version >= 3.5 required.")

RuntimeError: Python version >= 3.5 required.

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_rafael/scipy
Storing debug log for failure in /home/rafael/.pip/pip.log

The installation is being redirected to the system Python 2.7 and is not using the Python inside conda.
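A minimal diagnostic sketch (the env name orp_v2 is an assumption borrowed from other issues on this page) to confirm which interpreter pip resolves to:

source software/anaconda/install/etc/profile.d/conda.sh   # path relative to the ORP directory
conda activate orp_v2
which python && python --version   # should report >= 3.5 from inside the conda env
which pip                          # must point into the same env, not /usr/bin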

Pipeline crashed at transabyss

Here is the CMD output:

CHECKPOINT: Unitig assembly completed.
CMD: bash -euo pipefail -c 'abyss-pe graph=adj --directory=/home/orp/Oyster_River_Protocol/DATA/assemblies/SRR5330501.transabyss k=32 name=SRR5330501.transabyss.fasta j=20 in="/home/orp/Oyster_River_Protocol/DATA/rcorr/SRR5330501.TRIM_1P.cor.fq /home/orp/Oyster_River_Protocol/DATA/rcorr/SRR5330501.TRIM_2P.cor.fq" l=32 s=32 n=2 SIMPLEGRAPH_OPTIONS="--no-scaffold" OVERLAP_OPTIONS="--no-scaffold" MERGEPATH_OPTIONS="--greedy" SRR5330501.transabyss.fasta-6.fa'
The minimum coverage of single-end contigs is 2.
The minimum coverage of merged contigs is 2.
warning: the seed-length should be at least twice k: k=32, s=32
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Mateless   52894022  100%
Unaligned         0
Singleton         0
FR                0
RF                0
FF                0
Different         0
Total      52894022
abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
error: `SRR5330501.transabyss.fasta-3.hist': No such file or directory

Everything until then went fine! Do you know what the problem is?

The read IDs look like this:

@MG00HS05:491:C7450ACXX:4:1101:1240:2223_forward/1
and
@MG00HS05:491:C7450ACXX:4:1101:1240:2223_reverse/2

These files were produced by fastq-dump. It seems the problem is the naming; the mate IDs should be identical apart from the /1 and /2 suffixes, so I should remove the _forward and _reverse parts. Do you have a simple way to do that?
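One hedged way to do this, assuming headers exactly like those above (the tag appearing on every fourth line), is to strip the tags with awk:

# Remove the _forward/_reverse tags from each FASTQ header line so mates match
awk 'NR % 4 == 1 { sub(/_forward/, ""); sub(/_reverse/, "") } { print }' reads_1.fq > fixed_1.fq
awk 'NR % 4 == 1 { sub(/_forward/, ""); sub(/_reverse/, "") } { print }' reads_2.fq > fixed_2.fq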

Can I restart the pipeline from the checkpoint above, or do I have to run it from the start?

Issue with SPADES2/Kmer length

Hello! I absolutely love the ORP and have run it successfully with a lot of my reads. However, I'm running into quite a head-scratcher for reads that are 75 bp long or shorter. The program documentation says you can specify your SPADES2_KMER=INT length, which I have done in the following lines:

MAKEDIR := $(dir $(firstword $(MAKEFILE_LIST)))
DIR := ${CURDIR}
CPU=16
BUSCO_THREADS=${CPU}
MEM=110
TRINITY_KMER=25
SPADES1_KMER=55
SPADES2_KMER=35
TRANSABYSS_KMER=32
RCORR := ${shell which rcorrector}

My reads are exactly 75 bp long, but changing the SPADES2_KMER flag does not resolve the issue. I still receive the following error:

IT LOOKS LIKE YOUR READS ARE NOT AT LEAST 75 BP LONG,
PLEASE EDIT YOUR COMMAND USING THE SPADES2_KMER=INT FLAGS,
SETTING THE ASSEMBLY KMER LENGTH LESS THAN YOUR READ LENGTH

/bin/bash: line 8: shell: command not found

I found a discussion about this on the github from 2019: #17

Was this ever resolved/addressed? How can I get ORP to run for reads that are 75 bp or shorter? Hoping to hear back!

Protocol textual error

In the protocol the Salmon index is named salmon.idx, as in:

~/salmon-0.5.1/bin/salmon index -t Rcorr_trinity.Trinity.fasta -i salmon.idx --type quasi -k 31

The quantification step is using an index called transcripts2_index, as in:

~/salmon-0.5.1/bin/salmon quant -p 32 -i transcripts2_index -l MSR -1 file_1.cor.fastq -2 file_2.cor.fastq -o salmon_orig

I assume this needs to be changed to salmon.idx.
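For consistency with the indexing step, the quant command would then read:

~/salmon-0.5.1/bin/salmon quant -p 32 -i salmon.idx -l MSR -1 file_1.cor.fastq -2 file_2.cor.fastq -o salmon_orig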

Unmatched reads in paired-end files

Hello once again,

I trimmed and filtered some bad-quality reads and ran the Oyster River Protocol, getting this:

[ INFO] 2019-02-24 17:44:47 : Loading assembly: /root/Downloads/Amostra01/orthofuse/Amostra01/merged.fasta
[ INFO] 2019-02-24 17:44:49 : Analysing assembly: /root/Downloads/Amostra01/orthofuse/Amostra01/merged.fasta
[ INFO] 2019-02-24 17:44:49 : Results will be saved in /root/Downloads/Amostra01/orthofuse/Amostra01/merged/merged
[ INFO] 2019-02-24 17:44:49 : Calculating contig metrics...
[ INFO] 2019-02-24 17:44:51 : Contig metrics:
[ INFO] 2019-02-24 17:44:51 : -----------------------------------
[ INFO] 2019-02-24 17:44:51 : n seqs                        26816
[ INFO] 2019-02-24 17:44:51 : smallest                        201
[ INFO] 2019-02-24 17:44:51 : largest                        9134
[ INFO] 2019-02-24 17:44:51 : n bases                    11258629
[ INFO] 2019-02-24 17:44:51 : mean len                     419.85
[ INFO] 2019-02-24 17:44:51 : n under 200                       0
[ INFO] 2019-02-24 17:44:51 : n over 1k                      1390
[ INFO] 2019-02-24 17:44:51 : n over 10k                        0
[ INFO] 2019-02-24 17:44:51 : n with orf                     4528
[ INFO] 2019-02-24 17:44:51 : mean orf percent              77.19
[ INFO] 2019-02-24 17:44:51 : n90                             232
[ INFO] 2019-02-24 17:44:51 : n70                             308
[ INFO] 2019-02-24 17:44:51 : n50                             431
[ INFO] 2019-02-24 17:44:51 : n30                             701
[ INFO] 2019-02-24 17:44:51 : n10                            1793
[ INFO] 2019-02-24 17:44:51 : gc                             0.39
[ INFO] 2019-02-24 17:44:51 : bases n                        2243
[ INFO] 2019-02-24 17:44:51 : proportion n                    0.0
[ INFO] 2019-02-24 17:44:51 : Contig metrics done in 2 seconds
[ INFO] 2019-02-24 17:44:51 : Calculating read diagnostics...
[ERROR] 2019-02-24 17:44:55 : Snap found unmatched read IDs in input fastq files
[ERROR]
Left files contained read id
'M01506:25:000000000-AF8M6:1:1101:14555:1661'
and right files contained read id
'M01506:25:000000000-AF8M6:1:1101:14176:1759'.  
at the same position in the file

/root/ORP/oyster.mk:226: recipe for target '/root/Downloads/Amostra01/orthofuse/Amostra01/orthotransrate.done' failed
make: *** [/root/Downloads/Amostra01/orthofuse/Amostra01/orthotransrate.done] Error 1
^C

Is there a way to run the Oyster River Protocol with paired-end files while also accounting for unmatched reads, e.g. with input files forward_reads.fq + reverse_reads.fq + unmatched_reads.fq?

Thank you
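Independent of whether ORP accepts an orphan file, the immediate Snap error can usually be avoided by re-synchronizing the mates after trimming. A hedged sketch using BBMap's repair.sh (assuming BBTools is installed; it is not part of ORP):

# Re-pair mates after trimming; orphaned reads go to a separate file
repair.sh in=forward_reads.fq in2=reverse_reads.fq out=fixed_R1.fq out2=fixed_R2.fq outs=unmatched_reads.fq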

awk: program limit exceeded: maximum number of fields size=32767

Problem in this script:

${DIR}/assemblies/working/${RUNOUT}.unique.ORP.done:${DIR}/assemblies/${RUNOUT}.ORP.diamond.txt
	awk '{print $$2}' ${DIR}/assemblies/${RUNOUT}.ORP.diamond.txt | awk -F "|" '{print $$3}' | cut -d _ -f2 | sort | uniq | wc -l > ${DIR}/assemblies/working/${RUNOUT}.unique.ORP.txt
	touch ${DIR}/assemblies/working/${RUNOUT}.unique.ORP.done
awk: program limit exceeded: maximum number of fields size=32767
	FILENAME="-" FNR=10081 NR=10081
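The cap comes from the awk implementation: some awks (the error above is characteristic of the BWK "one true awk") limit records to 32767 fields, while GNU awk does not. A hedged workaround is to substitute gawk, or to avoid splitting the whole line by extracting the column with cut (this assumes the diamond output is tab-delimited, as diamond's tabular format is). Written for the command line; inside the Makefile the $3 must stay escaped as $$3:

cut -f2 ${DIR}/assemblies/${RUNOUT}.ORP.diamond.txt | gawk -F "|" '{print $3}' | cut -d _ -f2 | sort | uniq | wc -l > ${DIR}/assemblies/working/${RUNOUT}.unique.ORP.txt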

Make error

Hi @macmanes. I installed ORP and tried to build it with the "make" command.
However, errors occurred. Please check the following output; I would like your help figuring out where things went wrong.
Sorry for any inconvenience, and thank you for your cooperation.

(orp_v2) [iceplant4561@at138 Oyster_River_Protocol]$ make
/bin/bash: /lustre7/home/iceplant4561/Oyster_River_Protocol/software/anaconda/install/bin/conda: No such file or directory
(
source /lustre7/home/iceplant4561/Oyster_River_Protocol/software/anaconda/install/etc/profile.d/conda.sh;
conda activate;
conda update -y -n base conda;
conda config --add channels conda-forge;
conda config --add channels bioconda;
conda install mamba -n base -yc conda-forge;
mamba create -yc bioconda --name orp_spades spades=3.15.2;
mamba create -yc bioconda --name orp_trinity trinity=2.9.1 bwa=0.7.17 bashplotlib seqtk=1.3;
mamba create -yc bioconda --name orp_busco busco=5.1.2;
mamba create -yc bioconda --name orp_transabyss transabyss=2.0.1;
mamba create -yc bioconda --name orp_rcorrector rcorrector=1.0.4;
mamba create -yc bioconda --name orp_trimmomatic trimmomatic=0.39;
mamba create -yc bioconda --name orp_sam samtools=1.12 bwa=0.7.17 seqtk=1.3;
mamba create -yc bioconda --name orp_salmon salmon=1.4.0;
mamba create -yc bioconda --name orp_cdhit cd-hit=4.6.8;
mamba create -yc bioconda --name orp_diamond diamond=2.0.8;
mamba env create -f /lustre7/home/iceplant4561/Oyster_River_Protocol/orp_env.yml python=3.8;
mamba clean -ya;
conda deactivate;
)
/bin/bash: line 1: /lustre7/home/iceplant4561/Oyster_River_Protocol/software/anaconda/install/etc/profile.d/conda.sh: No such file or directory

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

  • bash
  • fish
  • tcsh
  • xonsh
  • zsh
  • powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /opt/pkg/intel/oneapi/intelpython/latest

added / updated specs:
- conda

The following NEW packages will be INSTALLED:

_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
brotlipy conda-forge/linux-64::brotlipy-0.7.0-py37h5e8e339_1003
charset-normalizer conda-forge/noarch::charset-normalizer-2.0.10-pyhd8ed1ab_0
colorama conda-forge/noarch::colorama-0.4.4-pyh9f0ad1d_0
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.36.1-hea4e1c9_2
libblas conda-forge/linux-64::libblas-3.9.0-12_linux64_openblas
libcblas conda-forge/linux-64::libcblas-3.9.0-12_linux64_openblas
libgfortran-ng conda-forge/linux-64::libgfortran-ng-11.2.0-h69a702a_11
libgfortran5 conda-forge/linux-64::libgfortran5-11.2.0-h5c6108e_11
libgomp conda-forge/linux-64::libgomp-11.2.0-h1d223b6_11
liblapack conda-forge/linux-64::liblapack-3.9.0-12_linux64_openblas
libnsl conda-forge/linux-64::libnsl-2.0.0-h7f98852_0
libopenblas conda-forge/linux-64::libopenblas-0.3.18-pthreads_h8fe5266_0
libzlib conda-forge/linux-64::libzlib-1.2.11-h36c2ea0_1013
python_abi conda-forge/linux-64::python_abi-3.7-2_cp37m
readline conda-forge/linux-64::readline-8.1-h46c0cb4_0

The following packages will be REMOVED:

arrow-cpp-4.0.1-py37h3fd0f77_4
asn1crypto-1.4.0-py37hcd5400e_2
aws-c-common-0.6.8-h14c3975_1
aws-c-event-stream-0.1.6-h9574fa7_1
aws-checksums-0.1.11-h657cfb5_2
aws-sdk-cpp-1.8.185-hadc6d9a_1
brotli-1.0.9-hf484d3e_2
bzip2-1.0.8-hb9a14ef_8
c-ares-1.17.1-hff3d592_1
chardet-4.0.0-py37h7a55b9c_0
cycler-0.10.0-py37h77b6139_8
cython-0.29.24-py37had034fe_0
double-conversion-3.1.5-h79a9dd4_0
dpctl-0.9.0-py37h75156f8_0
dpnp-0.7.1-py37h75156f8_35
freetype-2.10.4-h8e2d9d6_0
funcsigs-1.0.2-py37hef9829d_8
future-0.18.2-py37hfabd62d_0
gflags-2.2.2-h79a9dd4_0
glog-0.4.0-hf484d3e_1
grpc-cpp-1.26.0-hdb13a8a_2
intelpython-2021.4.0-0
kiwisolver-1.3.1-py37h9065164_0
libarchive-3.4.2-ha657d38_7
libcurl-7.78.0-h471713a_2
libedit-3.1.20210714-h8f6a7d7_0
libevent-2.1.10-h5969dba_0
libllvm11-11.0.0-ha451071_0
libpng-1.6.37-7
libprotobuf-3.14.0-h28fbd06_0
libssh2-1.9.0-h4b1ad09_1
libthrift-0.14.1-h36a55ba_0
libxml2-2.9.12-h6441c91_0
llvm-11.0.0-hbdd1d9c_0
llvm-spirv-11.0.0-h4616538_0
llvmlite-0.37.0-py37h188ca7a_0
lz4-c-1.9.3-h688b341_1
lzo-2.10-h309b0be_6
matplotlib-3.1.2-py37h8b3d3d0_10
mkl-service-2.4.0-py37h5c78031_1
mkl_fft-1.3.1-py37hbd9d9a5_0
mkl_random-1.2.2-py37hd708486_1
mkl_umath-0.1.1-py37h244f2d4_9
mpi4py-3.0.3-py37hf484d3e_9
numba-0.54.0-py37h9556e7c_1
numba-dppy-0.15.0-py37h62346ee_0
numexpr-2.7.3-py37hfdc8c6c_1
numpy-base-1.20.3-py37hf707ed8_2
orc-1.6.6-h94e0d16_0
packaging-21.0-py37h1546d3d_3
pandas-1.2.0-py37hef31ef9_2
pyarrow-4.0.1-py37h497cf44_1
pyeditline-2.0.1-py37_1
pyparsing-2.4.7-py37h0618fa2_2
python-dateutil-2.8.2-py37_0
python-libarchive-c-2.8-py37h0d5a7d4_14
pytz-2021.1-py37h854a350_0
pyyaml-5.3.1-py37hae09b62_2
re2-2021.04.01-hf7f620c_0
scikit-ipp-1.2.0-py37he7a3b9b_6
sdc-0.40.0-py37h6eb09a8_0
smp-0.1.4-py37hab8cbcd_1
snappy-1.1.8-hf484d3e_4
spirv-tools-2020.5-h6bb024c_1
sys_check-2021.4-0
tcl-8.6.10-1
thrift-0.14.1-py37h84f2617_0
thrift-compiler-0.14.1-h36a55ba_0
thrift-cpp-0.14.1-0
uriparser-0.9.3-hf484d3e_1
utf8proc-2.6.1-h918b7a9_0
xgboost-1.4.2-1589_g3f5c8bpy37_3
zstd-1.4.5-h3f200d0_0

The following packages will be UPDATED:

ca-certificates conda_channel::ca-certificates-2020.1~ --> conda-forge::ca-certificates-2021.10.8-ha878542_0
certifi conda_channel::certifi-2020.12.5-py37~ --> conda-forge::certifi-2021.10.8-py37h89c1867_1
cffi conda_channel::cffi-1.14.5-py37h30406~ --> conda-forge::cffi-1.15.0-py37h036bc23_0
conda conda_channel::conda-4.10.3-py37hb2eb~ --> conda-forge::conda-4.11.0-py37h89c1867_0
conda-package-han~ conda_channel::conda-package-handling~ --> conda-forge::conda-package-handling-1.7.3-py37h5e8e339_1
cryptography conda_channel::cryptography-3.3.2-py3~ --> pkgs/main::cryptography-36.0.0-py37h9ce1e76_0
joblib conda_channel/linux-64::joblib-1.0.1-~ --> conda-forge/noarch::joblib-1.1.0-pyhd8ed1ab_0
libffi conda_channel::libffi-3.3-13 --> conda-forge::libffi-3.4.2-h7f98852_5
libgcc-ng conda_channel::libgcc-ng-9.3.0-hdf63c~ --> conda-forge::libgcc-ng-11.2.0-h1d223b6_11
libstdcxx-ng conda_channel::libstdcxx-ng-9.3.0-hdf~ --> conda-forge::libstdcxx-ng-11.2.0-he4da1e4_11
ncurses conda_channel::ncurses-6.2-hf61fa16_1 --> conda-forge::ncurses-6.2-h58526e2_4
numpy conda_channel::numpy-1.20.3-py37h2742~ --> conda-forge::numpy-1.21.5-py37hf2998dd_0
openssl conda_channel::openssl-1.1.1k-h14c397~ --> conda-forge::openssl-3.0.0-h7f98852_2
pip conda_channel/linux-64::pip-21.1.1-py~ --> conda-forge/noarch::pip-21.3.1-pyhd8ed1ab_0
pycosat conda_channel::pycosat-0.6.3-py37hf49~ --> conda-forge::pycosat-0.6.3-py37h5e8e339_1009
pycparser conda_channel/linux-64::pycparser-2.2~ --> conda-forge/noarch::pycparser-2.21-pyhd8ed1ab_0
pyopenssl conda_channel/linux-64::pyopenssl-20.~ --> conda-forge/noarch::pyopenssl-21.0.0-pyhd8ed1ab_0
pysocks conda_channel::pysocks-1.7.0-py37h0d5~ --> conda-forge::pysocks-1.7.1-py37h89c1867_4
python conda_channel::python-3.7.11-h5762be8~ --> conda-forge::python-3.7.12-hf930737_100_cpython
requests conda_channel/linux-64::requests-2.25~ --> conda-forge/noarch::requests-2.27.1-pyhd8ed1ab_0
scikit-learn conda_channel::scikit-learn-0.24.2-py~ --> conda-forge::scikit-learn-1.0.2-py37hf9e9bfc_0
scipy conda_channel::scipy-1.6.2-py37h3fee7~ --> conda-forge::scipy-1.7.3-py37hf2a6cf1_0
setuptools conda_channel::setuptools-52.0.0-py37~ --> conda-forge::setuptools-60.5.0-py37h89c1867_0
sqlite conda_channel::sqlite-3.36.0-hb9a14ef~ --> conda-forge::sqlite-3.37.0-h9cd32fc_0
threadpoolctl conda_channel/linux-64::threadpoolctl~ --> conda-forge/noarch::threadpoolctl-3.0.0-pyh8a188c0_0
tk conda_channel::tk-8.6.10-h8e2d9d6_3 --> conda-forge::tk-8.6.11-h27826a3_1
tqdm conda_channel/linux-64::tqdm-4.60.0-p~ --> conda-forge/noarch::tqdm-4.62.3-pyhd8ed1ab_0
urllib3 conda_channel/linux-64::urllib3-1.26.~ --> conda-forge/noarch::urllib3-1.26.8-pyhd8ed1ab_1
wheel conda_channel/linux-64::wheel-0.36.2-~ --> conda-forge/noarch::wheel-0.37.1-pyhd8ed1ab_0
yaml conda_channel::yaml-0.1.7-7 --> conda-forge::yaml-0.2.5-h7f98852_2

The following packages will be SUPERSEDED by a higher-priority channel:

idna conda_channel/linux-64::idna-3.1-py37~ --> conda-forge/noarch::idna-3.1-pyhd3deb0d_0
ruamel_yaml conda_channel::ruamel_yaml-0.15.99-py~ --> conda-forge::ruamel_yaml-0.15.80-py37h5e8e339_1006
six conda_channel/linux-64::six-1.16.0-py~ --> conda-forge/noarch::six-1.16.0-pyh6c4a22f_0
xz conda_channel::xz-5.2.5-h74280d8_2 --> conda-forge::xz-5.2.5-h516909a_1
zlib conda_channel::zlib-1.2.11.1-h1e99aa7~ --> conda-forge::zlib-1.2.11-h36c2ea0_1013

Preparing transaction: done
Verifying transaction: failed

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: /opt/pkg/intel/oneapi/intelpython/latest
uid: 5585
gid: 11294

Warning: 'conda-forge' already in 'channels' list, moving to the top
Warning: 'bioconda' already in 'channels' list, moving to the top
Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /opt/pkg/intel/oneapi/intelpython/latest

added / updated specs:
- mamba

The following NEW packages will be INSTALLED:

_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
libgomp conda-forge/linux-64::libgomp-11.2.0-h1d223b6_11
libiconv conda-forge/linux-64::libiconv-1.16-h516909a_0
libsolv conda-forge/linux-64::libsolv-0.7.19-h780b84a_5
mamba conda-forge/linux-64::mamba-0.15.3-py37h7f483ca_0
python_abi conda-forge/linux-64::python_abi-3.7-2_cp37m
reproc conda-forge/linux-64::reproc-14.2.3-h7f98852_0
reproc-cpp conda-forge/linux-64::reproc-cpp-14.2.3-h9c3ff4c_0

The following packages will be UPDATED:

ca-certificates conda_channel::ca-certificates-2020.1~ --> conda-forge::ca-certificates-2021.10.8-ha878542_0
certifi conda_channel::certifi-2020.12.5-py37~ --> conda-forge::certifi-2021.10.8-py37h89c1867_1
conda conda_channel::conda-4.10.3-py37hb2eb~ --> conda-forge::conda-4.11.0-py37h89c1867_0
libarchive conda_channel::libarchive-3.4.2-ha657~ --> intel::libarchive-3.5.2-h27f34a6_0
libgcc-ng conda_channel::libgcc-ng-9.3.0-hdf63c~ --> conda-forge::libgcc-ng-11.2.0-h1d223b6_11
libstdcxx-ng conda_channel::libstdcxx-ng-9.3.0-hdf~ --> conda-forge::libstdcxx-ng-11.2.0-he4da1e4_11
openssl conda_channel::openssl-1.1.1k-h14c397~ --> conda-forge::openssl-1.1.1l-h7f98852_0

Preparing transaction: done
Verifying transaction: failed

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: /opt/pkg/intel/oneapi/intelpython/latest
uid: 5585
gid: 11294

/bin/bash: line 7: mamba: command not found
/bin/bash: line 8: mamba: command not found
/bin/bash: line 9: mamba: command not found
/bin/bash: line 10: mamba: command not found
/bin/bash: line 11: mamba: command not found
/bin/bash: line 12: mamba: command not found
/bin/bash: line 13: mamba: command not found
/bin/bash: line 14: mamba: command not found
/bin/bash: line 15: mamba: command not found
/bin/bash: line 16: mamba: command not found
/bin/bash: line 17: mamba: command not found
/bin/bash: line 18: mamba: command not found

CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

  • bash
  • fish
  • tcsh
  • xonsh
  • zsh
  • powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

make: *** [Makefile:50: orp] Error 1
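Reading the log, the bundled conda was never installed (the very first line reports .../software/anaconda/install/bin/conda: No such file or directory), so conda fell back to the cluster's read-only Intel Python at /opt/pkg/intel/oneapi/intelpython/latest, which then fails with EnvironmentNotWritableError. A hedged first check before re-running make:

# Did the ORP-bundled Anaconda ever get installed, and what does `conda` resolve to?
ls /lustre7/home/iceplant4561/Oyster_River_Protocol/software/anaconda/install/bin/conda
which conda   # in this log it resolves to the unwritable /opt/pkg/intel/oneapi/intelpython/latest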

Orthofinder failure when running test dataset

Hi,
I've tried running ORP through the Conda installation method proposed here, as well as an alternative installation where I manually installed each piece separately, following the same steps as in this repo's Makefile. In each case I get the same error: the program completes all steps through the SPAdes and Trans-ABySS assemblies but fails once OrthoFinder gets rolling. I then get this error message:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Assembly generated with Trans-ABySS 2.0.1 :)
Final assembly: /scratch/dro49/myluwork/rnaome/orpwork/assemblies/test.transabyss/test.transabyss.fasta-final.fa
Total wallclock run time: 0 h 0 m 9 s

OrthoFinder version 2.3.9 Copyright (C) 2014 David Emms

2020-03-18 11:27:24 : Starting OrthoFinder
8 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - failed

stdout:

stderr:
b"mcl: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by mcl)\n"
ERROR: Cannot run MCL with the command "mcl -h"
Please check MCL is installed and in the system path

ERROR: An error occurred, please review error messages for more information.
make: *** [/scratch/dro49/conda/Oyster_River_Protocol/oyster.mk:267: /scratch/dro49/myluwork/rnaome/orpwork/orthofuse/test/orthofuser.done] Error 1

What's confusing is that MCL is indeed accessible:

$ which mcl

/scratch/dro49/conda/envs/orp/bin/mcl

and MCL can indeed print its help menu (thus the ERROR: Cannot run MCL with the command "mcl -h" is very confusing!):

$ mcl -h

________ mcl verbosity modes
--show ....... print MCL iterands (small graphs only)
-v all ....... turn on all -v options
________ on multi-processor systems
-te <i> ...... number of threads to use                                  [0]

Further, I know I've loaded the missing glibc module...

$ module list

Currently Loaded Modulefiles:
  1) git/2.16.3          2) anaconda3/2019.10   3) binutils/2.28       4) glibc/2.14

Any ideas what could be breaking here?
Thanks!
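A hedged diagnostic: a module-loaded glibc only helps if the dynamic loader actually searches it, and OrthoFinder launches mcl in its own subprocess environment. Checking what the binary links against can confirm whether the module is being picked up:

# Which libc does the mcl binary actually load?
ldd $(which mcl)
# If it still resolves to /lib64/libc.so.6, the module's newer glibc is not on
# the library path seen by OrthoFinder's subprocess (an editorial guess)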

Computation of the unique genes per assembly provided by the quality report

Hi Mathew, this is more of a question regarding the quality report but I thought I would post it here in case it is of interest to others.
The quality report produces the following:

***** UNIQUE GENES ORP ~~~~~~~~~~~~~~~~~> 10278
***** UNIQUE GENES TRINITY ~~~~~~~~~~~~~> 9393
***** UNIQUE GENES SPADES55 ~~~~~~~~~~~~> 9058
***** UNIQUE GENES SPADES75 ~~~~~~~~~~~~> 8617
***** UNIQUE GENES TRANSABYSS ~~~~~~~~~~> 10617

From reading the manuscript and the documentation, it is not clear to me what the number of unique genes represents or how it is computed. Transrate does report the number of unigenes (a term I believe is used interchangeably with contigs), so I first thought it represented the number of unique contigs, but the final assembly I get from ORP contains a little over 218,000 unique sequences. I therefore imagine the unique genes presented in the report are contigs that could be assigned to entries in a certain database? Clarity on how this output is generated would be greatly appreciated.
Many thanks,
Olivier
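For what it's worth, the oyster.mk recipe quoted in the awk issue earlier on this page suggests how this number appears to be computed: it counts distinct identifier fields among the diamond hits for each assembly, not distinct contigs. A hedged reading of that recipe, not an authoritative answer:

# From the recipe for ${RUNOUT}.unique.ORP.txt: count distinct database-hit
# identifiers in the diamond results for one assembly
awk '{print $2}' assemblies/RUNOUT.ORP.diamond.txt | awk -F "|" '{print $3}' | cut -d _ -f2 | sort | uniq | wc -l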

fails at Busco

Test run almost completes, but fails in BUSCO with:

WARNING The dataset you provided does not contain the file dataset.cfg, likely because it is an old version. Default species (fly, eukaryota) will be used as augustus species
ERROR Impossible to read /home/ubuntu/Oyster_River_Protocol/busco_dbs/eukaryota_odb9/
mv: cannot stat ‘run_test*’: No such file or directory
make: *** [/home/jpwares/Oyster_River_Protocol/sampleclone/reports/test.busco.done] Error 1

Any ideas? I did re-download eukaryota_odb9 and re-extract.

assembly_procedure

I have RNA-seq datasets containing both single-end and paired-end reads. What would I have to change in the script so that all of the datasets are assembled together?

Error trinity partition

I've been encountering this error below:

succeeded(2217)   2.81867% completed.    
succeeded(2218)   2.81995% completed.    
succeeded(2219)   2.82122% completed.    

Error encountered:
CMD: /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/salmon_runner.pl /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa 2>tmp.27300.1553597564.stderr

Errmsg:
CMD: salmon --no-version-check index -t /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta -i /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta.salmon.idx --type quasi -k 25 -p 1
index ["/home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta.salmon.idx"] did not previously exist  . . . creating it
[2019-03-26 07:52:44.707] [jLog] [info] building index
[2019-03-26 07:52:44.707] [jointLog] [info] [Step 1 of 4] : counting k-mers
Elapsed time: 0.000644187s

[2019-03-26 07:52:44.708] [jointLog] [info] Replaced 0 non-ATCG nucleotides
[2019-03-26 07:52:44.708] [jointLog] [info] Clipped poly-A tails from 0 transcripts
[2019-03-26 07:52:44.708] [jointLog] [info] Building rank-select dictionary and saving to disk
[2019-03-26 07:52:44.708] [jointLog] [info] done
Elapsed time: 1.0819e-05s
[2019-03-26 07:52:44.708] [jointLog] [info] Writing sequence data to file . . . 
[2019-03-26 07:52:44.708] [jointLog] [info] done
Elapsed time: 1.3754e-05s
[2019-03-26 07:52:44.708] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 443)
[2019-03-26 07:52:44.708] [jointLog] [info] Building suffix array . . . 
success
saving to disk . . . done
Elapsed time: 1.823e-05s
done
Elapsed time: 0.000446444s


processed 0 positions[2019-03-26 07:52:44.709] [jointLog] [info] khash had 393 keys
[2019-03-26 07:52:44.709] [jointLog] [info] saving hash to disk . . . 
[2019-03-26 07:52:44.709] [jointLog] [info] done
Elapsed time: 6.3439e-05s
[2019-03-26 07:52:44.709] [jLog] [info] done building index
CMD: salmon --no-version-check quant -i /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta.salmon.idx -l U -r /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa -o salmon_outdir -p 1 --minAssignedFrags 1 
### salmon (mapping-based) v0.12.0
### [ program ] => salmon 
### [ command ] => quant 
### [ index ] => { /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta.salmon.idx }
### [ libType ] => { U }
### [ unmatedReads ] => { /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa }
### [ output ] => { salmon_outdir }
### [ threads ] => { 1 }
### [ minAssignedFrags ] => { 1 }
Logs will be written to salmon_outdir/logs
[2019-03-26 07:52:44.734] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2019-03-26 07:52:44.734] [jointLog] [warning] 

NOTE: It appears you are running salmon without the `--validateMappings` option.
Mapping validation can generally improve both the sensitivity and specificity of mapping,
with only a moderate increase in use of computational resources. 
Unless there is a specific reason to do this (e.g. testing on clean simulated data),
`--validateMappings` is generally recommended.

[2019-03-26 07:52:44.734] [jointLog] [info] parsing read library format
[2019-03-26 07:52:44.734] [jointLog] [info] There is 1 library.
[2019-03-26 07:52:44.771] [jointLog] [info] Loading Quasi index
[2019-03-26 07:52:44.771] [jointLog] [info] Loading 32-bit quasi index
[2019-03-26 07:52:44.771] [jointLog] [info] done
[2019-03-26 07:52:44.771] [jointLog] [info] Index contained 2 targets
[2019-03-26 07:52:44.771] [stderrLog] [info] Loading Suffix Array 
[2019-03-26 07:52:44.771] [stderrLog] [info] Loading Transcript Info 
[2019-03-26 07:52:44.771] [stderrLog] [info] Loading Rank-Select Bit Array
[2019-03-26 07:52:44.771] [stderrLog] [info] There were 2 set bits in the bit array
[2019-03-26 07:52:44.771] [stderrLog] [info] Computing transcript lengths
[2019-03-26 07:52:44.771] [stderrLog] [info] Waiting to finish loading hash
[2019-03-26 07:52:44.771] [stderrLog] [info] Done loading index




Error, cmd:
salmon --no-version-check quant -i /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta.salmon.idx -l U -r /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa -o salmon_outdir -p 1 --minAssignedFrags 1 
 died with ret (256) at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../PerlLib/Process_cmd.pm line 19.
	Process_cmd::process_cmd("salmon --no-version-check quant -i /home/rafael/..."...) called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/salmon_runner.pl line 26


-salmon error reported: Error, cmd: /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/salmon_runner.pl /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa 2>tmp.27300.1553597564.stderr died with ret 256  at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/PerlLib/Pipeliner.pm line 186.
	Pipeliner::run(Pipeliner=HASH(0x564e71a255d8)) called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1778
	eval {...} called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1777
	main::run_Trinity() called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1345
	eval {...} called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1344
WARNING - salmon failure mode not recognized by Trinity:
Error, cmd: /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/salmon_runner.pl /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/Trinity.tmp.fasta /home/rafael/.../assemblies/Rmontenegrensis.trinity/read_partitions/Fb_0/CBin_616/c61724.trinity.reads.fa.out/single.fa 2>tmp.27300.1553597564.stderr died with ret 256  at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/PerlLib/Pipeliner.pm line 186.
	Pipeliner::run(Pipeliner=HASH(0x564e71a255d8)) called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1778
	eval {...} called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1777
	main::run_Trinity() called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1345
	eval {...} called at /home/rafael/programas/ORP/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/util/support_scripts/../../Trinity line 1344

 - retaining Trinity transcripts provided as input to salmon, w/o filtering (pre-salmon mode).

succeeded(2220)   2.82249% completed.    
succeeded(2221)   2.82376% completed.    
succeeded(2222)   2.82503% completed.

This error repeats, with more "succeeded" steps following. Why is Trinity not partitioning correctly? The paired-end files are 8 GB of gzipped data with a great many reads; could that be the problem?
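One hedged place to look: salmon_runner.pl redirected salmon's real stderr to a temp file (visible in the first CMD line above), so if it has not been cleaned up, the underlying salmon failure message may still be on disk:

# Inspect the captured salmon stderr for the failed partition
# (file name taken from the log above; it sits in Trinity's working directory)
cat tmp.27300.1553597564.stderr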

Verifying transaction: failed

When "make":

Preparing transaction: done
Verifying transaction: failed

PaddingError: Placeholder of length '80' too short in package /root/Downloads/Oyster_River_Protocol-2.1.0/software/anaconda/install/envs/orp_v2/bin/Rscript.
The package must be rebuilt with conda-build > 2.0.

Makefile:44: recipe for target 'orp_v2' failed
make: *** [orp_v2] Error 1
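A hedged workaround: conda patches binary prefixes into a fixed-length placeholder (80 characters in this package), so the install prefix must be shorter than the placeholder. Re-cloning ORP into a short path often sidesteps the PaddingError; the target directory below is hypothetical:

git clone https://github.com/macmanes-lab/Oyster_River_Protocol.git /opt/orp
cd /opt/orp && make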

Empty file at Foo.list7

I'm repeatedly running into an error with ORP2.3.1

make: *** [/nobackup/rogers_research/HybridRNASeq/DyakubaReference_carcass_M_ORP2.3.1/assemblies/diamond/DyakubaReference_carcass_M_ORP2.3.1Out.list7] Error 1
make: *** Deleting file `/nobackup/rogers_research/HybridRNASeq/DyakubaReference_carcass_M_ORP2.3.1/assemblies/diamond/DyakubaReference_carcass_M_ORP2.3.1Out.list7'

This happens most often at list7 but occasionally with other list files. It appears to occur only when the list file is empty because no items match the search criteria. I can coerce the program to run by running the grep command from the makefile on the command line.

grep -xFvwf ${DIR}/assemblies/diamond/${RUNOUT}.list6 ${DIR}/assemblies/diamond/${RUNOUT}.list5 > ${DIR}/assemblies/diamond/${RUNOUT}.list7

This creates an empty file with the same file name. Then I rerun ORP and it will run to completion.

This has happened with multiple data sets for very different species. I don't know if there might be factors from our HPC settings that cause it to fail to create the file or if it's a product of the code itself. It isn't holding back my work at the moment, but I thought I should report it.
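A likely explanation (offered as a hedged guess): grep exits with status 1 when it finds no matches, and make treats any nonzero exit as a recipe failure, which is why an empty list7 kills the run while pre-creating the file by hand lets it continue. The standard make-side guard is to tolerate that exit status:

grep -xFvwf ${DIR}/assemblies/diamond/${RUNOUT}.list6 ${DIR}/assemblies/diamond/${RUNOUT}.list5 > ${DIR}/assemblies/diamond/${RUNOUT}.list7 || true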

make error -- software/anaconda/install/bin/conda: No such file or directory

this prints out after the make:

/bin/bash: /home/mmeel/Bioinfo_software/Oyster_River_Protocol/software/anaconda/install/bin/conda: No such file or directory
/bin/bash: /home/mmeel/Bioinfo_software/Oyster_River_Protocol/software/anaconda/install/bin/conda: No such file or directory
cd /home/mmeel/Bioinfo_software/Oyster_River_Protocol/software/anaconda && curl -LO https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
(curl progress meter output omitted)
curl: (56) Received HTTP code 404 from proxy after CONNECT
Makefile:36: recipe for target 'conda' failed
make: *** [conda] Error 56

I have miniconda installed; is it possible to use it instead? Thanks!
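The 404 is returned by a proxy during CONNECT rather than by the Anaconda server, so this looks like a proxy configuration problem on the build host. A hedged diagnostic using the URL from the log:

env | grep -i proxy   # which proxy is curl being forced through?
curl -I https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh   # headers only; should return 200 once the proxy is fixed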

TransRate-ORP: Which modifications were made?

Hi,

Sadly, TransRate is abandonware now. The version packaged with the ORP, however, is so handy that I rely heavily on it (thank you for packaging TransRate!). It runs reliably and without any issues. I have one question though: in your publication you mention that TransRate-ORP has been modified. What kinds of modifications were made?
Thank you,

Lukas

BUSCO datasets

Hello, do I have to choose which datasets to include, or could I use them all? I am running an analysis on Chelonia mydas.

The lineage is

Lineage( full )
cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Sauropsida; Sauria; Archelosauria; Testudines; Cryptodira; Durocryptodira; Americhelydia; Chelonioidea; Cheloniidae; Caretta

Should I use Tetrapoda dataset? Should I use Tetrapoda AND eukaryota? Or should I use more?

split-paired-reads.py version

I was just wondering which version of khmer the split-paired-reads.py in this protocol derives from, given that when I try the most recent khmer from pip I keep running into issues with unrecognized paired reads in the corrected, interleaved fastq files (whether using rcorrector or bfc).

java memory issue

I am trying to assemble a transcriptome on our cluster and am getting errors related to Java memory:

java -Xmx10G -Xms1G -Xss1G -XX:ParallelGCThreads=2 -jar /usr/local/apps/gb/ORP/Oyster_River_Protocol/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/Butterfly/Butterfly.jar -N 100000 -L 200 -F 500 -C /lustre1/keb27269/noto/assemblies/noto_1.trinity/read_partitions/Fb_0/CBin_90/c9021.trinity.reads.fa.out/chrysalis/Component_bins/Cbin0/c8.graph --path_reinforcement_distance=25

I am running this job with 300GB mem and 16 threads. It looks like even though the job has access to 300GB of memory, Java is set to only use 10GB.

Any ideas on how to fix this?

Here are my output and error files:
orp_noto1.3.out.txt
orp_noto1.3.err.txt

Thanks!
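For reference, Trinity itself exposes the Butterfly heap settings via --bflyHeapSpaceMax and --bflyHeapSpaceInit; whether oyster.mk forwards these flags is an assumption the editor has not verified, but a direct Trinity invocation with a larger heap would look roughly like:

# Hedged sketch: raise Butterfly's Java heap when calling Trinity directly
Trinity --seqType fq --max_memory 300G --CPU 16 \
  --left reads_1.fq --right reads_2.fq \
  --bflyHeapSpaceMax 30G --bflyHeapSpaceInit 2G \
  --output trinity_out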

Installation and Usage questions

Hi Mathew,

I'm finally ready to give it a go to your pipeline. I have a few questions to ask you first, some regarding the installation and some other to the usage.

Re installation:

  1. Is there a reason behind using Anaconda instead of its minimal sibling, Miniconda?

  2. The standard make command did not work because of a path-length issue with some conda packages. I couldn't sort it out, because I'm not able to use a shorter path in which to compile ORP (I'm working on a cluster and my initial path is already quite long). Even though I'm already pursuing a different solution, would it be possible to shrink the path from software/anaconda/install to simply software/conda? It doesn't sound like much, but it would have made the difference for me.

  3. I tried creating the docker instance on my own, but I've got the following error:

...
cd /home/orp/Oyster_River_Protocol/software/anaconda && bash Anaconda3-2019.10-Linux-x86_64.sh -b -p install/
PREFIX=/home/orp/Oyster_River_Protocol/software/anaconda/install
Unpacking payload ...
Collecting package metadata (current_repodata.json): ...working... done                                              
Solving environment: ...working... done
...
Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
echo ". /home/orp/Oyster_River_Protocol/software/anaconda/install/etc/profile.d/conda.sh" >> ~/.bashrc;
source ~/.bashrc;
( \
			conda activate; \
			conda update -y -n base conda; \
			conda deactivate; \
 )
/bin/bash: line 1: conda: command not found
/bin/bash: line 2: conda: command not found
/bin/bash: line 3: conda: command not found
make: *** [orp] Error 127
Makefile:45: recipe for target 'orp' failed
The command '/bin/sh -c cd Oyster_River_Protocol && sudo make' returned a non-zero code: 2

I don't get why it's not working; the source command should be initialising conda, but it's not (see the hedged sketch after this list).

  4. Again regarding creating the docker instance: I would like to make a few changes to it, but I'm not sure how. I would like to use the newer version of Trinity (2.9.0). Because it's not available through conda, I was looking into installing it through your Makefile, like you install transabyss or OrthoFuser. However, I don't have experience creating makefiles, so I'm not sure what I should be adding. Would you be able to help me with this?

  5. When are you planning on releasing the next version? I've noticed you have already made some changes to conda and bumped SPAdes to version 3.14, which is another thing I was looking into.
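An editorial note on question 3 above (the conda: command not found failure), offered as a hedged explanation: make runs each recipe line in its own shell, so the source ~/.bashrc executed on one line cannot define conda for the subshell started on the next. Sourcing conda.sh inside the same shell invocation that calls conda avoids this:

# Everything in one shell, so the conda function defined by conda.sh is in scope
( . /home/orp/Oyster_River_Protocol/software/anaconda/install/etc/profile.d/conda.sh; \
  conda activate; conda update -y -n base conda; conda deactivate )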

Re usage:

  1. I would like to be able to provide the pipeline some additional transcriptome assemblies to be incorporated into the assembly-merging step (OrthoFuse). Is that possible? I seem to recall having read somewhere that OrthoFuse was created in such a way that it could take additional assemblies. If this is so, how can I do it? If ORP cannot be set up to run this way, would it be possible to run everything from OrthoFuser onwards, providing the additional assemblies? Any other recommendation?

The reason why I would like to be able to do this is that I'm trying to build a comprehensive transcriptome for different fly species (independently), and I have several sets of RNAseq data from different tissues/states/individuals/generations/lines (depending on the species). I also have a reference genome for some of them. So my idea, based on some of your previous comments (on GitHub), would be to assemble each dataset individually, both de novo and reference-guided, and then merge them as instructed in your pipeline (OrthoFuser, Transrate, Detonate, etc.). What do you think?

Thank you very much in advance for anything you could help me with.

Cheers,
Santiago

Some general questions

Hi Matthew,

I've seen your paper regarding ORP. I have a couple of questions that maybe you could answer:

  1. have you compared the results of ORP by using the latest version of Trinity (2.8.4) instead of the older one (2.4.0) reported in the paper?

  2. I've noticed you replaced Shannon with Trans-ABySS; is that an equivalent replacement? I mean, do they behave similarly? Or did you make the swap just to keep the number of assemblies at four?

  3. is it possible to run the complete pipeline with independent jobs running in parallel? If not, have you thought about writing the pipeline in Nextflow?

Thank you very much for your help.

Cheers,
Santiago

PySlice_Unpack error

oyster.mk main fails on the test data set with the following error. A quick Google search reveals similar errors with conda, sometimes reported as fixed by using a newer Python.

======= SPAdes pipeline finished.

SPAdes log can be found here: /home/tc/sampledata/assemblies/test.spades_k55/spades.log

Thank you for using SPAdes!
Traceback (most recent call last):
  File "/lab/pr/tc/ORP/software/transabyss/transabyss", line 18, in <module>
    from utilities.adj_utils import has_edges
  File "/lab/pr/tc/ORP/software/transabyss/utilities/adj_utils.py", line 6, in <module>
    import igraph
  File "/lab/pr/tc/ORP/software/anaconda/install/envs/orp_v2/lib/python3.6/site-packages/igraph/__init__.py", line 34, in <module>
    from igraph._igraph import *
ImportError: /lab/pr/tc/ORP/software/anaconda/install/envs/orp_v2/lib/python3.6/site-packages/igraph/_igraph.cpython-36m-x86_64-linux-gnu.so: undefined symbol: PySlice_Unpack
make: *** [/home/tc/sampledata/assemblies/test.transabyss.fasta] Error 1
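A hedged diagnosis: PySlice_Unpack entered CPython in 3.6.1 (backported to 3.5.4), so a compiled igraph extension built against a newer interpreter fails with exactly this undefined symbol on an older 3.6.0. Two quick checks plus a possible fix (the conda-forge package name python-igraph is an assumption):

python --version              # run inside the orp_v2 env
python -c "import igraph"     # reproduces the ImportError outside make
conda install -n orp_v2 -c conda-forge "python>=3.6.1" python-igraph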

sampledata test failed

Hi,

I recently installed ORP v2.3.1 via a manually constructed conda environment (according to the yml file) and the Makefile.
I tried to run sampledata without the TPM_FILT option, and it finished as follows:


***** QUALITY REPORT FOR: test using the ORP version 2.3.1 ****
***** THE ASSEMBLY CAN BE FOUND HERE: /home/ryosuke/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta ****

***** BUSCO SCORE ~~~~~~~~~~~~~~~~~~~~~~> C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:255
***** TRANSRATE SCORE ~~~~~~~~~~~~~~~~~~> 0.3972
***** TRANSRATE OPTIMAL SCORE ~~~~~~~~~~> 0.54134
***** UNIQUE GENES ORP ~~~~~~~~~~~~~~~~~> 16
***** UNIQUE GENES TRINITY ~~~~~~~~~~~~~> 16
***** UNIQUE GENES SPADES55 ~~~~~~~~~~~~> 16
***** UNIQUE GENES SPADES75 ~~~~~~~~~~~~> 13
***** UNIQUE GENES TRANSABYSS ~~~~~~~~~~> 15
***** READS MAPPED AS PROPER PAIRS ~~~~~> 99.30%

However, when I ran sampledata with the TPM_FILT option (=0.2 or 1), I got the following error:


make: *** [/home/ryosuke/Oyster_River_Protocol/oyster.mk:369: /home/ryosuke/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta] Error 1

I checked the assemblies/working directory and found that there is no test.donotremove.list file. I also checked the test.blasted file; it existed, but there was no output in it.

Is this result normal?

Thank you in advance!

Add external assemblies to OrthoFuse?

Hi Mathew,

I would like to be able to provide the pipeline some additional transcriptome assemblies to be incorporated into the assembly-merging step (OrthoFuse). Is that possible? I seem to recall having read somewhere that OrthoFuse was created in such a way that it could take additional assemblies. If this is so, how can I do it? If ORP cannot be set up to run this way, would it be possible to run everything from OrthoFuser onwards, providing the additional assemblies? Any other recommendation?

The reason why I would like to be able to do this is that I'm trying to build a comprehensive transcriptome for a fly species, and I have several sets of RNAseq data from different tissues/states/individuals/generations/lines. I also have a somewhat close reference genome I could use. So my idea, based on some of your previous comments (on GitHub), would be to assemble each dataset individually, both de novo and reference-guided, and then merge them as instructed in your pipeline (OrthoFuser, Transrate, Detonate, etc.). What do you think?

Thank you very much in advance for anything you could help me with.

Cheers,
Santiago

Reduce orthofuse intermediate file count

Is there some way to greatly reduce the number of files of the pattern $A/orthofuse/$B/*.groups and $A/orthofuse/$B/*.groups.orthout generated by oyster.mk?

I'm a support analyst at an HPC installation, and I have a user with an ORP job that has generated over half a million files in that one directory. The typical size of a file is just a few hundred bytes. This is absolutely brutal on our shared network filesystem; it has negative performance consequences not just for the user running ORP but for all users of the filesystem.

Best practice in the HPC world is to expose the filesystem to a few large files rather than a lot of small ones. Is there some way ORP can be invoked that will do this? If not, can we make it a feature request to find some way to minimize that file count? Using a database (like SQLite) instead of the filesystem, say, or using incremental tar or dar to stash the *.group and *.group.orthout files (a rough sketch follows below)?
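Along the lines of that tar suggestion, an editorial stopgap (run only after the step that consumes these files has finished; entirely a sketch, not a supported ORP feature):

# Pack the tiny per-group files into one archive, then delete the originals
tar -czf "$A/orthofuse/$B/groups.tar.gz" "$A"/orthofuse/"$B"/*.groups* && rm "$A"/orthofuse/"$B"/*.groups*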

Thanks,
Ross Dickson PhD, Support Analyst
Compute Canada / ACENET / Dalhousie University

Error 141 on installation test

Hello,

I have installed ORP according to the instructions but when running the 'Test the installation' section in the 'sampledata' folder I get this error:
Total time = 2.32095s
Reported 69 pairwise alignments, 69 HSPs.
15 queries aligned.
make: *** [/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5] Error 141
make: *** Deleting file `/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5'
Any thoughts on how to fix Error 141? Thanks, Matt
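A hedged pointer: make reports the raw exit status, and values above 128 mean the command was killed by signal (status - 128); 141 therefore indicates SIGPIPE (signal 13), which typically happens when a downstream reader in a pipeline exits early. Re-running the list5 recipe by hand should reveal which pipe stage is responsible.

echo $((141 - 128))   # 13 = SIGPIPE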

READS ARE NOT AT LEAST 75 BP LONG error

ORP fails on the test data set when I trim the reads to 75 bp, with the following error:

IT LOOKS LIKE YOUR READS ARE NOT AT LEAST 75 BP LONG,
 PLEASE EDIT YOUR COMMAND USING THE SPADES2_KMER=INT FLAGS,
 SETTING THE ASSEMBLY KMER LENGTH LESS THAN YOUR READ LENGTH 

/bin/bash: line 8: shell: command not found
make: *** [readcheck] Error 127

The culprit seems to be the following line from oyster.mk:

if [ $$(gzip -cd $${READ1} | head -n 400 | awk '{if(NR%4==2) {count++; bases += length} } END{print int(bases/count)}') -gt 75 ] && [ $$(gzip -cd $${READ2} | head -n 400 | awk '{if(NR%4==2) {count++; bases += length} } END{print int(bases/count)}') -gt 75 ];\

After some digging, it looks like rnaspades.py actually does need k-mer -k to be an odd number less than the read length. This requirement for an odd number should also be added to the error message, which should read "76 bp", not "75 bp". Furthermore, the spades developers evidently recommend -k of about half the read length (ablab/spades#215). There should probably be some discussion of this in the ORP documentation.

My workaround is to change that line to:

if [ $$(gzip -cd $${READ1} | head -n 400 | awk '{if(NR%4==2) {count++; bases += length} } END{print int(bases/count)}') -gt ${SPADES2_KMER} ] && [ $$(gzip -cd $${READ2} | head -n 400 | awk '{if(NR%4==2) {count++; bases += length} } END{print int(bases/count)}') -gt ${SPADES2_KMER} ];\

so that "75" is not hard-coded in there.

orthofuser.mk run failure

Hi @macmanes, I'm trying to run orthofuser.mk on a set of transcriptomes (same species) that were generated using the Oyster River Protocol (ORP). I'm running a Docker container that has the most up-to-date version of ORP, pulled from the master branch, on an Ubuntu 18.04 server. The ORP transcriptomes were generated in the same container. Following renaming, I tried to run the orthofuser.mk makefile, and the program runs for some time before failing. The error I receive isn't very revealing to me:

/home/orp/Oyster_River_Protocol/orthofuser.mk:68: recipe for target '/home/orp/assemblies/merge.orthomerged.fasta' failed
make: *** [/home/orp/assemblies/merge.orthomerged.fasta] Error 1

Please let me know if any additional info might be useful for determining why the script is failing.

Sample Data for test: Which is the correct result?

I ran ORP on "sampledata" and got this:

 7|  #
 6| ##
 5| ##
 4| ##
 3| ###
 2| ###       #
 1| ###  ##   #
   -----------

------------------------
|       Summary        |
------------------------
|   observations: 20   |
| min value: -1.000000 |
|   mean : -0.987400   |
| max value: -0.935000 |
------------------------


*****  See the following link for interpretation *****
*****  https://oyster-river-protocol.readthedocs.io/en/latest/strandexamine.html *****



*****  QUALITY REPORT FOR: test using the ORP version 2.1.0 ****
*****  THE ASSEMBLY CAN BE FOUND HERE: /root/ORP/sampledata/assemblies/test.ORP.fasta ****

*****  BUSCO SCORE ~~~~~>               C:0.0%[S:0.0%,D:0.0%],F:0.3%,M:99.7%,n:303
*****  TRANSRATE SCORE ~~~~~>           0.37518
*****  TRANSRATE OPTIMAL SCORE ~~~~~>   0.56393
*****  UNIQUE GENES ORP ~~~~~>          39
*****  UNIQUE GENES TRINITY ~~~~~>      31
*****  UNIQUE GENES SPADES55 ~~~~~>     22
*****  UNIQUE GENES SPADES75 ~~~~~>     23
*****  UNIQUE GENES TRANSABYSS ~~~~~>   35


DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.

Which is the correct result for this ORP package example?

"Error 1" possibly related to .newbies.fasta

I'm getting the following error when running the command shown below.
make: *** [/lab/pr/tc/sandbox/orp_test/assemblies/diamond/test.newbies.fasta] Error 1

I've attached the fastq files if you want to try and reproduce it. Maybe there's something wrong with the fastq? In any case, it would be helpful if ORP could output a more informative error message.
fastq_files.zip

/lab/pr/tc/ORP/oyster.mk main \
STRAND=RF \
MEM=15 \
CPU=1 \
READ1=test.r1.fastq \
READ2=test.r2.fastq \
RUNOUT=test \
SPADES2_KMER=39 \
-d

The _salmon.XXXX.stderr file size balloons out of control

When running ORP 2.3.3, the _salmon.XXXX.stderr file gets stuck in a seemingly endless loop, growing to terabytes in size and completely filling my allotted HPC partition space.

The cause of certain CBins in the read_partitions directory using terabytes, when they should be a few megabytes, is that the BooPHF construction is caught in a seemingly endless loop, constantly printing the elapsed time and estimated finishing time to the _salmon.XXXX.stderr file.

I have attached a PDF overview of the issue and the "hacky" fix. This is something to look into; it stems from a Salmon dependency, BBHash: https://github.com/rizkg/BBHash

Specifically, the BooPHF.h script - https://github.com/rizkg/BBHash/blob/master/BooPHF.h

Thank you for looking into it.

ORP-2.3.3 Ballooning issue..pdf
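While the BBHash issue is open, a rough stopgap (an editorial sketch; the size threshold is arbitrary) is to find and truncate the runaway logs before they exhaust the partition:

find . -path "*read_partitions*" -name "*salmon*.stderr" -size +100M -exec truncate -s 0 {} +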

Run on singularity / without logging into container

Hi!
I would like to try ORP on our cluster, but we only have Singularity available. I converted the Docker image, and it works to some extent. However, using singularity shell orp.sif does not behave quite like Docker: I have both filesystems available, it wants to load conda envs from the host, and $HOME points to the host /home. How would I start this in Singularity so that it acts the same as in Docker? singularity exec orp.sif bash has the same problem.

Also, to run it on the cluster I would have to start ORP non-interactively from a shell script; how would that look?
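A hedged sketch for both questions (flag behavior varies across Singularity versions): --containall isolates the container from the host environment and home directory, which makes the image behave more like docker run, and exec lets a batch script call ORP directly. The oyster.mk arguments below are placeholders:

# Interactive shell, isolated from host $HOME and environment
singularity shell --containall orp.sif

# Non-interactive cluster use: call oyster.mk straight through exec
singularity exec --containall orp.sif /home/orp/Oyster_River_Protocol/oyster.mk main \
  READ1=... READ2=... RUNOUT=assembly CPU=16 MEM=100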
