lapa's People

Contributors

fairliereese, muhammedhasan

lapa's Issues

Refining transcript 3' and 5' ends with LAPA

Hi @MuhammedHasan

I see that you have 'lapa.correction.Transcript'. How would I be able to use it?

To my understanding, LAPA works with 'gene_id' by default. How would I change the analysis so that it looks at 'transcript_id' instead?

As always, thank you
Mustafa

other PolyA motifs

Dear @MuhammedHasan

Great work on LAPA; I am planning to start using it very soon. I have a question regarding the polyA motifs. According to the preprint, LAPA looks for the canonical AATAAA. Is this the only motif it searches for before determining polyA site usage? Have you considered adding other motifs, such as:

aataaa
attaaa
agtaaa
tataaa
cataaa
gataaa
aatata
aataca
aataga
aaaaag
actaaa
aagaaa
aatgaa
tttaaa
aaaaca
ggggct

I have done some preliminary analysis with SQANTI3, and the motif distribution changes between cell types: AATAAA usage goes from 70% in one to 46% in another. The second most used motif is ATTAAA. Others, such as AAAAAG and AGTAAA, change the most in percentage terms between cell types.
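A comparison like this can be sketched as a simple scan of the sequence upstream of each polyA site for the hexamers listed above. This is a minimal illustration, not LAPA's implementation: the function name find_polya_motif, the priority order (the order of the list in this post), and the example window are all assumptions.

```python
# Hexamers from the list in this post, upper-cased, in the post's order.
POLYA_MOTIFS = [
    "AATAAA", "ATTAAA", "AGTAAA", "TATAAA", "CATAAA", "GATAAA",
    "AATATA", "AATACA", "AATAGA", "AAAAAG", "ACTAAA", "AAGAAA",
    "AATGAA", "TTTAAA", "AAAACA", "GGGGCT",
]


def find_polya_motif(upstream_seq, motifs=POLYA_MOTIFS):
    """Return (motif, distance) for the first listed motif found, else None.

    `upstream_seq` is the transcript-strand sequence ending at the polyA
    site; `distance` is the number of bases from the motif start to the site.
    """
    seq = upstream_seq.upper()
    for motif in motifs:
        pos = seq.rfind(motif)  # occurrence closest to the site
        if pos != -1:
            return motif, len(seq) - pos
    return None


# Hypothetical 40-nt window upstream of a site (made up for illustration).
window = "GCTGCCTTGGATTAAAGTCTGCTTTGACCTGCAGTCCTGA"
print(find_polya_motif(window))
```

Counting the returned motifs per cell type would give usage percentages of the kind quoted above.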

Kind Regards
Mustafa

ValueError: Invalid win_type gaussian

/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/count.py:594: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
Traceback (most recent call last):
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/bin/lapa", line 33, in <module>
    sys.exit(load_entry_point('lapa==0.0.5', 'console_scripts', 'lapa')())
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/main.py", line 112, in cli_lapa
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 497, in lapa
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 293, in __call__
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 149, in clustering
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 377, in to_df
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 378, in <listcomp>
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 255, in to_dict
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 135, in to_dict
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 118, in peak
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/generic.py", line 11986, in rolling
    return Window(
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/window/rolling.py", line 165, in __init__
    self._validate()
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/window/rolling.py", line 1168, in _validate
    raise ValueError(f"Invalid win_type {self.win_type}")
ValueError: Invalid win_type gaussian

output files

Dear Lapa,

I have run your tool, currently as a test. Could you add some documentation, or pass on some wisdom, on which output files to expect?

I have bw and bed files for all my conditions. I also have polyA_cluster.bed. What do the columns stand for, and what does "None@None" in the last columns mean?

I have no other files. I wondered whether the program terminated early, but there is no error or warning in any of the output files from Slurm (the RAM limit was not exceeded).

Can you please advise?

Kind regards,

Pete

pyrange error "ValueError: all elements of `new_shape` must be non-negative"

Hi Muhammed,

I got the following pyranges error:

Traceback (most recent call last):
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 136, in to_gr
    return pr.PyRanges(df).count_overlaps(
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/pyranges_main.py", line 1385, in count_overlaps
    counts = pyrange_apply(_number_overlapping, self, other, **kwargs)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/multithreaded.py", line 231, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/multithreaded.py", line 21, in call_f
    return f.remote(df, odf, **kwargs)
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/methods/coverage.py", line 26, in _number_overlapping
    _self_indexes, _other_indexes = oncls.all_overlaps_both(starts, ends, indexes)
  File "ncls/src/ncls.pyx", line 74, in ncls.src.ncls.NCLS64.all_overlaps_both
  File "ncls/src/ncls.pyx", line 115, in ncls.src.ncls.NCLS64.all_overlaps_both
  File "<array_function internals>", line 5, in resize
  File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 1423, in resize
    raise ValueError('all elements of `new_shape` must be non-negative')
ValueError: all elements of `new_shape` must be non-negative

In case it helps, here is one read from the BAM file:

molecule/4051_GGCAATACTCGTGACC_B900_Tum_B900_Tum 16 chr1 14424 12 406M140N69M757N108M1I44M659N159M92N198M177N56M GATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACTGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGCTCCTTCTGCTGCTTCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTTGCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGCCCAGCCACCGGAGGGGTCAACCACTTCCCTGGGAGCTCCCTGGACTGAAGGAGACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGGTGCACTGGCCAGCACCTCAGGAGCTGGGGGTGGTGGTGGGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAGGGGCAGAGGGGGCAATGCCGGGGCCCAGGTCGGCAATGTACATGAGGTCGTTGGCAATGCCGGGCAGGTCAGGCAGGTAGGATGGAACATCAATCTCAGGCACCTGGCCCAGGTCTGGCACATAGAAGTAGTTCTCTGGGACCTGCTGTTCCAGCTGCTCTCTCTTGCTGATGGACAAGGGGGCATCAAACAGCTTCT * NM:i:3 ms:i:1031 AS:i:87nn:i:0 ts:A:+ tp:A:P cm:i:307 s1:i:987 s2:i:975 de:f:0.0029 rl:i:0

Let me know if you need any further detail.

Thanks for the help

Fix sorted-nearest version 0.0.33

Dear Lapa,

FIX: the following resolves everything described below:

mamba install -c bioconda pyranges (it seems the pip version of pyranges does not include everything that is required).

I installed lapa via pip install lapa (I also had to run pip install cython). This is in a conda environment running Python 3.8.

If I run:

$ lapa
Traceback (most recent call last):
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/bin/lapa", line 5, in <module>
    from lapa.main import cli
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/__init__.py", line 1, in <module>
    from lapa.main import lapa
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/main.py", line 2, in <module>
    from lapa.lapa import lapa
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 3, in <module>
    import pyranges as pr
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/__init__.py", line 137, in <module>
    import pyranges.genomicfeatures as gf
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/genomicfeatures.py", line 7, in <module>
    from sorted_nearest.src.introns import find_introns
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/__init__.py", line 7, in <module>
    from sorted_nearest.src.k_nearest_ties import get_all_ties, get_different_ties
ImportError: cannot import name 'get_all_ties' from 'sorted_nearest.src.k_nearest_ties' (/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/src/k_nearest_ties.cpython-38-x86_64-linux-gnu.so)

If I then force it to use python3.8 with a lapa.py version from GitHub, I get the same error.

$ python3.8 "/mnt/shared/scratch/pthorpe/private/mustafa/lapa/lapa/lapa/lapa.py"
Traceback (most recent call last):
  File "/mnt/shared/scratch/pthorpe/private/mustafa/lapa/lapa/lapa/lapa.py", line 3, in <module>
    import pyranges as pr
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/__init__.py", line 137, in <module>
    import pyranges.genomicfeatures as gf
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/genomicfeatures.py", line 7, in <module>
    from sorted_nearest.src.introns import find_introns
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/__init__.py", line 7, in <module>
    from sorted_nearest.src.k_nearest_ties import get_all_ties, get_different_ties
ImportError: cannot import name 'get_all_ties' from 'sorted_nearest.src.k_nearest_ties' (/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/src/k_nearest_ties.cpython-38-x86_64-linux-gnu.so)

If I then run pip install pyranges, it says the requirement is already satisfied.
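Since pip reports pyranges as already satisfied while the import still fails inside sorted_nearest, it may help to list the versions that are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the package names are the ones that appear in the traceback and install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report what the current interpreter can actually see.
for name, ver in installed_versions(
    ["lapa", "pyranges", "sorted-nearest", "cython"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it with the same python3.8 used for lapa shows whether pip and conda have left mismatched builds side by side.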

error: "Only a column name can be used for the key in a dtype mappings argument"

Hi Muhammed,
I'm trying to test lapa with short-read RNA-seq data. I'm using hisat2 for the mapping (I built the hg38 index with transcripts using the files suggested in the lapa tutorial). My Python version is 3.9.

After fixing the GTF file and putting all the inputs in the right format, lapa failed while processing the BAM for the first sample, with the following error:

$ lapa --alignment samples.csv --fasta genome.fa --annotation genome_utr.gtf --chrom_sizes chrom_sizes --output_dir lapa_test
Traceback (most recent call last):
  File "/home/eortiz/.local/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/pandas/core/generic.py", line 5791, in astype
    raise KeyError(
KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'

I know this error is generated when the names in the columns don't match exactly, but I'm not so sure how to fix it.
Any suggestion is welcome.
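One mismatch worth ruling out is chromosome naming (for example chr1 in one input and 1 in another), since nothing would overlap if the names differ. Whether that is the cause here is not established. The sketch below is not part of LAPA: chrom_names and compare_names are hypothetical helpers, and it assumes whitespace-separated text files whose first column is the chromosome name, as in a GTF or a chrom_sizes file.

```python
def chrom_names(path):
    """Collect first-column values from a text file, skipping '#' comments."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split(None, 1)[0])
    return names


def compare_names(path_a, path_b):
    """Return (only_in_a, only_in_b) so naming differences stand out."""
    a, b = chrom_names(path_a), chrom_names(path_b)
    return sorted(a - b), sorted(b - a)


# Example call with the file names from the command above:
#   only_gtf, only_sizes = compare_names("genome_utr.gtf", "chrom_sizes")
```

Two empty lists mean the two files agree on chromosome names; the BAM header would still need to be checked separately, for example with samtools.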

Thanks.

latest version issue: RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values. Please correct this and try again.

Hi Muhammed,

Well done with the documentation updates! This is great. I have upgraded to the latest, as suggested. However, I have come across an issue: (full error at the bottom)

lapa command: lapa --alignment samples.csv --fasta GRCh38.primary_assembly.genome.fa --annotation hg39.utr_fixed.gtf --chrom_sizes chrom_sizes --output_dir lapa_c_vs_t

These are the same input files that worked with the previous version, except that I fixed the UTRs as described in the docs:

#gencode_utr_fix --input_gtf mm10.gtf --output_gtf mm10.utr_fixed.gtf

wget -O - https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz | gunzip -c > hg38.gtf

gencode_utr_fix --input_gtf hg38.gtf --output_gtf hg39.utr_fixed.gtf

gencode_utr_fix --input_gtf gencode.v39.primary_assembly.annotation.gtf --output_gtf hg39.utr_fixed.gtf

Both of these fail with the main lapa command.

.....
.....
[E::idx_find_and_load] Could not retrieve index file for '/home/pthorpe/scratch/mustafa/lapa/reads_bams/R6_Trt_LONG.fastq.gz.temp.mapped.bam'
Traceback (most recent call last):
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 143, in counting
    counter._to_bigwig(df_all_count, sample_counts, self.chrom_sizes,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 561, in _to_bigwig
    save_count_bw(df_all, output_dir, chrom_sizes, f'all_{prefix}')
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 197, in save_count_bw
    BaseCounter._to_bigwig(df, chrom_sizes, output_dir, prefix)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 153, in _to_bigwig
    bw_from_pyranges(
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/utils/io.py", line 153, in bw_from_pyranges
    gr['-'].to_bigwig(bw_neg_file, chromosome_sizes=chrom_sizes,
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/pyranges.py", line 5339, in to_bigwig
    result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
  File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/out.py", line 203, in _to_bigwig
    bw.addEntries(chromosomes, starts, ends=ends, values=values)
RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values.
Please correct this and try again.

Would you be able to help?
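The message is raised while entries are being added to the bigWig file, and it complains about entry order. A quick way to see whether a set of intervals would trip such a check is sketched below. This is an illustrative stand-alone check, not code from LAPA, pyranges, or pyBigWig: first_out_of_order is a hypothetical helper, and the rules it applies (each chromosome in one contiguous block, ascending non-overlapping intervals, end greater than start) are my reading of the error text rather than a confirmed diagnosis of this run.

```python
def first_out_of_order(entries):
    """Return the index of the first (chrom, start, end) entry that breaks
    ascending order within its chromosome, or None if all are in order."""
    finished = set()  # chromosomes whose block has already ended
    prev_chrom, prev_end = None, 0
    for i, (chrom, start, end) in enumerate(entries):
        if chrom != prev_chrom:
            if chrom in finished:
                return i  # chromosome reappears after another one
            if prev_chrom is not None:
                finished.add(prev_chrom)
            prev_chrom, prev_end = chrom, 0
        if start < prev_end or end <= start:
            return i  # precedes/overlaps the previous entry, or empty span
        prev_end = end
    return None


# Made-up intervals: the second entry starts before the first one ends.
print(first_out_of_order([("chr1", 10, 20), ("chr1", 5, 8)]))
```

Applying it to the per-strand count table before it is written would show which row, if any, is responsible.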

regards,

Pete

AttributeError: module 'numpy' has no attribute 'int'

Following the Google Colab Jupyter notebook, I successfully ran all the preceding code to prepare the config, GTF, and FASTA files. However, when I ran the following:

! lapa  --alignment sample_config.csv \
        --fasta /home/mustafa/projects/ReferenceGenomes/gencode/v41/GRCh38.primary_assembly.genome.fa \
        --annotation gencode.v41.primary_assembly.annotation.utr_fixed.gtf \
        --chrom_sizes gencode.v41.chrom_sizes \
        --output_dir LAPA_PolyAClusterCalling

After a while I get the error below:

/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:594: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  df_all = df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
Traceback (most recent call last):
  File "/root/miniconda3/envs/LAPA/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 143, in counting
    counter._to_bigwig(df_all_count, sample_counts, self.chrom_sizes,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 561, in _to_bigwig
    save_count_bw(df_all, output_dir, chrom_sizes, f'all_{prefix}')
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 197, in save_count_bw
    BaseCounter._to_bigwig(df, chrom_sizes, output_dir, prefix)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 153, in _to_bigwig
    bw_from_pyranges(
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/utils/io.py", line 153, in bw_from_pyranges
    gr['+'].to_bigwig(bw_pos_file, chromosome_sizes=chrom_sizes,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/pyranges.py", line 5381, in to_bigwig
    result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/out.py", line 189, in _to_bigwig
    gr = self.to_rle(rpm=rpm, strand=False, value_col=value_col).to_ranges()
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/pyranges.py", line 5745, in to_rle
    return _to_rle(self, value_col, strand=strand, rpm=rpm, nb_cpu=nb_cpu)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/methods/to_rle.py", line 22, in _to_rle
    result = pyrange_apply_single(coverage, ranges, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/multithreaded.py", line 382, in pyrange_apply_single
    result = call_f_single(function, nparams, df, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/multithreaded.py", line 31, in call_f_single
    return f.remote(df, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyrle/methods.py", line 167, in coverage
    runs, values = _coverage(_df.Position.values, _df.Value.values)
  File "pyrle/src/coverage.pyx", line 67, in pyrle.src.coverage._coverage
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'

lapa was installed with pip in a new conda environment python=3.8.
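The numpy.int alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so the final AttributeError is consistent with a NumPy release newer than the compiled pyrle code expects. Below is a minimal sketch for checking which side of that boundary a version falls on; numpy_has_int_alias is a hypothetical helper, and it takes the version as a string so that it runs even without NumPy installed.

```python
def numpy_has_int_alias(version_string):
    """True if this NumPy version still provides the deprecated np.int alias.

    The alias was removed in NumPy 1.24, so anything below (1, 24) has it.
    """
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) < (1, 24)


# In a real environment the string would come from numpy.__version__.
for v in ("1.23.0", "1.24.4", "2.0.1"):
    print(v, numpy_has_int_alias(v))
```

The install log further down this page shows lapa 0.0.5 requesting numpy<=1.23, which is on the side of the boundary where the alias still exists.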

Any help would very much be welcomed

Value error need at least one array to concatenate

Hi,
I am running LAPA with the command -
lapa --alignment /data/salomonis-archive/FASTQs/Grimes/RNA/scRNASeq/10X-Genomics/LGCHMC53-17GEX/PacbioPBMC/PacbioPBMC/outs/possorted_genome_bam.bam --fasta hg38.fa --annotation hg38.gtf --chrom_sizes hg38.chrom_sizes --output_dir pbmc_pacbio_1

I am getting this error -
a3-2020/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 136, in to_gr
    return pr.PyRanges(df).count_overlaps(
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/pyranges.py", line 1322, in count_overlaps
    counts = pyrange_apply(_number_overlapping, self, other, **kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
    result = call_f(function, nparams, df, odf, kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 23, in call_f
    return f.remote(df, odf, **kwargs)
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/methods/coverage.py", line 27, in _number_overlapping
    _self_indexes, _other_indexes = oncls.all_overlaps_both(
  File "ncls/src/ncls32.pyx", line 76, in ncls.src.ncls32.NCLS32.all_overlaps_both
  File "ncls/src/ncls32.pyx", line 122, in ncls.src.ncls32.NCLS32.all_overlaps_both
  File "<array_function internals>", line 5, in resize
  File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1417, in resize
    a = concatenate((a,) * n_copies)
  File "<array_function internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

Could you please advise what might be causing this error?

Thanks

Installation error

Hello,

I am encountering problems installing lapa. Here is the error I am getting

#0 1.082 Collecting lapa
#0 1.115   Downloading lapa-0.0.5-py3-none-any.whl (36 kB)
#0 1.132 Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from lapa) (68.0.0)
#0 1.132 Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from lapa) (4.65.0)
#0 1.360 Collecting numpy<=1.23 (from lapa)
#0 1.372   Downloading numpy-1.23.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.0 MB)
#0 1.565      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.0/17.0 MB 70.9 MB/s eta 0:00:00
#0 1.615 Collecting click (from lapa)
#0 1.623   Downloading click-8.1.6-py3-none-any.whl (97 kB)
#0 1.630      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 17.3 MB/s eta 0:00:00
#0 1.781 Collecting pandas (from lapa)
#0 1.790   Downloading pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
#0 1.889      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 113.1 MB/s eta 0:00:00
#0 1.932 Collecting pybigwig (from lapa)
#0 1.945   Downloading pyBigWig-0.3.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209 kB)
#0 1.954      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.7/209.7 kB 31.3 MB/s eta 0:00:00
#0 2.080 Collecting scipy (from lapa)
#0 2.092   Downloading scipy-1.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.3 MB)
#0 2.376      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.3/36.3 MB 63.5 MB/s eta 0:00:00
#0 2.444 Collecting bamread>=0.0.10 (from lapa)
#0 2.458   Downloading bamread-0.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (650 kB)
#0 2.470      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 650.3/650.3 kB 60.0 MB/s eta 0:00:00
#0 2.497 Collecting pyranges>=0.0.71 (from lapa)
#0 2.508   Downloading pyranges-0.0.129-py3-none-any.whl (1.5 MB)
#0 2.527      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 87.0 MB/s eta 0:00:00
#0 2.552 Collecting sorted-nearest==0.0.33 (from lapa)
#0 2.565   Downloading sorted_nearest-0.0.33.tar.gz (1.2 MB)
#0 2.580      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 98.2 MB/s eta 0:00:00
#0 2.643   Installing build dependencies: started
#0 8.952   Installing build dependencies: finished with status 'done'
#0 8.954   Getting requirements to build wheel: started
#0 9.657   Getting requirements to build wheel: finished with status 'error'
#0 9.666   error: subprocess-exited-with-error
#0 9.666   
#0 9.666   × Getting requirements to build wheel did not run successfully.
#0 9.666   │ exit code: 1
#0 9.666   ╰─> [173 lines of output]
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.boundscheck(False)
#0 9.666       @cython.wraparound(False)
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters64(const long [::1] starts, const long [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:23:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.wraparound(False)
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters64(const long [::1] starts, const long [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:24:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters64(const long [::1] starts, const long [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:25:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       cpdef annotate_clusters64(const long [::1] starts, const long [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666           cpdef int n_clusters = 1
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:26:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666           cpdef int n_clusters = 1
#0 9.666           cpdef int length = len(starts)
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:27:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.boundscheck(False)
#0 9.666       @cython.wraparound(False)
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters32(const int32_t [::1] starts, const int32_t [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:55:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.wraparound(False)
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters32(const int32_t [::1] starts, const int32_t [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:56:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       @cython.initializedcheck(False)
#0 9.666       cpdef annotate_clusters32(const int32_t [::1] starts, const int32_t [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:57:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       cpdef annotate_clusters32(const int32_t [::1] starts, const int32_t [::1] ends, int slack):
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666           cpdef int n_clusters = 1
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:58:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       
#0 9.666       Error compiling Cython file:
#0 9.666       ------------------------------------------------------------
#0 9.666       ...
#0 9.666       
#0 9.666           cpdef int min_start = starts[0]
#0 9.666           cpdef int max_end = ends[0]
#0 9.666           cpdef int i = 0
#0 9.666           cpdef int n_clusters = 1
#0 9.666           cpdef int length = len(starts)
#0 9.666                 ^
#0 9.666       ------------------------------------------------------------
#0 9.666       
#0 9.666       sorted_nearest/src/annotate_clusters.pyx:59:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
#0 9.666       Compiling sorted_nearest/src/sorted_nearest.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/max_disjoint_intervals.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/k_nearest.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/k_nearest_ties.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/clusters.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/annotate_clusters.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/cluster_by.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/merge_by.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/introns.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/windows.pyx because it changed.
#0 9.666       Compiling sorted_nearest/src/tiles.pyx because it changed.
#0 9.666       [ 1/11] Cythonizing sorted_nearest/src/annotate_clusters.pyx
#0 9.666       Traceback (most recent call last):
#0 9.666         File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#0 9.666           main()
#0 9.666         File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#0 9.666           json_out['return_val'] = hook(**hook_input['kwargs'])
#0 9.666         File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
#0 9.666           return hook(config_settings)
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
#0 9.666           return self._get_build_requires(config_settings, requirements=['wheel'])
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
#0 9.666           self.run_setup()
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
#0 9.666           super(_BuildMetaLegacyBackend,
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
#0 9.666           exec(code, locals())
#0 9.666         File "<string>", line 79, in <module>
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
#0 9.666           cythonize_one(*args)
#0 9.666         File "/tmp/pip-build-env-zg_oj5sl/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
#0 9.666           raise CompileError(None, pyx_file)
#0 9.666       Cython.Compiler.Errors.CompileError: sorted_nearest/src/annotate_clusters.pyx
#0 9.666       [end of output]
#0 9.666   
#0 9.666   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 9.667 error: subprocess-exited-with-error
#0 9.667 
#0 9.667 × Getting requirements to build wheel did not run successfully.
#0 9.667 │ exit code: 1
#0 9.667 ╰─> See above for output.
#0 9.667 
#0 9.667 note: This error originates from a subprocess, and is likely not a problem with pip.
------
Dockerfile:11
--------------------
   9 |     RUN apt-get clean all
  10 |     
  11 | >>> RUN pip3 install lapa
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install lapa" did not complete successfully: exit code: 1

Thank you.
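
A possible workaround, offered as an untested sketch rather than a confirmed fix: the message "Variables cannot be declared with 'cpdef'" is what Cython 3 raises for code that earlier Cython releases still accepted, so the isolated build environment for sorted-nearest 0.0.33 has probably pulled in Cython 3. The log does not show the Cython version, so that part is an assumption. pip passes the PIP_CONSTRAINT environment variable through to the build environment, so constraining Cython there may let the source build succeed. The helper name below is hypothetical.

```python
import os
import sys
import tempfile
from pathlib import Path


def build_constrained_install(package="lapa", constraint="Cython<3"):
    """Return (command, env) for a pip install whose isolated build
    environments are constrained by `constraint`."""
    # Write the constraint to a file that outlives this function;
    # pip reads its location from the PIP_CONSTRAINT variable.
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".txt", prefix="constraints-", delete=False)
    with handle:
        handle.write(constraint + "\n")

    env = dict(os.environ, PIP_CONSTRAINT=handle.name)
    command = [sys.executable, "-m", "pip", "install", package]
    return command, env


command, env = build_constrained_install()
print(" ".join(command))
print("PIP_CONSTRAINT=" + env["PIP_CONSTRAINT"])
print(Path(env["PIP_CONSTRAINT"]).read_text().strip())
# To actually run the install (needs network access):
#   import subprocess; subprocess.run(command, env=env, check=True)
```

In a Dockerfile the same idea would be to point PIP_CONSTRAINT at a file containing Cython<3 before the pip3 install lapa step.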

ValueError: new categories must not include old categories

Hi, I am using LAPA on DRS and cDNA ONT data. It runs smoothly on the DRS reads, but on the cDNA reads it throws an error at the clustering stage.

I used the following command:
lapa --alignment alignment.csv --fasta /references/reference/ucsc/rn7.fa --annotation /references/reference/ucsc/lapa_utrs_ncbiRefSeq.gtf --chrom_sizes /references/reference/ucsc/chrom_sizes.txt --output_dir /ANALYSES/rat/cDNA/LAPA

And here is the traceback:
Traceback (most recent call last):
File "/usr/local/software/lapa/eb16fee/bin/lapa", line 11, in
load_entry_point('lapa==0.0.5', 'console_scripts', 'lapa')()
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1128, in call
return self.main(*args, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/main.py", line 122, in cli_lapa
non_replicates_read_threhold=non_replicates_read_threhold)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 297, in call
df_cluster = self.annotate_cluster(df_cluster)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 155, in annotate_cluster
df = self.create_genomic_regions().annotate(gr)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/genomic_regions.py", line 67, in annotate
gr_gtf, strandedness='same', how='left')
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/pyranges.py", line 2257, in join
dfs = pyrange_apply(_write_both, self, other, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
result = call_f(function, nparams, df, odf, kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 23, in call_f
return f.remote(df, odf, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 129, in _write_both
scdf, ocdf = _both_dfs(scdf, ocdf, how=how)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 83, in _both_dfs
oh = null_types(ocdf.head(1))
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 67, in null_types
tmp_cat = tmp_cat.cat.add_categories("-1")
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/accessor.py", line 89, in f
return self._delegate_method(name, *args, **kwargs)
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2403, in _delegate_method
res = method(*args, **kwargs)
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 1023, in add_categories
raise ValueError(msg.format(already_included=already_included))
ValueError: new categories must not include old categories: {'-1'}

I would be grateful for any help resolving this issue.
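
For anyone hitting the same message, here is a minimal reproduction in plain pandas, independent of LAPA and pyranges. The traceback shows pyranges calling add_categories("-1") on a categorical column, and pandas refuses that when "-1" is already one of the categories. Which column of the annotation carries the "-1" value cannot be told from the traceback, so the column contents below are made up, and add_category_if_missing is a hypothetical helper, not part of either library.

```python
import pandas as pd

# Made-up categorical column that already contains the literal "-1".
column = pd.Series(["gene_a", "-1"], dtype="category")

try:
    column.cat.add_categories("-1")
    message = None
except ValueError as err:
    message = str(err)
print(message)


def add_category_if_missing(series, value):
    """Add `value` as a category only when it is not one already."""
    if value in series.cat.categories:
        return series
    return series.cat.add_categories(value)


guarded = add_category_if_missing(column, "-1")
print(list(guarded.cat.categories))
```

If this matches what happens in the cDNA run, looking for attribute values equal to -1 in the GTF might be a reasonable place to start.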

AssertionError: Can only do stranded operations when both PyRanges contain strand info

I ran LAPA on PacBio data aligned with minimap2.

I got the following error:

File "/usr/nzx-cluster/apps/lapa/python3.8.11/bin/lapa", line 8, in
sys.exit(cli_lapa())
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 297, in call
df_cluster = self.annotate_cluster(df_cluster)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 155, in annotate_cluster
df = self.create_genomic_regions().annotate(gr)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/genomic_regions.py", line 66, in annotate
gr_ann = pr.PyRanges(gr.df, int64=True).join(
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/pyranges/pyranges_main.py", line 2433, in join
dfs = pyrange_apply(_write_both, self, other, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/pyranges/multithreaded.py", line 207, in pyrange_apply
assert (
AssertionError: Can only do stranded operations when both PyRanges contain strand info
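
One thing that may be worth checking, as a sketch under assumptions: pyranges only treats a PyRanges as stranded when its Strand column contains nothing but + and -. The traceback does not say whether the reads or the annotation lack valid strand information, so both are candidates. The stdlib-only helper below (a hypothetical name, not part of LAPA) tallies the strand field, column 7, of GTF lines; any value other than + or - would make the annotation unstranded.

```python
from collections import Counter


def count_gtf_strands(lines):
    """Tally the strand field (7th column) of GTF records."""
    counts = Counter()
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a well-formed GTF record
        counts[fields[6]] += 1
    return counts


# Made-up records for illustration; for a real file pass an open handle:
#   with open("annotation.gtf") as handle:
#       counts = count_gtf_strands(handle)
example = [
    "#!genome-build example\n",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";\n",
    "chr1\tsrc\texon\t300\t400\t.\t-\t.\tgene_id \"g2\";\n",
    "chr2\tsrc\texon\t500\t600\t.\t.\t.\tgene_id \"g3\";\n",
]
counts = count_gtf_strands(example)
# Strand values that pyranges would not accept as valid strand info.
unstranded = {s: n for s, n in counts.items() if s not in ("+", "-")}
print(dict(counts), unstranded)
```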

Pychopper

Hi @MuhammedHasan

Would you recommend running Pychopper upstream for ONT reads, just as an additional QC step? Or do you think it wouldn't make a difference because of the way LAPA looks for the polyA signal?

Mustafa

Support for non-stranded paired-end data

Hi there, I encountered this error when running LAPA on a single short-read BAM file. What would you advise to solve this? Thanks!

lapa --alignment ${illumina_bam_dir}/${bamfile} --fasta ${reference_genome_fa} --annotation ${reference_gtf} --chrom_sizes ${chrom_sizes} --output_dir ${outdir}/vb_annot/${samplename}_illumina        

Error:

Traceback (most recent call last):
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/bin/lapa", line 8, in
sys.exit(cli_lapa())
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1130, in call
return self.main(*args, **kwargs)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 288, in call
df_all_count, sample_counts = self.counting(alignment)
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
df_all_count, sample_counts = counter.to_df()
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
df = pd.concat([
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 584, in
self.build_counter(row['path'])
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/pandas/core/generic.py", line 6212, in astype
raise KeyError(
KeyError: "Only a column name can be used for the key in a dtype mappings argument. 'Chromosome' not found in columns."

In case it helps, my chrom.sizes file for Anopheles gambiae looks like this (it was generated using samtools faidx as instructed):

AgamP4_2L 49364325
AgamP4_2R 61545105
AgamP4_3L 41963435
AgamP4_3R 53200684
AgamP4_UNKN 42389979
AgamP4_X 24393108
AgamP4_Y_unplaced 237045
AAAB01000047 21505
AAAB01000163 28420
AAAB01000448 22809
AAAB01000791 62303
(..more contigs..)
AgamP4_Mt 15363
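
For reference, a minimal sketch of how a chrom sizes file can be derived from the .fai index that samtools faidx writes: the first two tab-separated columns of the index are the sequence name and its length. The function name is made up for illustration, the offsets and line widths in the sample index lines are invented, and whether LAPA expects exactly this tab-separated layout is an assumption.

```python
def fai_to_chrom_sizes(fai_lines):
    """Yield 'name<TAB>length' lines from the lines of a .fai index."""
    for line in fai_lines:
        if not line.strip():
            continue
        # .fai columns: name, length, offset, linebases, linewidth
        name, length = line.rstrip("\n").split("\t")[:2]
        yield f"{name}\t{int(length)}"


# Sample index lines; names and lengths come from the listing above,
# the remaining three columns are illustrative only.
example_fai = [
    "AgamP4_2L\t49364325\t11\t60\t61\n",
    "AgamP4_Mt\t15363\t50187075\t60\t61\n",
]
sizes = list(fai_to_chrom_sizes(example_fai))
print("\n".join(sizes))
```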
