comprna / mosea

Motif Scan and Enrichment Analysis (MoSEA)

License: ISC License

Python 72.91% Shell 15.32% R 11.77%

motif-enrichment-analysis

mosea's Issues

options --match_len 1 --len_ext 20

Dear All,

The MoSEA demo mentions that the enrich step for detecting enriched motifs can be run with the two options --match_len 1 --len_ext 20.

When I run the tool with these options, I get an error saying they are unrecognized arguments.

Could you please tell me whether they were omitted from the mosea.py script? If so, how can I add them so that I can control for length differences between the background (bg) and regulated sequences?

Thank you so much in advance!
Best regards,
Jamal.
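
In case it helps while the options are unavailable, length differences can also be controlled outside the tool by picking background sequences whose lengths are close to those of the regulated ones. The following is only a minimal sketch under stated assumptions: the function name and the `len_ext` and `per_seq` parameters are hypothetical, and treating `len_ext` as an allowed length difference in nucleotides is my reading of the option name, not something confirmed by mosea.py.

```python
import random


def sample_length_matched(regulated, background, len_ext=20, per_seq=1, seed=0):
    """Pick background sequences with lengths close to each regulated sequence.

    regulated, background: dicts mapping sequence id -> sequence string.
    len_ext: allowed absolute length difference in nucleotides (hypothetical
             meaning, modeled on the option name used in the demo).
    per_seq: number of background sequences to draw per regulated sequence.
    """
    rng = random.Random(seed)
    available = dict(background)  # copy so the caller's dict is left untouched
    matched = {}
    for reg_id, reg_seq in regulated.items():
        candidates = [
            bg_id
            for bg_id, bg_seq in available.items()
            if abs(len(bg_seq) - len(reg_seq)) <= len_ext
        ]
        picks = rng.sample(candidates, min(per_seq, len(candidates)))
        for bg_id in picks:
            # Sample without replacement: a background sequence is used once.
            matched[bg_id] = available.pop(bg_id)
    return matched


# Toy data, invented for illustration only.
regulated = {"reg1": "A" * 100, "reg2": "C" * 250}
background = {"bg1": "G" * 110, "bg2": "T" * 500, "bg3": "A" * 240, "bg4": "C" * 95}
matched = sample_length_matched(regulated, background, len_ext=20)
```

The resulting dictionary can be written back to a FASTA file and used as the background set. Sampling without replacement keeps one background sequence from being counted twice.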

mosea.py scan

Hello, I have been trying to run MoSEA/mosea.py scan on the test files and I get the error below.

Command:
python MoSEA/mosea.py scan --pfm --pfm_path MoSEA/test_files/motifs/pfms/ --fasta fafile_reg --out_dir fmopfm_outdir --count

Output:
scanning Motifs on file: fafile_reg
121/121[==================================================] 100%
Scanned 121 motif(s). Output saved in dir: fmopfm_outdir
('fafile_reg', 'MoSEA/test_files/motifs/pfms/', 'fmopfm_outdir')
ERROR:
Counting Motifs on file: fafile_reg
1/38[= ] 2%
Error in parsing: "['sequence name'] not in index"

I understand that the issue must be in parsing the PFM files, since the error comes from the count_motif function, but I don't understand why.
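
A message of the form "[...] not in index" usually means that a table was asked for a column it does not have. One possible cause, which is an assumption and not confirmed here, is that the scan output was written with a header that spells the column differently (for example `sequence_name` with an underscore instead of `sequence name`). A small check along these lines could be run on one of the files in the output directory. The function names and the list of required columns are hypothetical and are not taken from mosea.py.

```python
import csv
import io

# Columns a FIMO-style hit table is expected to provide, spelled the way the
# error message spells them. This list is an assumption for illustration.
REQUIRED = ["sequence name", "start", "stop", "strand", "score", "p-value"]


def normalize(name):
    """Lower-case a column name, drop a leading '#', and turn '_' into ' '."""
    return name.strip().lstrip("#").strip().lower().replace("_", " ")


def diagnose(handle, required=REQUIRED):
    """Report, for each required column, whether the header contains it."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    by_normalized = {normalize(col): col for col in header}
    report = {}
    for col in required:
        if col in header:
            report[col] = "present"
        elif col in by_normalized:
            # Same column, different spelling: a rename would fix the lookup.
            report[col] = "present as %r" % by_normalized[col]
        else:
            report[col] = "missing"
    return report


# A header written with underscores, used here only as an example input.
underscore_header = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\t"
    "strand\tscore\tp-value\tq-value\tmatched_sequence\n"
)
report = diagnose(io.StringIO(underscore_header))
print(report["sequence name"])
```

If a column shows up as "present as ..." the header spelling is the likely culprit; if it is "missing", the file itself is probably empty or truncated.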

error trying plot_script.R

Dear all,
Thank you for sharing the R script for visualizing MoSEA outputs.

While trying the script on the test file I got the following error:
Attaching package: ‘reshape2’

The following objects are masked from ‘package:reshape’:

colsplit, melt, recast

cancer RBP_id regulation location robust_zscore exp_log2fold

633 LUSC CELF1 positive down 2.195866 0.3741492
129 LUSC CELF1 positive up 2.194584 0.3741492
634 LUSC CELF2 positive down 2.195866 -2.9128859
130 LUSC CELF2 positive up 2.194584 -2.9128859
635 LUSC CELF3 positive down 2.195866 -0.3538104
131 LUSC CELF3 positive up 2.194584 -0.3538104
cancer RBP_id regulation location robust_zscore exp_log2fold
633 LUSC CELF1 positive down 2.19586557294325 0.3741492
129 LUSC CELF1 positive up 2.19458434304226 0.3741492
634 LUSC CELF2 positive down 2.19586557294325 -2.9128859
130 LUSC CELF2 positive up 2.19458434304226 -2.9128859
635 LUSC CELF3 positive down 2.19586557294325 -0.3538104
131 LUSC CELF3 positive up 2.19458434304226 -0.3538104
group
633 Upregulated
129 Upregulated
634 Downregulated
130 Downregulated
635 Downregulated
131 Downregulated
Error in -(robust_zscore) : invalid argument to unary operator
Calls: with -> with.default -> eval -> eval -> ifelse
Execution halted

I ran the following command, specifying a minimum z-score of 1.96:
Rscript plot_script.R ./test_file.tab ../plot_script 1 3 0.2 ./test_plot.png test_heatmap

Any help would be much appreciated!
Thank you in advance!

Best regards,
Jamal.

variable event support

Will variable SUPPA events be supported?

strand information for MoSEA not using SUPPA2 events

Dear team,
I hope this message finds you all well.

I have a question about running MoSEA with coordinates from a tool other than SUPPA2.
Should I provide the strand information when extracting the sequences? My BED file looks like the following:
chr18 63127035 63128759 BCL2_E1
chr18 63123346 63127034 BCL2_E2
chr18 63126835 63127035 BCL2_U1
chr18 63126834 63127034 BCL2_U2

After providing the above file, I got the sequences. When I scanned the sequences for occurrences of the RBP binding motif, I got something like:
#pattern name  sequence name  start  stop  strand  score    p-value   q-value  matched sequence
HNRNPL_00091   BCL2_E1        794    800   +       8.13415  0.000743           ACACAAT
HNRNPL_00091   BCL2_E1        1260   1266  +       10.0671  7.11e-05           ACACGAA
HNRNPL_00091   BCL2_E2        87     93    +       10.0549  0.000159           ACACAAA
HNRNPL_00091   BCL2_E2        1122   1128  +       9.96951  0.000413           ACACAAG
HNRNPL_00091   BCL2_E2        1426   1432  +       8.2378   0.000536           ACACCAC
HNRNPL_00091   BCL2_E2        1877   1883  +       7.56098  0.000996           ACACAGA

Looking at the strand column, the tool reports the matches on the positive strand, while my gene, BCL2, is on the reverse strand (Ensembl location: Chromosome 18: 63,123,346-63,320,128 reverse strand).
I'm using the hg38 genome assembly to extract the sequences.

I would be very grateful if you could help me fix this issue.
Thank you so much in advance!
Kind regards,
Jamal.
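
For anyone hitting the same question, here is a minimal sketch of the strand handling involved. It assumes that the strand column of the hit table is relative to the scanned sequence and not to the genome, that hit coordinates are 1-based and inclusive, and that BED intervals are 0-based and end-exclusive. Whether MoSEA's own extraction step honors a strand column is not confirmed here. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_bed6(bed4_line, strand, score="0"):
    """Append score and strand columns to a 4-column BED line."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    chrom, start, end, name = bed4_line.split()[:4]
    return "\t".join([chrom, start, end, name, score, strand])


def hit_to_genomic(bed_start, bed_end, hit_start, hit_stop, strand):
    """Map a hit on an extracted sequence to 1-based genomic coordinates.

    bed_start is 0-based and bed_end is exclusive, as in BED. hit_start and
    hit_stop are 1-based and inclusive positions on the extracted sequence.
    """
    if strand == "+":
        return bed_start + hit_start, bed_start + hit_stop
    # The sequence was reverse-complemented on extraction, so position 1 of
    # the sequence corresponds to the last base of the interval.
    return bed_end - hit_stop + 1, bed_end - hit_start + 1


# First interval and first hit from the post, treated as a reverse-strand gene.
bed6 = to_bed6("chr18 63127035 63128759 BCL2_E1", "-")
plus_coords = hit_to_genomic(63127035, 63128759, 794, 800, "+")
minus_coords = hit_to_genomic(63127035, 63128759, 794, 800, "-")
rc = reverse_complement("ACACAAT")
```

With a 6-column BED file, a strand-aware extraction returns the transcript-sense sequence for a reverse-strand gene, so a "+" in the hit table then means the sense strand of the gene. Without the strand column, the extracted sequence is the genomic forward strand, and the same motif would match the reverse complement of what is actually transcribed.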

ctrl_events_ids & reg_events_ids files

Dear Eduardo,

First of all, thank you so much for sharing your expertise with us!

I have a simple question about the two input files used in step A of the analysis.
These files are in the directory MoSEA/test_files/infile/:
control_events_chr22.ids
reg_events_chr22.ids

I know that they contain event IDs detected by SUPPA, but I just want to make sure I correctly understand the difference between them:
Does the file "control_events_chr22.ids" contain all SE events, regardless of whether they are significant?
Does the second file, "reg_events_chr22.ids", contain the regulated SE events found to be significant by the SUPPA pipeline?

If not, could you please clarify what these files are?
Thank you so much in advance!
Respectfully,
Jamal.
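
To make the two readings in the question concrete, here is a sketch of one possible way such ID lists could be derived from differential splicing results. It does not describe how the test files were actually produced. The thresholds, the function name, and the input layout of (event id, dPSI, p-value) tuples are all assumptions chosen for illustration.

```python
def split_events(rows, dpsi_min=0.1, pval_max=0.05, ctrl_dpsi_max=0.05):
    """Split events into regulated and control ID lists.

    rows: iterable of (event_id, dpsi, pval) tuples.
    Regulated: |dPSI| >= dpsi_min and p-value <= pval_max.
    Control:   |dPSI| <= ctrl_dpsi_max and p-value > pval_max.
    Events that fall in between are left out of both lists.
    All three thresholds are hypothetical defaults.
    """
    regulated, control = [], []
    for event_id, dpsi, pval in rows:
        if abs(dpsi) >= dpsi_min and pval <= pval_max:
            regulated.append(event_id)
        elif abs(dpsi) <= ctrl_dpsi_max and pval > pval_max:
            control.append(event_id)
    return regulated, control


# Invented values, for illustration only.
rows = [
    ("event_A", 0.35, 0.001),   # large, significant change
    ("event_B", -0.22, 0.01),   # large, significant change
    ("event_C", 0.01, 0.80),    # no change
    ("event_D", 0.08, 0.30),    # ambiguous, excluded from both lists
]
regulated_ids, control_ids = split_events(rows)
```

Under this reading the control list holds only events that clearly do not change, not every SE event. The other reading in the question, where the control list is all events, would correspond to dropping the control condition entirely.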
