brendelgroup / movrs
Motif set Reduction and Validation (MoVRs) - a workflow for the derivation of genome-wide DNA motif detection and vetting
License: GNU General Public License v2.0
We probably should create a good logfile that traces the consensus motif generation in this script. This would include untangling the final names - the current suffixes were meant for a bit of tracing during development, but this is inadequate for code distribution.
We also need to think about some edge cases of motif merging. In principle, we could have, for example, two 8-mer motifs that overlap by a 6-mer, with a 2-mer overhang on each end: do we report the 6-mer as the consensus motif (which may be found everywhere), the 10-mer (which may not be found anywhere), or something else? The code is flexible enough to go in any direction, but we need to review what it is doing currently and document the choices.
Coding is assigned to me, but issue formulation and example collection are tasks for the team.
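To make the 6-mer versus 10-mer question concrete, here is a minimal sketch of the two candidate answers for a pair of motifs. It assumes motifs are plain strings compared by exact match, which is a simplification (the MoVRs scripts work on MEME-format motifs, and this is not taken from that code). The function names and the `min_overlap` parameter are hypothetical.

```python
def longest_overlap(a, b, min_overlap=4):
    """Return (offset, core) for the longest exact overlap of b against a.

    offset is the start position of b relative to a (may be negative).
    Returns None if no overlap of at least min_overlap characters exists.
    """
    best = None
    for offset in range(-(len(b) - min_overlap), len(a) - min_overlap + 1):
        start_a = max(offset, 0)
        start_b = max(-offset, 0)
        length = min(len(a) - start_a, len(b) - start_b)
        if length < min_overlap:
            continue
        if a[start_a:start_a + length] == b[start_b:start_b + length]:
            if best is None or length > len(best[1]):
                best = (offset, a[start_a:start_a + length])
    return best


def merge_candidates(a, b, min_overlap=4):
    """Return (core, union): the shared core and the full merged span."""
    hit = longest_overlap(a, b, min_overlap)
    if hit is None:
        return None
    offset, core = hit
    left = min(0, offset)
    right = max(len(a), offset + len(b))
    chars = []
    for pos in range(left, right):
        # Inside the overlap both motifs agree, so taking a's base is safe.
        if 0 <= pos < len(a):
            chars.append(a[pos])
        else:
            chars.append(b[pos - offset])
    return core, "".join(chars)


if __name__ == "__main__":
    # Two 8-mers sharing a 6-mer, with a 2-mer overhang on each end.
    print(merge_candidates("ACGTACGT", "GTACGTTT"))
```

Whichever of the two strings is reported (or both, with occurrence counts), recording that choice in the logfile would also help with the tracing concern raised above.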
I think it may be more conservative to use "spaces" instead of "tabs" to indent lines in python code.
To wit, vi *pl seems to equate tab with 8 spaces, whereas vi *py sets tab to 4 spaces. Thus "tabs" appear definition-dependent and may cause problems when we edit the files with different editors.
No doubt this is all well known, and of course I know very little about python programming.
But it's worth following up, thus this "issue".
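As a possible follow-up, a small check like the one below could list the lines in a script whose indentation contains a tab, so they can be converted to spaces by hand. This is only a sketch: the function name is hypothetical, and it looks at raw leading whitespace, so it would also flag tab-indented lines inside multi-line strings. The standard library's `python -m tabnanny <file>` performs a related check for ambiguous indentation.

```python
import re
import sys

LEADING_WS = re.compile(r"^[ \t]*")


def find_tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = LEADING_WS.match(line).group(0)
        if "\t" in indent:
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Usage: python check_tabs.py script1.py script2.py ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for lineno in find_tab_indented_lines(handle.read()):
                print(f"{path}:{lineno}: tab in indentation")
```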
When running a test MoVRs job I noticed an error: AttributeError: module 'networkx' has no attribute 'connected_component_subgraphs'.
It appears this error is due to connected_component_subgraphs having been removed in networkx v. 2.4, after being deprecated in earlier releases.
A note about this causing an error in another package is here.
To address this I have installed networkx v. 2.3 (the last version where the definition is found) as follows:
pip install --user 'networkx==2.3'
I will run the same test with this version installed and report back.
If we don't want to specify a specific version of the networkx package, this solution may be appropriate for our case.
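For reference, one way to avoid pinning the version is to build the component subgraphs from `nx.connected_components`, which is the replacement suggested in the networkx deprecation note. The helper below is a sketch under that assumption; where exactly the MoVRs scripts call `connected_component_subgraphs` is not shown here, so the wrapper name and its `copy` parameter are illustrative only.

```python
import networkx as nx


def connected_component_subgraphs(G, copy=True):
    """Yield one subgraph per connected component of an undirected graph G.

    Works without nx.connected_component_subgraphs, which newer
    networkx releases no longer provide.
    """
    for nodes in nx.connected_components(G):
        sub = G.subgraph(nodes)
        # .copy() returns an independent graph instead of a read-only view.
        yield sub.copy() if copy else sub


if __name__ == "__main__":
    G = nx.Graph()
    G.add_edges_from([(1, 2), (2, 3), (4, 5)])
    for component in connected_component_subgraphs(G):
        print(sorted(component.nodes()))
```

Calls of the form `nx.connected_component_subgraphs(G)` could then be pointed at this helper, which should behave the same with networkx 2.3 and with later versions.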
In Step 7, we need to document occurrence statistics for the candidate MoVRs motifs in the validation sets. We derived motifs based on their presence in some form (!) in some minimal number of the training sets - but how do the derived consensus motifs fare in the validation sets? The idea behind the whole workflow is that at this point we have solid motifs that are indeed over-represented genome-wide. We should test that.
Something Yao can explore - it's a simple enough addition to the MoVRs script, but testing and summary statistics will take a bit of effort.
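As a starting point for discussion, the simplest version of such a statistic is the fraction of validation sequences that contain at least one match to a consensus motif. The sketch below assumes the consensus is an IUPAC string and scans the forward strand only with a regular expression; both are simplifications, the function names are hypothetical, and an actual implementation would more likely reuse the motif-scanning tools already called by the workflow.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in consensus.upper()))


def occurrence_fraction(consensus, sequences):
    """Fraction of sequences with at least one forward-strand match."""
    pattern = consensus_to_regex(consensus)
    seqs = [s.upper() for s in sequences]
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if pattern.search(s))
    return hits / len(seqs)


if __name__ == "__main__":
    validation = ["GGTATAAAGG", "CCCCCCCC"]
    print(occurrence_fraction("TATAWA", validation))
```

Comparing this fraction between each validation set and a matched background set would give a first indication of whether a motif is over-represented.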
I recently encountered an apparent error when running the sample test using the newest build of MoVRs:
MoVRs -a testpeakfile --genome ./TestGenome -o test_0906 --size [-60,40] -p 8 >& errTEST_0906
The error reads as follows:
python /home/rtraborn/development/MoVRs/scripts/MoVRs_ExtractMotif.py -i AllMotifs.meme -t 1e-10 -o FilteredMotifs.meme
  File "/home/rtraborn/development/MoVRs/scripts/MoVRs_ExtractMotif.py", line 27
    tmp_return = [x[0] for x in tmp_sorted]
    ^
TabError: inconsistent use of tabs and spaces in indentation
Fatal error running MoVRs_ExtractMotif.py. Please check.