wassermanlab / jaspar-ucsc-tracks
Code and data used to create the JASPAR UCSC Genome Browser tracks data hub
License: MIT License
Hi,
Thank you for developing JASPAR-UCSC-tracks.
I am very interested in using this program. However, when I try to install it by running the script install-pwmscan.sh with bash, the following error appears every time:
gcc -fPIC -O3 -std=gnu99 -W -Wall -o hashtable.o -c hashtable.c
make: gcc: Command not found
make: *** [Makefile:42: hashtable.o] Error 127
Could you help me?
Thank you!
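The make output above says that gcc itself was not found; exit status 127 is the shell's "command not found" code, so the build stops before anything is compiled. A minimal sketch of a preflight check that could be run before install-pwmscan.sh, assuming Python 3 is available. The helper name missing_tools is hypothetical and not part of the repository.

```python
import shutil


def missing_tools(tools=("gcc", "make")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Install these with the system package manager before building.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("gcc and make were found on PATH")
```

If gcc is reported as missing, installing a C compiler through the system package manager and re-running the install script is the usual next step.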
It would be great if you could also provide TFBS prediction tracks for mm10.
While comparing
http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2020/JASPAR2020_danRer11.bb
http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_danRer11.bb
I find that the 2022 bigbed files display the matrix identifiers while those for 2020 display the TF name, at least as displayed in IGV, as pictured below.
I expect this is in error.
Shouldn't the help text in this part of the fetch* script say "from jaspar2pfm.py" rather than "from jaspar2meme.py"?
parser.add_option("-p", action="store", type="string", dest="profiles_dir", help="Profiles directory (from jaspar2meme.py)", metavar="<profiles_dir>")
In "JASPAR UCSC tracks" I read that the scores in the bigBed files are p-values which have been
(scaled between 0-1000, where 0 corresponds to p-value = 1 and 1000 to p-value ≤ 10^-10)
Can I use R's rescale function to recover the p-values from the scores? For instance, by this calculation a score of 950 would come from a p-value of .05:
library(scales)
rescale(950,c(1,10**-10),c(0,1000))
.05
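The two endpoints quoted above (0 for p = 1, 1000 for p ≤ 10^-10) fit more than one mapping, so the recovered p-value depends on which one is assumed. The sketch below compares the linear interpolation that the rescale call performs with a logarithmic mapping in which the score is proportional to -log10(p). The logarithmic form is an assumption here, not something stated in the quoted text, and both function names are hypothetical.

```python
def linear_p_value(score, p_at_zero=1.0, p_at_max=1e-10, max_score=1000):
    """Linear interpolation between the two endpoints (what rescale does)."""
    return p_at_zero + (score / max_score) * (p_at_max - p_at_zero)


def log_p_value(score, min_log10_p=-10.0, max_score=1000):
    """ASSUMPTION: the score is proportional to -log10(p-value)."""
    return 10 ** (min_log10_p * score / max_score)


if __name__ == "__main__":
    for score in (0, 400, 950, 1000):
        print(score, linear_p_value(score), log_p_value(score))
```

Under the linear reading a score of 950 gives about 0.05; under the logarithmic reading it gives 10^-9.5, roughly 3.2e-10. Both readings agree at scores 0 and 1000, so the endpoints alone cannot distinguish them.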
In any case, I am having trouble interpreting the p-value.
In "PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix" I read:
The P-value of a PWM score x is defined as the probability that a random k-mer sequence of the length of the PWM has a binding score ≥ x given the base composition of the genome.
I would hope to find some measure of how well the sequence at a candidate (or putative) binding site identified by PWMScan matches the motif PWM. This does not seem to provide that. Or am I mistaken?
I would like the option of being more stringent in selecting candidates from this track by setting a threshold on the score. However, I am hesitant to adopt this approach, as thresholding on the scaled p-value could introduce a bias toward a subset of the universe of motifs. Is my reasoning suspect here?
edit: Perhaps another way of getting at this is to ask: do the motifs all have the same distribution of p-values? If they do, then thresholding across the board at any given p-value should remove an equal fraction of each motif's hits. Do you know if they do?
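One way to examine the question in the edit empirically is to measure, per motif, what fraction of hits survives a given score threshold. The sketch below assumes the bigBed file has already been converted to tab-separated BED text with the motif name in column 4 and the scaled score in column 5 (standard BED6 order). The function name fraction_retained and the example rows are made up for illustration.

```python
from collections import defaultdict


def fraction_retained(bed_lines, threshold):
    """Per motif name, fraction of hits whose score is >= threshold.

    Assumes BED6-style columns: chrom, start, end, name, score, strand.
    """
    totals = defaultdict(int)
    kept = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        name, score = fields[3], float(fields[4])
        totals[name] += 1
        if score >= threshold:
            kept[name] += 1
    return {name: kept[name] / totals[name] for name in totals}


if __name__ == "__main__":
    # Illustrative rows only, not real track data.
    rows = [
        "chr1\t100\t110\tMOTIF_A\t450\t+",
        "chr1\t200\t210\tMOTIF_A\t300\t-",
        "chr1\t300\t312\tMOTIF_B\t500\t+",
    ]
    print(fraction_retained(rows, threshold=400))
```

If the retained fractions differ widely between motifs at the same threshold, a single cutoff does remove unequal shares of each motif's hits.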
Hello! Thank you for always answering the issues.
I have had some problems with the installation of JASPAR, but I thought I had finally resolved them. However, when I tried to run the example command:
./scan-sequence.py genomes/sacCer3/sacCer3.fa profiles/ -o tracks/sacCer3/ --threads 4 --taxon fungi
I got the error ValueError: cannot convert float NaN to an integer.
My installation process was:
git clone https://github.com/wassermanlab/JASPAR-UCSC-tracks
bash install-pwmscan.sh
conda env create -f ./conda/environment.yml
mv pwmscan/* JASPAR-UCSC-tracks/ (I did this because there was an error about where matrix_scan is located)
run ./scan-sequence.py genomes/sacCer3/sacCer3.fa profiles/ -o tracks/sacCer3/ --threads 4 --taxon fungi
I'm using JASPAR version 1.0
Thank you so much!
The provided binaries of matrix_scan and matrix_probe do not work on any of my Linux systems (RH7 and Ubuntu 16.0x-20.0x).
Do you have a working Linux binary available?
Hi, thank you for the great resources.
I'm looking at hg38 http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/
and found some motifs that seem unrelated to Vertebrata, for example, MA2020 is from Arabidopsis thaliana. MA1879 is from Ciona intestinalis (these are just examples and there are more).
I see the option --taxon vertebrate for hg38 in your code (https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/scan-sequences.sh), so I assumed that only motifs linked to vertebrates are included.
Since other genome versions like hg19 or mm10 also include the non-vertebrate motif annotation, I wondered if it's intended or if there are some mistakes.
Thank you!
Nana