wassermanlab / jaspar-ucsc-tracks



Code and data used to create the JASPAR UCSC Genome Browser tracks data hub

License: MIT License

Python 67.52% Perl 22.67% Shell 9.82%

gene-regulation tfbs-discovery

jaspar-ucsc-tracks's People


maggishaggy tixii malcook whorton-j-a hengbingao

jaspar-ucsc-tracks's Issues

About installation

Hi,
Thank you for developing JASPAR-UCSC-tracks.

I am very interested in using this program. However, when I try to install it by running the script install-pwmscan.sh with bash, the following error appears every time:

gcc -fPIC -O3 -std=gnu99 -W -Wall -o hashtable.o -c hashtable.c
make: gcc: Command not found
make: *** [Makefile:42: hashtable.o] Error 127

Could you help me?

Thank you!
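Error 127 in the log above is the status a shell returns when a command cannot be found. This means `make` never started compiling, because gcc is either not installed or not on PATH. A minimal pre-flight check can confirm that the build tools exist before install-pwmscan.sh is run. `missing_tools` is a hypothetical helper, not part of the repository, and the tool list (`gcc`, `make`) is an assumption taken from the log rather than from the install script.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the names of tools that cannot be found on PATH.

    Hypothetical helper: the tool names passed in are assumptions,
    not a list taken from install-pwmscan.sh.
    """
    # shutil.which returns None when no executable of that name is on PATH
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # gcc and make are the two tools named in the build log above
    missing = missing_tools(["gcc", "make"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("gcc and make found on PATH")
```

On Debian or Ubuntu, gcc and make are usually installed together through the `build-essential` package.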

mm10 build?

It would be great if you could also provide TFBS prediction tracks for mm10.

2022 v 2020 danRer11 tracks changed from using TF name to matrix identifier possibly in error

While comparing

http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2020/JASPAR2020_danRer11.bb
http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_danRer11.bb

I find that the 2022 bigBed files display the matrix identifiers, while those for 2020 display the TF name, at least when viewed in IGV.

I suspect this is an error.
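One way to check which labelling a given track uses is to classify the name field of its features. The sketch below is minimal and makes two assumptions: the name field holds either a JASPAR CORE matrix ID of the form `MA0001.1` or a TF name, and the regular expression and helper names are illustrative only. Other JASPAR collections use other prefixes, which this pattern does not cover. Reading the names out of the .bb files would need a separate bigBed reader, which is not shown here.

```python
import re
from typing import Iterable

# Assumed pattern for JASPAR CORE matrix IDs, e.g. "MA0001.1"
MATRIX_ID = re.compile(r"MA\d{4}\.\d+")


def looks_like_matrix_id(name: str) -> bool:
    """True if a feature name looks like a CORE matrix ID rather than a TF name."""
    return MATRIX_ID.fullmatch(name) is not None


def fraction_matrix_ids(names: Iterable[str]) -> float:
    """Fraction of feature names that look like matrix IDs (0.0 if there are none)."""
    names = list(names)
    if not names:
        return 0.0
    return sum(looks_like_matrix_id(n) for n in names) / len(names)


if __name__ == "__main__":
    # Toy names for illustration, not read from the actual tracks
    print(fraction_matrix_ids(["MA0139.1", "MA0102.4"]))  # 1.0
    print(fraction_matrix_ids(["CTCF", "Arnt"]))  # 0.0
```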

fetch -p parameter

@oriolfornes

Shouldn't the help text in this part of the fetch* script say "from jaspar2pfm.py" rather than "from jaspar2meme.py"?

parser.add_option("-p", action="store", type="string", dest="profiles_dir", help="Profiles directory (from jaspar2meme.py)", metavar="<profiles_dir>")
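For reference, the quoted line is a standard `optparse` option. Because of `dest="profiles_dir"`, the value passed after `-p` ends up in `options.profiles_dir`. The minimal stand-alone sketch below uses the help text this issue proposes. Whether jaspar2pfm.py is the right script to name is the open question here, and the sketch does not settle it.

```python
from optparse import OptionParser


def build_parser() -> OptionParser:
    """Stand-alone parser mirroring the -p option quoted above."""
    parser = OptionParser()
    # Help text uses the wording proposed in this issue (an assumption)
    parser.add_option(
        "-p",
        action="store",
        type="string",
        dest="profiles_dir",
        help="Profiles directory (from jaspar2pfm.py)",
        metavar="<profiles_dir>",
    )
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args(["-p", "profiles/"])
    print(options.profiles_dir)  # profiles/
```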

p_value clarification needed on their scaling and interpretation

In the JASPAR UCSC tracks documentation I read that the scores in the bigBed files are p-values that have been scaled between 0 and 1000, where 0 corresponds to p-value = 1 and 1000 to p-value ≤ 10^-10.

Can I use R's rescale function to recover the p-values from the scores? For instance, a score of 950 would then come from a p-value of 0.05:

library(scales)
# linear rescaling of the score range 0-1000 onto p-values 1 to 1e-10
rescale(950, to = c(1, 1e-10), from = c(0, 1000))
# [1] 0.05

In any case, I am having trouble effectively interpreting the p_value.

In "PWMScan: a fast tool for scanning entire genomes with a position-specific weight matrix" I read:

The P-value of a PWM score x is defined as the probability that a random k-mer sequence of the length of the PWM has a binding score ≥ x given the base composition of the genome.
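The quoted definition can be made concrete. For a PWM of length k, the p-value of a score x is the total background probability of all k-mers whose score is at least x. The brute-force sketch below uses a toy two-position matrix and a uniform background, both assumptions for illustration. It enumerates all 4^k k-mers, so it is only practical for very short matrices, and it illustrates the definition rather than how PWMScan computes p-values.

```python
from itertools import product
from typing import Dict, List, Sequence

Pwm = List[Dict[str, float]]  # one {base: weight} dict per position


def kmer_score(pwm: Pwm, kmer: Sequence[str]) -> float:
    """Sum of per-position weights for a k-mer of the PWM's length."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def pwm_pvalue(pwm: Pwm, background: Dict[str, float], x: float) -> float:
    """P(score >= x) for a random k-mer drawn from the background composition."""
    total = 0.0
    for kmer in product("ACGT", repeat=len(pwm)):
        if kmer_score(pwm, kmer) >= x:
            # probability of this k-mer under an independent-base background
            prob = 1.0
            for base in kmer:
                prob *= background[base]
            total += prob
    return total


if __name__ == "__main__":
    # Toy matrix favouring "AC" and a uniform background (both assumptions)
    toy = [
        {"A": 2.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": 2.0, "G": 0.0, "T": 0.0},
    ]
    uniform = {b: 0.25 for b in "ACGT"}
    print(pwm_pvalue(toy, uniform, 4.0))  # 0.0625: only "AC" scores 4
    print(pwm_pvalue(toy, uniform, 2.0))  # 0.4375: 7 of 16 k-mers
```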

I was hoping to find some measure of how well the sequence at a candidate (putative) binding site identified by PWMScan matches the motif's PWM. The p-value does not seem to provide that, or am I mistaken?
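A common per-matrix measure of how closely a site matches a PWM is the relative score, (score - min) / (max - min). Here min and max are the lowest and highest scores the matrix can produce, so the best possible k-mer gets 1.0. This quantity is different from the p-value quoted above, and nothing here says the tracks store it. The sketch below uses a toy matrix.

```python
from typing import Dict, List, Sequence

Pwm = List[Dict[str, float]]  # one {base: weight} dict per position


def relative_score(pwm: Pwm, kmer: Sequence[str]) -> float:
    """(score - min) / (max - min) for a k-mer; assumes the matrix is not flat."""
    score = sum(col[b] for col, b in zip(pwm, kmer))
    lowest = sum(min(col.values()) for col in pwm)   # worst achievable score
    highest = sum(max(col.values()) for col in pwm)  # best achievable score
    return (score - lowest) / (highest - lowest)


if __name__ == "__main__":
    # Toy matrix favouring "AC" (an assumption for illustration)
    toy = [
        {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
        {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    ]
    print(relative_score(toy, "AC"))  # 1.0
    print(relative_score(toy, "AG"))  # 0.5
    print(relative_score(toy, "GG"))  # 0.0
```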

I would like to be more stringent in selecting candidates from this track by setting a threshold on the score. However, I am hesitant to adopt this approach, as thresholding on the scaled p-value could introduce a bias toward a subset of the universe of motifs. Is my reasoning suspect here?

Edit: Perhaps another way of getting at this is to ask: do the motifs all have the same distribution of p-values? If they do, then thresholding across the board at any given p-value should remove an equal fraction of each motif's hits. Do you know whether they do?
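The question in the edit can be checked empirically on the hits themselves. For each motif, compute the fraction of its hits whose score is at or above a chosen threshold, then compare the fractions across motifs. The minimal sketch below assumes the hits have already been read into (motif, score) pairs. The pairs shown are toy values, not data from the tracks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fraction_retained(hits: Iterable[Tuple[str, int]], threshold: int) -> Dict[str, float]:
    """Per-motif fraction of hits whose scaled score is >= threshold."""
    kept: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for motif, score in hits:
        total[motif] += 1
        if score >= threshold:
            kept[motif] += 1
    return {motif: kept[motif] / total[motif] for motif in total}


if __name__ == "__main__":
    # Toy (motif, score) pairs, not taken from the tracks
    toy_hits = [("MOTIF_A", 400), ("MOTIF_A", 700), ("MOTIF_B", 450), ("MOTIF_B", 480)]
    print(fraction_retained(toy_hits, 500))  # {'MOTIF_A': 0.5, 'MOTIF_B': 0.0}
```

Clearly unequal fractions at the same threshold would indicate that a single global cut-off affects motifs differently.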

Error cannot convert float NaN to integer

Hello! Thank you for always answering issues.

I had some problems installing JASPAR, but I thought I had finally solved them. However, when I tried to run the example command:
./scan-sequence.py genomes/sacCer3/sacCer3.fa profiles/ -o tracks/sacCer3/ --threads 4 --taxon fungi

I get the error ValueError: cannot convert float NaN to integer.
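Python raises this ValueError when `int()` is given a NaN. So somewhere, a value expected to be numeric (a score or p-value, for instance) is NaN before it is converted. The report does not show where in scan-sequence.py this happens. The sketch below only reproduces the error and shows one hypothetical guard, `to_int_or_none`, which returns None for NaN instead of crashing.

```python
import math
from typing import Optional


def to_int_or_none(value: float) -> Optional[int]:
    """Convert to int, returning None for NaN instead of raising ValueError."""
    if math.isnan(value):
        return None
    return int(value)


if __name__ == "__main__":
    try:
        int(float("nan"))
    except ValueError as err:
        print(err)  # cannot convert float NaN to integer
    print(to_int_or_none(float("nan")))  # None
    print(to_int_or_none(950.7))  # 950
```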

My process of installation was:
git clone https://github.com/wassermanlab/JASPAR-UCSC-tracks
bash install-pwmscan.sh
conda env create -f ./conda/environment.yml
mv pwmscan/* JASPAR-UCSC-tracks/ (I did this because there was an error about where matrix_scan is located)

Run ./scan-sequence.py genomes/sacCer3/sacCer3.fa profiles/ -o tracks/sacCer3/ --threads 4 --taxon fungi

I'm using JASPAR version 1.0

Thank you so much!

Provided binaries do not work

The provided binaries of matrix_scan and matrix_probe do not work on any of my Linux systems (RH7 and Ubuntu 16.0x to 20.0x).

Do you have a working Linux binary available?
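One quick thing to rule out when a prebuilt binary refuses to run is an architecture mismatch. Linux executables are ELF files, and their header records the word size and the target machine, so a short script can read these fields. The sketch below is minimal, and `elf_summary` is a hypothetical helper. It does not detect other possible causes, such as missing shared libraries or an incompatible glibc version.

```python
import struct
import sys
from typing import Tuple

# A few e_machine values from the ELF specification
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_summary(header: bytes) -> Tuple[str, str]:
    """Return (word size, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_CLASS (byte 4): 1 = 32-bit, 2 = 64-bit
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    # EI_DATA (byte 5): 1 = little-endian, otherwise big-endian
    fmt = "<H" if header[5] == 1 else ">H"
    # e_machine is a 2-byte field at offset 18
    (machine,) = struct.unpack_from(fmt, header, 18)
    return bits, MACHINES.get(machine, f"machine {machine}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python elf_check.py path/to/matrix_scan
    with open(sys.argv[1], "rb") as fh:
        print(elf_summary(fh.read(20)))
```

You can then compare the reported machine with the output of `uname -m` on the target system.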

mismatched taxon

Hi, thank you for the great resources.

I'm looking at the hg38 tracks at http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/ and found some motifs that seem unrelated to Vertebrata. For example, MA2020 is from Arabidopsis thaliana and MA1879 is from Ciona intestinalis (these are just examples; there are more).

I see the option --taxon vertebrate for hg38 in your code (https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/scan-sequences.sh), so I assumed that only motifs linked to vertebrates are included.

Since other genome assemblies such as hg19 and mm10 also include non-vertebrate motif annotations, I wondered whether this is intended or a mistake.

Thank you!
Nana
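You can check whether a track mixes taxa by comparing each matrix's taxon with the requested one. The minimal sketch below assumes a mapping from matrix ID to JASPAR taxon group has already been obtained. The toy mapping restates this issue's two examples with assumed taxon labels, plus one vertebrate entry for contrast. `off_taxon` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def off_taxon(matrix_ids: Iterable[str], taxon_of: Dict[str, str], wanted: str) -> List[str]:
    """Matrix IDs whose taxon group differs from `wanted` (unknown IDs included)."""
    return [m for m in matrix_ids if taxon_of.get(m) != wanted]


if __name__ == "__main__":
    # Toy mapping with assumed taxon labels, for illustration only
    taxon_of = {"MA2020": "plants", "MA1879": "urochordates", "MA0139": "vertebrates"}
    print(off_taxon(["MA2020", "MA1879", "MA0139"], taxon_of, "vertebrates"))
    # ['MA2020', 'MA1879']
```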
