slowkow / homerkit

Read HOMER motif analysis output in R.

License: GNU General Public License v2.0

bioinformatics transcription-factors enrichment rstats

homerkit

homerkit is an R package that implements functions to read HOMER output files.

Installation

How to install HOMER: http://homer.salk.edu/homer/download.html

install.packages("devtools")
devtools::install_github("slowkow/homerkit")

Usage

1. Run HOMER findMotifs.pl on your target genes

head -n3 target_genes.txt
ENSG00000003989
ENSG00000017427
ENSG00000028277

gene_file="target_genes.txt"
bg_file="background_genes.txt"
out_dir="output"

mkdir -p "$out_dir"
# Find motifs that are enriched in the promoters of your target genes.
# Quoting the variables keeps paths with spaces intact.
findMotifs.pl "$gene_file" human "$out_dir" \
  -bg "$bg_file" &> "${out_dir}/run_homer.log"

2. Run HOMER annotatePeaks.pl on every motif

# Find the target genes for each motif.
for motif in "$out_dir"/*/*.motif; do
  # Skip motifs that already have an annotation table.
  if [[ ! -f "${motif}.tsv" ]]
  then
    annotatePeaks.pl tss hg38 \
      -size -500,250 -m "$motif" -list "$gene_file" \
      1> "${motif}.tsv" 2> "${motif}.tsv.log"
  fi
done

3. Read all of the HOMER output files with homerkit

# install.packages("devtools")
# devtools::install_github("slowkow/homerkit")

library(homerkit)
h <- read_homer_output("output")

Novel motif target genes:

head(split(h$novel_motif_peaks$gene_name, h$novel_motif_peaks$motif), 3)

$motif1
[1] "RERG"  "CSF3"  "CXCL6" "CXCL1" "CXCL5" "CXCL3" "CXCL2" "CSF2"  "ELF3" 

$motif10
[1] "IER3"  "MT1X"  "MMP3"  "CCL20"

$motif11
 [1] "IL6"    "CCL7"   "CXCL6"  "CXCL1"  "CXCL5"  "CXCL3"  "CXCL2"  "GPR183"
 [9] "NR4A2"  "PLD1"

Possible transcription factors that match motif1:

subset(h$novel_motif_tfs, motif == "motif1")

# A tibble: 10 × 8
                                                 match_name match_rank offset
                                                      <chr>      <dbl>  <dbl>
1  NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer          1      2
2                                      RELA/MA0107.1/Jaspar          2      2
3                                 MF0003.1_REL_class/Jaspar          3      2
4        NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer          4      1
5                                       REL/MA0101.1/Jaspar          5      2
6                                     NFKB2/MA0778.1/Jaspar          6      1
7                                    PB0012.1_Elf3_1/Jaspar          7      4
8                                    NFATC1/MA0624.1/Jaspar          8      5
9                                    NFATC3/MA0625.1/Jaspar          9      5
10                                    NFKB1/MA0105.4/Jaspar         10      1
# ... with 5 more variables: orientation <chr>, score <dbl>, motif <chr>,
#   alignment1 <chr>, alignment2 <chr>

Known motif target genes:

head(split(h$known_motif_peaks$gene_name, h$known_motif_peaks$motif), 3)

$known1
 [1] "MAP3K8" "CFB"    "CSF3"   "CXCL8"  "CXCL6"  "CXCL1"  "CXCL5"  "CXCL3" 
 [9] "CXCL2"  "NR4A2"  "ELF3"   "PID1"  

$known10
 [1] "RERG"            "SPECC1L-ADORA2A" "IER3"            "CFB"            
 [5] "SLC11A2"         "NR4A1"           "IL23A"           "MT1L"           
 [9] "CXCL8"           "CXCL1"           "CXCL3"           "CXCL2"          
[13] "FLVCR2"          "STEAP1"          "SERPINA9"        "AVPI1"          
[17] "GPR183"          "MMP3"            "PTGS2"           "ELF3"           
[21] "HSD11B1"         "CCL20"          

$known11
 [1] "CSF3"     "PIM2"     "MT1X"     "GAB2"     "SERPINA9" "IGF1"    
 [7] "IL1B"     "TNFAIP6"  "STAT4"    "ELF3"     "ACKR3"

Known transcription factors:

head(unique(h$known_motif_peaks[,c("motif", "best_guess")]), 3)

# A tibble: 3 × 2
   motif                                               best_guess
   <chr>                                                    <chr>
1 known1 NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer
2 known2       NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer
3 known3                             TATA-Box(TBP)/Promoter/Homer

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.

Related work

https://github.com/MalteThodberg/homeR

homerkit's Issues

Understanding the R object created by read_homer_output

Hi,

First of all, thanks for this HOMER implementation; it's great work. I was wondering about the difference between the two tables called:

known motifs Peaks
known motifs table

I mean that I don't understand the difference between the "best_guess" of the known motifs peaks and the "motif_name" of the known motifs table. Could you explain these two outputs?
If I'm not clear, don't hesitate to ask me for more information.

Thanks in advance.

Issue with read_homer_output()

Hi,

I'm having an issue where I get the following error when running read_homer_output():

Found 28 motif.tsv[.gz] files in pDecrease_output/homerResults
Reading 28 of them...
Found 28 motif*.info.html files in pDecrease_output/homerResults
Reading 28 of them...
The following named parsers don't match the column names:
# of Target Sequences with Motif(of 60), # of Background Sequences with Motif(of 490)

Not sure what's going on - it looks like those column names should be commented out? Thanks!
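
One possible workaround, sketched in base R: the two column names in the message embed the sequence counts of this particular run (60 target and 490 background sequences), so the header text can differ from run to run, which may be why a parser keyed on fixed column names does not match. The sketch below strips the "(of N)" suffix after reading the table. The helper normalize_homer_names and the toy table are hypothetical; they are not part of homerkit, and this is not necessarily how the package handles the problem.

```r
# Hypothetical helper (not part of homerkit): drop the run-specific
# "(of N)" suffix so column names are identical across HOMER runs.
normalize_homer_names <- function(x) {
  trimws(sub("\\(of [0-9]+\\)", "", x))
}

# Toy table; the header is modelled on the error message above and the
# single data row is made up for illustration.
txt <- paste0(
  "Motif Name\t",
  "# of Target Sequences with Motif(of 60)\t",
  "# of Background Sequences with Motif(of 490)\n",
  "example_motif\t12\t30\n"
)

# check.names = FALSE keeps the original header text, including "#".
d <- read.delim(text = txt, check.names = FALSE)
names(d) <- normalize_homer_names(names(d))
print(names(d))
```

After this step the columns can be addressed by the same names regardless of how many sequences a run contained.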

read_homer_output not reading files

I am passing the path to the folder relative to the current working directory, but read_homer_output() is not reading any motif .tsv files, although such files exist. Should the folder path be passed differently?

h <- read_homer_output("Question2_Homer/q2_group1_translated_genes_MPvsLSK-f7vstot_both_up_g")
h
list()
names(h)
NULL
list.files("Question2_Homer/q2_group1_translated_genes_MPvsLSK-f7vstot_both_up_g/homerResults/", pattern = ".motif")
[1] "motif1.motif" "motif1.motif.tsv" "motif10.motif" "motif10.motif.tsv"
[5] "motif10.similar1.motif" "motif10.similar1.motif.tsv" "motif10RV.motif" "motif10RV.motif.tsv"
[9] "motif11.motif" "motif11.motif.tsv" "motif11RV.motif" "motif11RV.motif.tsv"
[13] "motif12.motif" "motif12.motif.tsv" "motif12RV.motif" "motif12RV.motif.tsv"
[17] "motif13.motif" "motif13.motif.tsv" "motif13RV.motif" "motif13RV.motif.tsv"
[21] "motif14.motif" "motif14.motif.tsv" "motif14RV.motif" "motif14RV.motif.tsv"
[25] "motif15.motif" "motif15.motif.tsv" "motif15RV.motif" "motif15RV.motif.tsv"
[29] "motif1RV.motif" "motif1RV.motif.tsv" "motif2.motif" "motif2.motif.tsv"
[33] "motif2RV.motif" "motif2RV.motif.tsv" "motif3.motif" "motif3.motif.tsv"
[37] "motif3.similar1.motif" "motif3.similar1.motif.tsv" "motif3RV.motif" "motif3RV.motif.tsv"
[41] "motif4.motif" "motif4.motif.tsv" "motif4RV.motif" "motif4RV.motif.tsv"
[45] "motif5.motif" "motif5.motif.tsv" "motif5RV.motif" "motif5RV.motif.tsv"
[49] "motif6.motif" "motif6.motif.tsv" "motif6.similar1.motif" "motif6.similar1.motif.tsv"
[53] "motif6RV.motif" "motif6RV.motif.tsv" "motif7.motif" "motif7.motif.tsv"
[57] "motif7RV.motif" "motif7RV.motif.tsv" "motif8.motif" "motif8.motif.tsv"
[61] "motif8RV.motif" "motif8RV.motif.tsv" "motif9.motif" "motif9.motif.tsv"
[65] "motif9RV.motif" "motif9RV.motif.tsv"
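
A quick diagnostic, sketched in base R, is to count the files with anchored regular expressions. The pattern argument of list.files() is a regular expression, so ".motif" matches any character followed by "motif" anywhere in the name, which is why the listing above mixes .motif and .motif.tsv files. The helper count_motif_files is hypothetical and does not reproduce homerkit's internal file search; the throwaway directory only reuses a few file names from the listing.

```r
# Hypothetical diagnostic (not part of homerkit): count motif files and
# their annotatePeaks.pl tables in one directory.
count_motif_files <- function(dir) {
  c(
    motif = length(list.files(dir, pattern = "\\.motif$")),
    tsv   = length(list.files(dir, pattern = "\\.motif\\.tsv(\\.gz)?$"))
  )
}

# Throwaway directory with file names taken from the listing above.
d <- file.path(tempfile("homer_demo_"), "homerResults")
dir.create(d, recursive = TRUE)
invisible(file.create(file.path(d, c(
  "motif1.motif", "motif1.motif.tsv",
  "motif1RV.motif", "motif1RV.motif.tsv"
))))

counts <- count_motif_files(d)
print(counts)
```

If the tsv count is zero for the directory passed to read_homer_output(), checking the path relative to getwd() is a reasonable next step.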

[Question] Parsing Motif Names

First off, thanks for this! Your packages are always really helpful and simple to get working.

This isn't so much a question about your package as about its downstream usage. Do you have a preferred way of parsing the motif_name assigned by HOMER to convert it back to a gene name? The leading gene name matches a conventional name only inconsistently. If you just regex-parse the first chunk of the string, you often end up with things like "Stat3+il21" or "AP-2alpha", which aren't easily batch-converted to standard Ensembl gene symbols.

I'd like to check to see if any of the enriched motifs assigned to specific factors correspond with any expression change in those factors in my corresponding RNA-seq dataset, which would require intersection of the output tables. Do you know if there exists a simpler way to achieve this? Thanks!
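
One way to approach this, sketched in base R under several assumptions: take the text before the first "(" or "/" in each HOMER motif name as a candidate label, translate non-standard labels through a small hand-curated alias table, and then merge the result with the expression table. The motif names are copied from the README output; the alias entries, the toy RNA-seq table, and its column names are illustrative assumptions. HOMER labels are free text, so some manual curation is probably unavoidable.

```r
# Motif names copied from the README output above.
motif_name <- c(
  "NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer",
  "NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer",
  "TATA-Box(TBP)/Promoter/Homer"
)

# Keep the text before the first "(" or "/" as the candidate label.
label <- sub("[(/].*$", "", motif_name)

# Hand-curated aliases (illustrative assumption; extend as needed).
alias <- c(
  "NFkB-p65-Rel" = "RELA",
  "NFkB-p65"     = "RELA",
  "TATA-Box"     = "TBP",
  "AP-2alpha"    = "TFAP2A",
  "Stat3+il21"   = "STAT3"
)

# Use the alias when one exists, otherwise the upper-cased label.
symbol <- ifelse(label %in% names(alias), alias[label], toupper(label))
motifs <- data.frame(motif_name = motif_name, symbol = unname(symbol))

# Toy RNA-seq results; the values are made up for illustration only.
rnaseq <- data.frame(symbol = c("RELA", "TBP"), log2fc = c(1.5, -0.2))

# Intersect motifs with expression changes by gene symbol.
merged <- merge(motifs, rnaseq, by = "symbol")
print(merged)
```

Labels that fall through to toupper() without matching any row of the expression table are the ones worth adding to the alias table by hand.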