homerkit

homerkit is an R package that implements functions to read HOMER output files.

Installation

To install HOMER, follow the instructions at http://homer.salk.edu/homer/download.html. Then install homerkit from GitHub:

install.packages("devtools")
devtools::install_github("slowkow/homerkit")

Usage

1. Run HOMER findMotifs.pl on your target genes

head -n3 target_genes.txt
ENSG00000003989
ENSG00000017427
ENSG00000028277
gene_file="target_genes.txt"
bg_file="background_genes.txt"
out_dir="output"

mkdir -p $out_dir
# Find motifs that are enriched in the promoters of your target genes.
findMotifs.pl $gene_file human $out_dir \
  -bg $bg_file &> ${out_dir}/run_homer.log

2. Run HOMER annotatePeaks.pl on every motif

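# Find the target genes for each motif.
# Skip motifs that already have a non-empty .tsv so the loop can be re-run;
# an aborted run can leave an empty .tsv, which -s treats as missing.
for motif in "$out_dir"/*/*.motif; do
  if [[ ! -s "${motif}.tsv" ]]
  then
    annotatePeaks.pl tss hg38 \
      -size -500,250 -m "$motif" -list "$gene_file" \
      1> "${motif}.tsv" 2> "${motif}.tsv.log"
  fi
done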
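A failed annotatePeaks.pl run can leave a missing or empty `.tsv` file, so before reading the results it may help to list the motifs that lack usable output. This is a minimal sketch that assumes the directory layout from step 1 (`output/<subdirectory>/*.motif`); `find_unannotated_motifs` is a hypothetical helper, not part of HOMER or homerkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.motif file under DIR whose
# annotatePeaks.pl output (${motif}.tsv) is missing or empty.
find_unannotated_motifs() {
  local dir="$1"
  local motif
  for motif in "$dir"/*/*.motif; do
    # Skip the literal pattern if the glob matched nothing.
    [[ -e "$motif" ]] || continue
    # -s is true only for files that exist and are non-empty.
    if [[ ! -s "${motif}.tsv" ]]; then
      echo "$motif"
    fi
  done
}

# Usage: list motifs that still need annotatePeaks.pl.
# find_unannotated_motifs output
```

Running `find_unannotated_motifs output` prints nothing when every motif has a non-empty `.tsv`.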

3. Read all of the HOMER output files with homerkit

# install.packages("devtools")
# devtools::install_github("slowkow/homerkit")

library(homerkit)
h <- read_homer_output("output")

Novel motif target genes:

head(split(h$novel_motif_peaks$gene_name, h$novel_motif_peaks$motif), 3)
$motif1
[1] "RERG"  "CSF3"  "CXCL6" "CXCL1" "CXCL5" "CXCL3" "CXCL2" "CSF2"  "ELF3" 

$motif10
[1] "IER3"  "MT1X"  "MMP3"  "CCL20"

$motif11
 [1] "IL6"    "CCL7"   "CXCL6"  "CXCL1"  "CXCL5"  "CXCL3"  "CXCL2"  "GPR183"
 [9] "NR4A2"  "PLD1" 

Possible transcription factors that match motif1:

subset(h$novel_motif_tfs, motif == "motif1")
# A tibble: 10 × 8
                                                 match_name match_rank offset
                                                      <chr>      <dbl>  <dbl>
1  NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer          1      2
2                                      RELA/MA0107.1/Jaspar          2      2
3                                 MF0003.1_REL_class/Jaspar          3      2
4        NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer          4      1
5                                       REL/MA0101.1/Jaspar          5      2
6                                     NFKB2/MA0778.1/Jaspar          6      1
7                                    PB0012.1_Elf3_1/Jaspar          7      4
8                                    NFATC1/MA0624.1/Jaspar          8      5
9                                    NFATC3/MA0625.1/Jaspar          9      5
10                                    NFKB1/MA0105.4/Jaspar         10      1
# ... with 5 more variables: orientation <chr>, score <dbl>, motif <chr>,
#   alignment1 <chr>, alignment2 <chr>

Known motif target genes:

head(split(h$known_motif_peaks$gene_name, h$known_motif_peaks$motif), 3)
$known1
 [1] "MAP3K8" "CFB"    "CSF3"   "CXCL8"  "CXCL6"  "CXCL1"  "CXCL5"  "CXCL3" 
 [9] "CXCL2"  "NR4A2"  "ELF3"   "PID1"  

$known10
 [1] "RERG"            "SPECC1L-ADORA2A" "IER3"            "CFB"            
 [5] "SLC11A2"         "NR4A1"           "IL23A"           "MT1L"           
 [9] "CXCL8"           "CXCL1"           "CXCL3"           "CXCL2"          
[13] "FLVCR2"          "STEAP1"          "SERPINA9"        "AVPI1"          
[17] "GPR183"          "MMP3"            "PTGS2"           "ELF3"           
[21] "HSD11B1"         "CCL20"          

$known11
 [1] "CSF3"     "PIM2"     "MT1X"     "GAB2"     "SERPINA9" "IGF1"    
 [7] "IL1B"     "TNFAIP6"  "STAT4"    "ELF3"     "ACKR3"

Known transcription factors:

head(unique(h$known_motif_peaks[,c("motif", "best_guess")]), 3)
# A tibble: 3 × 2
   motif                                               best_guess
   <chr>                                                    <chr>
1 known1 NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer
2 known2       NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer
3 known3                             TATA-Box(TBP)/Promoter/Homer
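To compare these values with the names recorded in the raw HOMER files without going through R, a small shell sketch can read the header line of each known motif file. It assumes HOMER's motif file format, in which the first line starts with `>` and its second tab-separated field is the motif name, and it assumes the known motifs live in `output/knownResults/`; `list_known_motif_names` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "motif<TAB>name" for each known motif file,
# assuming the header line looks like ">CONSENSUS<TAB>NAME<TAB>THRESHOLD...".
list_known_motif_names() {
  local dir="$1"
  local motif
  for motif in "$dir"/knownResults/*.motif; do
    # Skip the literal pattern if the glob matched nothing.
    [[ -e "$motif" ]] || continue
    # Keep the second tab-separated field of the first line.
    printf '%s\t%s\n' "$(basename "$motif" .motif)" "$(head -n 1 "$motif" | cut -f 2)"
  done
}

# Usage:
# list_known_motif_names output
```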

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository.

Related work

Contributors

slowkow

homerkit's Issues

Understanding the R object created by read_homer_output

Hi,

First of all, thanks for this HOMER implementation; it's great work. I was wondering about the differences between the two tables called:

  • known motifs Peaks
  • known motifs table

I mean that I don't understand the difference between the "best_guess" column of the known motifs peaks and the "motif_name" column of the known motifs table. Can you explain these two outputs?
If I'm not clear, don't hesitate to ask me for more information.

Thanks in advance.

Issue with read_homer_output()

Hi,

I'm having an issue where I get the following error when running read_homer_output():

Found 28 motif.tsv[.gz] files in pDecrease_output/homerResults
Reading 28 of them...
Found 28 motif*.info.html files in pDecrease_output/homerResults
Reading 28 of them...
The following named parsers don't match the column names:
# of Target Sequences with Motif(of 60), # of Background Sequences with Motif(of 490)

Not sure what's going on - it looks like those column names should be commented out? Thanks!
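Both names in the message embed run-specific counts (`(of 60)` and `(of 490)`). One hedged way to investigate is to print the header of the results table being parsed and compare it with those names. The sketch assumes the table is HOMER's tab-separated `knownResults.txt`; `homer_header_columns` is a hypothetical helper, and stripping the `(of N)` suffix only makes headers from different runs easier to compare. It is not a confirmed fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the column names from the first line of a
# tab-separated HOMER results table, one per line, with any run-specific
# "(of N)" count suffix removed so headers from different runs compare equal.
homer_header_columns() {
  local file="$1"
  head -n 1 "$file" \
    | tr '\t' '\n' \
    | sed -E 's/ ?[(]of [0-9]+[)]$//'
}

# Usage (path assumed from the log above):
# homer_header_columns pDecrease_output/knownResults.txt
```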

read_homer_output not reading files

I am passing the path to the folder relative to the current working directory, but read_homer_output() does not read any motif .tsv files, although such files exist. Should the folder path be passed differently?

h <- read_homer_output("Question2_Homer/q2_group1_translated_genes_MPvsLSK-f7vstot_both_up_g")
h
list()
names(h)
NULL
list.files("Question2_Homer/q2_group1_translated_genes_MPvsLSK-f7vstot_both_up_g/homerResults/", pattern = ".motif")
[1] "motif1.motif" "motif1.motif.tsv" "motif10.motif" "motif10.motif.tsv"
[5] "motif10.similar1.motif" "motif10.similar1.motif.tsv" "motif10RV.motif" "motif10RV.motif.tsv"
[9] "motif11.motif" "motif11.motif.tsv" "motif11RV.motif" "motif11RV.motif.tsv"
[13] "motif12.motif" "motif12.motif.tsv" "motif12RV.motif" "motif12RV.motif.tsv"
[17] "motif13.motif" "motif13.motif.tsv" "motif13RV.motif" "motif13RV.motif.tsv"
[21] "motif14.motif" "motif14.motif.tsv" "motif14RV.motif" "motif14RV.motif.tsv"
[25] "motif15.motif" "motif15.motif.tsv" "motif15RV.motif" "motif15RV.motif.tsv"
[29] "motif1RV.motif" "motif1RV.motif.tsv" "motif2.motif" "motif2.motif.tsv"
[33] "motif2RV.motif" "motif2RV.motif.tsv" "motif3.motif" "motif3.motif.tsv"
[37] "motif3.similar1.motif" "motif3.similar1.motif.tsv" "motif3RV.motif" "motif3RV.motif.tsv"
[41] "motif4.motif" "motif4.motif.tsv" "motif4RV.motif" "motif4RV.motif.tsv"
[45] "motif5.motif" "motif5.motif.tsv" "motif5RV.motif" "motif5RV.motif.tsv"
[49] "motif6.motif" "motif6.motif.tsv" "motif6.similar1.motif" "motif6.similar1.motif.tsv"
[53] "motif6RV.motif" "motif6RV.motif.tsv" "motif7.motif" "motif7.motif.tsv"
[57] "motif7RV.motif" "motif7RV.motif.tsv" "motif8.motif" "motif8.motif.tsv"
[61] "motif8RV.motif" "motif8RV.motif.tsv" "motif9.motif" "motif9.motif.tsv"
[65] "motif9RV.motif" "motif9RV.motif.tsv"

[Question] Parsing Motif Names

First off, thanks for this! Your packages are always really helpful and simple to get working.

This isn't so much a question about your package as about its downstream usage. Do you have a preferred way to parse the motif_name assigned by HOMER back into a gene name? The leading gene name often doesn't match a conventional gene symbol. If you just regex-parse the first chunk of the string, you often end up with things like "Stat3+il21" or "AP-2alpha", which aren't easy to batch-convert to standard Ensembl gene symbols.

I'd like to check whether any of the enriched motifs assigned to specific factors correspond to expression changes in those factors in my RNA-seq dataset, which would require intersecting the output tables. Do you know of a simpler way to achieve this? Thanks!
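One rough starting point is a text heuristic that reduces each HOMER motif name to a candidate symbol. The sketch below is an assumption, not something HOMER or homerkit provides: `motif_name_to_symbol` is a hypothetical helper that keeps the text before the first `/`, drops any `(...)` annotation and anything after a `+`, strips a JASPAR-style matrix prefix such as `PB0012.1_` and a trailing `_1`, and upper-cases the result.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: turn HOMER motif names (one per line on stdin)
# into candidate gene symbols (one per line on stdout).
motif_name_to_symbol() {
  # Rules, in order:
  #   1. drop everything from the first "/"
  #   2. drop a "(...)" annotation and what follows it
  #   3. drop a "+partner" suffix such as "+il21"
  #   4. drop a JASPAR-style matrix ID prefix such as "PB0012.1_"
  #   5. drop a trailing "_<number>" such as "_1"
  sed -E \
    -e 's#/.*##' \
    -e 's/[(].*//' \
    -e 's/[+].*//' \
    -e 's/^[A-Z]{2}[0-9]{4}[.][0-9]+_//' \
    -e 's/_[0-9]+$//' \
    | tr '[:lower:]' '[:upper:]'
}

# Usage:
# printf '%s\n' 'RELA/MA0107.1/Jaspar' 'Stat3+il21' | motif_name_to_symbol
```

The output is only a candidate list: names such as "NFkB-p65-Rel" or "AP-2alpha" still come out as non-standard strings (AP-2alpha's official symbol is TFAP2A), so a hand-curated lookup table would still be needed before intersecting with RNA-seq results.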
