johnmyleswhite / ML_for_Hackers
Code accompanying the book "Machine Learning for Hackers"
Home Page: http://shop.oreilly.com/product/0636920018483.do
In code snippet #20 in chapter06.R, glmnet() wants a matrix with 2 or more columns but throws an error because the matrix has only one column. I thought about wrangling it into a two-column matrix, but that might not be in line with the original intent of the snippet.
How can I get the data used in this book?
Since the Google Social Graph API is no longer available, can you recommend any other currently active sources of data for practicing the network-analysis methods discussed in Chapter 11?
Thanks.
Google's social graph API no longer exists. I love this book and have used it as a template to start many other projects.
I need to make social graphs now and having skimmed this chapter I'm not sure if it's usable given the decommissioning of Google's API, as well as the changes to Twitter.
I'm currently trying to get started from some other social-graph tutorials online. Could someone more familiar with this chapter let me know whether it is still worth reading in full and adapting the code around these changes, or whether that would be a waste of time?
Thanks!
library(tm)
library(ggplot2)
#tm is the text mining package of R
#ggplot is for visualization
#there are 2 sets of files for each type of mail and one will be used for training while other will be for testing
spam.path<-"data/spam/"
spam2.path<-"data/spam_2/"
easyham.path<-"data/easy_ham/"
easyham2.path<-"data/easy_ham_2/"
hardham.path<-"data/hard_ham/"
hardham2.path<-"data/hard_ham_2/"
get.msg<-function(path){
print(path)
connection<-file(path,open="rt", encoding="Latin1")
text<-readLines(connection)
#the message begins after a full line break
t<-which(text=="")[1]+1
print(length(text))
print(t)
msg<-text[seq(t, length(text))]
#print(msg)
close(connection)
return (paste(msg, collapse="\n"))
}
#tdm=term document matrix
get.tdm<-function(doc.vec){
doc.corpus<-Corpus(VectorSource(doc.vec))
control<-list(stopwords=TRUE, removePunctuation=TRUE, removeNumbers=TRUE, minDocFreq=2)
doc.dtm<-TermDocumentMatrix(doc.corpus, control)
return (doc.dtm)
}
# create a vector of emails
#use apply function
spam.docs<-dir(spam.path)
#this returns a list of file names in the directory
spam.docs<-spam.docs[seq(1,length(spam.docs)-1)]
#spam.docs<-spam.docs[which(spam.docs!="")]
#cmds file is a UNIX file which we don't need
#spam.docs<-spam.docs[!startsWith(spam.docs, "cmds")]
all.spam<-sapply(spam.docs, function(p) get.msg(paste(spam.path,p, sep="")))
spam.tdm<-get.tdm(all.spam)
#use the command below for inspection
#head(all.spam)
#z<-TermDocumentMatrix(Corpus(VectorSource(all.spam)), list(stopwords=TRUE, removeNumbers=TRUE, removePunctuation=TRUE, minDocFreq=2))
spam.matrix<- as.matrix(spam.tdm)
spam.counts<-rowSums(spam.matrix)
spam.df<-data.frame(cbind(names(spam.counts), as.numeric(spam.counts)), stringsAsFactors=FALSE)
names(spam.df)<-c("term", "frequency")
spam.df$frequency<-as.numeric(spam.df$frequency)
spam.occurence<-sapply(1:nrow(spam.matrix),
                       function(i){
                         length(which(spam.matrix[i,]>0))/ncol(spam.matrix)
                       })
spam.density<-spam.df$frequency/sum(spam.df$frequency)
spam.df<-transform(spam.df, density=spam.density, occurence=spam.occurence)
head(spam.df[with(spam.df,order(-occurence)), ])
#construction of Ham dataset
easy_ham.docs<-dir(easyham.path)
#this returns a list of file names in the directory
easy_ham.docs<-easy_ham.docs[seq(1,500)]
#spam.docs<-spam.docs[which(spam.docs!="")]
#cmds file is a UNIX file which we don't need
#spam.docs<-spam.docs[!startsWith(spam.docs, "cmds")]
all.easy_ham<-sapply(easy_ham.docs, function(p) get.msg(paste(easyham.path,p, sep="")))
easy_ham.tdm<-get.tdm(all.easy_ham)
#use the command below for inspection
#head(all.spam)
#z<-TermDocumentMatrix(Corpus(VectorSource(all.spam)), list(stopwords=TRUE, removeNumbers=TRUE, removePunctuation=TRUE, minDocFreq=2))
easy_ham.matrix<- as.matrix(easy_ham.tdm)
easy_ham.counts<-rowSums(easy_ham.matrix)
easy_ham.df<-data.frame(cbind(names(easy_ham.counts), as.numeric(easy_ham.counts)), stringsAsFactors=FALSE)
names(easy_ham.df)<-c("term", "frequency")
easy_ham.df$frequency<-as.numeric(easy_ham.df$frequency)
easy_ham.occurence<-sapply(1:nrow(easy_ham.matrix),
                           function(i){
                             length(which(easy_ham.matrix[i,]>0))/ncol(easy_ham.matrix)
                           })
easy_ham.density<-easy_ham.df$frequency/sum(easy_ham.df$frequency)
easy_ham.df<-transform(easy_ham.df, density=easy_ham.density, occurence=easy_ham.occurence)
easy_ham.df$NA.<-NULL
head(easy_ham.df[with(easy_ham.df,order(-occurence)), ])
#Classification function
classify.email<-function(path, training.df, prior=0.5, c=1e-6){
msg<-get.msg(path)
msg.tdm<-get.tdm(msg)
msg.freq<-rowSums(as.matrix(msg.tdm))
#Find intersection of words
msg.match<-intersect(names(msg.freq), training.df$term)
if(length(msg.match)<1){
return (prior*c^(length(msg.freq)))
}
else{
match.probs<-training.df$occurence[match(msg.match, training.df$term)]
return (prior*prod(match.probs) * c^(length(msg.freq)-length(msg.match)))
}
}
hardham.docs<-dir(hardham.path)
hardham.docs<-hardham.docs[seq(1:length(hardham.docs))]
hardham.spamtest<-sapply(hardham.docs, function(p) classify.email(paste(hardham.path,p, sep=""),
training.df = easy_ham.df))
hardham.hamtest<-sapply(hardham.docs, function(p) classify.email(paste(hardham.path, p, sep=""), training.df = easy_ham.df))
hardham.res<-ifelse(hardham.spamtest>hardham.hamtest, TRUE, FALSE)
summary(hardham.res)
This code returns FALSE for every message.
Are there any known issues in using the arm package on Mac?
On running package_installer.R, there were errors in compiling BRugs, which isn't a direct dependency of the arm package, but is suggested for R2WinBUGS, which is.
Are there issues in running any of the book's code on Mac, if it's depending on OpenBUGS and WinBUGS, which seem to be Windows-specific libraries? Or are these libraries never used?
I added library(reshape) before running melt.
library(reshape)
from.weight <- melt(with(priority.train, table(From.EMail)),
value.name="Freq")
Then the melt() call worked.
On lines 123 and 128 of the code from Chapter 3 you have a constant, c, being exponentiated.
I can't follow the logic behind this and I see in the user-contributed unconfirmed errata (http://www.oreilly.com/catalog/errataunconfirmed.csp?isbn=0636920018483) there's an entry suggesting that the ^ operator should be replaced by the * operator.
Could you confirm if this is accurate?
Thanks
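For what it's worth, the exponent is consistent with a naive Bayes reading of the code: every term in the message that never occurs in the training data is assigned the small constant probability c, and independent probabilities multiply, so k unseen terms contribute a factor of c^k, not c*k. A self-contained sketch (all numbers invented for illustration; `const` stands in for the book's `c` to avoid shadowing base::c):

```r
# Numbers below are made up purely to illustrate the algebra.
const <- 1e-6                 # the book's c: probability assigned to an unseen term
prior <- 0.5
match.probs <- c(0.30, 0.25)  # hypothetical occurrence rates of matched terms
n.unseen <- 3                 # hypothetical count of terms never seen in training

via.power   <- prior * prod(match.probs) * const ^ n.unseen
via.product <- prior * prod(match.probs) * prod(rep(const, n.unseen))
all.equal(via.power, via.product)  # TRUE: ^ is shorthand for the repeated product
```

So if the `^` were replaced by `*`, each additional unseen term would no longer multiply in another factor of c, which changes the model, not just the arithmetic.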
In chapter 6 I am at the part where I am supposed to be executing
dtm <- DocumentTermMatrix(corpus)
However it fails out with the following error:
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
StackOverflow suggested installing SnowballC and also trying
corpus <- tm_map(corpus, content_transformer(tolower), lazy = TRUE)
Neither of these solutions worked and I am thus flummoxed.
Here is my session info:
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_1.9-8 Matrix_1.1-4 mgcv_1.8-4 nlme_3.1-118 plyr_1.8.1 ggplot2_1.0.0 tm_0.6 NLP_0.1-5
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.8 grid_3.1.2 gtable_0.1.2 labeling_0.3 lattice_0.20-29 MASS_7.3-35 munsell_0.4.2 parallel_3.1.2 proto_0.3-10
[11] Rcpp_0.11.3 reshape2_1.4.1 scales_0.2.4 slam_0.1-32 stringr_0.6.2 tools_3.1.2
For every ggplot() call, the "legend = FALSE" setting/parameter needs to be changed to guide = "none".
Also, when plotting figure 9-4 in chapter 9, the latest ggplot2 requires that library(scales) be loaded, and scale_size() needs to be changed from:
scale_size(to=c(2,2))
to
scale_size(range=c(2,2))
The code was run from RStudio with R 3.3.0 under OS X 10.11.
issue 1:
line 48 in email_classify.R:
geom_hline(aes(yintercept = c(10,30)), linetype = 2)
yintercept needs to be placed outside the aes() function, like this:
geom_hline(yintercept = c(10,30), linetype = 2)
issue 2:
An error occurs when reading messages with sapply at lines 139-140:
all.spam <- sapply(spam.docs,
function(p) get.msg(file.path(spam.path, p)))
here is the traceback
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
'from' cannot be NA, NaN or infinite
7 stop("'from' cannot be NA, NaN or infinite")
6 seq.default(which(text == "")[1] + 1, length(text), 1)
5 seq(which(text == "")[1] + 1, length(text), 1)
4 get.msg(file.path(spam.path, p))
3 FUN(X[[i]], ...)
2 lapply(X = X, FUN = FUN, ...)
1 sapply(spam.docs, function(p) get.msg(file.path(spam.path, p)))
It seems some file does not have a blank line separating the header from the body.
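One way to guard against that, sketched as a standalone helper (the function name and fallback behaviour are my own suggestion, not the book's): if no blank line is found, keep the whole text instead of letting seq() fail on an NA `from`.

```r
# Extract the message body from a vector of lines; the body normally
# starts after the first blank line.
extract.body <- function(text)
{
  first.blank <- which(text == "")[1]
  if (is.na(first.blank) || first.blank >= length(text))
  {
    # No header/body split found: fall back to the full text.
    return(paste(text, collapse = "\n"))
  }
  paste(text[seq(first.blank + 1, length(text), 1)], collapse = "\n")
}

extract.body(c("From: a@b", "", "hello", "world"))  # "hello\nworld"
extract.body(c("no", "blank", "line"))              # falls back to the full text
```

get.msg() could call this on the result of readLines() instead of indexing with seq() directly.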
spam.path <- file.path("C:\03-Classification\data", "spam")
spam2.path <- file.path("C:\03-Classification\data", "spam_2")
easyham.path <- file.path("C:\03-Classification\data", "easy_ham")
easyham2.path <- file.path("C:\03-Classification\data", "easy_ham_2")
hardham.path <- file.path("C:\03-Classification\data", "hard_ham")
hardham2.path <- file.path("C:\03-Classification\data", "hard_ham_2")
x <- runif(1000, 0, 40)
y1 <- cbind(runif(100, 0, 10), 1)
y2 <- cbind(runif(800, 10, 30), 2)
y3 <- cbind(runif(100, 30, 40), 1)
val <- data.frame(cbind(x, rbind(y1, y2, y3)),
stringsAsFactors = TRUE)
ex1 <- ggplot(val, aes(x, V2)) +
    geom_jitter(position = position_jitter(height = 2))
ggsave(plot = ex1,
filename = file.path("C:\\03-Classification\\images", "00_Ex1.pdf"),
height = 10,
width = 10)
Error in grDevices::pdf(..., version = version) :
cannot open file 'C:\03-Classification\images/00_Ex1.pdf'
ggsave(plot = ex1,
filename = file.path("C:\\03-Classification\\images\\00_Ex1.pdf"),
height = 10,
width = 10)
Error: Aesthetics must be either length 1 or the same as the data (1000): yintercept
getwd()
[1] "C:/Users/mm/Documents"
When I repeat the code, I get the following result:
...
hardham.res <- ifelse(hardham.spamtest > hardham.hamtest,
TRUE,
FALSE)
summary(hardham.res)
...
the result is :
Mode FALSE TRUE NA's
logical 243 6 0
I also try:
hardham.res <- ifelse(hardham.spamtest == hardham.hamtest,
TRUE,
FALSE)
the result is:
Mode FALSE TRUE NA's
logical 21 228 0
That means most of the results are equal, so I suspect floating-point underflow. I then changed the classify.email function as below:
classify.email <- function(path, training.df, prior = 0.5, c = 1e-6)
{
msg <- get.msg(path)
msg.tdm <- get.tdm(msg)
msg.freq <- rowSums(as.matrix(msg.tdm))
msg.match <- intersect(names(msg.freq), training.df$term)
if(length(msg.match) < 1)
{
return(log10(prior) + length(msg.freq) * log10(c)) # return(prior * c ^ (length(msg.freq)))
}
else
{
match.probs <- training.df$occurrence[match(msg.match, training.df$term)]
return(log10(prior) + sum(log10(match.probs)) + (length(msg.freq) - length(msg.match)) * log10(c)) # return(prior * prod(match.probs) * c ^ (length(msg.freq) - length(msg.match)))
}
}
this time I get the result:
hardham.res <- ifelse(hardham.spamtest > hardham.hamtest,
                      TRUE,
                      FALSE)
summary(hardham.res)
Mode FALSE TRUE NA's
logical 80 169 0
My god, the conclusion is still wrong.
Has anyone encountered the same problem?
Where have I made a mistake?
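The underflow suspicion itself is easy to confirm in isolation, which supports moving the comparison into log space (the counts below are invented just to show the scale of the problem):

```r
# 200 unseen-term factors of 1e-6 underflow to exactly zero in raw
# probability space, but the log10 sum stays finite and comparable.
raw    <- 0.5 * prod(rep(1e-6, 200))
logged <- log10(0.5) + 200 * log10(1e-6)

raw     # 0
logged  # about -1200.3
```

Once both scores underflow to 0, `spamtest > hamtest` is FALSE and `spamtest == hamtest` is TRUE for every message, which matches the summaries above; so the log-space rewrite is the right direction, and the remaining discrepancy must come from something else in the pipeline.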
The SVM examples require you to load library('e1071'), but this library is not in the list of packages in chapter 1, table 1-2. I only noticed because I'd installed the packages by hand from that table rather than using the provided install script.
Also, chapter 12 uses the melt() function, but library('reshape') isn't mentioned in the chapter text, nor is it loaded at the head of the chapter12.R script.
I have a problem with the following code:
get.msg <- function(path)
{
con <- file(path, open = "rt", encoding = "latin1")
text <- readLines(con)
msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
close(con)
return(paste(msg, collapse = "\n"))
}
What can I do? Please, somebody help me!
I'm new to R, so I do not have a great ability to debug issues yet. After setting up the R environment on Xubuntu and OSX, I keep running into the same issues when running fast_check.R as well as the script for the first chapter.
Checking Chapter 1 - Introduction
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Error in strsplit(unitspec, " ") : non-character argument
Calls: source ... fullseq.Date -> seq -> floor_date -> parse_unit_spec -> strsplit
In addition: Warning message:
Removed 1 rows containing non-finite values (stat_bin).
Execution halted
Here's my R version:
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Is there a preferred environment/version for running the sample code, or am I really missing something?
In Chapter 3 we construct a spam filter based on the data in the folder:
ML_for_Hackers/03-Classification/data/spam
In the book, the terms in these emails are ordered by occurrence with the command below. The book lists the following table with html at the top:
head(spam.df[with(spam.df, order(-occurrence)),])
| | term | frequency | density | occurrence |
|---|---|---|---|---|
| 2122 | html | 377 | 0.005665595 | 0.338 |
| 538 | body | 324 | 0.004869105 | 0.298 |
| 4313 | table | 1182 | 0.017763217 | 0.284 |
| 1435 | | 661 | 0.009933576 | 0.262 |
| 1736 | font | 867 | 0.013029365 | 0.262 |
| 1942 | head | 254 | 0.003817138 | 0.246 |
When running the code directly, this does not match the output I get, which has email at the top:

| | term | frequency | density | occurrence |
|---|---|---|---|---|
| 7781 | email | 813 | 0.005853680 | 0.566 |
| 18809 | please | 425 | 0.003060042 | 0.508 |
| 14720 | list | 409 | 0.002944840 | 0.444 |
| 27309 | will | 828 | 0.005961681 | 0.422 |
| 3060 | body | 379 | 0.002728837 | 0.408 |
| 9457 | free | 539 | 0.003880853 | 0.390 |
This seems to be explained by the way the document vectors are processed with the removePunctuation setting. The punctuation is removed, and any terms that were separated only by punctuation are fused into a new term: for example, `<html><head>` becomes htmlhead. The result is that instead of html being listed as a common term across many of the emails, we get lots of low-frequency fusions of html with other HTML tag keywords.
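A quick base-R illustration of that effect (gsub with the `[[:punct:]]` class mirrors what tm's removePunctuation does to angle brackets):

```r
# Stripping punctuation from raw HTML fuses adjacent tag names
# into new pseudo-terms instead of separating them.
strip.punct <- function(x) gsub("[[:punct:]]", "", x)

strip.punct("<html><head>")   # "htmlhead"
strip.punct("</head><body>")  # "headbody"
```

Removing the HTML markup before building the term-document matrix would avoid these fused terms.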
Hi! Looking at @drewconway's email_classify.R, I follow the reference to http://spamassassin.apache.org/publiccorpus/ to find the data, but there doesn't seem to be a hard_ham_2 anywhere. Is there hard_ham_2?
Hello guys,
Great book :-)
Right now, I am in the 3rd chapter (e-mail classification).
I am executing the R commands one by one and I am having a problem getting the list of spam documents (page 81).
The command is : all.spam <- sapply(spam.docs, function(p) get.msg(paste(spam.path,p,sep="")))
and the error I get is:
Error in seq.default(which(text == "")[1] + 1, length(text), 1) :
invalid (to - from)/by in seq(.)
Any clue?
Thank you very much
The following command in email_classify.R fails
ggsave(plot = class.plot,
filename = file.path("images", "03_final_classification.pdf"),
height = 10,
width = 10)
Throws this error:
Error in seq.default(min, max, by = by) :
invalid (to - from)/by in seq(.)
This is using R version 2.15.0 and OSX 10.7.3
This is about the first chapter, where a string length of 8 is used to deal with malformed date data. After filtering by string length, I found "19940000" in DateOccurred, which is converted to NA by ufo$DateOccurred <- as.Date(ufo$DateOccurred, format = "%Y%m%d"). Isn't that also malformed data? I also found that the way R reads the input has an error, e.g. line 756:
19950704 19950706 Orlando, FL 4-5 min I would like toreport three yellow oval lights which passed over Orlando,Florida on July 4, 1995 at aproximately 21:30 (9:30 pm). These were the sizeof Venus (which they passed close by). Two of them traveled one after the otherat exactly the same speed and path heading south-southeast. The third oneappeared about a minute later following the same path as the other two. Thewhole sighting lasted about 4-5 minute. There were 4 other witnesses oldenough to report the sighting. My 4 year old and 5 year old children were theones who called my attention to the "moving stars". These objects moved fasterthan an airplane and did not resemble anaircraft, and were moving much slowerthan a shooting star. As for them being fireworks, their path was too regularand coordinated. If anybody else saw this phenomenon, please contact me at: [email protected]
After reading in by the function in the book:
> ufo <- read.delim(file.path("data", "ufo", "ufo_awesome.tsv"),
+ sep = "\t",
+ stringsAsFactors = FALSE,
+ header = FALSE,
+ na.strings = "")
it's separated into two lines:
> ufo[756,]
V1 V2 V3 V4 V5 V6
756 [email protected] <NA> <NA> <NA> <NA> <NA>
> ufo[755,]
V1 V2 V3 V4 V5
755 19950704 19950706 Orlando, FL <NA> 4-5 min
V6
755 I would like to report three yellow oval lights which passed over Orlando,Florida on July 4, 1995 at aproximately 21:30 (9:30 pm). These were the sizeof Venus (which they passed close by). Two of them traveled one after the otherat exactly the same speed and path heading south- southeast. The third oneappeared about a minute later following the same path as the other two. Thewhole sighting lasted about 4-5 minutes. There were 4 other witnesses oldenough to report the sighting. My 4 year old and 5 year old children were theones who called my attention to the "moving stars". These objects moved fasterthan an airplane and did not resemble an aircraft, and were moving much slowerthan a shooting star. As for them being fireworks, their path was too regularand coordinated. If anybody else saw this phenomenon, please contact me at:
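Both problems raised here can be caught with one validity filter, sketched below on invented stand-in rows (assuming, as in chapter 1, that a well-formed record starts with an 8-digit YYYYMMDD sighting date):

```r
# Invented stand-ins for the real TSV: a good row, a continuation row
# produced by the bad split, and the "19940000" case.
ufo.sample <- data.frame(V1 = c("19950704", "please contact me at:", "19940000"),
                         stringsAsFactors = FALSE)

# Drop continuation rows: V1 is not an 8-digit date string.
good.rows <- grepl("^[0-9]{8}$", ufo.sample$V1)
ufo.clean <- ufo.sample[good.rows, , drop = FALSE]

# "19940000" survives the length test but is still malformed:
# as.Date() turns it into NA, so a second filter is needed.
parsed <- as.Date(ufo.clean$V1, format = "%Y%m%d")
ufo.clean <- ufo.clean[!is.na(parsed), , drop = FALSE]

ufo.clean$V1  # "19950704"
```

So the book's length-8 filter does leave "19940000"-style dates behind; the NA check after as.Date() is what finally removes them.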
Hello,
Now that ggplot for Python has been around for a while (a few months, anyway), I am converting the R examples into Python for fun (via IPython Notebooks) using the expected libraries: numpy, scipy, pandas, ggplot, statsmodels, etc. (maybe a few others).
My questions are the following:
Depending on the answer to (2), I will try to document my code accordingly. I'm currently done with Chapters 1+2 and 1/2 of 3. I suspect the rest of the code might take me another two weeks if I am doing it by myself.
Thanks,
Joe Misiti
@josephmisiti
In the following snippet, strsplit does not raise an error when there is no comma in ufo$Location.
As a result, the tryCatch around strsplit cannot separate a bare "City" from a proper "City, State".
split.location <- tryCatch(strsplit(l, ",")[[1]],
error = function(e) return(c(NA, NA)))
I suggest revising it to:
get.location<-function(l)
{
split.location<-strsplit(l,",")[[1]]
clean.location <- gsub("^ ","",split.location)
if(length(clean.location)!=2)
{
return(c(NA,NA))
}
else
{
return(clean.location)
}
}
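To see concretely why the tryCatch branch never fires: strsplit on a comma-free string just returns the whole string, so it is the length test in the suggested function, not the error handler, that catches the malformed case. A quick check:

```r
# strsplit with no match returns the input unchanged -- no error is raised,
# so error = function(e) ... in the tryCatch never runs.
no.comma <- strsplit("Iowa City", ",")[[1]]

no.comma          # "Iowa City"
length(no.comma)  # 1
```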
It is really weak that the repository does not have the source code that is the one talked about in the book. Instead of actually learning when working through examples, I have to sit down and search your repository for definitions that have changed (e.g. 'abb.state' in Chapter 1). Just mindnumbingly weak.
While trying to install the necessary packages using source('package_installer.R'), I run into these errors:
Error in library.dynam(lib, package, package.lib) :
shared object ‘digest.so’ not found
ERROR: lazy loading failed for package ‘memoise’
* removing ‘/usr/local/Cellar/r/2.14.1/R.framework/Versions/2.14/Resources/library/memoise’
ERROR: dependency ‘memoise’ is not available for package ‘ggplot2’
* removing ‘/usr/local/Cellar/r/2.14.1/R.framework/Versions/2.14/Resources/library/ggplot2’
Is there an issue in my R install, that may be stopping the digest package dependency from installing correctly?