renozao / nmf

NMF: A Flexible R package for Nonnegative Matrix Factorization

nmf's Issues

NMF heatmaps error on plotting tracks

The most important part of NMF, the visualization described in http://cran.r-project.org/web/packages/NMF/vignettes/heatmaps.pdf, doesn't work.

Running the examples verbatim (with understanding what's going on):
install.packages("NMF")
library(NMF)

# random data that follow a 3-rank NMF model (with quite some noise: sd=2)
X <- syntheticNMF(100, 3, 20, noise = 2)

# row annotations and covariates
n <- nrow(X)
d <- rnorm(n)
e <- unlist(mapply(rep, c("X", "Y", "Z"), 10))
e <- c(e, rep(NA, n - length(e)))
rdata <- data.frame(Var = d, Type = e)

# column annotations and covariates
p <- ncol(X)
a <- sample(c("alpha", "beta", "gamma"), p, replace = TRUE)

# define covariates: true groups and some numeric variable
c <- rnorm(p)

# gather them in a data.frame
covariates <- data.frame(a, X$pData, c)
res <- nmf(X, 3, nrun = 10)

# coefmap from multiple run fit: includes a consensus track
coefmap(res)

generates

Error in process_tracks(x, tracks, annRow, annCol) :
Invalid special annotation track name ['basis', 'consensus:']. Should partially match one of 'basis', 'consensus:', ':basis', 'basis:'.

This is strange, because the error says annotation track name IS 'basis'. 'basismap' generates the same error. Obviously, coefmap(res, tracks=NA) works, but the tracks are the most important.

Usual sessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel graphics grDevices datasets utils stats methods base

other attached packages:
[1] doParallel_1.0.8 iterators_1.0.7 foreach_1.4.2 NMF_0.20.5 Biobase_2.26.0 BiocGenerics_0.12.1
[7] cluster_2.0.1 rngtools_1.2.4 pkgmaker_0.22 registry_0.2 ggplot2_1.0.1 dplyr_0.4.1
[13] BiocInstaller_1.16.4

loaded via a namespace (and not attached):
[1] assertthat_0.1 codetools_0.2-11 colorspace_1.2-6 compiler_3.1.2 DBI_0.3.1 digest_0.6.8 grid_3.1.2
[8] gridBase_0.4-7 gtable_0.1.2 magrittr_1.5 MASS_7.3-40 munsell_0.4.2 plyr_1.8.2 proto_0.3-10
[15] RColorBrewer_1.1-2 Rcpp_0.11.6 reshape2_1.4.1 scales_0.2.4 stringi_0.4-1 stringr_1.0.0 tools_3.1.2
[22] xtable_1.7-4

labels seem to have gone missing

Should they be included by default?

Rdata

load(Rdata)
hm4 <- aheatmap(data.to.plot4, annLegend=F,annCol=pheno2[,c("MaxLVWT","EF","LVOT_GT30")],annRow=cna.gene.matrix[is.candidate.cna,c("chr"),drop=F],annColors=list(chr=chr.palette),cexRow=2,cexCol=2,main="Candidate Modifier - CNA",color=c("Inversion"="yellow","Loss"="dodgerblue1","Neutral"="grey97","Gain"="red"),border=list(annLeg=T,matrix=T,cell=list(lwd=0.6,col="white"),annRow=F,annCol=T),layout = '_*')

summary(NMF.models) does not return "cophenetic"

I hope this issue is not related to my previous post. I see that summary(NMF.models) does not return the "cophenetic" score.

I build my NMF object as follows:
NMF.models <- nmf(m.2, seq(k.min, k.max + 1, k.incr), nrun = number_of_runs, method="brunet", seed=9876)

Also, for the plot I get the following warning:
plot(NMF.models)
Warning message:
In .local(x, y, ...) : NMFfit object has no residuals track

I tried to calculate the cophenetic score separately:

cophcor(NMF.models)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘cophcor’ for signature ‘"NMFfit"’

nmfEstimateRank:: Error in (function (...) : All the runs produced an error

All of the runs with different rank parameters converged but produced an error when generating the NMF objects. Could you please help me understand the error and suggest how to resolve it?
Even though the .option setting for parallel computation on multiple cores was not relevant here with a single run, it might help to run NMF for the different ranks on multiple cores. Is there a way to trigger parallel computation across the different rank runs?

System information:
NMF_0.20.5 ; system: x86_64, linux-gnu; "R version 3.1.2 (2014-10-31)"

library(NMF);
str(TrainDSN)
'data.frame': 8953732 obs. of 23 variables:
$ AC : int 0 0 0 0 0 0 0 0 0 0 ...
$ DE : int 0 0 1 0 0 0 0 0 0 0 ...
$ DI : int 0 0 0 0 0 0 0 0 0 0 ...
$ EN : int 0 0 0 0 0 0 0 0 0 0 ...
$ ENT : int 0 0 0 0 0 0 0 0 0 0 ...
res.estimate <- nmfEstimateRank(TrainDSN, seq(4,12), method='brunet', nrun=1, seed='nndsvd', .option="vP14")

NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 440/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 450/2000 Iterations: 500/2000 Iterations: 500/2000
DONE (converged at 520/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 450/2000 Iterations: 450/2000
DONE (converged at 460/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
NMF algorithm: 'brunet'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 420/2000 iterations)
Error in (function (...) : All the runs produced an error:
-#1 [r=4] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#2 [r=5] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#3 [r=6] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#4 [r=7] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#5 [r=8] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#6 [r=9] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#7 [r=10] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#8 [r=11] -> (list) object cannot be coerced to type 'double' [in call to '.local']
-#9 [r=12] -> (list) object cannot be coerced to type 'double' [in call to '.local']
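
A note on the message above: R raises a "cannot be coerced to type 'double'" error when a list, which includes a data.frame, is handed to a numeric coercion. Whether that is what happens inside nmfEstimateRank here is an assumption, not a confirmed diagnosis. The base-R sketch below uses a small made-up data frame (`toy`, hypothetical) to show the coercion failing and an explicit conversion to a numeric matrix succeeding.

```r
# Toy stand-in for an all-integer data.frame (hypothetical data).
toy <- data.frame(AC = c(0L, 1L, 0L), DE = c(2L, 0L, 1L), DI = c(0L, 0L, 3L))

# A data.frame is a list, so direct numeric coercion raises an error.
coercion <- tryCatch(as.double(toy), error = function(e) e)
print(inherits(coercion, "error"))

# Converting to a matrix first yields a plain numeric matrix.
toy_mat <- as.matrix(toy)
storage.mode(toy_mat) <- "double"
print(dim(toy_mat))
```

If the assumption holds, passing such a matrix instead of the data.frame would be the first thing to try.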

installation procedure (Cygwin)

The installation procedure for the NMF R package fails on Cygwin (vers. 2.1.0).
The (first) error is:

preparing package for lazy loading
Error in if (parallel::detectCores() > 1) "par" else "seq" :
missing value where TRUE/FALSE needed
Error : unable to load R code in package ‘NMF’
ERROR: lazy loading failed for package ‘NMF’
removing ‘/usr/lib/R/site-library/NMF’

The problem is in the procedure that detects the number of CPU cores on the machine.
In the following files:
/NMF/R/options.R (line 81)
/NMF/R/parallel.R (line 19)
the function detectCores() is called with no arguments.
However, in order to take different kinds of OSs into account, the function call should be:
detectCores(TRUE)
see ...
https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/detectCores.html
After this change, the installation on Cygwin works.
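
Independently of the `detectCores(TRUE)` change, the failing `if` can be made robust to a missing core count, since `parallel::detectCores()` is documented to return NA when the answer is unknown. The helper name `choose_backend` below is hypothetical and only illustrates the guard; it is not the package's code.

```r
# Pick a backend label from a core count that may be NA.
# `choose_backend` is a hypothetical helper, not part of NMF.
choose_backend <- function(n_cores = parallel::detectCores()) {
  # Test for NA first so the comparison never sees a missing value.
  if (!is.na(n_cores) && n_cores > 1) "par" else "seq"
}

print(choose_backend(NA))  # unknown core count: sequential fallback
print(choose_backend(4))   # several cores: parallel
```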

aheatmap fails if annRow or annCol is an ordered factor

I can pass an unordered factor to annRow or annCol, but if I pass an ordered factor, aheatmap fails with an error message because class(x) returns a vector c('ordered', 'factor') and the code in aheatmap.R expects a scalar value, not a vector.

A concise example:

library(NMF)
X <- syntheticNMF(100, 3, 20, noise = 2)
p <- ncol(X)
a <- sample(c("one", "two", "three"), p, replace = TRUE)
b <- as.factor(a)
c <- factor(a, levels = c("one", "two", "three"), ordered = TRUE)
aheatmap(X, annCol = a)  # This succeeds.
aheatmap(X, annCol = b)  # This succeeds.
aheatmap(X, annCol = c)  # This fails.

The error message is:

Error in Math.factor(c(1L, 3L), 0) : ‘round’ not meaningful for factors
In addition: Warning messages:
1: In if (class(annotation[[i]]) %in% c("character", "factor")) { :
the condition has length > 1 and only the first element will be used
2: In if (class(annotation[[i]]) %in% c("character", "factor")) { :
the condition has length > 1 and only the first element will be used

Suggested fix: replace if (class(annotation[[i]]) %in% c('character', 'factor')) with if (any(class(annotation[[i]]) %in% c('character', 'factor'))) in aheatmap.R.
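
The root cause can be reproduced in base R without NMF. The sketch below shows the two-element class vector of an ordered factor and one possible scalar test; `is_categorical` is a hypothetical helper name, offered as an alternative to the `any(...)` form above rather than as the package's fix.

```r
a <- c("one", "two", "three", "two")
b <- factor(a)
o <- factor(a, levels = c("one", "two", "three"), ordered = TRUE)

print(class(b))  # a single class
print(class(o))  # two classes, so `class(o) %in% ...` has length 2

# Hypothetical helper: always yields a single TRUE/FALSE.
is_categorical <- function(x) is.character(x) || is.factor(x)

print(is_categorical(a))
print(is_categorical(o))
print(is_categorical(rnorm(4)))
```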

R CMD check NOTEs

I see

checking R code for possible problems ... NOTE
devnmf: no visible global function definition for ‘load_all’
nmfReport: no visible global function definition for ‘knit2html’
posICA: no visible binding for global variable ‘fastICA’
posICA: no visible global function definition for ‘fastICA’
runit.lsnmf: no visible global function definition for ‘checkTrue’
setupLibPaths: no visible global function definition for ‘load_all’
test.match_atrack : .check: no visible global function definition for
  ‘checkEquals’

That usually implies you've either missed an import, or you should explicitly refer to functions with the foo::bar() form.

Resize row labels in heatmap

This is my code:


aheatmap(x = as.matrix(dat), scale = 'row', legend = F, fontsize = 12, cexRow = .9, cexCol = .9, main = 'Heatmap', filename = 'plot.pdf', width = 5, height = 10)

I am trying to make heatmaps of different row sizes, using the above code. When the number of rows is small (<100), the row labels are clear & big in size. But when you try to make a heatmap with many rows (>100), the row labels become really small & are almost illegible.

Also posted on: http://stackoverflow.com/questions/29678312/unable-to-resize-row-labels-in-aheatmap

border around plot

Is there a way to add a border around the main part of the heatmap? I'm plotting relatively sparse data with white as the neutral/null value. The result is that the plot looks like some floating blobs.

Happy to work using some of the low level code if that helps. I spent 3 years working on heatmap code with lattice that's stuck in another institute.

Display row labels on the left side

Since I do not need to calculate or show dendrograms, is it possible to move the row labels to the left side of a plot, instead of displaying them always on the right?

Thanks, and congrats for your hard work and your nice package.

how consensus matrix is calculated

How is the consensus matrix calculated?
I understand the part where NMF generates two matrices, W and H, of rank r.
But how do we calculate the consensus matrix from W and H?
Thanks!
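
As a rough illustration of the usual definition (connectivity matrices averaged over runs), here is a base-R sketch. It is not the package's own implementation: the coefficient matrices are random placeholders standing in for the H of each run, and the names `H_list`, `connectivity` and `consensus_mat` are made up for this example.

```r
set.seed(1)
r <- 3; p <- 8; n_runs <- 5

# Placeholder coefficient matrices H (r x p), one per run.
H_list <- replicate(n_runs, matrix(runif(r * p), nrow = r), simplify = FALSE)

# Connectivity matrix of one run: entry (i, j) is 1 when samples i and j
# fall in the same cluster, 0 otherwise.
connectivity <- function(H) {
  cl <- apply(H, 2, which.max)   # cluster = row with the largest coefficient
  outer(cl, cl, "==") * 1        # p x p matrix of 0/1
}

# Consensus matrix: average of the connectivity matrices over all runs.
consensus_mat <- Reduce(`+`, lapply(H_list, connectivity)) / n_runs
print(round(consensus_mat, 2))
```

Each entry is therefore the fraction of runs in which two samples were clustered together, so the diagonal is always 1.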

predict returns empty factor or breaks on NA-filled basis/coef row/column

This is because which.max returns an empty vector in such cases: it needs to return NA.
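
The behavior is easy to reproduce in base R. The wrapper `safe_which_max` below is a hypothetical name used only to sketch the intended "return NA" behavior; it is not the fix that was applied in the package.

```r
print(which.max(c(NA, NA)))   # empty result for an all-NA vector

# Hypothetical wrapper that returns NA instead of an empty vector.
safe_which_max <- function(x) {
  i <- which.max(x)
  if (length(i) == 0L) NA_integer_ else i
}

# Coefficient-like matrix whose second column is NA-filled.
H <- cbind(c(0.2, 0.7, 0.1), c(NA, NA, NA), c(0.5, 0.1, 0.4))
membership <- apply(H, 2, safe_which_max)
print(membership)
```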

Temporary results folder

Hi,

I've been using NMF on server R (in shiny app), but don't have the user rights to create folders in an app directory (say: srv/shiny/nmf/NMF_08145320).

Is there any way to specify where the temporary results are created or, better, not to create them at all? I tried setting parallel=FALSE, but without success.

My error message in a screenshot

Thanks for any suggestions!

Problem with bigger annotations in aheatmap

I was trying to generate annotated heatmaps using the aheatmap function. My data is 2192 by 335 and the annotation table is 335 by 6. However, when I try to plot the heatmap by calling

aheatmap(mydata, Rowv=NA, Colv=hcd, annCol=myAtto)

I get a heatmap with truncated annotations. I wanted six attributes in the annotation, but the heatmap does not display all of them properly. For example, Fertile and Tissue.1 in the bottom right corner are not properly displayed. What could be the issue here?

Since I have many annotations, is it possible to move some of the annotation legends (annLegend) to the left of the figure? Waiting for your reply. Thanks!

Transpose operator does not swap fixed terms

Fixed terms in NMFstd models are not swapped by the default method t.NMF

installation from github fails

Hi @renozao,

Installation from github forces me to install pkgmaker from CRAN, which is out of date for OSX, so NMF can't be installed. If I install pkgmaker from github and then install NMF from github to pull in the stringr fixes, the NMF install forces a reinstall of the outdated pkgmaker from CRAN. NMF seems to be broken with stringr 1.0, so I'm a little bit stuck. Do you have a suggestion for a workaround? Thanks!

annotation legend layout

Is there a way to change the layout for the annotation legend? I have quite a few types and categories (i.e. chromosome) and they run off the page. Enabling two columns would help.

New issue after pulling devel to fix #14

After the fix for the computation of the silhouette for large matrices, I receive an error when the parameter nrun is larger than one. From sessionInfo() I have: NMF_0.23 & pkgmaker_0.25.9

# generate a synthetic dataset with known classes: 100000 samples

n <- 100000; counts <- c(3, 2, 3);
x <- syntheticNMF(n+100, counts,noise=TRUE)
indx <- rowSums(x)==0
x <- x[!indx,];x=x[1:n,]
estim <- nmf(x, 2:4, nrun = 2, seed = 123456,.opt = "vp")
Compute NMF rank= 2 ... + measures ...
ERROR
Compute NMF rank= 3 ... + measures ...
ERROR
Compute NMF rank= 4 ... + measures ...
ERROR
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> argument is not interpretable as logical [in call to 'if']
-#2 [r=3] -> argument is not interpretable as logical [in call to 'if']
-#3 [r=4] -> argument is not interpretable as logical [in call to 'if']

Note that this runs OK when nrun=1. It seems that changing the parameter dmatrix from a logical to a string ('silhouette') does not work well with the predict method for NMFfitX. It crashes on:

          if( dmatrix ){
            attr(res, 'dmatrix') <- 1 - consensus(object) 
          }
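
The error text matches what base R produces when a non-logical string is used as an `if` condition. The sketch below reproduces that and shows one possible guard; `wants_dmatrix` is a hypothetical helper name, and how the package should actually interpret the string is an assumption.

```r
dmatrix <- "silhouette"

# A string that is not a logical literal cannot be used as an `if` condition.
res <- tryCatch(if (dmatrix) "ran" else "skipped", error = function(e) e)
print(inherits(res, "error"))

# Hypothetical guard: accept TRUE or a single non-empty method name.
wants_dmatrix <- function(x) {
  isTRUE(x) || (is.character(x) && length(x) == 1L && nzchar(x))
}

print(wants_dmatrix("silhouette"))
print(wants_dmatrix(TRUE))
print(wants_dmatrix(FALSE))
```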

R CMD check failures with dev version of devtools

I can't see how this is related to devtools, but I thought you should know about it.

checking package dependencies ... NOTE
Packages suggested but not available for checking:
  ‘RcppOctave’ ‘doMPI’ ‘Biobase’

checking R code for possible problems ... NOTE
.wrapResult: no visible global function definition for ‘exprs’

checking Rd cross-references ... NOTE
Packages unavailable to check Rd xrefs: ‘RcppOctave’, ‘Biobase’

checking data for non-ASCII characters ... NOTE
  Error in .requirePackage(package) : 
    unable to find required package 'Biobase'
  Calls: <Anonymous> ... .extendsForS3 -> extends -> getClassDef -> .requirePackage
  Execution halted

checking examples ... ERROR
Running examples in ‘NMF-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: nmfModel
> ### Title: Factory Methods NMF Models
> ### Aliases: nmfModel nmfModel,data.frame,data.frame-method
> ###   nmfModel,formula,ANY-method nmfModel,matrix,ANY-method
> ###   nmfModel,matrix,matrix-method nmfModel-methods
> ###   nmfModel,missing,ANY-method nmfModel,missing,missing-method
> ###   nmfModel,NULL,ANY-method nmfModel,numeric,matrix-method
> ###   nmfModel,numeric,missing-method nmfModel,numeric,numeric-method
> ###   nmfModels
> ### Keywords: methods
> 
> ### ** Examples
> 
> ## Don't show: 
> # roxygen generated flag
> options(R_CHECK_RUNNING_EXAMPLES_=TRUE)
> ## End(Don't show)
> 
> #----------
> # nmfModel,numeric,numeric-method
> #----------
> # data
> n <- 20; r <- 3; p <- 10
> V <- rmatrix(n, p) # some target matrix
> 
> # create a r-ranked NMF model with a given target dimensions n x p as a 2-length vector
> nmfModel(r, c(n,p)) # directly
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> nmfModel(r, dim(V)) # or from an existing matrix <=> nmfModel(r, V)
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> # or alternatively passing each dimension separately
> nmfModel(r, n, p)
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> 
> # trying to create a NMF object based on incompatible matrices generates an error
> w <- rmatrix(n, r)
> h <- rmatrix(r+1, p)
> try( new('NMFstd', W=w, H=h) )
Error in validObject(.Object) : 
  invalid class “NMFstd” object: Dimensions of W and H are not compatible [ncol(W)= 3 != nrow(H)= 4 ]
> try( nmfModel(w, h) )
Error in .local(rank, target, ...) : 
  nmfModel - Invalid number of columns in the basis matrix [3]: it should match the number of rows in the mixture coefficient matrix [4]
> try( nmfModel(r+1, W=w, H=h) )
Error in .local(rank, target, ...) : 
  nmfModel - Objective rank [4] is greater than the number of columns in W [3]
> # The factory method can be force the model to match some target dimensions
> # but warnings are thrown
> nmfModel(r, W=w, H=h)
Warning in .local(rank, target, ...) :
  nmfModel - Objective rank [3] is lower than the number of rows in H [4]: only the first 3 rows of H  will be used
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> nmfModel(r, n-1, W=w, H=h)
Warning in .local(rank, target, ...) :
  nmfModel - Number of rows in target is lower than the number of rows in W [20]: only the first 19 rows of W will be used
Warning in .local(rank, target, ...) :
  nmfModel - Objective rank [3] is lower than the number of rows in H [4]: only the first 3 rows of H  will be used
<Object of class:NMFstd>
features: 19 
basis/rank: 3 
samples: 10 
> 
> #----------
> # nmfModel,numeric,missing-method
> #----------
> ## Empty model of given rank
> nmfModel(3)
<Object of class:NMFstd>
features: 0 
basis/rank: 3 
samples: 0 
> 
> #----------
> # nmfModel,missing,ANY-method
> #----------
> nmfModel(target=10) #square
<Object of class:NMFstd>
features: 10 
basis/rank: 0 
samples: 10 
> nmfModel(target=c(10, 5))
<Object of class:NMFstd>
features: 10 
basis/rank: 0 
samples: 5 
> 
> #----------
> # nmfModel,missing,missing-method
> #----------
> # Build an empty NMF model
> nmfModel()
<Object of class:NMFstd>
features: 0 
basis/rank: 0 
samples: 0 
> 
> # create a NMF object based on one random matrix: the missing matrix is deduced
> # Note this only works when using factory method NMF
> n <- 50; r <- 3;
> w <- rmatrix(n, r)
> nmfModel(W=w)
<Object of class:NMFstd>
features: 50 
basis/rank: 3 
samples: 0 
> 
> # create a NMF object based on random (compatible) matrices
> p <- 20
> h <- rmatrix(r, p)
> nmfModel(H=h)
<Object of class:NMFstd>
features: 0 
basis/rank: 3 
samples: 20 
> 
> # specifies two compatible matrices
> nmfModel(W=w, H=h)
<Object of class:NMFstd>
features: 50 
basis/rank: 3 
samples: 20 
> # error if not compatible
> try( nmfModel(W=w, H=h[-1,]) )
Error in .local(rank, target, ...) : 
  nmfModel - Invalid number of columns in the basis matrix [3]: it should match the number of rows in the mixture coefficient matrix [2]
> 
> #----------
> # nmfModel,numeric,matrix-method
> #----------
> # create a r-ranked NMF model compatible with a given target matrix
> obj <- nmfModel(r, V)
> all(is.na(basis(obj)))
[1] TRUE
> 
> #----------
> # nmfModel,matrix,matrix-method
> #----------
> ## From two existing factors
> 
> # allows a convenient call without argument names
> w <- rmatrix(n, 3); h <- rmatrix(3, p)
> nmfModel(w, h)
<Object of class:NMFstd>
features: 50 
basis/rank: 3 
samples: 20 
> 
> # Specify the type of NMF model (e.g. 'NMFns' for non-smooth NMF)
> mod <- nmfModel(w, h, model='NMFns')
> mod
<Object of class:NMFns>
features: 50 
basis/rank: 3 
samples: 20 
theta: 0.5 
> 
> # One can use such an NMF model as a seed when fitting a target matrix with nmf()
> V <- rmatrix(mod)
> res <- nmf(V, mod)
> nmf.equal(res, nmf(V, mod))
[1] TRUE
> 
> # NB: when called only with such a seed, the rank and the NMF algorithm
> # are selected based on the input NMF model.
> # e.g. here rank was 3 and the algorithm "nsNMF" is used, because it is the default
> # algorithm to fit "NMFns" models (See ?nmf).
> 
> #----------
> # nmfModel,matrix,ANY-method
> #----------
> ## swapped arguments `rank` and `target`
> V <- rmatrix(20, 10)
> nmfModel(V) # equivalent to nmfModel(target=V)
<Object of class:NMFstd>
features: 20 
basis/rank: 0 
samples: 10 
> nmfModel(V, 3) # equivalent to nmfModel(3, V)
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> 
> #----------
> # nmfModel,formula,ANY-method
> #----------
> # empty 3-rank model
> nmfModel(~ 3)
<Object of class:NMFstd>
features: 0 
basis/rank: 3 
samples: 0 
> 
> # 3-rank model that fits a given data matrix
> x <- rmatrix(20,10)
> nmfModel(x ~ 3)
<Object of class:NMFstd>
features: 20 
basis/rank: 3 
samples: 10 
> 
> # add fixed coefficient term defined by a factor
> gr <- gl(2, 5)
> nmfModel(x ~ 3 + gr)
<Object of class:NMFstd>
features: 20 
basis/rank: 5 
samples: 10 
fixed coef [2]:
  gr = <1, 2>
> 
> # add fixed coefficient term defined by a numeric covariate
> nmfModel(x ~ 3 + gr + b, data=list(b=runif(10)))
<Object of class:NMFstd>
features: 20 
basis/rank: 6 
samples: 10 
fixed coef [3]:
  gr = <1, 2>
  b = 0.0101301828399301, 0.21454192395322, ..., 0.767450851621106
> 
> # 3-rank model that fits a given ExpressionSet (with fixed coef terms)
> e <- ExpressionSet(x)
Error: could not find function "ExpressionSet"
Execution halted

checking re-building of vignette outputs ... NOTE
Error in re-building vignettes:
  ...
Quitting from lines 385-398 (NMF-vignette.Rnw) 
Error: processing vignette 'NMF-vignette.Rnw' failed with diagnostics:
unable to find required package 'Biobase'
Execution halted

DONE
Status: 1 ERROR, 5 NOTEs

aheatmap() color and breaks

In the function aheatmap(), the color and breaks parameters are
implemented incorrectly.

My goal is to use the color black, for example, for NA (or 0) values and
palette colors for all other values. Currently, that seems to be impossible
because the color and breaks parameters do not work correctly.

This code works as expected:

mat2 = matrix(runif(15), nrow=5)
aheatmap(mat2, color = 'YlOrBr:3')

This code produces a heatmap with 50 colors, despite my effort to get
3 colors:

aheatmap(mat2,
         color = brewer.pal(n = 3, name = 'YlOrBr'))

This code also fails because only the first 3 of the 50 colors are used:

aheatmap(mat2,
         color = brewer.pal(n = 3, name = 'YlOrBr'),
         breaks = seq(min(mat2), max(mat2), length.out = 3))

(Some of) the code with the incorrect implementation is in colorcode.R:

https://github.com/renozao/NMF/blob/master/R/colorcode.R#L213

Notice the line with if( is_NA(n) ) n <- 50. This ensures that the
expression n <- length(x) will never be executed. This explains
why I see 50 colors instead of the number I want.
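
For comparison, the relationship between a discrete palette and its breaks can be illustrated in base R, independently of aheatmap(): k colors cover k intervals, which need k + 1 break points (the rule `graphics::image()` enforces). This is a sketch of that convention only, not of the package's code, and the three hex colors are illustrative values.

```r
set.seed(42)
mat2 <- matrix(runif(15), nrow = 5)

cols <- c("#FFF7BC", "#FEC44F", "#D95F0E")   # three illustrative colors

# k colors need k + 1 breaks; the endpoints are set exactly to min and max.
lo <- min(mat2); hi <- max(mat2); k <- length(cols)
breaks <- c(lo, lo + (hi - lo) * seq_len(k - 1) / k, hi)

# Assign each cell to one of the k intervals, then to its color.
bin <- cut(mat2, breaks = breaks, include.lowest = TRUE, labels = FALSE)
cell_cols <- matrix(cols[bin], nrow = nrow(mat2))
print(table(bin))
```

Under this convention, the third call above would need four break points rather than three to map onto three colors; whether aheatmap() follows the same rule is not confirmed here.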

large gap between column labels and legend

Thanks for all the recent edits, much appreciated.

Adding column labels and putting the legend on the bottom has resulted in a significant gap between the two. Any way to reduce the spacing?

hm4 <- aheatmap(data.to.plot4, annLegend=F,annCol=pheno2[,c("MaxLVWT","EF","LVOT_GT30")],annRow=cna.gene.matrix[is.candidate.cna,c("chr"),drop=F],annColors=list(chr=chr.palette),cexRow=2,cexCol=2,main="Candidate Modifier - CNA",color=c("Inversion"="yellow","Loss"="dodgerblue1","Neutral"="grey97","Gain"="red"),border=list(annLeg=T,matrix=T,cell=list(lwd=0.6,col="white"),annRow=F,annCol=T),layout = '_*')

aheatmap: Legend text for log scale

I've log scaled the data before plotting, and next to the legend I'd like to display the text "log_10(TPM)" (transcripts per million). Is that possible?

R CMD check failure with dev version of stringr

checking R code for possible problems ... NOTE
devnmf: no visible global function definition for ‘load_all’
nmfReport: no visible global function definition for ‘knit2html’
posICA: no visible binding for global variable ‘fastICA’
posICA: no visible global function definition for ‘fastICA’
runit.lsnmf: no visible global function definition for ‘checkTrue’
setupLibPaths: no visible global function definition for ‘load_all’
test.match_atrack : .check: no visible global function definition for
  ‘checkEquals’

checking examples ... ERROR
Running examples in ‘NMF-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: NMF-package
> ### Title: Algorithms and framework for Nonnegative Matrix Factorization
> ###   (NMF).
> ### Aliases: NMF NMF-package
> ### Keywords: package
> 
> ### ** Examples
> 
> ## Don't show: 
> # roxygen generated flag
> options(R_CHECK_RUNNING_EXAMPLES_=TRUE)
> ## End Don't show
> 
> # generate a synthetic dataset with known classes
> n <- 50; counts <- c(5, 5, 8);
> V <- syntheticNMF(n, counts)
> 
> # perform a 3-rank NMF using the default algorithm
> res <- nmf(V, 3)
> 
> basismap(res)
Error in process_tracks(x, tracks, annRow, annCol) : 
  Invalid special annotation track name ['basis']. Should partially match one of 'basis', ':basis', 'basis:'.
Calls: basismap -> basismap -> .local -> process_tracks
Execution halted

Could you please figure out why this is failing and if it's my fault? Thanks!

draw non-equidifferent legend in a heatmap

With the aheatmap function of the R package "NMF", how can I draw a legend with unevenly spaced breaks based on my own data? Also, the colors are not exactly the same in aheatmap and barplot when I use the two functions to draw a picture with the same color parameter.

nmf() breaks on data with zero-filled columns

This will be handled by removing these rows/columns from the target and re-filling the model after fitting, with zero-filled basis/coef rows/columns.
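
The bookkeeping for the column case might look like the base-R sketch below. It only illustrates the remove-then-refill idea: the "fitted" coefficients are random placeholders rather than the output of nmf(), and all variable names are hypothetical.

```r
set.seed(7)
V <- matrix(runif(20), nrow = 4)   # 4 x 5 target matrix
V[, c(2, 5)] <- 0                  # two zero-filled columns

keep <- colSums(V) > 0             # columns that carry some signal
V_fit <- V[, keep, drop = FALSE]   # reduced target that would be fitted

# Placeholder for the coefficients fitted on the reduced target (rank 2).
r <- 2
H_fit <- matrix(runif(r * ncol(V_fit)), nrow = r)

# Re-fill: full-size coefficient matrix, zeros for the dropped columns.
H_full <- matrix(0, nrow = r, ncol = ncol(V))
H_full[, keep] <- H_fit
print(H_full)
```

Zero-filled rows would be handled symmetrically on the basis matrix.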

Unsupervised classifications

Hi,

I have a matrix (mat) of gene expression data with patients (417 patients) as columns and genes (180 genes) as rows. I want to classify the patients (not the gene expression patterns) into four classes based on their gene expression, using the following command:
res <- nmf(mat, 4, nrun = 200, seed = 123456)

Do you think this is a correct way of classifying the patients?
Using the aheatmap command I can see that there are four separate bases, but I do not know how to get the barcodes of the patients for each basis. I used the "basisnames" command:
basisnames(res)
but I got NULL.

How can I know which patients are grouped together?
Thanks for the help.
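
One common way to read groups off a fit is to assign each sample (column) to the component with the largest coefficient. The base-R sketch below illustrates this on a random placeholder matrix `H` with made-up patient IDs; with a real fit, `H` would presumably be the coefficient matrix of the result, which is an assumption here.

```r
set.seed(3)
# Placeholder 4 x 6 coefficient matrix; columns are patients (hypothetical IDs).
H <- matrix(runif(24), nrow = 4,
            dimnames = list(NULL, paste0("patient", 1:6)))

cluster <- apply(H, 2, which.max)          # dominant component per patient
groups <- split(names(cluster), cluster)   # patient IDs listed per cluster
print(groups)
```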

node stack overflow error

I am getting the following error while plotting the results calculated with the NMF package.

dim(data)
#12831 36

res <- nmf(x = data, rank = 2)

basismap(res)

Error in match(x, table, nomatch = 0L) : node stack overflow

sessionInfo()

R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] RColorBrewer_1.1-2 doParallel_1.0.8 iterators_1.0.7
[4] foreach_1.4.2 NMF_0.20.6 synchronicity_1.1.4
[7] bigmemory_4.4.6 BH_1.58.0-1 bigmemory.sri_0.1.3
[10] Biobase_2.26.0 BiocGenerics_0.12.1 cluster_2.0.2
[13] rngtools_1.2.4 pkgmaker_0.22 registry_0.3

loaded via a namespace (and not attached):
[1] codetools_0.2-11 colorspace_1.2-6 compiler_3.1.3 digest_0.6.8
[5] ggplot2_1.0.1 grid_3.1.3 gridBase_0.4-7 gtable_0.1.2
[9] magrittr_1.5 MASS_7.3-42 munsell_0.4.2 plyr_1.8.3
[13] proto_0.3-10 Rcpp_0.11.6 reshape2_1.4.1 scales_0.2.5
[17] stringi_0.5-5 stringr_1.0.0 tools_3.1.3 xtable_1.7-4

NMF temporary directory and file writing are unsuitable for Shiny apps?

I'm not exactly sure, but it appears that since NMF likes to use temporary directories, it somehow fails to give itself the permission to serialize to a file in that directory.

This also seems to be relevant: Shiny and writing to files

Error: node stack overflow

Hi, I'm trying to plot the consensusmap for a data set with 2 clusters, one with fewer than 100 samples and another with over 900 samples. I'm getting an error, which seems to be coming from the dendrogram.

consensusmap(res, Colv=TRUE, Rowv=FALSE) 
Error in lapply(args, is.character) : node stack overflow  
Error in dev.flush() : node stack overflow                           
Error in par(opar) : node stack overflow                            
Error in upViewport() : node stack overflow

I managed to get a plot if I turn off both Colv and Rowv.

consensusmap(res, Colv=FALSE, Rowv=FALSE)

But it would be nice to have the dendrograms.

I'm running NMF on R version 3.0.2, Platform: x86_64-unknown-linux-gnu (64-bit).
I googled the error but have not found a solution. Any help appreciated!

Thanks,
Carolyn

space between dendrogram and annotation

Any way to shrink the gap? I need all the real estate I can get so the labels are readable. Maybe a general padding argument with list input for people to tweak.

aheatmap annCol error with factors/character vectors

Not sure what I'm doing wrong here. If I try and plot a covariate bar using a factor I get an error, if I convert it to character only a single level "1" is plotted, but if I convert the factor to numeric it works as expected. Anything come to mind?

I'll try and put together a reproducible example.

aheatmap(
    x,
    annCol=list(cats=f_col_an)
    )

Error in Math.factor(c(1L, 13L), 0) : ‘round’ not meaningful for factors
In addition: Warning messages:
1: In if (class(annotation[[i]]) %in% c("character", "factor")) { :
  the condition has length > 1 and only the first element will be used
2: In if (class(annotation[[i]]) %in% c("character", "factor")) { :
  the condition has length > 1 and only the first element will be used

> traceback()
7: stop(gettextf("%s not meaningful for factors", sQuote(.Generic)))
6: Math.factor(c(1L, 13L), 0)
5: round.pretty(rg)
4: setNames(rev(sequential_hcl(2, h, l = c(50, 95))), round.pretty(rg))
3: generate_annotation_colours(annotation, annotation_colors)
2: renderAnnotations(annCol_processed, annRow_processed, annotation_colors = annotation_colors, 
       verbose = verbose)
1: aheatmap(x, annCol = list(cats = f_col_an))

Parallel backend is broken for doParallel >1.0.6

doParallel package version > 1.0.6 doesn't load as the parallel backend.

> library(NMF)
> data(esGolub)
> nmf(esGolub, 3, nrun=4, .opt="vP")
NMF algorithm: 'brunet'
Multiple runs: 4
Error: Foreach computation aborted: object 'info' not found

The error message refers to this line:
https://github.com/renozao/NMF/blob/master/R/parallel.R#L316

object$info <- doParallel:::info

The internal variable doParallel:::info was removed in version 1.0.7.
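
One way to make this kind of lookup robust is to test for the internal object before using it. The following is a minimal base R sketch, not the package's actual fix; `get_internal` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: fetch a non-exported object from a package namespace,
# returning `default` instead of failing when the package or object is missing.
get_internal <- function(pkg, name, default = NULL) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(default)
  ns <- asNamespace(pkg)
  if (exists(name, envir = ns, inherits = FALSE)) {
    get(name, envir = ns, inherits = FALSE)
  } else {
    default
  }
}

# NULL when doParallel is not installed or no longer defines `info`.
info_obj <- get_internal("doParallel", "info")
is.null(info_obj)
```

The caller can then skip the `object$info` assignment, or substitute its own value, when `info_obj` is NULL.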

Customize legend (color bar) ticks

Is it possible to customize the ramp colorbar to display the numeric scale exactly at the desired points? Currently I think the scale granularity depends on the concrete values found in the data, e.g. when displaying several heatmaps in the same window, the corresponding legend colorbars are sometimes labeled with -5, 0, 5 but other times as -8, -6, -4, ... , 4, 6, even though the data of all those heatmaps are in the same range. I attach a small image showing what I mean.
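
A possible approach, sketched with base R only, is to compute one set of break points from the pooled range of all matrices and reuse it for every heatmap. The matrices `m1` and `m2` are made-up data. Passing the result through aheatmap's `color` and `breaks` arguments is an assumption, and whether the legend ticks then land exactly on these values has not been verified.

```r
# Two hypothetical matrices whose values lie in roughly the same range.
set.seed(1)
m1 <- matrix(runif(20, -5, 5), nrow = 4)
m2 <- matrix(runif(20, -8, 6), nrow = 4)

# Derive one set of round break points from the pooled range ...
common_breaks <- pretty(range(m1, m2), n = 8)

# ... and one colour per interval between consecutive breaks.
common_cols <- colorRampPalette(c("blue", "white", "red"))(length(common_breaks) - 1)

# Assumption: the same values would then be passed to every heatmap call,
# e.g. aheatmap(m1, color = common_cols, breaks = common_breaks).
common_breaks
```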

R NMF Package

Why does the nmf method keep reporting the error below?

Error: NMF::nmf - Input matrix x contains at least one null or NA-filled row.

I have double-checked the matrix that I am passing for null values. The matrix was created from a document-term matrix, which in turn was created with the tm package.

Cheers!
Verditer
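
In this message a "null" row most likely means a row whose entries are all zero, which can happen in a document-term matrix when a document has no terms left after preprocessing. That reading is an assumption. The base R sketch below uses a made-up count matrix to show how such rows can be found and dropped; a tm document-term matrix would first need converting to an ordinary matrix.

```r
# Hypothetical document-term counts: document "d2" has no terms left.
x <- matrix(c(1, 0, 2,
              0, 0, 0,
              3, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("t1", "t2", "t3")))

# For a non-negative matrix, a row sum of zero (ignoring NAs) means the row
# is entirely zero or entirely NA.
empty_row <- rowSums(x, na.rm = TRUE) == 0
names(which(empty_row))

# Keep only rows with at least one non-zero entry.
x_ok <- x[!empty_row, , drop = FALSE]
dim(x_ok)
```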

Is it possible to pre-define columns into lower ranks / clusters in NMF?

Say I have a 200000 x 20 matrix and want to run NMF with rank=5. I also want to pre-assign at least 5 of these 20 columns to 5 different clusters. In other words, based on prior knowledge, I want column 3, column 7, column 9, column 10 and column 11 to each fall into a different rank/cluster.

I would appreciate any hints for implementing this. Thanks.

Possible bug in NMF version 0.20.5 with estimating the rank

I am running NMF 0.20.5 on R version 3.1.1 (2014-07-10) on Windows and also tried R version 3.1.0 (2014-04-10) on Linux, with the same results:

estim <- nmf(x, 2:4, nrun = 10, seed = 123456,.opt = "vp")
Compute NMF rank= 2 ... + measures ...
ERROR
Compute NMF rank= 3 ... + measures ...
ERROR
Compute NMF rank= 4 ... + measures ...
ERROR
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> cannot allocate vector of size 1533.5 Gb
-#2 [r=3] -> cannot allocate vector of size 1533.5 Gb
-#3 [r=4] -> cannot allocate vector of size 1533.5 Gb

The size of x is 453678 x 32. I was able to run this rank estimation in the past. Note that I can run individual nmf fits just fine:

res <- nmf(x,rank=6,nrun=20,.options='vp')
NMF algorithm: 'brunet'
Multiple runs: 20
Mode: parallel (23/24 core(s))
Runs: | | 0%
Runs: |==================================================| 100%
System time:
user system elapsed
49871.061 146.240 3347.401

Regards,
Gabe Haarsma
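
The 1533.5 Gb figure in the error can be reproduced with simple arithmetic: it matches a dense square matrix of doubles with one row and one column per row of x. This suggests, without proving it, that some step of the rank estimation builds an object over the 453678 rows rather than over the 32 columns. Which step does so is not determined here.

```r
n_rows <- 453678          # number of rows of x, taken from the report
bytes_per_double <- 8

# Memory needed for a dense n_rows x n_rows matrix of doubles, in GiB
gib <- n_rows^2 * bytes_per_double / 1024^3
round(gib, 1)             # 1533.5
```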

feature request: add labels for annotations and ability to cut dendrogram with different colors

It's really nice to see a legend for column/row annotations, but once there are many column annotations, it is not easy to find the annotation row that corresponds to a given factor/variable. Is it possible to add a label for each annotation variable/factor?

Another feature I found useful in the heatmap3 package is the ability to cut the dendrogram and separate clusters with different background colors. Would it be possible to add such a feature to aheatmap?

Thanks,
Jay

aheatmap: Stagger row labels

I have 100 rows, which makes the font size of the row labels about 5 pt. Do you have any suggestions on how to make the printed row labels legible? Perhaps they could be staggered like so, which would allow doubling the font size to 10 pt, a big improvement over 5 pt.

.	.
row1
	row2
row3
	row4
row5
…	…

Alternatives to consensushc for predict from NMFfitX

Hi Renaud,

I'm trying to reproduce a clustering similar to the one described in this paper. I believe that the first part of their algorithm is identical to a nmf run, but their approach to derive clusters from multiple runs seems different. To quote:

_Cluster_
A partition-clustering algorithm was applied to the set of matrices S_P to cluster the data into N clusters. A variation of k-means, where each signature for ∀P ∈ S_P is assigned to exactly one cluster, was used to partition the data. Similarities between mutational signatures were calculated using a cosine similarity (see below) whereas the N centroids were calculated by averaging the signatures belonging to each cluster. The iteration-averaged matrix P was formed by combining the N centroid vectors ordered by their reproducibility (see Step 6). The error bars reported for each mutation type in each signature in P were calculated as the SD of the corresponding mutation type in each centroid over the I iterations. Note that clustering the data in S_P effectively results in clustering S_E as each signature unambiguously corresponds to exactly one exposure, thus allowing derivation of E.
_Evaluate_
The reproducibility of the derived average signatures P is evaluated by examining the tightness and separation of the clusters used to form the centroids in P (see Step 5). More specifically, using cosine similarity, the average silhouette width for each of the N clusters is calculated. An average silhouette width of 1.00 is equivalent to consistently deciphering the same mutational signature, whereas a low silhouette width indicates lack of reproducibility of the solution. The average silhouette width (Rousseeuw, 1987) of the N clusters is used as a measure of reproducibility for the whole solution.

I freely admit that I don't really understand the math behind these two paragraphs, but it seems to me to be not quite the same as the hierarchical clustering that is performed in the predict function. Do you think this alternative approach could be implemented in the NMF package in the future? FWIW, the Matlab code for their method is available here and actually somewhat readable.

Thanks in any case for having provided this fantastic library which I use almost daily. Best wishes,

Ben
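
The cosine similarity step from the quoted passage can be sketched in base R. This is only an illustration of the similarity measure, not the paper's full clustering procedure and not something the NMF package is known to provide; `cosine_sim`, `P1` and `P2` are hypothetical names and data.

```r
# Cosine similarity between the columns of two matrices
cosine_sim <- function(A, B = A) {
  An <- sweep(A, 2, sqrt(colSums(A^2)), "/")   # scale columns to unit length
  Bn <- sweep(B, 2, sqrt(colSums(B^2)), "/")
  crossprod(An, Bn)                            # t(An) %*% Bn
}

# Hypothetical signatures (columns) from two runs; run 2 is run 1 with the
# columns swapped and rescaled.
P1 <- cbind(s1 = c(4, 1, 0), s2 = c(0, 2, 5))
P2 <- cbind(a = 3 * P1[, "s2"], b = 0.5 * P1[, "s1"])

S <- cosine_sim(P1, P2)

# Match each signature of run 1 to its most similar signature in run 2
apply(S, 1, which.max)
```

Because cosine similarity ignores scale, rescaled copies of the same signature still match with similarity 1, which is why it suits comparing signatures across runs.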

NMF residuals: final objective value is NA

The NMF model converged in 410 iterations but yielded an NA objective function value. We have been seeing this issue very often with highly sparse data. I am wondering whether there is any option in the nmf() function, or a specific choice of method, that helps to resolve this issue.

System information:
NMF_0.20.5 ; system: x86_64, linux-gnu; "R version 3.1.2 (2014-10-31)"

str(Train2_DSN)
'data.frame': 10193493 obs. of 168 variables:
$ X00 : int 0 0 0 0 0 0 0 0 0 0 ...
$ X1N : int 1 1 0 0 0 0 1 1 0 1 ...
$ X1U : int 0 0 0 0 0 0 0 0 0 0 ...
$ X1X : int 0 0 0 0 0 0 0 0 0 0 ...
$ X1Y : int 0 0 0 0 0 0 0 0 0 0 ...

system.time(NMF_res2<-nmf(Train2_DSN, rank=8, nrun=1, method="ns", theta=0.3, seed='nndsvd'));

NMF algorithm: 'nsNMF'
NMF seeding method: nndsvd
Iterations: 50/2000 Iterations: 100/2000 Iterations: 150/2000 Iterations: 200/2000 Iterations: 250/2000 Iterations: 300/2000 Iterations: 350/2000 Iterations: 400/2000 Iterations: 400/2000
DONE (converged at 410/2000 iterations)
user system elapsed
74499.012 200.859 74712.015

Warning message:
In .local(x, rank, method, ...) :
NMF residuals: final objective value is NA
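
One plausible source of a missing objective value with very sparse data, assuming a KL-type objective, is a `0 * log(0 / 0)` term where both the target and the fitted value are zero. Whether this is what happens inside the package has not been verified. The base R sketch below only illustrates the arithmetic, using hypothetical functions `kl_naive` and `kl_guarded`.

```r
# Naive generalized KL divergence between a target v and an estimate vhat
kl_naive <- function(v, vhat) sum(v * log(v / vhat) - v + vhat)

# Variant that applies the usual convention 0 * log(0) = 0
kl_guarded <- function(v, vhat) {
  term <- ifelse(v > 0, v * log(v / vhat), 0)
  sum(term - v + vhat)
}

v    <- c(0, 1, 0, 2)
vhat <- c(0, 1, 0.5, 2)   # first entry: target and estimate are both zero

kl_naive(v, vhat)     # NaN, because 0 * log(0 / 0) is NaN
kl_guarded(v, vhat)   # 0.5
```

Checking the input for columns or rows that are entirely zero is a cheap first diagnostic in this situation.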

Feature request - symbols in heatmap

We had this in our code base and it was very helpful for plots. Just thought I'd mention it.

Every cell can have a symbol and the symbols can be of various size. Cell border control would also be needed.

Two examples:

genomic oncoprints
dotmaps

Typo in vignette

Typos found in vignettes/aheatmaps.Rnw and vignettes/aheatmaps.tex:
intall.pacakges should be install.packages.

aheatmap: Line width of the tree

Is it possible to set the line width of the tree of aheatmap? Any other tips for producing figures for publication?

Error in process_tracks

I am puzzled about how to fix the following error.
Basically, I do something like the following with R 3.2.0 and NMF 0.20.5:

NMF.models <- nmf(m.2, seq(k.min, k.max + 1, k.incr), nrun = number_of_runs, method="brunet", seed=9876)
consensusmap(NMF.models)

Error in process_tracks(x, tracks, annRow, annCol) :
Invalid special annotation track name ['basis:', 'consensus:', 'silhouette:']. Should partially match one of 'basis:', 'consensus:', 'silhouette:'.

I was using this code and library a year ago, and I don't think I had this problem back then!

R CMD check failures with stringr 1.0.0

Could you please take a look?

checking dependencies in R code ... NOTE
'library' or 'require' calls in package code:
  ‘Biobase’ ‘bigmemory’ ‘devtools’ ‘knitr’ ‘synchronicity’
  Please use :: or requireNamespace() instead.
  See section 'Suggested packages' in the 'Writing R Extensions' manual.

checking R code for possible problems ... NOTE
.wrapResult: no visible global function definition for ‘exprs’
devnmf: no visible global function definition for ‘load_all’
nmfReport: no visible global function definition for ‘knit2html’
posICA: no visible binding for global variable ‘fastICA’
posICA: no visible global function definition for ‘fastICA’
runit.lsnmf: no visible global function definition for ‘checkTrue’
setupLibPaths: no visible global function definition for ‘load_all’
test.match_atrack : .check: no visible global function definition for
  ‘checkEquals’

checking Rd cross-references ... NOTE
Packages unavailable to check Rd xrefs: ‘Biobase’, ‘RcppOctave’

checking data for non-ASCII characters ... NOTE
  Error in .requirePackage(package) : 
    unable to find required package 'Biobase'
  Calls: <Anonymous> ... .extendsForS3 -> extends -> getClassDef -> .requirePackage
  Execution halted

checking examples ... ERROR
Running examples in ‘NMF-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: NMF-package
> ### Title: Algorithms and framework for Nonnegative Matrix Factorization
> ###   (NMF).
> ### Aliases: NMF NMF-package
> ### Keywords: package
> 
> ### ** Examples
> 
> ## Don't show: 
> # roxygen generated flag
> options(R_CHECK_RUNNING_EXAMPLES_=TRUE)
> ## End Don't show
> 
> # generate a synthetic dataset with known classes
> n <- 50; counts <- c(5, 5, 8);
> V <- syntheticNMF(n, counts)
> 
> # perform a 3-rank NMF using the default algorithm
> res <- nmf(V, 3)
> 
> basismap(res)
Error in process_tracks(x, tracks, annRow, annCol) : 
  Invalid special annotation track name ['basis']. Should partially match one of 'basis', ':basis', 'basis:'.
Calls: basismap -> basismap -> .local -> process_tracks
Execution halted

NMF Question

I am hoping to use your NMF package for chemical fingerprinting. Basically, chemical data from a series of samples is used to determine the end-member compositions and the contribution of each end member to each sample. For the input matrix, rows are samples and columns are the chemical concentrations. The concentrations are normalized, meaning each row sums to 1. The nmf output should exhibit closure, meaning the rows of h should sum to 1. I have been reviewing the source code and have been unable to determine whether an option is available to impose closure (rows of h sum to 1). Is this implemented? If not, do you have any advice on how best to attempt this myself? I would consider myself an intermediate R user.

Thanks for your time
Mike
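
If no built-in option exists, closure can at least be imposed after the fit by moving the row sums of h into w, since W H = (W D)(D^-1 H) for D = diag(rowSums(H)). The base R sketch below shows this rescaling on made-up factors; it is a post-processing step under that assumption, not a constrained NMF algorithm.

```r
set.seed(42)
# Hypothetical factors: 6 samples, 2 end members, 4 chemical species
W <- matrix(runif(6 * 2), nrow = 6)   # contributions (samples x end members)
H <- matrix(runif(2 * 4), nrow = 2)   # compositions (end members x species)

# Move the row sums of H into W: W H = (W D)(D^-1 H), D = diag(rowSums(H))
d  <- rowSums(H)
H1 <- H / d                   # divides row i of H by d[i]
W1 <- sweep(W, 2, d, "*")     # multiplies column i of W by d[i]

rowSums(H1)                       # each row now sums to 1
all.equal(W %*% H, W1 %*% H1)     # the product is unchanged
```

When the rows of the input also sum to 1 and the fit is close, the rows of the rescaled w should then sum to approximately 1 as well.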

stringr version

Trying to install NMF, I get an error about stringr versions.
I first tried, without success, to fix this on R 3.1.2 (CentOS, Mac and Ubuntu). I have now done a clean install of R 3.2.1 on CentOS and, after updating stringr, get:

R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

> packageVersion('stringr')
[1] ‘1.0.0’
> library(stringr)
> install.packages("NMF")
Installing package into ‘/home/ashley/R/x86_64-unknown-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.rstudio.com/src/contrib/NMF_0.20.6.tar.gz'
Content type 'application/x-gzip' length 1764466 bytes (1.7 MB)
==================================================
downloaded 1.7 MB

* installing *source* package ‘NMF’ ...
** package ‘NMF’ successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/local/lib64/R/include         -I/usr/local/include    -fpic  -O2        -c distance.cpp -o distance.o
g++ -I/usr/local/lib64/R/include         -I/usr/local/include    -fpic  -O2        -c divergence.cpp -o divergence.o
g++ -I/usr/local/lib64/R/include         -I/usr/local/include    -fpic  -O2        -c euclidean.cpp -o euclidean.o
g++ -I/usr/local/lib64/R/include         -I/usr/local/include    -fpic  -O2        -c utils.cpp -o utils.o
g++ -shared -L/usr/local/lib64/R/lib -L/usr/local/lib64 -o NMF.so distance.o divergence.o euclidean.o utils.o -L/usr/local/lib64/R/lib -lR
installing to /home/ashley/R/x86_64-unknown-linux-gnu-library/3.2/NMF/libs
** R
** data
** demo
** inst
** preparing package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  namespace ‘stringr’ 0.6.2 is already loaded, but >= 1.0.0 is required
ERROR: lazy loading failed for package ‘NMF’
* removing ‘/home/ashley/R/x86_64-unknown-linux-gnu-library/3.2/NMF’

The downloaded source packages are in
    ‘/tmp/Rtmp9oK6kO/downloaded_packages’
Warning message:
In install.packages("NMF") :
  installation of package ‘NMF’ had non-zero exit status
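
The log shows stringr 1.0.0 in the interactive session, while the installation step, which runs in a separate R process, loads 0.6.2. One possible explanation is that an older copy of stringr sits in another library directory that the installing process searches first. This is a guess, not a confirmed diagnosis. The base R sketch below lists the version found in each library path; `versions_by_lib` is a hypothetical helper.

```r
# Hypothetical helper: report the version of `pkg` found in each library path
versions_by_lib <- function(pkg, libs = .libPaths()) {
  vapply(libs, function(lib) {
    desc <- file.path(lib, pkg, "DESCRIPTION")
    if (file.exists(desc)) {
      unname(read.dcf(desc, fields = "Version")[1, 1])
    } else {
      NA_character_   # package not installed in this library
    }
  }, character(1))
}

versions_by_lib("stringr")
```

If two different versions show up, removing or updating the older copy in its own library is one thing to try.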

NMF with individual weight

Hi,

I am currently working on a dataset with 1624 individuals and 43 variables. I would like to run an analysis with one weight per individual. Is that possible in your package? Do you have an idea of how to take individual weights into account, as PCA or factor analysis does in some other packages? My data come from a sample that has to be representative of the whole population studied, which is what the "weight" variable is for.

Thanks,
Rozenn

maxIter error

Hi, I'm running 0.20.6, and am getting an error related to maxIter:

> ret <- nmf(cg_full, rank=rank, method='ls-nmf', weight=as.numeric(cg_full>0), nrun=nrun,
+            maxIter=100,
+            seed="random", 
+            .options = list(parallel=TRUE, track=TRUE, verbose=1)
+ ) 
NMF algorithm: 'ls-nmf'
NMF seeding method: random
Error in obj.fun(x, y, ...) : unused argument (maxIter = 100)

What could be causing this?
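
The message itself is R's standard error for a function that receives a named argument it does not declare and that has no `...` to absorb it. The base R sketch below reproduces that mechanism with made-up functions (`obj_strict`, `obj_lenient`, `run`); it does not identify which function in the package is involved.

```r
# Hypothetical objective function with a fixed signature (no `...`)
obj_strict <- function(x, y, weight) sum(weight * (x - y)^2)

# The same objective, tolerant of extra arguments it does not use
obj_lenient <- function(x, y, weight, ...) sum(weight * (x - y)^2)

# Hypothetical driver that forwards every extra argument to the objective
run <- function(obj, ...) obj(x = 1:3, y = c(1, 2, 5), ...)

# The strict version fails because maxIter is forwarded to it
res <- tryCatch(run(obj_strict, weight = 1, maxIter = 100),
                error = function(e) conditionMessage(e))
res

# The lenient version ignores maxIter and returns the weighted squared error
run(obj_lenient, weight = 1, maxIter = 100)   # 4
```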

display discrete values i.e. copy number

Best heatmaps available!

I haven't had much luck displaying discrete data such as copy number or mutation presence/absence. Specifically, using breaks and col, can you limit the plot to display only three values (blue, white, red), avoid the ramp in the legend, and change the legend labels from numeric values to something informative (loss, neutral, gain)?

On a related note, any recommendation on appropriate clustering for discrete data would be helpful. In the past I've used Jaccard distance and Ward linkage.
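
A base R sketch of both parts of this question, on a made-up copy-number matrix: three intervals mapped to three colours and labels, and Jaccard distance with Ward linkage for the clustering. Passing `brks` and `cols` to aheatmap through its `breaks` and `color` arguments is an assumption, and replacing the legend labels is not covered here.

```r
# Hypothetical copy-number calls: -1 = loss, 0 = neutral, 1 = gain
cn <- rbind(gene1 = c(-1, -1,  0, 1, 1),
            gene2 = c(-1,  0,  0, 1, 1),
            gene3 = c( 0,  0,  0, 0, 1),
            gene4 = c( 1,  1, -1, 0, 0))
colnames(cn) <- paste0("s", 1:5)

# Three intervals, three colours, three labels
brks  <- c(-1.5, -0.5, 0.5, 1.5)
cols  <- c("blue", "white", "red")
state <- cut(cn, breaks = brks, labels = c("loss", "neutral", "gain"))
table(state)

# Jaccard distance on presence/absence of any alteration, then Ward linkage.
# dist(method = "binary") treats non-zero entries as "on".
altered <- (cn != 0) * 1
hc <- hclust(dist(t(altered), method = "binary"), method = "ward.D2")
hc$order
```

The binary distance ignores the direction of the change; keeping loss and gain as separate indicator rows is one way to retain it.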