MultivariateStats.jl's People

Contributors

alexmorley, andreasnoack, ararslan, bicycle1885, dkarrasch, eahenle, francis-gagnon, iainnz, jiahao, jonathanbieler, juliatagbot, kescobo, kolaru, kronosthelate, lindahua, mateuszbaran, maximsch2, moritzketzer, mrkrause, nalimilan, nico202, okonsamuel, oxinabox, palday, pallharaldsson, pnavaro, rofinn, timholy, tk3369, wildart

MultivariateStats.jl's Issues

The example for multivariate linear regression in the docs does not work.

This example fails with a DimensionMismatch:

using MultivariateStats
X = rand(1000, 3)
A0, b0 = rand(3, 5), rand(1, 5)
Y = (X * A0 .+ b0) + 0.1 * randn(1000, 5)

# solve using llsq
sol = llsq(X, Y)

# extract results
A, b = sol[1:end-1,:], sol[end,:]

# do prediction
Yp = X * A .+ b
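
Until the documented call works, a hedged workaround is to solve the same augmented least-squares problem directly with Julia's backslash operator (plain base linear algebra, not the package's API):

X = rand(1000, 3)
A0, b0 = rand(3, 5), rand(1, 5)
Y = (X * A0 .+ b0) + 0.1 * randn(1000, 5)

sol = [X ones(size(X, 1))] \ Y        # 4×5 solution: rows 1-3 are A, row 4 is b
A, b = sol[1:end-1, :], sol[end:end, :]
Yp = X * A .+ b                       # predictions, as the example intended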

[PkgEval] MultivariateStats may have a testing issue on Julia 0.4 (2015-01-17)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.4

  • On 2015-01-16 the testing status was Tests pass.
  • On 2015-01-17 the testing status changed to Tests fail, but package loads.

Tests pass. means that PackageEvaluator found the tests for your package, executed them, and they all passed.

Tests fail, but package loads. means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using worked.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt out of these status-change messages, reply to this message saying so and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl, please file an issue at its repository; for example, your package may be untestable on the test machine due to a dependency, in which case an exception can be added.

Test log:

>>> 'Pkg.add("MultivariateStats")' log
INFO: Installing MultivariateStats v0.1.3
INFO: Package database updated

>>> 'using MultivariateStats' log
Julia Version 0.4.0-dev+2756
Commit 4b20e10 (2015-01-17 03:18 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

>>> test log
Running tests:
 * lreg.jl
 * whiten.jl

ERROR: LoadError: LoadError: UndefVarError: Triangular not defined
 in cov_whitening at /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/src/whiten.jl:9
 in include at ./boot.jl:248
 in include_from_node1 at ./loading.jl:128
 in anonymous at no file:15
 in include at ./boot.jl:248
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:312
 in _start at ./client.jl:396
while loading /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/test/whiten.jl, in expression starting on line 23
while loading /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/test/runtests.jl, in expression starting on line 12
INFO: Testing MultivariateStats
==========================[ ERROR: MultivariateStats ]==========================

failed process: Process(`/home/idunning/julia04/usr/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/test/runtests.jl`, ProcessExited(1)) [1]

================================================================================
INFO: No packages to install, update or remove
ERROR: MultivariateStats had test errors
 in error at error.jl:19
 in test at pkg/entry.jl:717
 in anonymous at pkg/dir.jl:28
 in cd at ./file.jl:20
 in cd at pkg/dir.jl:28
 in test at pkg.jl:68
 in process_options at ./client.jl:234
 in _start at ./client.jl:396


>>> end of log
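
For context: Julia 0.4 removed the old Triangular type in favour of UpperTriangular/LowerTriangular, which is what breaks cov_whitening above. A minimal modern sketch of the underlying operation (not the package's code):

using LinearAlgebra

C = [4.0 2.0; 2.0 3.0]            # a covariance matrix
F = cholesky(Symmetric(C))
W = inv(F.U)                      # whitening matrix from the upper triangular factor
W' * C * W ≈ I                    # true: the whitened covariance is the identity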

Returning eigenvalues from classical_mds?

Hello.
It would be very nice if the classical_mds function returned not only the configuration matrix for a given distance matrix but also the eigenvalues of the Gram matrix of the configurations, just like MATLAB's cmdscale function does. By examining those eigenvalues, one can judge a good embedding dimension for classical MDS. It should be easy, since those eigenvalues are already computed inside classical_mds anyway.
Thanks!
BVP
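
For reference, a minimal sketch of what the request amounts to (textbook classical MDS, not the package's implementation):

using LinearAlgebra

# Classical MDS that also returns the Gram-matrix eigenvalues, in the spirit
# of MATLAB's cmdscale. D is an n×n matrix of pairwise distances and p is
# the requested embedding dimension.
function cmds_with_eigvals(D::AbstractMatrix, p::Int)
    n = size(D, 1)
    J = I - fill(1 / n, n, n)              # centering matrix
    G = -J * (D .^ 2) * J / 2              # double-centered Gram matrix
    E = eigen(Symmetric(G))
    ord = sortperm(E.values; rev=true)     # largest eigenvalues first
    λ, V = E.values[ord], E.vectors[:, ord]
    Y = V[:, 1:p] .* sqrt.(max.(λ[1:p], 0))'
    return Y', λ                           # p×n configuration plus all eigenvalues
end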

Varimax rotation in Factor Analysis?

I'm wondering whether there is any plan to implement the so-called varimax rotation and other rotations of factors or PCA loadings, as MATLAB's rotatefactors function in the Statistics Toolbox does.
Thanks!
BVP
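
For reference, a hedged sketch of the classic varimax criterion (the standard SVD-based iteration, independent of this package's internals):

using LinearAlgebra

# Rotate a d×k loading matrix L to maximize the varimax criterion.
function varimax(L::AbstractMatrix; maxiter=1000, tol=1e-8)
    d, k = size(L)
    R = Matrix{Float64}(I, k, k)
    obj = 0.0
    for _ in 1:maxiter
        Λ = L * R
        F = svd(L' * (Λ .^ 3 .- Λ .* (sum(Λ .^ 2; dims=1) ./ d)))
        R = F.U * F.Vt
        newobj = sum(F.S)
        newobj - obj < tol * obj && break
        obj = newobj
    end
    return L * R, R              # rotated loadings and the rotation matrix
end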

Warning if iterative algorithms (e.g. fastica!) do not converge?

Would anyone else find it useful to have a warning if iterative algorithms hit maxiter instead of reaching tol? There could be another kwarg to turn this behaviour on/off. Happy to make a PR if anyone thinks it's a good idea; something like the sketch below.
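
A hypothetical shape for it (names invented for illustration):

# Run an in-place update until the step size drops below tol; optionally
# warn when the loop exits on maxiter instead.
function iterate!(update!, x; maxiter=100, tol=1e-6, warn_nonconv=true)
    δ = Inf
    for _ in 1:maxiter
        δ = update!(x)          # returns the size of the last step
        δ < tol && return x
    end
    warn_nonconv && @warn "stopped at maxiter=$maxiter with step $δ > tol=$tol"
    return x
end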

Type restriction on ICA

Fitting an ICA is currently defined as fit(::Type{ICA}, X::DenseMatrix{Float64}, k::Int64) [1]

But Matrix{Float64} <: DenseMatrix{Float64} is essentially the only case this admits: with any other element type (e.g. Matrix{Float32}), calling fit on ICA is not possible at all.

[1]

function fit(::Type{ICA}, X::DenseMatrix{Float64}, # sample matrix, size (m, n)
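
A minimal demonstration of the dispatch gap, using a hypothetical stand-in for the fit method:

f(X::DenseMatrix{Float64}) = size(X)     # stands in for the ICA fit signature

f(rand(Float64, 3, 4))                   # works
# f(rand(Float32, 3, 4))                 # MethodError: Matrix{Float32} does not match

g(X::DenseMatrix{T}) where {T<:Real} = size(X)   # a parametric signature instead
g(rand(Float32, 3, 4))                   # works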

Support DataFrames

At the moment I am trying to plot some information about a set of texts, and it would be great if this package supported DataFrames, so that I could keep track of the titles of the texts and plot them with Gadfly to actually see what is going on in the data.
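
In the meantime, a hedged workaround sketch (the column names are hypothetical): convert the numeric columns to a matrix with observations in columns, keeping the titles alongside for labelling.

using DataFrames, MultivariateStats

df = DataFrame(title=["a", "b", "c", "d"], x1=rand(4), x2=rand(4), x3=rand(4))
X = permutedims(Matrix(df[:, Not(:title)]))   # 3×4 matrix, observations in columns
M = fit(PCA, X; maxoutdim=2)
Y = transform(M, X)                           # scores to plot, labelled by df.title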

[Kernel PCA] pairwise! function and kernel centering

Hello,

In Kernel PCA the pairwise! function does not work when generating non-symmetric kernels:

julia> MultivariateStats.pairwise((x,y)->x'y, [1.0 2 3], [1.0 2.0 3.0 4.0 5.0])
3×5 Array{Float64,2}:
 1.0  2.0  3.0  6.95024e-310  6.95024e-310
 2.0  4.0  6.0  6.95024e-310  6.95024e-310
 3.0  6.0  9.0  6.95024e-310  6.95024e-310

Also, kernel centering may not be necessary when transforming (line 50 in kpca.jl). As I am not very familiar with Kernel PCA, I based my assumption on this and the fact that the centering code works on symmetric kernels only.

On a non-symmetric kernel, for testing data with more samples than the training data, the inner for loop fails because the second dimension of the input kernel matrix (i.e. the number of input test samples) is larger than the length of the means vector C.means.

Best regards,
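
A minimal sketch of a pairwise evaluation that fills the whole rectangular matrix (not the package's function):

# K[i, j] = κ(x_i, y_j) for every pair of columns, symmetric or not.
function pairwise_kernel(κ, X::AbstractMatrix, Y::AbstractMatrix)
    K = Matrix{Float64}(undef, size(X, 2), size(Y, 2))
    for j in axes(Y, 2), i in axes(X, 2)
        K[i, j] = κ(view(X, :, i), view(Y, :, j))
    end
    return K
end

pairwise_kernel((x, y) -> sum(x .* y), [1.0 2 3], [1.0 2 3 4 5])   # 3×5, every entry set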

Julia 0.5.2 syntax: invalid character "‰"

Running this test from ppca.jl with Julia v0.5.2, the approximately-equal character (≈) is not recognized in an ipynb.

@test transform(M, X[:,1]) ≈ T * (X[:,1] .- mv) and all the other tests using ≈ give:

syntax: invalid character "‰"

Pkg.test("MultivariateStats") passed.
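
Since ≈ is just the infix form of isapprox, the same test can be written in plain ASCII as a workaround for the encoding problem (same variables as the quoted test):

using Base.Test   # `using Test` on Julia >= 0.7

@test isapprox(transform(M, X[:,1]), T * (X[:,1] .- mv))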

Tag a new release?

Lots of interesting stuff has happened since the last release one year ago (in particular, factor analysis); would it be possible to tag a new release?

Need exception handling if only negative eigenvalues in classical_mds

I ran classical_mds at a late stage in a long set of processing steps and got the warning that there were only 0 non-negative eigenvalues. Since the data is large I can't paste it in here. Maybe throw an exception if there are fewer than p eigenvalues above 0, as sketched below? Or just select the largest ones?
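
A hypothetical guard along the lines proposed:

# Fail with a clear message (or let the caller truncate) when fewer than
# p eigenvalues of the Gram matrix are positive.
function check_embedding_rank(λ::AbstractVector, p::Int)
    npos = count(>(0), λ)
    npos >= p || throw(ArgumentError(
        "classical MDS: requested $p dimensions but only $npos eigenvalues are positive"))
    return npos
end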

least squares implementation

AFAICT the implementation of linear regression forms the X'X matrix, then calculates (X'X)⁻¹(X'y) using a Cholesky factorization.

I am curious why this was chosen. To the best of my knowledge, orthogonal (QR) methods are more stable (if a bit more costly), and of course the SVD is the best solution for nearly rank-deficient matrices. A comparison sketch follows below.
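
For comparison, the three standard routes in base Julia (a sketch on synthetic data):

using LinearAlgebra

# min ‖X*a - y‖²: the normal equations square the condition number, QR does
# not, and the SVD pseudoinverse also copes with near rank deficiency.
X, y = randn(1000, 3), randn(1000)
a_chol = cholesky(Symmetric(X' * X)) \ (X' * y)   # normal equations + Cholesky
a_qr   = X \ y                                    # QR, Julia's default for rectangular X
a_svd  = pinv(X) * y                              # SVD-based pseudoinverse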

SubspaceLDA not working for Float32 data

I run into a method error when I attempt to fit a SubspaceLDA model to some Float32 data.

julia> typeof(all_clips)
Array{Float32,2}
julia> size(all_clips)
(91, 6488)
julia> typeof(labels)
Array{Int64,1}
julia> size(labels)
(6488,)
julia> m = fit(SubspaceLDA, all_clips, labels; normalize = true)
ERROR: MethodError: no method matching SubspaceLDA(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,1}, ::Array{Float32,2}, ::Array{Int64,1})
Closest candidates are:
  SubspaceLDA(::Array{T<:Real,2}, ::Array{T<:Real,2}, ::Array{T<:Real,1}, ::Array{T<:Real,2}, ::Array{Int64,1}) where T<:Real at /home/glynch/.julia/packages/MultivariateStats/nNJuu/src/lda.jl:203
Stacktrace:
 [1] #fit#26(::Bool, ::Function, ::Type{SubspaceLDA}, ::Array{Float32,2}, ::Array{Int64,1}, ::Int64) at /home/glynch/.julia/packages/MultivariateStats/nNJuu/src/lda.jl:247
 [2] #fit at ./none:0 [inlined] (repeats 2 times)
 [3] top-level scope at none:0

However, fitting works if I first convert the data to Float64. In contrast, MulticlassLDA does not complain about Float32 data.
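
The workaround from the report, as a one-liner (same variable names):

m = fit(SubspaceLDA, Float64.(all_clips), labels; normalize = true)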

floating point issue with PCA principal variance checking

On my computer the following code throws an error; on other computers it works fine, probably due to floating-point inaccuracies:

using MultivariateStats
using RDatasets

data = convert(Matrix{Float64},dataset("datasets", "iris")[:,1:4])'

m = mean(data,2)
sd = std(data,2)
data2 = Float64[ (data[i,j] - m[i]) / sd[i] for i in 1:size(data,1), j in 1:size(data,2) ]

dataPCA = fit(PCA,data2,pratio = 1.0)

ERROR: ArgumentError: principal variance cannot exceed total variance.
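
Not the package's code, but a minimal illustration of why a strict comparison of two independently accumulated sums is flaky:

xs = fill(0.1, 10)
sum(xs) == 1.0     # false: the sum is 0.9999999999999999
sum(xs) ≈ 1.0      # true: isapprox absorbs the rounding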

Any plan for multivariate EMD?

Hi Folks,

I happen to have worked with this algorithm (MEMD) a lot recently. There is currently a MATLAB version, but I am starting my project in Julia and haven't had much luck finding one yet.

It would be fantastic to have this algorithm in Julia. I just want to know whether any of you are thinking about this.

DomainError for FactorAnalysis

Hi,

I am trying to run factor analysis on a matrix of size 2457×19 (all Float64, quite sparse). When calling fit(FactorAnalysis, X) I get the error:

ERROR: DomainError:
sqrt will only return a complex result if called with a complex argument. Try sqrt(complex(x)).
Stacktrace:
[1] sqrt(::Float64) at .\math.jl:425
[2] #facm#61(::Int64, ::Float64, ::Int64, ::Float64, ::Function, ::Array{Float64,2}, ::Array{Float64,1}, ::Int64) at C:\Users\St. Elmo.julia\v0.6\MultivariateStats\src\fa.jl:118
[3] (::MultivariateStats.#kw##facm)(::Array{Any,1}, ::MultivariateStats.#facm, ::Array{Float64,2}, ::Array{Float64,1}, ::Int64) at .<missing>:0
[4] #fit#70(::Symbol, ::Int64, ::Void, ::Float64, ::Int64, ::Float64, ::Function, ::Type{MultivariateStats.FactorAnalysis}, ::Array{Float64,2}) at C:\Users\St. Elmo.julia\v0.6\MultivariateStats\src\fa.jl:172
[5] fit(::Type{MultivariateStats.FactorAnalysis}, ::Array{Float64,2}) at C:\Users\St. Elmo.julia\v0.6\MultivariateStats\src\fa.jl:164

I'm not doing anything fancy before running the method, so there might be a bug in the actual code. I could also supply my input matrix if you want... What can I do to fix this?

FastICA missing a complex variant?

This would be useful as described in

E. Bingham and A. Hyvarinen, “A fast fixed-point algorithm for independent components analysis of complex valued signals,” International Journal of Neural Systems, vol. 10, pp. 1–8, Feb. 2000

SVD and LAPACK

Hi,

While doing:

M = MultivariateStats.fit(PCA,Xsc; method=:svd, mean=0, pratio = 0.99)

where Xsc is a scaled and centred matrix, I get the following error:

ERROR: LoadError: Base.LinAlg.LAPACKException(12)
in gesdd! at linalg/lapack.jl:1482
in svdfact! at linalg/svd.jl:17
in svdfact at linalg/svd.jl:23
in pcasvd at /.julia/v0.4/MultivariateStats/src/pca.jl:118

I am assuming that the matrix itself might be the problem and I am trying to take remedial action. However, I am unable to track down or interpret the error. Any help is greatly appreciated.

Thanks
Alan
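
A hedged first diagnostic: gesdd! failures often trace back to NaN or Inf entries, e.g. from dividing by a zero standard deviation during scaling, so it is worth checking before the fit:

any(!isfinite, Xsc) && error("Xsc contains NaN or Inf entries")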

JuliaBox Package unresolved

This might be related to #73.
In JuliaBox with a 1.0.0 kernel, running
Pkg.add("MultivariateStats"); using MultivariateStats
results in the error:

The following package names could not be resolved:
* MultivariateStats (not found in project, manifest or registry)
Please specify by known `name=uuid`. 

Performing EFA possible?

Am I missing something, or is exploratory factor analysis not implemented yet? If not, are there any plans to add it?

LDA Method Error

Hi there,

I'm having an issue with the fit function. When I try to run fit(LinearDiscriminant, [1 2 3 4], [5 6 7 8]) I get a MethodError saying there is no method matching LinearDiscriminant.

No method matching fit while running PCA

I am running the example code from the documentation, but I get an error.
I removed the DataArray conversion, since DataArray is not defined in the Julia 0.7 packages.

using MultivariateStats, RDatasets, Plots
plotly() # using plotly for 3D-interactive graphing

# load iris dataset
iris = dataset("datasets", "iris")

# split half to training set
Xtr = convert(Array, iris[1:2:end,1:4])'
Xtr_labels = convert(Array, iris[1:2:end,5])

# split other half to testing set
Xte = convert(Array,iris[2:2:end,1:4])'
Xte_labels = convert(Array,iris[2:2:end,5])

# suppose Xtr and Xte are training and testing data matrix,
# with each observation in a column

# train a PCA model, allowing up to 3 dimensions
M = fit(PCA, Xtr; maxoutdim=3)

# apply PCA model to testing set
Yte = transform(M, Xte)

# reconstruct testing observations (approximately)
Xr = reconstruct(M, Yte)

# group results by testing set labels for color coding
setosa = Yte[:,Xte_labels.=="setosa"]
versicolor = Yte[:,Xte_labels.=="versicolor"]
virginica = Yte[:,Xte_labels.=="virginica"]

# visualize first 3 principal components in 3D interactive plot
p = scatter(setosa[1,:],setosa[2,:],setosa[3,:],marker=:circle,linewidth=0)
scatter!(versicolor[1,:],versicolor[2,:],versicolor[3,:],marker=:circle,linewidth=0)
scatter!(virginica[1,:],virginica[2,:],virginica[3,:],marker=:circle,linewidth=0)
plot!(p,xlabel="PC1",ylabel="PC2",zlabel="PC3")

I am getting the following errors:

MethodError: no method matching fit(::Type{PCA}, ::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}; maxoutdim=3)
Closest candidates are:
  fit(!Matched::Type{Histogram}, ::Any...; kwargs...) at /usr/people/jingpeng/.julia/packages/StatsBase/NzjNi/src/hist.jl:340
  fit(!Matched::StatisticalModel, ::Any...) at /usr/people/jingpeng/.julia/packages/StatsBase/NzjNi/src/statmodels.jl:151 got unsupported keyword argument "maxoutdim"
  fit(!Matched::Type{Distributions.Beta}, ::AbstractArray{T<:Real,N} where N) where T<:Real at /usr/people/jingpeng/.julia/packages/Distributions/y4rh9/src/univariate/continuous/beta.jl:129 got unsupported keyword argument "maxoutdim"
  ...

Stacktrace:
 [1] top-level scope at In[38]:19
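
A common workaround for this particular MethodError (hedged): fit wants a plain dense matrix, and the lazy adjoint produced by ' is not one, so materialize the transpose first.

Xtr = collect(convert(Array, iris[1:2:end, 1:4])')   # Matrix{Float64}, not Adjoint
M = fit(PCA, Xtr; maxoutdim=3)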

1.0.0 release?

Hi, what's the plan for a 1.0.0 tag? What is left to be done? In general, some stability promise would go a long way for developers like us (JuliaDynamics).

[PkgEval] MultivariateStats may have a testing issue on Julia 0.4 (2014-08-15)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.4

  • On 2014-08-14 the testing status was Tests pass.
  • On 2014-08-15 the testing status changed to Tests fail, but package loads.

Tests pass. means that PackageEvaluator found the tests for your package, executed them, and they all passed.

Tests fail, but package loads. means that PackageEvaluator found the tests for your package, executed them, and they didn't pass. However, trying to load your package with using worked.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt out of these status-change messages, reply to this message saying so and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl, please file an issue at its repository; for example, your package may be untestable on the test machine due to a dependency, in which case an exception can be added.

Test log:

>>> 'Pkg.add("MultivariateStats")' log
INFO: Installing ArrayViews v0.4.6
INFO: Installing MultivariateStats v0.1.1
INFO: Installing StatsBase v0.6.3
INFO: Package database updated

>>> 'using MultivariateStats' log

>>> test log
Running tests:
 * lreg.jl
 * whiten.jl

ERROR: type Cholesky has no field uplo
 in cov_whitening at /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/src/whiten.jl:7
 in include at ./boot.jl:245
 in include_from_node1 at ./loading.jl:128
 in anonymous at no file:15
 in include at ./boot.jl:245
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:285
 in _start at ./client.jl:354
 in _start_3B_1724 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/test/whiten.jl, in expression starting on line 23
while loading /home/idunning/pkgtest/.julia/v0.4/MultivariateStats/test/runtests.jl, in expression starting on line 12

>>> end of log

Accepting Subarray in `fit` function

Hey there,

I would like to use the PCA functions offered in this package.
I work with fairly large datasets ( >> 200×1_000_000 ) and want to use a specific subset of my data
for the fitting process.
fit(PCA, Xtr) requires a DenseMatrix, which in my case forces me to make a copy.
Yet I don't see a particular reason for not allowing a SubArray, i.e. a @view.

Is there an issue that I have overlooked, or is it just a matter of changing the parametric type requirement?

Best,
Jonas
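
The dispatch gap in miniature: a view is an AbstractMatrix but not a DenseMatrix, so a DenseMatrix-restricted fit cannot see it.

X = rand(200, 1000)
V = @view X[:, 1:100]
V isa DenseMatrix       # false
V isa AbstractMatrix    # true
Xcopy = collect(V)      # the current workaround: materialize a copy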

Possible inconsistency in PCA matrix dimensions?

Documentation for the PCA fit function states the following:

Perform PCA over the data given in a matrix X. Each column of X is an observation.

Let (d, n) = size(X) be respectively the input dimension and the number of observations

This means that if X is a d×n matrix, then X has n observations of d variables/features (kind of counter-intuitive to me, since we generally consider variables as columns).

Now, correct me if I'm wrong: when doing PCA one should center all variables by subtracting their means. After doing that, we have the matrix X_centralized, whose d rows have zero mean. So far, so good.

I then do pcaResult = fit(PCA, X_centralized) and obtain the PCA model. indim is listed as d, in my case 44. However, when checking the correlation matrix of the principal components, cor(pcaResult.proj), I find that the correlations are significant, which would mean the PCA wasn't done correctly, since the correlations should be zero.

I then do the opposite: I subtract the mean of each observation, not of each variable, which seems nonsensical. I use the fit function and obtain the PCA model. indim is listed as n, which in my case is 50 thousand. However, the correlation matrix in this case is zero, as we would expect from a correct PCA!

Am I doing something wrong? Or is the documentation incorrect, and matrix X should have variables on columns instead of rows?
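
For what it's worth, a hedged sanity check of the documented convention on synthetic data; it is the transformed scores, not the proj matrix, whose correlations should vanish:

using MultivariateStats, Statistics

d, n = 5, 1000
X = randn(d, n)
Xc = X .- mean(X; dims=2)       # center each variable (each row)
M = fit(PCA, Xc; pratio=1.0)
Y = transform(M, Xc)            # k×n component scores
cor(Y')                         # ≈ identity: off-diagonal correlations vanish up to rounding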

ArgumentError: principal variance cannot exceed total variance. NaN > NaN

So I am debugging some mean code that seems to be sending values right to the extreme ends of the Float32 range.

I kept getting ArgumentError: principal variance cannot exceed total variance
on this line

So I enhanced it to be more informative on my local copy:

tpvar <= tvar || isapprox(tpvar,tvar) || throw(ArgumentError("principal variance cannot exceed total variance. $tpvar > $tvar"))

And I found out that both tpvar and tvar ended up as NaN.
How this happens is unknown to me right now (but soon I will know, I hope).

But my point here is that this is the wrong error message.
I suggest adding a NaN check; a sketch follows below.
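
The suggested guard, sketched (a hypothetical function, not the package's code):

function check_variances(tpvar, tvar)
    (isnan(tpvar) || isnan(tvar)) && throw(ArgumentError(
        "variance is NaN; the input likely contains NaNs or overflowed"))
    tpvar <= tvar || isapprox(tpvar, tvar) || throw(ArgumentError(
        "principal variance cannot exceed total variance: $tpvar > $tvar"))
    return nothing
end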

Not working with julia 1.0.0?

Maybe "doesn't work with julia 1.0.0" is not considered a bug at this point. If so, fair enough. But I'm interested in statistics packages and after updating from julia 0.6.4, MultivariateStats no longer works.

The most efficient way of showing what I'm talking about may be to display the "install" and "using" phases. Note: I'm using the current Linux x86_64 binary package of julia 1.0.0 from julialang.org. (Also tried the Arch Linux build with the same results.)

(v1.0) pkg> add MultivariateStats
 Resolving package versions...
 Installed MultivariateStats ─ v0.5.0
  Updating ~/.julia/environments/v1.0/Project.toml
  [6f286f6a] + MultivariateStats v0.5.0
  Updating ~/.julia/environments/v1.0/Manifest.toml
  [6f286f6a] + MultivariateStats v0.5.0

^C # to get out of pkg mode

julia> using MultivariateStats
[ Info: Precompiling MultivariateStats [6f286f6a-111f-5878-ab1e-185364afe411]
ERROR: LoadError: UndefVarError: LinAlg not defined
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] include(::Module, ::String) at ./sysimg.jl:29
 [4] top-level scope at none:2
 [5] eval at ./boot.jl:319 [inlined]
 [6] eval(::Expr) at ./client.jl:389
 [7] top-level scope at ./none:3
in expression starting at /home/allin/.julia/packages/MultivariateStats/wGpiN/src/MultivariateStats.jl:8
ERROR: Failed to precompile MultivariateStats [6f286f6a-111f-5878-ab1e-185364afe411] to /home/allin/.julia/compiled/v1.0/MultivariateStats/l7I74.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] macro expansion at ./logging.jl:313 [inlined]
 [3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1184
 [4] _require(::Base.PkgId) at ./logging.jl:311
 [5] require(::Base.PkgId) at ./loading.jl:852
 [6] macro expansion at ./logging.jl:311 [inlined]
 [7] require(::Module, ::Symbol) at ./loading.jl:834

Note: line 8 of wGpiN/src/MultivariateStats.jl reads:

import Base.LinAlg: Cholesky

Allin Cottrell
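
The 0.7/1.0-era fix for that import, for reference: Base.LinAlg became the LinearAlgebra standard library in Julia 0.7, so line 8 needs to be rewritten.

import LinearAlgebra: Cholesky   # replaces `import Base.LinAlg: Cholesky`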

Memory-efficient LDA

I am not very familiar with this package, but it appears that by default this package's LDA routines store the full covariance matrix associated with each point. If your points are, say, an image (living in a 10^6-dimensional space), that covariance matrix will be large.

There's a more memory-efficient alternative: one can show that the eigenvectors lie in the subspace spanned by the class means, so if there are n classes, the LDA should really be performed on an (n-1)-dimensional projection of the underlying data points, as sketched below. Note that this is better than first doing a PCA to reduce the dimensionality, in the sense that PCA picks dimensions based on the total sample variance and may not return a projection as useful as the one based on the class means.

I'm not sure how to fit this in with the framework in this package, but there's an implementation here. I'm offering to donate it to MultivariateStats, but I'd like a collaborator on the integration. (Or just go steal it and package it how you like.)
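
A hedged sketch of the projection trick described above (not the linked implementation):

using LinearAlgebra, Statistics

# With n classes, an orthonormal basis Q for the span of the centered class
# means has at most n-1 columns, and LDA can be run on Q' * X instead of X,
# avoiding any d×d scatter matrices.
function class_mean_basis(X::AbstractMatrix, labels::AbstractVector)
    μ = mean(X; dims=2)
    classes = unique(labels)
    M = reduce(hcat, [mean(X[:, labels .== c]; dims=2) .- μ for c in classes])
    return svd(M).U[:, 1:length(classes)-1]    # d × (n-1) basis Q
end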

Deprecating DimensionalityReduction.jl?

This seems like a strict superset of DimensionalityReduction.jl. Should we try to deprecate DimensionalityReduction.jl before 0.3 is released, so it won't install on 0.3? Or is it OK to keep it around but make clear in the readme that it's been superseded by this package?

Documentation and implementation of llsq don't agree on data matrix structure

Unless I'm missing something, the implementation (and docs) of llsq don't agree with the statement on the documentation index that data matrices have features as rows and observations as columns.

In the following code from the llsq documentation, the number of observations is 1000, and the number of features is 3, but the observation matrix X has 1000 rows and 3 columns, and the output from llsq is a 3-vector. Note that the code does not use the trans option provided by llsq.

using MultivariateStats

# prepare data
X = rand(1000, 3)               # feature matrix
a0 = rand(3)                    # ground truths
y = X * a0 + 0.1 * randn(1000)  # generate response

# solve using llsq
a = llsq(X, y; bias=false)

# do prediction
yp = X * a
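
What consistency with the index page would look like, as a hedged and untested sketch using the trans keyword from the signature quoted elsewhere in these issues:

Xt = collect(X')                            # 3×1000: features as rows, observations as columns
a2 = llsq(Xt, y; trans=true, bias=false)    # intended to match `a` above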

Implement PCoA (principal coordinate analysis)

Principal coordinate analysis is essentially a PCA with some transformations that make it suitable for non-metric distances where negative eigen values can be generated. It's often used in ecology for visualizing dissimilarity between different sites for example (see here for a paper describing it and an R package that implements it).

I've written an implementation for my microbiome package, but thought it might be more generally useful. I also suspect that with code review from the community it will end up far better than what I've written on my own (I'm by no means an expert in linear algebra).

So I wanted to know whether that code could or should be ported over here. I've verified on several matrices that my code generates the same results as the R package linked above, but I'm sure it could be made much more efficient and cleaned up a lot with some help.

Partial Least Squares

I am working with Partial Least Squares in R. Is there a planned timeline for PLS development in Julia?

Setting the output dimension for PCA

Is it possible to set the output dimension for a PCA instance? (From the documentation, it seems that PCAs are constructed only by the fit() method, and that the method determines the output dimension from pratio.)
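
Judging from the fit calls used elsewhere in these issues (hedged), maxoutdim caps the output dimension directly, and pratio may still truncate it further:

using MultivariateStats

X = randn(10, 500)                # 10 variables, 500 observations
M = fit(PCA, X; maxoutdim=3)      # at most 3 output dimensions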

New release

Hi, can you make a new release of this and publish it to METADATA? In particular, we're trying to track down some precompile issues in our project, and we're trying to get all our dependencies precompiled. I see that __precompile__() was recently committed, but it hasn't been released. Thanks.

How can `llsq` be used with a single dependent and independent variable?

Is it possible to use llsq with a single predictor?

X = [1.0,3.0,4.0,5.0,6.0,8.0,9.0]
y = [1.1, 3.3, 4.4, 10.5, 12.6, 16.8, 18.9]
llsq(X, y, bias=false)

results in

MethodError: no method matching llsq(::Array{Float64,1}, ::Array{Float64,1}; bias=false)
Closest candidates are:
  llsq{T<:AbstractFloat}(::DenseArray{T<:AbstractFloat,2}, ::Union{DenseArray{T<:AbstractFloat,1},DenseArray{T<:AbstractFloat,2}}; trans, bias) at /Users/abisen/.julia/v0.5/MultivariateStats/src/lreg.jl:24

whereas adding another predictor works

X = [[1.0,3.0,4.0,5.0,6.0,8.0,9.0] [1.0,3.0,4.0,5.0,6.0,8.0,9.0]] 
y = [1.1, 3.3, 4.4, 10.5, 12.6, 16.8, 18.9] 
llsq(X, y, bias=false)

and results in

2-element Array{Float64,1}:
 -0.012069
  2.0    
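
The usual workaround (hedged): make the single predictor an n×1 matrix, which matches the DenseArray{T,2} signature in the error message.

X = [1.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0]
y = [1.1, 3.3, 4.4, 10.5, 12.6, 16.8, 18.9]
a = llsq(reshape(X, length(X), 1), y; bias=false)   # one-element coefficient vector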

Make CCA take a type parameter

Currently CCA is Float64 only.
The other methods take a type parameter.

I think that would be useful for CCA as well.
(I'm currently thinking of applying CCA to a 1_000_000-row matrix;
I'd really like to be able to use Float32 or Float16.)

Use eigmax instead of maximum(eigvals)

In the function regularize_symmat!, we are currently using maximum(eigvals(A)) to compute the maximum eigenvalue. That is a workaround for a bug in Julia Base.

Now that the bug has been fixed, we should restore the use of eigmax once the fix goes into the nightlies.

This issue mainly serves as a reminder.
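
The intended change, as a sketch:

using LinearAlgebra

A = Symmetric(randn(5, 5))
maximum(eigvals(A)) ≈ eigmax(A)   # true; eigmax computes only the largest eigenvalue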
