Comments (3)
Well, outright replacing the GammaLassoPaths with linear regression is not supported.
But because an unregularized regression is a special case of a GammaLasso regression, you can tell it to set the lasso penalty to zero with λ=[0].
For example, after the setup steps here, you can run the following for a regularized regression (the default):
julia> m = fit(HDMR, mf, covarsdf, counts)
┌ Info: fitting 100 observations on 4 categories
└ 2 covariates for positive and 3 for zero counts
[ Info: distributed InclusionRepetition run on local cluster with 4 nodes
TableCountsRegressionModel{HDMRCoefs{InclusionRepetition, Matrix{Float64}, Lasso.MinAICc, Vector{Int64}}, DataFrame, SparseMatrixCSC{Float64, Int64}}
2-part model: [h ~ vy + v1 + v2, c ~ vy + v1]
julia> coef(m)
([0.0 -2.4091948280522995 -1.1615166356173474 -0.24334706195174494; 0.0 0.0 0.0 0.0; 0.0 0.0 -1.0808502402099707 0.1406421889555801], [0.2070682976995456 -0.9743741299190539 0.9915269486925402 0.0; -3.1843322904119264 0.0 0.0 0.0; -2.7790593025458876 0.0 -0.432323028097416 0.0; -2.919950684739639 -1.1826722108109264 0.0 0.0])
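For reference, a minimal synthetic setup compatible with the snippets above might look like the following. The variable names vy, v1, v2 match the printed model, but the data generation itself is made up purely for illustration:

```julia
using HurdleDMR, DataFrames, SparseArrays, Random

Random.seed!(13)
n, d = 100, 4                       # 100 observations, 4 count categories
covarsdf = DataFrame(vy=randn(n), v1=randn(n), v2=randn(n))
counts = sparse(rand(0:5, n, d))    # toy document-term-style count matrix

# Two-part hurdle model: h for the zeros/hurdle part, c for positive counts
mf = @model(h ~ vy + v1 + v2, c ~ vy + v1)
m = fit(HDMR, mf, covarsdf, counts)
```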
Or specify λ as follows to fit without regularization:
julia> m = fit(HDMR, mf, covarsdf, counts; λ=[0])
┌ Info: fitting 100 observations on 4 categories
└ 2 covariates for positive and 3 for zero counts
[ Info: distributed InclusionRepetition run on local cluster with 4 nodes
TableCountsRegressionModel{HDMRCoefs{InclusionRepetition, Matrix{Float64}, Lasso.MinAICc, Vector{Int64}}, DataFrame, SparseMatrixCSC{Float64, Int64}}
2-part model: [h ~ vy + v1 + v2, c ~ vy + v1]
julia> coef(m)
([0.0 -1.110511024175597 -1.0734051961718625 -0.39088273796923456; 0.0 -2.2629151724896013 -0.09369930436827476 0.14587571303858277; 0.0 -2.0600336781455253 -1.1968304814730941 0.28030956136083035], [0.25757237750517803 0.48872982781408103 1.1153661779954087 0.0; -3.2376564147901465 -1.288184763373904 0.11953557405466866 0.0; -2.846531136681206 -0.8619608565644029 -0.619841160100449 0.0; -2.980542519794024 -2.7975054018279963 -0.19692424375022308 0.0])
As for the likelihoods, you need to replace HDMR in that call with HDMRPaths, because only the latter saves that information. See here:
julia> m = fit(HDMRPaths, mf, covarsdf, counts; λ=[0])
┌ Info: fitting 100 observations on 4 categories
└ 2 covariates for positive and 3 for zero counts
[ Info: distributed InclusionRepetition run on remote cluster with 4 nodes
TableCountsRegressionModel{HDMRPaths{InclusionRepetition{Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}, Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}}, Vector{Int64}}, DataFrame, SparseMatrixCSC{Float64, Int64}}
2-part model: [h ~ vy + v1 + v2, c ~ vy + v1]
julia> m
TableCountsRegressionModel{HDMRPaths{InclusionRepetition{Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}, Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}}, Vector{Int64}}, DataFrame, SparseMatrixCSC{Float64, Int64}}
2-part model: [h ~ vy + v1 + v2, c ~ vy + v1]
julia> m.model.nlpaths
4-element Vector{InclusionRepetition{Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}, Lasso.GammaLassoPath{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink}, Lasso.NaiveCoordinateDescent{Float64, true, Matrix{Float64}, Lasso.RandomCoefficientIterator, Vector{Float64}}}, Float64}}}:
InclusionRepetition regression
Positive part regularization path (Poisson{Float64}(λ=1.0) with LogLink() link):
Not Fitted
Zero part regularization path (Binomial{Float64}(n=1, p=0.5) with LogitLink() link):
Binomial GammaLassoPath (1) solutions for 4 predictors in 31 iterations):
─────────────────────────
λ pct_dev ncoefs
─────────────────────────
[1] 0.0 0.17728 3
─────────────────────────
InclusionRepetition regression
Positive part regularization path (Poisson{Float64}(λ=1.0) with LogLink() link):
Poisson GammaLassoPath (1) solutions for 3 predictors in 16 iterations):
──────────────────────────
λ pct_dev ncoefs
──────────────────────────
[1] 0.0 0.217102 2
──────────────────────────
Zero part regularization path (Binomial{Float64}(n=1, p=0.5) with LogitLink() link):
Binomial GammaLassoPath (1) solutions for 4 predictors in 23 iterations):
──────────────────────────
λ pct_dev ncoefs
──────────────────────────
[1] 0.0 0.130285 3
──────────────────────────
InclusionRepetition regression
Positive part regularization path (Poisson{Float64}(λ=1.0) with LogLink() link):
Poisson GammaLassoPath (1) solutions for 3 predictors in 18 iterations):
──────────────────────────
λ pct_dev ncoefs
──────────────────────────
[1] 0.0 0.080976 2
──────────────────────────
Zero part regularization path (Binomial{Float64}(n=1, p=0.5) with LogitLink() link):
Binomial GammaLassoPath (1) solutions for 4 predictors in 16 iterations):
────────────────────────────
λ pct_dev ncoefs
────────────────────────────
[1] 0.0 0.00890169 3
────────────────────────────
InclusionRepetition regression
Positive part regularization path (Poisson{Float64}(λ=1.0) with LogLink() link):
Poisson GammaLassoPath (1) solutions for 3 predictors in 14 iterations):
───────────────────────────
λ pct_dev ncoefs
───────────────────────────
[1] 0.0 0.0935799 2
───────────────────────────
Zero part regularization path (Binomial{Float64}(n=1, p=0.5) with LogitLink() link):
Not Fitted
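With the paths retained, fit information can then be pulled from each stored path. The sketch below avoids guessing struct internals by iterating field names at runtime; it assumes unfitted parts are stored as nothing and that Lasso.jl's `deviance` accepts a fitted GammaLassoPath, so verify both against your installed versions:

```julia
# Inspect stored regularization paths per count category.
for (j, nlpath) in enumerate(m.model.nlpaths)
    for f in fieldnames(typeof(nlpath))
        path = getfield(nlpath, f)
        path === nothing && continue   # skip "Not Fitted" parts (assumed nothing)
        @info "category $j, part $f" deviance(path)
    end
end
```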
from hurdledmr.jl.
Thank you for your swift reply! Really appreciate it. However, it seems that HDMRPaths sets local_cluster=false and uses the hdmr_remote_cluster method. My machine crashes to a blue screen whenever I set the number of parallel workers too high (>4). I am running 36000×30000 count data with four covariates on an i7-12700 with 64 GB RAM. Previously, I was able to finish the estimation with eight workers in 600 seconds when setting local_cluster=true. Is there a workaround, or do I have to run HDMRPaths with no parallelization? Thank you.
The local cluster methods are more memory efficient, partly because they reuse the same shared memory and partly because they discard all the information about the individual fitted GammaLassoPath models, keeping only the coefficients.
The remote cluster methods use more memory in total but may be better when you have access to a computing cluster with many low-memory nodes. They also return an HDMRPaths struct that keeps all the fitted paths.
I would try a smaller number of cores, or even no parallelization if all it takes is about 8 × 600 seconds.
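A minimal way to act on that advice, assuming the mf, covarsdf, and counts objects from the earlier snippets: start fewer workers before fitting, since the remote-cluster code path uses whatever workers exist, or start none at all to run serially:

```julia
using Distributed

# Fewer workers cap peak memory on a single machine; skip addprocs entirely
# to run HDMRPaths with no parallelization.
addprocs(2)
@everywhere using HurdleDMR

m = fit(HDMRPaths, mf, covarsdf, counts; λ=[0.0])
```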