statsample's Introduction

Statsample

Homepage :: https://github.com/sciruby/statsample

Installation

You should have a recent version of GSL and R (with the irr and Rserve libraries) installed. On Ubuntu:

$ sudo apt-get install libgsl0-dev r-base r-base-dev
$ sudo Rscript -e "install.packages(c('Rserve', 'irr'))"

With these libraries in place, just install from rubygems:

$ [sudo] gem install statsample

On *nix, you should also install statsample-optimization, which pulls in the gsl and statistics2 gems and a C extension that speeds up some methods.

$ [sudo] gem install statsample-optimization

If you need to work on Structural Equation Modeling, see +statsample-sem+. You need R with the +sem+ or +OpenMx+ [http://openmx.psyc.virginia.edu/] libraries installed:

$ [sudo] gem install statsample-sem

Testing

See CONTRIBUTING for information on testing and contributing to statsample.

Documentation

You can see the latest documentation on rubydoc.info.

Usage

Notebooks

You can see some iruby notebooks here:

Statistics

Visualizations

Working with DataFrame and Vector

Examples

See the /examples directory for some use cases. The notebooks listed above contain mostly the same examples and look better, so you might want to look at those first.

Description

A suite for basic and advanced statistics on Ruby. Tested on CRuby 2.0.0, 2.1.1, 2.2 and 2.3.0. See .travis.yml for more information.

Includes:

  • Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
  • Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and polychoric correlations are provided by the +statsample-bivariate-extension+ gem.
  • Intra-class correlation
  • Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
  • Tests: F, T, Levene, Mann-Whitney U.
  • Regression: Simple, Multiple (OLS)
  • Factor Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax), plus Parallel Analysis and Velicer's MAP test for estimating the number of factors.
  • Reliability analysis for simple scale and a DSL to easily analyze multiple scales using factor analysis and correlations, if you want it.
  • Basic time series support
  • Dominance Analysis, with multivariate dependent and bootstrap (Azen & Budescu)
  • Sample calculation related formulas
  • Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+
  • Creates reports on text, html and rtf, using ReportBuilder gem
  • Graphics: Histogram, Boxplot and Scatterplot

Principles

  • Software Design:
    • One module/class for each type of analysis
    • Options can be set as a hash on initialize() or through setter methods
    • Clean API for interactive sessions
    • summary() returns all necessary information for interactive sessions
    • All statistical data available through methods on objects
    • All (important) methods should be tested. Better with random data.
  • Statistical Design
    • Results are tested against text results, SPSS and R outputs.
    • Go beyond Null Hypothesis Testing, using confidence intervals and effect sizes when possible
    • (When possible) All references for methods are documented, providing sensible information on documentation

Features

  • Classes for manipulation and storage of data:
    • Uses daru for storing data and basic statistics.
    • Statsample::Multiset: multiple datasets with same fields and type of vectors
  • Anova module provides generic Statsample::Anova::OneWay and vector-based Statsample::Anova::OneWayWithVectors. You can also create contrasts using Statsample::Anova::Contrast
  • Module Statsample::Bivariate provides covariance and Pearson, Spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Includes methods to create correlation and covariance matrices
  • Multiple types of regression.
    • Simple Regression : Statsample::Regression::Simple
    • Multiple Regression: Statsample::Regression::Multiple
  • Factor Analysis algorithms on Statsample::Factor module.
    • Classes for Extraction of factors:
      • Statsample::Factor::PCA
      • Statsample::Factor::PrincipalAxis
    • Classes for Rotation of factors:
      • Statsample::Factor::Varimax
      • Statsample::Factor::Equimax
      • Statsample::Factor::Quartimax
    • Classes for calculation of factors to retain
      • Statsample::Factor::ParallelAnalysis performs Horn's 'parallel analysis' to a principal components analysis to adjust for sample bias in the retention of components.
      • Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retains components as long as the variance in the correlation matrix represents systematic variance.
  • Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the importance of one predictor relative to another in multiple regression
    • Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
    • Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003) [http://psycnet.apa.org/journals/met/8/2/129/].
  • Module Statsample::Codification, to help to codify open questions
  • Converters to export data:
    • Statsample::Mx : Write Mx Files
    • Statsample::GGobi : Write Ggobi files
  • Module Statsample::Crosstab provides functions to create crosstabs for categorical data
  • Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
    • Class Statsample::Reliability::ScaleAnalysis provides statistics such as mean and standard deviation for a scale, Cronbach's alpha and standardized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha if deleted.
    • Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
    • Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
  • Module Statsample::SRS (Simple Random Sampling) provides many functions to estimate standard error for several types of samples
  • Module Statsample::Test provides several methods and classes to perform inferential statistics
    • Statsample::Test::BartlettSphericity
    • Statsample::Test::ChiSquare
    • Statsample::Test::F
    • Statsample::Test::KolmogorovSmirnov (only D value)
    • Statsample::Test::Levene
    • Statsample::Test::UMannWhitney
    • Statsample::Test::T
    • Statsample::Test::WilcoxonSignedRank
  • Module Graph provides several classes to create beautiful graphs using rubyvis
    • Statsample::Graph::Boxplot
    • Statsample::Graph::Histogram
    • Statsample::Graph::Scatterplot
  • Gem bio-statsample-timeseries provides module Statsample::TimeSeries with support for time series, including ARIMA estimation using Kalman-Filter.
  • Gem statsample-sem provides a DSL to R libraries +sem+ and +OpenMx+
  • Gem statsample-glm provides GLM methods to work with Logistic, Poisson and Gaussian regression, using ML or IRWLS.
  • Close integration with gem reportbuilder, to easily create reports on text, html and rtf formats.

Resources

License

BSD-3 (See LICENSE.txt)

The license could change between versions without prior warning. If you want a specific license, just choose the version that you need.

statsample's People

Contributors

agarie, blahah, clbustos, hstove, ismailm, jeremyevans, jkebinger, justin808, kojix2, lokeshh, mqzhang, onli, rdlugosz, robbrit, thagomizer, ukd1, v0dro, vpereira, zhomart

statsample's Issues

Incompatibility with Ruby-2.0.0 ?

I get this error on running rnorm on Ruby-2.0.0.

lokeshh:~/workspace $ rvm use 2.0.0-p647
Using /home/ubuntu/.rvm/gems/ruby-2.0.0-p647
lokeshh:~/workspace $ irb
2.0.0-p647 :001 > require 'statsample'
 => true 
2.0.0-p647 :002 > Statsample::Shorthand.rnorm 10
NoMethodError: undefined method `to_h' for #<Enumerator: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:each_with_index>
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/index.rb:61:in `initialize'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/index.rb:32:in `block in new'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/index.rb:32:in `tap'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/index.rb:32:in `new'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/vector.rb:1246:in `try_create_index'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/vector.rb:110:in `initialize'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/vector.rb:144:in `new'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/daru-0.1.3.1/lib/daru/vector.rb:144:in `new_with_size'
        from /home/ubuntu/.rvm/gems/ruby-2.0.0-p647/gems/statsample-2.0.2/lib/statsample/shorthand.rb:45:in `rnorm'
        from (irb):2
        from /home/ubuntu/.rvm/rubies/ruby-2.0.0-p647/bin/irb:12:in `<main>'
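
The trace ends in a `to_h` call on an Enumerator inside daru. Enumerable#to_h was only added in Ruby 2.1, which would explain the failure on 2.0.0. Below is a minimal sketch of a 2.0-compatible way to build the same kind of value-to-position lookup; `index_map` is a hypothetical helper name, not daru's actual code.

```ruby
# Build a value => position Hash without Enumerable#to_h (Ruby >= 2.1 only).
# Hash[] accepts an array of [key, value] pairs and works on older Rubies.
# NOTE: index_map is an illustrative name, not part of daru or statsample.
def index_map(values)
  Hash[values.each_with_index.to_a]
end

lookup = index_map(%w[a b c])
puts lookup.inspect
```

Whether daru should keep supporting 2.0.0 at all is a separate decision; the sketch only shows that the lookup itself does not require `to_h`.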

Fix dependencies before 2.0.0 release

We can't release a new gem version with Statsample depending on 'gsl-nmatrix' and 'nmatrix'. They must be optional dependencies, like rb-gsl was.

The solution should be straightforward: move the deps from the gemspec to the Gemfile and update the post-install message so people know that Statsample can make use of NMatrix and gsl-nmatrix. Thus, someone without NMatrix and GSL installed could do a gem install statsample -v '0.2.0' successfully.

This is mandatory before a new version is released.
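
One common way to treat a library as an optional dependency is to attempt the `require` and fall back when it is missing. The sketch below assumes that pattern; `optional_require` and the `HAS_GSL` constant are hypothetical names used only for illustration, not statsample's actual code.

```ruby
# Try to load an optional library; report whether it is available.
# NOTE: optional_require and HAS_GSL are illustrative names only.
def optional_require(name)
  require name
  true
rescue LoadError
  false
end

# Code elsewhere can branch on this flag to pick a fast or pure-Ruby path.
HAS_GSL = optional_require('gsl')

puts(HAS_GSL ? 'Using GSL-backed routines' : 'Falling back to pure Ruby')
```

With this approach the gem installs cleanly without GSL, and the post-install message can point users to the optional speed-up.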

ping @v0dro

`require': iconv will be deprecated in the future, use String#encode instead

(Original: clbustos/statsample#9)

/Users/mind/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.

ruby1.9.3:

require 'statsample'
# Note R like generation of random gaussian variable
# and correlation matrix

ss_analysis("Statsample::Bivariate.correlation_matrix") do
  samples=1000
  ds=data_frame(
    'a'=>rnorm(samples), 
    'b'=>rnorm(samples),
    'c'=>rnorm(samples),
    'd'=>rnorm(samples))
  cm=cor(ds) 
  summary(cm)
end
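
The warning names String#encode as the replacement for iconv. As a minimal sketch of that replacement, the snippet below converts ISO-8859-1 bytes to UTF-8; the sample string is made up for illustration, and where statsample (or one of its dependencies) actually uses iconv is not stated in the issue.

```ruby
# Bytes for "café" in ISO-8859-1; the input is invented for illustration.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)

# Roughly what Iconv.conv('UTF-8', 'ISO-8859-1', latin1) used to do.
utf8 = latin1.encode(Encoding::UTF_8)

puts utf8.encoding
puts utf8.bytesize
```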

Use a Gemspec instead of the Hoe gem

The Hoe gem (see Rakefile) is used to package the gem for rubygems. However, a gemspec separates tasks from the gem definition and metadata and is considered a best practice.
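
For comparison, a plain gemspec keeps the metadata in one declarative file. The sketch below uses placeholder values throughout (the gem name `example_stats`, version, author and dependency are all hypothetical) and is not statsample's real gemspec.

```ruby
require 'rubygems'

# Hypothetical metadata for illustration; not statsample's real gemspec.
spec = Gem::Specification.new do |s|
  s.name    = 'example_stats'
  s.version = '0.1.0'
  s.summary = 'Example statistics library'
  s.authors = ['Example Author']
  s.license = 'BSD-3-Clause'
  s.files   = Dir['lib/**/*.rb']

  # Packaging and test tooling stay out of the runtime dependency list.
  s.add_development_dependency 'rake'
end

puts "#{spec.name} #{spec.version}"
```

In a real `.gemspec` file the `Gem::Specification.new` block is the last expression, and Rake tasks live separately in the Rakefile.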

Recheck dependencies for unused stuff

There are lots of dependencies in the gemspec. Two questions that need to be answered:

  1. Are they really needed? (probably yes)
  2. If so, what version constraints should we enforce?

I think we need to go dependency-by-dependency and verify how lax the version requirement can be, thus updating the gemspec to reflect that.
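
When checking how lax a constraint can be, RubyGems itself can evaluate a requirement against candidate versions. The sketch below uses made-up version numbers purely to show the difference between a pessimistic and an open-ended constraint.

```ruby
require 'rubygems'

strict = Gem::Requirement.new('~> 0.1.0') # means >= 0.1.0 and < 0.2
lax    = Gem::Requirement.new('>= 0.1')   # any version from 0.1 upward

# Candidate versions are invented for illustration.
candidates = %w[0.1.0 0.1.4.1 0.2.0].map { |v| Gem::Version.new(v) }

strict_ok = candidates.select { |v| strict.satisfied_by?(v) }.map(&:to_s)
lax_ok    = candidates.select { |v| lax.satisfied_by?(v) }.map(&:to_s)

puts "strict accepts: #{strict_ok.join(', ')}"
puts "lax accepts:    #{lax_ok.join(', ')}"
```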

NoMethodError - undefined method `zero?' for nil:NilClass:

After updating to 2.0.1, I have this issue :

NoMethodError - undefined method `zero?' for nil:NilClass:
  activerecord (3.2.22) lib/active_record/associations/alias_tracker.rb:30:in `aliased_name_for'
  activerecord (3.2.22) lib/active_record/associations/alias_tracker.rb:18:in `aliased_table_for'
  activerecord (3.2.22) lib/active_record/associations/join_helper.rb:15:in `block in construct_tables'
  activerecord (3.2.22) lib/active_record/associations/join_helper.rb:14:in `construct_tables'
  activerecord (3.2.22) lib/active_record/associations/association_scope.rb:37:in `add_constraints'
  activerecord (3.2.22) lib/active_record/associations/association_scope.rb:31:in `scope'
  activerecord (3.2.22) lib/active_record/associations/association.rb:99:in `association_scope'
  activerecord (3.2.22) lib/active_record/associations/association.rb:88:in `scoped'
  activerecord (3.2.22) lib/active_record/associations/singular_association.rb:42:in `find_target'
  activerecord (3.2.22) lib/active_record/associations/association.rb:151:in `load_target'
  activerecord (3.2.22) lib/active_record/associations/association.rb:56:in `reload'
  activerecord (3.2.22) lib/active_record/associations/singular_association.rb:9:in `reader'
  activerecord (3.2.22) lib/active_record/associations/builder/association.rb:44:in `block in define_readers'
  app/controllers/user_sessions_controller.rb:22:in `create'
  actionpack (3.2.22) lib/action_controller/metal/implicit_render.rb:4:in `send_action'
  actionpack (3.2.22) lib/abstract_controller/base.rb:167:in `process_action'
  actionpack (3.2.22) lib/action_controller/metal/rendering.rb:10:in `process_action'
  actionpack (3.2.22) lib/abstract_controller/callbacks.rb:18:in `block in process_action'
  activesupport (3.2.22) lib/active_support/callbacks.rb:415:in `block in _run__4248930996535680107__process_action__60857023007713096__callbacks'
  activesupport (3.2.22) lib/active_support/callbacks.rb:215:in `block in _conditional_callback_around_517'
  activesupport (3.2.22) lib/active_support/callbacks.rb:326:in `around'
  activesupport (3.2.22) lib/active_support/callbacks.rb:310:in `_callback_around_13'
  activesupport (3.2.22) lib/active_support/callbacks.rb:214:in `_conditional_callback_around_517'
  activesupport (3.2.22) lib/active_support/callbacks.rb:414:in `_run__4248930996535680107__process_action__60857023007713096__callbacks'
  activesupport (3.2.22) lib/active_support/callbacks.rb:405:in `__run_callback'
  activesupport (3.2.22) lib/active_support/callbacks.rb:385:in `_run_process_action_callbacks'
  activesupport (3.2.22) lib/active_support/callbacks.rb:81:in `run_callbacks'
  actionpack (3.2.22) lib/abstract_controller/callbacks.rb:17:in `process_action'
  actionpack (3.2.22) lib/action_controller/metal/rescue.rb:29:in `process_action'
  actionpack (3.2.22) lib/action_controller/metal/instrumentation.rb:30:in `block in process_action'
  activesupport (3.2.22) lib/active_support/notifications.rb:123:in `block in instrument'
  activesupport (3.2.22) lib/active_support/notifications/instrumenter.rb:20:in `instrument'
  activesupport (3.2.22) lib/active_support/notifications.rb:123:in `instrument'
  actionpack (3.2.22) lib/action_controller/metal/instrumentation.rb:29:in `process_action'
  actionpack (3.2.22) lib/action_controller/metal/params_wrapper.rb:207:in `process_action'
  activerecord (3.2.22) lib/active_record/railties/controller_runtime.rb:18:in `process_action'
  actionpack (3.2.22) lib/abstract_controller/base.rb:121:in `process'
  actionpack (3.2.22) lib/abstract_controller/rendering.rb:45:in `process'
  actionpack (3.2.22) lib/action_controller/metal.rb:203:in `dispatch'
  actionpack (3.2.22) lib/action_controller/metal/rack_delegation.rb:14:in `dispatch'
  actionpack (3.2.22) lib/action_controller/metal.rb:246:in `block in action'
  actionpack (3.2.22) lib/action_dispatch/routing/route_set.rb:73:in `dispatch'
  actionpack (3.2.22) lib/action_dispatch/routing/route_set.rb:36:in `call'
  journey (1.0.4) lib/journey/router.rb:68:in `block in call'
  journey (1.0.4) lib/journey/router.rb:56:in `call'
  actionpack (3.2.22) lib/action_dispatch/routing/route_set.rb:608:in `call'
  exception_notification (4.1.1) lib/exception_notification/rack.rb:32:in `call'
  dragonfly (1.0.3) lib/dragonfly/middleware.rb:14:in `call'
  meta_request (0.3.4) lib/meta_request/middlewares/app_request_handler.rb:13:in `call'
  meta_request (0.3.4) lib/meta_request/middlewares/meta_request_handler.rb:13:in `call'
  sass (3.2.6) lib/sass/plugin/rack.rb:54:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/best_standards_support.rb:17:in `call'
  rack (1.4.7) lib/rack/etag.rb:23:in `call'
  rack (1.4.7) lib/rack/conditionalget.rb:35:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/head.rb:14:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/params_parser.rb:21:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/flash.rb:242:in `call'
  rack (1.4.7) lib/rack/session/abstract/id.rb:210:in `context'
  rack (1.4.7) lib/rack/session/abstract/id.rb:205:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/cookies.rb:341:in `call'
  activerecord (3.2.22) lib/active_record/query_cache.rb:64:in `call'
  activerecord (3.2.22) lib/active_record/connection_adapters/abstract/connection_pool.rb:479:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'
  activesupport (3.2.22) lib/active_support/callbacks.rb:405:in `_run__5345949111146849__call__2971314789955665953__callbacks'
  activesupport (3.2.22) lib/active_support/callbacks.rb:405:in `__run_callback'
  activesupport (3.2.22) lib/active_support/callbacks.rb:385:in `_run_call_callbacks'
  activesupport (3.2.22) lib/active_support/callbacks.rb:81:in `run_callbacks'
  actionpack (3.2.22) lib/action_dispatch/middleware/callbacks.rb:27:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/reloader.rb:65:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/remote_ip.rb:31:in `call'
  rack-contrib (1.1.0) lib/rack/contrib/response_headers.rb:17:in `call'
  meta_request (0.3.4) lib/meta_request/middlewares/headers.rb:16:in `call'
  better_errors (2.0.0) lib/better_errors/middleware.rb:84:in `protected_app_call'
  better_errors (2.0.0) lib/better_errors/middleware.rb:79:in `better_errors_call'
  better_errors (2.0.0) lib/better_errors/middleware.rb:57:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/debug_exceptions.rb:16:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/show_exceptions.rb:56:in `call'
  railties (3.2.22) lib/rails/rack/logger.rb:32:in `call_app'
  railties (3.2.22) lib/rails/rack/logger.rb:18:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/request_id.rb:22:in `call'
  rack (1.4.7) lib/rack/methodoverride.rb:21:in `call'
  dragonfly (1.0.3) lib/dragonfly/cookie_monster.rb:9:in `call'
  rack (1.4.7) lib/rack/runtime.rb:17:in `call'
  rack (1.4.7) lib/rack/lock.rb:15:in `call'
  actionpack (3.2.22) lib/action_dispatch/middleware/static.rb:83:in `call'
  railties (3.2.22) lib/rails/engine.rb:484:in `call'
  railties (3.2.22) lib/rails/application.rb:231:in `call'
  railties (3.2.22) lib/rails/railtie/configurable.rb:30:in `method_missing'
  rack-cors (0.4.0) lib/rack/cors.rb:80:in `call'
  rack (1.4.7) lib/rack/content_length.rb:14:in `call'
  unicorn (4.8.3) lib/unicorn/http_server.rb:576:in `process_client'
  unicorn (4.8.3) lib/unicorn/http_server.rb:670:in `worker_loop'
  unicorn (4.8.3) lib/unicorn/http_server.rb:525:in `spawn_missing_workers'
  unicorn (4.8.3) lib/unicorn/http_server.rb:140:in `start'
  unicorn-rails (2.2.0) lib/unicorn_rails.rb:33:in `run'
  rack (1.4.7) lib/rack/server.rb:268:in `start'
  railties (3.2.22) lib/rails/commands/server.rb:70:in `start'
  railties (3.2.22) lib/rails/commands.rb:55:in `block in <top (required)>'
  railties (3.2.22) lib/rails/commands.rb:50:in `<top (required)>'
  script/rails:6:in `<main>'

Seems to be related to : clbustos#45

Migrate tests to rspec

Of course this is open for discussion, but I think we should be using RSpec instead of minitest. I feel it's easier to work with.

Add a "strict mode" to cronbach_alpha

According to Claudio in clbustos/statsample#11, tests in which any vector of responses for a specific item has zero variance should be considered a mistake, at least in a strict_mode.

So: implement a strict_mode flag that raises a Statsample::ZeroVarianceVectorException (feel free to choose a better name) if any of the vectors in cronbach_alpha has zero variance.
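
A rough sketch of what such a flag could look like, written in plain Ruby rather than against statsample's real API. The error class name, the `strict_mode:` keyword and the helper functions are all assumptions made for illustration; alpha is computed as k/(k-1) * (1 - sum of item variances / variance of the total score).

```ruby
# NOTE: hypothetical names; statsample's real cronbach_alpha differs.
class ZeroVarianceVectorError < StandardError; end

def sample_variance(xs)
  mean = xs.inject(0.0) { |acc, x| acc + x } / xs.size
  xs.inject(0.0) { |acc, x| acc + (x - mean)**2 } / (xs.size - 1)
end

# items: array of item vectors, each holding one response per case.
def cronbach_alpha(items, strict_mode: false)
  variances = items.map { |item| sample_variance(item) }
  if strict_mode && variances.any?(&:zero?)
    raise ZeroVarianceVectorError, 'at least one item has zero variance'
  end

  k = items.size
  totals = items.transpose.map { |row| row.inject(0.0) { |acc, x| acc + x } }
  item_variance_sum = variances.inject(0.0) { |acc, v| acc + v }
  (k / (k - 1.0)) * (1 - item_variance_sum / sample_variance(totals))
end

puts cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5]])
```

Keeping the check behind a flag leaves the default behaviour unchanged for existing callers.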

Support for categorical variables in regression

Categorical (as opposed to numeric) variables are ubiquitous in data analysis and linear regression, but they seem not to be supported by Statsample::Regression.
Here is an example of what I mean:

In R, I can do:

> head(fake.salaries)
      salary years ethnicity
1  5.0823594     9     black
2 -0.4459633     3     black
3 16.0734587     2     white
4 10.5554305     7     other
5  9.9438798     8     other
6  9.6776724     6    latino
> mod <- lm(salary ~ years + ethnicity, fake.salaries)
> summary(mod)

Call:
lm(formula = salary ~ years + ethnicity, data = fake.salaries)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5068 -1.1283 -0.3713  1.1227  3.3027 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        1.5421     0.9851   1.565    0.131    
years              0.1729     0.1561   1.108    0.279    
ethnicitylatino    6.7300     0.9984   6.741 5.67e-07 ***
ethnicitymexican   5.4826     0.8755   6.262 1.79e-06 ***
ethnicityother     6.6404     0.9034   7.351 1.37e-07 ***
ethnicitywhite    11.5310     0.9309  12.387 6.46e-12 ***

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.66 on 24 degrees of freedom
Multiple R-squared:  0.8761,    Adjusted R-squared:  0.8503 
F-statistic: 33.95 on 5 and 24 DF,  p-value: 3.942e-10

We see that lm regards the variable "ethnicity" as a categorical variable and fits a model accordingly. We can see in the output that in this case it takes ethnicity "black" as the base level, and that all other ethnicities have a statistically significant effect on "salary" (with p-values of 1e-6 or smaller) when compared to the base level.

When I try to analyse the same data in Statsample:

pry(main)> df = Statsample::CSV.read("/home/alexej/Desktop/fake_salaries.csv")
=> #<Statsample::Dataset:69956503513460 @name=Dataset 1 @fields=[salary,years,ethnicity] cases=30
pry(main)> mod = Statsample::Regression.multiple(df, 'salary')
NoMethodError: NoMethodError
from /home/alexej/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/statsample-1.5.0/lib/statsample/vector.rb:186:in `_check_type'

So, "NoMethodError". And when I delete "ethnicity", the model can be fit:

pry(main)> df.delete_vector("ethnicity")
=> ["ethnicity"]
pry(main)> mod = Statsample::Regression.multiple(df, 'salary')
=> #<Statsample::Regression::Multiple::RubyEngine:0x007f4008733620
> puts mod.summary
= Multiple reggresion of years on salary
  Engine: Statsample::Regression::Multiple::RubyEngine
  Cases(listwise)=30(30)
  R=0.061
  R^2=0.004
  R^2 Adj=-0.032
  Std.Error R=4.358
  Equation=7.046 + 0.125years
  == ANOVA
    ANOVA Table
+------------+---------+----+--------+-------+-------+
|   source   |   ss    | df |   ms   |   f   |   p   |
+------------+---------+----+--------+-------+-------+
| Regression | 1.979   | 1  | 1.979  | 0.104 | 0.749 |
| Error      | 531.824 | 28 | 18.994 |       |       |
| Total      | 533.804 | 29 | 20.973 |       |       |
+------------+---------+----+--------+-------+-------+

  Beta coefficients
+----------+-------+-------+-------+-------+
|  coeff   |   b   | beta  |  se   |   t   |
+----------+-------+-------+-------+-------+
| Constant | 7.046 | -     | 2.233 | 3.155 |
| years    | 0.125 | 0.061 | 0.386 | 0.323 |
+----------+-------+-------+-------+-------+

This issue possibly allows for a common solution with SciRuby/statsample-glm#11 and SciRuby/daru#9.
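
Until categorical predictors are supported directly, one workaround is to dummy-code the variable by hand before fitting, which is essentially what R's lm does with treatment contrasts. The sketch below is plain Ruby with a hypothetical `dummy_code` helper; it does not use statsample or daru APIs.

```ruby
# Turn a categorical vector into 0/1 indicator columns, dropping a base level.
# NOTE: dummy_code is an illustrative helper, not a statsample method.
def dummy_code(values, base: nil)
  levels = values.uniq.sort
  base ||= levels.first # same default as R: first level alphabetically
  (levels - [base]).each_with_object({}) do |level, columns|
    columns[level] = values.map { |v| v == level ? 1 : 0 }
  end
end

ethnicity = %w[black white other black]
dummy_code(ethnicity).each do |level, column|
  puts "ethnicity#{level}: #{column.inspect}"
end
```

Each resulting column can then be added to the dataset as an ordinary numeric predictor.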

Too many dependencies

There are too many gems that statsample depends on, specifically reportbuilder. I do not like installing an old version of prawn. I know that many people use reportbuilder. However, I think that more people simply use pry or Jupyter. I think that installation of reportbuilder should be optional.

Use of deprecated Daru methods in statsample

Hi,

the last line in the following code raised an error:

require 'statsample'
a = Daru::Vector.new([1, 2, 3, 4, 5])
b = Daru::Vector.new([6, 7, 8, 9, 10])
t_2=Statsample::Test::T::TwoSamplesIndependent.new(a,b)
t_2.summary

Messages:

Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:291.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:292.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:51.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:51.
NOTE: Daru::Vector#only_valid is deprecated; use reject_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#only_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:60.
NOTE: Daru::Vector#only_valid is deprecated; use reject_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#only_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:60.
NOTE: Daru::Vector#only_valid is deprecated; use reject_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#only_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:71.
NOTE: Daru::Vector#only_valid is deprecated; use reject_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#only_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/levene.rb:71.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:267.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:267.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:269.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:269.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:271.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:271.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:272.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:272.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:281.
NOTE: Daru::Vector#n_valid is deprecated; use count_values instead. It will be removed on or after 2016-10-01.
Daru::Vector#n_valid called from C:/Ruby21-x64/lib/ruby/gems/2.1.0/gems/statsample-2.0.2/lib/statsample/test/t.rb:282.

I'm using ruby 2.1.7p400 (2015-08-18 revision 51632) [x64-mingw32] with following gems installed:

ansi (1.5.0)
ast (2.2.0)
awesome_print (1.7.0)
backports (3.6.8)
bigdecimal (default: 1.2.4)
clbustos-rtf (0.4.2)
daru (0.1.4.1)
debase (0.2.2.beta8, 0.2.2.beta6, 0.2.1, 0.1.4)
debase-ruby_core_source (0.8.0)
dicom (0.9.6)
dirty-memoize (0.0.4)
distribution (0.7.3)
extendmatrix (0.4)
interpolate (0.3.0)
interpolation (0.0.2)
io-console (default: 0.4.3)
json (default: 1.8.1)
minimization (0.2.3)
minitest (default: 4.7.5)
oga (2.0.0)
ox (2.2.3, 2.2.2)
parallel (1.6.1)
prawn (0.8.4)
prawn-core (0.8.4)
prawn-layout (0.8.4)
prawn-security (0.8.4)
prawn-svg (0.9.1.11)
psych (default: 2.0.5)
rake (default: 10.1.0)
rdoc (default: 4.1.0)
reportbuilder (1.4.2)
rserve-client (0.3.1)
ruby-debug-ide (0.6.1.beta2, 0.6.0, 0.4.32)
ruby-ll (2.1.2)
ruby-ole (1.2.12)
rubyvis (0.6.1)
spreadsheet (1.1.4)
statsample (2.0.2)
test-unit (default: 2.1.7.0)
text-table (1.2.4)
xml-simple (1.1.5)

Cheers,
Bertram

Returning NaN for simple multiple linear regression case in 1.4.1

I'm finding some unexpected behaviour in 1.4.1, which was not occurring in 1.4.0. (I've tried to keep to the format of some of the tests in the test suite in the example):

    @a=[27.0, 12.0, 16.0, 25.0].to_vector(:scale)
    @b=[10.0, 15.0, 19.0, 2.0].to_vector(:scale)
    @y=[1, 1, 1, 1].to_vector(:scale)

    ds={'a'=>@a,'b'=>@b,'y'=>@y}.to_dataset

    lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')

    assert(!lr.r.nan?, "r should not be NaN")
    assert(!lr.r2.nan?, "r2 should not be NaN")
    lr.coeffs.each do |(coeff_key, coeff_value)|
      assert(!coeff_value.nan?, "#{coeff_key} should not be NaN")
    end

I've added this as a test on a fork: https://github.com/einpaule/statsample .

Can someone confirm this is an issue?
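
The issue does not identify the cause, but note that `y` is constant in this example. Any statistic that divides by the variance of `y` is then a 0/0 division, which yields NaN in floating point. The plain-Ruby sketch below illustrates this with a hand-written Pearson correlation; it is only an illustration and not statsample's implementation.

```ruby
# Pearson correlation from sums of squares; no guard against zero variance.
def pearson_r(xs, ys)
  mx = xs.inject(0.0) { |acc, x| acc + x } / xs.size
  my = ys.inject(0.0) { |acc, y| acc + y } / ys.size
  sxy = xs.zip(ys).inject(0.0) { |acc, (x, y)| acc + (x - mx) * (y - my) }
  sxx = xs.inject(0.0) { |acc, x| acc + (x - mx)**2 }
  syy = ys.inject(0.0) { |acc, y| acc + (y - my)**2 }
  sxy / Math.sqrt(sxx * syy)
end

a = [27.0, 12.0, 16.0, 25.0]
y = [1, 1, 1, 1]

puts pearson_r(a, y).nan?
```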

Monkey-patched `Array#sum` method changes/breaks Ruby 2.4 method functionality

In lib/statsample.rb, the core Ruby Array class is monkey-patched to change the behavior of the sum method:

def sum
  inject(:+)
end

Active Support (included by Rails) already defines behavior for #sum, which has slightly different behavior and can accept a block for evaluation: https://github.com/rails/rails/blob/3d716b9e66e334c113c98fb3fc4bcf8a945b93a1/activesupport/lib/active_support/core_ext/enumerable.rb#L2-L27

Similarly, the Ruby core library added the #sum method to the Enumerable class in Ruby 2.4.

As such, for example, the following code works with Active Support, with Ruby 2.4, or with both:

> x = ['foo', 'bar']
#=> ["foo", "bar"]
> x.sum(&:bytesize)
#=> 6

But fails when the statsample 2.0 library is included in the project's Gemfile:

> x = ['foo', 'bar']
#=> ["foo", "bar"]
> x.sum(&:bytesize)
#=> "foobar"

Can this monkey-patching be removed and the summing functionality be done directly only where needed? This is preventing me from using statsample, unfortunately.
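
One way to do the summing "only where needed" is a small helper module, so that core classes stay untouched. The module and method names below are hypothetical; the sketch assumes Ruby 2.4 or later for the built-in `sum` used in the last line.

```ruby
# NOTE: StatsHelpers and sum_of are illustrative names, not statsample code.
module StatsHelpers
  module_function

  # Sum numeric values without touching Array#sum; returns 0 for empty input.
  def sum_of(values)
    values.inject(0) { |acc, v| acc + v }
  end
end

puts StatsHelpers.sum_of([1, 2, 3])

# The built-in block form keeps working because Array is not patched.
puts %w[foo bar].sum(&:bytesize)
```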

Error when running multi variable regression from a CSV: `_check_type': NoMethodError

I'm not really sure what's happening here. For some reason, if the y value is high precision and I try to run a regression on it, I get this error. However, this only happens if the high-precision data is in a CSV. It does not happen with the same values written as number literals, so it could be an issue with the CSV module itself. I should note that I'm running on Windows (8.1, 64-bit), so this might be an implementation issue.

The simplest code I can get this issue down to is as follows:

require 'statsample'
ds = Statsample::CSV.read("input.txt")
regression = Statsample::Regression.multiple(ds,'y')

And you'll need a file "input.txt" in the same directory with the following data:

x,y
1,9.629587310436753e+127
2,1.9341543147883677e+129
3,3.88485279048245e+130

The full stack trace is:

E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/vector.rb:161:in `_check_type': NoMethodError (NoMethodError)
        from E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/vector.rb:155:in `check_type'
        from E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/vector.rb:911:in `mean'
        from E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/regression/multiple/rubyengine.rb:23:in `initialize'
        from E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/regression.rb:62:in `new'
        from E:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/statsample-1.4.0/lib/statsample/regression.rb:62:in `multiple'
        from script.rb:3:in `<main>'
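
To help narrow down whether the values themselves are the problem, here is a small diagnostic sketch that parses the sample data by hand in plain Ruby, without statsample or any CSV library, and checks what class each cell ends up with. It only shows that Ruby can read these strings as Floats. It is an assumption that the error comes from cells staying as strings, and the sketch does not reproduce what Statsample::CSV does.

```ruby
# The same rows as input.txt, inlined so the sketch is self-contained.
raw = <<~DATA
  x,y
  1,9.629587310436753e+127
  2,1.9341543147883677e+129
  3,3.88485279048245e+130
DATA

# Drop the header, split each line on commas, and convert strictly.
# Integer() and Float() raise ArgumentError on malformed input.
parsed = raw.lines.drop(1).map do |line|
  x, y = line.chomp.split(',')
  [Integer(x), Float(y)]
end

parsed.each do |x, y|
  puts "x=#{x} (#{x.class}), y=#{y} (#{y.class})"
end
```

Comparing these classes with the classes of the values in the dataset returned by the CSV reader may show where the two paths diverge.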

Ruby 3.1 no longer includes matrix in the stdlib

As of Ruby 3.1, the matrix gem is no longer distributed as a part of the standard library and must be bundled explicitly.

I believe the fix is straightforward: include matrix in the gemspec file. However, this should probably be tested on a couple of different Ruby versions in order to verify the change doesn't cause problems for older Rubies.
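
A minimal sketch of that gemspec change follows. Only the `matrix` dependency line reflects the suggestion above. The spec name and version here are placeholders so the fragment can run on its own, and they are not taken from the real statsample gemspec.

```ruby
# Illustrative gemspec fragment; name and version are placeholders.
spec = Gem::Specification.new do |s|
  s.name    = "statsample"
  s.version = "0.0.0" # placeholder, not the real version

  # matrix is no longer loaded from the standard library as of Ruby 3.1,
  # so declare it explicitly as a runtime dependency.
  s.add_runtime_dependency "matrix"
end

puts spec.runtime_dependencies.map(&:name).inspect
```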

In the meantime, users of the statsample gem can add matrix to their Gemfile to satisfy the missing dependency. For example:

# https://github.com/SciRuby/statsample
gem "distribution"
gem "prime" # No longer in stdlib as of Ruby 3.1

gem "statsample"
gem "matrix" # No longer in stdlib as of Ruby 3.1

Note: similar issue opened on the distribution gem

Trouble with Statsample::Bivariate#correlation_matrix

(Original: clbustos/statsample#17)

Hi, I'm having trouble using statsample to run PCA on large data. Does anyone have a good idea?

I want to run a PCA analysis on very large data (3000 variables, 50 samples).
So I wrote this code:

data_raw = IO.readlines('data1.txt').map{|v| v.split }[1..-1]

hash_tmp = {}

data_raw[1..3000].each do |ary|
  hash_tmp[ary[0]] = ary[1..-1].map(&:to_i).to_scale
end

ds = hash_tmp.to_dataset

puts "Input data done!"

cor_matrix=Statsample::Bivariate.correlation_matrix(ds)

puts "cor_matrix was prepared."

pca=Statsample::Factor::PCA.new(cor_matrix)

binding.pry

But Ruby on my Mac never prints "cor_matrix was prepared.".
I wrote some more code to investigate the cause.

# Reopening the module to find out where the bottleneck is
module Statsample
  module Bivariate
    class << self
      def covariance_matrix_optimized(ds)
        x=ds.to_gsl
        n=x.row_size
        m=x.column_size
        puts "calculating means..."
        means=((1/n.to_f)*GSL::Matrix.ones(1,n)*x).row(0)
        puts "centering matrix..."
        centered=x-(GSL::Matrix.ones(n,m)*GSL::Matrix.diag(means))
        puts "calculating covariance matrix..."
        ss=centered.transpose*centered
        puts "calculating n..."
        s=((1/(n-1).to_f))*ss
        puts "done!"              #<= This line has executed
        s
      end



      def correlation_matrix(ds)
        vars,cases=ds.fields.size,ds.cases
        if !ds.has_missing_data? and Statsample.has_gsl? and prediction_optimized(vars,cases) < prediction_pairwise(vars,cases)
          binding.pry
          cm=correlation_matrix_optimized(ds)
          binding.pry             #<= This line hasn't executed. :(
        else
          cm=correlation_matrix_pairwise(ds)
        end
        binding.pry
        cm.extend(Statsample::CovariateMatrix)
        binding.pry
        cm.fields=ds.fields
        binding.pry
        cm
      end
    end
  end
end

Ruby then runs up to "done!" but never returns from the Statsample::Bivariate#covariance_matrix_optimized method.
I haven't seen a Ruby method that doesn't return before.

If anyone knows how to solve this problem or how to investigate the cause more deeply, please tell me.
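
For reference, the steps printed by the instrumented method (means, centering, cross-products, division by n - 1) can be sketched on a tiny dataset in plain Ruby, without GSL. This is only an illustration of the arithmetic and not the library's implementation. It does show that the result has one entry per pair of variables, so 3000 variables give a 3000 x 3000 matrix with 9,000,000 entries.

```ruby
# Rows are cases, columns are variables (here: 3 cases, 2 variables).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
n = data.size
m = data.first.size

# Step 1: column means.
means = (0...m).map { |j| data.inject(0.0) { |acc, row| acc + row[j] } / n }

# Step 2: centre each column on its mean.
centered = data.map { |row| row.each_with_index.map { |v, j| v - means[j] } }

# Steps 3 and 4: cross-products of centred columns, divided by n - 1.
cov = (0...m).map do |i|
  (0...m).map do |j|
    centered.inject(0.0) { |acc, row| acc + row[i] * row[j] } / (n - 1)
  end
end

# Correlation follows by scaling with the standard deviations.
cor = (0...m).map do |i|
  (0...m).map { |j| cov[i][j] / Math.sqrt(cov[i][i] * cov[j][j]) }
end

puts cov.inspect
puts cor.inspect
```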

Splitting statsample into smaller chunks

I think it would be nice if statsample could be restructured into many extensions...

so if you simply need the wilcoxon test, you do the following:

require 'statsample/wilcoxon_test'

I believe that this would mean that dependencies are only installed when needed...

For example, based on my understanding, the GSL library (and rb-gsl) are dependencies that are only required for Factorial analysis and polychoric correlation (according to the README on the original fork). However, these dependencies have to be installed whether or not you use Factorial analysis and polychoric correlation.
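
One common Ruby pattern for this kind of optional dependency, sketched here under the assumption that the feature can be switched off when the library is missing, is to attempt the require and fall back when it fails. The helper name `optional_require` is hypothetical and is not part of statsample.

```ruby
# Hypothetical helper: tries to load a library and reports whether it worked.
def optional_require(name)
  require name
  true
rescue LoadError
  false
end

# Only code paths that really need GSL would check this flag.
HAS_GSL = optional_require('gsl')

if HAS_GSL
  puts "GSL available: optimized routines could be used"
else
  puts "GSL missing: falling back to pure Ruby routines"
end
```

With this shape, a user who only needs a single test would not have to install GSL at all.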

Change GSL dependency to nmatrix-GSL

A narray dependency issue arises mainly from statsample's dependence on GSL. If we could use the nmatrix fork, this might vanish. That way, nmatrix would also work without name clashes.

Is the nmatrix GSL fork stable enough for this?

Removing linear regression support in Statsample

I think it's reasonable to remove linear regression support from Statsample, for the following reasons:

  • Its equivalent, Normal regression, already lives in Statsample-GLM.
  • This way, all regression models would live in Statsample-GLM.
  • Statsample-GLM already supports predicting on new data, which Statsample doesn't yet.
  • Normal regression in Statsample-GLM works without an intercept, providing more flexibility.
  • Regression model management would be limited to Statsample-GLM, and the Statsample gem would be responsible for the statistical tools other than regression, which simplifies management.

Replace Dataset and Vector with daru

Statsample currently uses its own data structures for storage and manipulation of data. This limits the scope of their implementation and scaling them would also be difficult in the long term.

Replacing these data structures (Statsample::Dataset and Statsample::Vector) with a dedicated data frame library like daru would solve these problems.

Vector#histogram broken

Vector#histogram fails to populate the histogram when the bins parameter is an Array.
A proposed fix is in pull request #67.

Upgrade from Version 1.4.0 --> 1.4.2 introduces rb-gsl dependency, downgrades spreadsheet

[As mentioned in my issue in the old repo]…

It appears that going from 1.4.0 to 1.4.2 creates some unexpected dependency changes that should not be a part of a Patch-level release:

  1. The rb-gsl gem is now required. This is problematic on Heroku where it cannot be easily built.
  2. The spreadsheet gem gets pinned to an older Minor release; on my app this mandated a downgrade of spreadsheet from 0.9.9 to 0.6.9. Not sure if this is intentional or matters, but it was surprising if nothing else.
  3. If the minimization gem is not pinned to 0.2.1 a similar dependency is created with the bump to 0.2.2. I have opened an issue over there.

Here's a peek at the diff of Gemfile.lock when statsample ~> 1.4.0 and minimization = 0.2.1:

-    statsample (1.4.0)
-      dirty-memoize (~> 0.0)
+    statsample (1.4.2)
+      awesome_print
+      dirty-memoize
       distribution
-      extendmatrix (~> 0.3.1)
-      fastercsv (> 0)
-      minimization (~> 0.2.0)
+      extendmatrix
+      minimization
+      rb-gsl
       reportbuilder (~> 1.4)
       rserve-client
-      rubyvis
-      spreadsheet (~> 0.6)
-      statsample-bivariate-extension (> 0)
-    statsample-bivariate-extension (1.1.0)
-      distribution (~> 0.6)
+      rubyvis (~> 0.5.0)
+      spreadsheet (~> 0.6.5)

" `display': undefined method `session' for nil:NilClass" running correlation

Hi there,

I'm trying to create a correlation matrix using the example here:

http://nbviewer.ipython.org/github/SciRuby/sciruby-notebooks/blob/master/Statistics/Correlation%20Matrix%20with%20daru%20and%20statsample.ipynb

I've got iruby, ipython and jupyter-console all installed. I can run iruby from the command line with no errors. I'm trying to run the above example from a file like so:

$ ruby lib/correlation.rb 
/Users/tansaku/.rvm/gems/ruby-2.2.2/gems/iruby-0.2.7/lib/iruby/utils.rb:8:in `display': undefined method `session' for nil:NilClass (NoMethodError)
    from lib/correlation.rb:34:in `block in <main>'

and I'm not sure what to do with the above error that corresponds to this line of code: IRuby.display ds.head

Any ideas? Should I be working through the IRuby console?

Many thanks in advance
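
A hedged sketch of one way to make such a script run both inside and outside a notebook: call IRuby.display only when a kernel appears to be running, and print plainly otherwise. The check on `IRuby::Kernel.instance` is an assumption inferred from the stack trace above (the `session` call fails on nil), and the helper name `show` is hypothetical.

```ruby
# Hypothetical helper: uses IRuby.display only when a kernel seems active.
def show(obj)
  # Assumption: IRuby::Kernel.instance is nil when no notebook kernel runs.
  in_notebook = defined?(IRuby::Kernel) &&
                IRuby::Kernel.respond_to?(:instance) &&
                IRuby::Kernel.instance

  if in_notebook
    IRuby.display(obj)
    :notebook
  else
    puts obj.inspect # plain-text fallback for `ruby script.rb`
    :plain
  end
end

show([1, 2, 3])
```

In the script from this report, `show(ds.head)` would take the place of `IRuby.display ds.head`.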

Add support for Krippendorff's alpha reliability coefficient

(Original: clbustos/statsample#13)

(First of all, thanks so much for this package!
While it took me some time to get used to it, it's super handy even for simple statistics in business applications.)

I might lack the knowledge to combine the pieces of this framework, but I'm missing explicit support for measuring reliability between raters, e.g. Krippendorff's alpha.

R provides a nice interface for this here.

Thanks again for your efforts and time in supporting this project!
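
For readers who want a stopgap, below is a minimal plain-Ruby sketch of Krippendorff's alpha for nominal data, built from the standard coincidence-matrix formulation, alpha = 1 - (n - 1) * (sum of off-diagonal coincidences) / (sum over c != k of n_c * n_k). The method name and the input layout (one array of ratings per unit, nil for a missing rating) are assumptions made for illustration. This is not an API of statsample or of R's irr package, and it has not been checked against either.

```ruby
# units: one array per rated unit, holding the ratings it received.
def krippendorff_alpha_nominal(units)
  coincidences = Hash.new(0.0)

  units.each do |ratings|
    values = ratings.compact # ignore missing ratings
    m = values.size
    next if m < 2            # a unit with one rating carries no pairs

    # Every ordered pair of ratings in a unit contributes 1 / (m - 1).
    values.each_with_index do |c, i|
      values.each_with_index do |k, j|
        coincidences[[c, k]] += 1.0 / (m - 1) unless i == j
      end
    end
  end

  # Marginal totals n_c and grand total n.
  totals = Hash.new(0.0)
  coincidences.each { |(c, _k), count| totals[c] += count }
  n = totals.values.inject(0.0, :+)

  observed = 0.0
  coincidences.each { |(c, k), count| observed += count unless c == k }

  expected = 0.0
  totals.each { |c, nc| totals.each { |k, nk| expected += nc * nk unless c == k } }

  return Float::NAN if expected.zero? # only one category was ever used
  1.0 - (n - 1) * observed / expected
end

puts krippendorff_alpha_nominal([[1, 1], [2, 2]]) # two raters, full agreement
puts krippendorff_alpha_nominal([[1, 2], [1, 2]]) # two raters, full disagreement
```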
