
girth's Introduction


GIRTH: Item Response Theory


GIRTH is a Python package for estimating item response theory (IRT) parameters. It also supports synthetic IRT data generation. The available methods are listed below; for more information, visit the GIRTH homepage.

Interested in Bayesian models? Check out girth_mcmc, which provides Markov chain Monte Carlo and variational inference estimation methods.

Need general statistical support? Download my other project, RyStats, which implements commonly used statistical functions. These functions are also available in an interactive web app, GoFactr.com, with no software to download or install.

Item Response Theory

Unidimensional Models

Dichotomous Models

  1. Rasch Model
    • Joint Maximum Likelihood
    • Conditional Likelihood
    • Marginal Maximum Likelihood
  2. One Parameter Logistic Models
    • Joint Maximum Likelihood
    • Marginal Maximum Likelihood
  3. Two Parameter Logistic Models
    • Joint Maximum Likelihood
    • Marginal Maximum Likelihood
    • Mixed Expected A Posteriori / Marginal Maximum Likelihood
  4. Three Parameter Logistic Models
    • Marginal Maximum Likelihood (No Optimization and Minimal Support)
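Each of the dichotomous models above is defined by its item response function. As a rough sketch, assuming the common logistic parameterization P(θ) = c + (1 − c)·σ(a(θ − b)) with difficulty b, discrimination a, and guessing c (girth's internal scaling may differ), the hypothetical helper `irf` below covers all three cases:

```python
import numpy as np
from scipy.special import expit


def irf(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct response under the logistic 1PL/2PL/3PL models.

    Rasch: discrimination = 1, guessing = 0
    1PL:   one discrimination shared by all items, guessing = 0
    2PL:   item-specific discrimination, guessing = 0
    3PL:   adds a lower asymptote (guessing) in [0, 1)
    """
    theta = np.asarray(theta, dtype=float)
    return guessing + (1.0 - guessing) * expit(discrimination * (theta - difficulty))


theta = np.array([-1.0, 0.0, 1.0])
print(irf(theta, difficulty=0.0))                                    # Rasch
print(irf(theta, difficulty=0.0, discrimination=1.7))                # 2PL
print(irf(theta, difficulty=0.0, discrimination=1.7, guessing=0.2))  # 3PL
```

At θ = b the Rasch and 2PL curves pass through 0.5, while the 3PL curve passes through c + (1 − c)/2.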

Polytomous Models

  1. Graded Response Model
    • Joint Maximum Likelihood
    • Marginal Maximum Likelihood
    • Mixed Expected A Posteriori / Marginal Maximum Likelihood
  2. Partial Credit Model
    • Joint Maximum Likelihood
    • Marginal Maximum Likelihood
  3. Graded Unfolding Model
    • Marginal Maximum Likelihood
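For the Graded Response Model, category probabilities come from differences of cumulative logistic curves. A minimal sketch of Samejima's formulation for a single item, assuming strictly increasing thresholds; the function name is hypothetical and this is not girth's code:

```python
import numpy as np
from scipy.special import expit


def grm_category_probabilities(theta, discrimination, thresholds):
    """Samejima graded response model for one item.

    Returns an array of shape (n_categories, n_theta) where row k is P(X = k | theta).
    `thresholds` must be strictly increasing (one fewer than the number of categories).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    thresholds = np.asarray(thresholds, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    cumulative = expit(discrimination * (theta[None, :] - thresholds[:, None]))
    # Pad with P(X >= 0) = 1 on top and P(X >= K) = 0 at the bottom
    ones = np.ones((1, theta.size))
    zeros = np.zeros((1, theta.size))
    cumulative = np.vstack([ones, cumulative, zeros])
    # Adjacent differences give the probability of each individual category
    return cumulative[:-1] - cumulative[1:]


probs = grm_category_probabilities([-1.0, 0.0, 1.0], 1.5, [-1.0, 0.0, 1.0])
print(probs.sum(axis=0))  # each column sums to 1
```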

Ability Estimation

  1. Dichotomous
    • Maximum Likelihood Estimation
    • Maximum a Posteriori Estimation
    • Expected a Posteriori Estimation
  2. Polytomous
    • Expected a Posteriori Estimation
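The Expected a Posteriori (EAP) estimate averages θ over its posterior, which can be approximated on a quadrature grid. A minimal sketch for the 2PL, assuming a standard normal prior, a fixed grid on [-5, 5], and data shaped (items × participants) as in the usage examples below; `eap_ability` is a hypothetical name, not girth's API:

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm


def eap_ability(responses, difficulty, discrimination, n_quadrature=61):
    """EAP ability estimates for the 2PL model with a N(0, 1) prior.

    responses: (n_items, n_people) array of 0/1 values
    """
    responses = np.asarray(responses, dtype=float)
    difficulty = np.asarray(difficulty, dtype=float)
    discrimination = np.asarray(discrimination, dtype=float)
    nodes = np.linspace(-5, 5, n_quadrature)
    prior = norm.pdf(nodes)

    # P(correct) for every item at every node: (n_items, n_quadrature)
    p = expit(discrimination[:, None] * (nodes[None, :] - difficulty[:, None]))

    # Log-likelihood of each response pattern at each node: (n_people, n_quadrature)
    log_like = responses.T @ np.log(p) + (1 - responses).T @ np.log1p(-p)

    # Posterior on the grid (shifted by the row maximum for numerical stability)
    posterior = np.exp(log_like - log_like.max(axis=1, keepdims=True)) * prior
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior @ nodes


difficulty = np.linspace(-2, 2, 10)
discrimination = np.full(10, 1.5)
# Three people: all wrong, only the easier half right, all right
patterns = np.column_stack([np.zeros(10), (difficulty < 0).astype(float), np.ones(10)])
print(eap_ability(patterns, difficulty, discrimination))  # increases left to right
```

Unlike maximum likelihood, EAP stays finite for all-correct and all-incorrect patterns because the prior pulls those estimates toward zero.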

Multidimensional Models

  1. Two Parameter Logistic Models
    • Marginal Maximum Likelihood
  2. Graded Response Model
    • Marginal Maximum Likelihood

Ability Estimation

  1. Dichotomous
    • Maximum a Posteriori Estimation
    • Expected a Posteriori Estimation
  2. Polytomous
    • Maximum a Posteriori Estimation
    • Expected a Posteriori Estimation

Supported Synthetic Data Generation

Unidimensional

  1. Rasch / 1PL Dichotomous Models
  2. 2PL Dichotomous Models
  3. 3PL Dichotomous Models
  4. Graded Response Model Polytomous
  5. Partial Credit Model Polytomous
  6. Graded Unfolding Model Polytomous

Multidimensional

  1. Two Parameter Logistic Models Dichotomous
  2. Graded Response Models Polytomous
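To illustrate what synthetic dichotomous data generation involves, here is a plain NumPy sketch for the 2PL case. It is not girth's create_synthetic_irt_dichotomous; it only assumes the (items × participants) layout used in the examples below, and `simulate_2pl` is a hypothetical name:

```python
import numpy as np
from scipy.special import expit


def simulate_2pl(difficulty, discrimination, theta, rng):
    """Draw 0/1 responses of shape (n_items, n_people) under the 2PL model."""
    difficulty = np.asarray(difficulty, dtype=float)
    discrimination = np.asarray(discrimination, dtype=float)
    theta = np.asarray(theta, dtype=float)
    probability = expit(discrimination[:, None] * (theta[None, :] - difficulty[:, None]))
    # A response is correct when a uniform draw falls below the model probability
    return (rng.random(probability.shape) < probability).astype(int)


rng = np.random.default_rng(42)
data = simulate_2pl(np.linspace(-2.5, 2.5, 10), rng.random(10) + 0.5,
                    rng.standard_normal(500), rng)
print(data.shape)  # (10, 500)
```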

Usage

Standard Estimation

To run GIRTH with unidimensional models:

import numpy as np

from girth.synthetic import create_synthetic_irt_dichotomous
from girth import twopl_mml

# Create Synthetic Data
difficulty = np.linspace(-2.5, 2.5, 10)
discrimination = np.random.rand(10) + 0.5
theta = np.random.randn(500)

syn_data = create_synthetic_irt_dichotomous(difficulty, discrimination, theta)

# Solve for parameters
estimates = twopl_mml(syn_data)

# Unpack estimates
discrimination_estimates = estimates['Discrimination']
difficulty_estimates = estimates['Difficulty']

Missing Data

Missing data is supported with the tag_missing_data function.

from girth import tag_missing_data, twopl_mml

# import data (you supply this function)
my_data = import_data(filename)

# Assume it's dichotomous data with True -> 1 and False -> 0;
# any value not in [0, 1] is tagged as missing
tagged_data = tag_missing_data(my_data, [0, 1])

# Run Estimation
results = twopl_mml(tagged_data)
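Conceptually, tagging replaces every response outside the list of valid values (NaN included) with a sentinel so that the estimator can skip it. A minimal sketch of that idea; the sentinel value `INVALID` and the function `tag_invalid` are hypothetical and may not match girth's internals:

```python
import numpy as np

INVALID = -99999  # hypothetical sentinel; girth's internal value may differ


def tag_invalid(data, valid_responses):
    """Replace every entry not in valid_responses (NaN included) with INVALID."""
    data = np.asarray(data, dtype=float)
    is_valid = np.isin(data, valid_responses)  # NaN never matches a valid value
    tagged = np.where(is_valid, data, INVALID).astype(int)
    return tagged, is_valid


raw = np.array([[1, 0, np.nan],
                [0, 2, 1]])
tagged, mask = tag_invalid(raw, [0, 1])
print(tagged)

# Downstream, the mask drops missing entries from a likelihood sum
p = np.full(raw.shape, 0.7)  # illustrative model probabilities
log_like = np.where(mask, np.where(tagged == 1, np.log(p), np.log1p(-p)), 0.0).sum()
```

Tagging with an integer sentinel rather than leaving NaN in place also keeps operations such as np.unique well defined, since NaN never compares equal to itself.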

Multidimensional Estimation

GIRTH supports multidimensional estimation, but these methods suffer from the curse of dimensionality: using more than three factors takes a considerable amount of time.

import numpy as np

from girth.synthetic import create_synthetic_irt_dichotomous
from girth import multidimensional_twopl_mml

# Create Synthetic Data
discrimination = np.random.uniform(-2, 2, (20, 2))
thetas = np.random.randn(2, 1000)
difficulty = np.linspace(-1.5, 1, 20)

syn_data = create_synthetic_irt_dichotomous(difficulty, discrimination, thetas)

# Solve for parameters
estimates = multidimensional_twopl_mml(syn_data, 2, {'quadrature_n': 21})

# Unpack estimates
discrimination_estimates = estimates['Discrimination']
difficulty_estimates = estimates['Difficulty']
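Much of that cost comes from integrating over a tensor-product quadrature grid: with n nodes per dimension the grid has n^d points. A small sketch of the growth, assuming quadrature_n counts nodes per dimension (the helper `quadrature_grid` is hypothetical):

```python
import numpy as np


def quadrature_grid(n_nodes, n_factors, bound=5.0):
    """Tensor-product grid: every combination of n_nodes points in each dimension."""
    axis = np.linspace(-bound, bound, n_nodes)
    mesh = np.meshgrid(*([axis] * n_factors), indexing="ij")
    return np.stack([m.ravel() for m in mesh])  # shape (n_factors, n_nodes ** n_factors)


for n_factors in range(1, 5):
    print(n_factors, quadrature_grid(21, n_factors).shape[1])
# 1 21
# 2 441
# 3 9261
# 4 194481
```

Every likelihood evaluation touches each grid point for each participant, so going from three to four factors multiplies that work by 21.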

Standard Errors

GIRTH does not use typical Hessian-based optimization routines and, therefore, currently has limited support for standard errors. Confidence intervals based on bootstrapping are supported but take longer to run. Missing data is supported in the bootstrap function as well.

The bootstrap does not support the 3PL IRT Model or the GGUM.

from girth import twopl_mml, standard_errors_bootstrap

# import data (you supply this function)
my_data = import_data(filename)

results = standard_errors_bootstrap(my_data, twopl_mml, n_processors=4,
                                    bootstrap_iterations=1000)

print(results['95th CI']['Discrimination'])                                    
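The percentile bootstrap resamples participants with replacement, re-runs the estimator on each resample, and reads the interval off the empirical quantiles. A minimal sketch of that idea; `percentile_bootstrap` is hypothetical, the stand-in estimator (proportion correct per item) replaces twopl_mml to keep it fast, and girth's standard_errors_bootstrap may differ in detail:

```python
import numpy as np


def percentile_bootstrap(data, estimator, n_iterations=1000, alpha=0.05, seed=None):
    """Percentile confidence interval obtained by resampling participants (columns)."""
    rng = np.random.default_rng(seed)
    n_people = data.shape[1]
    replicates = []
    for _ in range(n_iterations):
        columns = rng.integers(0, n_people, size=n_people)  # sample with replacement
        replicates.append(estimator(data[:, columns]))
    replicates = np.asarray(replicates)
    lower = np.quantile(replicates, alpha / 2, axis=0)
    upper = np.quantile(replicates, 1 - alpha / 2, axis=0)
    return lower, upper


def proportion_correct(data):
    """Stand-in estimator: fraction of correct responses for each item."""
    return data.mean(axis=1)


rng = np.random.default_rng(0)
responses = (rng.random((5, 400)) < np.linspace(0.2, 0.8, 5)[:, None]).astype(int)
lower, upper = percentile_bootstrap(responses, proportion_correct, seed=1)
```

Each replicate is a full re-estimation, which is why bootstrapping an iterative estimator such as MML takes much longer than a single fit.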

Factor Analysis

Factor analysis is another common method for latent variable exploration and estimation. These tools are helpful for understanding dimensionality or finding initial estimates of item parameters.

Factor Analysis Extraction Methods

  1. Principal Component Analysis
  2. Principal Axis Factor
  3. Minimum Rank Factor Analysis
  4. Maximum Likelihood Factor Analysis

Example

import girth.factoranalysis as gfa

# Assume you have already converted your data into a correlation matrix
# named `correlation` (for example, correlation = np.corrcoef(my_data))
n_factors = 3
results = gfa.maximum_likelihood_factor_analysis(correlation, n_factors)

print(results)
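The first extraction method listed, principal component analysis, reduces to an eigendecomposition of the correlation matrix: loadings are eigenvectors scaled by the square root of their eigenvalues. A sketch of that step only, not girth.factoranalysis's implementation; the function names and the sign convention are assumptions:

```python
import numpy as np


def principal_component_loadings(correlation, n_factors):
    """Loadings for the leading n_factors principal components of a correlation matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(correlation)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_factors]
    loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])
    # Eigenvector signs are arbitrary; flip so each column sums to a positive value
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1
    return loadings * signs, eigenvalues[order]


def compound_symmetry(size, r):
    """Correlation matrix with 1 on the diagonal and r everywhere else."""
    return np.full((size, size), r) + (1 - r) * np.eye(size)


# Two uncorrelated blocks of three variables (within-block r = 0.6 and r = 0.3)
correlation = np.block([[compound_symmetry(3, 0.6), np.zeros((3, 3))],
                        [np.zeros((3, 3)), compound_symmetry(3, 0.3)]])
loadings, eigenvalues = principal_component_loadings(correlation, 2)
```

Here the leading eigenvalues are 1 + 2(0.6) = 2.2 and 1 + 2(0.3) = 1.6, and each component loads only on its own block.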

Polychoric Correlation Estimation

When collected data are ordinal, Pearson's correlation provides biased estimates of the correlation. Polychoric correlations estimate the correlation under the assumption that the ordinal responses come from discretizing an underlying, normally distributed latent variable.

import numpy as np

import girth.synthetic as gsyn
import girth.factoranalysis as gfa
import girth.common as gcm

discrimination = np.random.uniform(-2, 2, (20, 2))
thetas = np.random.randn(2, 1000)
difficulty = np.linspace(-1.5, 1, 20)

syn_data = gsyn.create_synthetic_irt_dichotomous(difficulty, discrimination, thetas)

polychoric_corr = gcm.polychoric_correlation(syn_data, start_val=0, stop_val=1)

results_fa = gfa.maximum_likelihood_factor_analysis(polychoric_corr, 2)
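For two dichotomous variables the polychoric correlation reduces to the tetrachoric correlation. A minimal sketch under the usual latent bivariate normal model: thresholds come from the marginal proportions, then ρ is chosen so that the model reproduces the observed proportion of (0, 0) pairs. This is not girth.common.polychoric_correlation, which handles general ordinal data; the function names are hypothetical:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm


def bivariate_normal_cdf(a, b, rho):
    """P(X < a, Y < b) for a standard bivariate normal with correlation rho."""
    scale = np.sqrt(1.0 - rho ** 2)
    # Y given X = x is normal with mean rho * x and variance 1 - rho^2
    integrand = lambda x: norm.pdf(x) * norm.cdf((b - rho * x) / scale)
    value, _ = quad(integrand, -np.inf, a)
    return value


def tetrachoric(x, y):
    """Tetrachoric correlation between two 0/1 arrays."""
    x, y = np.asarray(x), np.asarray(y)
    tau_x = norm.ppf(np.mean(x == 0))  # latent threshold below which x == 0
    tau_y = norm.ppf(np.mean(y == 0))
    p00 = np.mean((x == 0) & (y == 0))
    # P(X < tau_x, Y < tau_y) increases with rho, so bracket the root and solve
    return brentq(lambda r: bivariate_normal_cdf(tau_x, tau_y, r) - p00, -0.99, 0.99)


rng = np.random.default_rng(0)
latent = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=20000)
x = (latent[:, 0] > 0.3).astype(int)
y = (latent[:, 1] > -0.2).astype(int)
estimate = tetrachoric(x, y)
print(estimate)  # close to the latent correlation of 0.5
```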

Support

Installation

Via pip

pip install girth --upgrade

From Source

pip install . -t $PYTHONPATH --upgrade

Dependencies

We recommend the Anaconda environment, which can be installed here

  • Python ≥ 3.8
  • NumPy
  • SciPy

Unittests

Run the tests with pytest and the coverage.py module:

pytest --cov=girth --cov-report term

Contact

Please contact me with any questions or feature requests. Thank you!

Ryan Sanchez
[email protected]

License

MIT License

Copyright (c) 2021 Ryan C. Sanchez

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

girth's People

Contributors

eribean, karloskar


girth's Issues

January Code Clean Up

The SLOC is getting too high. I'm going to pause enhancements/features for January and just focus on cleaning up the code base. Plans include:

  • Removing the Dichotomous "FULL" methods
  • Removing the 2PL MML estimation in favor of the GRM
  • Removing "irt_evaluation" in favor of the scipy.special.expit function
  • Updating the 1PL to use the integral function from the GRM
  • Updating the website to reflect the work
  • (Maybe) Refactoring the code base so functions are easier to find

Refactor dichotomous uni-dimensional partial integral

The partial integration found in utils is a 3D NumPy array. This was done for speed, but at appreciable sizes it can quickly hit memory limits. Refactor this into two methods: one with a for loop and one with a 2D array.
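A sketch of the trade-off described in this issue, assuming a 2PL likelihood and hypothetical function names (this is not the utils code itself). Both versions compute, for every participant and quadrature node, the probability of that participant's full response pattern. The broadcast version materializes an (items, participants, nodes) array; the loop keeps only one (participants, nodes) array in memory:

```python
import numpy as np
from scipy.special import expit


def partial_integral_3d(responses, difficulty, discrimination, nodes):
    """Broadcast version: builds an (n_items, n_people, n_nodes) array, then multiplies over items."""
    p = expit(discrimination[:, None, None] * (nodes[None, None, :] - difficulty[:, None, None]))
    r = responses[:, :, None]
    return np.prod(np.where(r == 1, p, 1.0 - p), axis=0)


def partial_integral_loop(responses, difficulty, discrimination, nodes):
    """Loop version: accumulates a single (n_people, n_nodes) array, item by item."""
    result = np.ones((responses.shape[1], nodes.size))
    for item in range(responses.shape[0]):
        p = expit(discrimination[item] * (nodes - difficulty[item]))  # (n_nodes,)
        result *= np.where(responses[item][:, None] == 1, p[None, :], 1.0 - p[None, :])
    return result


rng = np.random.default_rng(3)
responses = rng.integers(0, 2, size=(15, 40))
difficulty = rng.normal(size=15)
discrimination = rng.uniform(0.5, 2.0, size=15)
nodes = np.linspace(-4, 4, 41)
full = partial_integral_3d(responses, difficulty, discrimination, nodes)
looped = partial_integral_loop(responses, difficulty, discrimination, nodes)
```

For scale: 100 items × 10,000 participants × 61 nodes × 8 bytes is about 488 MB for the 3D intermediate, versus about 4.9 MB for the loop's accumulator.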

Add GGUM synthetic creation

Per discussion with Dr. Roberts, add generalized graded unfolding model parameter estimation. The first step is adding synthetic data creation, which can be done by adding a private function to the polytomous creation code.

Multidimensional IRT

Investigate numerical estimation techniques for the simplest case, the multidimensional 2PL model.

Add GUM parameter estimation

Add the graded unfolding model parameter estimation methods:

  • Joint Maximum Likelihood
  • Marginal Maximum Likelihood
  • Marginal Maximum A Posteriori

JML methods incorrect computation

There is a bug in the joint maximum likelihood estimation function: it contains an np.outer computation that should not be there. This doesn't impact Rasch/OnePL but does impact TwoPL when the two-dimensional solver gets out of hand.

Dichotomous JML issues

While working on #30, noticed Joint Maximum Likelihood had large errors, need to investigate and fix.

Estimate ability with 3PL IRT

Hi, is there a way to do theta estimation with the guessing parameter?

I've seen guessing params being used in the create_synthetic_irt_dichotomous() function but couldn't find any code that uses them for estimation.

Is there a way to do 3PL IRT theta estimation using girth?

Jax Integration

Integrate JAX into code base to increase speed and flexibility.

Deprecate Approximate Methods

The approximation methods are only valid for normal ability distributions. Deprecate these functions in anticipation of moving to generalized distributions.

Add gaussian mixture model for ability

Current unidimensional functions assume a Gaussian distribution for abilities; extend the functions to estimate a Gaussian mixture model of the abilities.
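With quadrature-based marginal likelihood, changing the ability distribution mostly means changing the weight attached to each quadrature node. A sketch assuming fixed, known mixture parameters; actually estimating those parameters (for example with EM) is not shown, and `mixture_quadrature_weights` is a hypothetical name:

```python
import numpy as np
from scipy.stats import norm


def mixture_quadrature_weights(nodes, weights, means, sds):
    """Normalized prior weights at each node for a Gaussian mixture ability distribution."""
    density = sum(w * norm.pdf(nodes, loc=m, scale=s) for w, m, s in zip(weights, means, sds))
    return density / density.sum()


nodes = np.linspace(-5, 5, 61)
standard = mixture_quadrature_weights(nodes, [1.0], [0.0], [1.0])  # current Gaussian case
bimodal = mixture_quadrature_weights(nodes, [0.5, 0.5], [-1.5, 1.5], [0.7, 0.7])
```

The single-component call reproduces the standard normal weights, so the Gaussian assumption becomes a special case of the mixture.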

Rename MML Methods

There are two maximum marginal likelihood methods, separate and full, and it can be confusing for users to know which one to call. Rename the separate methods to reflect that these are the ones to use, and rename the full ones to something less appealing.

Move 2PL_MML to GRM_MML

The 2PL model fits into the GRM_MML function with little change to code. Remove the 2PL code and point to the GRM code. This is less code to maintain.

Random number refactor

New best practice is to use a random number generator state instead of setting the seed. Update functions to follow this practice.
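In NumPy, this practice means creating a local Generator with numpy.random.default_rng (which accepts None, an integer seed, or an existing Generator) instead of calling np.random.seed, which mutates global state. A sketch with a hypothetical function:

```python
import numpy as np


def synthetic_abilities(n_people, seed=None):
    """Draw abilities from a local Generator rather than the global NumPy state."""
    rng = np.random.default_rng(seed)  # None, an int, or an existing Generator
    return rng.standard_normal(n_people)


first = synthetic_abilities(5, seed=2021)
second = synthetic_abilities(5, seed=2021)
print(np.array_equal(first, second))  # True: same seed, same draws
```

Because the Generator is local, calling the function leaves any global np.random state used elsewhere untouched.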

How to estimate theta?

The values of a and b can be estimated through MML, but this package does not seem to have a function for estimating each subject's theta value. Could a MAP method be added for this?
Thank you
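Once item parameters are available (for example from twopl_mml), a MAP estimate for one response pattern can be found by maximizing the log-posterior in θ. A minimal sketch, assuming 2PL item parameters and a standard normal prior, using SciPy's bounded scalar minimizer; `map_ability` is a hypothetical name, not girth's API:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit


def map_ability(pattern, difficulty, discrimination, bounds=(-6.0, 6.0)):
    """MAP ability for one 0/1 response pattern under the 2PL with a N(0, 1) prior."""
    pattern = np.asarray(pattern, dtype=float)
    difficulty = np.asarray(difficulty, dtype=float)
    discrimination = np.asarray(discrimination, dtype=float)

    def negative_log_posterior(theta):
        p = expit(discrimination * (theta - difficulty))
        log_like = np.sum(pattern * np.log(p) + (1 - pattern) * np.log1p(-p))
        return -(log_like - 0.5 * theta ** 2)  # log N(0, 1) prior up to a constant

    result = minimize_scalar(negative_log_posterior, bounds=bounds, method="bounded")
    return result.x


difficulty = np.linspace(-2, 2, 10)
discrimination = np.full(10, 1.2)
high = map_ability(np.ones(10), difficulty, discrimination)
low = map_ability(np.zeros(10), difficulty, discrimination)
```

The 2PL log-likelihood plus a normal log-prior is concave in θ, so the bounded one-dimensional search finds the single posterior mode, and the prior keeps all-correct and all-incorrect patterns finite.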

Multiprocessing on Performance Metrics

The current performance framework runs concurrently over the ability parameters, which leads to a lack of concurrency because it waits for longer runs to finish. Switch to concurrency over iterations; this will give a uniform load across processors and a faster run time.

Add 3PL Model

For completeness, add the 3 parameter logistic model in joint maximum likelihood and marginal maximum likelihood.

Refactor JML Code Base

There is a lot of reused code in the unidimensional code base. This was fine when building up the initial feature set. Revisit the joint maximum likelihood code and refactor it into common functionality for the Rasch, 1PL, and 2PL parameter estimation.

Rasch with ability guessing

Implement the Rasch model with ability-based guessing. This was suggested as an alternative to the random guessing usually assumed.

Rasch np.unique and NaN values

Hi,
First of all, thank you very much for the great library.

Trying to use the Rasch model, it seems that the library does not accept any NaN or missing values.
In particular, if the [item X Participant] dataset of True and False values contains NaN values, an exception is raised for every Rasch-related model.
The exception comes from unique_sets, counts = np.unique(dataset, axis=1, return_counts=True),
for example in line 14 of def _jml_abstract.

Any suggestions on how to deal with missing NaN values?

I'm under time constraints for an academic project, which is why I'm using a library instead of a from-scratch implementation, so I'd highly appreciate any hint.

Add Documentation via Github Pages

Create a public-facing website to document GIRTH usage and the API. pdoc will generate the documentation; still deciding between Hugo and Jekyll.

Remove Dichotomous Full Methods

The full methods were only used to compare against other methods. They are no longer needed, and maintaining them is a pain in the butt. Remove them.

Add performance metrics

Create a module that automates performance metrics for comparing parameter estimation techniques.

MCMC Solver

Markov chain Monte Carlo is a popular estimation technique; consider adding it for the current unidimensional methods.
