Comments (7)
There isn't currently a way to estimate parameters with missing values. It is definitely in the pipeline of features to include. The main push right now is to bring up core functionality for model estimation (polytomous, multidimensional, unfolding). I will start looking at how to include this and am open to suggestions. Thank you for letting me know.
The main interface is a numpy array and many formats can be converted into an array. I don't plan on supporting any types of alternative file formats unless there is a great demand. I specifically decided against pandas as it is too messy in my opinion.
Your list of lists can be converted into an array in many ways. Below is one way to do it; there are other, better alternatives as well.
```python
import numpy as np

## List of lists converted to a numpy array
# [ID, Item#, Response]
alt_format = [[0, 0, 1], [0, 1, 0], [0, 2, 1],
              [5, 1, 0], [5, 0, 1], [5, 2, 0]]

# Make temporary array from list
unformatted_array = np.asarray(alt_format)

# Get the unique participant IDs
participant_id = np.unique(unformatted_array[:, 0])

# Get the number of items (assumes items are labeled 0 .. n_items - 1)
n_items = np.unique(unformatted_array[:, 1]).size

# Create placeholder for formatted data (items x participant ID)
formatted_array = np.zeros((n_items, participant_id.max() + 1))
formatted_array[unformatted_array[:, 1], unformatted_array[:, 0]] = unformatted_array[:, 2]

# Keep only the columns for valid participant IDs
formatted_array = formatted_array[:, participant_id]

print(formatted_array)
# [[1. 1.]
#  [0. 0.]
#  [1. 0.]]
```
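One of the alternatives, sketched under the assumption that participant IDs and item labels may be arbitrary integers: `np.unique(..., return_inverse=True)` maps each raw label to a compact 0-based index, so non-contiguous item numbers (which would break the `n_items`-based indexing above) are handled. Unobserved (item, participant) pairs are left as `np.nan` rather than 0 so they cannot be mistaken for incorrect responses.

```python
import numpy as np

# [ID, Item#, Response] triples; IDs and item labels need not be contiguous
alt_format = [[0, 0, 1], [0, 1, 0], [0, 2, 1],
              [5, 1, 0], [5, 0, 1], [5, 2, 0]]
triples = np.asarray(alt_format)

# Map each raw item label / participant ID to a compact 0-based index
item_labels, item_idx = np.unique(triples[:, 1], return_inverse=True)
person_ids, person_idx = np.unique(triples[:, 0], return_inverse=True)

# Items x participants; NaN marks any pair that never appeared in the input
formatted_array = np.full((item_labels.size, person_ids.size), np.nan)
formatted_array[item_idx, person_idx] = triples[:, 2]

print(formatted_array)
```

For this input the result matches the previous approach; the difference only shows up with gaps in the labels or with missing pairs.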
For dichotomous parameter estimation, replacing null values with 0 in the kernel estimation ought to make optimization invariant to missing values. This will only work for joint and marginal likelihood and not conditional likelihood.
This will essentially ignore the missing values and does not try to impute them, treating them as MCAR (missing completely at random). This should be sufficient for now, and I will open two more issues to address imputation procedures as well as polytomous data.
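A minimal sketch of the idea for a dichotomous 2PL joint likelihood, assuming responses are stored items x participants with `np.nan` for missing entries. `twopl_log_likelihood` is a hypothetical helper for illustration, not girth's internal code: missing cells have their log-likelihood term set to 0, so they drop out of the sum and any optimizer working on it simply ignores them.

```python
import numpy as np


def twopl_log_likelihood(responses, discrimination, difficulty, theta):
    """Joint log-likelihood of a 2PL model that skips missing (NaN) responses.

    responses: items x participants array of 0/1 values, np.nan for missing.
    """
    # Probability of a correct response for every (item, participant) pair
    logits = discrimination[:, None] * (theta[None, :] - difficulty[:, None])
    prob = 1.0 / (1.0 + np.exp(-logits))

    observed = ~np.isnan(responses)
    # Fill NaN with 0 only to keep the arithmetic finite; the mask below
    # is what actually removes these cells from the likelihood
    filled = np.where(observed, responses, 0.0)
    per_cell = filled * np.log(prob) + (1.0 - filled) * np.log1p(-prob)

    # Missing cells contribute exactly 0 (MCAR: ignored, not imputed)
    return np.sum(per_cell * observed)
```

Dropping a response changes the total by exactly that cell's term, which is what makes the optimization invariant to the missing entries.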
In progress ...
I have done some work on a pairwise incremental parameter estimation method based on online learning.
Online IRT parameter estimation is an interesting problem. The separate parameter estimations I have for the unidimensional model would fit nicely into this paradigm: keep track of the ratio of true / false responses for each item along with a running integral. This would make it easy to do constant updates; I might try to put something together. I also now understand why you wanted the input format from the original post, [ID, item, response]: this is how an online update would occur.
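A minimal sketch of the per-item bookkeeping, consuming [ID, item, response] triples one at a time. `OnlineItemStats` and its smoothed logit difficulty proxy are hypothetical illustrations, not girth code, and the running-integral part of the idea is not covered here.

```python
import math
from collections import defaultdict


class OnlineItemStats:
    """Hypothetical running tally of correct / total responses per item."""

    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def update(self, participant_id, item, response):
        # participant_id is unused here; a fuller method would update ability too
        self.total[item] += 1
        self.correct[item] += int(response)

    def proportion_correct(self, item):
        return self.correct[item] / self.total[item]

    def logit_difficulty(self, item):
        # Crude difficulty proxy: -log(correct / incorrect) with +0.5 smoothing,
        # so easy items come out negative and hard items positive
        c = self.correct[item] + 0.5
        w = self.total[item] - self.correct[item] + 0.5
        return -math.log(c / w)


stats = OnlineItemStats()
stream = [[0, 0, 1], [0, 1, 0], [0, 2, 1], [5, 1, 0], [5, 0, 1], [5, 2, 0]]
for pid, item, resp in stream:
    stats.update(pid, item, resp)

print(stats.proportion_correct(2))  # 0.5
```

Each update is O(1), so the tallies can be refreshed as each new response arrives.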
Refactored the MML / JML code base to account for missing values.
A missing value is represented with NaN, which is available in NumPy as numpy.nan.
Writing unit tests now and will create a pull request shortly.
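A short plain-NumPy illustration of the NaN convention (nothing here is girth-specific; the small response matrix is made up for the example). Missing entries must be detected with `np.isnan`, since `x == np.nan` is always False, and NaN-aware reductions such as `np.nanmean` skip them.

```python
import numpy as np

# Items x participants; np.nan marks a response that was not collected
responses = np.array([[1.0, 0.0, np.nan, 1.0],
                      [0.0, np.nan, np.nan, 1.0],
                      [1.0, 1.0, 0.0, 0.0]])

missing = np.isnan(responses)               # responses == np.nan would be all False
n_observed = (~missing).sum(axis=1)         # observed responses per item
p_correct = np.nanmean(responses, axis=1)   # proportion correct, ignoring NaN

print(n_observed)  # [3 2 4]
print(p_correct)
```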
Unidimensional missing data example:
```python
import numpy as np
from girth import twopl_separate
from girth import create_synthetic_irt_dichotomous

# Create synthetic data
np.random.seed(42)
difficulty = np.linspace(-2, 2, 10)
discrimination = 0.5 + np.random.rand(10) * 2
theta = np.random.randn(400)

syn_data = create_synthetic_irt_dichotomous(difficulty, discrimination, theta)
syn_data = syn_data.astype('float')  # float dtype is needed to hold np.nan

# Add NaNs (missing values) to roughly 12.5% of the responses
mask = np.random.rand(*syn_data.shape) < 0.125
syn_data_orig = syn_data.copy()
syn_data[mask] = np.nan

# Estimate parameters with and without missing values
a, b = twopl_separate(syn_data)
ao, bo = twopl_separate(syn_data_orig)

def rmse(estimate, truth):
    return np.sqrt(np.square(estimate - truth).mean()).round(3)

print("Discrimination Estimation")
print("RMSE: Missing | Full")
print(rmse(a, discrimination), rmse(ao, discrimination))
print("\n")
print("Difficulty Estimation")
print("RMSE: Missing | Full")
print(rmse(b, difficulty), rmse(bo, difficulty))
```
Hi @eribean,
Would the feature you added handle missing values in all the models listed under the Unidimensional models section, such as grm_mml, twopl_mml, etc.?
Related Issues (20)
- Add initial guess method for multidimensional models
- Investigate removing numba support
- Add citation fille
- Girth Rebrand
- Add disclaimer about Rasch Models
- what is proper choice for sparse response
- IRT at Scale
- November Code Refactor
- Remove Correlated Abilities
- Unify dichotomous synthetic data creation.
- Clean up Read-me File
- Remove irt function
- Directory Structure Refactor
- Multi-dimensional Initial Guess Not Semi-Positive Definite
- I don't know where to take on it?
- NaN encountered warning with `grm_mml`
- "float128" data type is not understood (grm_mml_eap)
- Installation does not seem to work
- twopl_mml , threepl_mml model are taking too much time
- RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.