bigdata-ustc / educdm

The Model Zoo of Cognitive Diagnosis Models, including the classic Item Response Theory (IRT), Multidimensional Item Response Theory (MIRT), and Deterministic Input, Noisy "And" (DINA) models, as well as the more advanced Fuzzy Cognitive Diagnosis Framework (FuzzyCDF), Neural Cognitive Diagnosis Model (NCDM), and Item Response Ranking framework (IRR).

License: Apache License 2.0

Python 98.77% Makefile 1.23%

cognitive-diagnosis-models model-zoo psychometrics dina fuzzycdf neuralcdm irt item-response-theory cdm students

educdm's Issues

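About dataset preprocessing

Hello,
Thank you very much for your excellent work!
I have a question about the datasets:
Is the 'a0910' dataset in EduData processed from the '2009-2010 ASSISTment Skill Builder Data'?
The data I obtained after processing differs slightly from this dataset, and it also differs from the datasets used by NCD and RCD.
Would it be possible to share the data preprocessing script? Thank you.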

[!important] The tests will switch from Travis to GitHub Actions because the Travis credits are exhausted

Is there a problem with this line?

I don't understand why it is (stu_i + 2 * (1 - q_m))[obj_prob_index] and not stu_i[obj_prob_index].
I am looking forward to your reply!

EduCDM/EduCDM/FuzzyCDF/modules.py

Line 16 in 79f300c

mastery[i][obj_prob_index] = np.min((stu_i + 2 * (1 - q_m))[obj_prob_index], axis=1)
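One possible reading of the `2 * (1 - q_m)` term, offered as a sketch rather than a statement of the authors' intent: if `stu_i` is the student's per-skill proficiency multiplied element-wise by the Q-matrix (an assumption, since the surrounding loop is not quoted in this issue), then skills an item does not require show up as 0, and a plain `min` over the row would always return 0. Adding 2 to those entries pushes them above any valid proficiency in [0, 1], so the minimum is taken only over required skills. The toy values below are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 3 skills, 2 objective items.
alpha_i = np.array([0.9, 0.4, 0.7])   # assumed per-skill proficiency in [0, 1]
q_m = np.array([[1, 0, 1],            # item 0 requires skills 0 and 2
                [0, 1, 0]])           # item 1 requires skill 1 only

# Assumption: stu_i is proficiency masked by the Q-matrix.
stu_i = alpha_i * q_m                 # non-required skills become 0

# A plain min is dominated by the zeros of non-required skills.
naive_min = np.min(stu_i, axis=1)

# Adding 2 where q_m == 0 lifts those entries above 1,
# so min only "sees" the skills the item actually requires.
masked_min = np.min(stu_i + 2 * (1 - q_m), axis=1)

print(naive_min)
print(masked_min)
```

Under this assumption the offset acts purely as a mask: 2 is just a convenient constant larger than any proficiency value.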

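Problems encountered when running the code

How do I run your EduCDM code? The README is too brief and does not explain how to run the project (for example, the IRR part).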

examples/FuzzyCDF/prepare_dataset.ipynb fails to run on macOS and Linux servers

🐛 Description

On macOS and Linux, examples/FuzzyCDF/prepare_dataset.ipynb fails to run if rar and unrar are not installed.

Error Message

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

TypeError Traceback (most recent call last)
/tmp/ipykernel_32072/1971457670.py in
2 from EduData import get_data
3
----> 4 get_data("math2015", "../../data")

~/miniconda3/lib/python3.9/site-packages/EduData/DataSet/download_data/download_data.py in get_data(dataset, data_dir, override, url_dict)
221
222 try:
--> 223 return download_data(url, data_dir, override)
224 except FileExistsError:
225 logger.info("file existed, skipped")

~/miniconda3/lib/python3.9/site-packages/EduData/DataSet/download_data/download_data.py in download_data(url, data_dir, override, bloom_filter)
188 os.makedirs(data_dir, exist_ok=True)
189 save_path = path_append(data_dir, url.split('/')[-1], to_str=True)
--> 190 _data_dir = download_file(url, save_path, override)
191 bloom_filter.add(url)
192 return _data_dir

~/miniconda3/lib/python3.9/site-packages/EduData/DataSet/download_data/download_data.py in download_file(url, save_path, override, chunksize)
127
128 mode = 'wb+'
--> 129 content_len = int(res.headers.get('content-length'))
130 # Check if server supports range feature, and works as expected.
131 if res.status_code == 206:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
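The traceback shows `int(...)` receiving `None`, which happens when the response carries no `content-length` header. A minimal sketch of a guarded parse follows; `parse_content_length` is a hypothetical helper written for illustration and is not part of EduData, and how the caller should proceed when the length is unknown is left open.

```python
from typing import Mapping, Optional


def parse_content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the content length as an int, or None if absent or malformed."""
    raw = headers.get("content-length")
    if raw is None:
        # Header missing: this is the case that raises TypeError in the traceback.
        return None
    try:
        return int(raw)
    except ValueError:
        # Header present but not a valid integer.
        return None


print(parse_content_length({"content-length": "42"}))
print(parse_content_length({}))
```

A caller could then skip progress reporting or range checks when the result is `None` instead of crashing.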

To Reproduce

Steps to reproduce

1. Run examples/FuzzyCDF/prepare_dataset.ipynb

What have you tried to solve it?

1. Installed rar and unrar on macOS, after which the notebook runs fine locally.
2. Could not install rar and unrar on the lab server because of insufficient permissions.

Environment

Environment Information

Operating System:
MacOS

Python Version: (e.g., python3.6, anaconda/python3.7, venv/python3.8)
python 3.9.5

Additional context

[FEATURE] add RCD model

Description

References

[1]

Q: Definition of the EM-IRT parameters

I tried running the EM-IRT code as a baseline model, but the definitions of some parameters confused me.
What are the types and meanings of the parameters R, skip_value, and D (which is initialized to 1.702)?
Thank you!
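On D specifically: 1.702 is the conventional scaling constant in IRT that brings the logistic curve close to the normal ogive (the standard normal CDF). The sketch below only checks that numerical fact; it does not use EduCDM and says nothing about R or skip_value, whose meaning should be confirmed against the library source. The function names are illustrative.

```python
import math

import numpy as np


def logistic(x, D=1.702):
    """Logistic curve with scaling constant D."""
    return 1.0 / (1.0 + np.exp(-D * x))


def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


x = np.linspace(-6, 6, 2401)

# Largest gap between the two curves, with and without the scaling constant.
max_gap = float(np.max(np.abs(logistic(x) - normal_cdf(x))))
max_gap_unscaled = float(np.max(np.abs(logistic(x, D=1.0) - normal_cdf(x))))

print(max_gap, max_gap_unscaled)
```

With D = 1.702 the two curves stay within about 0.01 of each other everywhere, whereas the unscaled logistic (D = 1) deviates by more than 0.1.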

IRT and Random Selection

I ran IRT twice on the provided dataset and each time took the top 10% of questions ranked by the difficulty parameter b. The overlap between the two resulting sets was below 5%. This suggests that the difficulty ranking produced by IRT here is close to random selection. Why is this?
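To make the comparison reproducible, the overlap between two runs can be computed as below. This is a sketch under assumptions: `top_fraction_overlap` is a hypothetical helper, and `b_run1` / `b_run2` stand in for the difficulty vectors extracted from two trained models, indexed by the same item IDs.

```python
import numpy as np


def top_fraction_overlap(b_run1, b_run2, frac=0.1):
    """Share of items that appear in the top `frac` by difficulty in both runs."""
    b_run1 = np.asarray(b_run1)
    b_run2 = np.asarray(b_run2)
    k = max(1, int(len(b_run1) * frac))
    top1 = set(np.argsort(b_run1)[-k:].tolist())  # indices of the k hardest items, run 1
    top2 = set(np.argsort(b_run2)[-k:].tolist())  # indices of the k hardest items, run 2
    return len(top1 & top2) / k


# Made-up difficulties for 20 items.
b = np.arange(20, dtype=float)
same = top_fraction_overlap(b, b)        # identical rankings
reversed_ = top_fraction_overlap(b, -b)  # fully reversed rankings
print(same, reversed_)
```

Fixing the random seed for both runs and comparing rank correlation over all items, not only the top 10%, would help separate initialization noise from a genuine identifiability problem.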

Logical bug in IRTNet.forward

🐛 Description

I was browsing this repo for IRT implementations and found (I think) a theoretical bug in the implementation of IRTNet.

IRTNet.forward is defined here
https://github.com/bigdata-ustc/EduCDM/blob/main/EduCDM/IRT/GD/IRT.py#L30

    def forward(self, user, item):
        theta = torch.squeeze(self.theta(user), dim=-1)
        a = torch.squeeze(self.a(item), dim=-1)
        b = torch.squeeze(self.b(item), dim=-1)
        c = torch.squeeze(self.c(item), dim=-1)
        return torch.sigmoid(self.irf(theta, a, b, c, **self.irf_kwargs))

The output of irf is passed through the sigmoid function, which is fine only if irf itself returns a "logit".

The IRF function is defined here:
https://github.com/bigdata-ustc/EduCDM/blob/main/EduCDM/IRT/irt.py#L10

def irf(theta, a, b, c, D=1.702, *, F=np):
    return c + (1 - c) / (1 + F.exp(-D * a * (theta - b)))

If you look at this you can see that it is already depicting sigmoid behaviour (assuming, of course, that 0 <= c <= 1). In other words, irf is returning probabilities, and not logits. As a result, the forward function above is actually doing this:

1 / (1 + exp(-(c + (1 - c) / (1 + F.exp(-D * a * (theta - b))))))

which I think is probably a bug.
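A quick numerical check of what the double application does, written in plain NumPy rather than with the repository's torch modules (the parameter values are arbitrary): because irf already returns values in [0, 1], applying sigmoid on top confines the output to roughly [0.5, 0.731], so the model can never predict a probability near 0 or 1.

```python
import numpy as np


def irf(theta, a, b, c, D=1.702):
    """Three-parameter logistic item response function, as quoted above."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


theta = np.linspace(-10, 10, 201)

p = irf(theta, a=1.0, b=0.0, c=0.0)   # already a probability in (0, 1)
double = sigmoid(p)                   # what sigmoid(irf(...)) computes

print(p.min(), p.max())
print(double.min(), double.max())
```

The compressed range is the practical symptom: under a cross-entropy loss, targets of 0 can never be fit closely.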

If I haven't misunderstood, I have two recommendations:

1. Simply remove the torch.sigmoid call from forward.
2. (Optional) It may be worth passing c through a sigmoid function to ensure it doesn't go negative or above 1. (Perhaps selectable in irf_kwargs?)

i.e.

    def forward(self, user, item):
        theta = torch.squeeze(self.theta(user), dim=-1)
        a = torch.squeeze(self.a(item), dim=-1)
        b = torch.squeeze(self.b(item), dim=-1)
        c = torch.squeeze(self.c(item), dim=-1)
        if self.irf_kwargs.get("squash_c", True):
            c = torch.sigmoid(c)
        return self.irf(theta, a, b, c, **self.irf_kwargs)  # May want to clip values if c not constrained

Edit: I noticed that this torch.sigmoid(irf(...)) pattern also appears in MIRT, and possibly elsewhere too.
Edit 2: I also realise that because sigmoid is monotonic, it doesn't really change the optimal solution. However, it does seem unnecessary to differentiate through sigmoid twice.

Error Message

To Reproduce

Environment

Environment Information

Operating System: NA

Python Version: NA

Additional context

Unstable DINA coverage test

🐛 Description

Error Message

The DINA coverage report is unstable. @Ljyustc

To Reproduce

See lines 74-75: the result depends on the initialization, and in some cases lines 74-75 are skipped.

Steps to reproduce

Rerun pytest several times

Environment

Environment Information

Operating System: Windows 8

Python Version: python3.8

Additional context

C-Dina code

Description

C-Dina code, adapted from the DINA implementation

References

A Cognitive Diagnosis Model for Continuous Response
