
cognibench's Introduction

CogniBench

CogniBench is a software framework for benchmarking cognitive models. It is implemented as a free, open-source Python package, but it can readily be used to validate models implemented in any language for which a Python interface exists (such as R or Matlab/Octave). CogniBench builds upon SciUnit, a domain-agnostic framework for validating scientific models, and OpenAI Gym, a library for developing and testing artificial agents. For a short introduction to the CogniBench structure, please refer to the documentation notebooks under the docs folder. For detailed examples of how to use CogniBench, see the example testing and simulation scripts under the examples folder.

Installation

You can install CogniBench by downloading or cloning the repository, and running the following command:

pip install cognibench-path

where cognibench-path is the path to the top-level directory of the unpacked/cloned CogniBench package (i.e., the directory that contains the setup.py file).

If you wish to contribute to the development, you can clone this repository, and create the conda development environment with CogniBench using the following commands executed at the package top-level directory:

conda env create -f environment.yml
conda activate cognibench
python setup.py install

Short usage example

Here is a short snippet showing how you can test several models against multiple test cases on a set of experimental observations using CogniBench.

import cognibench.models.decision_making as decision_models
from cognibench.testing import InteractiveTest
from cognibench.scores import AccuracyScore, PearsonCorrelationScore
from sciunit import TestSuite

# read_data is a placeholder for your own data-loading function; it should
# return a dictionary with keys such as 'stimuli', 'rewards', etc., together
# with the observation and action dimensionalities.
observations, obs_dim, action_dim = read_data(observation_path)
# define the list of models to test
model_list = [
    decision_models.RWCKModel(n_action=action_dim, n_obs=obs_dim, seed=42),
    decision_models.NWSLSModel(n_action=action_dim, n_obs=obs_dim, seed=42),
]
# define the list of test cases
test_list = [
    InteractiveTest(observation=observations, score_type=AccuracyScore, name='Accuracy Test'),
    InteractiveTest(observation=observations, score_type=PearsonCorrelationScore, name='Correlation Test'),
]
# combine in a suite and run
test_suite = TestSuite(test_list, name='Test suite')
test_suite.judge(model_list)

Main features of CogniBench

Interactive tests

Testing certain models should be performed in an interactive manner: instead of presenting all the stimuli at once, test samples are fed to the model one at a time while its actions are observed. CogniBench formalizes this notion in the InteractiveTest test class and the Interactive model capability.

In addition to interactive tests, CogniBench also implements the common way of testing models against a batch of samples (BatchTest and BatchTestWithSplit) in case you don't need the interactive testing logic.
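To make the interactive flow concrete, here is a minimal, self-contained sketch of the kind of loop an interactive test runs. The ToyModel class and its method signatures are illustrative assumptions, not part of the CogniBench API.

# A minimal, self-contained sketch of the interactive testing idea. The
# ToyModel class below is purely illustrative and is NOT part of the
# CogniBench API; InteractiveTest runs this kind of loop internally.
import numpy as np

class ToyModel:
    """Predicts an action distribution, then learns from the observed action."""

    def __init__(self, n_action):
        self.counts = np.ones(n_action)

    def predict(self, stimulus):
        # Probability of each action, based on how often it was observed so far.
        return self.counts / self.counts.sum()

    def update(self, stimulus, reward, action, done):
        # Simple frequency learner; real models would use stimulus and reward too.
        self.counts[action] += 1

rng = np.random.default_rng(42)
stimuli = rng.integers(0, 2, size=100)
actions = rng.integers(0, 4, size=100)
rewards = rng.integers(0, 2, size=100)

model = ToyModel(n_action=4)
n_correct = 0
for s, a, r in zip(stimuli, actions, rewards):
    prediction = model.predict(s)              # prediction before seeing the action
    n_correct += int(np.argmax(prediction) == a)
    model.update(s, r, a, done=False)          # then observe the action and learn
print("accuracy:", n_correct / len(actions))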

SciUnit and OpenAI Gym interaction

In the SciUnit framework, models are tagged with capabilities that define which tests a model can possibly take. CogniBench combines this idea with the action and observation spaces from the OpenAI Gym library. Therefore, in addition to the tests it can take, a model also specifies the environments against which it can be simulated.
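As an illustration, a model can advertise the spaces it supports with standard Gym space objects. The attribute names below are assumptions made for this example, not necessarily the ones CogniBench uses.

# Illustrative sketch: tagging a model with Gym spaces so that only
# environments (and tests) with matching spaces are paired with it.
# The attribute names are assumptions made for this example.
import gym

class MyDecisionModel:
    def __init__(self, n_action, n_obs):
        self.action_space = gym.spaces.Discrete(n_action)
        self.observation_space = gym.spaces.Discrete(n_obs)

model = MyDecisionModel(n_action=4, n_obs=2)
# A bandit-style environment exposing the same discrete spaces would be compatible.
env_action_space = gym.spaces.Discrete(4)
print(model.action_space == env_action_space)  # True: the spaces match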

Support for both single- and multi-subject models

Some models operate on a single subject at a time (single-subject models), whereas others can operate on multiple subjects at the same time (multi-subject models). CogniBench supports multi-subject models by assuming that their implementations of the required interface functions take the subject index as the first argument. The testing interface defined by the CNBTest class works seamlessly with both single- and multi-subject models. In addition, we provide a simple utility function to convert single-subject model classes deriving from CNBModel into multi-subject classes.
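The calling convention can be illustrated with two toy classes; these are illustrative only and are not the CogniBench base classes.

# Sketch of the single- vs multi-subject calling convention described above.
# These toy classes are illustrative and not the CogniBench base classes.
class SingleSubjectToy:
    def __init__(self):
        self.value = 0.0

    def update(self, stimulus, reward):
        self.value += reward

class MultiSubjectToy:
    """Interface methods take the subject index as the first argument."""

    def __init__(self, n_subjects):
        self.subjects = [SingleSubjectToy() for _ in range(n_subjects)]

    def update(self, subject_idx, stimulus, reward):
        self.subjects[subject_idx].update(stimulus, reward)

multi = MultiSubjectToy(n_subjects=3)
multi.update(1, stimulus=0, reward=1.0)  # updates only the model of subject 1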

Data simulation

CogniBench provides utility functions to simulate agents and/or models against matching environments in order to generate stimulus, action, and reward triplets. These functions support both single-subject and multi-subject models.
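The general shape of such a simulation is sketched below with a toy agent and a toy bandit environment. The class names and method signatures here are assumptions for the example, not CogniBench's own simulation API.

# Conceptual sketch of simulating an agent against an environment to
# collect (stimulus, action, reward) triplets. Both classes are toys;
# CogniBench's simulation utilities wrap a loop of this general shape.
import numpy as np

class RandomToyAgent:
    def __init__(self, n_action, seed=42):
        self.rng = np.random.default_rng(seed)
        self.n_action = n_action

    def act(self, stimulus):
        return int(self.rng.integers(self.n_action))

    def update(self, stimulus, reward, action, done):
        pass  # a learning agent would adapt its internal state here

class ToyBanditEnv:
    """Two-armed bandit where arm 1 pays off more often."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return 0  # a single dummy stimulus

    def step(self, action):
        reward = int(self.rng.random() < (0.8 if action == 1 else 0.2))
        return 0, reward, False, {}

def simulate(env, agent, n_trials):
    stimuli, actions, rewards = [], [], []
    stimulus = env.reset()
    for _ in range(n_trials):
        action = agent.act(stimulus)
        next_stimulus, reward, done, _ = env.step(action)
        agent.update(stimulus, reward, action, done)
        stimuli.append(stimulus)
        actions.append(action)
        rewards.append(reward)
        stimulus = next_stimulus
    return stimuli, actions, rewards

stimuli, actions, rewards = simulate(ToyBanditEnv(), RandomToyAgent(n_action=2), 100)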

Implementation of common experimental tasks

CogniBench offers model_recovery and param_recovery functions for performing the common auxiliary modeling tasks of model recovery and parameter recovery.
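To recall what model recovery involves, the toy example below illustrates the idea (it does not show the model_recovery API itself): simulate data with a known model, fit each candidate model, and check which one explains the data best.

# Toy illustration of the model-recovery idea (this is not the
# model_recovery API itself): simulate data from a known model,
# fit all candidates, and pick the one with the best score.
import numpy as np

rng = np.random.default_rng(0)
true_bias = 0.8
data = rng.random(200) < true_bias                # simulated binary choices

def fit_biased(d):
    return {"bias": d.mean()}                     # candidate 1: fit the bias

def fit_fair(d):
    return {"bias": 0.5}                          # candidate 2: fixed fair coin

def log_likelihood(params, d):
    p = params["bias"]
    return float(np.sum(d * np.log(p) + (~d) * np.log(1 - p)))

candidates = {"biased": fit_biased, "fair": fit_fair}
scores = {name: log_likelihood(fit(data), data) for name, fit in candidates.items()}
print(max(scores, key=scores.get))                # expected: "biased"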

Agent and model separation

CogniBench distinguishes between agents (CNBAgent base class) and models (CNBModel base class). An agent can interact with an environment through its act and update methods, and it can only function once its parameters are set to concrete values. In contrast, a model represents a specific way of fitting parameters for an agent (fit) and of predicting the probability distribution over the action space (predict). The models we provide in CogniBench are implemented with this distinction in mind; however, CogniBench is flexible enough to support models that do not make this distinction.
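The division of labour can be sketched with two toy classes: the agent acts and learns given fixed parameters, while the model fits those parameters and predicts action distributions. These classes merely mirror the CNBAgent/CNBModel roles; they are not the CogniBench base classes and their signatures are assumptions.

# Toy sketch of the agent/model split (not the CNBAgent / CNBModel classes
# themselves): the agent acts with fixed parameters, the model fits them.
import numpy as np

class ToyAgent:
    def __init__(self, action_values):
        self.action_values = np.asarray(action_values, dtype=float)  # fixed parameters

    def act(self, stimulus):
        return int(np.argmax(self.action_values))

    def update(self, stimulus, reward, action, done):
        # Small learning step; only possible once parameters are set.
        self.action_values[action] += 0.1 * (reward - self.action_values[action])

class ToyModel:
    def fit(self, stimuli, rewards, actions):
        rewards, actions = np.asarray(rewards, dtype=float), np.asarray(actions)
        n_action = int(actions.max()) + 1
        # Crude "fit": mean reward per action becomes the agent's parameters.
        self.values = np.array([rewards[actions == a].mean() for a in range(n_action)])

    def predict(self, stimulus):
        exp_v = np.exp(self.values - self.values.max())
        return exp_v / exp_v.sum()                # softmax distribution over actions

model = ToyModel()
model.fit(stimuli=[0, 0, 0, 0], rewards=[0.0, 1.0, 1.0, 0.0], actions=[0, 1, 1, 0])
agent = ToyAgent(model.values)                    # the agent runs with fitted parameters
print(model.predict(0), agent.act(0))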

Associative learning agent and model implementations

We provide example implementations for several simple associative learning agents and models. These models also demonstrate how to satisfy the interfaces required by interactive tests that expect log probability distributions as predictions. The currently implemented associative learning models are:

  • Random responding
  • Beta-binomial
  • Rescorla-Wagner
  • Kalman Rescorla-Wagner
  • LSSPD (Rescorla-Wagner-Pearce-Hall)

Decision making agent and model implementations

Similarly, we also provide example implementations for several simple decision-making agents and models. The currently implemented decision-making models are:

  • Random responding
  • Rescorla-Wagner Choice Kernel
  • Rescorla-Wagner
  • Choice Kernel
  • Noisy-win-stay-lose-shift

Documentation

We provide a series of Jupyter notebooks under the docs folder that you can use as an introduction to CogniBench.

Small note to developers

If you are going to use the development version and want to run notebooks, you should install CogniBench inside the conda environment. See the section on installing CogniBench.

Examples

We provide multiple examples of using CogniBench as a tool to test models, simulate data, and perform experimental tasks. These are very useful for getting acquainted with CogniBench. Please refer to the README file under the examples/ folder for further information.

Tests

We use the built-in unittest module for testing CogniBench. To run the checks, clone this repository and type

./test.sh

Information for developers

If you want to extend CogniBench, you need to use the conda development environment. Please follow the conda installation instructions in the installation section above and then continue here.

We use black for code formatting and pre-commit for ensuring high-quality code. To enable these tools, simply run

pre-commit install

The next time you try to commit, all the required tools and hooks will be downloaded (and cached) and checks will be performed on your code.

Generating local documentation

After enabling the development environment, you can generate a local version of the documentation by running

cd docs/sphinx
make html

Afterwards, you can browse the local documentation by opening docs/sphinx/_build/index.html.

Installing a local development version

After implementing some changes, you can install the modified version of CogniBench to your local system by running

python setup.py install

Then, every time you import CogniBench, this modified version will be imported.

License

CogniBench is distributed under the MIT License. See the LICENSE file for the exact terms and conditions.


cognibench's Issues

Check what's up with too high score values

When running the interactive test suite in the docs folder, some of the score values tend to be too high.

EDIT: These tests have been moved to the examples/decision_making_simulated and examples/assoc_learning_simulated folders.

List of issues with pupil benchmarking

Summary

This is a list of known issues with the pupil benchmarking (examples/pspm_pupil) that need to be addressed before the results are publishable:

  1. All the pupil files need to be reimported using the new Eyelink import with no filtering during import.
     • In addition to the pupil, gaze and marker channels, the blink and saccade channels also need to be imported so that blink/saccade filtering can be applied.
     • Once this is done, the files should be put into examples/pspm_pupil/data in the current structure.
  2. Sometimes pspm_glm crashes due to the inability of pspm_prepdata to handle NaN values. The crash logs can be found in the experiment 2 logs in the examples/pspm_pupil/log/exp2 folder on the science cloud.
  3. Valid fixations filtering in experiment 7 seems to be crashing. This issue needs to be handled similarly, by checking the logs in examples/pspm_pupil/log/exp7.
  4. Some Cohen's d scores seem to be negative. Although this is theoretically possible, I am not sure if it should be happening at all. This issue needs to be investigated.
  5. In experiment 1, given a dataset, score values seem to be constant with respect to different parameter combinations. (Should be fixed with the PsPM PR at bachlab/PsPM#108.)
  6. Some score values are returned as NULL; however, this is not a bug. When the exclusion threshold is too low, none of the subjects in a dataset can be fitted.
  7. The recording geometry should be set properly in examples/pspm_pupil/libcommon/pp_pfe; currently it is set to one of the auto modes for convenience.

Model optimization

Feature Description

Every test class should offer the possibility of optimizing the models before generating predictions. By default, we fit the model on the data used for testing. However, the optimization API should be generic enough to allow other types of tests, such as a train/test split, to be integrated with it.
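A toy sketch of the default behaviour described above (fit on the test data, then predict on the same data); the class and names are stand-ins, not the proposed CogniBench API.

# Toy sketch of the current default: optimize (fit) on the very data that
# is used for testing, then generate predictions on it. Names are stand-ins.
class ToyFittableModel:
    def fit(self, observations):
        self.mean = sum(observations) / len(observations)

    def predict(self, observations):
        return [self.mean for _ in observations]

obs = [0.0, 1.0, 1.0, 0.0, 1.0]
model = ToyFittableModel()
model.fit(obs)                      # optimization step before prediction
predictions = model.predict(obs)
accuracy = sum(round(p) == o for p, o in zip(predictions, obs)) / len(obs)
print(accuracy)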

Model recovery

Feature Description

A user of ldmunit should be able to perform model recovery experiments easily. Here, model recovery refers to identifying which model, out of a range of possible models, generated the given (simulated) data. This feature should be implemented after #37 and #39 are completed.

New way of training and testing

Feature Description

Implement a testing method that trains a model on a certain part of the given data and then tests it on the remaining part. This might be a train/test split or some sort of cross-validation.

Merge old feature branches into develop

There are a couple of old feature branches, some of which include new model implementations. We should merge these branches into develop before making the release.

Make the optimization procedure in PolicyModel changeable

Feature Description

Users shall be able to change the optimization procedure used in PolicyModel. The new API should be flexible enough to support both the existing scipy optimizers and possible new function implementations while maintaining compatibility with the current model design.
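One possible shape for such an API is sketched below: the model accepts any callable with a scipy.optimize.minimize-style interface and defaults to SciPy. The class and parameter names are hypothetical.

# Hypothetical sketch of a swappable optimization procedure: any callable
# with a minimize(fun, x0, ...) style interface can be plugged in.
import numpy as np
from scipy.optimize import minimize

class ToyPolicyModel:
    def __init__(self, optimizer=minimize):
        self.optimizer = optimizer

    def fit(self, data):
        loss = lambda params: float(np.mean((data - params[0]) ** 2))
        result = self.optimizer(loss, x0=np.array([0.0]))
        self.param = float(result.x[0])

model = ToyPolicyModel()                          # default: scipy.optimize.minimize
model.fit(np.array([1.0, 2.0, 3.0]))
print(model.param)                                # approximately 2.0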

Agent and Model Separation

Feature Description

We need to emphasize the difference between an agent and a model in our codebase. Agents act on the environment by choosing one of the possible actions, while models produce distributions over actions. From an implementation perspective, this distinction should lead to the act and predict methods being separated from the current LDMModel class in some way. We still need to think about a good way to do this separation, possibly further down the line.

Update documentation

Feature Description

After the major changes in the last couple of months, docstrings are quite outdated. We need to update them.

A more generic test object that accepts various score objects?

Feature Description

Currently, concrete test classes such as AICTest and MSETest define how they compute their score. Instead, perhaps this computation could live in the score object itself, so that a more generic test object can simply accept a score as a parameter.

Associative learning demo

Feature Description

We need a demo of testing associative learning models. In addition to the want-to-haves mentioned in #17, we would like to test using both simulated and real datasets.

Add ability to save predictions during testing

Feature Description

It might be useful to access the model predictions after testing. If testing takes a long time, having the predictions automatically saved to a file would be very handy.
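A minimal sketch of the proposed behaviour: serialize the predictions to disk once they are computed so that a long test run need not be repeated. The file name and format are arbitrary choices for the example, not a proposed CogniBench convention.

# Minimal sketch of persisting predictions to disk after a test run.
import pickle

predictions = [[0.1, 0.9], [0.7, 0.3]]            # stand-in for model output
with open("predictions.pkl", "wb") as f:
    pickle.dump(predictions, f)

with open("predictions.pkl", "rb") as f:          # reload later for analysis
    loaded_predictions = pickle.load(f)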

Support Environments in testing classes

Feature Description

The interactive testing class should accept environments and use them during testing. Similar to models, environments should have update and generate_observation methods. Afterwards, the interactive testing framework should be extended to make use of environments.
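A toy sketch of an environment exposing the update and generate_observation methods mentioned above; the exact semantics of each method are assumptions.

# Toy sketch of an environment with the proposed update / generate_observation
# methods; the exact semantics of each method are assumptions.
import numpy as np

class ToyBanditEnvironment:
    def __init__(self, reward_probs, seed=0):
        self.reward_probs = reward_probs
        self.rng = np.random.default_rng(seed)
        self.last_reward = 0

    def generate_observation(self):
        return self.last_reward                   # what the subject observes next

    def update(self, action):
        # The environment reacts to the chosen action by drawing a reward.
        self.last_reward = int(self.rng.random() < self.reward_probs[action])

env = ToyBanditEnvironment(reward_probs=[0.2, 0.8])
observation = env.generate_observation()
env.update(action=1)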

Implement a simple CPC18 Track I demo

Feature Description

We need a demo of testing CPC18 Track I using our framework. We want

  • at least 3 different tests
  • at least 3 different models, with at least one model in each of Python, Octave, and R. Models might differ in their implementation, or just in their parameter values for simplicity.
  • a nice visualization of the results in a Jupyter notebook.

Extend sciunit.TestSuite to implement testing with multiprocessing

Feature Description

Testing many models against many test cases is a perfectly parallelizable task. We can offer an n_jobs parameter in this hypothetical test suite class to enable parallel testing and thus achieve significant speedups. This would also be a precursor to an eventual distributed test suite class that performs each task on a different node.
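A rough sketch of the idea using only the standard library; the helper functions are hypothetical and not an existing CogniBench or sciunit API, and real tests and models would need to be picklable for a process pool to work.

# Rough sketch of parallelizing the model x test grid; the helpers are
# hypothetical, and tests/models must be picklable for process pools.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def judge_pair(pair):
    test, model = pair
    return test.name, model.name, test.judge(model)

def parallel_judge(tests, models, n_jobs=4):
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(judge_pair, product(tests, models)))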

Refactoring Interactive Testing

Feature Description

The current interactive testing interface, as implemented in InteractiveTest, does too many things at once. As a result, we cannot reuse parts of it with similar models that have slightly different requirements. We should divide this class into roughly three different ones:

  1. A multi-subject testing interface,
  2. A testing interface that compares the agent's results against the actions recorded in the training inputs. This is the idea used when testing reinforcement learning agents. I haven't found a good name for this, so for now we will call it the reinforcement learning testing interface,
  3. An interactive testing interface.

Parameter recovery

Feature Description

Similar to #40, we would like to provide easy-to-use parameter recovery methods. Parameter recovery refers to identifying the parameters that generated the given (simulated) data out of a bounded or unbounded set of parameters.

Refactor interactive testing logic

Feature Description

The interactive testing logic is duplicated in a couple of places, among them the InteractiveTest and ReinforcementLearningFittingMixin classes. We should consider how to refactor it.

Implement a simple CPC18 Track II demo

Feature Description

We need a version of #17 adapted to Track II. This might be better for showcasing our framework (and possibly its interactive testing capabilities), since it seems to consist of a time-series prediction task. The same want-to-haves apply here as well.

Data simulation

Feature Description

We need a couple of utilities to simulate data using a given environment and model. This issue might be related to #37 and #38.

Decision making demo

Feature Description

We need a demo of testing decision-making models, similar to #20. Possible tasks are bandit tasks and the two-step task.

Fix documentation related TODO items

Feature Description

There are a couple of documentation-related TODO items in the repository. We should fix these before the release.

We need to update the documentation of the whole library before the final testing and the 0.1.0 release.
