
cs-ranking's People

Contributors

dependabot[bot], kiudee, prithagupta, timokau


cs-ranking's Issues

Sequence-based indexing in theano is deprecated

In csrank/discretechoice/nested_logit_model.py we make use of sequence indexing in theano multiple times. For example:

rows, cols = tt.eq(self.y_nests, i).nonzero()

Here rows and cols are both 1d tensors which are then used to index a different tensor:

utility = tt.set_subtensor(utility[rows, cols], tt.dot(self.Xt[rows, cols], weights[i]))

Theano complains that this is deprecated:

FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.

I'm not sure how to fix this. According to the documentation, theano should support boolean mask indexing. So I thought we should be able to do

mask = tt.eq(self.y_nests, i)
utility = tt.set_subtensor(utility[mask], tt.dot(self.Xt[mask], weights[i]))

instead (as tt.eq should return a boolean mask). But unfortunately that doesn't work; it gives the same warning.

Any ideas?
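For reference, the FutureWarning quoted above originates in numpy; a minimal plain-numpy reproduction (with a numpy version where this is still a warning rather than an error, roughly 1.15–1.22) looks like this:

import numpy as np

a = np.arange(9).reshape(3, 3)
idx = [np.array([0, 1]), np.array([2, 0])]  # a list of (rows, cols) index arrays
a[idx]         # FutureWarning: non-tuple sequence for multidimensional indexing
a[tuple(idx)]  # the suggested replacement: explicit tuple -> array([2, 3])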

Migrate away from tf1

There has been some internal discussion about this, but I think it's time to also open an issue about it. We are still using tensorflow 1, which has been outdated for a while now. Switching to tensorflow 2 would be a significant effort, since the underlying model fundamentally changed (there is no explicit graph construction anymore). At that point, it may be worth evaluating switching to pytorch instead. pytorch is a newer, very popular autodiff framework.

This article comes to the conclusion that

TensorFlow is still mentioned in many more job listings than PyTorch, but the gap is closing. PyTorch has taken the lead in usage in research papers at top conferences and almost closed the gap in Google search results. TensorFlow remains three times more common in usage according to the most recent Stack Overflow Developer Survey.

Here's another relevant article. Overall it seems to me that pytorch is the more future-proof choice, and if we're going to have to rewrite a lot of the code anyway we might as well switch. I do not have any practical experience in pytorch yet, though; that's just what I could determine from others' opinions and first impressions.

We should also think about how we want to do the transition. This is a major undertaking and will probably take a while. Should we support tf1 and newthing in parallel? Gradually move models to newthing (thereby having mixed support)? Fork the project? Work on one big PR/branch, effectively blocking most other work for the time being due to potential conflicts?

Organize imports

Currently we import all submodules in __init__:

from .choicefunction import *
from .core import *
from .dataset_reader import *
from .discretechoice import *
from .objectranking import *
from .tunable import Tunable
from .tuning import ParameterOptimizer

This results in the user seeing a confusing list of submodules. We should trim that by using __all__ to only import important classes and functions.

The remaining modules are still available, but hidden by default.
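A sketch of the mechanism (the module path and the exported name are illustrative, not a final decision on what counts as public):

# csrank/choicefunction/__init__.py -- illustrative module layout
from .fate_choice import FATEChoiceFunction  # hypothetical file name

# Only the names listed here are re-exported by `from .choicefunction import *`
# and treated as the public surface of the subpackage.
__all__ = ["FATEChoiceFunction"]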

Implement proper ranking conversion for nDCG

nDCG is expecting relevance scores as input. When supplying rankings, we first have to convert the ranking into a set of ordered scores. This can quickly lead to numerical problems, due to the exponential growth of 2 ** s.
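To make the numerical problem concrete, a plain numpy sketch (the linear rank-to-relevance conversion here is just one naive option, not the library's current behavior):

import numpy as np

n_objects = 1100
ranking = np.arange(n_objects)   # position 0 = most preferred object
relevance = n_objects - ranking  # naive conversion: higher relevance for better ranks
gains = 2.0 ** relevance - 1     # DCG-style gains; float64 overflows beyond 2**1023
print(np.isinf(gains).any())     # True -> the resulting nDCG is inf or NaN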

Todo

  • Implement a method of converting rankings to relevance scores, which ensures numerical stability for nDCG.
  • Reactivate the test in test_metrics and account for the conversion.

"mean of empty slice" in spearman correlation calculation

During the tests, numpy complains about a "mean of empty slice". That happens because the calculation of the Spearman correlation filters the labels it applies to as follows:

if len(np.unique(r2)) == len(r2):

And then averages its results:

return np.nanmean(np.array(rho))

Which may be empty (or consist of only NaNs) due to the previous filter. What is the intention behind that filter?
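For reference, a minimal reproduction of the numpy warning (illustrative only):

import numpy as np

np.nanmean(np.array([]))        # RuntimeWarning: Mean of empty slice -> nan
np.nanmean(np.array([np.nan]))  # same warning: nothing is left after dropping NaNs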

CC @prithagupta

Clean up notebooks

We need the following notebooks:

  • Usage of FATE-Network
  • Usage of FETA-Network
  • Run of experiments on synthetic data
  • ...

Adhere to scikit-learn estimator interface

Rationale

Most of the learners implemented in cs-ranking already implement an interface similar to the one described in https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator,
i.e., we usually have a fit and predict method implemented.
For users to be able to use all learners effortlessly in a scikit-learn pipeline.Pipeline or to apply model_selection.GridSearchCV, we should make sure that all additional requirements are also fulfilled.

To do

  • Use get_params and set_params to set parameters. This is important, since GridSearchCV or BayesSearchCV call set_params for hyperparameter optimization. sklearn.base.BaseEstimator implements basic versions of these. The current way we handle hyperparameters should be deprecated.
  • It is recommended to not do any parameter validation in __init__, but rather in fit itself. set_params is supposed to do exactly the same thing as __init__ with respect to parameters.
  • Init parameters should be written without changes as attributes. All generated attributes should have a trailing _.
  • There should be no mandatory parameters. The user should be able to run the learner without having to provide arguments.
  • Implement a score method. This is helpful, since hyperparameter optimizers call this function by default. Otherwise the user has to implement a custom one.
  • Implement clone methods for each learner.

Most of these changes are independent of each other and could be done using separate branches.
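A minimal sketch of what such an estimator skeleton could look like (the class, its parameter, and the score definition are purely illustrative, not the actual cs-ranking API):

import numpy as np
from sklearn.base import BaseEstimator

class RankerSketch(BaseEstimator):
    """Illustrative skeleton only, not an actual cs-ranking learner."""

    def __init__(self, alpha=1.0):
        # Init parameters are stored unchanged and not validated here;
        # BaseEstimator derives get_params/set_params from this signature.
        self.alpha = alpha

    def fit(self, X, Y):
        # Parameter validation happens in fit, not in __init__.
        if self.alpha <= 0:
            raise ValueError("alpha must be positive")
        # Attributes learned from data get a trailing underscore.
        self.weights_ = self.alpha * np.mean(X, axis=0)
        return self

    def predict_scores(self, X):
        return X @ self.weights_

    def score(self, X, Y):
        # A default scalar score so GridSearchCV/BayesSearchCV work without a custom scorer.
        return -float(np.mean((self.predict_scores(X) - Y) ** 2))

With this shape, sklearn.base.clone and the hyperparameter optimizers can generally be used without learner-specific glue code.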

Move scripts for the experiments to another repository

Currently, the scripts we use for our experiments are still part of this repository. Since this repository is moving towards being a library for object ranking and choice, this code should be moved to a separate repository.

Document the release process

We are using bump2version now to change the version number and create a tagged commit. This triggers an upload of the new version to PyPI.
At the same time the HISTORY.rst file needs to be updated with the recent changes.

This process should be documented in the Sphinx documentation.

The correct order is:

  1. Update HISTORY.rst and commit.
  2. Run bump2version [patch|minor|major]
  3. Push the commits and the tag to master (or the relevant branch).
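In shell terms, the sequence would look roughly like this (commit message and flags are illustrative):

git add HISTORY.rst
git commit -m "Update HISTORY.rst for release"
bump2version patch   # or: bump2version minor / bump2version major
git push && git push --tags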

Device placement is logged by default

We have a utility function configure_numpy_keras which is used in some of the experiment scripts:

def configure_numpy_keras(seed=42):
    tf.set_random_seed(seed)
    os.environ["KERAS_BACKEND"] = "tensorflow"
    devices = [x.name for x in device_lib.list_local_devices()]
    logger = logging.getLogger("ConfigureKeras")
    logger.info("Devices {}".format(devices))
    n_gpus = len([x.name for x in device_lib.list_local_devices() if x.device_type == 'GPU'])
    if n_gpus == 0:
        config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1,
                                allow_soft_placement=True, log_device_placement=False,
                                device_count={'CPU': multiprocessing.cpu_count() - 2})
    else:
        config = tf.ConfigProto(allow_soft_placement=True,
                                log_device_placement=True, intra_op_parallelism_threads=2,
                                inter_op_parallelism_threads=2)  # , gpu_options = gpu_options)
    sess = tf.Session(config=config)
    K.set_session(sess)
    np.random.seed(seed)
    logger.info("Number of GPUS {}".format(n_gpus))

It does the following:

  • Sets the random seeds
  • Sets the KERAS_BACKEND to Tensorflow
  • Checks the number of GPUs and sets the Tensorflow options accordingly
  • Creates a Tensorflow session for Keras to use

There are a few issues (and maybe more) with this:

  • Everything is hardcoded to constants; making it configurable would be desirable.
  • log_device_placement is set to True, which can cause slowdowns due to logging and should be False by default.
  • It is not clear whether tensorflow_util.py is the correct location if the function is only ever used in the experiment scripts.
  • It is not documented.

Migrate Optimizer to BoTorch

Rationale

scikit-optimize is currently unmaintained, and BoTorch implements several features that make it very useful for our library (a minimal usage sketch follows below the list):

  • Proper handling of hyper priors (including sensible defaults), which should help stabilize our runs
  • Analytic and Monte-Carlo acquisition functions designed for noisy target functions
  • Batching of hyperparameter runs (allowing parallel execution)
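A minimal BoTorch loop on a toy 2-dimensional problem, as a sketch of the moving parts (the objective, sample sizes, and all settings are illustrative; depending on the BoTorch version the fitting helper is fit_gpytorch_mll or fit_gpytorch_model):

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy data: 10 observed hyperparameter configurations in [0, 1]^2.
train_X = torch.rand(10, 2, dtype=torch.double)
train_Y = -((train_X - 0.5) ** 2).sum(dim=-1, keepdim=True)  # toy objective to maximize

gp = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))

acq = ExpectedImprovement(gp, best_f=train_Y.max())
bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)
candidate, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=5, raw_samples=32)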

Fix warnings in tests

We have many warnings in the tests, since we accidentally use invalid escape sequences (such as \in) in the docstrings.

Example:

"""
Construct the CmpNet which is used to approximate the :math:`U_1(x_i,x_j)`. For each pair of objects in
:math:`x_i, x_j \in Q` we construct two sub-networks with weight sharing in all hidden layers.
The output of these networks are connected to two sigmoid units that produces the outputs of the network,
i.e., :math:`U(x_1,x_2), U(x_2,x_1)` for each pair of objects are evaluated. :math:`U(x_1,x_2)` is a measure
of how favorable it is to choose :math:`x_1` over :math:`x_2`.
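The usual fix is to declare such docstrings as raw strings, so sequences like \in are not interpreted as escape characters. A sketch (the function name is made up):

def u1_documentation_example():
    r"""Raw docstring: backslashes in :math:`x_i, x_j \in Q` stay literal."""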

Document the new choice settings

Todo

  • Update the README.rst
  • Mirror the intro.rst
  • Update API:
    • Check old API reference and fix if necessary
    • Include new settings
  • Write example notebooks for both settings

Look into typing with mypy

See #129 (comment). We already declare many types in docstrings. Using mypy would require us to formalize this a bit more, with the added bonus of static guarantees and better tooling support (such as enhanced tab completion).
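As a small illustration of what that formalization could look like (the function and its signature are made up for this example, not taken from the codebase):

import numpy as np

def zero_one_match(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of instances whose predicted labels match exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=-1)))

mypy could then run as an additional static check next to the existing linters, e.g. mypy csrank.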

Support saving of models

Rationale

When training the models on different datasets, it would be advantageous to be able to save a model as-is to a file. That way it is easy to load the model later and, e.g., evaluate it on new instances.
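For plain sklearn-style estimators, persisting a fitted model is typically a one-liner with joblib (shown below with a scikit-learn model as a stand-in); whether this works out of the box for our Keras/TF-backed learners, which hold sessions and compiled graphs, is exactly what would need to be checked:

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X, y = np.random.rand(20, 3), np.random.rand(20)
model = LinearRegression().fit(X, y)

joblib.dump(model, "model.joblib")      # persist the fitted estimator
restored = joblib.load("model.joblib")  # later: load and reuse it
restored.predict(X)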

Check documentation style

We have recently added some static analysis and formatting tools. One thing we are not checking yet is the inline documentation. There are some tools out there, for example pydocstyle. It seems like pylint has some doc-checking functionality too.

I think it would be valuable to extend our static checks to the inline documentation. We already check stand-alone rst files with doc8.

Improve Tunable

The new Tunable class should be able to change the set of tunable parameters during runtime (currently it is a class method).
This would allow us to attach arbitrary numbers of parameters to a model (e.g. coming from callbacks etc).

  • Change Tunable to be nestable
    • Methods should be object methods
  • Change optimizer to use object methods

Potential problems

  • The fit function is only called after the optimizer already needs to know about the parameters.
  • Is it realistic for a model to have tunable parameters that depend on the model itself (and thus cannot be set in advance by the user)?
  • When the user provides tunable objects, ensure that they reset properly across iterations.

Potential solutions

  • Let optimizer handle an ordered dictionary of all the tunables.

AllPositive Choice Baseline does not predict anything for the non-variadic case

The _predict_scores_fixed method of the class AllPositive requires X and Y inputs:

def _predict_scores_fixed(self, X, Y, **kwargs):
    return np.zeros_like(Y) + Y.mean()

In the variadic case, it is called with X and Y:

scores[ranking_size] = self._predict_scores_fixed(
    x, Y[ranking_size], **kwargs
)

However, it is called without Y in the predict_scores method for the non-variadic case:

scores = self._predict_scores_fixed(X, **kwargs)

This leads to a None prediction.

FATE Choice ignores parameters of the fit method

The fit method parameters are:

def fit(
    self,
    X,
    Y,
    epochs=35,
    inner_epochs=1,
    callbacks=None,
    validation_split=0.1,
    verbose=0,
    global_lr=1.0,
    global_momentum=0.9,
    min_bucket_size=500,
    refit=False,
    tune_size=0.1,
    thin_thresholds=1,
    **kwargs,
):

However, the parameters are not passed on to the superclass:

super().fit(X_train, Y_train, **kwargs)

super().fit(X, Y, **kwargs)

Write docstrings for dataset generators

Dataset generators like

class ChoiceDatasetGenerator(SyntheticDatasetGenerator):
    def __init__(self, dataset_type='pareto', **kwargs):
        super(ChoiceDatasetGenerator, self).__init__(
            learning_problem=CHOICE_FUNCTION, **kwargs)
        dataset_function_options = {'linear': self.make_latent_linear_choices,
                                    "pareto": self.make_globular_pareto_choices}
        if dataset_type not in dataset_function_options.keys():
            dataset_type = "pareto"
        self.dataset_function = dataset_function_options[dataset_type]

inherit the docstring of the parent class, which is not very informative.

Speed up Travis-CI builds

Currently, the tests for the probabilistic models implemented in PyMC3 take a long time to run, which causes a long delay between updating a pull request and receiving Travis confirmation.

Measures

  • Parallelize build using several environments
  • Speed up PyMC3 tests
  • Speed up installation process and optimize caching (if possible)

Prepare package for PyPI/Anaconda

Rationale

For users it is much more convenient to be able to install the package from PyPI using a simple

pip install cs-ranking

or

conda install -c conda-forge cs-ranking

rather than checking out the repository.

What needs to be done

Make auxiliary dependencies optional

Currently we require quite a few dependencies, which makes installing the library difficult. I went ahead and categorized the different dependencies (see below). We should do the following:

  1. Remove these dependencies from install_requires
  2. Move these to extras_require (a setup() sketch follows after the dependency list below)

Dependencies

  • Core (mandatory):
    • numpy
    • scipy
    • scikit-learn
    • scikit-optimize
    • joblib
    • keras
    • tensorflow
    • docopt
  • Data I/O or generation (optional):
    • psycopg2-binary for database access
    • pandas
    • h5py
    • pygmo
  • Required for some of the probabilistic models (optional):
    • pymc3
    • theano
  • Nice to haves (optional):
    • tqdm for progress bars
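A sketch of how the split could look in setup.py (the extra names and the exact grouping are up for discussion, not final):

from setuptools import setup, find_packages

setup(
    name="csrank",
    packages=find_packages(),
    install_requires=["numpy", "scipy", "scikit-learn", "scikit-optimize",
                      "joblib", "keras", "tensorflow", "docopt"],
    extras_require={
        "probabilistic": ["pymc3", "theano"],
        "data": ["psycopg2-binary", "pandas", "h5py", "pygmo"],
    },
)

Users would then opt in via, e.g., pip install cs-ranking[probabilistic].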

Importing csrank causes error without optional theano installed

Simply running

import csrank

after installing csrank from pip without any optional dependencies, results in the following error:

csrank.util.MissingExtraError: Could not import the optional dependency theano. Please install it or specify the "probabilistic" extra when installing this package.

This should definitely not happen.

Version: 1.2.0

Check if dataset generator exists and otherwise raise an exception

Currently, if the dataset generator receives an unknown dataset type, it silently falls back to a default generator.

if dataset_type not in dataset_function_options.keys():
    dataset_type = "medoid"

This is unexpected behavior and should be changed. If the dataset generator is unknown, an exception should be raised.
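A sketch of what the check could look like (the error message wording is illustrative):

if dataset_type not in dataset_function_options:
    raise ValueError(
        "Unknown dataset type '{}'. Available options: {}".format(
            dataset_type, sorted(dataset_function_options)
        )
    )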

We should check all dataset generators/parsers for similar behavior.

Callbacks

  • Bug in LRScheduler of callbacks
  • Create LRScheduler and EarlyStopping independently of the Keras implementation to avoid such issues.
    One way would be to inherit from Keras's Callback class.

There is an issue with the current LRScheduler: it decreases the learning rate exponentially at every epoch after epoch_drop, instead of only once every epoch_drop epochs (a possible step-decay schedule is sketched after the log excerpts below).
For a learning rate of 0.015, epoch_drop = 5, and a drop percentage of 0.9, the current output is:

Epoch 00001: LearningRateScheduler setting learning rate to 0.014999999664723873.
Epoch 00005: LearningRateScheduler setting learning rate to 0.013499999698251487.
Epoch 00006: LearningRateScheduler setting learning rate to 0.012149999476969242.
Epoch 00007: LearningRateScheduler setting learning rate to 0.010934999864548446.
Epoch 00008: LearningRateScheduler setting learning rate to 0.009841500129550696
Epoch 00009: LearningRateScheduler setting learning rate to 0.00885734986513853.
Epoch 00010: LearningRateScheduler setting learning rate to 0.007174453390762211
Epoch 00011: LearningRateScheduler setting learning rate to 0.005811307118274272

While the output should be:

Epoch 00001: LearningRateScheduler setting learning rate to 0.014999999664723873.
Epoch 00005: LearningRateScheduler setting learning rate to 0.013499999698251487.
Epoch 00010: LearningRateScheduler setting learning rate to 0.012149999728426338.
Epoch 00015: LearningRateScheduler setting learning rate to 0.010934999755583704.
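One way to get the intended step-wise decay with the stock Keras callback, as a sketch (the default values mirror the example above; this is not the repository's current implementation):

import math
from keras.callbacks import LearningRateScheduler

def make_step_decay(initial_lr=0.015, drop=0.9, epochs_drop=5):
    # The learning rate is only multiplied by `drop` once every `epochs_drop`
    # epochs, instead of at every single epoch after the first drop.
    def schedule(epoch):
        return initial_lr * math.pow(drop, math.floor(epoch / epochs_drop))
    return schedule

lr_callback = LearningRateScheduler(make_step_decay(), verbose=1)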

The f-measure is ill-defined when there are no true positives or no positive predictions

sklearn issues a warning during the tests:

sklearn.exceptions.UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no predicted labels.

This is because

  • some of the test samples generated in csrank/tests/test_choice_functions.py:trivial_choice_problem have no true positives
  • some of the learners predict no positives for some of the generated problems

In both of those cases the f-measure is not properly defined. sklearn assigns 0 and 1 respectively.

How should we deal with this? A metric should be defined for these possibilities. 0 and 1 in those cases seems somewhat reasonable, so maybe we should just silence the warning?
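If silencing is the chosen route, newer scikit-learn versions (0.22+) expose a zero_division argument that sets the value explicitly without emitting the warning (a minimal illustration):

from sklearn.metrics import f1_score

y_true = [0, 0, 0]   # no true positives
y_pred = [0, 0, 0]   # no positive predictions
f1_score(y_true, y_pred, zero_division=1)  # returns 1.0, no UndefinedMetricWarning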

Add a `pyproject.toml`

The current standard for specifying the build of a Python project, its dependencies, and the configuration of various tools is pyproject.toml. It may be a good idea to adopt it. There are still some holdout tools that do not read their configuration from it. The biggest benefit is being able to specify the build system without needing to run a Python program (which may itself have dependencies).
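As an example, the build-system part alone would look roughly like this for a setuptools-based project (a sketch; version pins are omitted):

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"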

Use a semantic linter for the python code

Now that we use a linter for formatting (#78), we could also use a semantic linter. Two common options are flake8, which is more conservative (fewer reports), and pylint. We would first need to address the issues those linters raise. For example, for flake8 (ignoring line length, since black takes care of that and disagrees with flake8's limit):

$ flake8 **/*.py | grep -v 'line too long' | wc -l
36

Fixing those will likely improve the code anyway.

Implement Tests

  • Learners
    • Object Ranking
    • Discrete Choice
    • General Choice Functions
  • Dataset Parsers
  • Metrics
  • Losses
