
Comments (13)

eggie5 commented on August 27, 2024

Would it be possible to learn a ranker from pairwise data where the features are latent factors (w/o any hand made features)? Like a matrix factorization model?

So the input to the pairwise loss is the respective embeddings for two documents you are ranking...


ramakumar1729 commented on August 27, 2024

Yes, it is possible. transform_fn would be an appropriate place to convert your features to latent vectors. Whether you jointly train such representations with the ranker depends on your problem.


eggie5 commented on August 27, 2024

Yes, I was thinking about jointly training a document embedding. I have pairwise labels (A > B, etc.). For each labeled pair (A, B), I'll look up their embeddings (A_emb, B_emb) and use those as the document features. This would replace classical LTR query-document features (I don't have any queries in my context anyway). Not sure what you mean w/ the transform_fn, but I'll research the code a bit.


eggie5 commented on August 27, 2024

Here's my example (modified from the TF-Ranking example) of using an embedding to learn a latent factor model:

import tensorflow as tf
from tensorflow.feature_column import categorical_column_with_identity, embedding_column

_BUCKETS = 10000  # number of distinct item ids
_K = 10           # embedding dimension
_HIDDEN_LAYER_DIMS = ["64", "32"]  # hidden widths; these values are illustrative

def make_score_fn():
  """Returns a scoring function to build `EstimatorSpec`."""

  def _score_fn(context_features, group_features, mode, params, config):
    """Defines the network to score a group of documents."""
    del params
    del config

    item_id = categorical_column_with_identity(
        key='item_id', num_buckets=_BUCKETS, default_value=0)
    item_emb = embedding_column(item_id, _K)

    input_layer = tf.feature_column.input_layer(group_features, [item_emb])

    cur_layer = input_layer
    for layer_width in (int(d) for d in _HIDDEN_LAYER_DIMS):
      cur_layer = tf.layers.dense(cur_layer, units=layer_width, activation=tf.nn.relu)

    logits = tf.layers.dense(cur_layer, units=1)  # one relevance score per document
    return logits

  return _score_fn

I modified my input function to only return an item_id, which is an integer that maps to an embedding row. But it could be concatenated w/ any other arbitrary features into a fixed-length vector for the FC layers (rough sketch below).
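For example, the input_layer call inside _score_fn can combine the embedding with other columns into one vector; here price is a made-up stand-in for an arbitrary extra dense feature:

# Inside _score_fn, alongside the item_id embedding defined above.
price = tf.feature_column.numeric_column("price", shape=(1,))

# input_layer concatenates all columns into one fixed-length vector per document.
input_layer = tf.feature_column.input_layer(group_features, [item_emb, price])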

This gets good results on my ranking task, measured by MRR:

baseline: .54
LF model: .76


ramakumar1729 commented on August 27, 2024

Thanks for sharing this example, Alex. This looks great. If you wish, you could define the feature columns outside the scoring function, so that you can also use them to make the parsing_spec to read in tf.Examples or tf.SequenceExamples.
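For instance, roughly like this (the column definitions reuse the names from your example and are illustrative):

import tensorflow as tf

_BUCKETS = 10000
_K = 10

def example_feature_columns():
  # Defined at module level, so the scoring function and the input
  # pipeline share the same column definitions.
  item_id = tf.feature_column.categorical_column_with_identity(
      key="item_id", num_buckets=_BUCKETS, default_value=0)
  return {"item_id": tf.feature_column.embedding_column(item_id, _K)}

# The same columns can then produce the parsing spec used to read
# tf.Example records.
parsing_spec = tf.feature_column.make_parse_example_spec(
    example_feature_columns().values())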


eggie5 commented on August 27, 2024

When you say "define your feature columns outside", do you mean like in the notebook example, where there is an example_feature_columns function that is called from the _score_fn function?

Also, I don't understand what the transform_fn function is for. Can you provide an example?


darlliu commented on August 27, 2024

Hello, I would just like to chime in that an example of using feature columns where group_size and the feature dimension are not 1 would be helpful. I can use a groupwise feature tensor directly with shape [?, group_size (10), feature_dimension (180)], but when I put it in a numeric feature column with shape [group_size, feature_dimension] I get the following error on this line in _groupwise_dnn_v2:

      scores = _score_fn(
          large_batch_context_features, large_batch_group_features, reuse=False)

error is:

ValueError: Dimensions must be equal, but are 1800 and 180 for 'groupwise_dnn_v2/group_score/rescale/mul' (op: 'Mul') with input shapes: [?,1800], [180].

I think it has something to do with the shape passed to the feature column, but I'm unsure what the issue is here.
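My current guess, in case it helps anyone: 1800 in the error is 10 * 180, which makes me suspect the group dimension got folded into the column shape. Perhaps the column shape should only cover the per-document feature dimension, with TF-Ranking handling the group dimension itself, i.e. something like:

import tensorflow as tf

# Guess: the column shape is per document, not per group.
doc_features = tf.feature_column.numeric_column("doc_features", shape=(180,))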


sjhermanek commented on August 27, 2024

FWIW I ran into the same issue as @darlliu


ramakumar1729 commented on August 27, 2024

Please check out the demo on using sparse features and embeddings in TF-Ranking. You can click on the colab link to start executing the content of the notebook.

Tensorboard is integrated into the notebook, and you can use it to visualize the eval and loss curves.
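If you are running the notebook in colab, a cell like the following brings up TensorBoard inline (the logdir path here is a placeholder; point it at whatever model_dir the notebook writes to):

%load_ext tensorboard
%tensorboard --logdir /tmp/ranking_demo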

Feel free to post your feedback by responding on this issue.


eggie5 commented on August 27, 2024

Thanks for posting some concrete examples in this new notebook. Some questions:

  • I see a new data format, EIE, whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?
  • Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000
  • "The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature"
    I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.
  • I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in tf 2.0?


eggie5 commented on August 27, 2024

And also, I think it would be nice to have the hyper params passed to the transform_fn just like the group_score_fn so you can do something like this:

import six
import tensorflow as tf
import tensorflow_ranking as tfr
from tensorflow.feature_column import categorical_column_with_identity, embedding_column

item_buckets = 10000  # placeholder; the number of distinct rest ids


def example_feature_columns(params):
    rest_id = categorical_column_with_identity(key='rid', num_buckets=item_buckets)
    rest_emb = embedding_column(rest_id, params.K)  # embedding dim from hyper params

    return {"rid": rest_emb}


def make_transform_fn():
    def _transform_fn(features, mode, params):
        """Defines transform_fn."""
        example_name = next(six.iterkeys(example_feature_columns(params)))
        input_size = tf.shape(input=features[example_name])[1]
        context_features, example_features = tfr.feature.encode_listwise_features(
            features=features,
            input_size=input_size,
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(params),
            mode=mode,
            scope="transform_layer")

        return context_features, example_features
    return _transform_fn

See how I added params to the signature of _transform_fn so now it can accept hyper params like the group_score_fn?

The use case for this is the common one where the embedding dimensions are a hyper parameter.

Would you accept a PR from me for this?


ramakumar1729 commented on August 27, 2024

Hi Alex, great set of questions. Please find my replies inline.

Thanks for posting some concrete examples in this new notebook. Some questions:

  • I see a new data format, EIE, whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?
    The primary motivation for EIE is that it keeps the per-query and per-document features (which we call context and example features) in self-contained tf.Examples. SequenceExample stores the data in a feature-major format, while EIE stores it in a document-major format; see the sketch after this list. Does this make sense?
  • Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000
    The number of training steps is kept low for the sake of demonstration; this should show the curves in Tensorboard in around 15 mins. You can assign it any number you feel is appropriate.
  • "The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature"
    I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.
    The transform function applies the transformation to all the documents only once, while the logic in the group score function is applied to each group. Think of the transform function as "pre-processing" before features are sent to the scoring function.
  • I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in tf 2.0?
    Yes, that is exactly the reason. Our repository is TF 2.0 alpha compatible.
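To make the feature-major vs. document-major distinction concrete, here is a rough sketch of the same two-document list in both layouts (feature names reuse the earlier example; this is illustrative, not the exact serialization spec the library expects):

import tensorflow as tf

# Feature-major (SequenceExample): one FeatureList per feature name,
# holding one entry per document in the list.
seq_ex = tf.train.SequenceExample(
    feature_lists=tf.train.FeatureLists(feature_list={
        "item_id": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=[12])),
            tf.train.Feature(int64_list=tf.train.Int64List(value=[34])),
        ]),
    }))

# Document-major (EIE): each document is a self-contained tf.Example,
# serialized and stored alongside a serialized context tf.Example.
def doc(item_id):
    return tf.train.Example(features=tf.train.Features(feature={
        "item_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[item_id])),
    })).SerializeToString()

serialized_context = tf.train.Example().SerializeToString()
serialized_docs = [doc(12), doc(34)]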


ramakumar1729 commented on August 27, 2024

I like this PR suggestion. Please go ahead with this. One thing to keep in mind is that you will need to change the model_fn builder, which expects only (features, mode) arguments. See this line for more details.
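Alternatively, you could avoid changing the builder by closing over params, so the framework still sees a (features, mode) signature. A rough sketch, reusing the feature-column helpers from your comment:

import tensorflow as tf
import tensorflow_ranking as tfr

def make_transform_fn(params):
    def _transform_fn(features, mode):
        # params is captured from the enclosing scope, so the model_fn
        # builder's expected (features, mode) signature stays unchanged.
        return tfr.feature.encode_listwise_features(
            features=features,
            input_size=tf.shape(input=features["rid"])[1],
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(params),
            mode=mode,
            scope="transform_layer")
    return _transform_fn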


