
recommenders's Introduction

TensorFlow Recommenders



TensorFlow Recommenders is a library for building recommender system models using TensorFlow.

It helps with the full workflow of building a recommender system: data preparation, model formulation, training, evaluation, and deployment.

It's built on Keras and aims to have a gentle learning curve while still giving you the flexibility to build complex models.

Installation

Make sure you have TensorFlow 2.x installed, and install from pip:

pip install tensorflow-recommenders

Documentation

Have a look at our tutorials and API reference.

Quick start

Building a factorization model for the Movielens 100K dataset is very simple (Colab):

from typing import Dict, Text

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_id": tf.strings.to_number(x["movie_id"]),
    "user_id": tf.strings.to_number(x["user_id"])
})
movies = movies.map(lambda x: tf.strings.to_number(x["movie_id"]))

# Build a model.
class Model(tfrs.Model):

  def __init__(self):
    super().__init__()

    # Set up user representation.
    self.user_model = tf.keras.layers.Embedding(
        input_dim=2000, output_dim=64)
    # Set up movie representation.
    self.item_model = tf.keras.layers.Embedding(
        input_dim=2000, output_dim=64)
    # Set up a retrieval task and evaluation metrics over the
    # entire dataset of candidates.
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.item_model)
        )
    )

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.item_model(features["movie_id"])

    return self.task(user_embeddings, movie_embeddings)


model = Model()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Randomly shuffle data and split between train and test.
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# Train.
model.fit(train.batch(4096), epochs=5)

# Evaluate.
model.evaluate(test.batch(4096), return_dict=True)
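After training, recommendations can be served by pairing the trained query model with a retrieval index. The following is not part of the quickstart itself, but a minimal sketch following the BruteForce pattern used in the tutorials (the user id is illustrative):

# Build a brute-force retrieval index over the trained movie embeddings.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index(movies.batch(128).map(model.item_model), movies)

# Query the index for one user; it returns (scores, movie ids).
_, ids = index(tf.constant([42]))
print(f"Top 3 recommendations for user 42: {ids[0, :3]}")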

recommenders's People

Contributors

8bitmp3, albertvillanova, animasingh, biteorange, chenmoneygithub, esergion, hawkinsp, hojinyang, jancervenka, jxriver, kanchishimono, lichanhong, maciejkula, markdaoust, pcish, pineapplejuice233, pshiko, qlzh727, rchen152, rxsang, sammymax, samuelmarks, sinopalnikov, tayo, timschmeier, tonywz, vancezuo, whongyi, yashk2810, zhaoyuecheng


recommenders's Issues

How do user model and movie model relate in retrieval example?

In the retrieval example, the user model is created with user ids

user_model = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])

The movie model is created with movie titles

movie_model = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.StringLookup(
      vocabulary=unique_movie_titles, mask_token=None),
  tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
])

Seeing this code, I thought the example builds a relationship between user ids and movie titles:

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])

But then the example keeps only the user ids from the ratings:

user_ids = ratings.batch(1_000_000).map(lambda x: x["user_id"])

I am confused about how these two models relate to each other if we don't know which movie titles correspond to which user ids.

Does tensorflow recommenders support terabyte-scale embedding table?

Dear recommenders authors,

Suppose the number of features is 10^9 and the embedding size is 1000; then the embedding table has 10^12 parameters, which is terabyte scale. Does recommenders support an embedding table this large, on GPU or CPU? And would the parameter server strategy be the only strategy I should use?

Thank you for your time!
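As a back-of-envelope check on the scale described above (a sketch, assuming float32 parameters):

# 10^9 features x 1000 dims x 4 bytes/param = 4 * 10^12 bytes.
num_features = 10**9
embedding_dim = 1000
bytes_per_param = 4  # float32

table_bytes = num_features * embedding_dim * bytes_per_param
print(f"{table_bytes / 10**12:.0f} TB")  # 4 TB, far beyond a single device

A table of this size has to be sharded across workers, which is what makes a parameter-server-style setup the natural fit.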

can't predict after saving then loading model

Hi, thanks for creating TFRS. I built a recommendation model based on TFRS; fitting, predicting, and saving the model all work, but after loading it with tf.keras.models.load_model, the summary looks like this:

[screenshot of the loaded model's summary]

When predicting, the loaded model shows this error:

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (3 total):
* {'card_id': <tf.Tensor 'inputs:0' shape=(None, 1) dtype=string>, 'card_title': <tf.Tensor 'inputs_2:0' shape=(None, 1) dtype=string>, 'card_topic_id': <tf.Tensor 'inputs_3:0' shape=(None, 1) dtype=string>, 'user_id': <tf.Tensor 'inputs_5:0' shape=(None, 1) dtype=string>, 'interested_topic_id': <tf.Tensor 'inputs_4:0' shape=(None, 1) dtype=string>, 'card_recency': <tf.Tensor 'inputs_1:0' shape=(None, 1) dtype=float32>}
* False
* None
Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s).....

How can I solve this issue? Thanks again.

KeyError: 'user_id'

tensorflow.python.autograph.pyct.errors.KeyError: in converted code:

main.py:18 None  *
    "movie_title": x["movie_title"],

KeyError: 'user_id'

How to pass multiple inputs into the BruteForce index?

I'm trying to use the index on my model, which requires a dict as inputs (similar to [this tutorial](https://www.tensorflow.org/recommenders/examples/deep_recommenders#combined_model)). How do I input a query correctly when I call the index? I'm getting multiple errors.

This is my last attempt:

index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
# recommends products out of the entire products dataset.
index.index(train.batch(100).map(model.candidate_model), 
            train.batch(100).map(lambda x: model.candidate_model({'product_id': x['product_id']})))

test_query = {
        "last_100_product_views": train_df["last_100_product_views"][0],
        "last_100_purchases": train_df["last_100_purchases"][0],
        "last_100_searches": train_df["last_100_searches"][0],
        "user_gender": train_df["user_gender"][0],
        "user_country": train_df["user_country"][0],
    }
# Get recommendations.
_, titles = index(dict(train_df[['last_100_product_views',
                                        'last_100_purchases',
                                        'last_100_searches',
                                        'user_gender',
                                        'user_country',]].iloc[0]))

print(f"Recommendations for user 42: {titles[0, :3]}")```

How can I build a model using the Functional APIs?

I'm trying to build a model using the functional APIs to define a keras.Model object instead of creating a subclass like in the tutorials. However, I'm struggling to understand how to tie everything together (e.g. compute the loss, pass the right variables in the Input layers).

Before proceeding further I just want to know if this is even possible/advisable? The reason I'm doing this is that I need to share embedding layers across features.

Any advice is really appreciated.

Saving user_model for serving (Serialized keras model Vs TF SavedModel)

In the retrieval tutorial it says that 'we can either serialize the Keras model directly' or export it to a SavedModel format.

But when I try to save the model.user_model as a keras h5 file as follows:

with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "query_model")
  model.user_model.save(filepath=path,save_format='h5')

I get the following error:

NotImplementedError: Save or restore weights that is not an instance of `tf.Variable` is not supported in h5, use `save_format='tf'` instead. Got a model or layer StringLookup with weights [<tensorflow.python.keras.engine.base_layer_utils.TrackableWeightHandler object at 0x7f0b8c1fe518>]

I notice in the keras documentation it states:

Models built with the Sequential and Functional API can be saved to both the HDF5 and SavedModel formats. Subclassed models can only be saved with the SavedModel format.

Questions:

  1. Can we save the user_model as a single Keras .h5 file? And if so, what is wrong with the implementation above? The TF saving guide mentions that in order to save/load a subclassed model, you should override the get_config and optionally from_config methods.

  2. If we have to save the user_model in the TF SavedModel format, it would appear that we need the full directory of info (i.e. a directory with assets, saved_model.pb, variables) in place to load the model back into memory. We typically push trained models, indices, etc. to S3 buckets for eventual downloading and serving. Having to save, publish, download, and recreate a full directory structure for TF SavedModel data seems overly complicated. Is there an easier way that I'm not thinking of, e.g. tar/zipping the directory?

Thanks for any guidance on the above.
Being able to encapsulate model.user_model in a single keras .h5 model would really make our lives a lot easier :)
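On question 2, one workaround (a sketch, not an officially documented workflow) is to save in the TF SavedModel format and then pack the directory into a single archive for transport, using only the standard library:

import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "query_model")
  # StringLookup-bearing models need the TF SavedModel format.
  model.user_model.save(filepath=path, save_format='tf')
  # Pack the whole SavedModel directory into one artifact for e.g. S3.
  archive = shutil.make_archive(path, 'zip', root_dir=path)
  # Later: shutil.unpack_archive(archive, dest), then
  # tf.keras.models.load_model(dest).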

Best way to pass pre-trained image embeddings into model

I'm trying to incorporate pre-trained image embeddings into my ranking model.
I'll outline some high level details on my approach below and then 2 short questions I had on the implementation:

1. Building the tf dataset:
I start with a dataframe of style_ids (int), user_ids (int), and image_embeddings (np arrays).

style = np.array([[i] for i in self.df_interactions['style'].values])
user_id = np.array([[i] for i in self.df_interactions['user_id'].values])
image_embedding = np.array([i for i in self.df_interactions['embedding'].values])

interactions_tf_dataset = {'user_id': user_id,
                           'style': style,
                           'image_embedding': image_embedding}
interactions = tf.data.Dataset.from_tensor_slices(interactions_tf_dataset)

2. Concatenating Embeddings with Pre-trained Image Embeddings
I'm setting up the user and style embeddings as normal, but this time I'm concatenating pre-trained image embeddings directly in the call. The embeddings being concatenated are of shapes (?, 1, 64), (?, 1, 64), and (?, 1, 64).

  def call(self, inputs):
    user_embedding = self.user_embeddings(inputs['user_id'])
    style_embedding = self.style_embeddings(inputs['style_id'])
    image_embeddings = tf.expand_dims(inputs['image_embedding'], axis=1)

    concatenated_embeddings = tf.concat(
        [user_embedding, style_embedding, image_embeddings], axis=1)

Questions:

  1. To be able to concatenate the image embeddings with the user and style embeddings, I compressed them all down to the same dimension (i.e. I kept everything at 64 dims). Do I need to do this? Or can I keep my image embeddings at, say, 1000 dims and concatenate along a different axis (e.g. axis=-1, axis=2)?

  2. My training appears more unstable since adding the image embeddings. I normalized the pre-trained image embeddings to the range (-1, 1) before passing them into the model, but is there a different normalization strategy I should use? Or is there a mismatch between the randomly initialized style/id embeddings and the fixed input image embedding vectors?

Thanks in advance for any help/pointers!

Aside:
One potential solution I was looking at is based on how pre-trained word embeddings are loaded in Keras: https://keras.io/examples/nlp/pretrained_word_embeddings/
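For the aside above, the analogous Keras pattern is to load the fixed vectors into an Embedding layer via a constant initializer. A sketch, where pretrained_matrix is a stand-in (num_styles, 1000) array holding the image vector for style id i in row i:

image_embedding_layer = tf.keras.layers.Embedding(
    input_dim=pretrained_matrix.shape[0],
    output_dim=pretrained_matrix.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_matrix),
    trainable=False,  # keep the pre-trained vectors fixed
)
# Optionally project the 1000 dims down to the 64-dim space of the
# learned embeddings before concatenation.
project = tf.keras.layers.Dense(64)

On question 1: tf.concat along the last axis only requires the other axes to match, so a 1000-dim image vector can be concatenated with 64-dim embeddings along axis=-1 without compressing it.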

Error when saving BruteForce layer to use by TensorFlow Serving

Following the quickstart, I'm trying to save the BruteForce model for use with TensorFlow Serving. The goal is to pass a user_id to the TensorFlow Serving endpoint and get back a list of movie_titles.

...
index = tfrs.layers.ann.BruteForce(model.user_model)
index.save(tensorflow_saved_model_path_bruteforce_model, save_format='tf')
ValueError: Model <tensorflow_recommenders.layers.ann.BruteForce object at 0x7f6e9c2aa7b8> cannot be saved because the input shapes have not been set. Usually, input shapes are automatically determined from calling `.fit()` or `.predict()`. To manually set the shapes, call `model.build(input_shape)`.

I tried explicitly calling index.build(1), etc., but this leads to more errors.

(This is a continuation of this thread...)
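A workaround that often resolves this class of error (a sketch, assuming the index is first populated with candidates): call the layer once with a sample query before saving, so Keras traces it and records the input shapes.

index = tfrs.layers.ann.BruteForce(model.user_model)
# model.item_model here stands for whatever candidate tower the model defines.
index.index(movies.batch(100).map(model.item_model), movies)

# One forward pass sets the input shapes; this assumes string user ids
# (use a numeric tensor if the user model embeds numeric ids instead).
_ = index(tf.constant(["42"]))

index.save(tensorflow_saved_model_path_bruteforce_model, save_format='tf')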

model.predict in colab DNN

Sorry for the beginner issue, but I am doing the DNN colab and I'm just trying to make a prediction.
What data format/shape am I supposed to pass to model.predict to get a prediction? I keep getting this error:

NotImplementedError: When subclassing the Model class, you should implement a call method.

Thank you

Making recommendations from the "Building deep retrieval models" tutorial

I've been following the Building deep retrieval models tutorial, and after fitting/evaluating the model, I tried to make predictions using the following code (as seen in other tutorials):

# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
index.index(movies.batch(100).map(model.candidate_model), movies)

# Get some recommendations.
_, titles = index(tf.constant(["42"]))
print(f"Top 3 recommendations for user 42: {titles[0, :3]}")

But I'm getting this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-a21dd6bad2ac> in <module>()
      1 # Get some recommendations.
----> 2 _, titles = index(tf.constant(["42"]))
      3 print(f"Top 3 recommendations for user 42: {titles[0, :3]}")

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py in _check_index(idx)
    863     # TODO(slebedev): IndexError seems more appropriate here, but it
    864     # will break `_slice_helper` contract.
--> 865     raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
    866 
    867 

TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got 'user_id'

I'm sort of new to TensorFlow, so not sure if I'm missing something super obvious. Any help would be appreciated!
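Since the query model in that tutorial consumes a dictionary of features rather than a bare user-id tensor, the index generally has to be called with a batched feature dict. A sketch (the feature names and the timestamp value are illustrative):

_, titles = index({
    "user_id": tf.constant(["42"]),
    "timestamp": tf.constant([879024327], dtype=tf.int64),
})
print(f"Top 3 recommendations for user 42: {titles[0, :3]}")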

Advice on improving accuracy of the TFRS examples

After running all of the examples, I'm only seeing 20-25% accuracy.

Any tips on improving the accuracy? Is there a different dataset that we should use to highlight the awesomeness of this framework?

movielens.py and nbtools.py

What are movielens.py and nbtools.py for?
I couldn't find where the functions inside movielens.py and nbtools.py are used in this git repo.

Thank you!

User model with UNK tokens

I have seen from the tutorial that you add +1 to account for missing tokens, as in this example:

user_model = tf.keras.Sequential([
  tf.keras.layers.experimental.preprocessing.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  # We add an additional embedding to account for unknown tokens.
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])

Now, in my user model, I have users with missing features, like a missing age, so I may have

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
    "age": x["age"],
})

where age may have a NULL value (because it is missing). So will it automatically be treated as an OOV/UNK token if I do something like:

 self.age_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Discretization(age_buckets.tolist()),
        tf.keras.layers.Embedding(len(age_buckets) + 1, 16),
    ])
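Discretization will not treat NULLs specially on its own; one approach (a sketch, assuming missing ages arrive as NaN) is to map them to a sentinel below the lowest bucket boundary so they land in their own bin:

def fill_missing_age(x):
  age = tf.cast(x["age"], tf.float32)
  # NaN -> -1.0, which falls below every real age boundary.
  x["age"] = tf.where(tf.math.is_nan(age), tf.constant(-1.0), age)
  return x

ratings = ratings.map(fill_missing_age)

With boundaries age_buckets, Discretization emits len(age_buckets) + 1 bins, so the sentinel occupies the lowest bin and the Embedding above is already sized for it.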

Making predictions with features embedding and Dense Layers

Hello,
I am currently following the notebook example context_features.ipynb.

After the model is trained, I am trying to make a prediction from a new input.

I am building my input as follows, in order to get a single input with a timestamp and a user_id:

prediction_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast([1558656000], tf.int64),            
            tf.cast(['42'], tf.string),
        )
    )
)

prediction_tensor = prediction_dataset.map(lambda x,y: {
    "timestamp":x,
    "user_id":y,
})

But when I try to make a prediction (like in basic_retrieval.ipynb), doing:

index = tfrs.layers.factorized_top_k.BruteForce(UserModel(use_timestamps=True))
index.index(movies.batch(100).map(MovieModel()), movies)

# Get recommendations.
for pred in prediction_tensor.take(1):
    _, titles = index(pred)
    print(f"Recommendations for user 42: {titles[0, :3]}")

I am getting this error:
InvalidArgumentError: ConcatOp : Expected concatenating dimensions in the range [-1, 1), but got 1 [Op:ConcatV2] name: concat

And when I try to use the model.predict() function on my prediction_tensor:

model.predict(prediction_tensor)

I am getting this error:
NotImplementedError: When subclassing the `Model` class, you should implement a `call` method.

Can you please advise me on how to run predictions here? Maybe I am doing something wrong, but I have been stuck on this error for 2 days now. Any help would be really appreciated. Maybe we could also update the deep_recommenders notebook with a "Making predictions" section.

Unexpected git status after cloning in macOS using git v. 2.24.1

Important note: this issue does not happen when using the git v. 2.11.0 under Linux.

Dear all, I got a strange git status when cloning the repo on macOS, where I have git v. 2.24.1 installed.
It looks like the main branch is kind of corrupted (see below). The problem arises within the tools folder (deleted files).
Have you seen something like this before? Maybe it has to do with git hooks? Can it be easily fixed?

Disclaimer: maybe it does not have anything to do with the repo itself. If so, please excuse me (I am just trying to be helpful).

$ git --version
git version 2.24.1

$ git clone https://github.com/tensorflow/recommenders.git
Cloning into 'recommenders'...
remote: Enumerating objects: 68, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 810 (delta 32), reused 49 (delta 18), pack-reused 742
Receiving objects: 100% (810/810), 316.14 KiB | 969.00 KiB/s, done.
Resolving deltas: 100% (555/555), done.

$ cd recommenders

$ git status -v
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    tools/BUILD

no changes added to commit (use "git add" and/or "git commit -a")

$ git reset --hard
HEAD is now at f48e3ed Merge pull request #81 from hyphmongo:hyphmongo-patch-1

$ git status -v
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    tools/build/pip_install.sh
	deleted:    tools/build/release.sh
	deleted:    tools/build/test.sh
	deleted:    tools/build/utils.sh

no changes added to commit (use "git add" and/or "git commit -a")

$ git reset --hard
HEAD is now at f48e3ed Merge pull request #81 from hyphmongo:hyphmongo-patch-1

$ git status -v
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    tools/BUILD

no changes added to commit (use "git add" and/or "git commit -a")

Many thanks and best regards.

Issue about the efficient serving tutorial

I wonder what server you ran the tutorial on. My time cost for the ScaNN index doesn't align with yours.

My screenshot w.r.t. approximate prediction is as follows:

[screenshot: approximate prediction timings, 2020-12-01]

My screenshot w.r.t. tuning ScaNN is as follows:

[screenshot: ScaNN tuning timings, 2020-12-01]

It seems my time cost is always 10 times longer than yours. On the other hand, the brute-force index runs in only 2 ms?

I didn't change any code. My server has 4 V100-16GB GPUs (CUDA 10.2) and 40 CPUs. My Python env is listed below:

# packages in environment at /root/Softwares/anaconda3/envs/tf2.0:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
absl-py                   0.11.0                   pypi_0    pypi
astunparse                1.6.3                    pypi_0    pypi
attrs                     20.3.0                   pypi_0    pypi
backcall                  0.2.0                      py_0
ca-certificates           2020.10.14                    0
cachetools                4.1.1                    pypi_0    pypi
certifi                   2020.11.8        py37h06a4308_0
chardet                   3.0.4                    pypi_0    pypi
decorator                 4.4.2                      py_0
dill                      0.3.3                    pypi_0    pypi
faiss-gpu                 1.6.5                    pypi_0    pypi
future                    0.18.2                   pypi_0    pypi
gast                      0.3.3                    pypi_0    pypi
google-auth               1.23.0                   pypi_0    pypi
google-auth-oauthlib      0.4.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
googleapis-common-protos  1.52.0                   pypi_0    pypi
grpcio                    1.33.2                   pypi_0    pypi
h5py                      2.10.0                   pypi_0    pypi
idna                      2.10                     pypi_0    pypi
importlib-metadata        3.1.0                    pypi_0    pypi
importlib-resources       3.3.0                    pypi_0    pypi
ipykernel                 5.3.4            py37h5ca1d4c_0
ipython                   7.19.0           py37hb070fc8_0
ipython_genutils          0.2.0                    py37_0
jedi                      0.17.2                   py37_0
jupyter_client            6.1.7                      py_0
jupyter_core              4.7.0            py37h06a4308_0
keras-preprocessing       1.1.2                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7
libedit                   3.1.20191231         h14c3975_1
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.1.0                hdf63c60_0
libsodium                 1.0.18               h7b6447c_0
libstdcxx-ng              9.1.0                hdf63c60_0
markdown                  3.3.3                    pypi_0    pypi
ncurses                   6.2                  he6710b0_1
numpy                     1.18.5                   pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
openssl                   1.1.1h               h7b6447c_0
opt-einsum                3.3.0                    pypi_0    pypi
pandas                    1.1.4                    pypi_0    pypi
parso                     0.7.0                      py_0
pexpect                   4.8.0              pyhd3eb1b0_3
pickleshare               0.7.5                 py37_1001
pip                       20.2.2                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
promise                   2.3                      pypi_0    pypi
prompt-toolkit            3.0.8                      py_0
protobuf                  3.14.0                   pypi_0    pypi
ptyprocess                0.6.0              pyhd3eb1b0_2
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pygments                  2.7.2              pyhd3eb1b0_0
python                    3.7.9                h7579374_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil           2.8.1                      py_0
pytz                      2020.4                   pypi_0    pypi
pyzmq                     19.0.2           py37he6710b0_1
readline                  8.0                  h7b6447c_0
requests                  2.25.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.6                      pypi_0    pypi
scann                     1.1.1                    pypi_0    pypi
setuptools                49.6.0                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six                       1.15.0           py37h06a4308_0
sqlite                    3.33.0               h62c20be_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tabulate                  0.8.7                    pypi_0    pypi
tensorboard               2.4.0                    pypi_0    pypi
tensorboard-plugin-wit    1.7.0                    pypi_0    pypi
tensorflow                2.3.1                    pypi_0    pypi
tensorflow-datasets       4.1.0                    pypi_0    pypi
tensorflow-estimator      2.3.0                    pypi_0    pypi
tensorflow-metadata       0.25.0                   pypi_0    pypi
tensorflow-recommenders   0.3.0                    pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.10               hbc83047_0
tornado                   6.0.4            py37h7b6447c_1
tqdm                      4.51.0             pyhd3eb1b0_0
traitlets                 5.0.5                      py_0
typing-extensions         3.7.4.3                  pypi_0    pypi
urllib3                   1.26.2                   pypi_0    pypi
wcwidth                   0.2.5                      py_0
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.35.1                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0
zeromq                    4.3.3                he6710b0_3
zipp                      3.4.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3

SampledSoftmax Loss in Retrieval

Hi, as shown in the basic_retrieval tutorial, we seem to use the tf.keras.losses.CategoricalCrossentropy loss by default.

  1. I wonder if there is any difference between that and tf.nn.sampled_softmax_loss? In my view, and as mentioned in the YouTubeDNN paper (Google, 2016), it might be better to use sampled softmax (corresponding to multi-class classification) since we are at the retrieval stage.

  2. If so, how can we incorporate sampled softmax into the model while using Keras as the high-level API? Any example code?
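On question 2, tf.nn.sampled_softmax_loss can be wired into a custom loss at the retrieval stage. A minimal sketch of the call itself (all tensor names here are illustrative, not part of TFRS):

# movie_embedding_table: (num_movies, dim) output embeddings;
# movie_biases: (num_movies,); user_embeddings: (batch, dim);
# movie_ids: (batch, 1) int64 labels of the watched movies.
loss = tf.nn.sampled_softmax_loss(
    weights=movie_embedding_table,
    biases=movie_biases,
    labels=movie_ids,
    inputs=user_embeddings,
    num_sampled=100,        # negatives sampled per batch
    num_classes=num_movies,
)
loss = tf.reduce_mean(loss)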

Correct data formulation when using both User and Item features (question)

I am trying to build an emotion-aware recommendation system for music by adapting the TFRS tutorials to my data's needs, but I have a little trouble understanding the required formulation of the data for the item-side features.

More specifically, I treat the "current emotion" values similarly to the tutorial's timestamps: a dynamic, "contextual" user-side feature that may vary with each interaction. Emotions are mapped in the interaction matrix (the tutorial's ratings matrix):

interactions = tf_data.map(lambda x: {
    "user_name": x["user_name"],
    "album_name": x["album_name"],
    "user_emotion": x["user_emotion"]})

The values are preprocessed in the User_Model and used in the query embeddings in the compute_loss function.
This part seems to work properly.

My trouble is with adding item features. I have created a tensor dataset of music features (each unique track, its genre, etc.):

albums = tf_album_data.map(lambda x: {
    "album_name": x["album_name"],
    "music_genres": x["music_genres"]})

which are preprocessed in the Music_Model (similar to the Movie_Model) and then used for the Candidate_Model in the compute_loss function. Using this structure, where the user and music features are separate, I get a KeyError: "music_genres". It does not recognize the item features.

Only when I add "music_genres" to the interaction (ratings) matrix does the error disappear, but I cannot verify whether that is the correct formulation. Is the aforementioned architecture okay, or should the user and item features be kept separate, with the error lying elsewhere?

Thank you very much in advance!
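One pattern that avoids the KeyError (a sketch; album_names and music_genres are assumed to be aligned string tensors extracted from the albums dataset) is to join the item features into the interactions dataset before training, so compute_loss sees every key it reads:

genre_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(album_names, music_genres),
    default_value="unknown",
)

def add_item_features(x):
  x["music_genres"] = genre_table.lookup(x["album_name"])
  return x

interactions = interactions.map(add_item_features)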

Question | Would it be a good idea to share an embedding layer between inputs and labels?

Let's say in the movielens dataset I have another column where each row is the list of the last 100 movies clicked by that user.

user_id movie_watched_id last_100_movies_clicked_ids
'123' '67865' '4543534 345435435 657657'

Would it make sense to encode the movie_watched_id in the candidate tower using the same embedding layer that the query tower uses to embed last_100_movies_clicked_ids?

Like, having this user model:

class UserModel(tf.keras.Model):

  def __init__(self):
    super().__init__()

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.StringLookup(
            vocabulary=unique_user_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
    ])

    self.last_100_movies_clicked_ids_embedding = tf.keras.Sequential([
        # TextVectorization (not TextVectorizer) splits the space-separated
        # id string; it reserves indices 0 (padding) and 1 (OOV), hence +2.
        tf.keras.layers.experimental.preprocessing.TextVectorization(
            vocabulary=unique_last_100_movies_clicked_ids),
        tf.keras.layers.Embedding(len(unique_last_100_movies_clicked_ids) + 2, 32),
    ])

  def call(self, inputs):
    # Take the input dictionary, pass it through each input layer,
    # and concatenate the result.
    return tf.concat([
        self.user_embedding(inputs["user_id"]),
        self.last_100_movies_clicked_ids_embedding(inputs["last_100_movies_clicked_ids"]),
    ], axis=-1)

We define the movie model as:

class MovieModel(tf.keras.Model):

  def __init__(self, user_model):
    super().__init__()

    max_tokens = 10_000

    self.movie_watched_id_embedding = user_model.last_100_movies_clicked_ids_embedding

  def call(self, inputs):
    return self.movie_watched_id_embedding(inputs['movie_watched_id'])

Increasing validation loss during training

I'm following the notebook of this tutorial (deep_recommenders) and I see that the dataset is split into only two parts (train and test).

For other kinds of tasks, like classification, I know the dataset is split into train, validation, and test so that you can use, for example, early stopping on validation metrics to avoid overfitting the model to the training data. I think I can use early stopping while training deep neural networks for recommendations too, but running the notebook mentioned above without any changes, the loss on the test set grows during training instead of decreasing. Is this normal behavior?

Another question (maybe related to this): why is the batch size so large (2048 for the train set and 4096 for the test set)? I notice that changing it causes a big variation in the loss value.

I report below some information about the training params and the metrics of the last training epochs:

one_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=300,
    verbose=1)

Summarized below are the validation epochs from the tail of training (the intermediate training-only epochs are omitted; train total_loss declines steadily from 11909 to 11894, train factorized_top_k accuracies stay at 0.0000e+00, and regularization_loss is 0 throughout):

Epoch | train total_loss | val_total_loss | val top-100 accuracy
275   | 11909.4          | 32880.6        | 0.2682
280   | 11906.5          | 32925.2        | 0.2677
285   | 11901.6          | 32949.4        | 0.2706
290   | 11899.1          | 32987.5        | 0.2705
295   | 11896.6          | 33031.2        | 0.2693
300   | 11894.8          | 33056.2        | 0.2691
Top-100 accuracy: 0.27.
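To act on the diverging validation loss, an EarlyStopping callback can monitor one of the validation metrics. A sketch monitoring the top-100 accuracy reported above (validation is run every epoch here so the monitored metric always exists):

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_factorized_top_k/top_100_categorical_accuracy",
    mode="max",                 # higher top-100 accuracy is better
    patience=10,                # epochs without improvement before stopping
    restore_best_weights=True,
)

model.fit(
    cached_train,
    validation_data=cached_test,
    epochs=300,
    callbacks=[early_stop])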

Unable to save multi-task recommender model

I was following the guide for the multi-task recommender found here, but when I tried to save using model.save(), I was unable to do so, with the following error:

WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow_recommenders.metrics.factorized_top_k.FactorizedTopK object at 0x7fa43148ae80>, because it is not built.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2309: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-10-d6cd16d67b34> in <module>()
----> 1 model.save('./')

23 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

FailedPreconditionError: Failed to serialize the input pipeline graph: ResourceGather is stateful. [Op:DatasetToGraphV2]

I also cannot save it in HDF5 format, but I believe that's because the model in question is a custom subclass of the Model class. What is the appropriate way to save the model?

Can I use `scann` on Windows?

Hi, I'm not sure, but am I able to install scann on Windows? pip says it can't find it, and I read here: "manylinux2014-compatible wheels are available on PyPI". Does that mean it's only available for Linux? Any clarification would be appreciated.

ValueError: The first argument to `Layer.call` must always be passed.

Hi. I'm trying to reproduce example from: https://www.tensorflow.org/recommenders/examples/movielens_side_information

I'm doing exactly what the tutorial does, defining classes, methods, etc. I'm using Google Colab.
But when I try to fit the model (exactly like the tutorial):
model.fit(cached_train, epochs=3)

I receive the error:

ValueError: The first argument to `Layer.call` must always be passed.

I think this error is due to the call method in the models.
These are the models the tutorial uses.
User model

embedding_dimension = 32

class UserModel(tf.keras.Model):

  def __init__(self, embedding_dimension, timestamp_buckets):
    super(UserModel, self).__init__()
    # An embedding column for user ids.
    user_id_feature = tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "user_id", unique_user_ids,
        ),
        embedding_dimension,
    )

    # An embedding column for the bucketized timestamps: there will be a separate
    # embedding for each of the timestamp buckets.
    time_feature = tf.feature_column.embedding_column(
        tf.feature_column.bucketized_column(
            tf.feature_column.numeric_column("timestamp"),
            timestamp_buckets.tolist(),
        ),
        embedding_dimension,
    )
    self.embedding_layer = tf.keras.layers.DenseFeatures(
        [user_id_feature, time_feature],
        name="query_embedding",
    )
    self.dense_layer = tf.keras.layers.Dense(embedding_dimension)

   # maybe the error is here?
  def call(self, inputs):
    input_embedding = self.embedding_layer(inputs)
    return self.dense_layer(input_embedding)

Movie model

class MovieModel(tf.keras.Model):

  def __init__(self, embedding_dimension):
    super(MovieModel, self).__init__()
    movie_features = [tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "movie_id", unique_movie_ids,
        ),
        embedding_dimension,
    )]
    self.embedding_layer = tf.keras.layers.DenseFeatures(movie_features, name="movie_embedding")
  
  # maybe the error is here?
  def call(self, inputs):
    return self.embedding_layer(inputs)

And this is the model that uses the two models above:

class MovielensModel(tfrs.models.Model):

  def __init__(self, embedding_dimension, timestamp_buckets):
    super().__init__()
    self.user_model: tf.keras.Model = UserModel(embedding_dimension, timestamp_buckets)
    self.movie_model: tf.keras.Model = MovieModel(embedding_dimension)
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.movie_model)
        )
    )

  def compute_loss(self, features, training=False):
    user_embeddings = self.user_model({"user_id": features["user_id"],
                                       "timestamp": features["timestamp"]})
    positive_movie_embeddings = self.movie_model(
        {"movie_id": features["movie_id"]}
    )

    return self.task(user_embeddings, positive_movie_embeddings)

Tensorboard error when using quickstart model

I was able to get the Quickstart model running seamlessly.

However, I then tried to add a Tensorboard callback, by following the Tensorboard Quickstart.

In the tfrs API docs, it says:

Note that this base class is a thin convenience wrapper for tf.keras.Model.

Since it is a Keras model, I followed this section: Using TensorBoard with Keras Model.fit()

This means I replaced this line in Quickstart:

model.fit(ratings.batch(4096), epochs=3)

with these lines:

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model.fit(ratings.batch(4096), epochs=3, callbacks=[tensorboard_callback])

(in other words, adding the callbacks= input argument)

When I run this, I get the following error:

2020-10-01 21:00:11.235419: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 0 callback api events and 0 activity events. 
2020-10-01 21:00:11.543177: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11
2020-10-01 21:00:11.631104: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for trace.json.gz to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.trace.json.gz
2020-10-01 21:00:11.733572: I tensorflow/core/profiler/rpc/client/save_profile.cc:176] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11
2020-10-01 21:00:11.736341: I tensorflow/core/profiler/rpc/client/save_profile.cc:182] Dumped gzipped tool data for memory_profile.json.gz to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.memory_profile.json.gz
2020-10-01 21:00:11.737640: I tensorflow/python/profiler/internal/profiler_wrapper.cc:111] Creating directory: logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11Dumped tool data for xplane.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.xplane.pb
Dumped tool data for overview_page.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/fit/20201001-210007/train/plugins/profile/2020_10_01_21_00_11/temp-instance.kernel_stats.pb

 2/25 [=>............................] - ETA: 9s - factorized_top_k: 7.5684e-04 - factorized_top_k/top_1_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_5_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_10_categorical_accuracy: 0.0000e+00 - factorized_top_k/top_50_categorical_accuracy: 3.6621e-04 - factorized_top_k/top_100_categorical_accuracy: 0.0034 - loss: 34084.4688 - regularization_loss: 0.0000e+00 - total_loss: 34084.4688    WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2553s vs `on_train_batch_end` time: 0.6103s). Check your callbacks.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.2553s vs `on_train_batch_end` time: 0.6103s). Check your callbacks.
25/25 [==============================] - ETA: 0s - factorized_top_k: 0.0293 - factorized_top_k/top_1_categorical_accuracy: 6.0000e-05 - factorized_top_k/top_5_categorical_accuracy: 0.0014 - factorized_top_k/top_10_categorical_accuracy: 0.0047 - factorized_top_k/top_50_categorical_accuracy: 0.0424 - factorized_top_k/top_100_categorical_accuracy: 0.0978 - loss: 33915.1966 - regularization_loss: 0.0000e+00 - total_loss: 33915.1966Traceback (most recent call last):
  File "tfrs_quickstart.py", line 100, in <module>
    model.fit(ratings.batch(4096), epochs=3, callbacks=[tensorboard_callback])
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2182, in on_epoch_end
    self._log_weights(epoch)
  File "/home/temp/.venv/tfrs/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2233, in _log_weights
    weight_name = weight.name.replace(':', '_')
AttributeError: 'TrackableWeightHandler' object has no attribute 'name'

Should this work as-is with .fit()? Or do I need to make a custom implementation (as described at the top here)?

I did notice that tfrs.models.Model uses tf.GradientTape(), and the TensorBoard docs have different directions for trainers that use that method.

I've also attached the .py file representation of the quickstart for reference (with the lines above added):

tfrs_quickstart.py.txt

Model implementation details leak outside the model

Since the base Model class doesn't implement (at least a placeholder for) the call() method, other parts of the library can't rely on that method being defined. As a result, the details of how to compute predictions from a model show up in multiple other places, such as the top-k metrics and the retrieval task.

There are two flavors represented among those places:

  • Those with one-to-one relationships between query embeddings and candidate embeddings (like the topK metrics) that are implemented with element-wise products
  • Those with many-to-many relationships between query embeddings and candidate embeddings (like the retrieval task) that are implemented with matmuls

In order to consolidate the prediction code and abstract the way predictions are computed, it seems like these two modes could either be captured in a single Model method with a flag that selects between element-wise pairs and batch predictions, or represented as two Model methods.
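
To make the two flavors concrete, here is a minimal illustration (a sketch of the math, not library code):

import tensorflow as tf

queries = tf.random.normal((8, 32))     # [batch, dim] query embeddings
candidates = tf.random.normal((8, 32))  # [batch, dim] paired candidate embeddings

# One-to-one: each query is scored against its own candidate (element-wise product).
pairwise_scores = tf.reduce_sum(queries * candidates, axis=1)       # shape [8]

# Many-to-many: every query is scored against every candidate (matmul).
all_pair_scores = tf.matmul(queries, candidates, transpose_b=True)  # shape [8, 8]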

Thoughts?

Sequential Recommendation

There are a few independent implementations of sequential recommenders (sequence-aware, session-based, and so on), such as slientGe's repository. How do we perform such recommendations using this library?

Every example in the guides focuses on solving the matrix-completion problem, while this is not the objective in sequential recommendation.
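
For what it's worth, a sketch of one way to build a sequence-aware query tower with this library (this is an assumption, not an official recipe; vocab_size and the feature names context_items/label_item are made up):

import tensorflow as tf
import tensorflow_recommenders as tfrs

vocab_size = 10_000
embedding_dim = 32

# Query tower: embed the sequence of previously seen item ids and summarize it
# with a GRU; the final state acts as the query embedding.
query_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True),
    tf.keras.layers.GRU(embedding_dim),
])

# Candidate tower: plain item embeddings.
item_model = tf.keras.layers.Embedding(vocab_size, embedding_dim)

task = tfrs.tasks.Retrieval()

def compute_loss(features):
    # features["context_items"]: [batch, history_len] ids of past items.
    # features["label_item"]: [batch] id of the next item to predict.
    return task(query_model(features["context_items"]),
                item_model(features["label_item"]))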

The dtype of the source tensor must be floating (e.g. tf.float32) when calling ...

Hi, fantastic library!

I would like to ask, especially when it comes to "visual output optimization": can a dtype switch like tf.cast(dtype) avoid the warning outputs, or would the counter variable get updated during GradientTape, even when trainable=None?

Idea(s):

In: layers/factorized_top_k.py

self._counter = self.add_weight("counter", dtype=tf.float32, trainable=None)      
...
def enumerate_rows(batch: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
  ....
  end_counter = self._counter.assign_add(tf.dtypes.cast(tf.shape(batch)[0], tf.float32))
  return tf.dtypes.cast(tf.range(starting_counter, end_counter), tf.int32), batch

Looking forward, fascinated,
Flipper

save model error

Error info:

TypeError: Invalid input_signature [{'user_id': [None], 'user_occupation_label': [TensorSpec(shape=(), dtype=tf.int32, name=None)]}]; input_signature must be a possibly nested sequence of TensorSpec objects.

Here is the code; how can I fix it?

from typing import Dict, Text
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
from pprint import pprint
import os

os.environ['CUDA_VISIBLE_DEVICES'] = "0"
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)

# os.environ['CUDA_VISIBLE_DEVICES'] = "-1"
ratings = tfds.load('movielens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

for i in ratings.take(1):
    pprint(i)
for i in movies.take(1):
    pprint(i)

ratings = ratings.map(lambda x: {
    "movie_id": x["movie_id"],
    "user_id": x["user_id"],
    "user_occupation_label": x["user_occupation_label"]
})
movies = movies.map(lambda x: {
    "movie_id": x["movie_id"],
    "movie_title": x["movie_title"]
})


tf.random.set_seed(2021)
shuffled = ratings.shuffle(100000, seed=2021, reshuffle_each_iteration=False)
num_sample = len(shuffled)

train = shuffled.take(int(num_sample * 0.8))
test = shuffled.skip(int(num_sample * 0.8)).take(int(num_sample * 0.2))

movie_ids = movies.batch(1000).map(lambda x: x["movie_id"])
user_ids = ratings.batch(1000000).map(lambda x: x["user_id"])

unique_movie_ids = np.unique(np.concatenate(list(movie_ids)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))

unique_movie_id_strings = [id.decode('utf-8') for id in unique_movie_ids]
unique_user_id_strings = [id.decode('utf-8') for id in unique_user_ids]



hidden_dimension = 128
embedding_dimension = 64

class UserModel(tf.keras.Model):
    def __init__(self, embedding_dimension, **kwargs):
        super(UserModel, self).__init__(**kwargs)
        user_features = [
            tf.feature_column.embedding_column(
                tf.feature_column.categorical_column_with_vocabulary_list('user_id', unique_user_id_strings),
                hidden_dimension),
            tf.feature_column.embedding_column(
                tf.feature_column.categorical_column_with_identity("user_occupation_label", 30),
                hidden_dimension
            )
        ]
        self.embedding_layer = tf.keras.layers.DenseFeatures(user_features, name="user_embedding")
        self.dense1 = tf.keras.layers.Dense(hidden_dimension)
        self.dense2 = tf.keras.layers.Dense(embedding_dimension)

    def call(self, inputs):
        x = self.embedding_layer(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        return x


class MovieModel(tf.keras.Model):
    def __init__(self, embedding_dimension, **kwargs):
        super(MovieModel, self).__init__(**kwargs)
        movie_features = [
            tf.feature_column.embedding_column(
                tf.feature_column.categorical_column_with_vocabulary_list('movie_id', unique_movie_id_strings),
                hidden_dimension)

        ]

        self.embedding_layer = tf.keras.layers.DenseFeatures(movie_features, name="movie_embedding")
        self.dense = tf.keras.layers.Dense(embedding_dimension)

    def call(self, inputs):
        x = self.embedding_layer(inputs)
        x = self.dense(x)
        return x


class MovielensModel(tfrs.Model):
    def __init__(self, **kwargs):
        super(MovielensModel, self).__init__(**kwargs)
        self.user_model = UserModel(embedding_dimension)
        self.movie_model = MovieModel(embedding_dimension)
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(lambda x: {"movie_id": x["movie_id"]}).map(self.movie_model))
        )

    def compute_loss(self, features, training=False) -> tf.Tensor:
        user_embeddings = self.user_model({"user_id": features["user_id"], "user_occupation_label": features["user_occupation_label"]})
        # user_embeddings = self.user_model( {"user_id": features["user_id"]})
        movie_embeddings = self.movie_model({"movie_id": features["movie_id"]})
        return self.task(user_embeddings, movie_embeddings)


model = MovielensModel()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
model.compile(optimizer=optimizer)
model.fit(train.batch(8192), epochs=1)





index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)

index.index(movies.batch(100).map(model.movie_model), movies.map(lambda x:x['movie_title']))
# index.index(movies.batch(100).map(model.movie_model), movies.map(lambda x:x['movie_id']))
_,titles = index({"user_id":["32"],"user_occupation_label":[20]})

print(f"Top 3 recommendations for user 32: {titles[0, :3]}")

index.save('./model')
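
One thing that might help (an assumption based on the error message, not a confirmed fix): trace the index with tf.constant inputs rather than Python lists before saving, so the traced input signature consists of proper tensors. Continuing the script above:

query = {
    "user_id": tf.constant(["32"]),
    "user_occupation_label": tf.constant([20], dtype=tf.int32),
}
_, titles = index(query)           # trace with well-formed tensors
tf.saved_model.save(index, './model')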

Gradients do not exist for variables

I'm trying to train a model that includes candidate side features (e.g. for a given fashion style I've tried to include its shape). This is a Google Colab file in which the example can be run.

This is a high level summary of how my input data is configured:

interactions = interactions.map(lambda x: {
    "user_id": x["user_id"],
    "style": x["style"],
    "high_level_shape": x["high_level_shape"]
})
styles = styles.map(lambda x: {
    "style": x["style"],
    "high_level_shape": x["high_level_shape"]
})

However when I run the two tower model I get the following TF warning which appears concerning:

WARNING:tensorflow:Gradients do not exist for variables ['embedding_1/embeddings:0', 'embedding_2/embeddings:0', 'sequential_3/dense/kernel:0', 'sequential_3/dense/bias:0'] when minimizing the loss.

All of the tutorial examples provided have only one candidate feature (e.g. movie_title), so I'm not sure whether the approach I'm taking is correct or whether there is a problem with my model configuration.
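
For reference, a candidate tower that consumes both features inside its call() might look like the sketch below (assumed vocabularies, not a confirmed fix); if a layer's output never reaches the loss, its variables will trigger exactly this kind of gradient warning:

import tensorflow as tf

class CandidateModel(tf.keras.Model):
    # Sketch of a two-feature candidate tower; style_vocab and shape_vocab are
    # assumed to be precomputed arrays of unique string values.

    def __init__(self, style_vocab, shape_vocab, embedding_dim=32):
        super().__init__()
        self.style_lookup = tf.keras.layers.experimental.preprocessing.StringLookup(
            vocabulary=style_vocab, mask_token=None)
        self.style_embedding = tf.keras.layers.Embedding(
            len(style_vocab) + 1, embedding_dim)
        self.shape_lookup = tf.keras.layers.experimental.preprocessing.StringLookup(
            vocabulary=shape_vocab, mask_token=None)
        self.shape_embedding = tf.keras.layers.Embedding(
            len(shape_vocab) + 1, embedding_dim)
        self.dense = tf.keras.layers.Dense(embedding_dim)

    def call(self, features):
        # Both embeddings feed the output, so both sets of variables get gradients.
        x = tf.concat([
            self.style_embedding(self.style_lookup(features["style"])),
            self.shape_embedding(self.shape_lookup(features["high_level_shape"])),
        ], axis=1)
        return self.dense(x)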

The other warning message I get on most runs comes from the way I pass my dataset to the Keras fit method:

WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: {'style': <tf.Tensor 'IteratorGetNext:1' shape=(None,) dtype=string>, 'high_level_shape': <tf.Tensor 'IteratorGetNext:0' shape=(None,) dtype=string>}

Consider rewriting this model with the Functional API.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:543: UserWarning: Input dict contained keys ['style', 'high_level_shape'] which did not match any model input. They will be ignored by the model.
[n for n in tensors.keys() if n not in ref_input_names])

From looking at the Keras documentation, it appears that passing in a TF dataset as a dictionary of multiple key-value pairs gets picked up as a single X (input dataset) with no target, i.e. {"x0": x0, "x1": x1, "x3": x3}. From my understanding of how TF Recommenders works, this is what we want to do, as it is assumed that every interaction is positive and as such we don't explicitly pass in a target value. Is this warning expected, and does it have anything to do with the other gradients warning above? I noticed that the tutorial examples provided also produce similar warnings that don't seem problematic, so this may be totally unrelated.

Apologies for the long-winded query, but I have tried to provide as much context as possible. Thanks for any help or guidance on the above.

Online learning with tensorflow recommenders and serving

How do I serve a model with online learning, i.e. one that learns in real time from new input and outputs up-to-date recommendations for the user? And is that best practice; if not, what is?

This is not an issue, just a question I've had for a while. For context, I am talking about a movie recommender but should apply in any context I think.

Confusion in featurization tutorial

Hi, I was a little confused by the section titled "Defining the embeddings" in the featurization tutorial. There were a number of questions/issues/confusing things here.

  1. It is written # Let's use the hashing approach., yet the input_dim uses the .vocab_size() of the string lookup, and the sequential model uses movie_title_lookup, the variable holding the string lookup (movie_title_hashing is the variable for the hashing approach).
  2. user_id_lookup is a string lookup, but its input_dim is assigned num_hashing_bins.

Sorry, but am I missing something, or is this a mistake? I would appreciate any clarification.
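
For reference, a consistent pairing would look like this (a sketch using the tutorial's variable names with a toy vocabulary):

import tensorflow as tf

num_hashing_bins = 200_000

# Hashing approach: input_dim matches the number of bins.
movie_title_hashing = tf.keras.layers.experimental.preprocessing.Hashing(
    num_bins=num_hashing_bins)
hashing_embedding = tf.keras.Sequential([
    movie_title_hashing,
    tf.keras.layers.Embedding(input_dim=num_hashing_bins, output_dim=32),
])

# Lookup approach: input_dim matches the lookup's vocabulary size.
movie_title_lookup = tf.keras.layers.experimental.preprocessing.StringLookup(
    vocabulary=["One Flew Over the Cuckoo's Nest (1975)", "Strictly Ballroom (1992)"])
lookup_embedding = tf.keras.Sequential([
    movie_title_lookup,
    tf.keras.layers.Embedding(input_dim=movie_title_lookup.vocab_size(), output_dim=32),
])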

PS. I thought I would ask some questions while I'm at it: would you recommend using Hashing() or StringLookup() for the user and/or movie embeddings? If hashing, what would be a good num_hashing_bins for a dataset like the 25M one? Is there a maximum number of bins I can use?

Thanks for any help in advance!

Calculating all movie-user recommendation scores for the MF quick-start tutorial

I'm currently working my way through the quickstart matrix factorization retrieval tutorial.

The example outlines an approach for narrowing down candidates using brute-force search. But for the use case I'm interested in, I would like to calculate the scores for the full suite of movie-user pairs at once and store them for serving. To be confident that I'm doing this correctly, my goal was to predict the same top 3 recommendations for user 42 as given in the tutorial: [b'Rent-a-Kid (1995)' b'House Arrest (1996)' b'Mirage (1995)']

The approach I'm taking (as a quality check) is:

movie_embeddings = movie_model.layers[1].get_weights()[0]
users_embeddings = user_model.layers[1].get_weights()[0]

scores = tf.linalg.matmul(
    users_embeddings, movie_embeddings, transpose_b=True)

top_movie_indices = tf.argsort(scores[42], direction='DESCENDING')[:3]

for indice in top_movie_indices:
    indice_value = indice.numpy()

    # Option a: get the title from the index via the vocabulary lookup
    movie_title = movie_titles_vocabulary.get_vocabulary()[indice_value]

    # Option b: get the title from the index via the movie ids input into the embedding layer
    movie_titles = movie_model.layers[1].get_weights()[0]
    movie_title = movie_titles[indice_value]

Neither option above, (a) or (b), gives me results that match for user 42, so I'm a little confused.
Should the top 3 results from both methodologies match? Am I mapping back to the movie titles via the indices incorrectly? Am I missing a bias term in my score calculations?
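
One possible source of the mismatch (an assumption, using the quickstart's user_ids_vocabulary and movie_titles_vocabulary lookups): row 42 of the user embedding matrix corresponds to vocabulary index 42, not to user id "42", so the id needs to be looked up first:

user_index = user_ids_vocabulary(tf.constant(["42"]))[0]
top_movie_indices = tf.argsort(scores[user_index], direction='DESCENDING')[:3]

for index in top_movie_indices.numpy():
    # get_vocabulary() is aligned with the embedding rows (index 0 is the OOV slot).
    print(movie_titles_vocabulary.get_vocabulary()[index])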

Apologies for what may be a simple question, but I hope it might eventually help with documentation efforts down the line.

Updating embedding input sizes in document tutorials

Dear all, thanks for the great project and tutorial examples! They helped me figure out how tensorflow-recommenders works. To help a little, I just created pull request #85 about the input dimension of the embedding matrix in the examples in the docs.

For timestamp embedding, I fixed tf.keras.layers.Embedding(len(timestamp_buckets) + 2, 32) to tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32). The index ranges from 0 to the size of the timestamp_buckets. That is, if the number of buckets is 1,000, we only need 1000 + 1 indexes, not 1000 + 2.
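
As a worked check (a sketch with dummy boundary values):

import numpy as np
import tensorflow as tf

timestamp_buckets = np.linspace(0.0, 1.0, num=1000)

# Discretization with N boundaries emits bucket indices in [0, N],
# so N + 1 embedding rows are enough.
discretize = tf.keras.layers.experimental.preprocessing.Discretization(
    timestamp_buckets.tolist())
timestamp_embedding = tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32)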

For user embedding, I fixed tf.keras.layers.Embedding(len(unique_user_ids) + 2, embedding_dimension) to tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension), as the mask embedding is never used in the two-tower model. I set the mask_token argument of StringLookup to None to ignore the mask embedding. The same applies to the item embedding.

Maybe this does not have anything to do with the tutorial itself, as the original version also works. If so, please excuse me (I am just trying to be helpful and make the embedding sizes clearer).

Excluding previously seen items from test recommendations

In other libraries (e.g. LightFM) it's common to have a facility to exclude previously seen items from test/eval recommendations by passing a set of train_interactions into the evaluation method (see the LightFM approach). This mimics what we would do in production and as such gives us a more accurate read on evaluation results in our offline training. Without this ability, it is hard to know whether we are overfitting on the training data, as the eval results become a lot less meaningful when real performance is being masked.

I noticed that the tutorials explicitly state that this approach has not been adopted for TFRS and that we should "appropriately specify models to learn this behaviour automatically".

  1. What does an appropriately specified model look like in this instance? (E.g. do we need to capture sequence behaviour via RNNs, as in this paper? Should including timestamp data as context info in our queries capture this behaviour?) Is there an intention to provide more details on this?

  2. Is it envisaged that a provision will be made to exclude previously interacted with items (or is this a decision that's dictated by the modelling approach?)
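
For what it's worth, a minimal post-filtering sketch (this is not a TFRS facility, just the common workaround of dropping already-seen items from the ranked top-k before evaluation or serving):

import tensorflow as tf

def filter_seen(candidate_ids, seen_ids, k):
    # candidate_ids: [num_candidates] ranked best-first; seen_ids: [num_seen].
    mask = tf.reduce_all(
        tf.not_equal(candidate_ids[:, None], seen_ids[None, :]), axis=1)
    return tf.boolean_mask(candidate_ids, mask)[:k]

ranked = tf.constant(["m1", "m2", "m3", "m4", "m5"])
seen = tf.constant(["m2", "m4"])
print(filter_seen(ranked, seen, k=3))  # ["m1", "m3", "m5"]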

How to handle items that can be purchased/used just once?

I'm training the model using purchase data for a certain period and validating it on what the same users purchased in the future.

How should I set up the model in case the items can be purchased just once? I don't want the model to recommend items present in the training set but only choose among the ones in the test set (which would be the ones available at that time in the future).

I was thinking of setting the candidates_dataset in the FactorizedTopK metric to use just the candidates from the test group but how would the model then be able to compute the loss on the training set?

self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=candidates_dataset.batch(8192).map(self.candidate_model).cache()
        ))

The candidates_dataset is defined as

candidates = tf.data.Dataset.from_tensor_slices(
              dict(train_df[candidate_features] \
                   .append(test_df[candidate_features]) \
                   .drop_duplicates())) \
              .cache(tempfile.NamedTemporaryFile().name)

EDIT:
I think the solution might be to define different train_step and test_step methods using two different tasks that use different candidates. This is my attempt:

# using input embedding layer for candidate model
class RetrievalModel(tfrs.models.Model):

  def __init__(self, layer_sizes, train_dataset, candidates_dataset,
               max_tokens=100_000, embed_dim=32):
    super().__init__()

    self.embed_dim = embed_dim
    self.query_model = QueryModel(layer_sizes, train_dataset, max_tokens=max_tokens, embed_dim=embed_dim)
    self.candidate_model = CandidateModel(layer_sizes, self.query_model)
    self.training_task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=train.batch(8192).map(self.candidate_model).cache()
        ))
    self.test_task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=test.batch(8192).map(self.candidate_model).cache()
        ))
    
  # def compute_loss(self, features, training=False):
  #   return self.task(self.query_model(features), 
  #                    self.candidate_model(features), 
  #                    compute_metrics=not training)
    
  def train_step(self, features) -> tf.Tensor:

    # Set up a gradient tape to record gradients.
    with tf.GradientTape() as tape:

      # Loss computation.
      query_vectors = self.query_model(features)
      candidate_vectors = self.candidate_model(features)
      loss = self.training_task(query_vectors, candidate_vectors, compute_metrics=False)

      # Handle regularization losses as well.
      regularization_loss = sum(self.losses)

      total_loss = loss + regularization_loss

    gradients = tape.gradient(total_loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

    metrics = {metric.name: metric.result() for metric in self.metrics}
    metrics["loss"] = loss
    metrics["regularization_loss"] = regularization_loss
    metrics["total_loss"] = total_loss

    return metrics

  def test_step(self, features) -> tf.Tensor:

    # Loss computation.
    query_vectors = self.query_model(features)
    candidate_vectors = self.candidate_model(features)
    loss = self.test_task(query_vectors, candidate_vectors, compute_metrics=True)

    # Handle regularization losses as well.
    regularization_loss = sum(self.losses)

    total_loss = loss + regularization_loss

    metrics = {metric.name: metric.result() for metric in self.metrics}
    metrics["loss"] = loss
    metrics["regularization_loss"] = regularization_loss
    metrics["total_loss"] = total_loss

    return metrics
    

Shape must be rank 2 but is rank 1 during model.fit

I'm applying this basic tutorial to my dataset but I'm getting a shape error when fitting the model. What could be causing the problem? The shape of the user_embeddings and positive_movie_embeddings match like in the tutorial.

Thanks for your help 👍

Full error here

Epoch 1/3

ValueError Traceback (most recent call last)
in ()
5 cached_test = test_ds.batch(4096).cache()
6
----> 7 model.fit(train_ds, epochs=3)

10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
971 except Exception as e: # pylint:disable=broad-except
972 if hasattr(e, "ag_error_metadata"):
--> 973 raise e.ag_error_metadata.to_exception(e)
974 else:
975 raise

ValueError: in user code:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
    return step_function(self, iterator)
/usr/local/lib/python3.6/dist-packages/tensorflow_recommenders/tasks/retrieval.py:126 call  *
    scores = tf.linalg.matmul(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper  **
    return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3255 matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py:5642 mat_mul
    name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
    attrs=attr_protos, op_def=op_def)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py:593 _create_op_internal
    compute_device)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:3485 _create_op_internal
    op_def=op_def)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:1975 __init__
    control_input_ops, op_def)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:1815 _create_c_op
    raise ValueError(str(e))

ValueError: Shape must be rank 2 but is rank 1 for '{{node retrieval_3/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true](sequential_6/embedding_6/embedding_lookup/Identity_1, sequential_7/embedding_7/embedding_lookup/Identity_1)' with input shapes: [32], [32].
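
Judging from the input shapes in the last line ([32] and [32] rather than [None, 32]), a likely cause (an assumption from the traceback, not a confirmed diagnosis) is that the dataset passed to fit() is unbatched, so each embedding comes out rank 1. Batching the training set the same way as the test set should make the matmul inputs rank 2:

cached_train = train_ds.batch(4096).cache()
model.fit(cached_train, epochs=3)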

How to calculate mean loss instead of sum loss in retrieval task?

When training the retrieval model, the loss drops suddenly at the last step of each epoch in TensorBoard. In my opinion, this is because the last data batch is smaller than the normal batch size, while the loss is a sum loss in the retrieval task (tfrs.tasks.Retrieval uses tf.keras.losses.CategoricalCrossentropy as the default).
So how can I calculate a mean loss instead of a sum loss in the retrieval task?
Thanks in advance!
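
If it helps, tfrs.tasks.Retrieval accepts a loss argument, so one option (a sketch) is to pass a cross-entropy loss with a batch-size-averaging reduction instead of the default sum reduction:

import tensorflow as tf
import tensorflow_recommenders as tfrs

task = tfrs.tasks.Retrieval(
    loss=tf.keras.losses.CategoricalCrossentropy(
        from_logits=True,
        reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE))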

Cold start reccomendations for unknown users (confirming approach)

I'm trying to implement a retrieval model (e.g like the context_features_tutorial) that will handle cold start user predictions for users who are unknown to our model:

Example query structure:

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
    "age": x["age"],
})

In the cold start scenario we will often have certain data missing (e.g. user_id, age).

From the tutorials it would appear that the following should work:

  1. Pass the unknown user_id to the index -> the StringLookup layer will determine that this user is not in our vocabulary and will assign it to the OOV/UNK embedding. My assumption is that this will also work the same for any other unknown categorical vars handled via StringLookup.

  2. For missing numerical features -> it would appear from (issue 108) that the discretization layer cannot handle missing or None values. The suggested approach in issue 108 is to first use IntegerLookup to encode missing values and then use the bin discretization layer afterwards. One other approach we're considering is just replacing all missing numerical features with their respective means before feeding them to the model (e.g. mean age, mean_time_watched).

My questions are:

  1. Is this the correct way to approach making predictions for cold start users?
  2. Does the 'OOV/UNK' treatment work the same way for user_ids not in the vocab (e.g. '1013413') and missing user ids ('')? (See the quick check below.)
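
For question 2, a quick check (a sketch using the preprocessing layer's defaults):

import tensorflow as tf

lookup = tf.keras.layers.experimental.preprocessing.StringLookup(
    vocabulary=["1", "2", "3"], mask_token=None)

# With mask_token=None, both an unseen id and an empty string fall into
# the single OOV slot at index 0.
print(lookup(tf.constant(["2", "1013413", ""])))  # -> [2 0 0]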

Thanks for any guidance on the above

Tutorial using MirroredStrategy Failed

Hi, thanks for your great work. I'm working through your basic_retrieval tutorial.

When I try to use a multi-GPU strategy, the model fails.

Code to reproduce the error:

num_gpus = 4
devices = ["device:GPU:%d" % i for i in range(num_gpus)]
strategy = tf.distribute.MirroredStrategy(devices)

class MovielensModel(tfrs.Model):

    def __init__(self, user_model, movie_model):
        super().__init__()
        self.movie_model: tf.keras.Model = movie_model
        self.user_model: tf.keras.Model = user_model
        self.task: tf.keras.layers.Layer = task
            
    def call(self, features):
        user_embeddings = self.user_model(features["user_id"])
        # And pick out the movie features and pass them into the movie model,
        # getting embeddings back.
        positive_movie_embeddings = self.movie_model(features["movie_title"])
        return tf.keras.layers.Dot(axes=1)([user_embeddings, positive_movie_embeddings])

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # We pick out the user features and pass them into the user model.
        user_embeddings = self.user_model(features["user_id"])
        # And pick out the movie features and pass them into the movie model,
        # getting embeddings back.
        positive_movie_embeddings = self.movie_model(features["movie_title"])

        # The task computes the loss and the metrics.
        return self.task(user_embeddings, positive_movie_embeddings, compute_metrics=(not training))

with strategy.scope():
    embedding_dimension = 32
    user_model = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.StringLookup(
          vocabulary=unique_user_ids, mask_token=None),
      # We add an additional embedding to account for unknown tokens.
      tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
    ])
    movie_model = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.StringLookup(
          vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
    ])
    metrics = tfrs.metrics.FactorizedTopK(
      candidates=movies.batch(128).map(movie_model)
    )
    task = tfrs.tasks.Retrieval(
      metrics=metrics
    )
    model = MovielensModel(user_model, movie_model)
    model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
    
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)

It raises Exception as follows:

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 1/3
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Efficient allreduce is not supported for 2 IndexedSlices
WARNING:tensorflow:Efficient allreduce is not supported for 2 IndexedSlices
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['counter:0'] when minimizing the loss.
WARNING:tensorflow:Efficient allreduce is not supported for 2 IndexedSlices
WARNING:tensorflow:Efficient allreduce is not supported for 2 IndexedSlices
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:GPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3').
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-8-68c551a86e37> in <module>
     52 cached_train = train.shuffle(100_000).batch(8192).cache()
     53 cached_test = test.batch(4096).cache()
---> 54 model.fit(cached_train, epochs=3)

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109 
    110     # Running inside `run_distribute_coordinator` already.

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1096                 batch_size=batch_size):
   1097               callbacks.on_train_batch_begin(step)
-> 1098               tmp_logs = train_function(iterator)
   1099               if data_handler.should_sync:
   1100                 context.async_wait()

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    838         # Lifting succeeded, so variables are initialized and we can run the
    839         # stateless function.
--> 840         return self._stateless_fn(*args, **kwds)
    841     else:
    842       canon_args, canon_kwds = \

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   2827     with self._lock:
   2828       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2829     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2830 
   2831   @property

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1846                            resource_variable_ops.BaseResourceVariable))],
   1847         captured_inputs=self.captured_inputs,
-> 1848         cancellation_manager=cancellation_manager)
   1849 
   1850   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1922       # No tape is watching; skip to running the function.
   1923       return self._build_call_outputs(self._inference_function.call(
-> 1924           ctx, args, cancellation_manager=cancellation_manager))
   1925     forward_backward = self._select_forward_and_backward_functions(
   1926         args,

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
    548               inputs=args,
    549               attrs=attrs,
--> 550               ctx=ctx)
    551         else:
    552           outputs = execute.execute_with_cancellation(

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: Cannot assign a device for operation sequential/embedding/embedding_lookup/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/embedding/embedding_lookup/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1, /job:localhost/replica:0/task:0/device:XLA_GPU:2, /job:localhost/replica:0/task:0/device:XLA_GPU:3, /job:localhost/replica:0/task:0/device:GPU:0, /job:localhost/replica:0/task:0/device:GPU:1, /job:localhost/replica:0/task:0/device:GPU:2, /job:localhost/replica:0/task:0/device:GPU:3]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU XLA_CPU XLA_GPU 
Cast: GPU CPU XLA_CPU XLA_GPU 
Const: GPU CPU XLA_CPU XLA_GPU 
ResourceSparseApplyAdagradV2: CPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_embedding_embedding_lookup_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_1_update_0_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/embedding/embedding_lookup/ReadVariableOp (ReadVariableOp) 
  sequential/embedding/embedding_lookup/axis (Const) 
  sequential/embedding/embedding_lookup (GatherV2) 
  gradient_tape/sequential/embedding/embedding_lookup/Shape (Const) 
  gradient_tape/sequential/embedding/embedding_lookup/Cast (Cast) 
  Adagrad/Adagrad/update_1/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node sequential/embedding/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_3642]

In addition, I have only been using Keras for a short time. When I try to print the structure of the model, that also fails. Can you provide any ideas on how to solve it?

model.summary()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-5f15418b3570> in <module>
----> 1 model.summary()

~/Softwares/anaconda3/envs/tf2.0/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in summary(self, line_length, positions, print_fn)
   2349     """
   2350     if not self.built:
-> 2351       raise ValueError('This model has not yet been built. '
   2352                        'Build the model first by calling `build()` or calling '
   2353                        '`fit()` with some data, or specify '

ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.
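
On the model.summary() part specifically: subclassed Keras models only know their variable shapes after being built, so running one batch through the model first (a sketch, assuming the call() defined above) should let summary() work:

for features in cached_train.take(1):
    model(features)  # builds the sub-models' variables
model.summary()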

Save model in https://github.com/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb

Is it possible to save the entire model (not model.user_model) in https://github.com/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb? I tried model.save() and got the error below.

cannot be saved because the input shapes have not been set. Usually, input shapes are automatically determined from calling .fit() or .predict(). To manually set the shapes, call model.build(input_shape).

Thank you!
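
A pattern that sometimes works here (a sketch, not a confirmed answer): instead of saving the tfrs.Model itself, wrap the towers in a BruteForce index, call it once so the input shapes get traced, and save that:

index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index(movies.batch(100).map(model.movie_model), movies)
_ = index(tf.constant(["42"]))  # trace input shapes before saving
tf.saved_model.save(index, "export")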

How to save ranking model as saved model?

I tried the retrieval model in the notebook: https://www.tensorflow.org/recommenders/examples/basic_retrieval
In the notebook, the user_model is saved as a SavedModel as follows:

model.user_model.save(path)

Then I tried to save the ranking model in the notebook: https://www.tensorflow.org/recommenders/examples/basic_ranking
as follows:

model.ranking_model.save(path)

But the error occurs:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-5abadabf4a36> in <module>
----> 1 model.ranking_model.save('tfrs-ranking')

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in save(self, filepath, overwrite, include_optimizer, save_format, signatures, options)
   1977     """
   1978     save.save_model(self, filepath, overwrite, include_optimizer, save_format,
-> 1979                     signatures, options)
   1980 
   1981   def save_weights(self,

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py in save_model(model, filepath, overwrite, include_optimizer, save_format, signatures, options)
    132   else:
    133     saved_model_save.save(model, filepath, overwrite, include_optimizer,
--> 134                           signatures, options)
    135 
    136 

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/save.py in save(model, filepath, overwrite, include_optimizer, signatures, options)
     78     # we use the default replica context here.
     79     with distribution_strategy_context._get_default_replica_context():  # pylint: disable=protected-access
---> 80       save_lib.save(model, filepath, signatures, options)
     81 
     82   if not include_optimizer:

~/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py in save(obj, export_dir, signatures, options)
    974 
    975   _, exported_graph, object_saver, asset_info = _build_meta_graph(
--> 976       obj, export_dir, signatures, options, meta_graph_def)
    977   saved_model.saved_model_schema_version = constants.SAVED_MODEL_SCHEMA_VERSION
    978 

~/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py in _build_meta_graph(obj, export_dir, signatures, options, meta_graph_def)
   1045   if signatures is None:
   1046     signatures = signature_serialization.find_function_to_export(
-> 1047         checkpoint_graph_view)
   1048 
   1049   signatures, wrapped_functions = (

~/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_serialization.py in find_function_to_export(saveable_view)
     73   # If the user did not specify signatures, check the root object for a function
     74   # that can be made into a signature.
---> 75   functions = saveable_view.list_functions(saveable_view.root)
     76   signature = functions.get(DEFAULT_SIGNATURE_ATTR, None)
     77   if signature is not None:

~/.local/lib/python3.6/site-packages/tensorflow/python/saved_model/save.py in list_functions(self, obj, extra_functions)
    143     if obj_functions is None:
    144       obj_functions = obj._list_functions_for_serialization(  # pylint: disable=protected-access
--> 145           self._serialization_cache)
    146       self._functions[obj] = obj_functions
    147     if extra_functions:

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py in _list_functions_for_serialization(self, serialization_cache)
   2588     self.predict_function = None
   2589     functions = super(
-> 2590         Model, self)._list_functions_for_serialization(serialization_cache)
   2591     self.train_function = train_function
   2592     self.test_function = test_function

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py in _list_functions_for_serialization(self, serialization_cache)
   3017   def _list_functions_for_serialization(self, serialization_cache):
   3018     return (self._trackable_saved_model_saver
-> 3019             .list_functions_for_serialization(serialization_cache))
   3020 
   3021   def __getstate__(self):

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/base_serialization.py in list_functions_for_serialization(self, serialization_cache)
     85         `ConcreteFunction`.
     86     """
---> 87     fns = self.functions_to_serialize(serialization_cache)
     88 
     89     # The parent AutoTrackable class saves all user-defined tf.functions, and

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py in functions_to_serialize(self, serialization_cache)
     77   def functions_to_serialize(self, serialization_cache):
     78     return (self._get_serialized_attributes(
---> 79         serialization_cache).functions_to_serialize)
     80 
     81   def _get_serialized_attributes(self, serialization_cache):

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py in _get_serialized_attributes(self, serialization_cache)
     93 
     94     object_dict, function_dict = self._get_serialized_attributes_internal(
---> 95         serialization_cache)
     96 
     97     serialized_attr.set_and_validate_objects(object_dict)

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/model_serialization.py in _get_serialized_attributes_internal(self, serialization_cache)
     49     # cache (i.e. this is the root level object).
     50     if len(serialization_cache[constants.KERAS_CACHE_KEY]) == 1:
---> 51       default_signature = save_impl.default_save_signature(self.obj)
     52 
     53     # Other than the default signature function, all other attributes match with

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saved_model/save_impl.py in default_save_signature(layer)
    203   original_losses = _reset_layer_losses(layer)
    204   fn = saving_utils.trace_model_call(layer)
--> 205   fn.get_concrete_function()
    206   _restore_layer_losses(original_losses)
    207   return fn

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py in get_concrete_function(self, *args, **kwargs)
   1165       ValueError: if this object has not yet been called on concrete values.
   1166     """
-> 1167     concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
   1168     concrete._garbage_collector.release()  # pylint: disable=protected-access
   1169     return concrete

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py in _get_concrete_function_garbage_collected(self, *args, **kwargs)
   1071       if self._stateful_fn is None:
   1072         initializers = []
-> 1073         self._initialize(args, kwargs, add_initializers_to=initializers)
   1074         self._initialize_uninitialized_variables(initializers)
   1075 

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    695     self._concrete_stateful_fn = (
    696         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 697             *args, **kwds))
    698 
    699     def invalid_creator_scope(*unused_args, **unused_kwds):

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2853       args, kwargs = None, None
   2854     with self._lock:
-> 2855       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2856     return graph_function
   2857 

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3211 
   3212       self._function_cache.missed.add(call_context_key)
-> 3213       graph_function = self._create_graph_function(args, kwargs)
   3214       self._function_cache.primary[cache_key] = graph_function
   3215       return graph_function, args, kwargs

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3073             arg_names=arg_names,
   3074             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3075             capture_by_value=self._capture_by_value),
   3076         self._function_attributes,
   3077         function_spec=self.function_spec,

~/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    984         _, original_func = tf_decorator.unwrap(python_func)
    985 
--> 986       func_outputs = python_func(*func_args, **func_kwargs)
    987 
    988       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    598         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    599         # the function a weak reference to itself to avoid a reference cycle.
--> 600         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    601     weak_wrapped_fn = weakref.ref(wrapped_fn)
    602 

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/saving_utils.py in _wrapped_model(*args)
    132     with base_layer_utils.call_context().enter(
    133         model, inputs=inputs, build_graph=False, training=False, saving=True):
--> 134       outputs = model(inputs, training=False)
    135 
    136     # Outputs always has to be a flat dict.

~/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    983 
    984         with ops.enable_auto_cast_variables(self._compute_dtype_object):
--> 985           outputs = call_fn(inputs, *args, **kwargs)
    986 
    987         if self._activity_regularizer:

~/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    300   def wrapper(*args, **kwargs):
    301     with ag_ctx.ControlStatusCtx(status=ag_ctx.Status.DISABLED):
--> 302       return func(*args, **kwargs)
    303 
    304   if inspect.isfunction(func) or inspect.ismethod(func):

TypeError: call() missing 1 required positional argument: 'movie_title'

How can I save a ranking SavedModel that can be used in TF Serving?
Thanks in advance!
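
One workaround to sketch (an assumption based on the error above, which suggests call() takes user_id and movie_title as separate arguments): export through a tf.function with an explicit serving signature:

import tensorflow as tf

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.string, name="user_id"),
    tf.TensorSpec(shape=[None], dtype=tf.string, name="movie_title"),
])
def serve(user_id, movie_title):
    # Assumption: ranking_model.call(user_id, movie_title), inferred from the error.
    return model.ranking_model(user_id, movie_title)

tf.saved_model.save(model.ranking_model, "tfrs-ranking",
                    signatures={"serving_default": serve})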

validation metrics (like 'val_factorized_top_k/top_100_categorical_accuracy') do not appear in the history dictionary

Even running the following notebook produces errors:
https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/deep_recommenders.ipynb

KeyError Traceback (most recent call last)
in ()
11 verbose=0)
12
---> 13 accuracy = one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
14 print(f"Top-100 accuracy: {accuracy:.2f}.")

KeyError: 'val_factorized_top_k/top_100_categorical_accuracy'

FactorizedTopK candidates need to be unique?

I was just wondering whether the candidates= value in tfrs.metrics.FactorizedTopK, used as a metric for tfrs.tasks.Retrieval, has to contain unique movies. That is, can the dataset not have duplicates?

How to deal with multiple retrieval tasks?

In the multi-task tutorial, it is said that "We can expect better results when we can transfer knowledge from a data-abundant task (such as clicks) to a closely related data-sparse task (such as purchases)". So I assume that there will be two retrieval tasks on different types of data.

For example, if I have transaction data with columns (type, user, item):
click, user A, item X
click, user A, item Y
click, user A, item Z
purchase, user A, item Z
click, user B, item X
click, user B, item Y
purchase, user B, item X
etc

How do I separate these two tasks (view and purchase)?

And how do I set the weights for each task?
for example, final_loss = 1 * purchase_retrieval_loss + 0.1 * view_retrieval_loss
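
A sketch of one way to structure this (feature names "type", "user", and "item" are assumptions; user_model and item_model are embedding towers defined elsewhere): use two Retrieval tasks, route examples to each via per-example sample weights, and combine the losses with the weights you describe:

import tensorflow as tf
import tensorflow_recommenders as tfrs

class TwoTaskModel(tfrs.models.Model):

    def __init__(self, user_model, item_model,
                 purchase_weight=1.0, view_weight=0.1):
        super().__init__()
        self.user_model = user_model
        self.item_model = item_model
        self.purchase_weight = purchase_weight
        self.view_weight = view_weight
        self.purchase_task = tfrs.tasks.Retrieval()
        self.view_task = tfrs.tasks.Retrieval()

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user"])
        item_embeddings = self.item_model(features["item"])

        # Route each example to its task via sample weights.
        is_purchase = tf.cast(tf.equal(features["type"], "purchase"), tf.float32)

        purchase_loss = self.purchase_task(
            user_embeddings, item_embeddings, sample_weight=is_purchase)
        view_loss = self.view_task(
            user_embeddings, item_embeddings, sample_weight=1.0 - is_purchase)

        return (self.purchase_weight * purchase_loss
                + self.view_weight * view_loss)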
