
recnn's Introduction


This is my school project. It focuses on Reinforcement Learning for personalized news recommendation. The main distinction is that it tries to solve online off-policy learning with dynamically generated item embeddings. I want to create a library of SOTA reinforcement-learning algorithms for recommendation, providing whatever level of abstraction you prefer.

recnn.readthedocs.io

📊 The features can be summed up as follows:

  • Abstraction on your terms: you can import an entire algorithm (say DDPG) and just call ddpg.learn(batch), import the networks and the learning function separately, create a custom loader for your task, or define everything yourself (see the sketch after this list).

  • Examples contain no junk code or workarounds: pure model definition and the algorithm itself in one file. I wrote a couple of articles explaining how it all works.

  • Learning is built around a sequential or frame environment that supports ML20M and the like. Seq and Frame determine how sequence length is handled: Seq is fully sequential with dynamic size (WIP), while Frame is just a static, fixed-size frame.

  • State representation module with various methods; for sequential state representation you can use an LSTM/RNN/GRU (WIP).

  • Parallel data loading with Modin (Dask / Ray) and caching

  • PyTorch 1.7 support with TensorBoard visualization.

  • New datasets will be added in the future.
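For a feel of the two ends of that abstraction spectrum, here is a rough usage sketch stitched together from the example and issue snippets further down this page; treat the exact constructor arguments and method names as assumptions, since they vary between versions:

# High-level sketch: environment + ready-made DDPG, updated batch by batch
# (constructor arguments as used in the snippets below).
import recnn

env = recnn.data.env.FrameEnv('ml20_pca128.pkl', 'ml-20m/ratings.csv')

value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)
ddpg = recnn.nn.DDPG(policy_net, value_net)

for batch in env.train_dataloader:
    loss = ddpg.update(batch, learn=True)
    ddpg.step()

# The "low-level" end would instead import the networks and a learning/update function
# (e.g. the ddpg_update seen in recnn/nn/update/ddpg.py in the tracebacks below) and
# wire them to a custom loader.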

📚 Medium Articles

The repo consists of two parts: the library (./recnn), and the playground (./examples) where I explain how to work with certain things.

  • Pretty much what you need to get started with this library if you know recommenders but don't know much about reinforcement learning:

  • Top-K Off-Policy Correction for a REINFORCE Recommender System:

Algorithms that are/will be added

| Algorithm | Paper | Code |
| --- | --- | --- |
| Deep Q Learning (PoC) | https://arxiv.org/abs/1312.5602 | examples/0. Embeddings/ 1.DQN |
| Deep Deterministic Policy Gradients | https://arxiv.org/abs/1509.02971 | examples/1.Vanilla RL/DDPG |
| Twin Delayed DDPG (TD3) | https://arxiv.org/abs/1802.09477 | examples/1.Vanilla RL/TD3 |
| Soft Actor-Critic | https://arxiv.org/abs/1801.01290 | examples/1.Vanilla RL/SAC |
| Batch Constrained Q-Learning | https://arxiv.org/abs/1812.02900 | examples/99.To be released/BCQ |
| REINFORCE Top-K Off-Policy Correction | https://arxiv.org/abs/1812.02353 | examples/2. REINFORCE TopK |

‍Repos I used code from

🤔 What is this

This is my school project. It focuses on Reinforcement Learning for personalized news recommendation. The main distinction is that it tries to solve online off-policy learning with dynamically generated item embeddings. Also, there is no exploration, since we are working with a static dataset. In the examples, I use Google's BERT on the ML20M dataset to extract contextual information from the movie descriptions and form the latent vector representations. Later, you can apply the same transformation to new, previously unseen items (hence, the embeddings are dynamically generated). If you don't want to bother with the embeddings pipeline, I have a DQN embeddings generator as a proof of concept.
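As a loose illustration of that idea (not the exact pipeline from the examples), one could embed item descriptions with a pretrained BERT from the Hugging Face transformers library and reduce them to the 128 dimensions used throughout the examples; the model name, mean pooling, and PCA step here are assumptions:

import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed_descriptions(texts):
    # Mean-pool BERT's last hidden state into one 768-dim vector per description.
    with torch.no_grad():
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        return bert(**enc).last_hidden_state.mean(dim=1).numpy()

descriptions = ["A space opera about a farm boy and a princess.",
                "A quiet drama about a family reunion."]
vectors = embed_descriptions(descriptions)  # shape: [2, 768]

# PCA down to 128-dim item embeddings like those in ml20_pca128.pkl. Fit it on the full
# catalogue of descriptions (it needs at least 128 items), so this line is illustrative only:
# item_embeddings = PCA(n_components=128).fit_transform(vectors)
# New, previously unseen items then go through the same tokenizer -> BERT -> PCA transform.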

✋ Getting Started

P.S. The image is clickable; here is a direct link:

To learn more about recnn, read the docs: recnn.readthedocs.io

⚙️ Installing

pip install git+git://github.com/awarebayes/RecNN.git

PyPI is on its way...

🚀 Try demo

I built a Streamlit demo to showcase the features. It has a 'recommend me a movie' feature! Note how the score changes as you rate the movies: when you start and the movies aren't rated yet (5/10 by default), the score is around 40 (Euclidean distance), but as you rate them it drops below 10, indicating more personalized and precise predictions. You can also test diversity, check out the correlation of recommendations, pairwise distances, and pinpoint accuracy.
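The score here is a Euclidean distance between the actor's output and item embeddings. Below is a minimal sketch of that kind of ranking, modeled on the rank(...) call in the "Show the user recommendation" issue further down; the movie_embeddings dict is an assumed input, not part of the library:

import numpy as np
from scipy.spatial import distance

def rank(recommendation, metric, movie_embeddings, top_k=10):
    # movie_embeddings: {movie_id: 128-dim np.ndarray}, e.g. loaded from the embeddings pickle.
    scored = [(movie_id, metric(recommendation, emb)) for movie_id, emb in movie_embeddings.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = closer to the "ideal" item
    return scored[:top_k]

# Toy usage with random vectors:
movie_embeddings = {i: np.random.randn(128) for i in range(1000)}
recommendation = np.random.randn(128)  # in practice: policy_net(state)[k].detach().cpu().numpy()
print(rank(recommendation, distance.euclidean, movie_embeddings, top_k=5))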

Run it:

git clone git@github.com:awarebayes/RecNN.git
cd RecNN && streamlit run examples/streamlit_demo.py

A Docker image is available here.

📁 Downloads

📄 Citing

If you find RecNN useful for an academic publication, then please use the following BibTeX to cite it:

@misc{RecNN,
  author = {M Scherbina},
  title = {RecNN: RL Recommendation with PyTorch},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/awarebayes/RecNN}},
}

recnn's People

Contributors

awarebayes

recnn's Issues

Question about the weight for correction in the importance sampling

Hello. Thanks for your great work, from which I learned a lot about reinforcement learning. I am confused about the computation of the correction weight in the importance sampling.

According to the paper "Top-K Off-Policy Correction for a REINFORCE Recommender System", the correction weight is $\frac{\pi_\theta(a_t \mid s_t)}{\beta(a_t \mid s_t)}$, in which, I think, $a_t$ is an action sampled from the behavior policy, i.e. $a_t \sim \beta(\cdot \mid s_t)$, and thus the reward of a sequence is corrected by dividing the likelihood of action $a_t$ given $s_t$ under the updated policy $\pi_\theta$ by the likelihood of the same action given $s_t$ under $\beta$. Note that the same $(s_t, a_t)$ pair is input into both $\pi_\theta$ and $\beta$. However, in your implementation, I found that the actions are not the same for $\pi_\theta$ and $\beta$; see the function "pi_beta_sample" here.

Am I wrong about this?

Thanks!
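For reference, a minimal sketch of how I read the correction weight in the paper, with toy stand-in distributions (this is not the repo's pi_beta_sample):

import torch
from torch.distributions import Categorical

# Toy stand-ins for pi_theta(.|s_t) and beta(.|s_t) over a 3-item catalogue (assumption).
pi_dist   = Categorical(probs=torch.tensor([0.7, 0.2, 0.1]))
beta_dist = Categorical(probs=torch.tensor([0.3, 0.4, 0.3]))

logged_action = torch.tensor(0)  # the action a_t actually taken in the logged data

# The point in question: the SAME logged action is evaluated under both policies.
correction = torch.exp(pi_dist.log_prob(logged_action) - beta_dist.log_prob(logged_action))
print(correction)  # pi_theta(a_t|s_t) / beta(a_t|s_t) = 0.7 / 0.3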

Getting error when trying to run the given sample. AttributeError: 'Series' object has no attribute 'progress_apply'

Hi,

I can't test the sample on my machine. This line
env = recnn.data.env.FrameEnv('ml20_pca128.pkl','ml-20m/ratings.csv')
Gives the following error stack:

Traceback (most recent call last):
  File "./main.py", line 4, in <module>
    env = recnn.data.env.FrameEnv('ml20_pca128.pkl','ml-20m/ratings.csv')
  File "./venv/lib64/python3.8/site-packages/recnn/data/env.py", line 143, in __init__
    super(FrameEnv, self).__init__(embeddings, ratings, min_seq_size=frame_size+1, *args, **kwargs)
  File "./venv/lib64/python3.8/site-packages/recnn/data/env.py", line 109, in __init__
    self.prepare_dataset(df=self.ratings, key_to_id=self.key_to_id,
  File "./venv/lib64/python3.8/site-packages/recnn/data/dataset_functions.py", line 52, in prepare_dataset
    df['rating'] = df['rating'].progress_apply(lambda i: 2 * (i - 2.5))
  File "./venv/lib64/python3.8/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'progress_apply'

Any help would be appreciated.
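For context, Series.progress_apply only exists after tqdm's pandas integration has been registered; a likely workaround sketch (an assumption, not a confirmed fix for this exact recnn version) is to register it before building the environment:

from tqdm.auto import tqdm
import recnn

tqdm.pandas()  # registers Series.progress_apply / DataFrame.progress_apply

env = recnn.data.env.FrameEnv('ml20_pca128.pkl', 'ml-20m/ratings.csv')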

How to make a recommendation for a specific user?

As the title says!
I have gone through the docs, but I don't know how to do that, or whether it can even be done.
I only jumped into reinforcement-learning-based recommendations yesterday, so this may be a silly question. I thought it would work the same as other recommender system algorithms, but I got confused in the Recommending part with the Actor and Critic.
Thanks!

arXiv abstracts recommender based on specific user preferences

Hi @awarebayes ,

Hope you are all well !

I was wondering if RecNN can be used for recommending papers from the full arXiv dataset (1.7M abstracts).

More precisely, I would like to use the categories or authors attributes for setting preferences for recommendation.

To download:

wget -nc https://paper2code.com/public/arxiv-metadata-oai-weaviate.tar.gz
tar xvf arxiv-metadata-oai-weaviate.json.tar.gz

Excerpt:

{
  "authors": [
    "Maxim A. Yurkin",
    "Valeri P. Maltsev",
    "Alfons G. Hoekstra"
  ],
  "abstract": "We performed a rigorous theoretical convergence analysis of the discrete dipole approximation (DDA). We prove that errors in any measured quantity are bounded by a sum of a linear and quadratic term in the size of a dipole d, when the latter is in the range of DDA applicability. Moreover, the linear term is significantly smaller for cubically than for non-cubically shaped scatterers. Therefore, for small d errors for cubically shaped particles are much smaller than for non-cubically shaped. The relative importance of the linear term decreases with increasing size, hence convergence of DDA for large enough scatterers is quadratic in the common range of d. Extensive numerical simulations were carried out for a wide range of d. Finally we discuss a number of new developments in DDA and their consequences for convergence.",
  "categories": [
    "Optics",
    "Computational Physics"
  ],
  "comments": "23 pages, 5 figures",
  "doi": "10.1364/JOSAA.23.002578",
  "id": "0704.0033",
  "journal-ref": "J.Opt.Soc.Am.A 23(10): 2578-2591 (2006)",
  "report-no": "",
  "submitter": "Maxim A. Yurkin",
  "title": "Convergence of the discrete dipole approximation. I. Theoretical  analysis",
  "versions": [
    "v1"
  ]
}

Questions:

  • Is it possible to create such a recommender with RecNN?
  • Do you provide a RESTful API with RecNN? Dockerized? ^^ :-)

Thanks for any insights or inputs on that.

Cheers,
X

Hi, I am getting RuntimeError: Could not infer dtype of numpy.int64 in DDPG. I am not able to fix this. How do I solve it? Thanks

1 for epoch in range(n_epochs):
----> 2 for batch in tqdm(env.train_dataloader):
3 loss = ddpg_update(batch, params, step=step)
4 plotter.log_losses(loss)
5 step += 1

~\Anaconda3\lib\site-packages\tqdm\_tqdm_notebook.py in __iter__(self, *args, **kwargs)
221 def __iter__(self, *args, **kwargs):
222 try:
--> 223 for obj in super(tqdm_notebook, self).__iter__(*args, **kwargs):
224 # return super(tqdm...) will not catch exception
225 yield obj

~\Anaconda3\lib\site-packages\tqdm\_tqdm.py in __iter__(self)
1003 """), fp_write=getattr(self.fp, 'write', sys.stderr.write))
1004
-> 1005 for obj in iterable:
1006 yield obj
1007 # Update and possibly print the progressbar.

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
334 self.reorder_dict[idx] = batch
335 continue
--> 336 return self._process_next_batch(batch)
337
338 next = __next__ # Python 2 compatibility

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _process_next_batch(self, batch)
355 self._put_indices()
356 if isinstance(batch, ExceptionWrapper):
--> 357 raise batch.exc_type(batch.exc_msg)
358 return batch
359

RuntimeError: Traceback (most recent call last):
File "C:\Users\Lenovo\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "../..\recnn\data\env.py", line 246, in prepare_batch_wrapper
frame_size=self.frame_size,
File "../..\recnn\data\utils.py", line 177, in prepare_batch_static_size
users_t = torch.tensor(users_t)
RuntimeError: Could not infer dtype of numpy.int64

Interpreting the recommendation results

Hi, I was trying to understand the recommendation results, but I could not come up with an explanation.

value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)
recommendation = policy_net(state)
value = value_net(state, recommendation)
print(recommendation)
print(value)
tensor([[ 3.7613, -3.4141,  3.0177,  ..., -4.2616, -3.1591, -0.4270],
        [ 3.7313, -3.1471,  5.0571,  ..., -2.9717, -1.8467, -4.7726],
        [ 1.9474,  1.0796,  6.5727,  ..., -6.5865,  1.0699, -5.3639],
        ...,
        [ 2.6672, -4.2484, -0.3397,  ..., -4.4179, -1.1133, -2.9916],
        [ 5.5805, -4.7790, -4.0367,  ..., -4.2819, -0.4999, -3.0007],
        [-2.6792,  0.4703,  5.4456,  ..., -2.2477,  4.8281, -0.8356]],
       grad_fn=<AddmmBackward>)
tensor([[-0.7228],
        [ 0.6361],
        [-4.5583],
        ...,
        [-3.4151],
        [-2.1304],
        [-6.2599]], grad_fn=<AddmmBackward>)
>>>print(recommendation.shape)
>>>print(value.shape)
torch.Size([1785, 128])
torch.Size([1785, 1])

My question is: what does 1785 represent? Does it represent 1785 movie vectors, and does value represent their relevance scores? And how can I get the recommended movie id or movie name from these two variables?
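One way to read those shapes, assuming the frame_size = 10 and embed_dim = 128 values used elsewhere in the examples (an interpretation, not an authoritative answer):

# state:          [1785, 1290]  -> 1785 user states; 1290 = 10 frames * 128-dim embeddings + 10 ratings
# recommendation: [1785, 128]   -> one 128-dim "ideal item" vector per state, not 1785 concrete movies
# value:          [1785, 1]     -> the critic's scalar score for each (state, recommendation) pair
#
# To turn an action vector into a movie id, it is usually matched against the item embeddings
# with a distance metric (see the rank(...) snippet in the "Show the user recommendation" issue below).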

Pandas Performance suggestion

Hello, I am working with your awesome paper and awesome project, and I think I can improve performance when working with the ratings data in pandas.
According to https://realpython.com/fast-flexible-pandas/

The code with long loops and dict management can be sped up by replacing:

ratings = pd.read_csv('../data/ml-20m/ratings.csv')
users = np.array(list(set(ratings['userId'])))
ratings_by_user = dict([(i, pd.DataFrame()) for i in users])
for u in tqdm(list(users)):
        ratings_by_user[u] = ratings.loc[ratings['userId'] == u].sort_values(by='timestamp').reset_index(drop=True)

import pickle

ratings_by_user = pickle.load(open('../data/rbu.pkl', 'rb'))

for user in tqdm(ratings_by_user.keys()):
    ratings_by_user[user] = ratings_by_user[user].drop(['timestamp', 'userId'], axis=1)
    ratings_by_user[user]['rating'] = ratings_by_user[user]['rating'].apply(lambda i: 2*(i-2.5))

to_delete = []
for user in tqdm(list(ratings_by_user.keys())):
    ratings_by_user[user]['rating'] = ratings_by_user[user]['rating'].apply(lambda i: 2*(i-2.5))
    ratings_by_user[user] = ratings_by_user[user][ratings_by_user[user]['rating'] >= 0]
    ratings_by_user[user] = ratings_by_user[user].values
    # specify your frame size here! 1 end choice + # of movies to be fed into the model
    if len(ratings_by_user[user]) < 11:
        to_delete.append(user)

for u in to_delete:
    del ratings_by_user[u]



class ML20mDataset(Dataset):
    def __init__(self):
        self.set_dataset(1)
    
    def set_dataset(self, u):
        self.user = u
        self.dataset = ratings[u]
        
    def __len__(self):
        return max(len(self.dataset) - frame_size, 0)
    
    def __getitem__(self, idx):
        ratings = self.dataset[idx:frame_size+idx+1]
        movie_chosen = ratings[:, 0][-1]
        films_watched = ratings[:, 0][:-1]
        
        films_lookup = torch.stack([movies[id_to_index[i]] for i in ratings[:, 0]])
        
        state = films_lookup[:-1].to(cuda).float()
        next_state = films_lookup[1:].to(cuda).float()
        
        rewards = torch.tensor(ratings[:, 1][:frame_size]).to(cuda).float()
        next_rewards = torch.tensor(ratings[:, 1][1:frame_size+1]).to(cuda).float()
        
        action = films_lookup[-1].to(cuda)
        
        reward = torch.tensor(ratings[:, 1][-1].tolist()).to(cuda).float()
        done = torch.tensor(idx == self.__len__() - 1).to(cuda).float()
        
        state = (state, rewards)
        next_state = (next_state, next_rewards)
        
        return state, action, reward, next_state, done

with vectorized processing of the pandas DataFrame, which still yields the corresponding result:

        print("Start update ratings %s" % (datetime.datetime.now(),))
        train_df = train_df.copy()
        train_df["rating"] = train_df['rating'].apply(lambda i: 2 * (i - 2.5))
        train_df = train_df[train_df["rating"] >= 0]
        users = train_df[["user","item"]].groupby(["user"]).size()
        users = users[users >= self.frame_size + 1]
        train_df = train_df[train_df["user"].isin(users.index)]
        user_rated = train_df.sort_values(by=["user", 'timestamp']).drop(columns=["timestamp"]).set_index("user")

        self.user_rated = user_rated

        print("Done update ratings %s" % (datetime.datetime.now(),))

        batch_bar = tqdm(total=self.get_total_user())

        for user, df in self.user_rated.groupby(level=0):
            batch_bar.update(1)
            size = max(len(df) - self.frame_size, 0)
            for idx in range(0, size):
                if np.random.rand() > 0.2:  # intake percents
                    continue
                ratings = df[idx:self.frame_size + idx + 1]
                ratings = ratings[["item", "rating"]].values

                movie_chosen = ratings[:, 0][-1]
                films_watched = ratings[:, 0][:-1]
                ...

Above process took
Start update ratings 2019-04-09 16:44:01.957343
Done update ratings 2019-04-09 16:44:24.071254

Instead of manually looping over each element to get its index

id_to_index = dict([(u, i) for i, u in enumerate(pd.read_csv('../data/ml-20m/movies.csv')['movieId'].values)])

You can use

movies_series = pd.read_csv('../data/ml-20m/movies.csv')["movieId"].reset_index(drop=True)
movies_series = pd.Series(movies_series.index.values, index=movies_series)
id_to_index = movies_series.to_dict()

This is my first issue on GitHub, and I want to contribute to your awesome project.

Questions about DDPG

Hi! I'm new to RL and currently doing a project on a music recommender system using DDPG. It's kind of similar to your DDPG project, and there are some things I'm still confused about.
If you don't mind, please answer my questions.

  1. How much user history data did you use for your movie recommendation?
  2. Do epochs matter in RL, especially DDPG? (Sorry if it sounds stupid, but I saw some tutorials and got really confused: some tutorials use randomized data for the environment's initial state, so I assume they don't really care about epochs. But almost all RL environments, like those in OpenAI Gym, use one initial state and train for thousands of episodes; for example, in continuous mountain car the episode always starts at the same position.)
  3. If I have an environment with 15 steps per episode, is it fine to use a discount factor of 0.97?

Sorry if this isn't really related to your GitHub :( But I hope you can help me, because I have no expert to consult.
Thank you so much.

AssertionError: Torch not compiled with CUDA enabled in the DDPG.ipynb notebook?

for batch in tqdm(env.train_dataloader):
---> 38 loss = ddpg_update(batch, params, step=step)
39 plotter.log_losses(loss)
40 step += 1

in ddpg_update(batch, params, learn, step, pin_memory)
70 def ddpg_update(batch, params, learn=True, step=-1,pin_memory=False):
71
---> 72 state, action, reward, next_state, done = recnn.data.get_base_batch(batch)
73
74 # --------------------------------------------------------#

~/recnn/data/utils.py in get_base_batch(batch, device, done)
219 else:
220 batch.append(torch.zeros_like(batch['reward']))
--> 221 return [i.to(device) for i in b]
222
223

~/recnn/data/utils.py in <listcomp>(.0)
219 else:
220 batch.append(torch.zeros_like(batch['reward']))
--> 221 return [i.to(device) for i in b]
222
223

~/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
160 raise RuntimeError(
161 "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 162 _check_driver()
163 torch._C._cuda_init()
164 _cudart = _load_cudart()

~/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py in _check_driver()
73 def _check_driver():
74 if not hasattr(torch._C, '_cuda_isDriverSufficient'):
---> 75 raise AssertionError("Torch not compiled with CUDA enabled")
76 if not torch._C._cuda_isDriverSufficient():
77 if torch._C._cuda_getDriverVersion() == 0:

AssertionError: Torch not compiled with CUDA enabled
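Judging from the traceback, get_base_batch takes a device argument, so a possible workaround sketch (an assumption, not a verified fix) is to keep both the networks and the batches on the CPU when Torch has no CUDA support:

import torch

cpu = torch.device('cpu')

# Keep the networks on the CPU, as in the custom-data issue further down:
# ddpg = recnn.nn.DDPG(policy_net, value_net).to(cpu)

# And pass the CPU device when unpacking each batch inside the update function:
# state, action, reward, next_state, done = recnn.data.get_base_batch(batch, device=cpu)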

confused about train

In the existing literature, agents explore and a model is then learned from the interaction data. But I found that in your code, you train directly on data from the Dataset. I was wondering whether I misunderstood your code or...

Why DDPG always takes the same action

The project you did is so great!
But when I use DDPG (not yours) to train, the policy collapses to some fixed value. I want to know how you solved this problem. I will be grateful if you can help me.

ModuleNotFoundError: No module named 'tqdm.auto'

I run this code:

import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import pickle
import gc
import json
import h5py

from IPython.display import clear_output
import matplotlib.pyplot as plt
%matplotlib inline


# == recnn ==
import sys
sys.path.append("../../")
import recnn
import torch

device = torch.device('cuda')
# ---
frame_size = 10
batch_size = 10
embed_dim  = 128
# --- 

tqdm.pandas()

Result:

ModuleNotFoundError Traceback (most recent call last)
in
1 import numpy as np
2 import pandas as pd
----> 3 from tqdm.auto import tqdm
4 import pickle
5 import gc

ModuleNotFoundError: No module named 'tqdm.auto'

Getting size mismatch error when trying to train on custom data

Hi,
I've followed the guide for using your own dataset (this guide), but I get a bizarre error about a shape mismatch.

RuntimeError                              Traceback (most recent call last)
<ipython-input-6-fc16f914cab3> in <module>
      6 n_epochs = 2
      7 
----> 8 learn()

<ipython-input-4-9ce1ba07689f> in learn()
      9     for epoch in range(n_epochs):
     10         for batch in env.train_dataloader:
---> 11             loss = ddpg.update(batch, learn=True)
     12             plotter.log_losses(loss)
     13             ddpg.step()

./venv/lib64/python3.8/site-packages/recnn/nn/algo.py in update(self, batch, learn)
     46 
     47     def update(self, batch, learn=True):
---> 48         return self.algorithm(batch, self.params, self.nets, self.optimizers,
     49                               device=self.device, debug=self.debug, writer=self.writer,
     50                               learn=learn, step=self._step)

./venv/lib64/python3.8/site-packages/recnn/nn/update/ddpg.py in ddpg_update(batch, params, nets, optimizer, device, debug, writer, learn, step)
     55     # Value Learning
     56 
---> 57     value_loss = value_update(batch, params, nets, optimizer,
     58                               writer=writer, device=device,
     59                               debug=debug, learn=learn, step=step)

./venv/lib64/python3.8/site-packages/recnn/nn/update/misc.py in value_update(batch, params, nets, optimizer, device, debug, writer, learn, step)
     20 
     21     with torch.no_grad():
---> 22         next_action = nets['target_policy_net'](next_state)
     23         target_value = nets['target_value_net'](next_state, next_action.detach())
     24         expected_value = temporal_difference(reward, done, params['gamma'], target_value)

./venv/lib64/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

./venv/lib64/python3.8/site-packages/recnn/nn/models.py in forward(self, state, tanh)
     64         :return: action
     65         """
---> 66         action = F.relu(self.linear1(state))
     67         action = self.drop_layer(action)
     68         action = F.relu(self.linear2(action))

./venv/lib64/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

./venv/lib64/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

./venv/lib64/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1608     if input.dim() == 2 and bias is not None:
   1609         # fused op is marginally faster
-> 1610         ret = torch.addmm(bias, input, weight.t())
   1611     else:
   1612         output = input.matmul(weight.t())

RuntimeError: size mismatch, m1: [52039 x 1010], m2: [1290 x 256] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41

This is the network's definition:

value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)

cpu_device = torch.device('cpu')
ddpg = recnn.nn.DDPG(policy_net, value_net)
ddpg = ddpg.to(cpu_device)
plotter = recnn.utils.Plotter(ddpg.loss_layout, [['value', 'policy']],)

And this is the learning function:

# learn function
def learn():
    for epoch in range(n_epochs):
        for batch in env.train_dataloader:
            loss = ddpg.update(batch, learn=True)
            plotter.log_losses(loss)
            ddpg.step()
            if ddpg._step % plot_every == 0:
                clear_output(True)
                print('step', ddpg._step)
                test_loss = run_tests()
                plotter.log_losses(test_loss, test=True)
                plotter.plot_loss()
            if ddpg._step > 100:
                return

As you can see, I did almost nothing differently from the sample code, so I really have no idea why this shape mismatch would happen. My data also seems identical to ML-20M in terms of structure. For the embeddings I used matrix factorization via SVD, and each item is an array of size 100.
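For what it's worth, the numbers in the error line up with the embedding width. A small worked check, assuming the input layout is frame_size item embeddings concatenated with frame_size ratings (my interpretation, not a confirmed spec):

frame_size = 10

# What the example Actor/Critic expect: 10 frames of 128-dim items + 10 ratings.
assert frame_size * 128 + frame_size == 1290   # matches recnn.nn.Actor(1290, ...)

# What 100-dim SVD embeddings actually produce:
assert frame_size * 100 + frame_size == 1010   # matches "m1: [52039 x 1010]" in the traceback

If that reading is right, the networks would need to be constructed for the 1010-wide input (or the embeddings recomputed at 128 dimensions) rather than the 1290 used in the sample code.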

getting error: Expected sequence or array-like, got <class 'NoneType'>

Hello,
I'm using my own dataset,

and I keep getting this error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-153-65892292bdab> in <module>()
     10 )
     11 
---> 12 env = recnn.data.env.FrameEnv(dirs, frame_size, batch_size,prepare_dataset= prepare_my_dataset)

4 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in _num_samples(x)
    189             x = np.asarray(x)
    190         else:
--> 191             raise TypeError(message)
    192 
    193     if hasattr(x, 'shape') and x.shape is not None:

TypeError: Expected sequence or array-like, got <class 'NoneType'>

Every time I try

env = recnn.data.env.FrameEnv(dirs, frame_size, batch_size,prepare_dataset= prepare_my_dataset)

This is my prepare_dataset function:

import numpy as np
import datetime
import random
import time

def string_time_to_unix(s):
  return int(time.mktime(datetime.datetime.strptime(s, "%H:%M:%S").timetuple()))

def prepare_my_dataset(args_mut, kwargs):

  frame_size = kwargs.get('frame_size')
  key_to_id = args_mut.base.key_to_id
  df = args_mut.df

  df['vendor_rating'] = df['vendor_rating'].apply(lambda i: 2 * (i - 2.5))
  df['timestamp'] = df['timestamp'].apply(string_time_to_unix)
  df['vendor_id'] = df['vendor_id'].apply(key_to_id.get)
  
  
  customer = df[['customer_id', 'vendor_id']].groupby(['customer_id']).size()
  customer = customer[customer > frame_size].sort_values(ascending=False).index

  ratings = df.sort_values(by='timestamp').set_index('customer_id').drop('timestamp', axis=1).groupby('customer_id')
  
  cust_dict ={}

  def app(x):
    customer_id = x.index[0]
    cust_dict[int(customer_id)] = {}
    cust_dict[int(customer_id)]['items'] = x['vendor_id'].values
    cust_dict[int(customer_id)]['ratings'] = x['vendor_rating'].values

  ratings.apply(app)

  args_mut.cust_dict = cust_dict
  args_mut.customer = customer



  return args_mut,kwargs

This is what my customer looks like:
Int64Index([199, 62, 72, 71, 70, 69, 68, 67, 66, 65, ... 135, 134, 133, 132, 131, 130, 129, 128, 127, 0], dtype='int64', name='customer_id', length=200)

And this is a small part of what my cust_dict looks like:

{0: {'items': array([221, 225, 237, 250, 259, 265, 271, 274, 288, 289, 294, 295, 298, 299, 300, 304, 356, 386, 391, 398, 401, 419, 459, 537, 547, 573, 575, 216, 577, 207, 201, 90, 92, 104, 105, 106, 110, 113, 115, 134, 145, 148, 149, 154, 157, 159, 160, 161, 176, 180, 188, 189, 191, 192, 193, 195, 197, 199, 203, 86, 85, 83, 84, 4, 13, 20, 23, 28, 33, 43, 44, 55, 66, 67, 75, 76, 78, 79, 81, 82]), 'ratings': array([3.4, 3.4, 4.2, 4. , 3.6, 3.6, 4. , 2.4, 4.2, 4. , 3.8, 4.4, 4.4, 3.4, 3.8, 3. , 3.4, 4. , 3.4, 3.4, 4. , 3.4, 3.4, 3.8, 3.8, 4.2, 3.2, 4.4, 4. , 3.2, 3. , 3.8, 4.2, 4. , 4. , 4. , 4.2, 4.4, 4.6, 4. , 1.4, 3.2, 3.4, 4. , 3.6, 4. , 3.6, 3.4, 3.6, 2.6, 4.2, 3.6, 4. , 3.6, 3.6, 3.4, 3.8, 4. , 3. , 4. , 4.2, 3.4, 3.6, 3.8, 4.4, 4. , 4. , 3.8, 4.2, 3.6, 3.6, 4. , 3. , 3.6, 4.2, 4.2, 3.8, 4.4, 2.6, 3.8])}, 1: {'items': array([115, 134, 145, 148, 149, 154, 157, 159, 160, 161, 176, 180, 188, 189, 191, 192, 193, 195, 197, 199, 201, 203, 207, 216, 221, 225, 237, 113, 250, 110, 105, 4, 13, 20, 23, 28, 33, 43, 44, 55, 66, 67, 75, 76, 78, 79, 81, 82, 83, 84, 85, 86, 90, 92, 104, 106, 259, 265, 271, 274, 288, 289, 294, 295, 298, 299, 300, 304, 356, 386, 391, 398, 401, 459, 537, 547, 573, 575, 577, 419]), 'ratings': array([4.6, 4. , 1.4, 3.2, 3.4, 4. , 3.6, 4. , 3.6, 3.4, 3.6, 2.6, 4.2, 3.6, 4. , 3.6, 3.6, 3.4, 3.8, 4. , 3. , 3. , 3.2, 4.4, 3.4, 3.4, 4.2, 4.4, 4. , 4.2, 4. , 3.8, 4.4, 4. , 4. , 3.8, 4.2, 3.6, 3.6, 4. , 3. , 3.6, 4.2, 4.2, 3.8, 4.4, 2.6, 3.8, 3.4, 3.6, 4.2, 4. , 3.8, 4.2, 4. , 4. , 3.6, 3.6, 4. , 2.4, 4.2, 4. , 3.8, 4.4, 4.4, 3.4, 3.8, 3. , 3.4, 4. , 3.4, 3.4, 4. , 3.4, 3.8, 3.8, 4.2, 3.2, 4. , 3.4])}

can you help me?

Show the user recommendation

I read your work because I want to create a RS using RL. I am interested in how I can print the user id of each recommendation obtained.

from scipy.spatial import distance
import numpy as np
import json
import pandas as pd
import recnn
value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)
meta = json.load(open('omdb.json'))
recommendation = policy_net(state)
x = np.random.randint(0, state.size(0), 1)
recommendation = recommendation[x[0]].detach().cpu().numpy()
rank(recommendation, distance.euclidean)

I guess that x is the index of a user's recommendation, with 1731 recommendations of 128 movies, but to know how good the recommendation is, I want to know the user ID of each recommendation.
Thanks

Questions about Topk REINFORCE

Hello, thanks for sharing!
I have some questions about pi_beta_sample in models.py: you use this function in _select_action_with_TopK_correction, but it seems to sample only one item each time?
I am also confused by Equation 6 in the original paper:
[equation image: Eq. 6 from the paper]
As we want to sample a set of top-k items, shouldn't it be
[equation image: the proposed per-item variant]? Here a_{t,i} represents the i-th item at time t.
I would appreciate any comments on my question, since it has been bothering me for a long time.
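For context, the off-policy-corrected gradient in that paper has roughly the following form (my paraphrase of Equations 6 and 8, so the notation may not match the paper exactly):

\nabla_\theta \approx \sum_{a_t \sim \beta} \frac{\pi_\theta(a_t \mid s_t)}{\beta(a_t \mid s_t)}\, \lambda_K(s_t, a_t)\, R_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad
\lambda_K(s, a) = \frac{\partial \alpha_\theta(a \mid s)}{\partial \pi_\theta(a \mid s)} = K \bigl(1 - \pi_\theta(a \mid s)\bigr)^{K-1}

where \alpha_\theta(a \mid s) = 1 - (1 - \pi_\theta(a \mid s))^K is the probability that item a appears in the top-K sample; the top-K multiplier \lambda_K is what distinguishes this from the single-item off-policy correction.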
