This requires the PyTorch modules here to be extended to generate the NVT, PyTorch, and ensemble model config files with multi-hot input inference support.
Benchmark the new PyT data loader with the REES46 ecommerce dataset, using multiple GPUs
Train set: all train.parquet files for the 31 days (one parquet file per week). P.S.: set the row group size accordingly (see the sketch below)
Eval set: All valid.parquet files concatenated
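A minimal sketch of controlling the parquet row group size, using pyarrow (file names and the 10,000-row value are placeholders; the right size depends on the data loader's partition and batch requirements):

```python
import pyarrow.parquet as pq

table = pq.read_table("train.parquet")
# Smaller row groups let the data loader read balanced partitions per GPU;
# 10_000 rows per group is a placeholder, tune it to the batch/partition size.
pq.write_table(table, "train_regrouped.parquet", row_group_size=10_000)
```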
Create a recsys_main.py variation for non-incremental training
Train on 3 weeks of data and evaluate on the last week
Run experiments varying the number of GPUs: single GPU, multi-GPU DataParallel, and multi-GPU DistributedDataParallel
The current version returns a 2-D tensor in which sequential inputs are combined by an EmbeddingBag with the 'mean' combiner. Instead, we should support returning a 3-D tensor so we can build per-item interaction embeddings for the session-based task.
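A minimal sketch of the two behaviors (shapes and sizes are illustrative):

```python
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64
item_ids = torch.randint(0, vocab_size, (32, 20))  # (batch, seq_len)

# Current behavior: EmbeddingBag with the 'mean' combiner collapses the
# sequence dimension -> (batch, dim)
bag = nn.EmbeddingBag(vocab_size, dim, mode="mean")
pooled = bag(item_ids)       # torch.Size([32, 64])

# Proposed behavior: a plain Embedding keeps the sequence axis, giving one
# interaction embedding per item -> (batch, seq_len, dim)
emb = nn.Embedding(vocab_size, dim)
per_item = emb(item_ids)     # torch.Size([32, 20, 64])
```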
This is a requirement for the RecSys21 tutorial. Once we load the models into the Triton server, we should be able to create a client.py (or example code in a Jupyter notebook) that sends requests to the server and generates the final prediction for the next item to be clicked.
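A minimal client sketch, assuming the models are already loaded into Triton at localhost:8001; the model name ("t4r_ensemble"), the input/output tensor names, and the padded-session encoding are assumptions that must match the exported config files:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One padded session of item ids (encoding and padding scheme are assumptions)
batch = np.array([[101, 205, 37, 0, 0]], dtype=np.int64)
inputs = [grpcclient.InferInput("item_id-list", list(batch.shape), "INT64")]
inputs[0].set_data_from_numpy(batch)

result = client.infer(model_name="t4r_ensemble", inputs=inputs)
scores = result.as_numpy("next_item_scores")  # hypothetical output name
print(scores.argmax(axis=-1))                 # predicted next item per session
```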
When mf_constrained_embeddings is set to True in the SequentialPrediction task, the output layer is tied to the item-id embedding table. The prediction head should therefore get this table from the model block.
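A sketch of the tying, assuming the model block exposes the item-id table as an nn.Embedding (class and attribute names are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedPredictionHead(nn.Module):
    """Output layer whose weights are the item-id embedding table."""

    def __init__(self, item_embedding: nn.Embedding):
        super().__init__()
        self.item_embedding = item_embedding  # shared with the model block, not copied
        self.bias = nn.Parameter(torch.zeros(item_embedding.num_embeddings))

    def forward(self, hidden):                # hidden: (batch, dim)
        # Logits over all items via the transposed embedding matrix
        return F.linear(hidden, self.item_embedding.weight, self.bias)
```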
Create an API for the Transformers4Rec modules (Meta-Architecture, Evaluation, Logging) that lets users define their own training and evaluation pipelines (not necessarily using our recsys_main.py script)
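A hypothetical shape such a user-defined pipeline could take (the model/loader names and the convention that the forward pass returns the loss are assumptions, not the actual API):

```python
import torch

def train_and_evaluate(model, train_loader, eval_loader, epochs=1, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model(batch)        # assumption: forward returns the loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    model.eval()
    with torch.no_grad():
        # hypothetical per-batch metrics method
        return [model.compute_metrics(batch) for batch in eval_loader]
```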
Extend the embedding tables of categorical features for new values seen during incremental training (a sketch follows below).
P.S.: requires incremental preprocessing ( NVIDIA-Merlin/NVTabular#798 )
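A minimal sketch of growing an nn.Embedding while preserving the already-trained rows (leaving the new rows at PyTorch's default initialization is an assumption):

```python
import torch
import torch.nn as nn

def extend_embedding(old: nn.Embedding, new_cardinality: int) -> nn.Embedding:
    """Return a larger embedding table that keeps the trained vectors."""
    assert new_cardinality >= old.num_embeddings
    new = nn.Embedding(new_cardinality, old.embedding_dim)
    with torch.no_grad():
        new.weight[: old.num_embeddings] = old.weight  # copy trained rows
    return new
```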
For multi-task learning, the head class defines methods that add binary and classification tasks. We need to add a method for the item_prediction_task. The labels could be retrieved either from a ColumnGroup or from a MaskedSequence.
Keras propagates static shape information. This allows things like a Dense layer where you only specify the hidden dimension; the layer is then built, and its weights initialized, once the input shapes are known. We would like something similar on the torch side of the library so that we can enable things like an MLPBlock (see the sketch below).
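PyTorch's nn.LazyLinear (available since 1.8) already supports this deferred-shape pattern; a sketch of what an MLPBlock could look like on top of it (the helper name comes from the issue, the rest is an assumption):

```python
import torch
import torch.nn as nn

def MLPBlock(hidden_dims):
    """Stack of Linear+ReLU layers whose input sizes are inferred lazily."""
    layers = []
    for dim in hidden_dims:
        layers += [nn.LazyLinear(dim), nn.ReLU()]
    return nn.Sequential(*layers)

block = MLPBlock([128, 64])
out = block(torch.randn(32, 17))  # in_features=17 is inferred on first call
print(out.shape)                  # torch.Size([32, 64])
```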
We currently have a custom DataLoader (inheriting from the NVT PyTorch data loader) that converts the offsets representation of list columns to sparse tensors.
We need to test and adapt our code to check whether Julio's PR fully replaces this custom data loader (the conversion it performs is sketched below).
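A sketch of the offsets-to-dense conversion the custom loader performs (a sparse output would be analogous; function and argument names are illustrative):

```python
import torch

def offsets_to_dense(values, offsets, max_len, pad=0):
    """offsets has length batch_size + 1; row i spans values[offsets[i]:offsets[i+1]]."""
    batch_size = offsets.numel() - 1
    out = torch.full((batch_size, max_len), pad, dtype=values.dtype)
    for i in range(batch_size):
        row = values[offsets[i]:offsets[i + 1]][:max_len]
        out[i, : row.numel()] = row
    return out

vals = torch.tensor([1, 2, 3, 4, 5, 6])
offs = torch.tensor([0, 2, 6])  # two sessions: [1, 2] and [3, 4, 5, 6]
print(offsets_to_dense(vals, offs, max_len=4))
```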
Test Transformers4Rec pipeline with multi-GPU PyTorch NVT data loader.
Check whether NVT supports DataParallel or DistributedDataParallel, and whether HF supports Horovod
Currently we use PyTorch DataParallel, which is not optimal. Move to DistributedDataParallel (recommended) using the multi-GPU support of the NVT data loaders, and check which Transformer architectures work.
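A minimal DistributedDataParallel sketch (single node, launched with torchrun, e.g. `torchrun --nproc_per_node=2 train.py`; the Linear model is a stand-in for the transformer):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(64, 64).cuda(local_rank)  # stand-in for the transformer
model = DDP(model, device_ids=[local_rank])
# ...training loop: each rank should read only its own data shard, e.g. via the
# NVT data loader's multi-GPU sharding support...
dist.destroy_process_group()
```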