- Reddit Dataset from this repository. Put the split data under the data/reddit directory.
- Memotion Dataset from Kaggle. Put the split data under the data/memotion directory.
- Eimages from MET-Meme: A Multi-modal Meme Dataset Rich in Metaphors. Meme classes: happiness (1), love (2), anger (3), sorrow (4), fear (5), hate (6), surprise (7). Put the images under data/Multi_category_Meme/images/ and the csv files directly under data/Multi_category_Meme.
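
For reference, the seven Eimages class ids listed above can be written as a small lookup table. This is only an illustrative sketch: the names `MEME_CLASSES` and `class_name` are hypothetical and not taken from this repository, and it assumes the csv files store the class as an integer (or a numeric string) from 1 to 7.

```python
# Hypothetical lookup table for the Eimages sentiment classes listed above.
MEME_CLASSES = {
    1: "happiness",
    2: "love",
    3: "anger",
    4: "sorrow",
    5: "fear",
    6: "hate",
    7: "surprise",
}


def class_name(label):
    """Return the sentiment name for a class id such as 3 or "3"."""
    label = int(label)
    if label not in MEME_CLASSES:
        raise ValueError("unknown meme class id: %r" % label)
    return MEME_CLASSES[label]


if __name__ == "__main__":
    print(class_name(3))
```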
- Pretrain the ALBERT and Multi-Modal models by following the instructions in the Memotion Multi-Modal Model section.
- Preprocess the Eimages data:
python utils/multi_category_meme_data_to_pkl.py
- Train the model for multi-category meme sentiment analysis:
bash utils/run_model_w_multi_category_meme.sh
Pretrained models and results will be stored in the pretrained_models directory.
- Create a directory under the data directory and store your meme images in it.
- Go to utils/meme_prediction.sh and change YOUR_IMAGE_DIRECTORY_NAME to the name of the directory you just created.
- Run the following script, which first preprocesses your meme images:
bash utils/meme_prediction.sh
The probability distribution of each meme image over the 7 sentiment classes will be stored in the meme_filename_to_prob_dist.json file.
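
The stored probability distributions can be turned into a single predicted label per image with a few lines of Python. The sketch below makes two assumptions that are not confirmed by this README: that meme_filename_to_prob_dist.json maps each image filename to a list of 7 probabilities, and that the list is ordered by class id (happiness first, surprise last). The helper names are hypothetical.

```python
import json

# Assumed order: index 0 is class 1 (happiness), index 6 is class 7 (surprise).
SENTIMENTS = ["happiness", "love", "anger", "sorrow", "fear", "hate", "surprise"]


def top_sentiment(prob_dist):
    """Return (label, probability) for the most likely class."""
    if len(prob_dist) != len(SENTIMENTS):
        raise ValueError("expected 7 probabilities, got %d" % len(prob_dist))
    best = max(range(len(prob_dist)), key=lambda i: prob_dist[i])
    return SENTIMENTS[best], prob_dist[best]


def summarize(json_path="meme_filename_to_prob_dist.json"):
    """Map each image filename to its most likely sentiment.

    Assumes the JSON file has the shape {filename: [p1, ..., p7]}.
    """
    with open(json_path, encoding="utf-8") as handle:
        filename_to_probs = json.load(handle)
    return {name: top_sentiment(probs) for name, probs in filename_to_probs.items()}
```

Calling `summarize()` from the directory that contains the JSON file returns a dictionary such as `{"some_meme.jpg": ("anger", 0.6)}`; if the file uses a different layout, only `summarize` needs to change.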
The growing ubiquity of Internet memes on social media platforms such as Facebook, Twitter, Instagram, and Reddit has become an unstoppable trend. Comprising visual and textual information, memes typically convey various emotions (e.g., humor, sarcasm, offensiveness, motivation, and sentiment). Nevertheless, meme emotion analysis has received little research attention. The main objective of this project is to draw the research community's attention to the study of memes. In this project, we propose the Memotion Multimodal Model (M4 Model) network for humor detection on the Memotion dataset. Our model is inspired by the ConcatBERT model, while its core "Gated Multimodal Layer" (GML) is borrowed from this arXiv paper.
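
To give a rough idea of what a gated fusion of text and image features can look like, here is a minimal pure-Python sketch in the style of a gated multimodal unit. It is not the layer implemented in this repository: the equations, the weight shapes, and all names (`gated_fusion`, `w_text`, `w_image`, `w_gate`) are assumptions chosen for illustration, and biases are omitted.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def gated_fusion(text_feat, image_feat, w_text, w_image, w_gate):
    """Fuse two modality vectors with a per-dimension gate.

    Assumed formulation (biases omitted):
        h_t = tanh(W_t x_t)
        h_v = tanh(W_v x_v)
        z   = sigmoid(W_z [x_t; x_v])
        out = z * h_t + (1 - z) * h_v
    """
    h_text = [math.tanh(v) for v in matvec(w_text, text_feat)]
    h_image = [math.tanh(v) for v in matvec(w_image, image_feat)]
    # The gate sees the concatenation of both raw feature vectors.
    gate = [sigmoid(v) for v in matvec(w_gate, text_feat + image_feat)]
    return [z * t + (1.0 - z) * i for z, t, i in zip(gate, h_text, h_image)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero_gate = [[0.0] * 4, [0.0] * 4]  # all-zero weights give a gate of 0.5
    print(gated_fusion([1.0, 0.0], [0.0, 1.0], identity, identity, zero_gate))
```

With all-zero gate weights the gate is 0.5 everywhere, so the output is the plain average of the two transformed modalities; trained gate weights would instead let the model lean on the text or the image per dimension.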
This project mainly uses the Memotion Dataset and applies transfer learning using the Reddit Dataset.
- Download the official Memotion Dataset from Kaggle.
- Download and prepare the sampled Reddit Dataset from this repository.
- Put your Reddit and Memotion Datasets in ./data/reddit and ./data/memotion, respectively.
- Train the ALBERT model using the Reddit Dataset.
  - go to utils/util_args/
  - set "train_reddit" to 1
  - set "dataset" to "reddit"
  - set "model" to "RedditAlbert"
  - set "bert_model" to "albert-base-v2"
  - run "python main.py"
  - After training, the trained model and the corresponding log file should be stored under ./pretrained_models/.
- Train the MultiModal model using the Memotion Dataset.
  - go to utils/util_args/
  - set "train_reddit" to 0
  - set "dataset" to "memotion"
  - set "model" to "GatedAverageBERT"
  - load the pretrained BERT model by modifying the model name in the initiate() function of main.py
  - run "python main.py"
The resulting log files can be found under pretrained_models/.
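
The two training stages differ only in a handful of settings, which can be summarized side by side. The sketch below is purely a summary of the values listed in the steps above; how utils/util_args actually stores them is not shown in this README, so the dictionaries and the `changed_settings` helper are hypothetical.

```python
# Stage 1: train ALBERT on the Reddit Dataset (values from the steps above).
REDDIT_STAGE = {
    "train_reddit": 1,
    "dataset": "reddit",
    "model": "RedditAlbert",
    "bert_model": "albert-base-v2",
}

# Stage 2: train the multimodal model on the Memotion Dataset.
# Only the listed settings change; "bert_model" is assumed to stay as it is.
MEMOTION_STAGE = dict(
    REDDIT_STAGE,
    train_reddit=0,
    dataset="memotion",
    model="GatedAverageBERT",
)


def changed_settings(before, after):
    """Return {name: (old, new)} for every setting that differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    for name, (old, new) in changed_settings(REDDIT_STAGE, MEMOTION_STAGE).items():
        print("%s: %r -> %r" % (name, old, new))
```

Note that the second stage also requires editing the model name in the initiate() function of main.py, which this summary does not cover.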
Variant | Model | Task | Test Acc % | Macro-F1 % | Benchmark % | Download Link |
---|---|---|---|---|---|---|
A2 | ALBERT+FC | | 60.79 | 55.96 | 72.40 (Acc) | |
A2 (GML) | ALBERT+FC+VGG16 | Memotion | 68.32 | 54.57 | 52.99 (F1) | |
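
For readers unfamiliar with the two metrics in the table, the following sketch computes accuracy and macro-averaged F1 from lists of true and predicted labels using their standard definitions. It is not the evaluation code of this repository, and the function names are hypothetical; it assumes the set of classes is the union of the labels seen in both lists.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never matches.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2]
    preds = [0, 1, 1, 1, 2]
    print("accuracy:", accuracy(truth, preds))
    print("macro-F1:", macro_f1(truth, preds))
```

Because macro-F1 weights every class equally, a model can reach a higher accuracy while its macro-F1 stays flat or drops if the gains come mostly from the majority class.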
This code is partially borrowed from: