Giter VIP home page Giter VIP logo

creep-image-dataset's Introduction

Multimodal dataset of potentially offensive comments and images

In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest are of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely created and shared, limitations set by image-based social media platforms like Instagram make it difficult for researchers to with multimodal hate speech data. We therefore developed a multimodal dataset of images and potentially offensive comments in Italian, created during awareness-raising activities in classes by students between 15 and 18 years old. The dataset was created using the CREENDER tool (Palmero Aprosio et al., 2020).

Images randomly collected from the web were displayed to students, with a prompt asking "If you saw this picture on Instagram, would you make fun of the person who posted it?". If yes, they had to add a comment and select a category explaining the reason for the comment. The available categories were Body, Clothing, Pose, Facial expression, Location, Activity, Other. The same images were displayed several times so to collect multiple judgements. Overall, 17,912 images have been judged at least once by the students. For 1,018 of them, at least an offensive comment has been written during the annotation sessions. Overall, the number of comments in the dataset is 1,135, which is higher than the number of pictures with a comment because the same image may be commented more than once by different students. Since the comments were collected with the written consent of parents and teachers, they can be freely used for research purposes, without the ethical implications that would derive from using real data posted by teenage users. The images, instead, are released as a ResNet-18 neural network trained on ImageNet

Format

The dataset consists in a unique json file containing an array of records, each of them containing:

  • comments: a list of annotations, where annotation is a value between 0 and 100, and comment is the text inserted by the user.

    • 0 - None
    • 1 - Body
    • 2 - Clothing
    • 3 - Pose
    • 4 - Facial expression
    • 5 - Location
    • 6 - Activity
    • 100 - Other
  • vector: a list of 512 float numbers representing the picture, using ResNet-18 network pre-trained on ImageNet as the image encoder.

Publications

Stefano Menini, Alessio Palmero Aprosio and Sara Tonelli. A Multimodal Dataset of Images and Text to Study Abusive Language. In Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020), Bologna, Italy

creep-image-dataset's People

Contributors

ziorufus avatar satonelli avatar stefanomenini avatar

Watchers

James Cloos avatar  avatar  avatar Giovanni Moretti avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.