dreamy's Introduction

DReAMy is a Python library built around PyTorch and Hugging Face's 🤗 transformers to automatically analyse (for now, only) textual dream reports. At the moment, annotations are based on the Hall & Van de Castle emotions framework, but we are looking forward to expanding our set of applications. The theoretical backbone of DReAMy and its models is based on a fruitful collaboration between NLP and sleep research. More detailed results can be found here.

Installation and usage

DReAMy can be easily installed via pip! If you wish to play with or query a set of DReAMy's models, you can do so in the dream 🤗 Space.

pip install dreamy

Current Features

At the moment, DReAMy has four main features:

  • Datasets: download and use two dream-report datasets from DreamBank.
  • Emotion Classification: easily classify lists of reports for HVDC emotions.
  • NER (or character annotation): extract the relevant characters appearing in a given report.
  • Encodings: easily collect and explore embeddings of textual reports.

Usage examples can be found in the code below and in the tutorial folder.

Dataset

DReAMy has direct access to two datasets: a smaller English-only one (~20k), with more descriptive variables (such as gender and year), and a larger multilingual one (En/De, ~30k). You can easily choose between the two with the code below.

import dreamy

language   = "english" # choose between english/multi
dream_bank = dreamy.get_HF_DreamBank(as_dataframe=True, language=language)

n_samples     = 10
dream_sample  = dream_bank.sample(n_samples, random_state=35).reset_index(drop=True)
dream_as_list = dream_sample["dreams"].tolist()

dream_bank.sample(2)
index | series  | description              | dreams                                                                                           | gender | year
5875  | blind-f | Blind dreamers (F)       | I'm at work in the office of a rehab teacher named D, a transistor radio is on, [...].           | female | mid-1990s
12888 | emma    | Emma: 48 years of dreams | I go to Pedro's house, he is fixing his bike. I think I will take my bike out too, but [...].    | female | 1949-1997

Emotion Classification

DReAMy comes equipped with a set of models tuned to reproduce expert annotators' labels according to the Hall & Van de Castle system. These models can perform emotion classification (a.k.a. sentiment analysis) following two main patterns.

Presence

Two models are currently available to detect the presence of different emotions: base-en and large-multi, easily queryable with the short code below.

classification_type = "presence"
model_type          = "base-en"
return_type         = "distribution" # set "present" for above-threshold only
device              = "cpu"

predictions = dreamy.predict_emotions(
    dream_as_list, 
    classification_type, 
    model_type,
    return_type=return_type, 
    device=device,
)

predictions
[[{'label': 'CO', 'score': 0.7488341331481934},
  {'label': 'HA', 'score': 0.09567967802286148},
  {'label': 'AN', 'score': 0.03418444097042084},
  {'label': 'AP', 'score': 0.019197145476937294},
  {'label': 'SD', 'score': 0.012466167099773884}],
 [{'label': 'HA', 'score': 0.9818947911262512},
  {'label': 'SD', 'score': 0.03642113506793976},
  {'label': 'AP', 'score': 0.03470420092344284},
  {'label': 'CO', 'score': 0.024184534326195717},
  {'label': 'AN', 'score': 0.023663492873311043}]

You can get the mapping from HVDC codes to emotion names via

dreamy.Coding_emotions
{'AN': 'anger',
 'AP': 'apprehension',
 'SD': 'sadness',
 'CO': 'confusion',
 'HA': 'happiness'}
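
The scores and the code mapping above can be combined to read off which emotions are present in each report. The following is a minimal sketch, not part of DReAMy itself: the scores and the mapping are copied from the outputs shown above so that it runs without the library, and the 0.5 cut-off and the helper name decode_present are assumptions made for illustration.

```python
# Scores copied from the "distribution" output shown above (two reports).
predictions = [
    [
        {"label": "CO", "score": 0.7488341331481934},
        {"label": "HA", "score": 0.09567967802286148},
        {"label": "AN", "score": 0.03418444097042084},
        {"label": "AP", "score": 0.019197145476937294},
        {"label": "SD", "score": 0.012466167099773884},
    ],
    [
        {"label": "HA", "score": 0.9818947911262512},
        {"label": "SD", "score": 0.03642113506793976},
        {"label": "AP", "score": 0.03470420092344284},
        {"label": "CO", "score": 0.024184534326195717},
        {"label": "AN", "score": 0.023663492873311043},
    ],
]

# Same content as dreamy.Coding_emotions, copied here to stay self-contained.
coding_emotions = {
    "AN": "anger",
    "AP": "apprehension",
    "SD": "sadness",
    "CO": "confusion",
    "HA": "happiness",
}

THRESHOLD = 0.5  # assumed cut-off, not necessarily the one DReAMy applies


def decode_present(report_scores, threshold=THRESHOLD):
    """Return full emotion names whose score is at or above the threshold."""
    return [
        coding_emotions[item["label"]]
        for item in report_scores
        if item["score"] >= threshold
    ]


present = [decode_present(scores) for scores in predictions]
print(present)  # [['confusion'], ['happiness']]
```

Each label is thresholded independently here, so a report can end up with zero, one, or several emotions.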

Generation

In this variant, a T5 model is trained on the same data to generate emotion-based reports, with two extra features. First, the emotions are "numbered": if the same emotion is found more than once in the same report, the model should be able to identify this. Second, the model is also trained to recognise which character each emotion is associated with. See the examples below.

English-only, characters + numbered emotions
classification_type = "generation"
model_type          = "char-en"
device              = "cpu"

predictions = dreamy.generate_emotions(
    dream_as_list, 
    classification_type, 
    model_type,
    device=device,
)

predictions
['The dreamer experienced apprehension.',
 'The group joint stranger adult experienced happiness. The dreamer experienced anger.',]
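
If you need the generated text in a structured form, the sentences can be split into (character, emotion) pairs. The sketch below is an illustration only: it assumes every sentence follows the "The <character> experienced <emotion>." template seen in the two example outputs above, and the helper name parse_report is hypothetical. Sentences that follow a different template are simply not matched.

```python
import re
from collections import Counter

# Example outputs copied from the generation example above.
generated = [
    "The dreamer experienced apprehension.",
    "The group joint stranger adult experienced happiness. "
    "The dreamer experienced anger.",
]

# Assumed sentence template: "The <character> experienced <emotion>."
PATTERN = re.compile(r"The (.+?) experienced (.+?)\.")


def parse_report(text):
    """Return a list of (character, emotion) tuples found in one output."""
    return PATTERN.findall(text)


pairs = [parse_report(text) for text in generated]

# Count how often each emotion appears across all parsed reports.
emotion_counts = Counter(
    emotion for report in pairs for _, emotion in report
)

print(pairs[1])
print(emotion_counts)
```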

NER

An important aspect of each dream report is the characters that appear in it. In this section, we will see how to use dreamy to extract the characters appearing in each report. As always, characters are defined with respect to the Hall & Van de Castle system. In this case, CHAR labels are spelled out, and do not (and should not) include the dreamer themself. Please note that the CHAR data used in training is not linked to any specific feature. In other words, predictions should not be interpreted as anything other than the presence of a character.

classification_type = "full"
model_type          = "base-en"
device              = "cpu"

predictions = dreamy.get_CHAR(
    dream_as_list, 
    classification_type, 
    model_type,
    device=device,
    max_new_token=60,
)
predictions
['individual female known adult; group female uncertian adult; individual indefinite uncertian adult;',
 'individual female known adult;']
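
Each prediction is a single string in which characters are separated by semicolons, so a small post-processing step makes them easier to count. This is a minimal sketch based only on the example output above: the separator is inferred from that output, the helper name split_characters is hypothetical, and the label text is kept verbatim as returned by the model.

```python
from collections import Counter

# Example outputs copied verbatim from the CHAR example above.
char_predictions = [
    "individual female known adult; group female uncertian adult; "
    "individual indefinite uncertian adult;",
    "individual female known adult;",
]


def split_characters(prediction):
    """Split one prediction string into a list of character descriptions."""
    return [part.strip() for part in prediction.split(";") if part.strip()]


characters = [split_characters(p) for p in char_predictions]

# Count how many times each character description occurs across reports.
character_counts = Counter(
    character for report in characters for character in report
)

print(characters[0])
print(character_counts.most_common(1))
```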

Encoding, reduction and visualisation

You can also use DReAMy to easily extract, reduce, cluster, and visualise encodings (i.e., vector embeddings) of dream reports with a few simple lines of code. At the moment, you can choose between four models, which are combinations of small/large and English-only/multilingual models.

import dreamy

# get some data
n_samples  = 10
language   = "english" # choose between english/multi

dream_bank = dreamy.get_HF_DreamBank(as_dataframe=True, language=language)
dream_bank = dream_bank.sample(n_samples).reset_index(drop=True)

dream_as_list = dream_bank["dreams"].tolist()

# set up model and get encodings
model_size = "small"   # or large
model_lang = "english" # or multi, for multilingual
device     = "cpu"     # set to "cuda" for GPUs

report_encodings = dreamy.get_encodings(
    dream_as_list, 
    model_size=model_size,
    language=model_lang, 
    device=device,
)

# reduce space
# you can choose between pca/t-sne
X, Y = dreamy.reduce_space(report_encodings, method="pca") 

# Update your original dataframe with coordinates and plot
dream_bank["DR_X"], dream_bank["DR_Y"] = X, Y

Then use your favourite visualisation library to explore the results.

import seaborn as sns

sns.set_context("talk")
sns.set_style("whitegrid")

g = sns.scatterplot(
    data=dream_bank,
    x="DR_X", 
    y="DR_Y", 
    hue="series",
    palette="Set2"
)
g.legend(loc='center left', title="DreamBank Series", bbox_to_anchor=(1, 0.5))

Check the starting tutorial for more, like unsupervised K-Means clustering.

In-Progress Development

Topic-Modelling

Planned Development

Audio-to-Text pipeline

EEG interface

Contribute

If you wish to contribute, collaborate, or just ask a question, feel free to contact Lorenzo or use the issue section.

Cite

If you use DReAMy, please cite the pre-print

@misc{https://doi.org/10.48550/arxiv.2302.14828,
  doi = {10.48550/ARXIV.2302.14828},
  url = {https://arxiv.org/abs/2302.14828},
  author = {Bertolini, Lorenzo and Elce, Valentina and Michalak, Adriana and Bernardi, Giulio and Weeds, Julie},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Automatic Scoring of Dream Reports' Emotional Content with Large Language Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}
