DReAMy is a python library built around pytorch and Hugging Face's ๐ค transformers
to automatically analyse (for now only) textual dream reports. At the moment, annotations are based on the Hall & Van de Castle emotions framewokr, but we are looking forward to expand our set applications. The theretical backbone of DReAMy and its model is based on a fuiifull collaboration betweem NLP and sleep research. More detailed results can be found here.
DReAMy can be easly installed via pip! If you wish to play/query a set of DReAMy's model, you can do so in the dream
๐ค Space.
pip install dreamy
At the moment, DReAMy has four main features:
- Datasts, allowing to download and use two dream-report datasets from DreamBank.
- Emotion Classification, allowing to easly classify lists of reports for HVDC emotions.
- NER (or character annotation), that allow to extract relevant characters appearing in a given report.
- Encodigns: that easly collects and explores enbeddings of textual reports.
Use example can be found in the code below, and in the tutorial folder.
DReAMy has direct access to two datasets. A smaller English-only (~ 20k), with more descriptive variables (such as gender and year), and a larger and multilingual one (En/De, ~ 30 k). You can easly choose between the two of them with the simple code below.
import dreamy
language = "english" # choose between english/multi
dream_bank = dreamy.get_HF_DreamBank(as_dataframe=True, language=language)
n_samples = 10
dream_sample = dream_bank.sample(n_samples, random_state=35).reset_index(drop=True)
dream_as_list = dream_sample["dreams"].tolist()
dream_bank.sample(2)
index | series | description | dreams | gender | year |
---|---|---|---|---|---|
5875 | blind-f | Blind dreamers (F) | I'm at work in the office of a rehab teacher named D, a transistor radio is on, [...]. | female | mid-1990s |
12888 | emma | Emma: 48 years of dreams | I go to Pedro's house, he is fixing his bike. I think I will take my bike out too, but [...]. | female | 1949-1997 |
DReAMy comes equipped with a set of model tuned to reproduce expert-annotators labels accoding to the Hall & Van de Castle system. These models can perform emotion classification. (a.k.a. sentiment analysis) following 2 main patterns.
Two model are currenlty available to detect the presence of difference emotions: base-en
and large-multi
, easly querible with the short code below.
classification_type = "presence"
model_type = "base-en"
return_type = "distribution" # set "present" for above-threshold only
device = "cpu"
predictions = dreamy.predict_emotions(
dream_as_list,
classification_type,
model_type,
return_type=return_type,
device=device,
)
predictions
[[{'label': 'CO', 'score': 0.7488341331481934},
{'label': 'HA', 'score': 0.09567967802286148},
{'label': 'AN', 'score': 0.03418444097042084},
{'label': 'AP', 'score': 0.019197145476937294},
{'label': 'SD', 'score': 0.012466167099773884}],
[{'label': 'HA', 'score': 0.9818947911262512},
{'label': 'SD', 'score': 0.03642113506793976},
{'label': 'AP', 'score': 0.03470420092344284},
{'label': 'CO', 'score': 0.024184534326195717},
{'label': 'AN', 'score': 0.023663492873311043}]
You can get the HVDC decodings via
dreamy.Coding_emotions
{'AN': 'anger',
'AP': 'apprehension',
'SD': 'sadness',
'CO': 'confusion',
'HA': 'happiness'}
Under this variance, a T5 model is trained with the same data to generate the emotion-based reports, with two extra feature. First, the emotion are "numbered". This refers to the fact that if the same emotion was found more than once in the same report, the model should be able to identify so. Second, the model is also trained to recognise to which character the emotion are associated with. See the examples below.
classification_type = "generation"
model_type = "char-en"
device = "cpu"
predictions = dreamy.generate_emotions(
dream_as_list,
classification_type,
model_type,
device=device,
)
predictions
['The dreamer experienced apprehension.',
'The group joint stranger adult experienced happiness. The dreamer experienced anger.',]
An important aspect of each dream report is the character that appear in it. In thi notebook, we will see how to use dreamy
to extract character appearing in each report. As always, character are defined with respect to the Hall & Van de Castle system. CHAR are in this case spelled out, and do not/should not include the dreamer themself. Please note that CHAR data used in training is not linked to any specii feature. In other words, prediction should not be interpreted in any other way other than their presence.
classification_type = "full"
model_type = "base-en"
device = "cpu"
predictions = dreamy.get_CHAR(
dream_as_list,
classification_type,
model_type,
device=device,
max_new_token=60,
)
predictions
['individual female known adult; group female uncertian adult; individual indefinite uncertian adult;',
'individual female known adult;']
You can also use DReAMy to easily extract, reduce, cluster, and visualise encodings (i.e., vector embeddings) of dream reports, with few and simple lines of codee. At the moment, you can chose betweem four model, that are combination of small/large Engish-ony/multilingual models.
import dreamy
# get some data
n_samples = 10
language = "english" # choose between english/multi
dream_bank = dreamy.get_HF_DreamBank(as_dataframe=True, language=language)
dream_bank = dream_bank.sample(n_samples).reset_index(drop=True)
dream_as_list = dream_sample["dreams"].tolist()
# set up model and get encodings
model_size = "small" # or large
model_lang = "english" # or multi, for multilingual
device = "cpu" # se to "cuda" for GPUs
report_encodings = dreamy.get_encodings(
dream_as_list,
model_size=model_size,
language=model_lang,
device=device,
)
# reduce space
# you can choose between pca/t-sne
X, Y = dreamy.reduce_space(report_encodings, method="pca")
# Update your original dataframe with cohordinates and plot
dream_bank["DR_X"], dream_bank["DR_Y"] = X, Y
Then use your favourite visualisation library to explore the results.
import seaborn as sns
sns.set_context("talk")
sns.set_style("whitegrid")
g = sns.scatterplot(
data=dream_sample,
x="DR_X",
y="DR_Y",
hue="series",
palette="Set2"
)
g.legend(loc='center left', title="DreamBank Series", bbox_to_anchor=(1, 0.5))
Check the starting tutorial for more, like unuspervised K-Mean clustering.
If you wish to contribute, collaborate, or just ask question, feel free to contact Lorenzo, or use the issue section.
If you use DReAMy, please cite the pre-print
@misc{https://doi.org/10.48550/arxiv.2302.14828,
doi = {10.48550/ARXIV.2302.14828},
url = {https://arxiv.org/abs/2302.14828},
author = {Bertolini, Lorenzo and Elce, Valentina and Michalak, Adriana and Bernardi, Giulio and Weeds, Julie},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Automatic Scoring of Dream Reports' Emotional Content with Large Language Models},
publisher = {arXiv},
year = {2023},
copyright = {Creative Commons Attribution 4.0 International}
}