
What if HAL Breathed? Enhancing Empathy in Human-AI Interactions with Breathing Speech Synthesis

Main topics: Text-to-Speech Synthesis, Natural Language Processing, Affective Computing, Data Engineering in Python.

Welcome to the repository for the study and thesis titled "What if HAL breathed? Enhancing Empathy in Human-AI Interactions with Breathing Speech Synthesis".

A scientific paper based on this study is forthcoming.

Read the full thesis here.

Abstract

Modern Artificial Agents will increasingly leverage AI speech synthesis models to verbally communicate with their users. This study explores the integration of breathing patterns into synthesized speech and their potential to deepen empathy towards said agents, testing the hypothesis that the inclusion of breathing capabilities can significantly enhance the emotional connection in human-AI interaction.

Breathing patterns have not been unequivocally linked to specific human emotional states, but respiration has been consistently shown to be involved in the appraisal and regulation of emotions, and the literature suggests that considerable expressive potential may lie in respiratory noises and their rhythm. Despite this, breathing is rarely incorporated into speech synthesis models, and the literature on breathing agents is still limited.

We first perform a thorough evaluation of open-source and commercial speech synthesis models to understand the breathing synthesis capabilities of state-of-the-art architectures. We then assess the influence of breathing on the voice's capacity to evoke empathy. The research diverges methodologically from traditional empathy studies by presenting subjects with an emotional dilemma to resolve within a cooperative game scenario, where they face a choice reflecting their empathic engagement with an AI partner.

The findings indicate that breathing in synthesized speech significantly enhances agents' perceived naturalness and users' empathy towards them. These insights underscore the importance of breathing in speech synthesis for AI design and call for its consideration in future models and interactive Artificial Agents. Ultimately, the study aims to contribute to the development of a more empathetic digital world through enhanced human-AI interaction.

Try it out!

Interested in experiencing the gamified dilemma first-hand? Click here to try it out (you will be assigned to either the breathing or not breathing condition).

To listen to the synthesized speech in both breathing and not breathing conditions, visit this link.

The comprehensive dataset of the study results is available on Kaggle. Find the Results Dataset here and the results analysis in this Kaggle notebook.

This repository contains materials used for the study and the preprocessing tool. Additional repositories relevant to the study are listed below.

Speech Synthesis Deep Learning Models

  • We analyzed state-of-the-art speech synthesis models to understand the possibilities of integrating breathing and spontaneous speech patterns. A broad list of the open-source and commercial synthesizers we tested is available in the full thesis.
  • We attempted to train two open-source models, VITS and Flowtron, with some modifications to their architectures, which can be found at these repos: psyche-vits, psyche-flowtron. However, due to limited computational resources, we ultimately adopted the pre-trained model BARK.
  • We found BARK to be the only model suitable for our study, highlighting a lack of models that can achieve speech-breathing synthesis.
  • We applied iterative prompt engineering techniques to the BARK model to synthesize spontaneous speech with emotional features and integrated breathing patterns.
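The exact prompts used in the study are not published here. As a minimal sketch of the kind of prompt engineering described above, the helper below weaves a breathing marker into text at clause boundaries before synthesis; note that the `[breath]` tag is a community convention for Bark, not an officially documented token, and `add_breath_markers` is a hypothetical name, not part of this repository.

```python
import re

# Sketch: insert breathing markers at clause boundaries before synthesis.
# The "[breath]" tag is an assumption (a community convention for Bark);
# the prompts actually used in the study may differ.

def add_breath_markers(text: str, every_n_clauses: int = 2) -> str:
    """Insert a [breath] marker after every n-th comma-, semicolon-,
    or period-delimited clause, mimicking speech-breathing rhythm."""
    parts = re.split(r"([,.;])", text)  # keep the delimiters
    out, count = [], 0
    for part in parts:
        out.append(part)
        if part in ",.;":
            count += 1
            if count % every_n_clauses == 0:
                out.append(" [breath]")
    return "".join(out)

prompt = add_breath_markers(
    "I understand your concern, and I want to help. "
    "Let's think this through together, step by step.",
    every_n_clauses=2,
)
print(prompt)
# → "I understand your concern, and I want to help. [breath] Let's think
#    this through together, step by step. [breath]"

# The marked-up prompt would then be passed to Bark, e.g.:
# from bark import generate_audio
# audio = generate_audio(prompt)
```

The Bark call is left commented out because it downloads large model weights; only the prompt construction is shown.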

Data Processing

  • Various Speech-to-Text services and cloud computing platforms were employed for transcription of audio databases, including IBM Cloud, AssemblyAI, and Google Cloud.
  • A Preprocessing and Breath Labeling tool was developed to process speech databases. This software is part of this repository, in the Preprocessing/ folder, but is still undergoing refinement for broader usability.

Gamified Experiment

  • We designed a gamified experience using Unity Engine and C# to evaluate the emotional impact of breathing in synthesized voices for virtual agents. The game-based approach offered a unique angle for this assessment. The gamified experience's code is hosted on PSYCHE-Gamified.

Contributors

  • nicoloddo
