Giter VIP home page Giter VIP logo

hslu-nlp's Introduction

HSLU Project: Computational Language Technologies

Contains the NLP project for the "Detecting greenwashing signals through a comparison of ESG reports and public media" challenge. The analyzed datasets can be downloaded from the following Google drive: https://drive.google.com/file/d/1D1qOuJvQx69afYyz4rwf2C-Nm0D4goKy/view?usp=sharing

Further information to the challenge/project can be found in the document in /docs or the following links:

The project is structured in different stages. For each stage, the development artifacts are stored in the respective folders (e.g. stage1)

Run the analysis / EDA

  • Clone the repository
  • Download the data files to the /data directory
  • Install the requirements
  • Run the preprocessing notebook (stage1/00-data-preprocessing.ipynb)
  • Run the analysis notebook (stage1/01-data-exploration-and-visualization.ipynb)

Stage 1

Contains text pre-processing, cleaning and enrichment (00-data-preprocessing.ipynb). After the pre-processing, an Exploratory Data Analysis was conducted (01-data-exploration-and-visualization.ipynb)

Stage 2

A "gold standard" of 1'000 manually annotated (positive, negative, neutral) sentences was created. Several LLMs were evaluated and tested with different prompting strategies. Afterward one model was selected to annoate/segment all sentences. See 02-data-annotation.ipynb for the full code and explanations. The manually annotated and LLM annotated dataset is stored on Google Drive: https://drive.google.com/drive/folders/1ATtCPPll6n_qvfMBu4ka84E3DZDS7d11

Stage 3

Based on the data annotated in stage 2, a sentiment analysis classifier was trained and evaluated. The classifier outputs scores on a continuous scale between 0 and 1. After training, the classifier was used to predict/classify sentiment of sentences in the full data set. Finally, the average sentiments of internal vs. external were compared to determine whether greenwashing patterns are present or not.

Stage 4

Relevance of specific SDGs to the different companies are determined. A sentence embedding approach was used on the SDG descriptions and the documents related to a company to compare them in terms of similarity. Finally, the results of the sentence embedding analysis were displayed in heatmaps and bar charts to represent the SDG relevance for the different companies.

Stage 5

A project report containing key findings and methodology of stages 1 to 4 was written.

Repository structure

hslu-nlp/
┣ data/
┃ ┣ checkpoints/
┃ ┣ .gitkeep
┃ ┣ esg_documents_for_dax_companies.csv (Download the dataset and place it here)
┃ ┗ sdg_descriptions_with_targetsText.csv (Download the dataset and place it here)
┣ docs/
┃ ┗ SwissText Shared Task 2023 - Greenwashing Detection.pdf
┣ stage1/
┃ ┣ 00-data-preprocessing.ipynb
┃ ┗ 01-data-exploration-and-visualization.ipynb
┣ stage2/
┃ ┣ annotated/
┃ ┣ checkpoints/
┃ ┗02-data-annotation.ipynb
┣ stage3/
┣ stage4/
┣ stage5/
┣ .gitignore
┣ README.md
┗ requirements.txt

Stage 1 & 2 developed and authored by: Tim Giger
Stage 3, 4 & 5 developed and authored by: Arian Contessotto, Levin Reichmuth & Tim Giger

hslu-nlp's People

Contributors

syx113 avatar ariancontessotto avatar elvinelk avatar

Stargazers

 avatar  avatar jist avatar  avatar  avatar  avatar Qu Yanbo avatar  avatar  avatar Ester00 avatar  avatar  avatar

Watchers

 avatar Kostas Georgiou avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.