
notebooks's Introduction

Notebooks of the project Navigating Stories

This repository contains the notebooks of the project Navigating Stories of the University of Twente and the Netherlands eScience Center. The notebooks deal with these topics:

  • Language processing tests:
    • Danish: dannlp_test.ipynb
    • Dutch: stroll_srl_test.ipynb, stroll_twitter.ipynb and liwc.ipynb
    • English: allennlp-tests.ipynb
    • German: ger_nlp_test.ipynb
    • Multilingual: multilingual_dsg.ipynb
  • Data collection:
    • Dutch: coronaindestad.ipynb
    • English: storycenter.ipynb
    • German:
      • stadtbonn.ipynb
      • hspvnrw.ipynb
      • psychologieheute.ipynb
      • stadtfrankfurt.ipynb
      • zusammengegencorona.ipynb

All notebooks are written in Python. Some of them rely on external modules for processing natural language.

notebooks's People

Contributors

eriktks, kevinpijpers, maltelueken

Forkers

maltelueken

notebooks's Issues

DSG for Danish

Create a working demo for Digital Story Grammar for Danish.

Reference: Stefan Bastholm Andrade & Ditte Andersen (2020) Digital story grammar: a quantitative methodology for narrative analysis, International Journal of Social Research Methodology, 23:4, 405-421, DOI: 10.1080/13645579.2020.1723205

For this task we created the Jupyter notebook dannlp_test.ipynb.

Tasks

  • find a dependency parser for Danish: we chose DaNLP
  • find a semantic role labeler for Danish: we did not find a usable semantic role labeler for Danish
  • convert dependency parsing output to table
  • convert semantic role labeler output to table
  • aggregate the table outputs to answer the narrative questions who, where, when, what, why and how
  • use verb classes to model narrative topics
  • collect ideas on how to proceed after this
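
The step "convert dependency parsing output to table" can be sketched in Python as follows. This is a minimal illustration, not the code of dannlp_test.ipynb: the token dictionaries, the function names and the example sentence are assumptions, and they do not reproduce DaNLP's actual output format.

```python
import csv
import io

# Hypothetical parser output for "Anna læser en bog" ("Anna reads a book"):
# one dict per token with a 1-based id, the id of its head (0 = root)
# and a dependency relation. A real parser will use its own format.
PARSED_SENTENCE = [
    {"id": 1, "text": "Anna", "head": 2, "deprel": "nsubj"},
    {"id": 2, "text": "læser", "head": 0, "deprel": "root"},
    {"id": 3, "text": "en", "head": 4, "deprel": "det"},
    {"id": 4, "text": "bog", "head": 2, "deprel": "obj"},
]


def dependency_table(tokens):
    """Return one row per token: id, text, relation, head id and head text."""
    text_by_id = {token["id"]: token["text"] for token in tokens}
    rows = []
    for token in tokens:
        rows.append({
            "id": token["id"],
            "text": token["text"],
            "deprel": token["deprel"],
            "head": token["head"],
            # The root token has head id 0, which matches no token.
            "head_text": text_by_id.get(token["head"], "ROOT"),
        })
    return rows


def table_to_csv(rows):
    """Serialise the table rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


TABLE = dependency_table(PARSED_SENTENCE)
print(table_to_csv(TABLE))
```

With the table in this shape, rows whose relation is nsubj give candidate answers to the "who" question, and rows whose relation is obj give candidates for "what".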

Determine storage format for text files

In what format will we store the project's data (text) files, including their metadata?

Consider:

  • existing standards
  • compatibility
  • future integration with other systems if possible
  • ease of use

A comparison of different storage formats is available in this document: https://www.overleaf.com/project/6213b346e75ce35ccc7d811b

Addition by Kevin (this bullet point is duplicated in the CLARIAH story):

  • Ineo is the new front-end for CLARIAH tools (including the Media Suite) and is scheduled for delivery in early 2022. An import function for Ineo that accepts JSON is under discussion. This might affect the data standards we choose if we decide to move toward CLARIAH.
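
To make the JSON option concrete, here is a minimal Python sketch of one text stored together with its metadata. The format had not been decided when this issue was written, so the field names (text, metadata, language, source, collected) and the function names are assumptions for illustration only.

```python
import json


def make_record(text, language, source, collected):
    """Bundle a story text with its metadata (hypothetical field names)."""
    return {
        "text": text,
        "metadata": {
            "language": language,
            "source": source,
            "collected": collected,
        },
    }


def to_json(record):
    """Serialise a record; ensure_ascii=False keeps non-ASCII letters readable."""
    return json.dumps(record, ensure_ascii=False, indent=2)


def from_json(serialised):
    """Read a record back from its JSON text."""
    return json.loads(serialised)


# Placeholder values, not project data.
RECORD = make_record("Een voorbeeldtekst.", "nl", "example-source", "2022-03-01")
print(to_json(RECORD))
```

Keeping the metadata in a separate nested object leaves the text field untouched when metadata fields are added later.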

Code style testing for Jupyter notebooks

  • find a workable automatic code style tester (linter) for Jupyter notebooks
  • test the software on a few notebooks

We chose the Jupyter linter nbqa. After installation, a Jupyter Python notebook can be tested with the command line instruction nbqa pylint my_notebook.ipynb.

Erik tested the software with the notebook coronaindestad.ipynb and Malte with psychologieheute.ipynb, stadtfrankfurt.ipynb, storycenter.ipynb, and zusammengegencorona.ipynb.

The software works well. It generates many warnings, some of which one might not want to deal with. Such warnings can be suppressed by adding # pylint: disable=..... to the notebook code, replacing the five dots with the five-character id of the warning.
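
As an illustration, a notebook cell with a suppressed warning could look like the Python sketch below. The variable and function names are hypothetical. C0103 is pylint's id for invalid-name, which pylint typically reports for a lower-case module-level name that it treats as a constant.

```python
# Hypothetical notebook cell. Without the trailing comment, pylint would
# typically flag this module-level name as C0103 (invalid-name).
story_count = 3  # pylint: disable=C0103


def describe(count):
    """Return a short summary of the number of collected stories."""
    return f"{count} stories collected"


print(describe(story_count))
```

A disable comment placed at the end of a line applies to that line only, so other occurrences of the same warning elsewhere in the notebook are still reported.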

DSG for Dutch

Create a working demo for Digital Story Grammar for Dutch.

Reference: Stefan Bastholm Andrade & Ditte Andersen (2020) Digital story grammar: a quantitative methodology for narrative analysis, International Journal of Social Research Methodology, 23:4, 405-421, DOI: 10.1080/13645579.2020.1723205

For this task we created the Jupyter notebook stroll_srl_test.ipynb.

Tasks

  • find a dependency parser for Dutch: we chose stanza
  • find a semantic role labeler for Dutch: we chose stroll
  • convert dependency parsing output to table
  • convert semantic role labeler output to table
  • aggregate the table outputs to answer the narrative questions who, where, when, what, why and how
  • use verb classes to model narrative topics
  • collect ideas on how to proceed after this
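
The steps "convert semantic role labeler output to table" and "aggregate the table outputs" can be sketched in Python as follows. This is not the code of stroll_srl_test.ipynb: the PropBank-style role labels, the mapping from roles to narrative questions, the triple format and the function names are all assumptions, and the actual output of stroll may differ.

```python
from collections import defaultdict

# Assumed mapping from PropBank-style role labels to narrative questions.
ROLE_TO_QUESTION = {
    "ARG0": "who",
    "ARG1": "what",
    "ARGM-LOC": "where",
    "ARGM-TMP": "when",
    "ARGM-CAU": "why",
    "ARGM-MNR": "how",
}

# Hypothetical labeler output for "Gisteren kocht Jan een boek in Utrecht"
# ("Yesterday Jan bought a book in Utrecht"):
# (predicate, role label, phrase) triples.
SRL_OUTPUT = [
    ("kocht", "ARGM-TMP", "Gisteren"),
    ("kocht", "ARG0", "Jan"),
    ("kocht", "ARG1", "een boek"),
    ("kocht", "ARGM-LOC", "in Utrecht"),
]


def srl_table(triples):
    """Turn the triples into table rows and attach the narrative question."""
    rows = []
    for predicate, role, phrase in triples:
        rows.append({
            "predicate": predicate,
            "role": role,
            "phrase": phrase,
            # Roles outside the mapping get None and are skipped later.
            "question": ROLE_TO_QUESTION.get(role),
        })
    return rows


def aggregate_by_question(rows):
    """Collect the phrases that answer each narrative question."""
    answers = defaultdict(list)
    for row in rows:
        if row["question"] is not None:
            answers[row["question"]].append(row["phrase"])
    return dict(answers)


ANSWERS = aggregate_by_question(srl_table(SRL_OUTPUT))
print(ANSWERS)
```

Running the aggregation over all sentences of a story, rather than one sentence, would give per-story lists of actors, places and times that can then be counted.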

Apply LIWC to Dutch data

The relevant notebook for this task is liwc.ipynb. Tasks:

  • apply LIWC to Dutch data
  • visualize results in graphs
  • determine extreme cases based on LIWC analysis
  • create clusters based on LIWC analysis
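
The first and third tasks can be sketched in Python as follows: compute, per text, the proportion of words that fall in each dictionary category, then pick the text with the highest proportion for one category. This is not the code of liwc.ipynb. The toy dictionary, the category names, the example texts and the function names are assumptions, and the real LIWC dictionaries are licensed and far larger.

```python
import re

# Toy dictionary for illustration only. A trailing * matches any word ending.
TOY_DICTIONARY = {
    "posemo": ["blij", "goed", "mooi*"],
    "negemo": ["bang", "verdriet*", "boos"],
}

# Hypothetical example texts, not project data.
TEXTS = {
    "story_a": "Ik ben blij en het is mooi weer",
    "story_b": "Ik ben bang en boos",
}


def matches(pattern, word):
    """Check one dictionary pattern against one lower-cased word."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_proportions(text, dictionary):
    """Return, per category, the share of words matching that category."""
    words = re.findall(r"\w+", text.lower())
    proportions = {}
    for category, patterns in dictionary.items():
        hits = sum(
            1 for word in words
            if any(matches(pattern, word) for pattern in patterns)
        )
        proportions[category] = hits / len(words) if words else 0.0
    return proportions


def most_extreme(texts, dictionary, category):
    """Return the name of the text with the highest share for a category."""
    return max(
        texts,
        key=lambda name: category_proportions(texts[name], dictionary)[category],
    )


for name, text in TEXTS.items():
    print(name, category_proportions(text, TOY_DICTIONARY))
print("most negative:", most_extreme(TEXTS, TOY_DICTIONARY, "negemo"))
```

The per-text proportion vectors produced this way are also a natural input for the clustering task, since each text becomes a point with one coordinate per category.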

Collect German data

Story corpora collected so far:

The question is whether this is enough German data for the moment or whether we need more (given that I will not be working on the project for much longer).

Once we have decided on a format to store the data (#15), I will add the storage code to the notebooks and create another notebook that gives an overview of the data collected so far.

  • Enough data?
  • #15
  • Add code for data storage
