lnu_ma

Information about the repository

The repository contains the scripts used in the master's thesis "Kus on kodukoht? Analysing the Meaning of Home and Improving OCR Quality in Estonian Exile Newspapers Published in Sweden". The thesis was written in the digital humanities master's programme at Linnaeus University in the spring semester of 2024. Its sources are the largest Estonian exile newspapers published in Sweden between 1944 and 1991: Teataja/Eesti Teataja, Välis-Eesti and Eesti Päevaleht/Stockholms-Tidningen Eestlastele. The scripts were used to answer the thesis's first research question, which examines the contexts in which the exile newspapers write about home and about places in occupied Estonia.

The scripts are divided into two folders by method. The scripts in the text analysis folder are used for text analysis, and those in the spatial analysis folder are used for named entity recognition, data cleaning and geocoding. The thesis is primarily an experiment, so there was no specific aim to build a coherent workflow; instead, there is generally one script per analysis phase. This makes it easier to find the right tool when a user wants to perform only one step of the analysis, for example to identify named entities (NEs) with EstNLTK or to filter unique results. The scripts are written in both Python and R, a choice based on the author's previous experience and skills.
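The "filter unique results" step can be illustrated with a minimal Python sketch. It assumes the NER step has already produced a flat list of place-name strings (for example, entities extracted with EstNLTK) and that duplicates differ only in letter case, surrounding whitespace or trailing punctuation. The function names `normalise_entity` and `unique_entities` are hypothetical and are not taken from the repository's scripts.

```python
from collections import Counter

# Punctuation that OCR and tokenisation often leave attached to entity strings
# (an assumption for this sketch, not a rule from the thesis scripts).
EDGE_PUNCTUATION = ".,;:!?\"'()"


def normalise_entity(name: str) -> str:
    """Hypothetical helper: collapse internal whitespace and strip edge punctuation."""
    cleaned = " ".join(name.split())
    return cleaned.strip(EDGE_PUNCTUATION)


def unique_entities(entities):
    """Hypothetical helper: return (entity, count) pairs, unique case-insensitively.

    The first spelling seen is kept for display, and the output follows
    first-seen order (dicts preserve insertion order in Python 3.7+).
    """
    counts = Counter()
    first_form = {}
    for raw in entities:
        name = normalise_entity(raw)
        if not name:  # skip strings that were only whitespace or punctuation
            continue
        key = name.casefold()  # case-insensitive key that also handles non-ASCII letters
        counts[key] += 1
        first_form.setdefault(key, name)
    return [(first_form[key], counts[key]) for key in first_form]


if __name__ == "__main__":
    found = ["Tallinn", "tallinn,", "Tartu", " Tallinn ", "Pärnu."]
    print(unique_entities(found))
    # → [('Tallinn', 3), ('Tartu', 1), ('Pärnu', 1)]
```

Keeping a count alongside each unique name means the same pass can feed both a deduplicated list for geocoding and a frequency table for analysis; a real pipeline would likely also need rules for inflected Estonian place-name forms, which this sketch does not attempt.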

The full texts of the exile newspapers were obtained from the National Library of Estonia; instructions for accessing and using the material can be found in Digilab. The scripts were developed using material from a previous project and with the help of the large language models GPT-3.5 and GPT-4o in the ChatGPT environment.

The full texts of the newspapers may be used only under the conditions stated in the digital archive of the National Library of Estonia (example). The scripts are licensed under CC BY 4.0.

Contributors

lauranemvalts
