GerParCor

German Parliamentary Corpus (GerParCor)

Abstract

Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliament Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.

GerParCor is available via http://gerparcor.texttechnologylab.org
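Since the corpus is distributed as UIMA XMI, a downloaded protocol can be inspected with nothing but the Python standard library. The following is a minimal sketch, not the authors' tooling: the `SAMPLE_XMI` document and its `demo:Token` type are hypothetical stand-ins, and the actual GerParCor files use whatever type system the corpus authors chose. Only the `cas:Sofa` element with its `sofaString` attribute and the `begin`/`end` offsets on annotations follow the general XMI CAS layout.

```python
import xml.etree.ElementTree as ET

# Namespace UIMA uses for CAS-level elements such as the Sofa
# (the "subject of analysis", i.e. the document text).
CAS_NS = "http:///uima/cas.ecore"

# Hypothetical, hand-written miniature XMI document for illustration only.
# The "demo" namespace and its Token type are assumptions, not GerParCor types.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:demo="http:///example/demo.ecore"
         xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <demo:Token xmi:id="2" sofa="1" begin="0" end="5"/>
  <demo:Token xmi:id="3" sofa="1" begin="6" end="12"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text/plain" sofaString="Meine Herren"/>
  <cas:View sofa="1" members="2 3"/>
</xmi:XMI>
"""


def read_xmi(xml_text):
    """Return the document text and a list of (type name, covered text) pairs."""
    root = ET.fromstring(xml_text)

    # The document text is stored in the sofaString attribute of cas:Sofa.
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    text = sofa.get("sofaString", "") if sofa is not None else ""

    spans = []
    for elem in root:
        begin, end = elem.get("begin"), elem.get("end")
        # Only annotations carry begin/end offsets into the sofa text.
        if begin is not None and end is not None:
            local_name = elem.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
            spans.append((local_name, text[int(begin):int(end)]))
    return text, spans


if __name__ == "__main__":
    document_text, annotations = read_xmi(SAMPLE_XMI)
    print(document_text)
    for type_name, covered in annotations:
        print(type_name, repr(covered))
```

For real files, replace `SAMPLE_XMI` with the contents of a downloaded protocol. UIMA offsets count UTF-16 code units, so slicing a Python string by `begin`/`end` can be off for text containing characters outside the Basic Multilingual Plane.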

| # | Parliament | Sessions | From (MM/DD/YYYY) | Until (MM/DD/YYYY) | Status / Download |
|---|---|---|---|---|---|
| 1 | Reichstag (NG + Zoll) | 1990 | 02/25/1867 | 05/24/1895 | Download |
| 2 | Reichstag (Empire) | 2183 | 12/03/1895 | 10/26/1918 | Download |
| 3 | Weimar Republic | 1328 | 02/06/1919 | 12/09/1932 | Download |
| 4 | Third Reich | 20 | 03/21/1933 | 04/24/1942 | Download |
| 5 | Bundesrat | 1008 | 09/07/1949 | 10/08/2021 | Download |
| 6 | Bundestag | 4158 | 09/07/1949 | 09/07/2021 | Download |
| 7 | Baden-Württemberg | 412 | 06/05/1984 | 09/29/2021 | Download |
| 8 | Bayern | 2221 | 12/16/1946 | 10/14/2021 | Download |
| 9 | Berlin | 582 | 04/02/1989 | 09/16/2021 | Download |
| 10 | Brandenburg | 442 | 10/26/1990 | 08/27/2021 | Download |
| 11 | Bremen | 1102 | 07/04/1995 | 09/16/2021 | Download |
| 12 | Hamburg | 586 | 10/08/1997 | 11/03/2021 | Download |
| 13 | Hessen | 1297 | 02/04/1947 | 09/29/2021 | Download |
| 14 | Mecklenburg-Vorpommern | 659 | 10/26/1990 | 06/11/2021 | Download |
| 15 | Niedersachsen | 1109 | 06/22/1982 | 09/15/2021 | Download |
| 16 | Nordrhein-Westfalen | 2041 | 05/21/1947 | 10/08/2021 | Download |
| 17 | Rheinland-Pfalz | 1562 | 07/24/1947 | 09/22/2021 | Download |
| 18 | Saarland | 876 | 07/23/1959 | 09/15/2021 | Download |
| 19 | Sachsen | 690 | 10/27/1990 | 11/18/2021 | Download |
| 20 | Sachsen-Anhalt | 607 | 10/28/1990 | 09/17/2021 | Download |
| 21 | Schleswig-Holstein | 1776 | 02/26/1946 | 02/11/2021 | Download |
| 22 | Thüringen | 761 | 10/25/1990 | 11/19/2021 | Download |
| 23 | Liechtenstein | 504 | 03/13/1997 | 11/06/2021 | Download |
| 24 | Nationalrat (AT) | 4267 | 10/21/1918 | 05/17/2021 | Download |
| 25 | Nationalrat (CH) | 368 | 12/06/1999 | 12/09/2021 | Download |
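The dates in the table are written month first (MM/DD/YYYY), which is easy to misread when the day is 12 or lower. The following minimal sketch shows one way to parse them unambiguously and derive the covered time span per parliament. The `ROWS` list copies three rows from the table by hand; the function names are illustrative and not part of the GerParCor project.

```python
from datetime import datetime


def parse_us_date(value):
    """Parse a month-first date such as '10/08/2021' into a datetime.date."""
    return datetime.strptime(value, "%m/%d/%Y").date()


# Three rows copied by hand from the table: (parliament, sessions, from, until).
ROWS = [
    ("Bundesrat", 1008, "09/07/1949", "10/08/2021"),
    ("Bundestag", 4158, "09/07/1949", "09/07/2021"),
    ("Liechtenstein", 504, "03/13/1997", "11/06/2021"),
]


def summarize(rows):
    """Map each parliament to its session count, parsed dates, and covered days."""
    summary = {}
    for name, sessions, start, end in rows:
        first, last = parse_us_date(start), parse_us_date(end)
        summary[name] = {
            "sessions": sessions,
            "from": first,
            "until": last,
            "days": (last - first).days,
        }
    return summary


if __name__ == "__main__":
    for name, info in summarize(ROWS).items():
        print(name, info["sessions"], info["from"].isoformat(), info["until"].isoformat())
```

Printing the dates with `isoformat()` yields YYYY-MM-DD, which sorts correctly as plain text and avoids the day/month ambiguity.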

Cite

If you use the project or the corpus, please cite it as follows:

G. Abrami, M. Bagci, L. Hammerla, and A. Mehler, “German Parliamentary Corpus (GerParCor),” in Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022, pp. 1900-1906. [Link] [PDF]

BibTeX

@InProceedings{Abrami:Bagci:Hammerla:Mehler:2022,
  author         = {Abrami, Giuseppe and Bagci, Mevl\"{u}t and Hammerla, Leon and Mehler, Alexander},
  title          = {German Parliamentary Corpus (GerParCor)},
  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages          = {1900--1906},
  url            = {https://aclanthology.org/2022.lrec-1.202}
}
