agents-genitive's Introduction

Agent simulation for historical linguistics

This program simulates interaction between agents (speakers) of Old Norse, with probabilities extracted from an Old Norse corpus. Intrusive Middle Low German agents can be added to simulate language contact.
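To make the idea of interacting agents concrete, here is a minimal sketch in Python. It is not the code in agents.py: the update rule, the learning_rate parameter, and the construction names "genitive" and "preposition" are assumptions made for illustration only. Each agent holds a probability distribution over constructions, a speaker samples one, and the listener shifts its own distribution towards what it heard.

```python
import random


def simulate(n_agents, n_iterations, initial_probs, learning_rate=0.1, seed=0):
    """Toy interaction loop (assumed update rule, not the repository's)."""
    rng = random.Random(seed)
    constructions = list(initial_probs)
    # Every agent starts from the same corpus-derived distribution.
    agents = [dict(initial_probs) for _ in range(n_agents)]
    for _ in range(n_iterations):
        # Pick two distinct agents: one speaks, one listens.
        speaker, listener = rng.sample(agents, 2)
        weights = [speaker[c] for c in constructions]
        uttered = rng.choices(constructions, weights=weights, k=1)[0]
        # The listener moves probability mass towards the heard construction.
        # This linear update keeps the distribution summing to 1.
        for c in constructions:
            target = 1.0 if c == uttered else 0.0
            listener[c] += learning_rate * (target - listener[c])
    return agents


# Hypothetical starting probabilities, standing in for a language model pickle.
agents = simulate(10, 1000, {"genitive": 0.7, "preposition": 0.3})
```

Fixing the seed makes runs repeatable, which helps when comparing settings.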

Run agent simulation

To run an agent simulation, issue the following command:

python agents.py --lm_icelandic <ICELANDIC_PICKLE>

Pickle

<ICELANDIC_PICKLE> is the pickle containing the language model probabilities for Icelandic/Old Norse (the basis language). Pre-generated pickles are available in the pickles folder. The section "Generate language model pickles from corpus" describes how to generate them from a corpus.

You could use pickles/lm-icelandic-merged.p/ as the pickle for the language model with the most construction and function types. Pickles ending in -order add word order as a feature. Pickles ending in -fmerged reduce the number of constructions and functions; those ending in -fmerged-dropdetails reduce the number of categories further by ignoring the details of a construction (preposition form or noun ending).
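To check what a given pickle contains before running a simulation, a small helper like the one below may be useful. It is a sketch that makes no assumption about the internal structure of the pickles; the function names load_language_model and describe are hypothetical and do not come from the repository.

```python
import pickle
from pathlib import Path


def load_language_model(path):
    """Load a pickled language model and return the unpickled object."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(model):
    """Summarise the object without assuming a particular structure."""
    summary = type(model).__name__
    if hasattr(model, "__len__"):
        summary += f" with {len(model)} top-level entries"
    return summary
```

For example, print(describe(load_language_model("<ICELANDIC_PICKLE>"))) reports the type and size of the stored object. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.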

Settings

Several settings can be changed via command line options, such as the number of iterations and the number of agents. Running python agents.py --help shows the available options. The default values for the command line arguments are listed at the top of agents.py.
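As an illustration of how such options with module-level defaults can be wired up, here is a sketch using argparse. Only --lm_icelandic and --lm_intruders are flags named in this README; --n_agents, --n_iterations, and both default values are hypothetical placeholders, so consult --help for the real names.

```python
import argparse

# Hypothetical defaults; the real ones are listed at the top of agents.py.
N_AGENTS = 10
N_ITERATIONS = 1000


def build_parser():
    parser = argparse.ArgumentParser(description="Agent simulation (sketch)")
    parser.add_argument("--lm_icelandic", required=True,
                        help="pickle with the basis language model")
    parser.add_argument("--lm_intruders", default=None,
                        help="optional pickle with the intruder language model")
    # The two flags below are assumed names, used for illustration only.
    parser.add_argument("--n_agents", type=int, default=N_AGENTS)
    parser.add_argument("--n_iterations", type=int, default=N_ITERATIONS)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--lm_icelandic", "model.p", "--n_agents", "20"])
```

Keeping the defaults as constants at the top of the file means they are visible in one place and are reused by the parser.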

Intruders

To add (Middle Low German) intruders, use the --lm_intruders <INTRUDERS_PICKLE> flag. <INTRUDERS_PICKLE> is the pickle file containing the language model of the intruders. File names follow the same pattern as the Icelandic (basis language) pickles. The number of intruders, the number of intruder batches and the intervals between batches can be set using command line arguments. Run python agents.py --help for the available options.

Plots

Plots of the distribution p(construction) are generated and stored in the plots/ folder. Plots of the conditional distributions p(construction|function) are stored per function in the plots/<FUNCTION>/ folder.
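The two kinds of distribution can be illustrated with a short sketch that derives both from a list of (construction, function) observations. The observation labels are invented for the example, and the code does not reproduce how agents.py computes or plots its distributions.

```python
from collections import Counter, defaultdict


def construction_distribution(observations):
    """p(construction): relative frequency of each construction overall."""
    counts = Counter(construction for construction, _ in observations)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def conditional_distribution(observations):
    """p(construction|function): relative frequency within each function."""
    per_function = defaultdict(Counter)
    for construction, function in observations:
        per_function[function][construction] += 1
    return {
        function: {c: n / sum(counts.values()) for c, n in counts.items()}
        for function, counts in per_function.items()
    }


# Hypothetical observations; labels are placeholders, not corpus categories.
observations = [
    ("genitive", "possession"),
    ("genitive", "possession"),
    ("preposition", "possession"),
    ("preposition", "origin"),
]
marginal = construction_distribution(observations)
conditional = conditional_distribution(observations)
```

Here marginal gives each construction probability 0.5, while conditional["possession"] gives "genitive" 2/3 and "preposition" 1/3, which is why the conditional plots are kept per function.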

Generate language model pickles from corpus

The language model pickles can be generated from corpora. The Icelandic language model is built from two sources: the Saga corpus, for constructions/functions that can be processed automatically, and a qualitative, manually annotated file covering constructions/functions from the Saga corpus that could not be detected automatically. The Middle Low German language model depends only on a manually annotated input file.

Follow these steps:

  • Download the Saga corpus. Extract the contents of the .zip archive into a directory under the agent simulation working directory, for example Saga.
  • Issue the following command, to generate all language model pickles (with coarse- and fine-grained categories):
python counts.py --saga_input_dir Saga --qual_icelandic corpus/20170103-qualitative-icelandic.csv --qual_intruders corpus/20170103-qualitative-mlg.csv

--saga_input_dir should point to the directory where the Saga corpus was extracted. An --output_dir can also be specified; it defaults to pickles.
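The general shape of this step, combining automatically extracted counts with manually annotated counts and writing the result to an output directory, can be sketched as follows. This is an assumption about the overall flow, not a description of counts.py: the function names, the flat count dictionaries, and the output file name are all hypothetical.

```python
import pickle
from collections import Counter
from pathlib import Path


def merge_counts(automatic, manual):
    """Add manual annotation counts to automatically extracted counts."""
    merged = Counter(automatic)
    merged.update(manual)  # Counter.update adds counts rather than replacing.
    return merged


def write_model(counts, output_dir, name):
    """Normalise counts to probabilities and pickle them under output_dir."""
    total = sum(counts.values())
    model = {key: n / total for key, n in counts.items()}
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / name
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path
```

For example, write_model(merge_counts({"genitive": 3}, {"preposition": 1}), "pickles", "example.p") would store probabilities 0.75 and 0.25 in pickles/example.p.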

Authors

Simulation code written by Peter Dekker (firstname AT firstnamelastname DOT eu), with contributions on corpus extraction by Myrthe Bil. The manual annotations of Icelandic and Middle Low German data were performed by Justin Case.

agents-genitive's Issues

License

Do you have a preferred open source license to publish this under? Would you mind MIT or similar?

Pickles/Data generation code

You mentioned you intend to commit the pickles containing the initial Norse data. Alternatively or additionally, you could provide the code to build them from source, to improve reproducibility.
