Giter VIP home page Giter VIP logo

konstantinoskokos / aethel Goto Github PK

View Code? Open in Web Editor NEW
4.0 3.0 3.0 33.56 MB

Python module for (i) the representation of typelogical grammar derivations and (ii) their extraction from Lassy and Alpino-style dependency graphs.

License: GNU General Public License v3.0

Python 100.00%
lassy type-logical-grammar categorial-grammar dutch linear-logic lambda-calculus alpino corpus natural-deduction proof-net aethel-dataset

aethel's Introduction

æthel

In the beginning was the word, and the word had a Type.

Source code and Python module for the representation of typelogical grammar derivations and their extraction from Lassy-style dependency graphs.

Cool things to look out for:

  • an interface to Alpino outputs -- convert dependency graphs to λ-terms and annoy your informal linguist friends!
  • a faithful implementation of (temporal) modal linear logic proofs and its terms -- almost as good and twice as slow as doing it on Haskell!

Installing & Using æthel

This repository is required to access and play with the æthel dataset, which contains typelogical analyses for the majority of the Lassy Small corpus. Begin by cloning the project locally and unzipping the dump file in data/ (remember to unzip). You can then install the project by running:

python3 -m pip install .

Afterwards, you can load the dump by running:

from aethel import ProofBank

dataset = ProofBank.load_data("PATH_TO_DUMP_FILE")

Note that loading might take a short while, as proofs are reconstructed bottom up and type-checked along the way.

Example usage:

Please refer to examples/ for some hands-on guides on how to use the module.

Major Changelog

Different major versions are not backward compatible. Train/dev/test segmentation is respected to the largest extent possible. If looking for older versions, take a look at other branches of this repository.

1.0.1 (02/2023)

Tidier packaging and installation. Backwards compatible with 1.0.0x.

1.0.0a (06/2022)

Implementation of long-postponed changes to the type system. In practical terms, type assignments are a bit more complicated but all proofs are now sound wrt. to the underlying logic.

  • Adjunction: diamonds and boxes are no longer treated as ad-hoc type constructors, and their adjoint nature is made explicit (i.e. ◇□A⇒A⇒□◇A). The diamond elimination and arrow introduction rules are now explicit to which variable is being abstracted or replaced.
  • Bracketing Structure: brackets are now explicated to regulate the applicability of diamond eliminations and box introductions.
  • Structural Rules: rules are now split between logical and structural. For the purpose of allowing extraction of hypotheses nested within unary structures, we now use an extraction modality reserved for the higher-order types that supply them. The modality is paired to a single structural rule which permits outwards movement: <Χ>!, <Γ, Δ> → <Γ, <Χ>!, Δ>
  • Proofs & Terms: Due to the presence of term-neutral structural rules (and the ambiguity inherent to the implicit cut of the diamond elimination rule) proofs are now defined separately from syntactically valid terms.
  • Multi-word Phrases: Multi-word phrases are heuristically fixed to the extent possible; this permits the extraction of correct proofs when (previously unstructured) constituents are elided under a conjunction. Coverage is slightly increased.

Minor Changelog

1.0.0a5 (01/2023)

  • Explicit objects are hypothesized from within the local phrase.

1.0.0a4 (12/2022)

  • Past participles now have explicit objects.

1.0.0a3 (10/2022)

  • Structural extraction now properly renames variables.

1.0.0a2 (10/2022)

  • Diamond elimination terms now act as variable binders that explicate their position via the case [τ:term] of [x:variable] in [σ:term] term constructor (meaning "replace any occurrence of x in σ with τ").
  • To ease comprehension, the left and right side occurrences of the "same" variable in diamond eliminations are no longer identified, i.e. the substitute and substituted variables have unique names.

1.0.0a1 (07/2022)

  • Proofs are delivered in η-normal form, and variables are assigned unique names.
  • Natural deduction is back-and-forth compatible with proof nets again.
  • A few leftover proofs failing the linearity criterion are removed.

Project Structure

The package source code can be found in directory src.

  • aethel.frontend is the high-level interface that allows user access to the processed corpus.
  • aethel.mill contains an implementation of the grammar's type system, namely Implication-Only Multiplicative Intuitionistic Linear Logic with Unary Temporal Modalities. Each of the following modules provide implementations for specific aspects of the logic:
    • aethel.mill.proofs - classes & methods for the representation and manipulation of judgements, rules and proofs in natural deduction format.
    • aethel.mill.nets - ditto for decomposition formulas, axiom links and proof nets, and conversion to and from natural deduction.
    • aethel.mill.terms - a rudimentary implementation of the term calculus.
    • aethel.mill.types - an implementation of logical formulas.
    • aethel.mill.structures - a handy abstraction over the antecedent structure of logical judgements.
    • aethel.mill.serialization - parsers and serializers for the above.
  • aethel.alpino contains utilities to interface with Alpino-style graphs
    • aethel.alpino.transformations contains the linguistic transformations necessary to make Alpino analyses amenable to proof-theoretic analyses, and packs them into a single pipeline. Warning: Not for the faint of heart.
    • aethel.alpino.extraction implements the extraction algorithm, whereby transformed Alpino graphs are gradually proven in a bottom-up fashion.
    • aethel.alpino.lassy - a wrapper for reading Lassy xml trees.
  • aethel.utils contains general utilities:
    • aethel.utils.graph - definitions of simple graph-theoretic operations.
    • aethel.utils.viz - visualization of Lassy dependency graphs, useful for debugging.
    • aethel.utils.tex - export proofs and samples to latex code.

Subdirectory examples contains (or will contain) introductory jupyter notebooks. Subdirectory scripts contains (or will contain) high-level scripts for processing the corpus and lexicon.


Requirements

Python 3.10+

If you intend to use the visualization utilities you will also need GraphViz.


Citing

If you found this software or data useful in your research, please cite the corresponding paper:

@inproceedings{kogkalidis2020aethel,
  title={{\AE}THEL: Automatically Extracted Typelogical Derivations for Dutch},
  author={Kogkalidis, Konstantinos and Moortgat, Michael and Moot, Richard},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={5257--5266},
  year={2020}
}

aethel's People

Contributors

gijswijnholds avatar konstantinoskokos avatar prednaz avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

aethel's Issues

Extraction to-dos

  • check cases where a ppart or an inf dominates a coindexed subject object that is still primary after conditioning
    • infinitive chaining
  • conjunction needs to be dealt with
  • removing modifiers kills off secondary dependencies
  • secondary edge being head disallows recursion
  • keep iterating with previously inferred types to remove conjunctions :)
  • rewrite functions in a comp manner to allow for iterative pass through larger corpus
  • IMPORTANT: find a way to order the words properly post-reduction

Bugs

  • gap-copies
  • non-polymorphic ellipsis

  • conjunction of complex head
  • conjunction of modifier
  • chain modifier mistyping
  • modifier gap lacks naming convention
  • modifier gap
  • modifiers other than 'mod'
  • forgot to update a Rel call
  • conjunction kills off information
  • modifier gaps need extra treatment now (just give parent into the type assigner and let it manage modifiers and NP/AP constructions?)
  • alignment bug in reporting
  • IMPORTANT: collapsing intermediate node assigns wrongs dependencies
  • IMPORTANT: gaps not taken into account when comparing types
  • Duplicate words in the lexicon?
  • => byproduct of converting dicts to list
  • - remove_deps does not keep modality
  • MWUs are not assigned a vector
  • Fix type implementation is wrong

Annotation Errors

A list of possible errors in the original annotation (v.4)

  • discrepancy in app labels
  • 319 (determiner is labeled as a noun?)
  • 372 3 determiner should be vnw
  • 1054 (intra-cnj mod)
  • 1377 (predc / predm over conjuction)
  • 2507 (np conjunct should be s)
  • 2975 (misplaced mod)
  • 29655 (misplaced mod)
  • 29897 (node 50 has a np as its head)
  • 15775 (app instead of invdet to 16)
  • 30778 (intra-cnj mod)
  • 49555 (intra-cnj mod)

to-do new

  • type lexicon
  • word lexicon
  • 484 inv fails
  • dual crd/det treatment
  • mwp pos propagation

repeated applications of lambda vars do not reduce de Bruijn index

((((((w₇ λx₀.λx₁.λx₂.((x₁ (x₁ (w₃ᵈᵉᵗ x₂)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₀ᵈᵉᵗ w₁)ˢᵘ)ᶜⁿʲ) λx₂.λx₃.λx₄.((x₃ (x₃ (w₆ᵈᵉᵗ x₄)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₄ᵈᵉᵗ w₅)ˢᵘ)ᶜⁿʲ) λx₄.λx₅.λx₆.((x₅ (x₅ (w₁₀ᵈᵉᵗ x₆)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₈ᵈᵉᵗ w₉)ˢᵘ)ᶜⁿʲ) λx₆.λx₇.((w₂ x₇ᵖʳᵉᵈᶜ) x₆ˢᵘ)) λx₆.(w₁₂ x₆ᵐᵉ)) w₁₁)
36083 De koningin is 22 - 26 de werkster 12 - 16 en het mannetje 14 - 16 mm lang

((((((w₇ λx₀.λx₁.λx₂.((x₁ (x₁ (w₁₀ᵈᵉᵗ x₂)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₀ᵈᵉᵗ w₁)ˢᵘ)ᶜⁿʲ) λx₂.λx₃.λx₄.((x₃ (x₃ (w₃ᵈᵉᵗ x₄)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₈ᵈᵉᵗ w₉)ˢᵘ)ᶜⁿʲ) λx₄.λx₅.λx₆.((x₅ (x₅ (w₆ᵈᵉᵗ x₆)ᵐᵉ)ᵖʳᵉᵈᶜ) (w₄ᵈᵉᵗ w₅)ˢᵘ)ᶜⁿʲ) λx₆.λx₇.((w₂ x₇ᵖʳᵉᵈᶜ) x₆ˢᵘ)) λx₆.(w₁₂ x₆ᵐᵉ)) w₁₁)
37826 De koningin is 18 - 22 de werkster 9 - 15 en het mannetje 12 - 14 mm lang

WordType

  • improve the printing, assert uniqueness
  • overload equality for colored type
  • extract simple type from colored type
  • refactor use of types in main
  • implement colored type for gaps
  • assert that in the general case the new assignments typecheck

Unique Cases:

  • case of both copy and gap
  • better case management for copy/gap
  • secondary edge heads
  • multiple secondary edges
  • dual primary/secondary edges :: copying

Potential Issue:

  • when removing empty arglists, might need to also move colors around
  • Current implementation may not suffice: is (A,a)->(B,b)->C the same as ((A,a) -> B), b) -> C?

to-do

  • refine modifiers to account for internal hidden structure
  • modifiers over intermediate nodes do not get function types (5)
  • majority voting no longer correct
  • meta-type for polymorphic conjuction (*)
  • filter by argument to make sense of promotion
  • collapsing permutations post-extraction
  • cp
  • mwu lexical entries (more/than etc)
  • vnw: det -> (np: su -> ssub) : body -> whrel (wiens/welk/etc.)
  • inventorize coordination (treat on lexical level)
  • coordination non-polymorphic
  • prioritize between coordinators
  • head/funcwords distinction for determiner
  • search utilities
  • decorate lexicons with file id for backwards checking
  • remove everything unrelated to type extraction
  • lassy large
  • post-extraction viz
  • noun/noun-phrase distinction
  • documentation, gradual typing, overall code sanity
  • modifiers
  • ignore secomdary heads but only in the case of gap
  • Remove intermediate nodes and check for new errors
  • Coordinator type assignment
  • Extend type system to additives (!)
  • Assign multiple types to a word with extended system
  • Refactor the supertagger
    --
  • chart parsing or any sort of proof search

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.