
simplenlg-nl's Introduction

SimpleNLG-NL

SimpleNLG-NL is a Dutch surface realiser for Natural Language Generation. It is based on version 1.1 of the bilingual SimpleNLG-EnFr and can therefore be used for three languages: English, French and Dutch.

SimpleNLG itself is a Java library originally developed by Ehud Reiter, Albert Gatt and Dave Westwater of the University of Aberdeen.

The Dutch version contains multiple lexicons based on Wiktionary data. The largest lexicon has 79,438 entries. The default lexicon is reduced to 8,601 words, obtained by matching against the 10,000 most common words in a word frequency list. An even smaller lexicon of 3,387 entries is also provided.

SimpleNLG-NL was developed as part of the master's thesis of Ruud de Jong. The thesis, which describes the development process, is available in the theses repository of the University of Twente.

Usage

To use this library, you have three options: clone this repo, download the JAR release file, or import it with Maven using JitPack. To use JitPack, add the following repository and dependency to your POM file:

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>com.github.rfdj</groupId>
            <artifactId>SimpleNLG-NL</artifactId>
            <version>1.1</version>
        </dependency>
    </dependencies>

The API is intentionally kept close to that of SimpleNLG-EnFr, which in turn is based on SimpleNLG.

A basic tutorial can be found in the wiki for SimpleNLG-NL (based on the SimpleNLG wiki).

One noteworthy addition is the DutchFeature.PREVERB feature. Separable Complex Verbs (SCVs) can be split into a preverb and a main verb (e.g. vrijkomen is split into vrij and komen). SimpleNLG-NL tries to detect SCVs, but in case it is unsuccessful, the user can set the feature on the verb or add a pipe in the verb input string, e.g. factory.createVerbPhrase("vrij|komen").
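
The pipe notation can be pictured with a minimal, library-independent sketch. The class PreverbSplitSketch and its method split are hypothetical names introduced here for illustration only; they are not part of the SimpleNLG-NL API, and the library's own SCV detection is likely more involved than a plain string split.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of SimpleNLG-NL: it only illustrates how a
// "preverb|verb" input string separates into its two parts.
final class PreverbSplitSketch {

    // Returns {preverb, mainVerb}; the preverb is "" when no pipe is present.
    static String[] split(String verbInput) {
        int pipe = verbInput.indexOf('|');
        if (pipe < 0) {
            return new String[] {"", verbInput};
        }
        return new String[] {verbInput.substring(0, pipe), verbInput.substring(pipe + 1)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("vrij|komen"))); // prints [vrij, komen]
        System.out.println(Arrays.toString(split("komen")));      // prints [, komen]
    }
}
```

Keeping the preverb separate matters because Dutch places it apart from the finite verb in main clauses, which is presumably why the library needs to know where the split falls.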

License

SimpleNLG-NL is licensed under the MPL. The Dutch lexicons are based on data from Wiktionary.org, which is licensed under the GNU Free Documentation License and CC BY-SA 3.0.

simplenlg-nl's People

Contributors

dependabot[bot], rfdj

simplenlg-nl's Issues

Interrogative Types for Second Person

Describe the bug
When an interrogative type is set on a clause with a second-person subject, the verb is realised in the wrong form.

To Reproduce
Steps to reproduce the behavior:

    Lexicon lexicon = new XMLLexicon();
    NLGFactory nlg = new NLGFactory(lexicon);
    Realiser realiser = new Realiser();
    SPhraseSpec clause = nlg.createClause("you", "think");
    PPPhraseSpec aboutJohn = nlg.createPrepositionPhrase("about", "John");
    clause.addPostModifier(aboutJohn);
    clause.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.WHAT_OBJECT);
    String realisation = realiser.realiseSentence(clause);

The result is:

What does you think about John?

Expected behavior
The result should be:

What do you think about John?

Java version
Tested with JDK 8 and 13.

Additional context
The original SimpleNLG appears to have a test that covers selecting the appropriate verb form: see the method testWhatObjectInterrogative() at line 814. The same problem occurs for a Dutch sentence configured like this:

    Lexicon lexicon = new simplenlg.lexicon.dutch.XMLLexicon();
    NLGFactory nlg = new NLGFactory(lexicon);
    Realiser realiser = new Realiser();

    SPhraseSpec phrase = nlg.createClause();
    NPPhraseSpec subject = nlg.createNounPhrase("Jij");
    subject.setFeature(Feature.PRONOMINAL, true);
    subject.setFeature(Feature.PERSON, Person.SECOND);
    phrase.setSubject(subject);
    phrase.setVerb("doen");
    phrase.setObject("dat");
    phrase.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.WHY);
    phrase.setFeature(Feature.TENSE, Tense.PRESENT);
    String result = realiser.realiseSentence(phrase);

The result is "Waarom doet jij dat?"; the expected result is "Waarom doe jij dat?"
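
The expected form follows a regular rule of Dutch grammar: in the present tense, the second-person singular verb drops its -t ending when the subject jij or je directly follows the verb, as it does in questions (jij doet, but doe jij). For regular verbs the inverted form coincides with the first-person singular form. A minimal, library-independent sketch of that rule follows; InversionSketch and its methods are hypothetical names, not SimpleNLG-NL code, and the sketch takes both inflected forms as input rather than deriving them from a lemma.

```java
// Hypothetical illustration, NOT SimpleNLG-NL code: picks the present-tense
// verb form for a second-person singular subject (jij/je).
final class InversionSketch {

    // ikForm:  first-person singular present, e.g. "doe"
    // jijForm: second-person singular present in normal word order, e.g. "doet"
    // subjectFollowsVerb: true for inverted order, as in questions
    static String secondPersonPresent(String ikForm, String jijForm, boolean subjectFollowsVerb) {
        return subjectFollowsVerb ? ikForm : jijForm;
    }

    // Builds a "Waarom ...?" question, where the subject always follows the verb.
    static String whyQuestion(String ikForm, String jijForm, String object) {
        return "Waarom " + secondPersonPresent(ikForm, jijForm, true) + " jij " + object + "?";
    }

    public static void main(String[] args) {
        System.out.println("Jij " + secondPersonPresent("doe", "doet", false) + " dat."); // prints Jij doet dat.
        System.out.println(whyQuestion("doe", "doet", "dat")); // prints Waarom doe jij dat?
    }
}
```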

For the sentence "What do you think about John?", the contents of the realisation are as follows:

SimpleNLG NL:
{realisation=null, category=SENTENCE, features={interrogative=true, textComponents=[InflectedWordElement[what:NOUN], [InflectedWordElement[do:VERB]], [InflectedWordElement[you:PRONOUN]], [InflectedWordElement[think:VERB], [[InflectedWordElement[about:PREPOSITION], [InflectedWordElement[John:ANY]]]]]]}}

Debug: link

Original
{realisation=null, category=SENTENCE, features={interrogative=true, textComponents=[InflectedWordElement[what:PRONOUN], InflectedWordElement[do:VERB], InflectedWordElement[you:PRONOUN], [InflectedWordElement[think:VERB], [[InflectedWordElement[about:PREPOSITION], InflectedWordElement[John:ANY]]]]]}}

Debug: link

Could this be because "what" is tagged as a pronoun in one case and as a noun in the other?

Complements in interrogative types should go to the back of the sentence in Dutch

Describe the bug
When an interrogative type is set and a postmodifier is added, the postmodifier is not placed at the end of the sentence; the subject ends up after it instead.

To Reproduce
Steps to reproduce the behavior:

    private static final Lexicon lexicon_nl = new simplenlg.lexicon.dutch.XMLLexicon();
    private static final NLGFactory factory_nl = new NLGFactory(lexicon_nl);
    private static final Realiser realiser_nl = new Realiser();

    SPhraseSpec clause3 = factory_nl.createClause();
    NPPhraseSpec subject = factory_nl.createNounPhrase("JIJ");
    PPPhraseSpec aboutJan = factory_nl.createPrepositionPhrase("over", "Jan");
    subject.setFeature(Feature.PRONOMINAL, true);
    subject.setFeature(Feature.PERSON, Person.SECOND);
    clause3.setSubject(subject);
    clause3.setVerb("denk");
    clause3.addPostModifier(aboutJan);
    clause3.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.WHAT_OBJECT);
    String output3 = realiser_nl.realiseSentence(clause3);
    System.out.println(output3);

Expected behavior
I applied a local fix that selects the correct morphology for questions with Person.SECOND. Without that fix, the verb will most likely be realised as denkt, and the inflection shown in the debug output may differ as well.

The expected output:
Wat denk jij over Jan?

The actual output:
Wat denk over Jan jij?
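
The expected ordering can be stated as a minimal, library-independent sketch: question word, finite verb, subject, then any postmodifiers. QuestionOrderSketch and its order method are hypothetical names used only to pin down the expected result; they do not reflect how SimpleNLG-NL builds sentences internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT SimpleNLG-NL code: states the word order
// this issue expects for a Dutch question with a postmodifier.
final class QuestionOrderSketch {

    // Joins the parts as: question word, verb, subject, postmodifiers.
    static String order(String questionWord, String verb, String subject, List<String> postModifiers) {
        List<String> words = new ArrayList<>();
        words.add(questionWord);
        words.add(verb);
        words.add(subject);            // the subject directly follows the verb
        words.addAll(postModifiers);   // postmodifiers close the sentence
        return String.join(" ", words) + "?";
    }

    public static void main(String[] args) {
        // prints Wat denk jij over Jan?
        System.out.println(order("Wat", "denk", "jij", Arrays.asList("over Jan")));
    }
}
```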

Java version
JDK 8 and JDK 13

Additional context
Debug output: link
