davidjurgens / crown

The Community-enRiched Open WordNet (CROWN)

Shell 0.74% Java 99.26%

crown's Issues

missing lexnames file

Hi! I'm running into some problems using CROWN with a popular WordNet library for Python.

I've been trying to use CROWN with the NLTK WordNet library, but it seems that all of the available versions of the CROWN dataset I found at http://cs.mcgill.ca/~jurgens/crown/ are incompatible with the "newest" official lexnames file (released with WordNet 3.0).

It is my understanding that I cannot generate a new lexnames file using grind without the 45 lexicographer files that are used to create the database files (those files under dict).

Right now I'm trying to generate this file using your library. My guess is that including the generated lexnames file in the release requires only a small change to this line:

crown/src/main/java/ca/mcgill/cs/crown/Grind.java

Line 102 in c6294e6

if (name.startsWith("data.") || name.startsWith("index."))

Is that the case?
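For illustration, the kind of change being asked about might look like the sketch below. It assumes, without having checked the rest of Grind.java, that the quoted condition decides which generated files end up in the release. The class name `ReleaseFileFilter` and the method `isReleaseFile` are hypothetical and are not part of CROWN.

```java
// Hypothetical helper, not part of CROWN: mirrors the condition quoted from
// Grind.java and additionally accepts a file named exactly "lexnames".
class ReleaseFileFilter {

    // Returns true for WordNet database files (data.*, index.*) and,
    // as the proposed addition, for the lexnames file.
    static boolean isReleaseFile(String name) {
        return name.startsWith("data.")
            || name.startsWith("index.")
            || name.equals("lexnames");
    }

    public static void main(String[] args) {
        // "log.grind" is only an example of a name the filter should reject.
        String[] names = {"data.noun", "index.verb", "lexnames", "log.grind"};
        for (String name : names) {
            System.out.println(name + " -> " + isReleaseFile(name));
        }
    }
}
```

Matching the exact name rather than a prefix keeps unrelated files out of the release.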

I think it would be a good idea to include a new lexnames file in the subsequent CROWN datasets you distribute to better guarantee compatibility with existing libraries.

If you feel the same way, I'd be happy to put in a pull request with the needed change. Anyway, based on what I've read in the NAACL paper, I'm very excited to make use of your library. Thanks!

Create mapping from WordNet 3.0 synset IDs to Crown synset IDs

Feature enhancement: automatically produce this mapping during the build process, to make it easier to integrate Crown with existing WordNet-based code and to transition from WordNet to Crown.
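One possible shape for that output is a tab-separated file with one ID pair per line. The sketch below only shows the serialization step. The class `SynsetIdMappingWriter`, the two-column layout, and the placeholder IDs are all assumptions; nothing here reflects how the CROWN build actually tracks synset IDs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not CROWN code: formats a WordNet 3.0 -> Crown synset
// ID mapping as tab-separated lines, one pair per line.
class SynsetIdMappingWriter {

    // Emits "wn30Id<TAB>crownId<NEWLINE>" for each entry, in iteration order.
    static String toTsv(Map<String, String> wn30ToCrown) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : wn30ToCrown.entrySet()) {
            out.append(entry.getKey())
               .append('\t')
               .append(entry.getValue())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder IDs only; real IDs would come from the build itself.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("wn30-id-1", "crown-id-1");
        mapping.put("wn30-id-2", "crown-id-2");
        System.out.print(toTsv(mapping));
    }
}
```

A `LinkedHashMap` keeps the output in insertion order, so repeated builds over the same input produce identical files.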

Integrate etymology data into the build process

Wiktionary and possibly other resources contain etymology data that may be useful for downstream resources. This information should be extracted and integrated into the build process. One option is to produce stand-off files that map synsets to their source language, if known.
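To make the stand-off idea concrete, here is a minimal sketch of reading such a file. It assumes a layout of one `synsetId<TAB>language` pair per line, with the language column left empty when the source language is unknown. The layout, the class `EtymologyStandoff`, and the IDs in the example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not CROWN code: reads a stand-off file that maps
// synset IDs to a source language.
class EtymologyStandoff {

    // Parses "synsetId<TAB>language" lines. Lines without exactly two
    // non-empty columns (for example, an unknown language) are skipped.
    static Map<String, String> parse(String contents) {
        Map<String, String> synsetToLanguage = new LinkedHashMap<>();
        for (String line : contents.split("\\R")) {
            String[] cols = line.split("\t", -1);
            if (cols.length == 2 && !cols[0].isEmpty() && !cols[1].isEmpty()) {
                synsetToLanguage.put(cols[0], cols[1]);
            }
        }
        return synsetToLanguage;
    }

    public static void main(String[] args) {
        // Placeholder content: the second synset has no known source language.
        String contents = "synset-1\tLatin\nsynset-2\t\nsynset-3\tOld French\n";
        System.out.println(parse(contents));
    }
}
```

Keeping the data in a separate file means the WordNet-format database files stay untouched, and consumers that do not need etymology can ignore it.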

Identify mapping between Wiktionary senses and WordNet synsets

Parts of this mapping are already implicitly found during the enrichment process when each EnrichmentProceedure tries to find an existing sense for a given definition and then (correctly) rejects making a new integration when an existing sense is found. However, the sense-mapping data is never explicitly reported, which makes it difficult to enrich these existing synsets with any other information that may be present in Wiktionary or other resources (e.g., domain links, antonym links, or etymological information).
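A rough sketch of how that mapping could be captured at the moment an integration is rejected is given below. The class `SenseMappingRecorder` and its methods are hypothetical, and the sketch assumes that the enrichment step can supply an identifier for the Wiktionary sense and for the matched synset. It does not reflect the real enrichment classes in CROWN.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not CROWN code: records which existing synset a
// Wiktionary sense matched at the point where integration is rejected.
class SenseMappingRecorder {

    private final Map<String, String> senseToSynset = new LinkedHashMap<>();

    // Call after searching for an existing sense. existingSynsetId is null
    // when no match was found. Returns true if a new sense should be
    // integrated, false if an existing synset already covers it.
    boolean shouldIntegrate(String wiktionarySenseId, String existingSynsetId) {
        if (existingSynsetId != null) {
            // The match that was previously discarded is kept here.
            senseToSynset.put(wiktionarySenseId, existingSynsetId);
            return false;
        }
        return true;
    }

    // Read-only view of every recorded match, in the order it was seen.
    Map<String, String> mappings() {
        return Collections.unmodifiableMap(senseToSynset);
    }

    public static void main(String[] args) {
        SenseMappingRecorder recorder = new SenseMappingRecorder();
        // Placeholder IDs for illustration only.
        recorder.shouldIntegrate("wikt-sense-1", "synset-1");
        recorder.shouldIntegrate("wikt-sense-2", null);
        System.out.println(recorder.mappings());
    }
}
```

The recorded map could then be written out at the end of the build and used to attach domain links, antonym links, or etymological information to the matched synsets.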

Integrate ADW as a similarity function

The current GST and NAACL-paper similarity functions are purely surface-form based. It would be good to integrate ADW (https://github.com/pilehvar/ADW) as an option.
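One way to make the similarity function an option is to put the existing measures behind a small interface, so that an ADW-backed implementation could be added later. The sketch below is an assumption about structure only: the interface `GlossSimilarity` is hypothetical, and `TokenOverlapSimilarity` is a simple Jaccard token overlap used as a stand-in surface-form measure. It is neither the GST function nor the NAACL-paper function, and no ADW calls are shown because its API is not described here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical extension point, not CROWN code: any similarity measure,
// surface-form or semantic, would implement this interface.
interface GlossSimilarity {
    // Returns a score in [0, 1]; higher means more similar.
    double similarity(String glossA, String glossB);
}

// Stand-in surface-form measure: Jaccard overlap of lowercased tokens.
class TokenOverlapSimilarity implements GlossSimilarity {

    @Override
    public double similarity(String glossA, String glossB) {
        Set<String> a = tokens(glossA);
        Set<String> b = tokens(glossB);
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // nothing to compare
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Splits on whitespace and drops the empty token produced by blank input.
    private static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>(
            Arrays.asList(text.toLowerCase().trim().split("\\s+")));
        out.remove("");
        return out;
    }

    public static void main(String[] args) {
        GlossSimilarity sim = new TokenOverlapSimilarity();
        System.out.println(sim.similarity("a small domestic animal", "a small wild animal"));
    }
}
```

With this shape, choosing a measure becomes a matter of which `GlossSimilarity` instance is passed to the enrichment code.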
