fnielsen / ordia

Wikidata lexemes presentations

Home Page: https://ordia.toolforge.org

License: Apache License 2.0

Python 32.47% HTML 25.53% JavaScript 1.50% Jupyter Notebook 40.49%

lexical-resource wikidata lexeme

ordia's Introduction

Ordia

Ordia is a Python package for working with lexicographical data on Wikidata. It includes a webservice that can be run locally. A public instance of the webservice runs at:

https://ordia.toolforge.org/

To start the local webserver, go to the ordia directory and run:

python app.py

Introductory video

Daniel Mietchen, "Demo of Ordia, a Wikidata tool to visualize lexicographic information from Wikidata", YouTube, 18 May 2021.

License

The license for Ordia is Apache License 2.0, except for the included third-party packages, which are Bootstrap (MIT License), jQuery (MIT License) and dataTables (MIT License).

References

Finn Årup Nielsen, "Ordia: A Web application for Wikidata lexemes", In "The Semantic Web: ESWC 2019 Satellite Events", 141-146, 2019, DOI: 10.1007/978-3-030-32327-1_28.

Finn Årup Nielsen, "Danish in Wikidata lexemes", In "Proceedings of the Tenth Global Wordnet Conference", 33-38, 2019, ISBN 978-83-7493-108-3.

ordia's Issues

Mobile styling

The application does not display well on a mobile phone:

The first line breaks into three lines.
dataTables breaks the header.

property-value subaspect with count for languages

Bubble chart for count of lexemes wrt. Lexical category in language aspect

#defaultView:BubbleChart
# Count of lexemes wrt. lexical category for a language
SELECT
  ?count
  ?lexical_category ?lexical_categoryLabel
WITH {
  SELECT
    (COUNT(?lexeme) AS ?count)
    ?lexical_category 
   WHERE {
    ?lexeme a ontolex:LexicalEntry ;
            dct:language wd:Q9035 ; 
            wikibase:lemma ?lexemeLabel .
    ?lexeme wikibase:lexicalCategory ?lexical_category .
  }
  GROUP BY ?lexical_category
} AS %results
WHERE {
  INCLUDE %results
  OPTIONAL {        
    ?lexical_category rdfs:label ?lexical_categoryLabel .
    FILTER (LANG(?lexical_categoryLabel) = "en")
  }
}
ORDER BY DESC(?count)

Add "Attested in" to the lexeme table

Add "Attested in" to the lexeme table, e.g., https://tools.wmflabs.org/ordia/L47364

Usage examples no longer show in lexeme aspect table

Usage examples no longer show in the lexeme aspect table.

Example: https://tools.wmflabs.org/ordia/L42410

It is due to this commit: 05169c1

Show language statistics in statistics aspect

  UNION
  {
    { SELECT (COUNT(DISTINCT ?language) AS ?count) { [] dct:language ?language . } }
    BIND("Number of different languages" AS ?description)
    BIND("[] dct:language ?language" AS ?query)
  }

Find forms in text

Find forms in text, e.g., text = "formand bøjer sig". Split the words and perform a SPARQL query.

SELECT
  ?word ?form
  ?lexeme ?lexemeLabel
  (GROUP_CONCAT(?featureLabel; separator=" // ") AS ?features)
WHERE {
  VALUES ?word { "formand"@da "bøjer"@da "sig"@da }
  ?form ontolex:representation ?word . 
  OPTIONAL {
    ?form wikibase:grammaticalFeature ?feature . 
    ?feature rdfs:label ?featureLabel .
    FILTER (LANG(?featureLabel) = "en")
  }
  ?lexeme ontolex:lexicalForm ?form .
  ?lexeme wikibase:lemma ?lexemeLabel . 
}
GROUP BY ?word ?form ?lexeme ?lexemeLabel
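
The word-splitting step described above could be sketched in Python as follows. This is only an illustration: `build_form_query` is a hypothetical helper and not part of Ordia, the tokenization is a naive regular expression, and the generated query is a simplified version of the one above (it omits the grammatical features).

```python
import re


def build_form_query(text, language="da"):
    """Return a SPARQL query that looks up each word of `text` as a form.

    Hypothetical helper for illustration; not taken from the Ordia code base.
    """
    # Naive tokenization: runs of word characters, duplicates removed,
    # original order kept.
    words = list(dict.fromkeys(re.findall(r"\w+", text)))
    # Each word becomes a language-tagged literal in the VALUES clause.
    values = " ".join('"{}"@{}'.format(word, language) for word in words)
    return (
        "SELECT ?word ?form ?lexeme ?lexemeLabel WHERE {\n"
        "  VALUES ?word { " + values + " }\n"
        "  ?form ontolex:representation ?word .\n"
        "  ?lexeme ontolex:lexicalForm ?form ;\n"
        "          wikibase:lemma ?lexemeLabel .\n"
        "}"
    )


query = build_form_query("formand bøjer sig")
print(query)
```

Because `\w+` never matches quotes or backslashes, the words can be placed in the query without further escaping; a tokenizer that lets such characters through would need to escape them first.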

Tables for form and sense in property page

BubbleChart for language in lexical category pages

#defaultView:BubbleChart
SELECT
  ?count
  ?language ?languageLabel
WHERE {
  {
    SELECT (COUNT(*) AS ?count) ?language {
      wd:Q1084 ^wikibase:lexicalCategory / dct:language ?language .
    }
    GROUP BY ?language
  }
  OPTIONAL {
    ?language rdfs:label ?languageLabel .
    FILTER (LANG(?languageLabel) = 'en')
  }
}
ORDER BY DESC(?count)

Is Wikidata lexeme able to answer narrative questions?

Are Wikidata lexemes able to answer narrative questions, for instance with SPARQL queries?

For example, for the fairy tale Fyrtøjet:

Which characters are involved in the story?
Who killed the witch?
Who is the dog carrying?

"Wikidata for property" is displayed twice in property-value aspect

"Wikidata for property" is displayed twice in property-value aspect, e.g., https://tools.wmflabs.org/ordia/property/P31/value/Q1520033

Link for value in lexemes panel for property aspect does not go to property-value aspect

Auxiliary verb should display in lexeme table

Reference to "attested in" and "usage example" does not show in lexeme aspect

Examples:

Use wbsearchentities to search

Use wbsearchentities to search as suggested by @lucaswerkmeister at https://twitter.com/LucasWerkmeistr/status/1000084576725807104
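
A minimal sketch of what such a request could look like, using only the Python standard library. The function name `build_search_url` and the default values are assumptions for illustration; the parameters shown (`action`, `search`, `language`, `type`, `limit`, `format`) are those of the MediaWiki `wbsearchentities` API module.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(query, language="en", limit=10):
    """Build a wbsearchentities request URL that searches for lexemes.

    Hypothetical helper; Ordia's actual search code may differ.
    """
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "type": "lexeme",  # search lexemes rather than items
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


url = build_search_url("formand", language="da")
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops short of the network call so that it stays self-contained.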

Note license/copyright on included images

Support for other languages in Text-to-lexemes

Currently https://tools.wmflabs.org/ordia/text-to-lexemes supports seven languages. Is it possible to include other languages? Or are the supported languages chosen by the number of lexemes in each language, so that the seven already chosen are the ones with the most lexemes?

Search does not work

Search does not work and produces "Internal Server Error"

https://twitter.com/fnielsen/status/1123879436888301570

https://phabricator.wikimedia.org/T222347

Add "first attested from" to lexeme table

https://www.wikidata.org/wiki/Property:P6684

Example: https://www.wikidata.org/wiki/Lexeme:L47364

Show compounds

Aspect for property does not work for non-URI values

Aspect for property does not work for non-URI values, e.g., https://tools.wmflabs.org/ordia/property/P5187

Statistics over hyphenation parts

Statistics over hyphenation parts.

The problem seems to be to construct a SPARQL query that will split the string.
This does not work:

SELECT * {
  ?lexeme dct:language wd:Q9035 .
  ?lexeme ontolex:lexicalForm ?form .
  ?form wdt:P5279 ?hyphenation . 
  { BIND(REPLACE(?hyphenation, "^([^‧]+)‧.*$", "$1") AS ?hyphenation_parts) }
  UNION
  { BIND(REPLACE(?hyphenation, "^.+‧([^‧]+).*$", "$1") AS ?hyphenation_parts2) }
}

There is a similar query by Daniel Mietchen.

https://query.wikidata.org/#SELECT%20%28SAMPLE%28DISTINCT%20%3Fx%29%20AS%20%3Fitem%29%20%3Fw%20%28COUNT%28DISTINCT%20%3Fx%29%20AS%20%3Fc%29%20%28STRLEN%28%3Fw%29%20AS%20%3Fl%29%20WHERE%20%7B%0A%20%20%7B%0A%20%20%20%20SELECT%20DISTINCT%20%3Fx%20%3Ftitle%20WHERE%20%7B%0A%20%20%20%20%20%20%3Fx%20schema%3AdateModified%20%3Fdate_modified%20%3B%0A%20%20%20%20%20%20%20%20%20wdt%3AP31%20wd%3AQ13442814%20%3B%0A%20%20%20%20%20%20%20%20%20wdt%3AP1476%20%3Ftitle.%0A%20%20%20%20%20%20BIND%20%28now%28%29%20-%20%3Fdate_modified%20as%20%3Fdate_range%29%0A%20%20%20%20%20%20FILTER%20%28%3Fdate_range%20%3C%2040%29%0A%20%20%20%20%20%20FILTER%28STRLEN%28%3Ftitle%29%20%3E%3D%2010%29%0A%20%20%20%20%7D%0A%20%20%20%20LIMIT%2010000%0A%20%20%7D%0A%20%20FILTER%20NOT%20EXISTS%20%7B%3Fx%20wdt%3AP921%20%3Ftopic%7D%0A%20%20BIND%28LCASE%28%3Ftitle%29%20AS%20%3Fltitle%29%0A%20%20BIND%28REPLACE%28%3Fltitle%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw1%29%0A%20%20BIND%28REPLACE%28STRAFTER%28%3Fltitle%2C%20%3Fw1%29%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw2%29%0A%20%20BIND%28REPLACE%28STRAFTER%28%3Fltitle%2C%20%3Fw2%29%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw3%29%0A%20%20VALUES%20%3Fw_%20%7B%201%202%203%20%7D%0A%20%20BIND%28IF%28%3Fw_%20%3D%201%2C%20%3Fw1%2C%20IF%28%3Fw_%20%3D%202%2C%20%3Fw2%2C%20%3Fw3%29%29%20AS%20%3Fw%29%0A%20%20FILTER%28REGEX%28%3Fw%2C%20%22%5E%5C%5Cw%2B%24%22%29%29%20%23%20since%20%3Fw%20may%20evaluate%20to%20an%20empty%20string%2C%20e.g.%20for%20one-word%20titles%0A%7D%0AGROUP%20BY%20%3Fitem%20%3Fw%0AORDER%20BY%20DESC%28%3Fc%29%20DESC%28%3Fl%29%0ALIMIT%202000
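
One possible workaround, sketched here under the assumption that the hyphenation strings are first fetched with a plain query and then processed client-side, is to split them in Python instead of in SPARQL. The function name and the sample strings below are made up for illustration.

```python
from collections import Counter

# U+2027 HYPHENATION POINT, the separator used in the regular expressions above.
HYPHENATION_POINT = "\u2027"


def count_hyphenation_parts(hyphenations):
    """Count how often each part occurs across hyphenation strings."""
    counts = Counter()
    for hyphenation in hyphenations:
        # Split on the hyphenation point and ignore empty parts.
        counts.update(
            part for part in hyphenation.split(HYPHENATION_POINT) if part
        )
    return counts


# Made-up sample values, not actual Wikidata content.
sample = [
    HYPHENATION_POINT.join(["for", "mand"]),
    HYPHENATION_POINT.join(["for", "stand"]),
    "mand",
]
counts = count_hyphenation_parts(sample)
print(counts.most_common())
```

This moves the statistics out of the query service, which avoids the string-splitting problem at the cost of transferring all hyphenation values to the client.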

Handle hypernym in sense

Search for items

Search for items rather than just lemmas and forms.

Provide a way to link to results of text-to-lexemes

e.g. as per https://twitter.com/EvoMRI/status/1136312031693328385

List for property could show language


SELECT
  ?lexeme ?lexemeLabel
  ?language ?languageLabel
  ?value ?valueLabel
WHERE {
  ?lexeme wikibase:lemma ?lexemeLabel .
  ?lexeme a ontolex:LexicalEntry .
  ?lexeme wdt:P5323 ?value .
  ?lexeme dct:language ?language
  OPTIONAL {
    ?language rdfs:label ?languageLabel .
    FILTER (LANG(?languageLabel) = 'en')
  }
  OPTIONAL {
    { ?value wikibase:lemma ?valueLabel1 . }
    UNION
    { ?value rdfs:label ?valueLabel2 . FILTER (LANG(?valueLabel2) = 'en') }
    BIND(COALESCE(?valueLabel1, ?valueLabel2) AS ?valueLabel)
  }
}
LIMIT 1000

Compound graph could be extended

 ?lexeme ^wdt:P5238+ | wdt:P5238+ ?lexeme1 , ?lexeme2 .

list aspect, e.g., with weekdays

SELECT
  ?language ?languageLabel 
  ?monday ?mondayLabel
  ?tuesday ?tuesdayLabel
  ?wednesday ?wednesdayLabel
  ?thursday ?thursdayLabel
  ?friday ?fridayLabel
  ?saturday ?saturdayLabel
  ?sunday ?sundayLabel
WHERE {
  VALUES (?monday_concept ?tuesday_concept ?wednesday_concept
          ?thursday_concept ?friday_concept ?saturday_concept ?sunday_concept) {
    (wd:Q105 wd:Q127 wd:Q128 wd:Q129 wd:Q130 wd:Q131 wd:Q132) }
  ?monday ontolex:sense / wdt:P5137 ?monday_concept ; dct:language ?language ; wikibase:lemma ?mondayLabel . MINUS { ?monday wikibase:lexicalCategory wd:Q102786 }
  OPTIONAL {
    ?tuesday ontolex:sense / wdt:P5137 ?tuesday_concept ; dct:language ?language ; wikibase:lemma ?tuesdayLabel . MINUS { ?tuesday wikibase:lexicalCategory wd:Q102786. }
  }
  OPTIONAL {
    ?wednesday ontolex:sense / wdt:P5137 ?wednesday_concept ; dct:language ?language ; wikibase:lemma ?wednesdayLabel . MINUS { ?wednesday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?thursday ontolex:sense / wdt:P5137 ?thursday_concept ; dct:language ?language ; wikibase:lemma ?thursdayLabel . MINUS { ?thursday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?friday ontolex:sense / wdt:P5137 ?friday_concept ; dct:language ?language ; wikibase:lemma ?fridayLabel . MINUS { ?friday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?saturday ontolex:sense / wdt:P5137 ?saturday_concept ; dct:language ?language ; wikibase:lemma ?saturdayLabel . MINUS { ?saturday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?sunday ontolex:sense / wdt:P5137 ?sunday_concept ; dct:language ?language ; wikibase:lemma ?sundayLabel . MINUS { ?sunday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL { ?language rdfs:label ?languageLabel . FILTER (LANG(?languageLabel) = "en") }
  
  # Exclude British Sign Language
  FILTER (?language != wd:Q33000)
}
ORDER BY (?languageLabel)

Handle q id if typed in search edit

For instance, if the user types Q9035, the search should redirect to that item.
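
A sketch of how the search handler could recognize an identifier before running a text search. The function name `redirect_path` is hypothetical, and the returned path layout (`/Q9035`, `/L47364`) is an assumption based on the URLs that appear elsewhere in these issues; lexeme identifiers are handled as well, although the issue only mentions Q identifiers.

```python
import re

# Matches a Q or L identifier with optional surrounding whitespace.
ENTITY_ID = re.compile(r"^\s*([QL])([1-9][0-9]*)\s*$", re.IGNORECASE)


def redirect_path(search_text):
    """Return a redirect path for an identifier, or None for ordinary text."""
    match = ENTITY_ID.match(search_text)
    if match is None:
        return None
    # Normalize the prefix to upper case, e.g. "q9035" -> "/Q9035".
    return "/" + match.group(1).upper() + match.group(2)


print(redirect_path("Q9035"))
print(redirect_path("formand"))
```

When the function returns `None`, the handler would fall through to the ordinary search.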

Graph of grammatical features

SELECT ?child ?childLabel ?parent ?parentLabel 
WITH {
  SELECT ?feature WHERE {
    VALUES ?feature { 
  
    wd:Q1817208
    wd:Q2054517
    wd:Q102047
    wd:Q6581072
    wd:Q51929403
    wd:Q108709
    wd:Q324982
    wd:Q21470140
    wd:Q47088290
    wd:Q51929290
    wd:Q18012653
    wd:Q47088293
    wd:Q501405
    wd:Q4348304
    wd:Q51929049
    wd:Q623734
    wd:Q51927507
    wd:Q110022
    wd:Q54152717
    wd:Q1230649
    wd:Q1233197
    wd:Q202142
    wd:Q53608953
    wd:Q16527322
    wd:Q53999547
    wd:Q131105
    wd:Q24577575
    wd:Q53997851
    wd:Q156986
    wd:Q319822
    wd:Q192613
    wd:Q2105891
    wd:Q192997
    wd:Q324305
    wd:Q2898727
    wd:Q146078
    wd:Q694268
    wd:Q655020
    wd:Q1775461
    wd:Q51929074
    wd:Q15737187
    wd:Q51929218
    wd:Q146786
    wd:Q179230
    wd:Q1923028
    wd:Q953129
    wd:Q148465
    wd:Q47088292
    wd:Q51929154
    wd:Q21714344
    wd:Q52431955
    wd:Q10509119
    wd:Q51929131
    wd:Q5483481
    wd:Q53609593
    wd:Q2888577
    wd:Q24133704
    wd:Q54176537
    wd:Q51929517
    wd:Q604984
    wd:Q2114906
    wd:Q332734
    wd:Q3150154
    wd:Q146233
    wd:Q282031
    wd:Q1450795
    wd:Q474668
    wd:Q110786
    wd:Q14169499
    wd:Q499327
    wd:Q747019
    wd:Q145599
    wd:Q1763348
    wd:Q51929369
    wd:Q1182686
    wd:Q3910936
    wd:Q950170
    wd:Q22716
    wd:Q51927539
    wd:Q838581
    wd:Q442485
    wd:Q47088295
    wd:Q1994301
    wd:Q53998049
    wd:Q394253
    wd:Q11078
    wd:Q576271
    wd:Q857325
    wd:Q185077
    wd:Q27918551
    wd:Q1305037
    wd:Q1562262
    wd:Q281954
    wd:Q53997857
    wd:Q682111
    wd:Q1317831
    wd:Q1392475
    wd:Q51929447
    wd:Q1775415
    }
  }
} AS %features
WITH {
  SELECT DISTINCT (?feature AS ?child) ?parent WHERE {
    INCLUDE %features
    ?feature wdt:P279 ?parent .
  }
} AS %results1
WITH {
  SELECT DISTINCT ?child ?parent WHERE {
    INCLUDE %features
    ?feature wdt:P279 ?child .
    ?child wdt:P279 ?parent .
  }
} AS %results2   
WHERE {
  { INCLUDE %results1 } UNION { INCLUDE %results2 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Determine most similar words: odd-one-out

SELECT DISTINCT
  ?form1 ?word1 ?form2 ?word2
  (COUNT(DISTINCT ?grammatical_feature) +
   COUNT(DISTINCT ?lexical_category) + 
   COUNT(DISTINCT ?value) + 
   COUNT(DISTINCT ?concept_property) 
   AS ?score)

  (GROUP_CONCAT(DISTINCT STR(?grammatical_feature)) AS ?f1)
  (GROUP_CONCAT(DISTINCT STR(?lexical_category)) AS ?f2)
  (GROUP_CONCAT(DISTINCT STR(?value)) AS ?f3)
  (GROUP_CONCAT(DISTINCT STR(?concept_property)) AS ?f4)

WHERE {
  hint:Query hint:optimizer "None".
  VALUES ?word1 { "sjov"@da "dårlig"@da "vanvittig"@da "papir"@da }
  VALUES ?word2 { "sjov"@da "dårlig"@da "vanvittig"@da "papir"@da }

  ?form1 ontolex:representation ?word1 . 
  ?form2 ontolex:representation ?word2 . 
  ?lexeme1 ontolex:lexicalForm ?form1 .
  ?lexeme2 ontolex:lexicalForm ?form2 .
  
  OPTIONAL {
   ?grammatical_feature ^wikibase:grammaticalFeature ?form1, ?form2 .
  }

  OPTIONAL {
    ?lexical_category ^wikibase:lexicalCategory ?lexeme1, ?lexeme2 .
  }      

  OPTIONAL {
    ?lexeme1 ?lexeme_property ?value . 
    ?lexeme2 ?lexeme_property ?value .
    [] ?ref ?lexeme_property ; rdf:type wikibase:Property .
  }
  
  OPTIONAL {
   ?lexeme1 ontolex:sense / wdt:P5137 ?concept1 .
   ?lexeme2 ontolex:sense / wdt:P5137 ?concept2 .
   ?concept1 ?concept_property ?concept_value . 
   ?concept2 ?concept_property ?concept_value .
   ?concept_property a owl:ObjectProperty .
  }      

  FILTER (STR(?word1) != STR(?word2))
}
GROUP BY
  ?form1 ?word1 ?form2 ?word2
ORDER BY DESC(?score)

List most missing concept for language

Conjugation class value is not a link in lexeme aspect

Example: https://tools.wmflabs.org/ordia/L36288

Word tokenization may not work in Bengali.

Word tokenization may not work in Bengali, see #48
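
The issue does not state the cause. One possibility, which is an assumption and not confirmed here, is that a `\w`-based regular expression splits words at combining marks: Python's `\w` only matches characters for which `str.isalnum()` is true, and Bengali vowel signs are combining marks rather than letters. A minimal sketch of a tokenizer that keeps such marks inside the word (the function name is hypothetical):

```python
import unicodedata


def tokenize(text):
    """Split text into words, treating combining marks as word characters."""
    tokens, current = [], []
    for char in text:
        # Unicode general categories: L* letters, M* marks, N* numbers.
        if unicodedata.category(char)[0] in "LMN":
            current.append(char)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("formand bøjer sig."))
print(tokenize("বাংলা ভাষা"))
```

Whether this resolves the problem referred to in #48 would need to be checked against the actual failing input.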

Add admin page with update

Imagegrid for senses of lexemes in reference aspect

#defaultView:ImageGrid
SELECT 
  ?image
  ?lexeme ?lexemeLabel
  ?use
WHERE {
  ?lexeme p:P5831 ?use_statement . 
  ?lexeme wikibase:lemma ?lexemeLabel .
  ?use_statement ps:P5831 ?use .
  ?use_statement pq:P6072 / wdt:P5137? / wdt:P18 ?image .
  ?use_statement prov:wasDerivedFrom / pr:P248 wd:Q1167862 .
}

Grammar checking in SPARQL?

With a simple nominal phrase: "en lille mand":

SELECT (COUNT(*) AS ?count) ?o ?oLabel WHERE {
  VALUES ?word {
    wd:L2022-F1  # en
    wd:L34834-F1  # lille
    wd:L34797-F1  # mand
  }
  ?word wikibase:grammaticalFeature ?o .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?o ?oLabel

Singular has a count of three, i.e., as many as there are words.
Indefinite has a count of only two, but no form is marked as definite.
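
The counting idea can be sketched client-side in Python: a feature carried by every word of the phrase is treated as agreed, while a feature carried by only some words is flagged for inspection. The function name is hypothetical, and the feature assignment below is invented so that it reproduces the counts reported above (singular three times, indefinite twice); it is not taken from Wikidata.

```python
from collections import Counter


def feature_agreement(features_by_word):
    """Return (features shared by all words, features on only some words)."""
    counts = Counter()
    for features in features_by_word.values():
        counts.update(set(features))
    n_words = len(features_by_word)
    shared = sorted(f for f, c in counts.items() if c == n_words)
    partial = sorted(f for f, c in counts.items() if c < n_words)
    return shared, partial


# Invented assignment for "en lille mand"; which word lacks "indefinite"
# is an assumption made only for this example.
phrase = {
    "en": ["singular", "indefinite"],
    "lille": ["singular"],
    "mand": ["singular", "indefinite"],
}
shared, partial = feature_agreement(phrase)
print(shared, partial)
```

A partially shared feature is not necessarily an error, as in the case above where no word is marked definite, so a real check would also need to look for conflicting values of the same category.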

Support for option to highlight/create non-existing lexemes on Text-to-lexemes

When I add a text to https://tools.wmflabs.org/ordia/text-to-lexemes, I see the list of all the words in the text. It would be useful to separate these words into two categories: words that have associated lexemes on Wikidata and words that do not. Words with no associated lexemes could then be linked to https://www.wikidata.org/wiki/Special:NewLexeme or https://tools.wmflabs.org/lexeme-forms/ (or any other tool).
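
A minimal sketch of the requested separation, assuming the set of words that matched at least one form is already known from the query result. The function name `partition_words` and the sample words are hypothetical.

```python
NEW_LEXEME_URL = "https://www.wikidata.org/wiki/Special:NewLexeme"


def partition_words(words, matched_words):
    """Split words into those with a matching lexeme form and those without."""
    matched = set(matched_words)
    found = [word for word in words if word in matched]
    missing = [word for word in words if word not in matched]
    return found, missing


# Sample input; "xyzzy" stands in for a word without a lexeme.
found, missing = partition_words(["formand", "xyzzy", "sig"], {"formand", "sig"})
for word in missing:
    # Each missing word could be shown next to a link for creating it.
    print(word, NEW_LEXEME_URL)
```

The link is printed without query parameters because the sketch makes no assumption about which parameters the creation page accepts.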

No feedback from Text to lexemes in case of timeout from Wikidata Query Service

Text to lexemes does not give good feedback if there is a timeout from the Wikidata Query Service.

Handle "attested in" in reference aspect

Handle "attested in" in reference aspect, e.g., https://tools.wmflabs.org/ordia/reference/Q761660 should show, e.g., "klimatosse".

Service for various language mining

Service for various language mining, e.g., persons with a surname that matches the past participle form of a Danish verb:

SELECT 
  (COUNT(?person) AS ?count)
  ?lexeme ?lemma ?surname
  (SAMPLE(?person) AS ?example_person)
  (GROUP_CONCAT(?person_labels; separator=", ") AS ?names)
WHERE {
  hint:Query hint:optimizer "None".     

  ?lexeme dct:language wd:Q9035 .
  ?lexeme ontolex:lexicalForm ?form .
  ?form wikibase:grammaticalFeature wd:Q52434448 .
  ?lexeme wikibase:lemma ?lemma .
  ?form ontolex:representation ?word .
  BIND(STRLANG(CONCAT(UCASE(SUBSTR(STR(?word), 1, 1)), SUBSTR(STR(?word), 2)), "en") AS ?surname)
  ?surname_item rdfs:label ?surname .
  ?person wdt:P734 ?surname_item  .
  ?person rdfs:label ?person_labels . FILTER(LANG(?person_labels) = "en")
}
GROUP BY ?lexeme ?lemma ?surname
ORDER BY DESC(?count) 
LIMIT 100

Handle property/<property>/value/<lexeme>

Handle property/<property>/value/<lexeme>, which may be used for the auxiliary verb property.

Count for property values

SELECT
  ?count
  ?value ?valueLabel
  ?example_lexeme ?example_lexemeLabel
WITH {
  SELECT
    (COUNT(?lexeme) AS ?count)
    ?value
    (SAMPLE(?lexeme) AS ?example_lexeme)
  WHERE {
    ?lexeme a ontolex:LexicalEntry .
    ?lexeme wdt:P31 ?value . 
  }
  GROUP BY ?value
} AS %counts
WHERE {
  INCLUDE %counts
  OPTIONAL {
    { ?value wikibase:lemma ?valueLabel1 . }
    UNION
    { ?value rdfs:label ?valueLabel2 . FILTER (LANG(?valueLabel2) = 'en') }
    BIND(COALESCE(?valueLabel1, ?valueLabel2) AS ?valueLabel)
  }
  ?example_lexeme wikibase:lemma ?example_lexemeLabel
}
ORDER BY DESC(?count)
LIMIT 1000

SELECT * WHERE {
  ?lexeme ontolex:sense ?sense .
  ?lexeme dct:language wd:Q9035 .
  ?sense wdt:P18 ?image .
}

http://tinyurl.com/ybh6tde7

Concept link overlap matrix

Concept link overlap matrix, e.g., by a tree map:

#defaultView:TreeMap
SELECT 
  # ?language1
  ?language1Label
  # ?language2
  ?language2Label
  (COUNT(?concept) AS ?count) 
{
  ?concept ^wdt:P5137 / ^ontolex:sense ?lexeme1, ?lexeme2 .
  ?lexeme1 dct:language ?language1 .
  ?lexeme2 dct:language ?language2 .
  ?language1 wdt:P218 ?language1Label .
  ?language2 wdt:P218 ?language2Label .
  # FILTER (?language1 != ?language2)
}
GROUP BY ?language1 ?language1Label ?language2 ?language2Label
HAVING (COUNT(?concept) > 10)

https://w.wiki/3od