
ordia's Introduction

Ordia

Ordia is a Python package for working with lexicographical data on Wikidata. Ordia includes a webservice that can be run locally. A public instance of the webservice runs at:

https://ordia.toolforge.org/

To start the local webserver, go to the ordia directory and run:

python app.py

Introductory video

Daniel Mietchen, "Demo of Ordia, a Wikidata tool to visualize lexicographic information from Wikidata", YouTube, 18 May 2021.

License

The license for Ordia is Apache License 2.0, except for the included third-party packages: Bootstrap (MIT License), jQuery (MIT License) and DataTables (MIT License).

References

Finn Årup Nielsen, "Ordia: A Web application for Wikidata lexemes", In "The Semantic Web: ESWC 2019 Satellite Events", 141-146, 2019, DOI: 10.1007/978-3-030-32327-1_28.

Finn Årup Nielsen, "Danish in Wikidata lexemes", In "Proceedings of the Tenth Global Wordnet Conference", 33-38, 2019, ISBN 978-83-7493-108-3.

ordia's People

Contributors

62mkv, arthurpsmith, bgo-eiu, bodhisattwawiki, daniel-mietchen, fnielsen, jhsoby, johnsamuelwrites, lucaswerkmeister, nikkiwd, nyurik, salgo60, snyk-bot


ordia's Issues

Mobile styling

The application does not display well on a mobile phone:

  • The first line breaks into three lines.
  • DataTables break the header.

Bubble chart of lexeme counts per lexical category in the language aspect

#defaultView:BubbleChart
# Count of lexemes wrt. lexical category for a language
SELECT
  ?count
  ?lexical_category ?lexical_categoryLabel
WITH {
  SELECT
    (COUNT(?lexeme) AS ?count)
    ?lexical_category
  WHERE {
    ?lexeme a ontolex:LexicalEntry ;
            dct:language wd:Q9035 ; 
            wikibase:lemma ?lexemeLabel .
    ?lexeme wikibase:lexicalCategory ?lexical_category .
  }
  GROUP BY ?lexical_category
} AS %results
WHERE {
  INCLUDE %results
  OPTIONAL {        
    ?lexical_category rdfs:label ?lexical_categoryLabel .
    FILTER (LANG(?lexical_categoryLabel) = "en")
  }
}
ORDER BY DESC(?count)  
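The queries in these issues are written for the Wikidata Query Service (WDQS). To try them from Python outside Ordia, a minimal standard-library sketch could look like the following. The endpoint URL is the public WDQS one; the function names and the User-Agent string are hypothetical and not part of Ordia.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_request(query: str,
                       user_agent: str = "ordia-example/0.1") -> urllib.request.Request:
    """Build (but do not send) a GET request asking WDQS for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical client name; WDQS expects a descriptive User-Agent.
            "User-Agent": user_agent,
        },
    )


def fetch_bindings(query: str) -> list:
    """Send the query and return the list of result rows ("bindings")."""
    with urllib.request.urlopen(build_wdqs_request(query)) as response:
        return json.load(response)["results"]["bindings"]
```

The request is built separately from the network call so the query encoding can be inspected without contacting the service.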

Show language statistics in the statistics aspect, e.g., with an extra UNION branch that counts the number of different languages:

  UNION
  {
    { SELECT (COUNT(DISTINCT ?language) AS ?count) { [] dct:language ?language . } }
    BIND("Number of different languages" AS ?description)
    BIND("[] dct:language ?language" AS ?query)
  }

Find forms in text

Find forms in a text, e.g., "formand bøjer sig": split the text into words and look the words up with a SPARQL query:

SELECT
  ?word ?form
  ?lexeme ?lexemeLabel
  (GROUP_CONCAT(?featureLabel; separator=" // ") AS ?features)
WHERE {
  VALUES ?word { "formand"@da "bøjer"@da "sig"@da }
  ?form ontolex:representation ?word . 
  OPTIONAL {
    ?form wikibase:grammaticalFeature ?feature . 
    ?feature rdfs:label ?featureLabel .
    FILTER (LANG(?featureLabel) = "en")
  }
  ?lexeme ontolex:lexicalForm ?form .
  ?lexeme wikibase:lemma ?lexemeLabel . 
}
GROUP BY ?word ?form ?lexeme ?lexemeLabel
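The splitting step could be sketched in Python as below, building the VALUES line of the query from free text. The function name is hypothetical, and the tokenization (Unicode-aware `\w+`) is an assumption rather than what Ordia's text-to-lexemes page actually does.

```python
import re


def words_to_values(text: str, lang: str = "da") -> str:
    """Split text into words and build a SPARQL VALUES clause of language-tagged literals."""
    # \w+ is Unicode-aware in Python 3, so "bøjer" stays a single word,
    # and punctuation is dropped, so no quote escaping is needed.
    words = re.findall(r"\w+", text)
    # Remove duplicates while keeping the original word order.
    unique = list(dict.fromkeys(words))
    literals = " ".join(f'"{word}"@{lang}' for word in unique)
    return "VALUES ?word { " + literals + " }"
```

Case is kept as-is, so a capitalized sentence-initial word would only match a form with the same capitalization; adding a lowercased variant is one possible refinement.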

BubbleChart for language in lexical category pages

#defaultView:BubbleChart
SELECT
  ?count
  ?language ?languageLabel
WHERE {
  {
    SELECT (COUNT(*) AS ?count) ?language {
      wd:Q1084 ^wikibase:lexicalCategory / dct:language ?language .
    }
    GROUP BY ?language
  }
  OPTIONAL {
    ?language rdfs:label ?languageLabel .
    FILTER (LANG(?languageLabel) = 'en')
  }
}
ORDER BY DESC(?count)
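Because ?languageLabel is fetched inside an OPTIONAL block, some rows may come back without a label. A small sketch for turning WDQS JSON bindings from this query into (label, count) pairs, falling back to the Q-identifier when the label is missing; the function name is hypothetical.

```python
def bubble_rows(bindings):
    """Turn WDQS JSON bindings into (label, count) pairs, sorted by count."""
    rows = []
    for b in bindings:
        count = int(b["count"]["value"])
        if "languageLabel" in b:
            label = b["languageLabel"]["value"]
        else:
            # Unbound OPTIONAL variables are simply absent from a binding;
            # use the last path segment of the entity IRI, e.g. "Q9035".
            label = b["language"]["value"].rsplit("/", 1)[-1]
        rows.append((label, count))
    # Mirror ORDER BY DESC(?count).
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```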

Statistics over hyphenation parts

The problem seems to be how to construct a SPARQL query that splits the string. This does not work, presumably because each BIND sits alone in its own UNION branch, where ?hyphenation is not bound:

SELECT * {
  ?lexeme dct:language wd:Q9035 .
  ?lexeme ontolex:lexicalForm ?form .
  ?form wdt:P5279 ?hyphenation . 
  { BIND(REPLACE(?hyphenation, "^([^‧]+)‧.*$", "$1") AS ?hyphenation_parts) }
  UNION
  { BIND(REPLACE(?hyphenation, "^.+‧([^‧]+).*$", "$1") AS ?hyphenation_parts2) }
}

There is a similar query by Daniel Mietchen that extracts words from titles with REPLACE and STRAFTER:

https://query.wikidata.org/#SELECT%20%28SAMPLE%28DISTINCT%20%3Fx%29%20AS%20%3Fitem%29%20%3Fw%20%28COUNT%28DISTINCT%20%3Fx%29%20AS%20%3Fc%29%20%28STRLEN%28%3Fw%29%20AS%20%3Fl%29%20WHERE%20%7B%0A%20%20%7B%0A%20%20%20%20SELECT%20DISTINCT%20%3Fx%20%3Ftitle%20WHERE%20%7B%0A%20%20%20%20%20%20%3Fx%20schema%3AdateModified%20%3Fdate_modified%20%3B%0A%20%20%20%20%20%20%20%20%20wdt%3AP31%20wd%3AQ13442814%20%3B%0A%20%20%20%20%20%20%20%20%20wdt%3AP1476%20%3Ftitle.%0A%20%20%20%20%20%20BIND%20%28now%28%29%20-%20%3Fdate_modified%20as%20%3Fdate_range%29%0A%20%20%20%20%20%20FILTER%20%28%3Fdate_range%20%3C%2040%29%0A%20%20%20%20%20%20FILTER%28STRLEN%28%3Ftitle%29%20%3E%3D%2010%29%0A%20%20%20%20%7D%0A%20%20%20%20LIMIT%2010000%0A%20%20%7D%0A%20%20FILTER%20NOT%20EXISTS%20%7B%3Fx%20wdt%3AP921%20%3Ftopic%7D%0A%20%20BIND%28LCASE%28%3Ftitle%29%20AS%20%3Fltitle%29%0A%20%20BIND%28REPLACE%28%3Fltitle%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw1%29%0A%20%20BIND%28REPLACE%28STRAFTER%28%3Fltitle%2C%20%3Fw1%29%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw2%29%0A%20%20BIND%28REPLACE%28STRAFTER%28%3Fltitle%2C%20%3Fw2%29%2C%20%22%5E.%2A%3F%28%5C%5Cb%5C%5Cw%7B13%2C%7D%5C%5Cb%29.%2A%24%22%2C%20%22%241%22%29%20AS%20%3Fw3%29%0A%20%20VALUES%20%3Fw_%20%7B%201%202%203%20%7D%0A%20%20BIND%28IF%28%3Fw_%20%3D%201%2C%20%3Fw1%2C%20IF%28%3Fw_%20%3D%202%2C%20%3Fw2%2C%20%3Fw3%29%29%20AS%20%3Fw%29%0A%20%20FILTER%28REGEX%28%3Fw%2C%20%22%5E%5C%5Cw%2B%24%22%29%29%20%23%20since%20%3Fw%20may%20evaluate%20to%20an%20empty%20string%2C%20e.g.%20for%20one-word%20titles%0A%7D%0AGROUP%20BY%20%3Fitem%20%3Fw%0AORDER%20BY%20DESC%28%3Fc%29%20DESC%28%3Fl%29%0ALIMIT%202000
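Since splitting a string into a variable number of parts is awkward in SPARQL, one workaround is to fetch the ?hyphenation values and count the parts client-side. A sketch, assuming the values use U+2027 HYPHENATION POINT as the separator, as in the query above; the function name is hypothetical.

```python
from collections import Counter

# U+2027 HYPHENATION POINT, the separator used in the query above.
HYPHENATION_POINT = "\u2027"


def hyphenation_part_counts(hyphenations):
    """Count how often each hyphenation part occurs across a list of strings."""
    counts = Counter()
    for hyphenation in hyphenations:
        # str.split returns every part, not only the first and the last,
        # which is all the two REPLACE calls in the query could extract.
        counts.update(part for part in hyphenation.split(HYPHENATION_POINT) if part)
    return counts
```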

The list for a property could show the language

SELECT
  ?lexeme ?lexemeLabel
  ?language ?languageLabel
  ?value ?valueLabel
WHERE {
  ?lexeme wikibase:lemma ?lexemeLabel .
  ?lexeme a ontolex:LexicalEntry .
  ?lexeme wdt:P5323 ?value .
  ?lexeme dct:language ?language .
  OPTIONAL {
    ?language rdfs:label ?languageLabel .
    FILTER (LANG(?languageLabel) = 'en')
  }
  OPTIONAL {
    { ?value wikibase:lemma ?valueLabel1 . }
    UNION
    { ?value rdfs:label ?valueLabel2 . FILTER (LANG(?valueLabel2) = 'en') }
    BIND(COALESCE(?valueLabel1, ?valueLabel2) AS ?valueLabel)
  }
}
LIMIT 1000

List aspect, e.g., with weekdays

SELECT
  ?language ?languageLabel 
  ?monday ?mondayLabel
  ?tuesday ?tuesdayLabel
  ?wednesday ?wednesdayLabel
  ?thursday ?thursdayLabel
  ?friday ?fridayLabel
  ?saturday ?saturdayLabel
  ?sunday ?sundayLabel
WHERE {
  VALUES (?monday_concept ?tuesday_concept ?wednesday_concept
          ?thursday_concept ?friday_concept ?saturday_concept ?sunday_concept) {
    (wd:Q105 wd:Q127 wd:Q128 wd:Q129 wd:Q130 wd:Q131 wd:Q132) }
  ?monday ontolex:sense / wdt:P5137 ?monday_concept ; dct:language ?language ; wikibase:lemma ?mondayLabel . MINUS { ?monday wikibase:lexicalCategory wd:Q102786 }
  OPTIONAL {
    ?tuesday ontolex:sense / wdt:P5137 ?tuesday_concept ; dct:language ?language ; wikibase:lemma ?tuesdayLabel . MINUS { ?tuesday wikibase:lexicalCategory wd:Q102786. }
  }
  OPTIONAL {
    ?wednesday ontolex:sense / wdt:P5137 ?wednesday_concept ; dct:language ?language ; wikibase:lemma ?wednesdayLabel . MINUS { ?wednesday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?thursday ontolex:sense / wdt:P5137 ?thursday_concept ; dct:language ?language ; wikibase:lemma ?thursdayLabel . MINUS { ?thursday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?friday ontolex:sense / wdt:P5137 ?friday_concept ; dct:language ?language ; wikibase:lemma ?fridayLabel . MINUS { ?friday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?saturday ontolex:sense / wdt:P5137 ?saturday_concept ; dct:language ?language ; wikibase:lemma ?saturdayLabel . MINUS { ?saturday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL {
    ?sunday ontolex:sense / wdt:P5137 ?sunday_concept ; dct:language ?language ; wikibase:lemma ?sundayLabel . MINUS { ?sunday wikibase:lexicalCategory wd:Q102786 }
  }
  OPTIONAL { ?language rdfs:label ?languageLabel . FILTER (LANG(?languageLabel) = "en") }
  
  # Exclude British Sign Language
  FILTER (?language != wd:Q33000)
}
ORDER BY (?languageLabel)
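The seven weekday blocks differ only in the variable name, so the query could be generated instead of written by hand. A sketch in Python, reproducing the structure above (first day required, the other days OPTIONAL, the same MINUS filter and the same British Sign Language exclusion); the function and constant names are hypothetical.

```python
# Weekday concepts from the query above, Monday first.
WEEKDAYS = {
    "monday": "wd:Q105",
    "tuesday": "wd:Q127",
    "wednesday": "wd:Q128",
    "thursday": "wd:Q129",
    "friday": "wd:Q130",
    "saturday": "wd:Q131",
    "sunday": "wd:Q132",
}
# Lexemes in this lexical category are removed with MINUS, as in the query above.
EXCLUDED_CATEGORY = "wd:Q102786"


def day_pattern(day: str) -> str:
    """Triple pattern for one weekday lexeme, with the same MINUS filter."""
    return (f"?{day} ontolex:sense / wdt:P5137 ?{day}_concept ; "
            f"dct:language ?language ; wikibase:lemma ?{day}Label . "
            f"MINUS {{ ?{day} wikibase:lexicalCategory {EXCLUDED_CATEGORY} }}")


def weekday_query(days: dict = WEEKDAYS) -> str:
    """Build the weekday list query; only the first day is required."""
    names = list(days)
    concept_vars = " ".join(f"?{d}_concept" for d in names)
    concept_ids = " ".join(days.values())
    lines = ["SELECT", "  ?language ?languageLabel"]
    lines += [f"  ?{d} ?{d}Label" for d in names]
    lines += [
        "WHERE {",
        f"  VALUES ({concept_vars}) {{ ({concept_ids}) }}",
        "  " + day_pattern(names[0]),
    ]
    lines += [f"  OPTIONAL {{ {day_pattern(d)} }}" for d in names[1:]]
    lines += [
        '  OPTIONAL { ?language rdfs:label ?languageLabel . FILTER (LANG(?languageLabel) = "en") }',
        "  FILTER (?language != wd:Q33000)  # exclude British Sign Language",
        "}",
        "ORDER BY (?languageLabel)",
    ]
    return "\n".join(lines)
```

The same function would build a list for months or other concept sequences by passing a different dictionary.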

Graph of grammatical features

SELECT ?child ?childLabel ?parent ?parentLabel 
WITH {
  SELECT ?feature WHERE {
    VALUES ?feature { 
  
    wd:Q1817208
    wd:Q2054517
    wd:Q102047
    wd:Q6581072
    wd:Q51929403
    wd:Q108709
    wd:Q324982
    wd:Q21470140
    wd:Q47088290
    wd:Q51929290
    wd:Q18012653
    wd:Q47088293
    wd:Q501405
    wd:Q4348304
    wd:Q51929049
    wd:Q623734
    wd:Q51927507
    wd:Q110022
    wd:Q54152717
    wd:Q1230649
    wd:Q1233197
    wd:Q202142
    wd:Q53608953
    wd:Q16527322
    wd:Q53999547
    wd:Q131105
    wd:Q24577575
    wd:Q53997851
    wd:Q156986
    wd:Q319822
    wd:Q192613
    wd:Q2105891
    wd:Q192997
    wd:Q324305
    wd:Q2898727
    wd:Q146078
    wd:Q694268
    wd:Q655020
    wd:Q1775461
    wd:Q51929074
    wd:Q15737187
    wd:Q51929218
    wd:Q146786
    wd:Q179230
    wd:Q1923028
    wd:Q953129
    wd:Q148465
    wd:Q47088292
    wd:Q51929154
    wd:Q21714344
    wd:Q52431955
    wd:Q10509119
    wd:Q51929131
    wd:Q5483481
    wd:Q53609593
    wd:Q2888577
    wd:Q24133704
    wd:Q54176537
    wd:Q51929517
    wd:Q604984
    wd:Q2114906
    wd:Q332734
    wd:Q3150154
    wd:Q146233
    wd:Q282031
    wd:Q1450795
    wd:Q474668
    wd:Q110786
    wd:Q14169499
    wd:Q499327
    wd:Q747019
    wd:Q145599
    wd:Q1763348
    wd:Q51929369
    wd:Q1182686
    wd:Q3910936
    wd:Q950170
    wd:Q22716
    wd:Q51927539
    wd:Q838581
    wd:Q442485
    wd:Q47088295
    wd:Q1994301
    wd:Q53998049
    wd:Q394253
    wd:Q11078
    wd:Q576271
    wd:Q857325
    wd:Q185077
    wd:Q27918551
    wd:Q1305037
    wd:Q1562262
    wd:Q281954
    wd:Q53997857
    wd:Q682111
    wd:Q1317831
    wd:Q1392475
    wd:Q51929447
    wd:Q1775415
    }
  }
} AS %features
WITH {
  SELECT DISTINCT (?feature AS ?child) ?parent WHERE {
    INCLUDE %features
    ?feature wdt:P279 ?parent .
  }
} AS %results1
WITH {
  SELECT DISTINCT ?child ?parent WHERE {
    INCLUDE %features
    ?feature wdt:P279 ?child .
    ?child wdt:P279 ?parent .
  }
} AS %results2   
WHERE {
  { INCLUDE %results1 } UNION { INCLUDE %results2 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
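The query returns (?child, ?parent) pairs from wdt:P279 (subclass of). One way to turn such pairs into a graph is sketched below: find the top-level features and emit Graphviz DOT text. Both function names are hypothetical, and DOT is only one possible rendering target.

```python
from collections import defaultdict


def roots_and_children(edges):
    """From (child, parent) pairs, return the top-level nodes and each parent's children."""
    children = defaultdict(set)
    nodes, has_parent = set(), set()
    for child, parent in edges:
        children[parent].add(child)
        nodes.update((child, parent))
        has_parent.add(child)
    roots = sorted(nodes - has_parent)
    return roots, {parent: sorted(kids) for parent, kids in children.items()}


def to_dot(edges):
    """Render (child, parent) pairs as a Graphviz DOT digraph, child -> parent."""
    def quote(label):
        # Escape double quotes so labels stay valid DOT identifiers.
        return '"' + label.replace('"', '\\"') + '"'

    lines = ["digraph features {"]
    for child, parent in sorted(set(edges)):
        lines.append(f"  {quote(child)} -> {quote(parent)};")
    lines.append("}")
    return "\n".join(lines)
```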

Determine the most similar words: odd-one-out

SELECT DISTINCT
  ?form1 ?word1 ?form2 ?word2
  (COUNT(DISTINCT ?grammatical_feature) +
   COUNT(DISTINCT ?lexical_category) + 
   COUNT(DISTINCT ?value) + 
   COUNT(DISTINCT ?concept_property) 
   AS ?score)

  (GROUP_CONCAT(DISTINCT STR(?grammatical_feature)) AS ?f1)
  (GROUP_CONCAT(DISTINCT STR(?lexical_category)) AS ?f2)
  (GROUP_CONCAT(DISTINCT STR(?value)) AS ?f3)
  (GROUP_CONCAT(DISTINCT STR(?concept_property)) AS ?f4)

WHERE {
  hint:Query hint:optimizer "None".
  VALUES ?word1 { "sjov"@da "dårlig"@da "vanvittig"@da "papir"@da }
  VALUES ?word2 { "sjov"@da "dårlig"@da "vanvittig"@da "papir"@da }

  ?form1 ontolex:representation ?word1 . 
  ?form2 ontolex:representation ?word2 . 
  ?lexeme1 ontolex:lexicalForm ?form1 .
  ?lexeme2 ontolex:lexicalForm ?form2 .
  
  OPTIONAL {
   ?grammatical_feature ^wikibase:grammaticalFeature ?form1, ?form2 .
  }

  OPTIONAL {
    ?lexical_category ^wikibase:lexicalCategory ?lexeme1, ?lexeme2 .
  }      

  OPTIONAL {
    ?lexeme1 ?lexeme_property ?value . 
    ?lexeme2 ?lexeme_property ?value .
    [] ?ref ?lexeme_property ; rdf:type wikibase:Property .
  }
  
  OPTIONAL {
   ?lexeme1 ontolex:sense / wdt:P5137 ?concept1 .
   ?lexeme2 ontolex:sense / wdt:P5137 ?concept2 .
   ?concept1 ?concept_property ?concept_value . 
   ?concept2 ?concept_property ?concept_value .
   ?concept_property a owl:ObjectProperty .
  }      

  FILTER (STR(?word1) != STR(?word2))
}
GROUP BY
  ?form1 ?word1 ?form2 ?word2
ORDER BY DESC(?score)
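The query gives a similarity score per pair of forms; picking the odd one out still requires a final step. A sketch, assuming the results have been reduced to (word1, word2, score) rows: keep one score per unordered word pair, then choose the word with the lowest summed similarity to the others. The aggregation rule (maximum over forms, sum over pairs) is an assumption, not something the issue specifies.

```python
from itertools import combinations


def pair_scores(rows):
    """Collapse (word1, word2, score) rows into one score per unordered word pair.

    The query groups by form, so a word with several matching forms can give
    several rows per pair; this sketch keeps the highest score.
    """
    best = {}
    for word1, word2, score in rows:
        key = frozenset((word1, word2))
        best[key] = max(best.get(key, 0), score)
    return best


def odd_one_out(words, scores):
    """Return the word with the lowest summed similarity to all other words."""
    totals = {word: 0 for word in words}
    for a, b in combinations(words, 2):
        s = scores.get(frozenset((a, b)), 0)  # pairs absent from the results count as 0
        totals[a] += s
        totals[b] += s
    return min(words, key=totals.__getitem__)
```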

Image grid for senses of lexemes in the reference aspect

#defaultView:ImageGrid
SELECT 
  ?image
  ?lexeme ?lexemeLabel
  ?use
WHERE {
  ?lexeme p:P5831 ?use_statement . 
  ?lexeme wikibase:lemma ?lexemeLabel .
  ?use_statement ps:P5831 ?use .
  ?use_statement pq:P6072 / wdt:P5137? / wdt:P18 ?image .
  ?use_statement prov:wasDerivedFrom / pr:P248 wd:Q1167862 .
}

Grammar checking in SPARQL?

Consider a simple noun phrase, "en lille mand" (Danish for "a little man"):

SELECT (COUNT(*) AS ?count) ?o ?oLabel WHERE {
  VALUES ?word {
    wd:L2022-F1  # en
    wd:L34834-F1  # lille
    wd:L34797-F1  # mand
  }
  ?word wikibase:grammaticalFeature ?o .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?o ?oLabel

  • Singular has three counts, i.e., one per word.
  • Indefinite has only two, but no word has the definite feature.
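The counting idea can be sketched in Python: a feature carried by every word is shared, while one carried by only some words needs a closer look. As the indefinite case shows, a missing feature is not necessarily a conflict; a real checker would also need to know which features are mutually exclusive. The function name is hypothetical.

```python
from collections import Counter


def agreement(features_per_word):
    """Split grammatical features into those carried by every word and those carried by only some.

    features_per_word: one collection of feature labels per word in the phrase.
    """
    n = len(features_per_word)
    # set() so a feature listed twice on one word is counted once.
    counts = Counter(f for feats in features_per_word for f in set(feats))
    shared = sorted(f for f, c in counts.items() if c == n)
    partial = sorted(f for f, c in counts.items() if c < n)
    return shared, partial
```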

Support for option to highlight/create non-existing lexemes on Text-to-lexemes

When I add a text to https://tools.wmflabs.org/ordia/text-to-lexemes, I see a list of all the words in the text. It would be useful to separate these words into two categories: words that have associated lexemes on Wikidata and words without them. Words with no associated lexeme could then be linked to https://www.wikidata.org/wiki/Special:NewLexeme or https://tools.wmflabs.org/lexeme-forms/ (or another tool).
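The proposed split could be sketched as follows, assuming the set of known form representations has already been collected (e.g., with a query like the one under "Find forms in text"). The function name is hypothetical, and linking to the plain Special:NewLexeme page without prefilled parameters is a deliberate simplification.

```python
import re

NEW_LEXEME_URL = "https://www.wikidata.org/wiki/Special:NewLexeme"


def split_by_coverage(text, known_forms):
    """Partition the distinct words of text into covered and uncovered words.

    Uncovered words are paired with a link where a new lexeme could be created.
    """
    words = list(dict.fromkeys(re.findall(r"\w+", text)))
    covered = [w for w in words if w in known_forms]
    uncovered = [(w, NEW_LEXEME_URL) for w in words if w not in known_forms]
    return covered, uncovered
```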

Service for various language mining

Service for various kinds of language mining, e.g., finding persons whose surname matches the past participle form of a Danish verb:

SELECT 
  (COUNT(?person) AS ?count)
  ?lexeme ?lemma ?surname
  (SAMPLE(?person) AS ?example_person)
  (GROUP_CONCAT(?person_labels; separator=", ") AS ?names)
WHERE {
  hint:Query hint:optimizer "None".     

  ?lexeme dct:language wd:Q9035 .
  ?lexeme ontolex:lexicalForm ?form .
  ?form wikibase:grammaticalFeature wd:Q52434448 .
  ?lexeme wikibase:lemma ?lemma .
  ?form ontolex:representation ?word .
  BIND(STRLANG(CONCAT(UCASE(SUBSTR(STR(?word), 1, 1)), SUBSTR(STR(?word), 2)), "en") AS ?surname)
  ?surname_item rdfs:label ?surname .
  ?person wdt:P734 ?surname_item  .
  ?person rdfs:label ?person_labels . FILTER(LANG(?person_labels) = "en")
}
GROUP BY ?lexeme ?lemma ?surname
ORDER BY DESC(?count) 
LIMIT 100
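The BIND line in this query upper-cases only the first character and keeps the rest of the form unchanged before tagging it "en" to match surname labels. The Python equivalent is worth spelling out, because `str.capitalize()` would also lowercase the remaining characters. The function name is hypothetical.

```python
def as_surname(word: str) -> str:
    """Upper-case only the first character, like CONCAT(UCASE(SUBSTR(?w, 1, 1)), SUBSTR(?w, 2))."""
    # Slicing also handles the empty string without an IndexError.
    return word[:1].upper() + word[1:]
```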

Count for property values

SELECT
  ?count
  ?value ?valueLabel
  ?example_lexeme ?example_lexemeLabel
WITH {
  SELECT
    (COUNT(?lexeme) AS ?count)
    ?value
    (SAMPLE(?lexeme) AS ?example_lexeme)
  WHERE {
    ?lexeme a ontolex:LexicalEntry .
    ?lexeme wdt:P31 ?value . 
  }
  GROUP BY ?value
} AS %counts
WHERE {
  INCLUDE %counts
  OPTIONAL {
    { ?value wikibase:lemma ?valueLabel1 . }
    UNION
    { ?value rdfs:label ?valueLabel2 . FILTER (LANG(?valueLabel2) = 'en') }
    BIND(COALESCE(?valueLabel1, ?valueLabel2) AS ?valueLabel)
  }
  ?example_lexeme wikibase:lemma ?example_lexemeLabel
}
ORDER BY DESC(?count)
LIMIT 1000

Highest number game

Game with two buttons: press the button with the highest number. The buttons show, e.g., "vingt" (French for 20) and 39, i.e., one number written out in words in a specific language and one written in digits.
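A possible round generator for such a game is sketched below. The number words are hard-coded hypothetical sample data; in Ordia they would presumably come from Wikidata lexemes. The 0 to 99 range for the digit button and all names are assumptions.

```python
import random

# Hypothetical sample data: number -> French number word.
FRENCH_NUMBERS = {20: "vingt", 30: "trente", 40: "quarante", 50: "cinquante"}


def new_round(number_words, rng=random):
    """Return two button texts (one spelled out, one in digits) and the winning index."""
    spelled_value = rng.choice(list(number_words))
    digit_value = rng.randint(0, 99)
    while digit_value == spelled_value:  # avoid a tie
        digit_value = rng.randint(0, 99)
    buttons = [number_words[spelled_value], str(digit_value)]
    values = [spelled_value, digit_value]
    if rng.random() < 0.5:  # randomize which side the spelled-out number is on
        buttons.reverse()
        values.reverse()
    winner = values.index(max(values))
    return buttons, winner
```

Passing a seeded `random.Random` instance as `rng` makes rounds reproducible, which is convenient for testing.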

Concept link overlap matrix

Concept link overlap matrix, e.g., as a tree map:

#defaultView:TreeMap
SELECT 
  # ?language1
  ?language1Label
  # ?language2
  ?language2Label
  (COUNT(?concept) AS ?count) 
{
  ?concept ^wdt:P5137 / ^ontolex:sense ?lexeme1, ?lexeme2 .
  ?lexeme1 dct:language ?language1 .
  ?lexeme2 dct:language ?language2 .
  ?language1 wdt:P218 ?language1Label .
  ?language2 wdt:P218 ?language2Label .
  # FILTER (?language1 != ?language2)
}
GROUP BY ?language1 ?language1Label ?language2 ?language2Label
HAVING (COUNT(?concept) > 10)

https://w.wiki/3od
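To view the same counts as an actual matrix, the (language1, language2, count) rows could be pivoted client-side. A sketch with hypothetical function names; pairs removed by the HAVING clause show up as 0.

```python
def overlap_matrix(rows):
    """Pivot (language1, language2, count) rows into a square nested dict.

    Pairs filtered out by HAVING (COUNT(?concept) > 10) are filled with 0.
    """
    labels = sorted({r[0] for r in rows} | {r[1] for r in rows})
    matrix = {a: {b: 0 for b in labels} for a in labels}
    for a, b, count in rows:
        matrix[a][b] = count
    return labels, matrix


def format_matrix(labels, matrix):
    """Render the matrix as right-aligned plain-text columns."""
    width = max([len(lab) for lab in labels] +
                [len(str(v)) for row in matrix.values() for v in row.values()])
    lines = [" " * width + " " + " ".join(lab.rjust(width) for lab in labels)]
    for a in labels:
        cells = " ".join(str(matrix[a][b]).rjust(width) for b in labels)
        lines.append(a.rjust(width) + " " + cells)
    return "\n".join(lines)
```

With the FILTER left commented out, as in the query, the diagonal holds the number of concepts linked from each language itself.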
