clics / clics1

Database of Cross-Linguistic Colexifications

Python 3.99% TeX 7.44% JavaScript 30.20% CSS 9.68% PHP 5.63% Shell 0.04% HTML 28.31% Hack 13.33% SCSS 1.38%

clics1's Introduction

CLICS - The Database of Cross-Linguistic Colexifications

The original Database of Cross-Linguistic Colexifications (CLICS) established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. It has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, including studies on semantic change, patterns of conceptualization, and linguistic paleontology. However, CLICS has also been criticized for obvious shortcomings. Building on the standardization efforts of the CLDF initiative and on novel approaches for fast, efficient, and reliable data aggregation, CLICS² expanded the original CLICS database. CLICS³, the third installment of CLICS, uses the framework pioneered in CLICS² to more than double the amount of data aggregated in the database.
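To make the notion of colexification concrete: a language colexifies two concepts when it expresses both with the same word form. The Python sketch below counts, for each concept pair, the languages that colexify it. It is a simplified illustration with hypothetical toy data, not the actual CLICS processing workflow.

```python
from collections import defaultdict
from itertools import combinations


def colexifications(wordlist):
    """Collect, for each pair of concepts, the languages that colexify them.

    `wordlist` is an iterable of (language, concept, form) triples. Two
    concepts are colexified in a language if that language uses the same
    form for both. Returns {(concept_a, concept_b): set_of_languages}.
    """
    # Group the concepts expressed by each form within each language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    edges = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Sort so that each undirected concept pair gets one canonical key.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)].add(language)
    return dict(edges)


# Hypothetical toy data: two languages, each using one form for ARM and HAND.
data = [
    ("L1", "ARM", "x"), ("L1", "HAND", "x"),
    ("L2", "ARM", "y"), ("L2", "HAND", "y"), ("L2", "FINGER", "z"),
]
print(colexifications(data))
```

The number of languages attached to each pair can serve as an edge weight when the pairs are assembled into a colexification network.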

Publications

CLICS Release	Authors	Title	Reference
CLICS	List, Terhalle, and Urban	Using network approaches to enhance the analysis of cross-linguistic polysemies	List2013a
CLICS	Mayer, List, Terhalle, and Urban	An Interactive Visualization of Crosslinguistic Colexification Patterns	Mayer2014
CLICS²	List, Greenhill, Anderson, Mayer, Tresoldi, and Forkel	CLICS². An improved database of cross-linguistic colexifications assembling lexical data with help of cross-linguistic data formats	List2018e
CLICS³	Rzymski, Tresoldi, et al.	The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies	PREPRINT

Datasets and Software

The datasets providing the lexical data aggregated in CLICS, as well as the software implementing the CLICS processing workflow, are accessible and archived on Zenodo via the CLICS community.

Web application

Since CLICS², the latest release of the CLICS database and colexification network can be explored in a clld application at clics.clld.org.

Contributors

Find information about contributors and grants in CONTRIBUTORS.md.


clics1's Issues

Link to Mandarin-Chinese [cmn] vocabulary in WOLD leads nowhere

Wayampi [oym] dashes

There also appears to be an issue here with the unjustified deletion of morpheme breaks indicated by dashes, so that we get a colexification of aa 'day' with a-a 'go'. Unless they must be deleted for the purposes of automatic processing, it seems to me that dashes should generally be retained, although the question remains what to do with cases like -lɛ-kɨʔɨ 'older brother' (which presumably needs a possessive prefix to be complete)...

GML file is invalid

The CLICS GML file, http://clics.lingpy.org/data/CLICS_gml.zip, is not valid GML and therefore cannot be read by networkx: the GML standard only allows keys matching [A-Za-z][0-9A-Za-z]*, so underscores as in body_part are not permitted (e.g. https://github.com/clics/clics/blob/master/gml.py#L71).
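One way to produce valid GML is to rename offending attribute keys before export. The Python sketch below converts names like body_part to camelCase (bodyPart) so that they match [A-Za-z][0-9A-Za-z]*. The helper names and the camelCase convention are assumptions for illustration, not part of the CLICS code.

```python
import re

# Key syntax allowed by GML, as quoted in the issue.
GML_KEY = re.compile(r"[A-Za-z][0-9A-Za-z]*")


def gml_safe_key(key: str) -> str:
    """Hypothetical helper: turn an attribute name into a GML-conformant key.

    Runs of non-alphanumeric characters (such as "_") are treated as word
    separators and the parts are camelCased, so "body_part" -> "bodyPart".
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", key) if p]
    if not parts:
        raise ValueError(f"cannot derive a GML key from {key!r}")
    safe = parts[0] + "".join(p[0].upper() + p[1:] for p in parts[1:])
    if not safe[0].isalpha():
        safe = "k" + safe  # GML keys must start with a letter
    return safe


def sanitize_attributes(attrs: dict) -> dict:
    """Rename all keys of a node or edge attribute dict; fail on collisions."""
    renamed = {gml_safe_key(k): v for k, v in attrs.items()}
    if len(renamed) != len(attrs):
        raise ValueError("two attribute names map to the same GML key")
    return renamed


print(sanitize_attributes({"body_part": 1, "gloss": "hand"}))
```

Raising on collisions (for instance if both body_part and bodyPart already exist) avoids silently overwriting one attribute with another.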

Check Venetian

According to our data, both 'straight' and 'hard' in Venetian [vec] are a. Since this is a Romance language, I can't imagine that this is right. Could you please check? I don't understand how I can access the LOGOS data.

Problems with links due to the movement of WOLD to clld-domain

The data still contain the old WOLD links, which no longer work because they point to "livingsources". However, I will leave this as is for the moment: a new version of IDS is about to be launched, and it seems most useful to handle this when we make the next release of CLICS, based on plain IDS.

Manangba/Manange tones

Numbers appearing in front of words seem to have been deleted in the conversion to CLICS, but I think these are tones! They certainly do not distinguish homonyms, because that is done by numbers in parentheses after the item...

Decide on a new citation for "how to quote clics"

As M. Haspelmath mentioned, we should add the publisher and address to the "how to cite" entry in the FAQ and adapt it accordingly.

Errors in Ghulfan (ghl)

The Ghulfan list is coded incorrectly. It should be copied from IDS and re-coded, since concepts and word forms have been shifted out of alignment, which leads to erroneous results.

URL of data sources

Interesting project! Please check the links on the website (http://clics.lingpy.org/languages.php), e.g. http://wold.livingsources.org/vocabulary/14 --> http://wold.clld.org/vocabulary/14. M. Rießler
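The link fix can be done mechanically by swapping the host name. The Python sketch below assumes, as in the example above, that only the host changed and the paths (such as /vocabulary/14) stayed the same. Any WOLD URL whose path also changed would still need manual checking. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Old and new WOLD hosts, as given in the issue.
OLD_HOST = "wold.livingsources.org"
NEW_HOST = "wold.clld.org"


def update_wold_url(url: str) -> str:
    """Point a WOLD link at the new clld domain; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.netloc != OLD_HOST:
        return url
    # Keep scheme, path, query and fragment; only the host is replaced.
    return urlunsplit(parts._replace(netloc=NEW_HOST))


print(update_wold_url("http://wold.livingsources.org/vocabulary/14"))
```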

Clean and correct IDS_metadata.txt in order to provide links to ALL sources

Not all sources are correctly linked in the language table. This seems to be caused by a bug in IDS_metadata.txt, which needs to be identified and cleaned up.

[gji] is Gurindji in WOLD, but Gurinji in CLICS

Sow (noun vs. verb) is ambiguously coded

Thanks to W. S. Annis, who mentioned this to me:

The map for 'farmer' exposes a POS coding issue in the CLICS database. The word 'sow' is catching both of the English words 'sow (seed)' and 'sow,' the female pig.

http://clics.lingpy.org/browse.php?gloss=farmer&view=community

As mentioned in the earlier issue #1, these cases need to be tracked down and cleaned up. Alternatively, we could simply use the keys and forget about the "glosses" (unless someone is searching for them).

Problematic isocodes

I was just pointed to some erroneous or outdated ISO codes (thanks a lot, Charles!):

kij_std.csv: kij -> kjj
ray_std.csv: ray -> rap
src_std.csv: src -> hbs or srp or bos or hrv
mcq_std.csv: mcq -> ese
mcq_Huarayo.csv: mcq -> ese
noo_std.csv: noo -> nuk
tzz_std.csv: tzz -> tzo

Although future releases will be based strictly on the input data we receive from IDS, we should keep these codes in mind and check them against the new IDS release to make sure they are not IDS errors that have been overlooked...
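A small correction table makes these fixes reproducible while leaving the ambiguous src case open for manual review. The Python sketch below encodes the codes listed above; the table layout, the function name, and the decision to flag rather than guess are assumptions. Even the unambiguous replacements should still be cross-checked against the new IDS release.

```python
# Corrections reported in the issue: file -> (outdated code, candidate replacements).
ISO_CORRECTIONS = {
    "kij_std.csv": ("kij", ["kjj"]),
    "ray_std.csv": ("ray", ["rap"]),
    "src_std.csv": ("src", ["hbs", "srp", "bos", "hrv"]),
    "mcq_std.csv": ("mcq", ["ese"]),
    "mcq_Huarayo.csv": ("mcq", ["ese"]),
    "noo_std.csv": ("noo", ["nuk"]),
    "tzz_std.csv": ("tzz", ["tzo"]),
}


def resolve_iso(filename: str, code: str) -> tuple[str, bool]:
    """Return (code_to_use, needs_review) for a file's ISO code.

    Unambiguous corrections are applied; ambiguous ones (several candidates)
    keep the old code and are flagged for manual checking against IDS.
    """
    old, candidates = ISO_CORRECTIONS.get(filename, (None, []))
    if code != old:
        return code, False  # no known correction for this file/code
    if len(candidates) == 1:
        return candidates[0], False
    return code, True


print(resolve_iso("kij_std.csv", "kij"))
print(resolve_iso("src_std.csv", "src"))
```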

Add "how to contribute"-section to the FAQ

To invite people to contribute, we should, at least for the moment, add a few sentences stating that direct contact by email is currently the best way to contribute, since we do not yet have interfaces for submitting data or structures that would let other people use our code.

asymmetric connections?

One more: "strike (hit, beat)" shows a connection to "stab" (data checked, looks good), but not the other way around, i.e. when searching for connections of "stab".
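Since colexification is a symmetric relation, every link in the network should be visible from both ends. Assuming the browser's neighbour lists can be exported as a mapping from each concept to the set of its neighbours (a hypothetical representation, with hypothetical function names), a quick consistency check could look like this:

```python
def asymmetric_links(adjacency: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is listed as a neighbour of a but not vice versa.

    In an undirected colexification network this list should be empty.
    """
    return sorted(
        (a, b)
        for a, neighbours in adjacency.items()
        for b in neighbours
        if a not in adjacency.get(b, set())
    )


def symmetrize(adjacency: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return a copy in which every link is present in both directions."""
    result = {a: set(ns) for a, ns in adjacency.items()}
    for a, neighbours in adjacency.items():
        for b in neighbours:
            result.setdefault(b, set()).add(a)
    return result


adj = {"strike (hit, beat)": {"stab"}, "stab": set()}
print(asymmetric_links(adj))
```

Running asymmetric_links over the full network would list all cases like strike/stab at once. symmetrize is only a stopgap; the underlying cause of the missing link should still be found.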

Toba list (tob) misses many borrowings originally marked in brackets

Borrowings from Spanish (or similar languages) are marked with [brackets] in this list, but the computational procedure ignores them completely. The list either needs to be updated completely or has to be left out.
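Rather than dropping bracketed forms, the conversion could keep them and record a borrowing flag. The Python sketch below assumes that a borrowing is an entry enclosed entirely in square brackets; the pattern, the function name, and the return format are hypothetical.

```python
import re

# Assumption: a borrowing is an entry wrapped entirely in one pair of square brackets.
BRACKETED = re.compile(r"^\[([^\[\]]+)\]$")


def parse_entry(entry: str) -> tuple[str, bool]:
    """Return (form, is_borrowing) instead of discarding bracketed entries."""
    entry = entry.strip()
    match = BRACKETED.match(entry)
    if match:
        return match.group(1).strip(), True
    return entry, False


print(parse_entry("[abc]"))
print(parse_entry("abc"))
```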

Spaces in Romanian

Many (all?) Romanian verbs have the shape a + verb, e.g. a bate 'strike'. In CLICS we have abate instead, i.e. the space has been deleted. Were all spaces removed automatically during conversion in all datasets? That may be far too sweeping...
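Both the Romanian spaces and the Wayampi dashes point to an overly aggressive cleanup step during conversion. The Python sketch below contrasts a conservative normalization, which only tidies whitespace and Unicode composition, with a hypothetical destructive one that deletes spaces and dashes. It illustrates the problem and is not the conversion code CLICS actually used.

```python
import re
import unicodedata


def normalize_form(form: str) -> str:
    """Conservative cleanup: keep internal spaces and morpheme-break dashes.

    Only Unicode composition (NFC), leading/trailing whitespace, and runs of
    internal whitespace are normalized.
    """
    form = unicodedata.normalize("NFC", form)
    return re.sub(r"\s+", " ", form).strip()


def strip_separators(form: str) -> str:
    """Hypothetical over-aggressive cleanup that deletes spaces and dashes."""
    return re.sub(r"[\s-]", "", form)


print(normalize_form(" a  bate "), strip_separators("a bate"))
print(normalize_form("a-a"), strip_separators("a-a"))
```

With the destructive version, aa 'day' and a-a 'go' become identical and produce a spurious colexification, as described for Wayampi, while a bate turns into abate.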

Trace down duplicate glosses in clusters

As @brochhagen noticed, some concepts in the database have separate IDs but the same gloss identifier; this is the case at least for "knife". The problem is that these concepts cannot be found in the search interface at http://clics.lingpy.org/all.php. The bug has now been fixed for "knife" by inserting two separate entries for the communities in the SQLite database, but it needs to be handled explicitly in the future, either by adjusting the community detection procedure or by modifying our gloss identifiers. I won't fix this bug at the moment, because fixing it could change our clusters, and the clusters on the website for version 1.0 would then differ from those given in the paper.
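Such collisions can be listed before any community detection is run. The Python sketch below assumes the concepts are available as (concept ID, gloss) pairs; the function name and the IDs in the example are made up.

```python
from collections import defaultdict


def duplicate_glosses(concepts) -> dict[str, list[str]]:
    """Map each gloss used by more than one concept ID to those IDs.

    `concepts` is an iterable of (concept_id, gloss) pairs.
    """
    ids_by_gloss = defaultdict(set)
    for concept_id, gloss in concepts:
        ids_by_gloss[gloss].add(concept_id)
    return {g: sorted(ids) for g, ids in ids_by_gloss.items() if len(ids) > 1}


# Hypothetical IDs: two distinct concepts share the gloss "knife".
print(duplicate_glosses([("c1", "knife"), ("c2", "knife"), ("c3", "stab")]))
```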

Expand "about" tab by "language varieties" and "concepts"

We should, as M. Haspelmath suggested, insert two tables showing

all language varieties in the sample
all concepts in the sample.

Once these tables are prepared, the "about" tab in the navigation bar should be split into three subtabs:

FAQ (= currently only "ABOUT")
Language Varieties
Concepts

This will make it easier for users to check which languages and concepts are covered in CLICS.

Spraakbanken lists are erroneously coded

The lists taken from the Spraakbanken project (all or at least some of them) have shifted concept-meaning slots, probably due to the automatic conversion procedure. They need to be updated and thoroughly checked before the next release.

Clicking on Katukína, Panoan doesn't link to IDS vocabulary

Clicking on Katukína, Panoan [knt] leads not to the IDS vocabulary but to the IDS start page. Since the language appears as Catuquina there, it is difficult to find. Also, in CLICS it appears as Katukína, Panoan. Presumably there is another Katukína somewhere, hence the language family Panoan is also given. I don't think we need this.

Hungarian

We have siiw 'heart', whereas IDS has szív. What happened?

JavaScript code for bibliographic display prevents internal href-links

To reproduce this error, go to any gloss page, e.g. http://clics.lingpy.org/all.php?gloss=village, and click any of the links in the IDS-key column. These links fail because the bibliographic popup script treats any href containing the keyword "key" as a bibliography link. It therefore mistakes these links for BibTeX-key references, although they have nothing to do with BibTeX keys, and produces the error.
