clics1's Introduction

CLICS - The Database of Cross-Linguistic Colexifications

The original Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. It has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, ranging from studies on semantic change and patterns of conceptualization to linguistic paleontology. But CLICS has also been criticized for obvious shortcomings. Building on standardization efforts reflected in the CLDF initiative and novel approaches for fast, efficient, and reliable data aggregation, CLICS² expanded the original CLICS database. CLICS³, the third installment of CLICS, exploits the framework pioneered in CLICS² to more than double the amount of data aggregated in the database.

Publications

| CLICS Release | Authors | Title | Reference |
| --- | --- | --- | --- |
| CLICS | List, Terhalle, and Urban | Using network approaches to enhance the analysis of cross-linguistic polysemies | List2013a |
| CLICS | Mayer, List, Terhalle, and Urban | An Interactive Visualization of Crosslinguistic Colexification Patterns | Mayer2014 |
| CLICS² | List, Greenhill, Anderson, Mayer, Tresoldi, and Forkel | CLICS². An improved database of cross-linguistic colexifications assembling lexical data with help of cross-linguistic data formats | List2018e |
| CLICS³ | Rzymski, Tresoldi, et al. | The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies | PREPRINT |

Datasets and Software

Datasets providing the lexical data aggregated in CLICS, and the software supporting the CLICS processing workflow, are accessible and archived on Zenodo via the CLICS community.

Web application

Since CLICS², the latest release of the CLICS database and colexification network can be explored in a clld application at clics.clld.org.

Contributors

Find information about contributors and grants in CONTRIBUTORS.md.

clics1's People

Contributors

lingulist, tmayer

Forkers

so2jia

clics1's Issues

Wayampi [oym] dashes

Here there also appears to be an issue with unjustified deletion of morpheme breaks indicated by dashes, so we get colexification of aa 'day' with a-a 'go'. Unless there is a need to delete them for purposes of automatic processing, it seems to me that dashes should generally be retained, although a question is what to do with cases like -lɛ-kɨʔɨ 'older brother' (presumably needs a possessive prefix to be complete)...
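The effect can be reproduced with a minimal sketch. The function `find_colexifications` and its `strip_chars` parameter are hypothetical names used only for illustration; they are not taken from the CLICS code base, and the actual conversion step may work differently. Only the two word forms come from this report.

```python
from collections import defaultdict


def find_colexifications(entries, strip_chars=""):
    """Group concepts by form after deleting every character in strip_chars.

    entries: iterable of (form, concept) pairs for one language variety.
    Returns {normalized_form: sorted concepts} for forms shared by 2+ concepts.
    """
    by_form = defaultdict(set)
    for form, concept in entries:
        normalized = "".join(ch for ch in form if ch not in strip_chars)
        by_form[normalized].add(concept)
    return {form: sorted(cs) for form, cs in by_form.items() if len(cs) > 1}


# Toy data: the two Wayampi forms quoted in this issue.
wayampi = [("aa", "day"), ("a-a", "go")]

with_dashes = find_colexifications(wayampi)
without_dashes = find_colexifications(wayampi, strip_chars="-")

print(with_dashes)     # {}
print(without_dashes)  # {'aa': ['day', 'go']}
```

With the dashes retained, the two forms stay distinct; once they are stripped, both collapse to aa and a colexification is reported.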

Check Venetian

According to our data, both 'straight' and 'hard' in Venetian [vec] are a. This is a Romance language, so I can't imagine that this is right. Could you please check? I don't understand how I can access the LOGOS data.

Problems with links due to the movement of WOLD to clld-domain

We still have the old links for WOLD in the data, so they don't work, since they link to "livingsources". However, I will leave that as is for the moment: a new version of IDS is about to be launched, and it seems most useful to handle this when we make the next release of CLICS, based on plain IDS.

Manangba/Manange tones

Numbers appearing in front of words appear to have been deleted in the conversion to CLICS, but I think these are tones!! certainly they do not distinguish homonyms because this is done by numbers after the item in parentheses...

Errors in Ghulfan (ghl)

The Ghulfan list is erroneously coded. It should be copied from IDS and re-coded, since concepts and word-forms have been shifted and lead to erroneous results.

Sow (noun vs. verb) is ambiguously coded

Thanks to W. S. Annis, who mentioned this to me:

The map for 'farmer' exposes a POS coding issue
in the CLICS database. The word 'sow' is catching
both of the English words 'sow (seed)' and 'sow,'
the female pig.

As mentioned in former issue #1, these cases need to be traced down and cleaned up. Alternatively, we can also just use the keys and forget about the "glosses" (unless someone's searching for them).

Problematic isocodes

I was just pointed to some erroneous or outdated iso-codes (thanks a lot, Charles!):

kij_std.csv: kij -> kjj
ray_std.csv: ray -> rap
src_std.csv: src -> hbs or srp or bos or hrv
mcq_std.csv: mcq -> ese
mcq_Huarayo.csv: mcq -> ese
noo_std.csv: noo -> nuk
tzz_std.csv: tzz -> tzo

Although future releases will be strictly based on input data we receive from IDS, we should keep these codes in mind and check them against the new release of IDS to make sure that this is not an error in IDS that has been overlooked...
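For checking a new release against this list, a small lookup like the following might help. It is only a sketch: `ISO_FIXES`, `code_from_filename`, and `check_iso` are hypothetical names, the mapping contains nothing beyond the codes listed above, and the file-name pattern (code before the first underscore) is an assumption read off those file names.

```python
# Outdated or erroneous codes from this issue, mapped to the proposed
# replacement(s). "src" has several candidates and needs a manual decision.
ISO_FIXES = {
    "kij": ["kjj"],
    "ray": ["rap"],
    "src": ["hbs", "srp", "bos", "hrv"],
    "mcq": ["ese"],
    "noo": ["nuk"],
    "tzz": ["tzo"],
}


def code_from_filename(filename):
    """Return the part of the file name before the first underscore."""
    return filename.split("_", 1)[0]


def check_iso(code):
    """Return (status, candidates) for a code.

    status is "unlisted" (not in ISO_FIXES, nothing known against it here),
    "replace" (exactly one proposed replacement), or "ambiguous" (several).
    """
    candidates = ISO_FIXES.get(code)
    if candidates is None:
        return ("unlisted", [code])
    if len(candidates) == 1:
        return ("replace", candidates)
    return ("ambiguous", candidates)


for name in ["mcq_Huarayo.csv", "src_std.csv"]:
    print(name, check_iso(code_from_filename(name)))
```

Note that "unlisted" only means the code does not appear in this issue; it says nothing about whether the code is valid.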

Add "how to contribute"-section to the FAQ

To invite people to contribute, we should, at least for the moment, add a few sentences saying that direct contact by email is currently the best way to contribute, since we do not yet have interfaces through which data could be submitted, or structures through which our code could be used by other people.

asymmetric connections?

One more: "strike (hit, beat)" shows a connection to "stab" (data checked, looks good), but not the other way around, i.e. when searching for connections of "stab".
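A quick way to look for further cases of this kind is a symmetry check over the connections returned per concept. This is a minimal sketch under the assumption that the connections can be exported as a mapping from each concept to the set of concepts it links to; `asymmetric_links` and the toy `links` dictionary are hypothetical and only mirror the one case reported here.

```python
def asymmetric_links(links):
    """Find connections that are listed in one direction only.

    links: dict mapping a concept to the set of concepts it is connected to.
    Returns sorted (concept, missing_neighbor) pairs, meaning that
    missing_neighbor links to concept but concept does not link back.
    """
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((target, source))
    return sorted(missing)


# Toy data mirroring the report: "strike" lists "stab", but not vice versa.
links = {
    "strike (hit, beat)": {"stab"},
    "stab": set(),
}

print(asymmetric_links(links))  # [('stab', 'strike (hit, beat)')]
```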

Spaces in Romanian

many (all?) Romanian verbs have the shape a (verb), e.g. a bate 'strike'. In CLICS we have abate instead, i.e. the space has been deleted. Have all spaces been removed automatically during conversion in all datasets? This may be far too broad-sweeping...

Trace down duplicate glosses in clusters

As @brochhagen detected, we have some concepts in the database which have separate ids but the same gloss identifier. At least for "knife" this is the case. The problem is that these concepts cannot be found in the search interface on http://clics.lingpy.org/all.php. This bug was now fixed for "knife" by inserting two separate entries for the communities in the sqlite database, but it needs to be covered explicitly in the future by either adjusting the community detection procedure or by modifying our identifiers for the glosses. I won't fix this bug at the moment, since if we fix it, it could change our clusters, and the clusters on the website for version 1.0 would be different from the clusters given in the paper.
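To trace down the remaining cases, the concept list can be scanned for glosses that occur under more than one id. The sketch below assumes the concepts are available as (id, gloss) pairs; `duplicate_glosses` and the ids `c1` to `c3` are made up for illustration and do not reflect the actual database schema.

```python
from collections import defaultdict


def duplicate_glosses(concepts):
    """Return {gloss: [ids]} for every gloss used by more than one concept id.

    concepts: iterable of (concept_id, gloss) pairs.
    """
    ids_by_gloss = defaultdict(list)
    for concept_id, gloss in concepts:
        ids_by_gloss[gloss].append(concept_id)
    return {g: ids for g, ids in ids_by_gloss.items() if len(ids) > 1}


# Hypothetical ids; only the "knife" duplication is taken from this issue.
concepts = [("c1", "knife"), ("c2", "knife"), ("c3", "stab")]

print(duplicate_glosses(concepts))  # {'knife': ['c1', 'c2']}
```

The check only reports duplicates; whether to rename the glosses or adjust the community detection is the open decision described above.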

Expand "about" tab by "language varieties" and "concepts"

We should, as M. Haspelmath suggested, insert two tables showing

  • all language varieties in the sample
  • all concepts in the sample.

Having prepared these tables, the "about" tab in the navigation bar should be split up into three subtabs:

  • FAQ (= currently only "ABOUT")
  • Language Varieties
  • Concepts

This will make it more transparent for the users to check which languages and which concepts are covered in CLICS.

Spraakbanken lists are erroneously coded

All lists taken from the spraakbanken project (or at least some of them) have shifted concept-meaning slots, probably due to the automatic conversion procedure. These need to be updated and thoroughly checked before the next release is undertaken.

Clicking on Katukína, Panoan doesn't link to IDS vocabulary

Clicking on Katukína, Panoan [knt] doesn't link to the IDS vocabulary but to the IDS start page. Since the language appears as Catuquina there, it is difficult to find it again. Also, in CLICS this appears as Katukína, Panoan. Presumably there is another Katukína somewhere, hence the language family Panoan is also named. I don't think we need this.

Hungarian

we have siiw 'heart', whereas IDS has szív. What happened?
