pysem's Introduction

LingPy: A Python Library for Automatic Tasks in Historical Linguistics

This repository contains the Python package lingpy which can be used for various tasks in computational historical linguistics.

Authors (Version 2.6.12): Johann-Mattis List and Robert Forkel

Collaborators: Christoph Rzymski, Simon J. Greenhill, Steven Moran, Peter Bouda, Johannes Dellert, Taraka Rama, Tiago Tresoldi, Gereon Kaiping, Frank Nagel, and Patrick Elmer.

LingPy is a Python library for historical linguistics. It is being developed for Python 2.7 and Python 3.x using a single codebase.

Quick Installation

For our latest stable version, you can simply use pip or easy_install for installation:

$ pip install lingpy

or

$ easy_install lingpy

Depending on which easy_install or pip version you use, either the Python 2 or the Python 3 version of LingPy will be installed.

If you want to install the current GitHub version of LingPy on your system, open a terminal and type in the following:

$ git clone https://github.com/lingpy/lingpy/
$ cd lingpy
$ python setup.py install

If the last command above fails with a user-permissions error (usually "Errno 13"), you can install LingPy for your user account only:

$ python setup.py install --user

In order to use the library, start an interactive Python session and import LingPy as follows:

>>> from lingpy import *

To install LingPy to hack on it, fork the repository on GitHub, open a terminal and type:

$ git clone https://github.com/<your-github-user>/lingpy/
$ cd lingpy
$ python setup.py develop

This will install LingPy in "development mode", i.e. you will be able to edit the sources in the cloned repository and import the altered code just like the regular Python package.

pysem's People

Contributors

lingulist

pysem's Issues

mismatch between length of input and output

While working on streitberggothic I encountered the following issue:

# mismatch in col len
import pandas as pd
from pysem.glosses import to_concepticon

PATH = "Streitberg-1910-3659.tsv"

def main():
    dfgot = pd.read_csv(PATH, sep="\t").fillna("")
    glosses = [{"gloss": str(g), "pos": str(p)}
                for g, p in zip(dfgot.sense, dfgot.pos)]

    print(len(glosses),
          len(to_concepticon(glosses, language="de", pos_ref="pos",
                             max_matches=1)))

if __name__ == "__main__":
    main()

prints: 3645 3274

So there are somehow fewer output matches than inputs provided.

I thought this might be valuable information for the developers, even though I eventually found a workaround:

def main():
    dfgot = pd.read_csv(PATH, sep="\t")

    conid, conglo = [], []
    for g, p in zip(dfgot.sense, dfgot.pos):
        gloss = [{"gloss": g, "pos": p}]
        out = list(to_concepticon(gloss, language="de",
                                  pos_ref="pos", max_matches=1).values())[0]
        if out:
            conid.append(out[0][0])
            conglo.append(out[0][1])
        else:
            conid.append(None)
            conglo.append(None)

    dfgot["CONCEPTICON_ID"], dfgot["CONCEPTICON_GLOSS"] = conid, conglo
    del dfgot["form"]
    dfgot.to_csv("concepts.tsv", index=False, encoding="utf-8", sep="\t")

if __name__ == "__main__":
    main()
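
One possible cause of the mismatch, offered as an assumption rather than a confirmed diagnosis: the outputs shown in the other issues suggest that to_concepticon returns a dict keyed by the gloss string, so rows that share a gloss would collapse into a single key. The sketch below does not call pysem; fake_lookup is a hypothetical stand-in that only mimics that dict shape, and duplicated_glosses is a hypothetical helper for spotting the repeated glosses.

```python
from collections import Counter


def fake_lookup(glosses):
    """Hypothetical stand-in for a lookup that returns a dict keyed by gloss."""
    # Each gloss string becomes one key, so repeated glosses overwrite each other.
    return {entry["gloss"]: [] for entry in glosses}


def duplicated_glosses(glosses):
    """Return the gloss strings that occur more than once in the input."""
    counts = Counter(entry["gloss"] for entry in glosses)
    return {gloss: n for gloss, n in counts.items() if n > 1}


glosses = [
    {"gloss": "Fuß", "pos": "noun"},
    {"gloss": "Hand", "pos": "noun"},
    {"gloss": "Fuß", "pos": "noun"},  # same gloss as the first row
]

result = fake_lookup(glosses)
print(len(glosses), len(result))    # input rows vs. distinct keys
print(duplicated_glosses(glosses))  # glosses that collapse into one key
```

If the number of distinct glosses in the real data equals the shorter count in the report, duplicate glosses would account for the difference. That would also be consistent with the one-gloss-per-call workaround above returning one result per row.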

Typo in usage example

The usage example contains this line:
to_concepticon([{"gloss": "Fuß", pos: "noun"}], language="de"}])

But that seems to be a typo for this:
to_concepticon([{"gloss": "Fuß", "pos": "noun"}], language="de")

Specify compatible concepticon version

pysem works with concepticon v2.5.0 but not with the latest version. It would be nice to document somewhere, e.g. in the readme, which concepticon version the concepticon.zip file is based on. Or maybe the file could be updated to the latest version or even replaced by an API?

multiple identical glosses as output

In [15]: to_concepticon([{"gloss": "arm / hand", "pos": "noun"}], max_matches=4)
Out[15]: 
{'arm / hand': [['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15]]}
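
Until the duplicates are addressed in the library, they can be filtered on the caller's side. The following is a minimal sketch, assuming the result has the shape shown above (a dict mapping each gloss to a list of match lists); unique_matches is a hypothetical helper, not part of pysem.

```python
def unique_matches(result):
    """Drop repeated matches per gloss while keeping first-seen order."""
    cleaned = {}
    for gloss, matches in result.items():
        seen = set()
        kept = []
        for match in matches:
            key = tuple(match)  # lists are unhashable, tuples are not
            if key not in seen:
                seen.add(key)
                kept.append(match)
        cleaned[gloss] = kept
    return cleaned


# Output copied from the report above.
reported = {
    "arm / hand": [
        ["1277", "HAND", "noun", 15],
        ["1277", "HAND", "noun", 15],
        ["1277", "HAND", "noun", 15],
        ["1277", "HAND", "noun", 15],
    ]
}

print(unique_matches(reported))
```

Note that this only hides the repetition: with max_matches=4 the caller would presumably expect up to four distinct candidates, which filtering after the fact cannot supply.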

Slash is only rudimentarily recognized as a separator

In [15]: to_concepticon([{"gloss": "arm / hand", "pos": "noun"}], max_matches=4)
Out[15]: 
{'arm / hand': [['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15],
  ['1277', 'HAND', 'noun', 15]]}
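
One caller-side approach, sketched here under the assumption that a slash separates alternative glosses, is to split the gloss before the lookup and query each part on its own. split_gloss is a hypothetical helper, not part of pysem, and whether each part then maps to the expected concept set depends on the library.

```python
def split_gloss(entry, sep="/"):
    """Turn one entry with a slash-separated gloss into one entry per part."""
    parts = [part.strip() for part in entry["gloss"].split(sep)]
    # Copy the other fields (e.g. "pos") to every part; skip empty parts.
    return [dict(entry, gloss=part) for part in parts if part]


entries = split_gloss({"gloss": "arm / hand", "pos": "noun"})
print(entries)
```

The resulting list can be passed to to_concepticon in place of the original single entry, so that "arm" and "hand" are looked up separately.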
