phones's Introduction

phones is a Python library for easy handling of phones in the International Phonetic Alphabet (IPA). IPA phones are useful because they can describe how words are pronounced in most languages.

Feature Overview

  • Extract numeric feature vectors from phones.
  • Map phones from one language to another by finding the closest phones.
  • Convert between ARPABET, X-SAMPA/SAMPA and IPA notation.
  • Compute phone distances.
  • Do phone arithmetic on a phone and phone-feature level.
  • Visualise phones and their distances (requires installing phones[plots]).

Installation

For the core library:

pip install phones

For plotting:

pip install phones[plots]

phones's People

Contributors

minixc

phones's Issues

add dialect filter

As pointed out in #4, some languages in phoible are made up of a group of specific dialects.
We need to figure out how best to communicate this, and we have to add an option to phones to filter by dialect.

I think it might be best to go with something like this.

from phones import PhoneCollection
pc = PhoneCollection()
pc.langs("ast")
>>> ValueError: Need to select a dialect for "ast". Dialects can be listed using the list_dialects flag
pc.langs("ast", list_dialects=True)
>>> ["Asturian (Western)", "Asturian (North-Eastern)"]
pc.langs("ast", "Asturian (Western)")

It would be nice to allow something like pc.langs("ast", "western"), which could be achieved by checking whether the dialect string occurs in exactly one of the dialect options.
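A minimal sketch of that matching rule, independent of the library. The helper name resolve_dialect and the case-insensitive comparison are assumptions for illustration and are not part of phones:

```python
from typing import List, Sequence


def resolve_dialect(query: str, options: Sequence[str]) -> str:
    """Return the one dialect whose name contains `query` (case-insensitive)."""
    needle = query.lower()
    # An exact match always wins, even if it is also a substring of other options.
    exact: List[str] = [o for o in options if o.lower() == needle]
    if len(exact) == 1:
        return exact[0]
    matches: List[str] = [o for o in options if needle in o.lower()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"No dialect matches {query!r}. Options: {list(options)}")
    raise ValueError(f"{query!r} is ambiguous, it matches {matches}")


dialects = ["Asturian (Western)", "Asturian (North-Eastern)"]
print(resolve_dialect("western", dialects))  # Asturian (Western)
```

A query such as "asturian" occurs in both options, so it raises a ValueError instead of silently picking one.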

basic tests

To catch build failures that would otherwise go undetected, some basic tests should be implemented that import all modules and exercise some basic functionality.
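One way such an import smoke test could look, as a sketch only. The helper import_all_submodules is hypothetical, the test function follows the pytest naming convention by assumption, and the standard-library package json stands in for phones so the example runs anywhere:

```python
import importlib
import pkgutil
from typing import List


def import_all_submodules(package_name: str) -> List[str]:
    """Import a package and every module below it; return the imported names."""
    package = importlib.import_module(package_name)
    imported = [package.__name__]
    # Single-file modules have no __path__, so there is nothing to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return imported
    prefix = package.__name__ + "."
    for info in pkgutil.walk_packages(search_path, prefix=prefix):
        # Any ImportError raised here fails the test and names the broken module.
        importlib.import_module(info.name)
        imported.append(info.name)
    return imported


def test_all_modules_import():
    # "json" is a placeholder; a real test for this project would pass "phones".
    names = import_all_submodules("json")
    assert "json.decoder" in names
```

A test like this would have failed on a release where a subpackage is missing from the installed distribution.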

`PhoneCollection.values` method fails

Greetings.

I see that the method PhoneCollection.values fails. Using the commands featured in the basic usage section I encounter an error:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection()
>>> ph = pc.langs("eng").values[0]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1490, in array_func
    result = self.grouper._cython_operation(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 959, in _cython_operation
    return cy_op.cython_operation(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 657, in cython_operation
    return self._cython_op_ndim_compat(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 497, in _cython_op_ndim_compat
    return self._call_cython_op(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 541, in _call_cython_op
    func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 173, in _get_cython_function
    raise NotImplementedError(
NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 1692, in _ensure_numeric
    x = float(x)
ValueError: could not convert string to float: 'a aː aː'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 1696, in _ensure_numeric
    x = complex(x)
ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/phones/__init__.py", line 219, in values
    self.data.groupby(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1855, in mean
    result = self._cython_agg_general(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1507, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1503, in grouped_reduce
    applied = sb.apply(func)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 329, in apply
    result = func(self.values, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1503, in array_func
    result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1457, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 994, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 1015, in _aggregate_series_pure_python
    res = func(group)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1857, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 11556, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 11201, in mean
    return self._stat_function(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 11158, in _stat_function
    return self._reduce(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/series.py", line 4670, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 96, in _f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 158, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 421, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 727, in nanmean
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
  File "/usr/local/lib/python3.8/site-packages/pandas/core/nanops.py", line 1699, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric") from err
TypeError: Could not convert a aː aː to numeric

However, I see that the method that takes allophones into account works as expected:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection()
>>> pc.langs("eng").values_with_allophones[:5]
[ (eng), b (eng), b (eng), d (eng), d (eng)]

I guess it must be due to the c != self.source.allophone_column part in this line:

[c for c in self.columns if c != self.source.allophone_column]

I encountered this error in versions 0.0.5 and 0.0.6

Thanks in advance

Some languages not found

First of all, thanks for the repository; it could be very helpful.

I am having some issues, though. I installed the package via pip and tried to look up Iberian languages; surprisingly, neither Catalan nor Asturian was loaded:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection()
>>> pc.langs('cat').values
[]
>>> pc.langs('ast').values
[]

I checked the .csv file that the program uses from phoible (https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv), and both Asturian and Catalan are present with those ISO codes (cat and ast). I don't know what the problem could be in this case; I imagine it could be related to the names of the dialects in some way.

Thanks in advance

TypeError: normalize() argument 2 must be str, not float

I'm creating a PhoneCollection with the drop_dialects and merge_same_language flags both set to False in order to load as many languages as possible.

>>> from phones import PhoneCollection
>>> pc=PhoneCollection(drop_dialects=False,merge_same_language=False)

but I get an exception ...

>>> pc=PhoneCollection(drop_dialects=False,merge_same_language=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python310\lib\site-packages\phones\__init__.py", line 77, in __init__
    ].apply(lambda x: unicodedata.normalize("NFC", x))
  File "C:\Python310\lib\site-packages\pandas\core\series.py", line 4433, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "C:\Python310\lib\site-packages\pandas\core\apply.py", line 1088, in apply
    return self.apply_standard()
  File "C:\Python310\lib\site-packages\pandas\core\apply.py", line 1143, in apply_standard
    mapped = lib.map_infer(
  File "pandas\_libs\lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "C:\Python310\lib\site-packages\phones\__init__.py", line 77, in <lambda>
    ].apply(lambda x: unicodedata.normalize("NFC", x))
TypeError: normalize() argument 2 must be str, not float

I don't know whether you already have this in hand in #5, but for now I can proceed with a function that wraps unicodedata.normalize in an exception handler

i.e.

        def normalize(x):
            # Non-string values (the traceback shows floats) are returned unchanged.
            try:
                return unicodedata.normalize("NFC", x)
            except TypeError:
                return x

        if self.source.allophone_column is not None:
            self.data[self.source.allophone_column] = self.data[
                self.source.allophone_column
            ].apply(normalize)
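
A standalone variant of the same workaround, as a sketch: it checks the type explicitly instead of catching the exception. The function name normalize_nfc is hypothetical, and the assumption that the offending floats are NaN values from empty cells is a guess based on the traceback, not something confirmed for this dataset:

```python
import unicodedata


def normalize_nfc(value):
    """NFC-normalise strings; return any other value (e.g. a NaN float) unchanged."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    return value


decomposed = "a\u0303"  # "a" followed by a combining tilde
print(normalize_nfc(decomposed) == "\u00e3")  # True: composed into a single code point
print(normalize_nfc(float("nan")))  # nan, passed through without raising
```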

Thanks for a really useful library!!

remove allophone column before grouping to avoid pandas warning

Pointed out in #9

python3.8/site-packages/phones/__init__.py:219: FutureWarning: The default value of numeric_only 
in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. 
Either specify numeric_only or select only columns which should be valid for the function.
  self.data.groupby(

Attribute `dialect_list` returns `TypeError`

I know it's not documented in the API reference, but accessing PhoneCollection.dialect_list raises a TypeError:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection(load_dialects=True)
>>> pc.langs("eus").dialect_list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/python3.8/site-packages/phones/__init__.py", line 135, in dialect_list
    return list(sorted(self.data[self.source.dialect_column].unique()))
TypeError: '<' not supported between instances of 'float' and 'str'

It marks an error here

I am using version 0.0.4

It might still be in the testing phase since it is not documented, but I am reporting it in case you haven't noticed this bug yet.

Thanks in advance!
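
The error message suggests the dialect column mixes strings with floats, which sorted() cannot compare. A minimal sketch of one possible fix, assuming the floats are missing values that can simply be skipped (sorted_dialects is a hypothetical helper, not the library's code):

```python
def sorted_dialects(values):
    """Sort the unique dialect names, skipping non-string entries such as NaN."""
    return sorted({v for v in values if isinstance(v, str)})


# Illustrative column contents: two dialect names, one duplicate, one missing value.
column = [
    "Asturian (Western)",
    float("nan"),
    "Asturian (North-Eastern)",
    "Asturian (Western)",
]
print(sorted_dialects(column))
# ['Asturian (North-Eastern)', 'Asturian (Western)']
```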

ɶ not found

"ɶ", which appears when converting "&" from X-SAMPA to IPA, does not seem to be in the phoible database.

A problem with usage

I installed the library with the pip install phones command and ran the following script:

from phones.convert import Converter

converter = Converter()

but I get this error:

Traceback (most recent call last):
  File "ipa_arpa.py", line 3, in <module>
    from phones.convert import Converter
  File "/Users/yehorsmoliakov/opt/miniconda3/lib/python3.8/site-packages/phones/convert.py", line 22, in <module>
    from .phonecodes.src import phonecodes
ModuleNotFoundError: No module named 'phones.phonecodes'
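
A quick way to check which parts of the package are actually present in an installation, as a sketch. The helper is_importable is hypothetical, and the idea that the phonecodes subpackage is missing from the installed distribution is only one possible reading of the traceback:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is itself missing.
        return False


# Module names taken from the traceback above.
for name in ("phones", "phones.phonecodes", "phones.phonecodes.src"):
    print(name, is_importable(name))
```

If the first name is reported as importable and the second is not, the subpackage was not installed alongside the rest of the library.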

dialect-specific allophones

Hello again.

I found that no allophones are being loaded, although the allophone column is set by default in the __init__ of PHOIBLE (allophone_column='Allophones').

I checked this for all phonemes:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection()
>>> {tuple(sorted(p.allophones)) for p in pc.langs(pc.lang_list).values}
{()}
>>> 

I might be missing something, though...

I am using version 0.0.4

I am also having this warning, btw:

python3.8/site-packages/phones/__init__.py:219: FutureWarning: The default value of numeric_only 
in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. 
Either specify numeric_only or select only columns which should be valid for the function.
  self.data.groupby(

Thanks in advance!
