textgrid's Introduction

textgrid.py

Python classes for Praat TextGrid and TextTier files (and HTK .mlf files)

Kyle Gorman and contributors (see commit history).

How to cite:

You don't have to, but if you want to cite textgrid.py in a publication, include a footnote link to the source:

http://github.com/kylebgorman/textgrid/

How to install:

The code can be placed in your working directory or in your $PYTHONPATH and then imported in your Python script. You can also install it via pip:

pip install textgrid

(If you're not working in a virtualenv, you may need to do this with sudo.)

Synopsis:

See the docstrings in textgrid.py

Example:

This is a simple example of reading a TextGrid file.

import textgrid

# Read a TextGrid object from a file.
tg = textgrid.TextGrid.fromFile('test.TextGrid')

# Read an IntervalTier object.
print("------- IntervalTier Example -------")
print(tg[0])
print(tg[0][0])
print(tg[0][0].minTime)
print(tg[0][0].maxTime)
print(tg[0][0].mark)

# Read a PointTier object.
print("------- PointTier Example -------")
print(tg[1])
print(tg[1][0])
print(tg[1][0].time)
print(tg[1][0].mark)

The contents of test.TextGrid are as follows:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = """Is anyone home?"""
        intervals [2]:
            xmin = 0.5
            xmax = 1
            text = "asked ""Pat"""
    item [2]:
        class = "TextTier"
        name = "points"
        xmin = 0
        xmax = 1
        points: size = 2
        points [1]:
            number = 0.25
            mark = """event"""
        points [2]:
            number = 0.75
            mark = """event"" with quotes again"

The following is the output of the above snippet:

------- IntervalTier Example -------
<IntervalTier words, 2 intervals>
Interval(0.0, 0.5, "Is anyone home?")
0.0
0.5
"Is anyone home?"
------- PointTier Example -------
<PointTier points, 2 points>
Point(0.25, "event")
0.25
"event"

textgrid's People

Contributors

christianbrodbeck, gfetterman, james-tanner, jofrhwld, kylebgorman, lcavasso, maxbane, miopas, mjfox3, mmcauliffe, seaniezhao, stevenbedrick, terriyu, ycchuang

textgrid's Issues

detectEncoding does not work for latin-1

The files I need to use are encoded as latin-1, and the detectEncoding function does not support this. I suggest a new TextGrid parameter, encoding=None, which, when passed, supersedes detectEncoding.
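A minimal sketch of the proposed behaviour, assuming the file is first read as raw bytes. detect_encoding and decode_textgrid are hypothetical names, not textgrid.py's own detectEncoding; the point is only that an explicit encoding argument short-circuits detection. That matters here because latin-1 bytes usually fail UTF-8 decoding, and since every byte sequence is valid latin-1, detection alone cannot reliably tell it apart from other 8-bit encodings.

```python
from typing import Optional

UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little- and big-endian byte-order marks


def detect_encoding(data: bytes) -> Optional[str]:
    """Guess an encoding; return None if neither UTF-16 nor UTF-8 fits."""
    if data.startswith(UTF16_BOMS):
        return "utf-16"
    try:
        data.decode("utf-8-sig")  # also accepts plain ASCII
    except UnicodeDecodeError:
        return None
    return "utf-8-sig"


def decode_textgrid(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode TextGrid bytes; an explicit encoding supersedes detection."""
    if encoding is None:
        encoding = detect_encoding(data)
        if encoding is None:
            raise ValueError("could not detect encoding; pass encoding= explicitly")
    return data.decode(encoding)


# Usage with a latin-1 file (path is illustrative):
# with open("latin1.TextGrid", "rb") as f:
#     text = decode_textgrid(f.read(), encoding="latin-1")
```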

textgrids with overlapping time points

Hey man, just an FYI: the MLF format allows segments to start and end at the same time (zero duration), but that drives Praat crazy. So

7400000 8100000 EY1
8100000 8100000 sp sp
8100000 10700000 IH1

just happens to make a "bad" textgrid unless you change those 8100000's slightly.
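Rather than nudging the boundaries, one option is to drop zero-duration segments before building the tier, since they cover no time anyway. A minimal sketch, assuming label lines in HTK's "start end label ..." layout with integer times; drop_zero_duration is a hypothetical helper, and dropping (rather than shifting) the segment is a design choice the report does not prescribe.

```python
def drop_zero_duration(mlf_lines):
    """Drop 'start end label ...' lines whose start equals their end.

    Hypothetical pre-processing step; times stay in HTK's integer units.
    Lines without two leading integer times (the #!MLF!# header, quoted
    label-file names, the terminating '.') are passed through unchanged.
    """
    kept = []
    for line in mlf_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            if int(fields[0]) == int(fields[1]):
                continue  # zero-length segment such as "8100000 8100000 sp sp"
        kept.append(line)
    return kept


lines = ["7400000 8100000 EY1", "8100000 8100000 sp sp", "8100000 10700000 IH1"]
print(drop_zero_duration(lines))  # ['7400000 8100000 EY1', '8100000 10700000 IH1']
```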

UTF-16 support

Praat text files can be either ASCII or UTF-16. Right now, however, textgrid.py's reading and writing functions don't handle UTF-16. To support it, they would have to recognize UTF-16 when reading (or, I'd suggest, just detect non-ASCII characters by comparing them against \x7F), and the class instances would have to record whether they contain any UTF-16 marks, a flag that must percolate upward to Tier and Grid.

I'm not sure what we want to do here, but since we're redoing the write functions anyways, it seems worthwhile to consider.
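A minimal sketch of the \x7F check, assuming marks are Python str objects. Instead of storing a flag on every instance, it computes the output encoding from all marks at write time, which is one way to let the answer "percolate upward" without extra state. is_ascii_only and choose_output_encoding are hypothetical names.

```python
def is_ascii_only(text: str) -> bool:
    """True if every character is at most 0x7F, the check suggested above."""
    return all(ord(ch) <= 0x7F for ch in text)


def choose_output_encoding(marks) -> str:
    """Return 'ascii' if every mark is plain ASCII, otherwise 'utf-16'.

    `marks` can be every interval/point mark collected from all tiers of a
    grid, so a single non-ASCII mark anywhere switches the whole file.
    """
    return "ascii" if all(is_ascii_only(m) for m in marks) else "utf-16"


print(choose_output_encoding(["Is anyone home?", "event"]))  # ascii
print(choose_output_encoding(["ʃ", "event"]))                # utf-16
```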

Gaps in IntervalTiers

Currently, textgrid.py can be used to create IntervalTiers that don't explicitly include empty (null "mark") intervals. Praat can read these, but they behave weirdly.

The appropriate behavior seems to be as follows. Any time an IntervalTier is being written (either IntervalTier.write or TextGrid.write when the TextGrid contains an IntervalTier), the written text file needs to explicitly include empty intervals. However, this should not change the state of the IntervalTier instance itself. (This does suggest an interesting possibility: maybe the "read" methods should ignore empty intervals. Something to think about.) This should be relatively easy to do without any additional instance variables, and just requires edits to IntervalTier.write and TextGrid.write.

I've assigned this to myself, for the moment. It's just one of those "while True:" problems I think.
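A minimal sketch of the write-time gap filling described above, using plain (minTime, maxTime, mark) tuples instead of textgrid.py's Interval objects. with_gaps_filled is a hypothetical helper; it assumes the intervals are sorted and non-overlapping, and it returns a new list so the tier's own state is left untouched.

```python
from typing import List, Tuple

IntervalTuple = Tuple[float, float, str]  # (minTime, maxTime, mark)


def with_gaps_filled(intervals: List[IntervalTuple], min_time: float,
                     max_time: float) -> List[IntervalTuple]:
    """Return a new list in which every gap is covered by an empty interval.

    The input list is not modified. Assumes intervals are sorted and
    non-overlapping and lie within [min_time, max_time].
    """
    filled: List[IntervalTuple] = []
    cursor = min_time
    for start, end, mark in intervals:
        if start > cursor:
            filled.append((cursor, start, ""))  # explicit empty interval
        filled.append((start, end, mark))
        cursor = end
    if cursor < max_time:
        filled.append((cursor, max_time, ""))  # trailing empty interval
    return filled


tier = [(5.0, 6.0, "a segment")]
print(with_gaps_filled(tier, 0.0, 10.0))
# [(0.0, 5.0, ''), (5.0, 6.0, 'a segment'), (6.0, 10.0, '')]
```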

Inconsistencies in MLF timestamps

Keelan Evanini and Bob Lannon both tell me that HVite in HTK may produce MLF files with rounding errors in their timestamps. There is an undocumented fix in the Penn Forced Aligner, but it doesn't generalize to arbitrary sample rates, which I need for my aligner project. I also don't know if this is true of all versions of HTK. I'll need some long MLF files to figure this all out, but it's something to take care of in the future. I'm assigning this one to myself.

Default rounding induces length mismatch

While rounding may be useful for some applications, it harms others.

Consider audio at 22050 Hz sampling rate with length 74751 samples. With a hopsize of 256 samples and padding such that each frame is centered at hopsize / 2, 3 * hopsize / 2, 5 * hopsize / 2, etc., this should yield 291 frames. With the current default rounding, it produces 292 frames and is misaligned with other features I use. Because rounding cannot be turned off, I cannot fix this while using this library in its current state.

I think rounding should be optional. I also think that it should not be the default system behavior.
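The sensitivity is easy to reproduce with plain arithmetic. The sketch below rests on two assumptions that are not taken from textgrid.py's code: the frame count is one frame per complete hop (n_samples // hop, which gives the 291 frames the report expects), and a duration is rounded to a fixed number of decimal places in seconds and then converted back to samples. 256 * 292 = 74752, only one sample more than 74751, so rounding the duration to four places (3.3901 s, about 74751.7 samples) is enough to tip the count to 292. n_frames and roundtrip_samples are hypothetical helpers.

```python
SAMPLE_RATE = 22050  # Hz, from the report
HOP = 256            # samples per frame
N_SAMPLES = 74751


def n_frames(n_samples: int, hop: int = HOP) -> int:
    """Assumed frame count: one frame per complete hop."""
    return n_samples // hop


def roundtrip_samples(n_samples: int, digits: int) -> int:
    """Convert samples -> seconds, round to `digits` places, convert back."""
    seconds = round(n_samples / SAMPLE_RATE, digits)
    return round(seconds * SAMPLE_RATE)


print(n_frames(N_SAMPLES))                        # 291 (no rounding)
print(n_frames(roundtrip_samples(N_SAMPLES, 4)))  # 292: 3.3901 s -> 74752 samples
print(n_frames(roundtrip_samples(N_SAMPLES, 5)))  # 291: 3.39007 s -> 74751 samples
```

Making the rounding step optional, as the report asks, corresponds to skipping roundtrip_samples entirely.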

Rounding should be done after division

I'm getting overlapping IntervalTier segments and trying to find the cause for this.

There is a line in MLF.add that rounds a float and then divides it by the samplerate:

pmin = round(float(line[0]), round_digits) / samplerate

However, this seems to defy the purpose of rounding. Here's an example:

>>> round(1234.56789, 3)
1234.568
>>> round(1234.56789, 3) / 10000
0.12345679999999999

Maybe it's better to divide by samplerate and round afterwards, like this:

pmin = round(float(line[0]) / samplerate, round_digits)

If you agree about this, I'm happy to file a pull request.
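The two orders can be compared side by side using the numbers from the example above. The function names are hypothetical; only the expression inside each one mirrors the MLF.add line quoted in the issue and the proposed replacement.

```python
def to_seconds_old(field: str, samplerate: float, round_digits: int) -> float:
    """Order quoted in the issue: round first, then divide."""
    return round(float(field), round_digits) / samplerate


def to_seconds_new(field: str, samplerate: float, round_digits: int) -> float:
    """Proposed order: divide first, so the rounding applies to the result."""
    return round(float(field) / samplerate, round_digits)


old = to_seconds_old("1234.56789", 10000, 3)
new = to_seconds_new("1234.56789", 10000, 3)
print(old)  # roughly 0.1234568: still carries more than 3 decimals
print(new)  # 0.123
```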

__min__ and __max__

The TextGrid, IntervalTier, and PointTier classes define __min__ and __max__ methods that return the minTime and maxTime attributes. These aren't Python magic methods (the built-in min() and max() never call them), so you have to call them by name:

import textgrid

it = textgrid.IntervalTier()
it.add(5.0, 6.0, 'a segment')

it.__min__()  # must be called explicitly; min(it) does not use it

which returns the minTime, 0.0.

If you call min(it), the IntervalTier returns the first interval Interval(5.0, 6.0, a segment), and min() with a TextGrid returns the first/top tier. Go ahead and close if this is all intended behavior. I just saw the double underscores and went for min() but then didn't get what I expected.
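Python's built-in min() and max() have no dunder hook: they iterate over their argument and compare the items, so a method named __min__ is never consulted. ToyTier below is a hypothetical class that reproduces the behaviour described in the issue; a plainly named attribute or method (such as the existing minTime attribute) avoids the confusion.

```python
class ToyTier:
    """Hypothetical stand-in for a tier: iterable, with a __min__ method."""

    def __init__(self, min_time, intervals):
        self.minTime = min_time
        self.intervals = intervals

    def __iter__(self):
        return iter(self.intervals)

    def __min__(self):  # an ordinary method; Python gives this name no meaning
        return self.minTime


tier = ToyTier(0.0, [(5.0, 6.0, "a segment"), (7.0, 8.0, "b")])
print(tier.__min__())  # 0.0
print(min(tier))       # (5.0, 6.0, 'a segment'): min() just iterates
```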

PointProcess

Hi, older versions used to support PointProcess, which is needed to read files that contain events such as GCI (glottal closure instant) marks. Was there a specific reason that led to its removal? And/or: any chance of re-inclusion?

pypi upload

The module should be uploaded to PyPI, with whatever changes are necessary to do so.

PyPI release?

Would it be possible to make a pypi release? I keep having to install from master because of some issues with the current release, and things seem pretty stable on GitHub.

How does one use this?

Hi,

Thanks a lot for coding this. I'm just a bit confused about how to use it. I have a .mlf file from HTK that I need to convert to TextGrid format.

Do I need to import textgrid into Python and then call some functions on it?

Or is there a command-line interface I can use, e.g. python textgrid.py filename.mlf?

Some documentation would help.

Cheers
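textgrid.py's own MLF class (the one whose add method is quoted in an earlier issue) is the intended route; see the docstrings in textgrid.py for its usage. To show what the conversion involves, here is a self-contained, stdlib-only sketch. parse_mlf and to_textgrid are hypothetical names, not textgrid.py's API. The sketch assumes HTK times in 100 ns units (10^7 per second), timed label lines of the form "start end label ...", and contiguous intervals; it writes one IntervalTier in the long text format shown in the README example.

```python
def parse_mlf(text, units_per_second=1e7):
    """Map each label-file name in an MLF to a list of (start_s, end_s, label)."""
    grids = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue                      # blank line or MLF header
        if line.startswith('"'):
            current = line.strip('"')     # e.g. "*/utt1.lab"
            grids[current] = []
        elif line == ".":
            current = None                # end of this label file
        elif current is not None:
            start, end, label = line.split()[:3]
            grids[current].append(
                (int(start) / units_per_second, int(end) / units_per_second, label))
    return grids


def to_textgrid(intervals, tier_name="phones"):
    """Render one IntervalTier in Praat's long text format."""
    def q(s):  # Praat escapes a literal quote by doubling it
        return '"' + s.replace('"', '""') + '"'

    xmin, xmax = intervals[0][0], intervals[-1][1]
    out = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
           f"xmin = {xmin}", f"xmax = {xmax}", "tiers? <exists>", "size = 1",
           "item []:", "    item [1]:", '        class = "IntervalTier"',
           f"        name = {q(tier_name)}", f"        xmin = {xmin}",
           f"        xmax = {xmax}", f"        intervals: size = {len(intervals)}"]
    for i, (start, end, label) in enumerate(intervals, 1):
        out += [f"        intervals [{i}]:", f"            xmin = {start}",
                f"            xmax = {end}", f"            text = {q(label)}"]
    return "\n".join(out) + "\n"
```

For the question above, reading filename.mlf, passing its text to parse_mlf, and writing each to_textgrid(...) result to its own .TextGrid file would cover the simple case; lines without times would need extra handling.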

intervalContaining doesn't work if the time falls in the first interval

Right now, intervalContaining has a bug: it returns None even when the corresponding interval exists, if that interval is at index 0. This is because if i: evaluates to False when i is 0.

def intervalContaining(self, time):
    """
    Returns the interval containing the given time point, or None if
    the time point is outside the bounds of this tier. The argument
    can be a numeric type, or a Point object.
    """
    i = self.indexContaining(time)
    if i:  # This evaluates to False if i is None or if i == 0
        return self.intervals[i]
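The fix the report implies is to test for None explicitly. A minimal sketch on a hypothetical ToyIntervalTier; its bisect-based indexContaining and half-open [min, max) boundary convention are assumptions, not textgrid.py's implementation.

```python
import bisect


class ToyIntervalTier:
    """Hypothetical tier holding sorted, non-overlapping (min, max, mark) tuples."""

    def __init__(self, intervals):
        self.intervals = intervals
        self._starts = [iv[0] for iv in intervals]

    def indexContaining(self, time):
        """Index of the interval with min <= time < max, or None."""
        i = bisect.bisect_right(self._starts, time) - 1
        if i >= 0 and time < self.intervals[i][1]:
            return i
        return None

    def intervalContaining(self, time):
        i = self.indexContaining(time)
        if i is not None:  # `if i:` would also reject the valid index 0
            return self.intervals[i]
        return None


tier = ToyIntervalTier([(0.0, 0.5, "a"), (0.5, 1.0, "b")])
print(tier.intervalContaining(0.25))  # (0.0, 0.5, 'a')
```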
