
captionstransformer's Issues

Library chokes on milliseconds/frames/ticks

It is neither unusual nor outside the specification for TTML time attributes to include sub-second values (milliseconds, frames, ticks, ...), but captionstransformer fails to parse them.

Adobe Premiere, for example, includes sub-second values in its timecodes. While it isn't optimized for captions, it is in very common use, which makes it a fairly common source of TTML files.

Looking under the hood, ttml.py only parses the format '%H:%M:%S', so any timestamp with a fractional part raises "ValueError: unconverted data remains".

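The %f directive, added to strftime/strptime in Python 2.6, handles fractional seconds (when parsing, it accepts one to six digits, i.e. down to microseconds). It does not cover frame or tick values, which need separate handling.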
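A quick interpreter check of both behaviours, using an example timestamp (not one taken from a real Premiere export):

```python
from datetime import datetime

stamp = "00:00:03.490"  # TTML clock-time with a fractional part

try:
    datetime.strptime(stamp, "%H:%M:%S")
except ValueError as err:
    # strptime reports the trailing characters it could not match
    print(err)  # unconverted data remains: .490

# %f consumes the fraction; ".490" becomes 490000 microseconds
parsed = datetime.strptime(stamp, "%H:%M:%S.%f")
print(parsed.time())  # 00:00:03.490000
```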

You can change ttml.py so that if the conversion fails it will try to convert again, using the %f macro. Something like this:

    from datetime import datetime

    def get_date(self, time_str):
        """Parse a TTML clock time, with or without fractional seconds."""
        try:
            return datetime.strptime(time_str, '%H:%M:%S')
        except ValueError:
            # e.g. '00:00:03.490' leaves '.490' unconverted; retry with %f.
            # If this also fails, its ValueError propagates to the caller.
            return datetime.strptime(time_str, '%H:%M:%S.%f')
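The change above covers fractional seconds only. TTML time expressions can also carry frames (HH:MM:SS:FF) or be offset times with a metric, such as 500ms, 12f or 48t. Below is a rough sketch of a more general parser; it is not captionstransformer's code. The name parse_ttml_time and its frame_rate/tick_rate parameters are hypothetical, standing in for the document's ttp:frameRate and ttp:tickRate, and the defaults of 30 and 1 are assumed fallbacks:

```python
import re
from datetime import timedelta

# Clock time: HH:MM:SS, optionally .fraction or :frames (sub-frames ignored)
CLOCK_RE = re.compile(
    r"^(\d{2,}):(\d{2}):(\d{2})(?:(\.\d+)|:(\d{2,})(?:\.\d+)?)?$")
# Offset time: a number followed by a metric (h, m, s, ms, f, t)
OFFSET_RE = re.compile(r"^(\d+(?:\.\d+)?)(h|ms|m|s|f|t)$")


def parse_ttml_time(value, frame_rate=30, tick_rate=1):
    """Convert a TTML time expression to a timedelta (sketch).

    frame_rate and tick_rate are assumed stand-ins for the document's
    ttp:frameRate and ttp:tickRate attributes.
    """
    value = value.strip()
    m = CLOCK_RE.match(value)
    if m:
        hours, minutes, seconds, fraction, frames = m.groups()
        total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if fraction:
            total += float(fraction)           # ".490" -> 0.49 s
        if frames:
            total += int(frames) / frame_rate  # frames -> seconds
        return timedelta(seconds=total)
    m = OFFSET_RE.match(value)
    if m:
        count, metric = float(m.group(1)), m.group(2)
        seconds = {
            "h": count * 3600,
            "m": count * 60,
            "s": count,
            "ms": count / 1000,
            "f": count / frame_rate,
            "t": count / tick_rate,
        }[metric]
        return timedelta(seconds=seconds)
    raise ValueError("unrecognised TTML time expression: %r" % value)
```

Since get_date returns a datetime on 1900-01-01 (strptime's default date), a timedelta from this sketch could be slotted in as datetime(1900, 1, 1) + parse_ttml_time(value).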

It would also be wise to add %f to the Writer output:

    from captionstransformer import core

    class Writer(core.Writer):
        DOCUMENT_TPL = u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>%s</div></body></tt>"""
        CAPTION_TPL = u"""<p begin="%(start)s" end="%(end)s">%(text)s</p>"""

        def format_time(self, caption):
            """Return start and end time for the given format"""
            # %f gives microseconds; [:-3] trims them to milliseconds
            # (drop the slice to keep full microsecond precision)
            return {'start': caption.start.strftime('%H:%M:%S.%f')[:-3],
                    'end': caption.end.strftime('%H:%M:%S.%f')[:-3]}
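To see what the [:-3] slice does, here is a small standalone round trip that parses with %f and formats back the same way the Writer does. The helper name to_clock_time is made up for illustration:

```python
from datetime import datetime


def to_clock_time(moment):
    """Format a datetime as HH:MM:SS.mmm (TTML clock-time, milliseconds)."""
    # %f always produces six digits; slicing off the last three truncates
    # (rather than rounds) microseconds down to milliseconds
    return moment.strftime("%H:%M:%S.%f")[:-3]


# %f pads short fractions on the right, so ".43" parses as 430000 microseconds
parsed = datetime.strptime("00:00:07.43", "%H:%M:%S.%f")
print(to_clock_time(parsed))  # 00:00:07.430
```

Because the slice truncates, an input of 00:00:07.4309 would also come out as 00:00:07.430; round the microseconds first if that matters.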

AttributeError: 'str' object has no attribute 'decode'

Hi,

I'm trying to use captionstransformer because I need to convert SRT to TTML. I'm using the example code below, but I get the following error:

Error trace:

    Traceback (most recent call last):
      File "srt_to_ttml.py", line 15, in <module>
        captions = reader.read()
      File "C:\Python34\lib\site-packages\captionstransformer-1.2.1-py3.4.egg\captionstransformer\core.py", line 13, in read
        self.rawcontent = self.rawcontent.decode(self.encoding)
    AttributeError: 'str' object has no attribute 'decode'
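The traceback points at core.py line 13, which calls .decode() on whatever content was read. On Python 3, io.StringIO yields str, and str has no decode method (only bytes does), hence the AttributeError. One possible fix, sketched below with a hypothetical helper named ensure_text, is to decode only when the content really is bytes. On the calling side, wrapping the encoded sample in io.BytesIO instead of io.StringIO may get past this particular line, though that says nothing about how the rest of the library behaves on Python 3.

```python
import io


def ensure_text(raw, encoding="utf-8"):
    """Return raw as text, decoding it only when it is bytes.

    Hypothetical helper: core.py's read() could call this instead of
    self.rawcontent.decode(self.encoding), which assumes bytes.
    """
    if isinstance(raw, bytes):
        # Python 2 str / Python 3 bytes: still needs decoding
        return raw.decode(encoding)
    # Python 3 str (e.g. from io.StringIO) is already text
    return raw


print(ensure_text(io.BytesIO(u"Jeff Fisher".encode("utf-8")).read()))
print(ensure_text(io.StringIO(u"Jeff Fisher").read()))
```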

This is the example code I'm using:

    from captionstransformer.srt import Reader
    from captionstransformer.ttml import Writer
    from io import StringIO

    test_content = StringIO(u"""
    1
    00:00:03,490 --> 00:00:07,430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    00:00:07,430 --> 00:00:11,600
    with the YouTube APIs. I am
    Jeff Fisher,
    """)
    reader = Reader(test_content)
    captions = reader.read()

    len(captions) == 4

    first = captions[0]

    type(first.text) == unicode

    first.text == u"Jellyfish at the Monterey Aquarium"

    # next get a writer

    filelike = StringIO()
    writer = Writer(filelike)
    writer.set_captions(captions)
    text = writer.captions_to_text()
    text.startswith(u"""

    """)
    writer.write()
    writer.close()

Could you please help me fix this error? I'm using ActiveState Python 3.4.1.

Thanks,
Prem
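Separately from the decode error, every cue timing in this sample carries milliseconds (SRT writes them after a comma, TTML clock-time after a period), so the TTML output depends on the sub-second handling discussed in the first issue. A minimal sketch of that mapping follows; the function name srt_timing_to_ttml is made up and the code is not taken from captionstransformer:

```python
import re

# SRT cue timing line, e.g. "00:00:03,490 --> 00:00:07,430"
TIMING_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_timing_to_ttml(line):
    """Return (begin, end) TTML clock-time strings for an SRT timing line."""
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError("not an SRT timing line: %r" % line)
    start, start_ms, end, end_ms = m.groups()
    # SRT separates milliseconds with a comma; TTML clock-time uses a period
    return "%s.%s" % (start, start_ms), "%s.%s" % (end, end_ms)


print(srt_timing_to_ttml("00:00:03,490 --> 00:00:07,430"))
```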
