captionstransformer's People

Contributors

57uff3r, brennanyoung, toutpt

captionstransformer's Issues

Library chokes on milliseconds/frames/ticks

TTML time attributes often include sub-second values (milliseconds, frames, ticks), and the specification allows them, but captionstransformer fails to parse these.

Adobe Premiere, for example, includes sub-second values in its timecodes. Although it is not optimized for captions, it is very widely used and is a common source of TTML files.

Looking under the hood, I can see that ttml.py only tries the format '%H:%M:%S', which causes the exception "ValueError: unconverted data remains".

The %f directive (microseconds) was added to strftime/strptime in Python 2.6 and handles fractional seconds. It does not cover frames or ticks.
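To make the failure concrete, here is a small standalone sketch using only the standard library. The sample timestamp is made up for illustration and is not taken from any particular TTML file.

```python
from datetime import datetime

stamp = '00:00:03.490'  # illustrative value with a millisecond fraction

# The format without %f rejects the trailing '.490'.
try:
    datetime.strptime(stamp, '%H:%M:%S')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Adding %f accepts one to six fractional digits and stores microseconds.
parsed = datetime.strptime(stamp, '%H:%M:%S.%f')

print(error_message)
print(parsed.microsecond)
```

Here `error_message` carries the "unconverted data remains" text from the report, and `parsed.microsecond` holds the fraction scaled to microseconds.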

You can change ttml.py so that if the conversion fails, it retries with the %f directive. Something like this:

    from datetime import datetime

    def get_date(self, time_str):
        """Parse a clock time, with or without fractional seconds."""
        try:
            return datetime.strptime(time_str, '%H:%M:%S')
        except ValueError:
            # Retry with fractional seconds, e.g. '00:00:03.490'.
            # A string matching neither format still raises ValueError.
            return datetime.strptime(time_str, '%H:%M:%S.%f')

It would be wise to add the %f directive to the Writer output as well:

    class Writer(core.Writer):
        DOCUMENT_TPL = u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>%s</div></body></tt>"""
        CAPTION_TPL = u"""<p begin="%(start)s" end="%(end)s">%(text)s</p>"""

        def format_time(self, caption):
            """Return start and end time for the given format"""
            # %f yields six digits (microseconds); [:-3] trims to milliseconds.
            # Remove the [:-3] to keep full microseconds.
            return {'start': caption.start.strftime('%H:%M:%S.%f')[:-3],
                    'end': caption.end.strftime('%H:%M:%S.%f')[:-3]}
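The %f approach covers fractional seconds only. Frames and ticks depend on rates declared in the document (ttp:frameRate and ttp:tickRate in TTML), so strptime alone cannot handle them. Below is a rough sketch of a more general parser. The function name `parse_ttml_time`, its parameters, and the default rates are hypothetical and not part of captionstransformer; it returns a `timedelta` rather than the `datetime` the library uses, and it ignores sub-frames.

```python
import re
from datetime import timedelta

# Assumed fallback rates for this sketch; a real file should supply its own.
DEFAULT_FRAME_RATE = 30
DEFAULT_TICK_RATE = 1

# hh:mm:ss, hh:mm:ss.fraction, or hh:mm:ss:frames
_CLOCK = re.compile(r'^(\d{2,}):(\d{2}):(\d{2})(?:\.(\d+)|:(\d{2,}))?$')
# Offset times such as '12.5s', '500ms', '45f', '90t'
_OFFSET = re.compile(r'^(\d+(?:\.\d+)?)(h|ms|m|s|f|t)$')


def parse_ttml_time(value, frame_rate=DEFAULT_FRAME_RATE,
                    tick_rate=DEFAULT_TICK_RATE):
    """Convert a TTML-style time expression to a timedelta (hypothetical helper)."""
    value = value.strip()

    match = _CLOCK.match(value)
    if match:
        hours, minutes, seconds, fraction, frames = match.groups()
        total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if fraction:
            total += float('0.' + fraction)
        elif frames:
            total += int(frames) / float(frame_rate)
        return timedelta(seconds=total)

    match = _OFFSET.match(value)
    if match:
        count, metric = float(match.group(1)), match.group(2)
        if metric == 'h':
            return timedelta(hours=count)
        if metric == 'm':
            return timedelta(minutes=count)
        if metric == 's':
            return timedelta(seconds=count)
        if metric == 'ms':
            return timedelta(milliseconds=count)
        if metric == 'f':
            return timedelta(seconds=count / float(frame_rate))
        return timedelta(seconds=count / float(tick_rate))  # metric == 't'

    raise ValueError('unsupported time expression: %r' % value)
```

For example, `parse_ttml_time('00:00:01:15', frame_rate=30)` and `parse_ttml_time('90t', tick_rate=60)` both describe one and a half seconds.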

AttributeError: 'str' object has no attribute 'decode'

Hi,

I'm trying to use captionstransformer to convert SRT to TTML. I'm running the example code, but I get the following error.

Error trace:

    Traceback (most recent call last):
      File "srt_to_ttml.py", line 15, in <module>
        captions = reader.read()
      File "C:\Python34\lib\site-packages\captionstransformer-1.2.1-py3.4.egg\captionstransformer\core.py", line 13, in read
        self.rawcontent = self.rawcontent.decode(self.encoding)
    AttributeError: 'str' object has no attribute 'decode'

This is the example code I'm using:

    from captionstransformer.srt import Reader
    from captionstransformer.ttml import Writer
    from io import StringIO
    test_content = StringIO(u"""
    1
    00:00:03,490 --> 00:00:07,430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    00:00:07,430 --> 00:00:11,600
    with the YouTube APIs. I am
    Jeff Fisher,
    """)
    reader = Reader(test_content)
    captions = reader.read()

    len(captions) == 4
    first = captions[0]
    type(first.text) == unicode
    first.text == u"Jellyfish at the Monterey Aquarium"

    # next get a writer

    filelike = StringIO()
    writer = Writer(filelike)
    writer.set_captions(captions)
    text = writer.captions_to_text()
    text.startswith(u"""

    """)
    writer.write()
    writer.close()

Could you please help me fix this error? I'm using ActiveState Python 3.4.1.

Thanks,
Prem
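
A note on the likely cause, going only by the traceback: core.py calls `self.rawcontent.decode(self.encoding)`, and on Python 3 `io.StringIO.read()` returns `str`, which has no `decode` method (only `bytes` does). The example also looks written for Python 2, since `unicode` is not defined on Python 3. One possible guard is sketched below. The helper name `to_text` and the UTF-8 default are assumptions for illustration, not the library's actual code.

```python
def to_text(raw, encoding="utf-8"):
    """Return text whether `raw` is bytes or already a str (hypothetical helper)."""
    if isinstance(raw, bytes):
        # Bytes, e.g. from a file opened in binary mode: decode them.
        return raw.decode(encoding)
    # Already text, e.g. read from io.StringIO: nothing to decode.
    return raw


# Both kinds of input end up as text.
from_bytes = to_text("caf\u00e9".encode("utf-8"))
from_text = to_text("caf\u00e9")
```

If core.py used a check like this in place of the unconditional `decode` call, a `StringIO` input would pass through unchanged. Feeding the Reader a bytes file-like object such as `io.BytesIO` might also sidestep the error, assuming the Reader decodes whatever `read()` returns.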
