toutpt / captionstransformer
Transform captions from one format to another
It is neither unusual nor outside the specification for TTML time attributes to include sub-second values (milliseconds, frames, ticks...), but captionstransformer fails to parse them.
Adobe Premiere, for example, includes sub-second values in its timecodes. While it is not optimized for captions, it is in very common use and not an unusual source of TTML files.
Looking under the hood, I can see that ttml.py only parses times with the format '%H:%M:%S', so any trailing sub-second part raises "ValueError: unconverted data remains".
The %f directive was added to strftime/strptime in Python 2.6 to handle fractional seconds (microseconds).
You can change ttml.py so that, if the conversion fails, it tries again using the %f directive. Something like this:
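from datetime import datetime

def get_date(self, time_str):
    # Try the plain clock time first, e.g. "00:00:03"
    try:
        convertedTime = datetime.strptime(time_str, '%H:%M:%S')
    except ValueError as v:
        # strptime reports leftover characters as "unconverted data remains: ..."
        ulr = len(v.args[0].partition('unconverted data remains: ')[2])
        if ulr:
            # Retry with fractional seconds, e.g. "00:00:03.490"
            convertedTime = datetime.strptime(time_str, '%H:%M:%S.%f')
        else:
            raise
    return convertedTime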
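A variant that does not depend on the wording of the strptime error message can simply try each format in turn. The sketch below is hypothetical: the name parse_ttml_clock and its frame_rate parameter are not part of captionstransformer. It also handles the frame-based "HH:MM:SS:FF" form, assuming 30 frames per second (TTML's fallback when ttp:frameRate is not declared).

```python
from datetime import datetime, timedelta

# Formats tried in order; for strptime, %f accepts 1 to 6 fractional digits.
CLOCK_FORMATS = ('%H:%M:%S', '%H:%M:%S.%f')

def parse_ttml_clock(time_str, frame_rate=30):
    """Parse a TTML clock-time string into a datetime (date part 1900-01-01)."""
    time_str = time_str.strip()
    for fmt in CLOCK_FORMATS:
        try:
            return datetime.strptime(time_str, fmt)
        except ValueError:
            pass  # try the next format
    # Frame-based form "HH:MM:SS:FF"; frame_rate is an assumed default.
    parts = time_str.split(':')
    if len(parts) == 4:
        base = datetime.strptime(':'.join(parts[:3]), '%H:%M:%S')
        frames = int(parts[3])
        return base + timedelta(seconds=frames / float(frame_rate))
    raise ValueError('unsupported TTML clock time: %r' % time_str)

print(parse_ttml_clock('00:00:03.490').time())  # 00:00:03.490000
print(parse_ttml_clock('00:00:03:15').time())   # 00:00:03.500000
```

Offset-time expressions such as "3.5s" or "90t" are not covered and still raise ValueError.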
It would also be wise to add the %f directive to the Writer output:
from captionstransformer import core

class Writer(core.Writer):
    DOCUMENT_TPL = u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>%s</div></body></tt>"""
    CAPTION_TPL = u"""<p begin="%(start)s" end="%(end)s">%(text)s</p>"""

    def format_time(self, caption):
        """Return start and end time for the given format"""
        # %f gives 6-digit microseconds; [:-3] trims them to milliseconds.
        # Remove the [:-3] to keep full microsecond precision.
        return {'start': caption.start.strftime('%H:%M:%S.%f')[:-3],
                'end': caption.end.strftime('%H:%M:%S.%f')[:-3]}
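To check what the Writer's millisecond formatting produces, here is a small standalone sketch. The function names are hypothetical and not part of captionstransformer; the second one shows the built-in alternative available on Python 3.6+, where time.isoformat(timespec='milliseconds') truncates in the same way the slice does.

```python
from datetime import datetime

def format_ttml_clock(dt):
    """Format a datetime as HH:MM:SS.mmm, the same way the Writer does."""
    # strftime('%f') always yields 6 digits; dropping the last 3 leaves ms.
    return dt.strftime('%H:%M:%S.%f')[:-3]

def format_ttml_clock_iso(dt):
    """Python 3.6+ equivalent: isoformat() truncates to milliseconds."""
    return dt.time().isoformat(timespec='milliseconds')

start = datetime.strptime('00:00:03.49', '%H:%M:%S.%f')
print(format_ttml_clock(start))      # 00:00:03.490
print(format_ttml_clock_iso(start))  # 00:00:03.490
```

Both approaches truncate rather than round, so 3.4909 s is written as 00:00:03.490.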
https://pypi.org/project/captionstransformer/ points to https://github.com/toutpt/captionstranformer, which returns a 404 because the name is missing a letter ("captionstranformer" instead of "captionstransformer").
Also, it looks like PyPI only has an outdated version of the project from 2012?
setup.py does say that the project is under the GPL, but no version is specified. Adding a dedicated LICENSE file would clarify this and would also satisfy the GPL's requirement that a copy of the license be included.
Hi,
I'm trying to use captionstransformer because I need to convert SRT to TTML. I'm using the following example code, but I get this error:
Error trace:
Traceback (most recent call last):
  File "srt_to_ttml.py", line 15, in <module>
    captions = reader.read()
  File "C:\Python34\lib\site-packages\captionstransformer-1.2.1-py3.4.egg\captionstransformer\core.py", line 13, in read
    self.rawcontent = self.rawcontent.decode(self.encoding)
AttributeError: 'str' object has no attribute 'decode'
This is the example code I'm using:
from captionstransformer.srt import Reader
from captionstransformer.ttml import Writer
from io import StringIO
test_content = StringIO(u"""
1
00:00:03,490 --> 00:00:07,430
FISHER: All right. So, let's begin.
This session is: Going Social
00:00:07,430 --> 00:00:11,600
with the YouTube APIs. I am
Jeff Fisher,
""")
reader = Reader(test_content)
captions = reader.read()
filelike = StringIO()
writer = Writer(filelike)
writer.set_captions(captions)
text = writer.captions_to_text()
text.startswith(u"""
Could you please help me fix this error? I'm using ActiveState Python 3.4.1.
Thanks,
Prem
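For anyone hitting the same traceback: on Python 3, io.StringIO.read() returns str, which has no decode() method, and core.py calls .decode(self.encoding) on the content it read. Passing an io.BytesIO holding encoded bytes, e.g. io.BytesIO(srt_text.encode('utf-8')), should get past this particular line, although the 1.2.1 egg may have other Python 3 incompatibilities. Alternatively, core.py could decode only when it actually holds bytes. A minimal sketch of that check follows; to_text is a hypothetical helper, not part of captionstransformer.

```python
import io

def to_text(raw, encoding='utf-8'):
    """Return raw as text, decoding only if it is bytes.

    Hypothetical replacement for the unconditional
    self.rawcontent.decode(self.encoding) call in core.py.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, e.g. read from io.StringIO on Python 3

srt = u"1\n00:00:03,490 --> 00:00:07,430\nFISHER: All right. So, let's begin.\n"
from_text = to_text(io.StringIO(srt).read())
from_bytes = to_text(io.BytesIO(srt.encode('utf-8')).read())
print(from_text == from_bytes)  # True
```

The isinstance check also behaves on Python 2, where bytes is an alias for str and unicode input is returned unchanged.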