
ttml2srt's People

Contributors

codingcatgirl

ttml2srt's Issues

xml.etree.ElementTree.ParseError: not well-formed (invalid token)

Traceback (most recent call last):
  File "ttml2srt.py", line 8, in <module>
    tree = ET.parse(filename)
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

Source file:
textnld.txt

Very likely because there is still binary content in the file, mixed in with the XML...
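One possible workaround, sketched here under assumptions: if the non-XML bytes only precede or follow the document (which "line 1, column 0" hints at), the TTML part can be cut out before parsing. The helper name `extract_ttml` is hypothetical, and the sketch assumes an unprefixed `<tt>` root element. It does not handle binary data embedded inside the XML itself.

```python
import xml.etree.ElementTree as ET


def extract_ttml(raw):
    """Return the bytes from the XML declaration (or '<tt') to the last '</tt>'.

    Assumption: junk bytes appear only before or after the document,
    and the root element is written as an unprefixed <tt>.
    """
    start = raw.find(b"<?xml")
    if start == -1:
        start = raw.find(b"<tt")
    end = raw.rfind(b"</tt>")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no TTML document found")
    return raw[start:end + len(b"</tt>")]


def parse_ttml_file(filename):
    """Read the file as bytes, strip surrounding junk, and parse the rest."""
    with open(filename, "rb") as handle:
        return ET.fromstring(extract_ttml(handle.read()))
```

Calling `parse_ttml_file("textnld.txt")` would then return the root element instead of raising on the first byte, provided the assumptions hold for that file.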

Problem with conversion

Hi @codingcatgirl, I have a problem converting a subtitle file to SRT format. I cloned this repo, installed the latest Python 3.7.0 for Windows, and ran the script from the command line like this:

python ttml2srt.py "d:\Temp\aaa\S02E12.ttml"

In return I get:

Traceback (most recent call last):
  File "ttml2srt.py", line 88, in <module>
    parse_times(body)
  File "ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "ttml2srt.py", line 64, in parse_times
    begin = parse_time_expression(elem.attrib['begin'], default_offset=default_begin)
  File "ttml2srt.py", line 49, in parse_time_expression
    raise NotImplementedError('Parsing time expressions by ticks is not supported!')
NotImplementedError: Parsing time expressions by ticks is not supported!

I have attached the original TTML file. Can you help me?

S02E12.zip
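For reference, a tick-based time expression in TTML is a count followed by `t`, and it is converted to seconds by dividing by the tick rate declared in the `ttp:tickRate` attribute of the root `tt` element. The following is a minimal sketch of that conversion, not the script's own code. The function name `parse_tick_expression` is hypothetical, and the tick rate of 10000000 used in the usage note is an assumption that would need to be read from the actual file.

```python
import re
from datetime import timedelta


def parse_tick_expression(expression, tick_rate):
    """Convert a TTML tick expression such as '8942266666t' to a timedelta.

    tick_rate is the number of ticks per second (the ttp:tickRate value).
    """
    match = re.fullmatch(r"(\d+)t", expression)
    if match is None:
        raise ValueError("not a tick expression: %s" % expression)
    ticks = int(match.group(1))
    # Integer arithmetic avoids float rounding; the result is truncated
    # to whole microseconds.
    return timedelta(microseconds=ticks * 1000000 // tick_rate)
```

With an assumed tick rate of 10000000, `parse_tick_expression("8942266666t", 10000000)` gives a timedelta of 894 seconds and 226666 microseconds.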

TTML referencing undefined style causes ttml2srt to crash

A TTML file which makes reference to a style which doesn't exist will cause ttml2srt to crash with the following error (based on the minimal example file attached):

Traceback (most recent call last):
  File "/home/jlick/bin/ttml2srt", line 175, in <module>
    rendered.append((timestamp, re.sub(r'\n\n\n+', '\n\n', render_subtitles(body, timestamp)).strip()))
  File "/home/jlick/bin/ttml2srt", line 139, in render_subtitles
    result += render_subtitles(child, timestamp)
  File "/home/jlick/bin/ttml2srt", line 120, in render_subtitles
    style.update(styles[elem.attrib['style']])
KeyError: 'default'

When removing 'style="default"' from the <div> tag in the broken example, ttml2srt runs correctly, therefore a suggested fix is to ignore references to styles that are undefined.

example.zip
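The suggested fix can be sketched as a lookup that falls back to an empty style instead of raising `KeyError`. The helper name `resolve_style` is hypothetical; only `styles`, `style` and `elem.attrib['style']` appear in the traceback above.

```python
def resolve_style(styles, style_id):
    """Return a copy of the named style, or an empty dict if it is undefined.

    styles maps style ids to dicts of style properties; style_id may be
    None when the element has no style attribute.
    """
    return dict(styles.get(style_id, {}))
```

The failing line could then read `style.update(resolve_style(styles, elem.attrib.get('style')))`, which leaves `style` unchanged for unknown references.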

'datetime.timedelta' object is not subscriptable

Hi!

I am trying to convert the attached ttml to srt. However, I got the following error:

Traceback (most recent call last):
  File "/home/chen/bin/ttml2srt.py", line 156, in <module>
    print(format_timestamp(timestamp)+' --> '+format_timestamp(rendered_grouped[i+1][0]))
TypeError: 'datetime.timedelta' object is not subscriptable

Could you advise how the error could be fixed?
Spionligaen 1_52-MSUI47002110AA.no.ttml.zip

SyntaxError: invalid syntax

After trying to convert from a ttml file to srt I got this message:

- Marking Time.español.srt"
  File "ttml2srt.py", line 190
    def format_timestamp(timestamp: timedelta):
                                  ^
SyntaxError: invalid syntax

Here is the file I'm trying to convert to srt:
transcript.zip
Thanks for any help.
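The caret points at a parameter annotation (`timestamp: timedelta`), which is Python 3 syntax, so the script was most likely started with a Python 2 interpreter. Running it with `python3` is probably the real fix, since the rest of the script may rely on other Python 3 features. For illustration only, here is an annotation-free sketch of a function with the same name that formats a timedelta as an SRT timestamp. The body is an assumption, not the script's actual code.

```python
from datetime import timedelta


def format_timestamp(timestamp):
    # type: (timedelta) -> str
    """Format a timedelta as an SRT timestamp, HH:MM:SS,mmm."""
    # Integer arithmetic only, so no float rounding is involved.
    total_ms = ((timestamp.days * 86400 + timestamp.seconds) * 1000
                + timestamp.microseconds // 1000)
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, milliseconds = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, milliseconds)
```

The type comment keeps the type information without using syntax that Python 2 rejects.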

Subtitles with <br> tags are omitted

Hi!

I found that style="center" is not recognized by the script. Could you have a look at the attached subtitle in ttml format and the converted subtitle in srt format? You can see that all lines with style="center" are omitted.

Could you also check whether style="right" is coded in the script, just in case?

Thank you!

Cheers,

Xianwen

Katten_med_hatten-S02-E05.mp4.subtitle.zip

Parsing Error on Python 3.7

Here is the output

M:\FlixGrab>"Q:\[QW]\ttml2srt\python.exe" "Q:\[QW]\ttml2srt\ttml2srt.py" "M:\FlixGrab\[S1.Ep2] Sherlock - The Blind Banker.Turkish.ttml" > "M:\FlixGrab\[S1.Ep2] Sherlock - The Blind Banker.Turkish.srt"
Traceback (most recent call last):
  File "Q:\[QW]\ttml2srt\ttml2srt.py", line 88, in <module>
    parse_times(body)
  File "Q:\[QW]\ttml2srt\ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "Q:\[QW]\ttml2srt\ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "Q:\[QW]\ttml2srt\ttml2srt.py", line 64, in parse_times
    begin = parse_time_expression(elem.attrib['begin'], default_offset=default_begin)
  File "Q:\[QW]\ttml2srt\ttml2srt.py", line 49, in parse_time_expression
    raise NotImplementedError('Parsing time expressions by ticks is not supported!')
NotImplementedError: Parsing time expressions by ticks is not supported!

Italics and white space incorrectly stripped

I have a ttml file I've tried converting with several utilities. Yours is working 99% correctly but there are some issues with italics that aren't being converted correctly:

This:

<p begin="00:01:02.562" end="00:01:06.31"><span tts:fontStyle="italic">EXAMPLE</span> TEXT<br/>YET ANOTHER <span tts:fontStyle="italic">EXAMPLE</span> TEXT.</p>

Is getting converted to this:

17
00:01:02,562 --> 00:01:06,310
EXAMPLETEXT
YET ANOTHEREXAMPLETEXT.

So there are two issues here:

  1. This italics format isn't being detected correctly
  2. White space is being trimmed which shouldn't be

I see that ttml2srt is supposed to handle italics but I couldn't figure out why it isn't handling this case. Even if that isn't corrected, the issue of essential white space being trimmed should be corrected so that words don't get smushed together.
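Both points can be illustrated with a small ElementTree sketch. This is not the script's own rendering code. It assumes `tts` is bound to the standard TTML styling namespace, that italics are set directly on the element via `tts:fontStyle`, and that the output should use the common `<i>` convention for SRT. The names `render_inline` and `local_name` are hypothetical. The key detail for white space is that text following a child element lives in that child's `tail`, which must be appended untrimmed.

```python
import xml.etree.ElementTree as ET

TTS_NS = "http://www.w3.org/ns/ttml#styling"
FONT_STYLE = "{%s}fontStyle" % TTS_NS


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def render_inline(elem):
    """Render an element's content as SRT text without trimming white space."""
    if local_name(elem.tag) == "br":
        return "\n"
    # elem.text is the text before the first child; each child's tail is
    # the text between that child and the next one.
    inner = (elem.text or "") + "".join(
        render_inline(child) + (child.tail or "") for child in elem
    )
    if elem.attrib.get(FONT_STYLE) == "italic" and inner:
        return "<i>" + inner + "</i>"
    return inner
```

Applied to the `<p>` element from the example (with the TTML namespaces declared), this yields `<i>EXAMPLE</i> TEXT` on the first line and `YET ANOTHER <i>EXAMPLE</i> TEXT.` on the second.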

Missing license

I'd like to use the TTML2SRT source code in another project, but since there is no license specified I can't. Would it be possible to add one to the top of the file or to the repository root?

UnicodeEncodeError: 'charmap' codec can't encode character

Hi @codingcatgirl, there is a problem with special characters in converted files. In some situations the converter can't handle national characters and then an error occurs:

  File "d:\GIT\ttml2srt\ttml2srt.py", line 202, in <module>
    print(content)
  File "C:\Users\csrednicki\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe0' in position 13: character maps to <undefined>

The problematic line is:

<p begin="8942266666t" end="8959283666t" xml:id="subtitle209">To pewnie déjà vu.</p>

I also came across this error when converting another file. In the second case the character was "\xe8". The line was:

<p begin="15572223332t" end="15607925666t" xml:id="subtitle411">oraz „La donna è mobile"<br/>z <span style="normal_1">Rigoletto </span>Verdiego.</p>

I have attached the problematic files mentioned above.
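The traceback shows `print(content)` encoding through cp1250, the Windows default for that locale, which cannot represent characters such as "à". A possible workaround, sketched here with a hypothetical helper `write_srt`, is to write the result to a file with an explicit UTF-8 encoding instead of relying on the console or redirect encoding.

```python
def write_srt(content, path):
    """Write the converted subtitles to path as UTF-8, regardless of locale."""
    # newline="\n" disables newline translation so the bytes written are
    # exactly the UTF-8 encoding of content.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
```

Without changing the script, setting the environment variable `PYTHONIOENCODING=utf-8` before running it should have a similar effect on `print`. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` is another option.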

ValueError: unknown time expression: 00:-11:-48.-340

Hi!

Thank you for fixing and improving the ttml2srt in the past!

I am here to report a new bug.

While trying to convert the attached ttml, I got the following error:

Traceback (most recent call last):
  File "/home/chen/bin/ttml2srt.py", line 88, in <module>
    parse_times(body)
  File "/home/chen/bin/ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "/home/chen/bin/ttml2srt.py", line 86, in parse_times
    parse_times(child, default_begin=begin)
  File "/home/chen/bin/ttml2srt.py", line 75, in parse_times
    dur = parse_time_expression(elem.attrib['dur'])
  File "/home/chen/bin/ttml2srt.py", line 60, in parse_time_expression
    raise ValueError('unknown time expression: %s' % expression)
ValueError: unknown time expression: 00:-11:-48.-340

Could you have a look?

Have a nice Easter by the way!
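The value `00:-11:-48.-340` is not a valid TTML clock time, so any handling of it is a guess about what the producing tool meant. As one possible reading, the sketch below treats each component as individually signed and sums them, giving a negative duration that the caller can then skip or clamp. The function name `parse_signed_clock_time` is hypothetical and this is not the script's own parser.

```python
import re
from datetime import timedelta

# hours:minutes:seconds with an optional fraction; every part may carry a sign.
SIGNED_CLOCK = re.compile(r"(-?\d+):(-?\d+):(-?\d+)(?:\.(-?)(\d+))?")


def parse_signed_clock_time(expression):
    """Parse 'hh:mm:ss.fff' where each component may be negative."""
    match = SIGNED_CLOCK.fullmatch(expression)
    if match is None:
        raise ValueError("unknown time expression: %s" % expression)
    hours, minutes, seconds, frac_sign, frac_digits = match.groups()
    micros = 0
    if frac_digits:
        # '31' means 0.31 s, so pad (or cut) the digits to microseconds.
        micros = int(frac_digits[:6].ljust(6, "0"))
        if frac_sign:
            micros = -micros
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)
```

Under this reading the reported value becomes a duration of minus 11 minutes 48.340 seconds. Whether such a cue should be dropped or its begin and end swapped is a separate decision.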
