byroot / pysrt
Python parser for SubRip (srt) files
License: GNU General Public License v3.0
I'm using your library to build a simple script that downloads subtitles from opensubtitles.org, removes all the unnecessary lines ("synced by..." and similar entries), saves the srt file and then encodes it into the video file with ffmpeg.
I've found that ffmpeg doesn't like srt files where an entry number is missing from the sequence, and it refuses to work with them.
This is what happens now:
1
00:00:22,712 --> 00:00:24,478
first line
2
00:00:25,000 --> 00:00:31,074
second line
3
00:00:57,413 --> 00:01:00,180
third line
When I run
del sub[1]
I get
1
00:00:22,712 --> 00:00:24,478
first line
3
00:00:57,413 --> 00:01:00,180
third line
Which is not valid according to ffmpeg. Instead I should get
1
00:00:22,712 --> 00:00:24,478
first line
2
00:00:57,413 --> 00:01:00,180
third line
Could you please make pysrt renumber all the entries automatically when I close an srt file, so that the numbering has no gaps?
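The renumbering asked for above can be sketched without pysrt. This is a minimal illustration, not pysrt's implementation: `renumber_srt` and the `TIMING` pattern are hypothetical names, and the sketch assumes well-formed timing lines. Other reports on this page call `clean_indexes()` on the opened file before saving, which appears to serve the same purpose inside pysrt.

```python
import re

# Matches a SubRip timing line such as "00:00:22,712 --> 00:00:24,478".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def renumber_srt(text):
    """Return `text` with entry numbers rewritten as 1, 2, 3, ... (hypothetical helper)."""
    lines = text.splitlines()
    counter = 0
    for i in range(1, len(lines)):
        # An index line is a bare integer directly followed by a timing line.
        if TIMING.match(lines[i]) and lines[i - 1].strip().isdigit():
            counter += 1
            lines[i - 1] = str(counter)
    return "\n".join(lines) + "\n"
```

Only lines that sit directly above a timing line are rewritten, so a subtitle whose text happens to be a bare number is left alone.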
On a Kubuntu Linux system, installing the package with easy_install (on Python 2.4 and also 2.6) is not so simple. Somehow the egg that gets downloaded is NOT the right one.
Typing "easy_install pysrt" uses this file:
http://pypi.python.org/packages/any/p/pysrt/pysrt-0.2.3.macosx-10.6-universal.tar.gz
After manually downloading this:
http://pypi.python.org/packages/source/p/pysrt/pysrt-0.2.3.tar.gz
the installation went well.
Hi,
It would be great if the srt command could 'unsplit' subtitles. Something like:
$ srt join movie.1.srt movie.2.srt movie.3.srt > movie.srt
pysrt 0.4.6 using pip install
Python 2.7
Mac OS 10.8
File "pysrt/__init__.py", line 3, in <module>
from pysrt.srtfile import SubRipFile, SUPPORT_UTF_32_LE, SUPPORT_UTF_32_BE
File "pysrt/srtfile.py", line 12, in <module>
import chardet as charade
ImportError: No module named chardet
Would it be possible to publish a release and push it to PyPI? The 1.1.1 release is from April 2016 and does not contain the Python 3 classifiers added a few months later.
I think it would be nice to have a method where you could insert a subtitle into an existing .srt file. For example:
subs = pysrt.open('some/file.srt')
subs.insert("some subtitle", [start-time], [end-time])
The method would then shift all the existing subtitles down accordingly.
Just a thought...
Please include the tests in releases; Gentoo Python packages run them before installing.
If I have an srt_string like this:
1
00:00:20 --> 00:00:24
I also enjoy the fruits of our labor.2
00:00:24 --> 00:00:27
We truly are blessed creatures.
and then execute SubRipFile.fromstring(srt_string), it will result in an empty list. Shouldn't it be able to recognize the improperly formatted timestamps and append ",000" to them? Getting an empty list (which can cause serious issues with programs that are interacting with SRTs) seems like a rather harmful result in this case.
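One way to pre-process such input before handing it to the parser is sketched below. It is only an illustration of the suggestion above, not something pysrt is known to do: `pad_timestamps` and `SHORT_TS` are hypothetical names, and the sketch assumes the only defect is a missing milliseconds field.

```python
import re

# A timestamp of the form HH:MM:SS that is NOT followed by ",mmm" (or ".mmm").
SHORT_TS = re.compile(r"\b(\d{2}:\d{2}:\d{2})(?![,.\d])")


def pad_timestamps(line):
    """Append ',000' to timestamps lacking milliseconds (timing lines only)."""
    if "-->" not in line:
        return line  # leave subtitle text untouched
    return SHORT_TS.sub(r"\1,000", line)
```

Applying this to every line of the string first would turn `00:00:20 --> 00:00:24` into a regular timing line while leaving already complete timestamps unchanged.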
Hi @ThiefMaster, @MestreLion, @chenhsiu, @ichernev
Thanks again for your contribution to pysrt.
Since the beginning pysrt was tagged as licensed under GNU GPL on PyPI, but I never explicitly included the license in the repository.
So it's not really a change of licensing, but I agree that you may not have been aware of pysrt licensing. So if you have any objection about this licensing, please tell me and I'll make sure to remove the code you own from the repository.
Regards.
It would be nice to keep git tags and PyPI releases in sync :)
I am trying to split subtitles where two speakers show up in the same frame. In my case this is indicated by newlines and hyphens ('\n-'). I use this code snippet to split such subtitles into multiple subtitles:
# Split any multi-speaker subtitles (denoted by '\n-') into multiple single-speaker subtitles
for i in reversed(xrange(len(subs))):
    if '\n-' in subs[i].text:
        # Split the subtitle at the hyphen and format the list
        lines = [line[1:] if line[0] == '-' else line for line in subs[i].text.split('\n-')]
        length_milli = 1000 * float(subs[i].end.seconds - subs[i].start.seconds) + float(subs[i].end.milliseconds - subs[i].start.milliseconds)
        interval_milli = int(length_milli / len(lines))
        dummy = pysrt.SubRipItem(0, start=subs[i].start, end=subs[i].end, text="")  # Use this just to get the right formatting for the time
        dummy.shift(milliseconds=interval_milli)  # Shift the dummy so its start time is now the end time we want
        for j in xrange(len(lines)):
            new_sub = pysrt.SubRipItem(0, start=subs[i].start, end=dummy.start, text=lines[j])
            new_sub.shift(milliseconds=(j * interval_milli))
            subs.append(new_sub)
        del subs[i]
subs.clean_indexes()
The basic gist is that I use a dummy object to format the time, so that I can take advantage of shifting. For example, a 3-phrase frame lasting 3 seconds is split 3 ways, so each new frame is 1 second long.
When I create the dummy as above using start=sub.start and end=sub.end and then shift the dummy, it also shifts the original subtitle. I suspect this was not the intended behavior.
I found that casting sub.start and sub.end to strings in the assignment (e.g. start=str(sub.start)) solved the issue. It appears that without the cast, however, I am actually assigning a reference to the same time object rather than a copy of its value.
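The behaviour described here is ordinary Python object aliasing. The sketch below reproduces it with a tiny stand-in class; `Timestamp` is hypothetical and is not pysrt's SubRipTime, it merely has a `shift` method that mutates in place, which is what the report suggests happens.

```python
import copy


class Timestamp(object):
    """Tiny mutable stand-in for a time value (hypothetical, not pysrt's class)."""

    def __init__(self, ordinal):
        self.ordinal = ordinal  # milliseconds

    def shift(self, milliseconds):
        self.ordinal += milliseconds  # mutates the object in place


original = Timestamp(1000)

alias = original                   # same object: shifting one shifts "both"
alias.shift(500)

independent = copy.copy(original)  # separate object with the same value
independent.shift(500)
```

After this runs, `original` has moved by 500 ms because of the alias, while shifting `independent` left it alone. Converting to a string and back, as in the workaround above, has the same effect as making a copy.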
If the srt file starts with a BOM ('\xef\xbb\xbf'), parsing of the first subtitle fails, so the first subtitle is missing.
Maybe a manual test after open to check for these bytes, or a library to handle it automatically?
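For UTF-8 input, Python's standard `utf-8-sig` codec already handles this: it strips a leading BOM when present and otherwise behaves like `utf-8`. A minimal sketch on an in-memory byte string (the sample subtitle content is made up for illustration):

```python
import codecs

raw = codecs.BOM_UTF8 + b"1\n00:00:02,000 --> 00:00:03,000\nHello world\n"

plain = raw.decode("utf-8")       # BOM survives as U+FEFF at position 0
clean = raw.decode("utf-8-sig")   # BOM is stripped if present

first_line_plain = plain.splitlines()[0]
first_line_clean = clean.splitlines()[0]
```

With plain `utf-8` the first line is U+FEFF followed by `1`, which no longer looks like an index; with `utf-8-sig` it is just `1`.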
For example in the file captions.srt:
1 00:00:02,000 00:00:03,000 Hello world
import pysrt
captions = pysrt.open('captions.srt')
select = captions.at(seconds=2)
select
[]
select = captions.at(seconds=3)
select
[]
For large-enough files (100KiB+), the in-place output always gets cut off.
I'm guessing the buffer does not get flushed / the file handle does not get closed?
Hello, I wrote a small script that merges two subtitle files (assuming each is in a different language) into one file. My motivation: my wife is Taiwanese and I am Czech. We want to watch movies together with Chinese and Czech subtitles, and I want her to have the chance to learn the other language too.
Here is the small script. I hope somebody else can use it too.
#!/usr/bin/env python
# -*- coding: utf8 -*-
import sys
import getopt
from pysrt import SubRipFile
from pysrt import SubRipItem
from pysrt import SubRipTime


def join_lines(txtsub1, txtsub2):
    # Put the two texts on separate lines only when both are non-empty
    if len(txtsub1) > 0 and len(txtsub2) > 0:
        return txtsub1 + '\n' + txtsub2
    else:
        return txtsub1 + txtsub2


def find_subtitle(subtitle, from_t, to_t, lo=0):
    # Scan forward from index lo for the item that covers [from_t, to_t]
    i = lo
    while i < len(subtitle):
        if subtitle[i].start >= to_t:
            break
        if subtitle[i].start <= from_t and to_t <= subtitle[i].end:
            return subtitle[i].text, i
        i += 1
    return "", i


def merge_subtitle(sub_a, sub_b, delta):
    out = SubRipFile()
    # Collect every start and end time of both files as interval boundaries
    intervals = [item.start.ordinal for item in sub_a]
    intervals.extend([item.end.ordinal for item in sub_a])
    intervals.extend([item.start.ordinal for item in sub_b])
    intervals.extend([item.end.ordinal for item in sub_b])
    intervals.sort()
    j = k = 0
    for i in range(1, len(intervals)):
        start = SubRipTime.from_ordinal(intervals[i-1])
        end = SubRipTime.from_ordinal(intervals[i])
        # Skip intervals shorter than delta
        if (end-start) > delta:
            text_a, j = find_subtitle(sub_a, start, end, j)
            text_b, k = find_subtitle(sub_b, start, end, k)
            text = join_lines(text_a, text_b)
            if len(text) > 0:
                item = SubRipItem(0, start, end, text)
                out.append(item)
    out.clean_indexes()
    return out


def usage():
    print("Usage: ./srtmerge [options] lang1.srt lang2.srt out.srt")
    print("")
    print("Options:")
    print(" -d <milliseconds> The shortest time length of the one subtitle")
    print(" --delta=<milliseconds> default: 500")
    print(" -e <encoding> Encoding of input and output files.")
    print(" --encoding=<encoding> default: utf_8")


def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'hd:e:', ["help", "encoding=", "delta="])
    except getopt.GetoptError as err:
        print(str(err))
        usage()
        sys.exit(2)
    # Default settings
    delta = SubRipTime(milliseconds=500)
    encoding = "utf_8"
    if len(args) != 3:
        usage()
        sys.exit(2)
    for o, a in opts:
        if o in ("-d", "--delta"):
            delta = SubRipTime(milliseconds=int(a))
        elif o in ("-e", "--encoding"):
            encoding = a
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
    subs_a = SubRipFile.open(args[0], encoding=encoding)
    subs_b = SubRipFile.open(args[1], encoding=encoding)
    out = merge_subtitle(subs_a, subs_b, delta)
    out.save(args[2], encoding=encoding)


if __name__ == "__main__":
    main()
Running
a = pysrt.open('tests/static/utf-8.srt')
print a[1]
Gives me:
<ipython-input-11-e9d376687425> in <module>()
----> 1 print a[1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 51: ordinal not in range(128)
Running `print a[1].__str__()` is fine though, which puzzles me. When dealing with Unicode, I gather that `str(text)` and `str(position)` in the init method of SubRipItem should not be used, but I am not so sure. I tried fixing this, but since pysrt wants to support both Python 2 and 3, I am not sure how to go about it and have failed miserably so far. Do you have any suggestions?
Hi Jean,
The first item is skipped. I only tried it on one file, so maybe this file is malformed, but I don't think so.
Thanks
I try to validate subtitles with this:
import codecs
import pysrt
from charade.universaldetector import UniversalDetector

def is_valid_subtitle(path):
    u = UniversalDetector()
    for line in open(path, 'rb'):
        u.feed(line)
    u.close()
    encoding = u.result['encoding']
    source_file = codecs.open(path, 'rU', encoding=encoding, errors='replace')
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error:
        return False
    except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
        pass
    return True
But unfortunately for some subtitles it fails even though the file is a valid subtitle. For example this one: https://docs.google.com/open?id=0B2q9iBGZdj6qOXZrbFpiV2ozOHc
I think there should be a different kind of InvalidItem error. It could be subclassed to raise, in this case, an EmptyText error.
Although, I'm not sure this should raise an error at all, because this doesn't mean the item is invalid; it just has empty text.
I have these subs:
1
00:00:00,058 --> 00:00:02,942
Previously on AMC's
Breaking Bad...
2
00:00:02,984 --> 00:00:04,513
Sooner or later
someone is gonna flip.
If I shift these subs by -1 second, the resulting file has all of them at 00:00:00,000:
1
00:00:00,000 --> 00:00:00,000
Previously on AMC's
Breaking Bad...
2
00:00:00,000 --> 00:00:00,000
Sooner or later
someone is gonna flip.
3
00:00:00,000 --> 00:00:00,000
I've got nine guys.
They were part of the
I think it should be 00:00:00,000 at first and then -1 second.
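For reference, the arithmetic one might expect from a -1 second shift can be sketched as follows: times that would become negative are clamped to zero, and everything else simply moves back by one second. The helper names (`parse_ts`, `format_ts`, `shift_ts`) are hypothetical, and clamping at zero is an assumption about the desired behaviour, not a description of what pysrt does.

```python
def parse_ts(ts):
    """'HH:MM:SS,mmm' -> milliseconds."""
    hms, millis = ts.split(",")
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def format_ts(ordinal):
    """Milliseconds -> 'HH:MM:SS,mmm'."""
    seconds, millis = divmod(ordinal, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def shift_ts(ts, milliseconds):
    """Shift a timestamp, clamping at zero instead of going negative."""
    return format_ts(max(0, parse_ts(ts) + milliseconds))
```

Under this arithmetic the first subtitle above would run from 00:00:00,000 to 00:00:01,942 and the second from 00:00:01,984 to 00:00:03,513, rather than everything collapsing to zero.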
Would it be possible to release a new version with the changes currently in master? Just so that it's possible to use the new features.
Hello,
I was looking for a dual subtitle feature, but I didn't find one.
But I found this fantastic package, I just would like to share my merge script here.
For me it works perfectly on my VLC player.
Perhaps it might be added to your command line tool.
Usage:
./subtitle_merge.py en.srt de.srt en_de.srt
Best regards,
Karel.
See the code below....
If you receive some times as strings, split them into parts and try calling SubRipFile.slice with a dict of those parts, e.g.:
subs.slice(starts_after={'minutes': '11', 'seconds': '22'})
then you'll get a rather cryptic error: TypeError: '>' not supported between instances of 'SubRipTime' and 'dict'. This is caused by a different error: TypeError: '>' not supported between instances of 'str' and 'int' in ComparableMixin._compare. Which in turn means that the ordinal field in one of the objects is a string.
The root cause is that, when passed strings as the arguments, the SubRipTime constructor multiplies them by HOURS_RATIO, MINUTES_RATIO and SECONDS_RATIO respectively, and adds them all together, silently resulting in a long-ass string instead of a number.
To either handle the use-case or make the output more informative, it would be prudent to convert the arguments to numbers, or to explicitly forbid non-number arguments. IMO the first approach is better, especially since Python itself would then complain if the arguments really don't contain numbers. One other question to decide is whether the constructor should accept fractional times and thus convert to float and not just int. Milliseconds should probably be integers, but I might want to cut e.g. 1.5 hours into a film. Some workflows involving arithmetic might even produce fractional milliseconds, which of course should still be cut to integers after parsing.
Alternatively, or in addition, you might want to pass exceptions through in ComparableMixin._compare instead of returning NotImplemented. Firstly, AttributeError and TypeError may have more possible causes when calling _cmpkey than just the two envisioned cases. Secondly, use of a mixin suggests more complex workflows than plain comparison of two values (as in the very case of SubRipTime), while Python's resulting message is rather unenlightening. So passing exceptions through seems to be more informative, as they would properly indicate invalid use of ComparableMixin.
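The first suggestion (convert the arguments to numbers up front) can be sketched as below. This is an illustration only: `to_ordinal` is a hypothetical helper, not pysrt's constructor, and the ratio values are assumed to be milliseconds per unit, with names mirroring those mentioned in the report.

```python
# Assumed values (milliseconds per unit); the names mirror those in the report.
HOURS_RATIO = 3600000
MINUTES_RATIO = 60000
SECONDS_RATIO = 1000


def to_ordinal(hours=0, minutes=0, seconds=0, milliseconds=0):
    """Hypothetical helper: coerce every part to a number before the arithmetic."""
    total = (float(hours) * HOURS_RATIO
             + float(minutes) * MINUTES_RATIO
             + float(seconds) * SECONDS_RATIO
             + float(milliseconds))
    return int(total)  # fractional milliseconds are truncated


# Without coercion, str * int repeats the string instead of multiplying.
repeated = "11" * 3
```

Going through `float` accepts both `'11'` and `1.5`, and a non-numeric string raises `ValueError` at the point of construction rather than surfacing later as a comparison error.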
I can't run srt with this file http://dl.dropbox.com/u/1788271/Bones.S07E01.HDTVRip.srt
It is cp1251
I have the following error:
Traceback (most recent call last):
File "/usr/local/bin/srt", line 9, in <module>
load_entry_point('pysrt==0.4.1', 'console_scripts', 'srt')()
File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 190, in main
SubRipShifter().run(sys.argv[1:])
File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 118, in run
self.arguments.action()
File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 164, in break_lines
self.input_file.break_lines(self.arguments.length)
File "/usr/local/lib/python2.7/dist-packages/pysrt/commands.py", line 177, in input_file
encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 131, in open
new_file.read(source_file, error_handling=error_handling)
File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 159, in read
self.extend(self.stream(source_file, error_handling=error_handling))
File "/usr/lib/python2.7/UserList.py", line 88, in extend
self.data.extend(other)
File "/usr/local/lib/python2.7/dist-packages/pysrt/srtfile.py", line 190, in stream
yield SubRipItem.from_lines(source)
File "/usr/local/lib/python2.7/dist-packages/pysrt/srtitem.py", line 79, in from_lines
return cls(index, start, end, body, position)
File "/usr/local/lib/python2.7/dist-packages/pysrt/srtitem.py", line 21, in __init__
self.index = int(index)
UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string
part = subs.slice(starts_after={'minutes': 2, seconds': 30}, ends_before={'minutes': 3, 'seconds': 40})
part.shift(seconds=-2)
should be (seconds part ' missing)
part = subs.slice(starts_after={'minutes': 2, 'seconds': 30}, ends_before={'minutes': 3, 'seconds': 40})
part.shift(seconds=-2)
Even though it is a small library, a few pages of documentation would be welcome.
Hi,
I encountered this error:
Traceback (most recent call last):
File "/home/antoine/workspace/python/subliminal/subliminal/api.py", line 250, in download_best_subtitles
subtitle_text = provider.download_subtitle(subtitle)
File "/home/antoine/workspace/python/subliminal/subliminal/providers/podnapisi.py", line 161, in download_subtitle
if not is_valid_subtitle(subtitle_text):
File "/home/antoine/workspace/python/subliminal/subliminal/subtitle.py", line 106, in is_valid_subtitle
pysrt.from_string(subtitle_text, error_handling=pysrt.ERROR_RAISE)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srtfile.py", line 188, in from_string
new_file.read(source.splitlines(True), error_handling=error_handling)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srtfile.py", line 202, in read
self.extend(self.stream(source_file, error_handling=error_handling))
File "/usr/lib/python2.7/UserList.py", line 88, in extend
self.data.extend(other)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srtfile.py", line 243, in stream
yield SubRipItem.from_lines(source)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srtitem.py", line 66, in from_lines
return cls(index, start, end, body, position)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srtitem.py", line 28, in __init__
self.end = SubRipTime.coerce(end or 0)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srttime.py", line 128, in coerce
return cls.from_string(other)
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srttime.py", line 170, in from_string
return cls(*(int(i) for i in items))
File "/home/antoine/.virtualenvs/subliminal/local/lib/python2.7/site-packages/pysrt/srttime.py", line 170, in <genexpr>
return cls(*(int(i) for i in items))
ValueError: invalid literal for int() with base 10: '197?'
You can download the subtitle causing this here : http://podnapisi.net/static/podnapisi/c/1/8/c18482a60f7ce6f94a8a33947aa723e6c3bd2e18.zip
https://pypi.python.org/packages/source/p/pysrt/pysrt-0.5.1.tar.gz#md5=c5d44c8abac6089cb8cd03ddee26faa5 does not have tests/static/, so 'nosetests --with-coverage --cover-package=pysrt' fails.
Maybe you could add some methods like:
subs.addAfter(LineNumber, StartTime, EndTime, SubtitleContent)
subs.addBefore(LineNumber, StartTime, EndTime, SubtitleContent)
subs.addAfterLastLine(StartTime, EndTime, SubtitleContent)
subs.addAfterLastLine(duringtime, SubtitleContent)
This could be used on a failed attempt to open a file due to UnicodeDecodeError. charade would be called to detect the encoding and a second attempt to open the file would be done.
This would be the default behavior and suppressed if the encoding argument is not None.
What do you think?
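The retry idea can be sketched with the standard library alone. Note the swap: instead of calling charade to detect the encoding, this sketch simply tries a fixed list of candidates in order. `decode_with_fallback` and its default candidate list are assumptions for illustration, not part of pysrt.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) using the first candidate that decodes `raw`."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")
```

Because `latin-1` maps every byte to a character, it never fails and therefore only makes sense as the last candidate; a real detector would replace the fixed list with a guess based on the file content.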
I'm planning to use this inside a project using Plone (a well-known CMS built with Zope and Python). The last stable version of Plone uses Python 2.4, so the current egg is not compatible because it uses try-except-finally.
Changing the block at line 87 of pysrtfile.py like this makes it usable!
try:
    try:
        new_item = SubRipItem.from_string(source)
        new_file.append(new_item)
    except InvalidItem, error:
        cls._handle_error(error, error_handling, path, index)
finally:
    string_buffer.truncate(0)
installation
pip install pysrt
run Windows command
C:\usr\local\python27\Scripts\srt.exe -i shift -65s sample.srt
or linux
srt -i shift -65s sample.srt
usage: srt shift [-h] ←[4moffset←[0m
srt-script.py shift: error: too few arguments
I tried running
C:\usr\local\python27\Scripts\srt.exe -i shift "-65s" sample.srt
with same error result
and
C:\usr\local\python27\Scripts\srt.exe -i shift '-65s' sample.srt
just shifts sample.srt with a positive offset, exactly as
C:\usr\local\python27\Scripts\srt.exe -i shift 65s sample.srt
Maybe it can just be an empty egg that require pysrt >= 1.0.0
Installing tests to /usr/lib64/python2.7/site-packages/ is wrong, they should be installed to /usr/lib64/python2.7/site-packages/pysrt. Else file collisions will happen.
Would be great to do the following:
import pysrt
pysrt.is_valid('/path/to/Inception.srt')
What it would do is check if the file is a subtitle or not, with the correct format.
I tried to open an invalid subtitle file and it raised a ValueError somewhere in the code, would be better to have it raise an "InvalidFormatError" or something.
>>> SubRipFile.open('test.srt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/dist-packages/pysrt/srtfile.py", line 127, in open
new_file.read(source_file, error_handling=error_handling)
File "/usr/local/lib/python2.6/dist-packages/pysrt/srtfile.py", line 155, in read
self.extend(self.stream(source_file, error_handling=error_handling))
File "/usr/lib/python2.6/UserList.py", line 88, in extend
self.data.extend(other)
File "/usr/local/lib/python2.6/dist-packages/pysrt/srtfile.py", line 186, in stream
yield SubRipItem.from_lines(source)
File "/usr/local/lib/python2.6/dist-packages/pysrt/srtitem.py", line 56, in from_lines
start, end, position = cls.split_timestamps(lines[1])
File "/usr/local/lib/python2.6/dist-packages/pysrt/srtitem.py", line 62, in split_timestamps
start, end_and_position = line.split('-->')
ValueError: need more than 1 value to unpack
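A dedicated exception for this case could look like the sketch below. The traceback shows that pysrt has a `split_timestamps` step; the version here is a standalone re-imagining, and `InvalidTimestampLine` is a hypothetical name. Subclassing `ValueError` is a design choice that keeps existing `except ValueError` handlers working.

```python
class InvalidTimestampLine(ValueError):
    """Hypothetical error for a timing line without exactly one '-->'."""


def split_timestamps(line):
    """Split 'start --> end [position]' into its three parts."""
    parts = line.split("-->")
    if len(parts) != 2:
        raise InvalidTimestampLine("not a timing line: %r" % line)
    start = parts[0].strip()
    # Anything after the end time is treated as optional position data.
    end, _, position = parts[1].strip().partition(" ")
    return start, end, position.strip()
```

A caller can then distinguish a malformed file from other failures by catching `InvalidTimestampLine` specifically.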
I have a problem with the from_string API: a Unicode error I'm not able to fix.
The file is here (but for application reasons I can't use the SubRipFile.open method):
http://releases.flowplayer.org/data/buffalo.srt
Some tested examples:
from pysrt import SubRipFile
p = '/Users/luca/Documents/buffalo.srt'
SubRipFile.open(p)
Traceback (most recent call last):
File "", line 1, in ?
File "/Users/luca/Library/Buildout/eggs/pysrt-0.2.4-py2.4.egg/pysrt/srtfile.py", line 81, in open
source = unicode(string_buffer.read(), new_file.encoding)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 47-48: invalid data
SubRipFile.open(p, encoding='latin1')
[... THIS IS OK, IT WORKS ...]
st = open(p).read()
SubRipFile.from_string(st)
Traceback (most recent call last):
File "", line 1, in ?
File "/Users/luca/Library/Buildout/eggs/pysrt-0.2.4-py2.4.egg/pysrt/srtfile.py", line 107, in from_string
return cls.open(file_descriptor=StringIO(source))
File "/Users/luca/Library/Buildout/eggs/pysrt-0.2.4-py2.4.egg/pysrt/srtfile.py", line 81, in open
source = unicode(string_buffer.read(), new_file.encoding)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 49-50: invalid data
SubRipFile.from_string(st.decode('iso-8859-1'))
Traceback (most recent call last):
File "", line 1, in ?
File "/Users/luca/Library/Buildout/eggs/pysrt-0.2.4-py2.4.egg/pysrt/srtfile.py", line 107, in from_string
return cls.open(file_descriptor=StringIO(source))
File "/Users/luca/Library/Buildout/eggs/pysrt-0.2.4-py2.4.egg/pysrt/srtfile.py", line 81, in open
source = unicode(string_buffer.read(), new_file.encoding)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 17348-17349: invalid data
SubRipFile.from_string(st.decode('iso-8859-1').encode('utf-8'))
[]
Any tips? Right now I can work around this problem using a temp file, but it seems there is some problem in the method.
Hello
easy_install does not resolve the chardet dependency by itself. It has to be installed manually beforehand.
Regards,
Following the documentation I tried this:
from pysrt import SubRipFile, SubRipItem, SubRipTime
subs = SubRipFile('buffalo.srt')
subs
['b', 'u', 'f', 'f', 'a', 'l', 'o', '.', 's', 'r', 't']
subs[0]
'b'
subs[1]
'u'
subs[2]
'f'
The "buffalo.srt" file is the one used for a demo of the Flowplayer Flash player, at this URL:
http://releases.flowplayer.org/data/buffalo.srt
I can't tell whether something is going wrong or whether the srt file is somehow corrupted or not well formed.
Hello,
I am working through an online class and trying to produce notes based on the instructional video content. Since many of the concepts covered in these videos are worth taking note of, I'm finding myself writing out nearly every line spoken by the instructor. Obviously, this process is laborious and extremely time-consuming. I am wondering if there is an easier way to extract the text from these videos using an srt tool to help parse and modify the text.
The syntax of the transcript files for each video are identical to standard srt format. Here's an example:
1
00:00:00,710 --> 00:00:03,220
Rob just showed us how we can
make things accessible to
2
00:00:03,220 --> 00:00:05,970
anyone who can't use a mouse or
pointing device.
3
00:00:05,970 --> 00:00:09,130
Whether that's because it's any
type of physical impairment or
4
00:00:09,130 --> 00:00:11,510
a technology issue or
simply personal preference.
Does pysrt currently provide any tools for modifying text content so that it's formatted into a more readable format? To clarify, for the above example, I would like to remove blank lines, lines beginning with the record number and time-stamp, and then join the remaining lines, adding spaces after periods, like so:
Rob just showed us how we can make things accessible to anyone who can't use a mouse or pointing device. Whether that's because it's any type of physical impairment or a technology issue or simply personal preference.
I am interested in creating the output above from the example and being able to apply the same modification to more of the files in the series. I am pretty rusty with Python at the moment, though I believe this capability could be implemented fairly easily with an understanding of common string methods.
Can anyone contributing to this project let me know how this is done or if the functionality already exists in pysrt?
Thanks!
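The transformation described in this request can be sketched with plain string handling. This is a minimal illustration that works on the raw file text and does not use pysrt; `srt_to_paragraph` and `TIMING` are hypothetical names.

```python
import re

# Matches a SubRip timing line such as "00:00:00,710 --> 00:00:03,220".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_paragraph(text):
    """Drop index lines, timing lines and blank lines; join the rest with spaces."""
    lines = text.splitlines()
    kept = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or TIMING.match(stripped):
            continue
        # A bare number is an index only if a timing line follows it.
        if stripped.isdigit() and i + 1 < len(lines) and TIMING.match(lines[i + 1].strip()):
            continue
        kept.append(stripped)
    return " ".join(kept)
```

Since other snippets on this page read the `text` attribute of each item, joining those texts over an opened pysrt file would presumably give the same result.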
charade has been merged into chardet and is not maintained anymore so you might want to switch :)
Right now the SubRipItem tests contain a trailing newline, like this:
u'MILES:\nNo one stops us.\n'
u'No one ever has.\n'
When changing the text you need to ensure to keep (or re-add) that trailing newline since otherwise the next sub in the file will come directly after the edited one instead of after the edited one plus one newline.
This separating newline is not really part of the subtitle itself though, so it shouldn't be there and be added automatically when saving the subtitles to a file.
Almost 40% of subtitles fail to parse because of Unicode errors.
Traceback (most recent call last):
File "/home/username/bin/nocc", line 11, in <module>
load_entry_point('nocc', 'console_scripts', 'nocc')()
File "/home/username/projects/nocc/nocc/nocc.py", line 155, in main
nocc(fn)
File "/home/username/projects/nocc/nocc/nocc.py", line 47, in nocc
subs = pysrt.open(filename)
File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 153, in open
new_file.read(source_file, error_handling=error_handling)
File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 180, in read
self.eol = self._guess_eol(source_file)
File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 257, in _guess_eol
first_line = cls._get_first_line(string_iterable)
File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 269, in _get_first_line
first_line = next(iter(string_iterable))
File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 711, in __next__
return next(self.reader)
File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 642, in __next__
line = self.readline()
File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 555, in readline
data = self.read(readsize, firstline=True)
File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 501, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 4: invalid start byte
Please enable "errors=ignore" in open()
Would improve support for languages and UTF-8 a lot.
Is it planned?
Edit: sorry, just found python3 branch.
The pysrt library should also minimally support the WebVTT subtitle format for input and output, as it is so very close to the original SRT format, and one would need a program to convert an SRT to the WebVTT format and vice versa for HTML5 videos.
Furthermore, the counter field before the time code should also be made optional for SRT too, as I have seen subtitles without the counter (only the timestamp); they also currently work in many players as is, but pysrt just returns an empty list for such files unless I add the 1, 2, 3... by hand before the timestamps.
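How close the two formats are can be seen from a minimal SRT-to-WebVTT conversion sketch: add the `WEBVTT` header line and use a dot instead of a comma before the milliseconds. `srt_to_webvtt` is a hypothetical helper, not a pysrt function, and it ignores SRT position data and styling differences.

```python
import re

# "HH:MM:SS,mmm" as used on SubRip timing lines.
SRT_TS = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert SubRip text to WebVTT: add the header, use '.' for milliseconds."""
    out = ["WEBVTT", ""]
    for line in srt_text.splitlines():
        if "-->" in line:
            line = SRT_TS.sub(r"\1.\2", line)  # only touch timing lines
        out.append(line)
    return "\n".join(out) + "\n"
```

The numeric counters are kept as they are, since WebVTT allows an optional cue identifier line before each timing line.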
The following is a perfectly valid SRT file (at least anything that handles them will open it just fine), let's call it subs.srt:
3
00:08:17,317 --> 00:08:19,328
It is life or death, James.
The following happens
a = pysrt.open('subs.srt')
a[0].index
>>>> 3
a.clean_indexes()
a[0].index
>>>> 1
Should not clean_indexes() be called after opening the subtitles by default? There is also the issue that index after calling clean_indexes() starts at 1 and python lists start at 0, but I do not think it would be wise to do anything about it.
Hi,
I'm trying to create a Debian/Ubuntu package for your library, and it'd be really helpful if you could create a git tag for each release you make.
Thanks in advance!
Hi, and first of all thanks for this handy library. I don't know if my error is due to a bad srt file or if it's a bug, but with an srt file like this:
1
00:22:10,440 --> 00:22:15,195
Je suis coincée au boulot,
j'aurai 10 minutes de retard.
305
00:22:15,960 --> 00:22:19,157
John, je suis dans les embouteillages.
La 5e Avenue est en travaux.
When I run the command: srt shift 35s file_with_empty_line.srt, I get the following error:
PySRT-InvalidItem(line 5):
Traceback (most recent call last):
File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 212, in stream
yield SubRipItem.from_lines(source)
File "/home/john/Documents/git/pysrt/pysrt/srtitem.py", line 83, in from_lines
raise InvalidItem()
pysrt.srtexc.InvalidItem: j'aurai 10 minutes de retard.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/bin/srt", line 9, in <module>
load_entry_point('pysrt', 'console_scripts', 'srt')()
File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 222, in main
SubRipShifter().run(sys.argv[1:])
File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 140, in run
self.arguments.action()
File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 161, in shift
self.input_file.shift(milliseconds=self.arguments.time_offset)
File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 205, in input_file
encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 153, in open
new_file.read(source_file, error_handling=error_handling)
File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 181, in read
self.extend(self.stream(source_file, error_handling=error_handling))
File "/opt/miniconda3/lib/python3.5/collections/__init__.py", line 1091, in extend
self.data.extend(other)
File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 215, in stream
cls._handle_error(error, error_handling, index)
File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 311, in _handle_error
sys.stderr.write(error.args[0].encode('ascii', 'replace'))
TypeError: write() argument must be str, not bytes
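The final TypeError comes from writing bytes to a text stream on Python 3. One possible fix, sketched here with a hypothetical `write_error` helper rather than pysrt's own `_handle_error`, is to decode the ASCII-safe bytes back to `str` before writing:

```python
import io


def write_error(stream, message):
    """Write `message` to a text stream, replacing non-ASCII characters with '?'."""
    safe = message.encode("ascii", "replace").decode("ascii")
    stream.write(safe + "\n")


buffer = io.StringIO()
write_error(buffer, u"Je suis coinc\xe9e au boulot,")
```

The round trip through `encode(...).decode(...)` keeps the original intent (no non-ASCII output on stderr) while handing `write()` the `str` it expects.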
This seems geared towards editing existing files. Is there a way to create a new .srt file in python with the existing functions?
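Independently of pysrt, the file format is simple enough to write by hand, as the sketch below shows; `format_ts` and `build_srt` are hypothetical helpers. Within pysrt, the merge script earlier on this page builds a new file by creating an empty SubRipFile, appending SubRipItem objects and calling save, which suggests the library can be used the same way.

```python
def format_ts(ms):
    """Milliseconds -> 'HH:MM:SS,mmm'."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def build_srt(cues):
    """Build SubRip text from an iterable of (start_ms, end_ms, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, 1):
        blocks.append("%d\n%s --> %s\n%s\n"
                      % (index, format_ts(start), format_ts(end), text))
    return "\n".join(blocks)  # blank line between consecutive entries
```

For example, `build_srt([(0, 1500, "Hello"), (2000, 3000, "World")])` yields two numbered entries separated by a blank line, ready to be written to a `.srt` file.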
Please add support for Python 3