Comments (20)
Strange, I'm able to shift it without encoding error.
srt shift 20 russian.srt
Can you paste the whole command you typed?
from pysrt.
Well, a month without a reply, so I'm closing this issue.
Feel free to reopen it if you still have a problem.
from pysrt.
Hi, sorry for the late response.
srt shift 40s 33.srt
Traceback (most recent call last):
  File "/usr/local/bin/srt", line 9, in <module>
    load_entry_point('pysrt==0.4.1', 'console_scripts', 'srt')()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 192, in main
    SubRipShifter().run(sys.argv[1:])
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 118, in run
    self.arguments.action()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 136, in shift
    self.input_file.shift(milliseconds=self.arguments.time_offset)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 179, in input_file
    encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 127, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 155, in read
    self.extend(self.stream(source_file, error_handling=error_handling))
  File "/usr/lib/python2.7/UserList.py", line 88, in extend
    self.data.extend(other)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 186, in stream
    yield SubRipItem.from_lines(source)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 58, in from_lines
    return cls(index, start, end, body, position)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 21, in __init__
    self.index = int(index)
UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string
python -V
Python 2.7.2+
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric
from pysrt.
Hmm, very strange... so it always happens, whatever the subtitle file?
And how did you install it? Because /data/share/_films/Game of Thrones_S02E02/src/
is a very strange location...
from pysrt.
I've only tried it on a few files, all Russian, UTF-8.
Installed from git:
pip install -e git+https://github.com/byroot/pysrt.git#egg=pysrt
from pysrt.
OK, I still can't reproduce it, but now I'm almost sure that it's a BOM issue...
I will ask a friend on Ubuntu to test that.
Did you try the version released on PyPI?
pip install --upgrade pysrt
from pysrt.
I confirm it is a BOM issue.
I've successfully edited a file without a BOM, created with Notepad++.
I've also tried the following command:
srt -e utf_8_sig ...
but it failed with the same error.
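For reference, here is a minimal sketch of what the utf_8_sig codec does on its own, outside pysrt. It assumes a UTF-8 file whose first line is the index "1"; the variable names are illustrative only.

```python
import codecs

# Bytes of a UTF-8 subtitle file that starts with a BOM.
raw = codecs.BOM_UTF8 + u'1\r\n'.encode('utf-8')

# Plain utf-8 keeps the BOM as a U+FEFF character at the start of the text.
with_bom = raw.decode('utf-8')

# utf_8_sig drops a leading BOM if there is one.
without_bom = raw.decode('utf_8_sig')

print(repr(with_bom))
print(repr(without_bom))
```

So if the error persists with -e utf_8_sig, the option is probably not reaching the decoding step.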
from pysrt.
pysrt is supposed to handle the BOM correctly...
And the file you gave me is in cp1252, so why did it have a UTF-8 BOM?
Can you send me another file?
from pysrt.
I'm having the same issue
File is here: https://docs.google.com/open?id=0B2q9iBGZdj6qN29uUzBBQXNJM2c
from pysrt.
I finally found the issue: it was because chardet returned "UTF-8" and the encodings module was only aware of "utf-8".
My bad ...
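A sketch of how this kind of case mismatch can be avoided, assuming a lookup table keyed by encoding name. This is not necessarily how pysrt fixed it; normalize_encoding, BOMS and bom_for are hypothetical names. codecs.lookup() is the standard way to get Python's canonical codec name.

```python
import codecs


def normalize_encoding(name):
    """Return Python's canonical codec name, e.g. 'UTF-8' -> 'utf-8'."""
    return codecs.lookup(name).name


# Hypothetical table of BOMs, keyed by canonical codec name.
BOMS = {'utf-8': codecs.BOM_UTF8}


def bom_for(detected_name):
    """Look up the BOM for a detector result such as 'UTF-8'."""
    return BOMS.get(normalize_encoding(detected_name))


# A raw lookup with the detector's spelling misses the entry...
print(BOMS.get('UTF-8'))
# ...while the normalized lookup finds it.
print(bom_for('UTF-8') == codecs.BOM_UTF8)
```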
from pysrt.
Is this fixed in 0.4.4? I still get this error.
from pysrt.
I think so. Do you still have the issue with this same file and pysrt 0.4.4?
from pysrt.
Oh shit ... confirmed, I'll fix that right now.
from pysrt.
Oh, I just forgot to release ...
from pysrt.
0.4.5 released with the fix.
from pysrt.
Thanks, that was fast :)
from pysrt.
I'm still having an error :(
I added a print statement to see what's in lines here, and I got this:
[u'\ufeff1\r\n', u'00:00:01,677 --> 00:00:04,145\r\n', u'Alors, sur quel genre de croisi\xe8re\r\n', u'allez-vous embarquer ?\r\n']
from pysrt.
Of course int(u'\ufeff1\r\n') fails.
The file can be downloaded on Addic7ed.
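A minimal sketch of the failure and of one way to tolerate the BOM. parse_index is a hypothetical helper, not pysrt code. It is written for Python 3, where the failure surfaces as a plain ValueError; on Python 2 it is the UnicodeEncodeError from the traceback, which is a ValueError subclass.

```python
BOM = u'\ufeff'


def parse_index(line):
    """Parse a subtitle index line, ignoring any leading BOM character."""
    return int(line.lstrip(BOM))


lines = [u'\ufeff1\r\n', u'00:00:01,677 --> 00:00:04,145\r\n']

try:
    int(lines[0])
    plain_int_failed = False
except ValueError:
    # U+FEFF is not whitespace, so int() does not skip it.
    plain_int_failed = True

print(plain_int_failed)
print(parse_index(lines[0]))
```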
from pysrt.
Sample code to reproduce the error:
from charade.universaldetector import UniversalDetector
import codecs
import pysrt

def is_valid_subtitle(path):
    u = UniversalDetector()
    for line in open(path, 'rb'):
        u.feed(line)
    u.close()
    encoding = u.result['encoding']
    source_file = codecs.open(path, 'rU', encoding=encoding, errors='replace')
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
    # except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
    #     pass
    return True
from pysrt.
Oh! It makes sense now. If you open the file yourself, pysrt does not strip the BOM.
Anyway, chardet is integrated into pysrt now.
Try something like:
def is_valid_subtitle(path):
    source_file = pysrt.SubRipFile._open_unicode_file(path)
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
    # except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
    #     pass
    return True
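If you would rather not rely on a private method, one possible way to open the file yourself and still skip the BOM is sketched below. open_skipping_bom is a hypothetical helper, not part of pysrt; it only handles the UTF-8 BOM and assumes the fallback encoding is already known.

```python
import codecs
import io
import os
import tempfile


def open_skipping_bom(path, fallback_encoding='utf-8'):
    """Open a text file, dropping a leading UTF-8 BOM if one is present."""
    with open(path, 'rb') as raw:
        has_utf8_bom = raw.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
    # utf_8_sig decodes UTF-8 and silently removes the BOM.
    encoding = 'utf_8_sig' if has_utf8_bom else fallback_encoding
    return io.open(path, 'r', encoding=encoding)


# Demo on a temporary file that starts with a BOM.
fd, demo_path = tempfile.mkstemp(suffix='.srt')
with os.fdopen(fd, 'wb') as handle:
    handle.write(codecs.BOM_UTF8 + b'1\r\n00:00:01,677 --> 00:00:04,145\r\nhello\r\n')

with open_skipping_bom(demo_path) as source:
    first_line = source.readline()
os.remove(demo_path)

print(repr(first_line))
```

The returned file object could then be passed to pysrt.SubRipFile.stream() in place of source_file.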
from pysrt.