Comments (20)

byroot commented on July 25, 2024

Strange, I'm able to shift it without encoding error.

srt shift 20 russian.srt

Can you paste the whole command you typed?

from pysrt.

byroot commented on July 25, 2024

Well, a month without a reply, so I'm closing this issue.

Feel free to reopen it if you still have a problem.

limpbrains commented on July 25, 2024

Hi, sorry for the late response.

srt shift 40s 33.srt
Traceback (most recent call last):
  File "/usr/local/bin/srt", line 9, in <module>
    load_entry_point('pysrt==0.4.1', 'console_scripts', 'srt')()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 192, in main
    SubRipShifter().run(sys.argv[1:])
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 118, in run
    self.arguments.action()
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 136, in shift
    self.input_file.shift(milliseconds=self.arguments.time_offset)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/commands.py", line 179, in input_file
    encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 127, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 155, in read
    self.extend(self.stream(source_file, error_handling=error_handling))
  File "/usr/lib/python2.7/UserList.py", line 88, in extend
    self.data.extend(other)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtfile.py", line 186, in stream
    yield SubRipItem.from_lines(source)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 58, in from_lines
    return cls(index, start, end, body, position)
  File "/data/share/_films/Game of Thrones_S02E02/src/pysrt/pysrt/srtitem.py", line 21, in __init__
    self.index = int(index)
UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string

python -V
Python 2.7.2+

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

byroot commented on July 25, 2024

Hmm, very strange... so does it always happen, whatever the subtitle file?

And how did you install it? Because /data/share/_films/Game of Thrones_S02E02/src/ is a very strange location...

limpbrains commented on July 25, 2024

I've only tried a few files, all Russian, UTF-8.
Installed from git:
pip install -e git+https://github.com/byroot/pysrt.git#egg=pysrt

byroot commented on July 25, 2024

OK, I still can't reproduce it, but now I'm almost sure that it's a BOM issue...

I will ask a friend on Ubuntu to test that.

Did you try the version released on PyPI?
pip install --upgrade pysrt

limpbrains commented on July 25, 2024

I confirm it is a BOM issue.
I've successfully edited a file without a BOM, created with Notepad++.
I've also tried the following command:
srt -e utf_8_sig ...
but it failed with the same error.
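
A quick way to confirm whether a given subtitle file starts with a UTF-8 BOM is to look at its first raw bytes. This is only a minimal sketch using the standard library; the helper name `has_utf8_bom` is made up for illustration and is not part of pysrt.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM bytes.

    Hypothetical helper for illustration; not a pysrt function.
    """
    with open(path, "rb") as handle:
        # codecs.BOM_UTF8 is the 3-byte sequence b'\xef\xbb\xbf'.
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Calling `has_utf8_bom("33.srt")` on the file from the traceback above would tell you whether the BOM is present before any decoding happens.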

byroot commented on July 25, 2024

pysrt is supposed to handle the BOM correctly...

And the file you gave me is in cp1252, so why does it have a UTF-8 BOM?
Can you send me another file?

Diaoul commented on July 25, 2024

I'm having the same issue
File is here: https://docs.google.com/open?id=0B2q9iBGZdj6qN29uUzBBQXNJM2c

byroot commented on July 25, 2024

I finally found the issue: chardet returned "UTF-8", while the encodings module was only aware of "utf-8".

My bad ...
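
One way to avoid this kind of name mismatch is to normalize whatever name the detector returns before comparing it or using it as a lookup key. A minimal sketch, assuming the bug was a case-sensitive comparison of encoding names; `normalize_encoding` is a hypothetical helper and not necessarily how pysrt fixed it.

```python
import codecs


def normalize_encoding(name):
    """Map aliases such as 'UTF-8' or 'utf_8' to Python's canonical codec name.

    Hypothetical helper for illustration; not a pysrt function.
    """
    # codecs.lookup() ignores case and raises LookupError for unknown names.
    return codecs.lookup(name).name


print(normalize_encoding("UTF-8") == normalize_encoding("utf-8"))  # True
```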

Diaoul commented on July 25, 2024

Is this fixed in 0.4.4? Because I still have this error

byroot commented on July 25, 2024

I think so. Do you still have the issue with this same file and pysrt 0.4.4?

byroot commented on July 25, 2024

Oh shit ... confirmed, I'll fix that right now.

byroot commented on July 25, 2024

Oh, I just forgot to release ...

byroot commented on July 25, 2024

0.4.5 released with the fix.

Diaoul commented on July 25, 2024

Thanks, that was fast :)

Diaoul commented on July 25, 2024

I'm still having an error 😢
I added a print statement to see what's in lines here and I got this:

[u'\ufeff1\r\n', u'00:00:01,677 --> 00:00:04,145\r\n', u'Alors, sur quel genre de croisi\xe8re\r\n', u'allez-vous embarquer ?\r\n']

Diaoul commented on July 25, 2024

Of course int(u'\ufeff1\r\n') fails
File can be downloaded on Addic7ed
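
To illustrate the failure: `int()` tolerates surrounding whitespace such as `\r\n`, but not a leading U+FEFF, so the BOM has to be removed before parsing the index. The sketch below is written for Python 3, where the failure surfaces as `ValueError` rather than the `UnicodeEncodeError` seen on Python 2.7 in this thread; `parse_index` is a hypothetical helper, not pysrt code.

```python
BOM = "\ufeff"


def parse_index(line):
    """Parse a subtitle index line, ignoring a leading BOM if present.

    Hypothetical helper for illustration; not a pysrt function.
    """
    # int() already ignores surrounding whitespace, including '\r\n'.
    return int(line.lstrip(BOM))


try:
    int("\ufeff1\r\n")
except ValueError as error:
    print("without stripping:", error)

print(parse_index("\ufeff1\r\n"))  # 1
```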

Diaoul commented on July 25, 2024

Sample code to reproduce the error:

from charade.universaldetector import UniversalDetector
import codecs
import pysrt

def is_valid_subtitle(path):
    u = UniversalDetector()
    for line in open(path, 'rb'):
        u.feed(line)
    u.close()
    encoding = u.result['encoding']
    source_file = codecs.open(path, 'rU', encoding=encoding, errors='replace')
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
#    except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
#        pass
    return True

byroot commented on July 25, 2024

Oh! It makes sense now. If you open the file yourself, pysrt does not strip the BOM.

Anyway, chardet is integrated into pysrt now.

Try something like:

def is_valid_subtitle(path):
    source_file = pysrt.SubRipFile._open_unicode_file(path)
    try:
        for _ in pysrt.SubRipFile.stream(source_file, error_handling=pysrt.SubRipFile.ERROR_RAISE):
            pass
    except pysrt.Error as e:
        if e.args[0] < 50:  # Error occurs within the 50 first lines
            return False
#    except UnicodeEncodeError:  # Workaround for https://github.com/byroot/pysrt/issues/12
#        pass
    return True
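
If you do need to open the file yourself, Python's standard `utf-8-sig` codec drops a leading BOM while decoding, whereas plain `utf-8` keeps it as U+FEFF. A minimal, self-contained sketch that uses a temporary file instead of a real subtitle; it only shows the codec behaviour and is not how pysrt handles the BOM internally.

```python
import codecs
import io
import os
import tempfile

# Write a tiny subtitle-like file that starts with a UTF-8 BOM.
fd, path = tempfile.mkstemp(suffix=".srt")
with os.fdopen(fd, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + b"1\r\n00:00:01,677 --> 00:00:04,145\r\nHello\r\n")

# newline="" keeps the original '\r\n' line endings untranslated.
with io.open(path, encoding="utf-8", newline="") as handle:
    first_plain = handle.readline()  # BOM is kept as u'\ufeff'
with io.open(path, encoding="utf-8-sig", newline="") as handle:
    first_sig = handle.readline()    # BOM is removed by the codec

os.remove(path)
print(repr(first_plain), repr(first_sig))
```

A stream opened with `utf-8-sig` yields an index line that `int()` can parse. This only applies when the file really is UTF-8.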
