Comments (4)

senatet commented on June 19, 2024

Hi.

I am also seeing this bug when using gensim, which relies on smart_open to open gzip-compressed files. Here is a minimal reproduction of the issue:


➜  /tmp virtualenv venv
New python executable in venv/bin/python
Installing setuptools, pip...done.
➜  /tmp source venv/bin/activate
(venv)➜  /tmp pip install -U pip smart_open
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting pip from https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
  Using cached pip-9.0.1-py2.py3-none-any.whl
Collecting smart-open
  Using cached smart_open-1.5.0.tar.gz
Collecting boto>=2.32 (from smart-open)
  Using cached boto-2.46.1-py2.py3-none-any.whl
Collecting bz2file (from smart-open)
  Using cached bz2file-0.98.tar.gz
Collecting requests (from smart-open)
  Using cached requests-2.13.0-py2.py3-none-any.whl
Installing collected packages: requests, bz2file, boto, smart-open, pip

  Running setup.py install for bz2file

  Running setup.py install for smart-open
  Found existing installation: pip 6.0.8
    Uninstalling pip-6.0.8:
      Successfully uninstalled pip-6.0.8

Successfully installed boto-2.46.1 bz2file-0.98 pip-9.0.1 requests-2.13.0 smart-open-1.5.0
(venv)➜  /tmp 
(venv)➜  /tmp echo 'test text' | gzip > test_text.gz
(venv)➜  /tmp zcat test_text.gz
test text
(venv)➜  /tmp python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import smart_open
>>> fname = './test_text.gz'
>>> fd = smart_open.smart_open(fname)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 138, in smart_open
    return file_smart_open(parsed_uri.uri_path, mode)
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 642, in file_smart_open
    return compression_wrapper(open(fname, mode), fname, mode)
  File "/tmp/venv/local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 630, in compression_wrapper
    return make_closing(GzipFile)(file_obj, mode)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, file found
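
The traceback above ends inside gzip.py, where GzipFile.__init__ calls open(filename, ...) on a value that is already a file object. GzipFile takes filename as its first positional parameter and accepts an already-open file only through the fileobj keyword, so a call shaped like GzipFile(file_obj, mode) fails this way. Below is a minimal standard-library sketch of the same failure next to the keyword form. The helper names and the temporary path are made up for illustration, and this is not necessarily how smart_open itself was changed. It assumes a current Python 3, where the TypeError message is worded differently from the 2.7 one shown.

```python
import gzip
import os
import tempfile


def open_gzip_positional(path):
    """Mimic the failing call shape: the file object lands in the `filename` slot."""
    with open(path, "rb") as raw:
        # GzipFile(filename=None, mode=None, ...): `raw` is taken as the filename,
        # so gzip tries to open() a file object and raises TypeError.
        return gzip.GzipFile(raw, "rb")


def read_gzip_fileobj(path):
    """Hand the already-open file to GzipFile through the `fileobj` keyword."""
    with open(path, "rb") as raw:
        with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            return gz.read()


def demo():
    """Return (did the positional call fail?, bytes read via the keyword form)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name, mirroring the one in the reproduction.
        path = os.path.join(tmp_dir, "test_text.gz")
        with gzip.open(path, "wb") as out:
            out.write(b"test text\n")
        try:
            open_gzip_positional(path)
        except TypeError:
            positional_failed = True
        else:
            positional_failed = False
        return positional_failed, read_gzip_fileobj(path)


if __name__ == "__main__":
    print(demo())
```

Until a fixed release is installed, opening the file directly with gzip.open(fname) avoids the wrapper altogether.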

tmylk commented on June 19, 2024

Reproduced. I am working on a fix and a test for reading and writing compressed files. In particular, this bug broke the gensim Travis tests.

For completeness, could you paste a code snippet that breaks for you in this version?

CC @robottwo

simonseed commented on June 19, 2024

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)

tmylk commented on June 19, 2024

Thanks for reporting. Fixed in #110 and released in 1.5.1 on PyPI.
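
To tell whether an environment still has the affected release, the installed version string can be compared against 1.5.1. The following is a rough sketch only: the function names are hypothetical, and it assumes plain dotted numeric versions such as "1.5.0" (no pre-release suffixes).

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_gzip_fix(installed_version):
    """True if the version is at least 1.5.1, the release named in this thread."""
    return parse_version(installed_version) >= (1, 5, 1)


if __name__ == "__main__":
    # "1.5.0" is the version installed in the reproduction above.
    print(has_gzip_fix("1.5.0"))
    print(has_gzip_fix("1.5.1"))
```

If the check fails, the same pip install -U smart_open command used in the reproduction should pick up the newer release.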
