fs.archive's People

Contributors

althonos, mattalxndr

fs.archive's Issues

TarReadFS errors when parsing a file created with dotslash paths

If I create an archive like this:

touch file1
mkdir sub
touch sub/file2
tar cf test.tar ./file1 ./sub

then these don't work:

tarfs.exists('file1')
"""
Failure
Traceback (most recent call last):
  File "/[snip]/fs.archive/tests/test_tarfs.py", line 202, in test_exists
    self.assertTrue(self.tarfs.exists('file1'))
AssertionError: False is not true
"""

self.tarfs.walk.info()
"""
Error
Traceback (most recent call last):
  File "/[snip]/fs.archive/.venv/lib/python3.10/site-packages/fs/archive/tarfs/__init__.py", line 131, in getinfo
    tar_info = self._members[_path]
KeyError: 'file1'
"""


self.tarfs.listdir('/')
"""
Error
Traceback (most recent call last):
  File "/[snip]/fs.archive/.venv/lib/python3.10/site-packages/fs/archive/tarfs/__init__.py", line 131, in getinfo
    tar_info = self._members[_path]
KeyError: 'sub'
"""

I know this bug report is incomplete, but I have a pull request to fix it, which I'm pushing now.
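The KeyError tracebacks above are consistent with member names being stored with a leading ./ while lookups use the normalized name. Here is a minimal sketch of that mismatch using only the standard library tarfile module, not fs.archive itself. The helper names build_dotslash_tar and normalized_members are hypothetical, and this is not necessarily how the pull request fixes it:

```python
import io
import posixpath
import tarfile


def build_dotslash_tar() -> bytes:
    """Build an in-memory tar whose members are named like `tar cf x.tar ./file1 ./sub`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in (("./file1", b""), ("./sub/file2", b"x")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def normalized_members(raw: bytes) -> dict:
    """Index members by normalized path so 'file1' and './file1' resolve alike."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return {posixpath.normpath(m.name): m for m in tar.getmembers()}


members = normalized_members(build_dotslash_tar())
print(sorted(members))
```

Directories that only exist implicitly (such as sub in this sketch) would still need to be derived from the member paths.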

Error when closing any 7z archive

This happens with almost every 7z archive that contains anything.

With archive.7z being an archive in the current directory:

myfs = fs.archive.open_archive(".","archive.7z")
myfs.close()

results in

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/base.py", line 278, in close
    self._saver.save(self)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/base.py", line 58, in save
    self.to_stream(fs)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/base.py", line 86, in to_stream
    self._to(temp, fs)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/sevenzipfs/__init__.py", line 293, in _to
    _7z.worker.archive(_7z.fp, _7z.files, folder, deref=_7z.dereference)
  File "/home/krateng/.local/lib/python3.10/site-packages/py7zr/py7zr.py", line 1494, in archive
    foutsize, crc = self.write(fp, f, (f.is_symlink and not deref), folder)
  File "/home/krateng/.local/lib/python3.10/site-packages/py7zr/py7zr.py", line 1460, in write
    with f.origin.open(mode="rb") as fd:
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/sevenzipfs/__init__.py", line 37, in open
    return self.fs.openbin(self.path, mode)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/wrapfs.py", line 194, in openbin
    bin_file = _fs.openbin(_path, mode=mode, buffering=-1, **options)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/wrap.py", line 149, in openbin
    return self._rfs.openbin(path, mode, buffering, **options)
  File "/home/krateng/.local/lib/python3.10/site-packages/fs/archive/sevenzipfs/__init__.py", line 194, in openbin
    return iocursor.Cursor(decompressed[relpath(_path)].getbuffer())
KeyError: 'randomfileinthearchive.ext'

It also happens when I provide a pre-initialized Filesystem object as the parent of the archive fs, and when I use a context manager instead of closing manually.

If I create a new archive with only one empty file in it, it works. If that file contains anything, it doesn't work anymore.

The same error happens when trying to read a file.

`AttributeError` for `_close_handle` on `close()` of `ISOReadFS` with non-seekable file handle

If I create an instance of fs.archive.isofs.ISOReadFS by passing it a file handle f where f.seekable() is False, then when I subsequently call ISOReadFS.close() I get:

[...]
    [...].close()
  File "[...]/site-packages/fs/archive/base.py", line 193, in close
    if self._close_handle:
AttributeError: 'ISOReadFS' object has no attribute '_close_handle'

I note that ArchiveReadFS.__init__() includes:

        elif hasattr(handle, 'read'):
            # Create the readable fs if the handle is readable
            if handle.readable() and handle.seekable():
                self._close_handle = options.get('close_handle', True)
                self._handle = handle

        else:
            raise errors.CreateFailed("cannot use {}".format(handle))

so I guess that passing a non-seekable handle results in _close_handle and _handle not being set. Perhaps the intent was that an exception should be thrown if the file is non-readable or non-seekable rather than just leaving some attributes unset? I was still able to read a file from the ISOReadFS though, so it seems like perhaps it doesn't really need a seekable handle.

For now I'm giving it a seekable handle anyway (by using a io.BytesIO wrapper) as a workaround.
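A minimal sketch of that workaround, assuming the whole image fits in memory. The names NonSeekable and ensure_seekable are hypothetical and only stand in for whatever non-seekable handle is in use; the result would then be passed to ISOReadFS in place of the original handle:

```python
import io


class NonSeekable(io.BytesIO):
    """Stand-in for a pipe or socket-like handle that cannot seek."""

    def seekable(self) -> bool:
        return False


def ensure_seekable(handle):
    """Return the handle unchanged if it can seek, else buffer it in memory."""
    if handle.seekable():
        return handle
    # Reads the entire stream; only sensible for images that fit in RAM.
    return io.BytesIO(handle.read())


source = NonSeekable(b"fake iso bytes")
wrapped = ensure_seekable(source)
print(wrapped.seekable())
```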

Seen with these package versions:

fs==2.4.15
fs.archive==0.7.1
pycdlib==1.12.0

ModuleNotFoundError: No module named 'fs.archive.sevenzipfs.SevenZipFS'

Something seems to be broken in the loading of the 7z extra. I tested another extra (iso) and it seems to be ok.

Here is my test:

~ % python -m venv venv
~ % source venv/bin/activate
(venv) ~ % pip install "fs.archive[all]"
Collecting fs.archive[all]
Using cached fs.archive-0.7.0-py2.py3-none-any.whl (31 kB)
Collecting fs~=2.2
Using cached fs-2.4.14-py2.py3-none-any.whl (133 kB)
Collecting six~=1.10
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools>=38.3.0 in ./venv/lib/python3.10/site-packages (from fs.archive[all]) (58.1.0)
Collecting pycdlib~=1.8
Using cached pycdlib-1.12.0-py2.py3-none-any.whl (208 kB)
Collecting iocursor~=0.1
Using cached iocursor-0.1.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (47 kB)
Collecting py7zr~=0.17
Using cached py7zr-0.17.2-py3-none-any.whl (68 kB)
Collecting appdirs~=1.4.3
Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting pytz
Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Collecting multivolumefile>=0.2.3
Using cached multivolumefile-0.2.3-py3-none-any.whl (17 kB)
Collecting pyzstd>=0.14.4
Using cached pyzstd-0.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)
Collecting pybcj>=0.5.0
Using cached pybcj-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (47 kB)
Collecting pyppmd>=0.17.0
Using cached pyppmd-0.17.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (127 kB)
Collecting brotli>=1.0.9
Using cached Brotli-1.0.9-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.7 MB)
Collecting texttable
Using cached texttable-1.6.4-py2.py3-none-any.whl (10 kB)
Collecting pycryptodomex>=3.6.6
Using cached pycryptodomex-3.12.0-cp35-abi3-manylinux2010_x86_64.whl (2.0 MB)
Installing collected packages: six, pytz, appdirs, texttable, pyzstd, pyppmd, pycryptodomex, pybcj, multivolumefile, fs, brotli, pycdlib, py7zr, iocursor, fs.archive
Successfully installed appdirs-1.4.4 brotli-1.0.9 fs-2.4.14 fs.archive-0.7.0 iocursor-0.1.2 multivolumefile-0.2.3 py7zr-0.17.2 pybcj-0.5.0 pycdlib-1.12.0 pycryptodomex-3.12.0 pyppmd-0.17.3 pytz-2021.3 pyzstd-0.15.0 six-1.16.0 texttable-1.6.4
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/home/matt/venv/bin/python -m pip install --upgrade pip' command.
(venv) ~ % cat > test.py <<EOF
heredoc> from fs import open_fs
heredoc> from fs.archive import open_archive
heredoc> my_fs = open_fs(u'temp://')
heredoc> with open_archive(my_fs, u'test.zip') as archive:
heredoc>        print(type(archive))
heredoc> with open_archive(my_fs, u'test.iso') as archive:
heredoc>        print(type(archive))
heredoc> with open_archive(my_fs, u'test.7z') as archive:
heredoc>        print(type(archive))
heredoc> EOF
(venv) ~ % python test.py
<class 'fs.archive.zipfs.ZipFS'>
<class 'fs.archive.isofs.ISOFS'>
Traceback (most recent call last):
File "/home/matt/test.py", line 8, in <module>
with open_archive(my_fs, u'test.7z') as archive:
File "/home/matt/venv/lib/python3.10/site-packages/fs/archive/opener.py", line 56, in open_archive
archive_opener = entry_point.load()
File "/home/matt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py", line 2450, in load
return self.resolve()
File "/home/matt/venv/lib/python3.10/site-packages/pkg_resources/__init__.py", line 2456, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'fs.archive.sevenzipfs.SevenZipFS'

And my environment:

(venv) ~ % python --version
Python 3.10.1
(venv) ~ % pip --version
pip 21.2.4 from /home/matt/venv/lib/python3.10/site-packages/pip (python 3.10)
(venv) ~ % pip list | grep fs\.
fs              2.4.14
fs.archive      0.7.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/home/matt/venv/bin/python -m pip install --upgrade pip' command.
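One plausible reading of the traceback (an assumption, not confirmed against the package metadata) is that the 7z entry point is declared as a dotted path, so the class name is treated as part of the module name. The sketch below reproduces that kind of failure with the standard library importlib.metadata rather than pkg_resources, using json.JSONDecoder as a stand-in target; try_load is a hypothetical helper:

```python
from importlib.metadata import EntryPoint


def try_load(value):
    """Load an entry point value, returning the exception instead of raising."""
    ep = EntryPoint(name="demo", value=value, group="demo.group")
    try:
        return ep.load()
    except ModuleNotFoundError as exc:
        return exc


# "module:attr" imports the module, then looks up the attribute.
print(try_load("json:JSONDecoder"))
# "module.attr" is treated as a module path, so the import itself fails.
print(type(try_load("json.JSONDecoder")).__name__)
```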

ZipReadFS claims to be case-insensitive but probably shouldn't

The fix for issue #6 enables something like my test case in that issue to work - specifying paths within a .zip file works regardless of the case of the path within the .zip file, at least when I provide the path using the same case as appears in the .zip file:

>>> zip.listdir("/testdir")
['lower', 'Mixed', 'UPPER']
>>> zip.open('/testdir/lower').read()
'lower\n'
>>> zip.open('/testdir/Mixed').read()
'mixed\n'
>>> zip.open('/testdir/UPPER').read()
'upper\n'

However, it doesn't work if I specify a path in the wrong case:

>>> zip.open('/testdir/upper').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/site-packages/fs/base.py", line 1224, in open
    bin_file = self.openbin(path, mode=bin_mode, buffering=buffering)
  File "[...]/site-packages/fs/archive/zipfs/__init__.py", line 217, in openbin
    raise errors.ResourceNotFound(path)
fs.errors.ResourceNotFound: resource '/testdir/upper' not found

despite this claim that the filesystem is not case-sensitive:

>>> zip.getmeta()["case_insensitive"]
True

The Info-ZIP zip manual page suggests that .zip files are in fact case-sensitive:

it is possible the archive came from a system where case does matter and the archive could include both Bar and bar as separate files in the archive.

The Python 3.6.13 zipfile module's documentation doesn't say anything about case, but at least in Python 3.6.8 the module seems to be similarly case-sensitive.

I think then that the bug is just that zip.getmeta()["case_insensitive"] returns True when it should return False.

Incidentally fs.zipfs.ReadZipFS which comes with PyFilesystem has the same bug. I'll try to get around to filing a bug against that package too.
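To illustrate the case-sensitivity of the underlying format, here is a small sketch with the standard library zipfile module (not fs.archive). The helper names build_zip and read_members are hypothetical:

```python
import io
import zipfile


def build_zip(names):
    """Write each name as a member whose content is the name itself."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, name)
    buf.seek(0)
    return buf


def read_members(archive):
    """Return a mapping of member name to content."""
    with zipfile.ZipFile(archive) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


members = read_members(build_zip(["testdir/Bar", "testdir/bar"]))
# Both spellings are kept as separate members; other casings are absent.
print(sorted(members))
print("testdir/BAR" in members)
```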

pycdlib 1.2.0 no longer exists

Hi,

When I try to install fs.archive[all] I get the following error:

 Could not find a version that satisfies the requirement pycdlib~=1.2.0; extra == "all" (from fs.archive[all]==0.2.0) (from versions: 1.0.0, 1.3.0, 1.3.1, 1.3.2)
No matching distribution found for pycdlib~=1.2.0; extra == "all" (from fs.archive[all]==0.2.0)
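For context, the ~= operator is the "compatible release" specifier: ~=1.2.0 accepts only 1.2.x releases at or above 1.2.0, so none of the listed versions qualify. The sketch below is a simplified check for plain numeric versions only (no pre-releases or other PEP 440 features); parse and is_compatible are hypothetical helpers, not pip's implementation:

```python
def parse(version):
    """Split a plain numeric version such as '1.3.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate, spec):
    """Simplified `~=spec` check; spec must have at least two components."""
    want, have = parse(spec), parse(candidate)
    width = max(len(want), len(have))
    want_padded = want + (0,) * (width - len(want))
    have_padded = have + (0,) * (width - len(have))
    prefix = want[:-1]  # every component except the last must match exactly
    return have_padded[: len(prefix)] == prefix and have_padded >= want_padded


available = ["1.0.0", "1.3.0", "1.3.1", "1.3.2"]
print([v for v in available if is_compatible(v, "1.2.0")])
print([v for v in available if is_compatible(v, "1.2")])
```

A looser pin such as ~=1.2 would accept the 1.3.x releases shown in the error message.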

Walking an Archive

Hi @althonos,

I'm looking at replacing my old 7zip + re solution for getting the contents of Zip and ISO files. All I need to grab is the filepath, size, and timestamp.

I thought I'd be able to 'walk' the opened archives, but alas, it doesn't seem to work:

For Zips, I get no results with walk.files, and only the top-level dirs with walk.dirs.

For ISOs, I just get an error: pycdlib.pycdlibexception.PyCdlibInvalidISO: Tag version not 2 or 3

Any suggestions?

Geoff
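As a point of comparison for the Zip case, the three fields mentioned above can be read with the standard library zipfile module. This is a swapped-in technique, not the fs.archive walk API, and list_zip_files is a hypothetical helper:

```python
import io
import zipfile
from datetime import datetime


def list_zip_files(archive):
    """Return (path, size_in_bytes, modified) for every file in the archive."""
    rows = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            rows.append((info.filename, info.file_size, datetime(*info.date_time)))
    return rows


# Build a tiny archive in memory to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("docs/readme.txt", (2020, 1, 2, 3, 4, 6)), b"hello")
buf.seek(0)
for row in list_zip_files(buf):
    print(row)
```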

`isofs` seems to report `case_insensitive` incorrectly for Rock Ridge

From inspection (I haven't done testing) of fs.archive 0.7.2, it looks to me like fs.archive.isofs reports that it is case_insensitive in the following way:

if self._rock_ridge or self._joliet: True
else if self._cd.interchange_level < 4: True
else: False

I searched online and according to quite a few pages I found it seems like Rock Ridge is actually case-sensitive. I don't know if pycdlib has any restrictions of its own in this regard though.

After looking into this, I figured out that I don't actually care how it reports case_insensitive for my project, so I'm not actually affected by this issue.
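For reference, the decision logic as described above can be restated as a small Python function. This is a paraphrase of the description in this issue, not the library's source, and reports_case_insensitive is a hypothetical name:

```python
def reports_case_insensitive(rock_ridge: bool, joliet: bool, interchange_level: int) -> bool:
    """Mirror the reported behaviour: extensions first, then interchange level."""
    if rock_ridge or joliet:
        return True
    return interchange_level < 4


# A Rock Ridge image is reported as case-insensitive, which is the
# behaviour this issue questions.
print(reports_case_insensitive(rock_ridge=True, joliet=False, interchange_level=4))
```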

`__ziptemp__` not cleaning up

Hi,

I’m creating a zip using open_fs, inside a temp location. The Zip creates its own, separate __ziptemp__ location, and doesn’t create the zip until I ‘close’ the zip file. When I close it, the zip is created, but the __ziptemp__ is not cleaned up.

I noticed also that all the files from the Zip are in the ziptemp. Should I be constantly closing the zip? Or am I doing something wrong?

My use-case is I am copying files from network storage to a local Temp location, renaming them as I do. Sometimes I need the files, sometimes I need them packaged in a zip. So when they need to be zipped, I open_fs a zip:// in the Temp location and copy files there - otherwise I copy the files straight to the Temp://. Once complete, I copy_fs the Temp:// to where it needs to be.

ResourceNotFound in ZipReadFS

Hi,

the following code:

from io import BytesIO
from typing import Any

import fs
import requests
from fs.archive.tarfs import TarReadFS
from fs.archive.zipfs import ZipReadFS
from fs.tarfs import ReadTarFS
from fs.zipfs import ReadZipFS


def print_readme(klass: Any, url: str) -> None:
    data = requests.get(url).content
    fs = klass(BytesIO(data))
    sub_fs = fs.opendir(fs.listdir("/")[0])
    print(sub_fs.listdir("."))
    print(sub_fs.open("README.md").read().strip())


if __name__ == '__main__':
    print(f"VERSION: {fs.archive.__version__}\n")

    data = [
        (TarReadFS, "https://gitlab.com/dAnjou/test/-/archive/main/test-main.tar.gz"),
        (ReadTarFS, "https://gitlab.com/dAnjou/test/-/archive/main/test-main.tar.gz"),
        (ZipReadFS, "https://gitlab.com/dAnjou/test/-/archive/main/test-main.zip"),
        (ReadZipFS, "https://gitlab.com/dAnjou/test/-/archive/main/test-main.zip"),
    ]

    for klass, url in data:
        print(f"### BEGIN {klass}")
        try:
            print_readme(klass, url)
        except Exception as e:
            print(e)
        print("### END\n")

produces this output:

VERSION: 0.7.0

### BEGIN <class 'fs.archive.tarfs.TarReadFS'>
['README.md']
# Test
### END

### BEGIN <class 'fs.tarfs.ReadTarFS'>
['README.md']
# Test
### END

### BEGIN <class 'fs.archive.zipfs.ZipReadFS'>
['README.md']
resource 'README.md' not found
### END

### BEGIN <class 'fs.zipfs.ReadZipFS'>
['README.md']
# Test
### END

Could you have a look, please? 🙂
