Giter VIP home page Giter VIP logo

Comments (15)

ajkerrigan avatar ajkerrigan commented on September 27, 2024 2

Thanks for investigating! =)

Wrt that commit, for any fixes, we'll also have to check up on this behaviour: #1978

Good call, thanks! Ok repro matrix time 😅 . There are a few candidates for that line:

   if tarfile.is_tarfile(p):
   if tarfile.is_tarfile(p.fp):
   if tarfile.is_tarfile(p.open_bytes()):

And a few cases to check:

  1. Do we drop rows reading from stdin, in a test case like this?
  2. If we rename a tar file to an unknown extension, does it open as a tar?
    2a. What if it's in a zip file?
    2b. Or if we open it via stdin?
  3. if we open a non-tar file with an unknown extension, does it open as txt?
    3a. What if it's in a zip file?
    3b. Or if we open it via stdin? (e.g. vd -N <(cat <file>) or cat <file> | vd -N)

And from some testing the results look like...


   if tarfile.is_tarfile(p):
  1. ❌ Drops rows when reading from stdin, disqualified 😅

   if tarfile.is_tarfile(p.fp):
  1. ✔️
  2. guess_tar: nothing to open
    2a. ✔️
    2b. ❌ guess_tar: illegal seek
  3. ✔️
    3a. ❌ Incorrectly guesses as tar and then throws ReadErrors (the core of this GH issue)
    3b. ✔️

   if tarfile.is_tarfile(p.open_bytes()):
  1. ✔️
  2. ✔️
    2a. ❌ guess_tar: a RepeatFile holds text and cannot be reopened in binary mode, opens as txt
    2b. ❌ guess_tar: a RepeatFile holds text and cannot be reopened in binary mode, opens as txt
  3. ✔️
    3a. ✔️
    3b. ✔️

So I think none of the options I listed are perfect, but p.open_bytes() seems to fail the least 😅 . Its main drawback is that it can't properly identify a tar file with a non-tar extension, being opened from stdin or inside an archive. Maybe that's an issue that's worth digging into separately, or it's a rare enough combination of edge cases that we consciously punt on it? 🤔

from visidata.

midichef avatar midichef commented on September 27, 2024 2

Update on this patch: it's been a few months. I got hung up with a bug that I didn't understand. I feel the patch is too experimental to be part of visidata 3.1. I'll submit it after visidata 3.1 is released so it gets more time to be tested.

from visidata.

saulpw avatar saulpw commented on September 27, 2024 1

So I think none of the options I listed are perfect, but p.open_bytes() seems to fail the least 😅 . Its main drawback is that it can't properly identify a tar file with a non-tar extension, being opened from stdin or inside an archive. Maybe that's an issue that's worth digging into separately, or it's a rare enough combination of edge cases that we consciously punt on it? 🤔

I think it should be rare enough that we can punt on it for now. Ultimately we should head towards filetype being composable, e.g. zip+tar+txt so that you can specify the filetype for each layer of container independently. Until then, sorry about your log file inside a tarball inside a zip file.

from visidata.

midichef avatar midichef commented on September 27, 2024 1

I've been sitting on a mostly-finished patch that addresses the issue of is_tarfile() needing a binary file. It also would allow Visidata to guess binary filetypes, like guess_zip(), when reading data piped to stdin.

Currently, visidata saves stdin data into a RepeatFile, which reads the data as text, and saves it to be read multiple times for "files" opened in text mode. The change is to make a RepeatBinaryFile that saves the incoming data as binary bytes instead. Then it can be reread directly as bytes for binary "files", or encoded and served as text for text "files".

If there's interest, I can put up what I've got as a starting point. But it needs to be finished, and then tested thoroughly. I can't do that in the near future.

from visidata.

saulpw avatar saulpw commented on September 27, 2024 1

I would love to see that, @midichef. I don't have time myself right now either, but maybe it will inspire another enterprising VisiDatan to kick the tires on it.

from visidata.

ajkerrigan avatar ajkerrigan commented on September 27, 2024 1

I'm curious too. I would definitely need some time to spin up on the changes and ripple effects. Happy to kick the tires and work through issues though. I'm onboard with the motivation for sure, so it's very cool that you've already got something started! ❤️

from visidata.

midichef avatar midichef commented on September 27, 2024 1

An update to my comment a couple of months ago:

I've been sitting on a mostly-finished patch

I've been prioritizing fixing Visidata bugs, but now I can resume work on this feature. I will likely have a workable version ready within a week.

from visidata.

anjakefala avatar anjakefala commented on September 27, 2024

Hi @krackout!

Thanks for filing this issue. Please provide some sample data, so that we can reproduce this. We might not be able to fix it otherwise.

from visidata.

krackout avatar krackout commented on September 27, 2024

Inside somelogs.zip file, there's a freshclam.log which produces the error.
somelogs.zip

I also attach a screenshot of the error.
visidata-logfile-error

from visidata.

ajkerrigan avatar ajkerrigan commented on September 27, 2024

Seeing the same thing here, repro:

echo "hiya" > test.log
zip test.zip test.log
vd -N test.zip

Then hitting Enter gives me the same thing @krackout showed. Full traceback:

Traceback (most recent call last):
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/threads.py", line 220, in _toplevelTryFunc
t.status = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/sheets.py", line 260, in reload
self.loader()
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/sheets.py", line 285, in loader
for r in self.iterload():
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/loaders/archive.py", line 140, in iterload
with tarfile.open(name=str(self.source)) as tf:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/tarfile.py", line 1808, in open
raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('truncated header')

Bisect tells me that this started in eb9d9c4.

If I use p.open_bytes() instead of p.fp here locally it seems to do good things, in that:

  • A .log inside a .zip will still open as txt
  • Tar files still open properly, and renaming test.tar to test.taradjacent gets properly guessed/opened as tar

Guessing doesn't work for a tar file named test.taradjacent inside a zip file... but I can't tell if we expect that level of nested guessing to work 🤔

from visidata.

anjakefala avatar anjakefala commented on September 27, 2024

Thanks for investigating! =)

Wrt that commit, for any fixes, we'll also have to check up on this behaviour: #1978

from visidata.

krackout avatar krackout commented on September 27, 2024

It must be an edge case indeed; I supposed it would be easy to fix, since it worked on v2.11.

from visidata.

anjakefala avatar anjakefala commented on September 27, 2024

It must be an edge case indeed; I supposed it would be easy to fix, since it worked on v2.11.

v2.11 did not have the filetype guesser logic! That's why the behaviour changed. =)

from visidata.

krackout avatar krackout commented on September 27, 2024

Until then, sorry about your log file inside a tarball inside a zip file.

@saulpw Apparently you misunderstood. It's a text file with .log extension, inside a zip file. The text/log file is guessed erroneously as tar, no tar involved.

from visidata.

saulpw avatar saulpw commented on September 27, 2024

@krackout I was being a little cheeky, the 'you' here was referring to the hypothetical user for a tar file within an archive:

Its main drawback is that it can't properly identify a tar file with a non-tar extension, being opened from stdin or inside an archive.

from visidata.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.