Comments (15)
Thanks for investigating! =)
Wrt that commit, for any fixes, we'll also have to check up on this behaviour: #1978
Good call, thanks! Ok repro matrix time 😅 . There are a few candidates for that line:
if tarfile.is_tarfile(p):
if tarfile.is_tarfile(p.fp):
if tarfile.is_tarfile(p.open_bytes()):
And a few cases to check:
- Do we drop rows reading from stdin, in a test case like this?
- If we rename a tar file to an unknown extension, does it open as a tar?
2a. What if it's in a zip file?
2b. Or if we open it via stdin? - if we open a non-tar file with an unknown extension, does it open as txt?
3a. What if it's in a zip file?
3b. Or if we open it via stdin? (e.g.vd -N <(cat <file>)
orcat <file> | vd -N
)
And from some testing the results look like...
if tarfile.is_tarfile(p):
- ❌ Drops rows when reading from stdin, disqualified 😅
if tarfile.is_tarfile(p.fp):
- ✔️
- ❌
guess_tar: nothing to open
2a. ✔️
2b. ❌guess_tar: illegal seek
- ✔️
3a. ❌ Incorrectly guesses as tar and then throwsReadError
s (the core of this GH issue)
3b. ✔️
if tarfile.is_tarfile(p.open_bytes()):
- ✔️
- ✔️
2a. ❌guess_tar: a RepeatFile holds text and cannot be reopened in binary mode
, opens astxt
2b. ❌guess_tar: a RepeatFile holds text and cannot be reopened in binary mode
, opens astxt
- ✔️
3a. ✔️
3b. ✔️
So I think none of the options I listed are perfect, but p.open_bytes()
seems to fail the least 😅 . Its main drawback is that it can't properly identify a tar
file with a non-tar extension, being opened from stdin or inside an archive. Maybe that's an issue that's worth digging into separately, or it's a rare enough combination of edge cases that we consciously punt on it? 🤔
from visidata.
Update on this patch: it's been a few months. I got hung up with a bug that I didn't understand. I feel the patch is too experimental to be part of visidata 3.1. I'll submit it after visidata 3.1 is released so it gets more time to be tested.
from visidata.
So I think none of the options I listed are perfect, but p.open_bytes() seems to fail the least 😅 . Its main drawback is that it can't properly identify a tar file with a non-tar extension, being opened from stdin or inside an archive. Maybe that's an issue that's worth digging into separately, or it's a rare enough combination of edge cases that we consciously punt on it? 🤔
I think it should be rare enough that we can punt on it for now. Ultimately we should head towards filetype being composable, e.g. zip+tar+txt
so that you can specify the filetype for each layer of container independently. Until then, sorry about your log file inside a tarball inside a zip file.
from visidata.
I've been sitting on a mostly-finished patch that addresses the issue of is_tarfile()
needing a binary file. It also would allow Visidata to guess binary filetypes, like guess_zip()
, when reading data piped to stdin.
Currently, visidata saves stdin data into a RepeatFile
, which reads the data as text, and saves it to be read multiple times for "files" opened in text mode. The change is to make a RepeatBinaryFile
that saves the incoming data as binary bytes instead. Then it can be reread directly as bytes for binary "files", or encoded and served as text for text "files".
If there's interest, I can put up what I've got as a starting point. But it needs to be finished, and then tested thoroughly. I can't do that in the near future.
from visidata.
I would love to see that, @midichef. I don't have time myself right now either, but maybe it will inspire another enterprising VisiDatan to kick the tires on it.
from visidata.
I'm curious too. I would definitely need some time to spin up on the changes and ripple effects. Happy to kick the tires and work through issues though. I'm onboard with the motivation for sure, so it's very cool that you've already got something started! ❤️
from visidata.
An update to my comment a couple of months ago:
I've been sitting on a mostly-finished patch
I've been prioritizing fixing Visidata bugs, but now I can resume work on this feature. I will likely have a workable version ready within a week.
from visidata.
Hi @krackout!
Thanks for filing this issue. Please provide some sample data, so that we can reproduce this. We might not be able to fix it otherwise.
from visidata.
Inside somelogs.zip file, there's a freshclam.log which produces the error.
somelogs.zip
I also attach a screenshot of the error.
from visidata.
Seeing the same thing here, repro:
echo "hiya" > test.log
zip test.zip test.log
vd -N test.zip
Then hitting Enter
gives me the same thing @krackout showed. Full traceback:
Traceback (most recent call last):
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/threads.py", line 220, in _toplevelTryFunc
t.status = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/sheets.py", line 260, in reload
self.loader()
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/sheets.py", line 285, in loader
for r in self.iterload():
File "/home/aj/.local/pipx/venvs/visidata/lib/python3.12/site-packages/visidata/loaders/archive.py", line 140, in iterload
with tarfile.open(name=str(self.source)) as tf:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/tarfile.py", line 1808, in open
raise ReadError(f"file could not be opened successfully:\n{error_msgs_summary}")
tarfile.ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: ReadError('not an lzma file')
- method tar: ReadError('truncated header')
Bisect tells me that this started in eb9d9c4.
If I use p.open_bytes()
instead of p.fp
here locally it seems to do good things, in that:
- A
.log
inside a.zip
will still open as txt - Tar files still open properly, and renaming
test.tar
totest.taradjacent
gets properly guessed/opened astar
Guessing doesn't work for a tar file named test.taradjacent
inside a zip file... but I can't tell if we expect that level of nested guessing to work 🤔
from visidata.
Thanks for investigating! =)
Wrt that commit, for any fixes, we'll also have to check up on this behaviour: #1978
from visidata.
It must be an edge case indeed; I supposed it would be easy to fix, since it worked on v2.11.
from visidata.
It must be an edge case indeed; I supposed it would be easy to fix, since it worked on v2.11.
v2.11 did not have the filetype guesser logic! That's why the behaviour changed. =)
from visidata.
Until then, sorry about your log file inside a tarball inside a zip file.
@saulpw Apparently you misunderstood. It's a text file with .log extension, inside a zip file. The text/log file is guessed erroneously as tar, no tar involved.
from visidata.
@krackout I was being a little cheeky, the 'you' here was referring to the hypothetical user for a tar file within an archive:
Its main drawback is that it can't properly identify a tar file with a non-tar extension, being opened from stdin or inside an archive.
from visidata.
Related Issues (20)
- [save] nulls derived from other columns save as "str(None)" "float(None)" etc HOT 1
- Mapping Ctrl-Semicolon and similar? HOT 5
- Double Quotation Character not working on Windows HOT 1
- About config display option and regex filters HOT 6
- [] Feature request: Horizontal guide lines in plots HOT 10
- Menu rendering glitches on macOS HOT 4
- Add friendly view of PyTables structured HDF5 files HOT 1
- How would you unhide a single column in visidata? HOT 2
- Feature request: Text-editing conveniences when editing a cell HOT 5
- data doesn't fill available space in multiline mode HOT 1
- Loading AWS S3 URL: External package "s3fs.core" not installed HOT 2
- Include memory sheet variables in autocomplete? HOT 1
- Viewing PDFs: External package "pdfminer.high_level" not installed; run: pip install pdfminer.six HOT 5
- Column using slow function as expr causes extreme slowness HOT 5
- Deprecation and vulnerability in upstream library (olm) HOT 2
- saving to XML: LookupError: unknown encoding: 'UTF-8-SIG' HOT 5
- Mailbox support not working? HOT 5
- Precision adjustments on date columns
- Make `curcol` available to `setcol-expr` HOT 2
- How do I transform column values in place with g* on v3.x? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from visidata.