Comments (8)
Would this be with https://github.com/moby/moby/archive/refs/tags/v20.10.5.tar.gz ?
from extractcode.
And I had not seen your attachment! So I am good with the code link you provided.
So I think this is a sparse tarball issue: the archive is only 5KB as a plain, uncompressed tar, but expands to 60GB when extracted.
Can you test scanning that one single 5KB file?
https://github.com/moby/moby/blob/363e9a88a11be517d9e8c65c998ff56f774eb4dc/vendor/archive/tar/testdata/gnu-sparse-big.tar
And can you detail what pre-processing you apply on this? Did you call extractcode first?
Is there a place where I can see your code? I am always interested in how ScanCode is integrated!
from extractcode.
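To illustrate the size gap described above, here is a minimal sketch using only the Python standard library. It creates a file with a large logical size while writing only a few bytes, then compares the logical size with the space actually allocated. The helper name `sparse_demo` and the default size are illustrative assumptions, not part of extractcode. Whether the file is really stored sparsely depends on the filesystem, and `st_blocks` is not available on Windows.

```python
import os
import tempfile


def sparse_demo(logical_size=16 * 1024 * 1024):
    """Create a mostly-empty file and return (logical_bytes, allocated_bytes).

    allocated_bytes is None where the platform does not expose st_blocks.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"header")        # a few real bytes at the start
            f.truncate(logical_size)  # extend the file without writing data
        st = os.stat(path)
        blocks = getattr(st, "st_blocks", None)
        # POSIX reports st_blocks in 512-byte units.
        allocated = blocks * 512 if blocks is not None else None
        return st.st_size, allocated
    finally:
        os.remove(path)
```

On a filesystem that supports holes, the allocated size should be far smaller than the logical size. A tool that copies the file out byte by byte needs the full logical size on disk.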
Note that this could be related to nexB/scancode-toolkit#2431, where @Angi2412 and @avishmehta68710 both mentioned having a similar issue. At the least, this comment, nexB/scancode-toolkit#2431 (comment), references the same code:
I could observe the same behaviour and error message with this file: moby v20.10.5.
from extractcode.
Yes, I apply extractcode first! Unfortunately, there is no place where you can see my code, but I can provide a minimalexample.py:
from scancode import cli
import multiprocessing
import logging
from extractcode import extract

def main():
    # Extract the archive, recursing into any nested archives.
    extract_log = extract.extract(location='gnu-sparse-big.tar', recurse=True)
    # Iterate over the extraction events and print each one.
    for e in extract_log:
        print(e)
    # Scan the extraction output directory for licenses and copyrights.
    rc, results = cli.run_scan(
        'gnu-sparse-big.tar-extract', license=True, copyright=True,
        return_results=True, processes=14, verbose=True, quiet=False,
        timeout=46800)
    print(rc, results)

if __name__ == "__main__":
    main()
As you proposed, I tested the single 5KB file, and the exact same error occurs:
(docker logs) (Linux and Windows with docker)
Depending on the machine I work on:
- I either experience the same issue as in nexB/scancode-toolkit#2431 (Linux with docker, or running minimalexample.py as a standalone Python script),
- or I get the error "OSError: [Errno 28] No space left on device" as I mentioned above (Windows with docker, as well as Linux with docker).
from extractcode.
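One way to catch the "No space left on device" case before it happens is to compare the sizes declared in the tar headers with the free space at the destination. The sketch below uses only the standard library and does not extract anything. The helper names `declared_size` and `fits_on_disk` and the 10% safety margin are assumptions made for illustration, and this is not how extractcode works internally. It also assumes that `tarfile` reports the expanded size for a GNU sparse entry, which is worth verifying against gnu-sparse-big.tar.

```python
import shutil
import tarfile


def declared_size(tar_path):
    """Return the sum of regular-file sizes declared in the tar headers."""
    with tarfile.open(tar_path, mode="r:*") as archive:
        return sum(m.size for m in archive.getmembers() if m.isfile())


def fits_on_disk(tar_path, dest_dir, safety_margin=0.1):
    """Return (ok, needed_bytes, free_bytes) for extracting into dest_dir.

    ok is True when the declared size fits in the free space after
    reserving safety_margin (a fraction of the free space) as headroom.
    """
    needed = declared_size(tar_path)
    free = shutil.disk_usage(dest_dir).free
    return needed <= free * (1 - safety_margin), needed, free
```

A caller could skip or report an archive when `ok` is false instead of starting an extraction that fills the disk.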
So this is a sparse file issue. The short-term workaround may be to ignore the gnu-sparse-big.tar file entirely.
from extractcode.
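For anyone driving extraction from their own script, the workaround above could be sketched as a filter applied before any file is handed to the extractor. The helper name `files_to_process` and the default pattern are hypothetical, chosen only to match the file discussed here. extractcode's own `--ignore` option may cover this directly and is not used in this sketch.

```python
import fnmatch
import os


def files_to_process(root, ignore_patterns=("gnu-sparse-*.tar",)):
    """Yield paths under root whose file name matches no ignore pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, pat) for pat in ignore_patterns):
                continue  # skipped: the name matches an ignore pattern
            yield os.path.join(dirpath, name)
```

Matching on the file name alone is a deliberate simplification. It will miss a sparse archive stored under a different name, so it is only a stopgap until sparse files get proper handling.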
Thank you very much! After further analysis, I also came to the conclusion that this is a sparse file issue.
from extractcode.
I may still want to keep this open for now and transfer the issue to extractcode, as we may want special processing for sparse files.
from extractcode.
Done... now in extractcode!
from extractcode.