Comments (8)

pombredanne commented on June 9, 2024

Would this be with https://github.com/moby/moby/archive/refs/tags/v20.10.5.tar.gz ?

from extractcode.

pombredanne commented on June 9, 2024

And I had not seen your attachment! So I am good with the code link you provided.

So I think this is a sparse tarball issue: the file is 60GB once extracted, but only 5KB as an unextracted tarball that is not even compressed.

Can you test scanning that one single 5KB file?
https://github.com/moby/moby/blob/363e9a88a11be517d9e8c65c998ff56f774eb4dc/vendor/archive/tar/testdata/gnu-sparse-big.tar

And can you detail what pre-processing you apply to this? Did you call extractcode first?
Is there a place where I can see your code? I am always interested in how ScanCode is integrated!

pombredanne commented on June 9, 2024

Note that this could be related to nexB/scancode-toolkit#2431, where @Angi2412 and @avishmehta68710 both mentioned having a similar issue. At the least, this comment on nexB/scancode-toolkit#2431 references the same code:

I could observe the same behaviour and error message with this file: moby v20.10.5.

goekDil commented on June 9, 2024

Yes, I apply extractcode first! Unfortunately, there is no place where you can see my code, but I can provide a minimal example, minimalexample.py:

  from extractcode import extract
  from scancode import cli
  
  def main():
      # Extract the archive (and any nested archives) before scanning.
      extract_log = extract.extract(location='gnu-sparse-big.tar', recurse=True)
      # Print each extraction event as it is reported.
      for e in extract_log:
          print(e)
  
      # Scan the extraction directory for licenses and copyrights.
      rc, results = cli.run_scan(
          'gnu-sparse-big.tar-extract', license=True, copyright=True,
          return_results=True, processes=14, verbose=True, quiet=False,
          timeout=46800)
  
      print(rc, results)
  
  if __name__ == "__main__":
      main()

As you proposed, I tested the single 5KB file and the exact same error occurs:

[Screenshot: docker logs (Linux and Windows with docker)]

Depending on the machine I work on:

  • I either experience the same issue as in nexB/scancode-toolkit#2431 (Linux with docker, or running minimalexample.py as a standalone Python script)
  • or I get "OSError: [Errno 28] No space left on device", as I mentioned above (Windows with docker, as well as Linux with docker)

pombredanne commented on June 9, 2024

So this is a sparse file issue. The short-term workaround may be to ignore the gnu-sparse-big.tar file entirely.
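
A rough sketch of how such a file could be skipped before extraction, using only the standard library `tarfile` module rather than any extractcode API. The helper names `declared_size` and `looks_oversized` and both thresholds are hypothetical, chosen for illustration. The idea is to read the tar headers only and compare the sizes they declare against the archive's size on disk:

```python
import os
import tarfile


def declared_size(archive_path):
    """Sum the member sizes declared in the tar headers, without extracting."""
    with tarfile.open(archive_path) as tar:
        return sum(member.size for member in tar if member.isreg())


def looks_oversized(archive_path, max_ratio=1000, min_bytes=1 << 30):
    """
    Heuristic (assumed thresholds): flag an archive whose declared content is
    at least min_bytes and more than max_ratio times its size on disk.
    """
    on_disk = max(os.path.getsize(archive_path), 1)
    declared = declared_size(archive_path)
    return declared >= min_bytes and declared > max_ratio * on_disk
```

A caller could then skip the archive when `looks_oversized('gnu-sparse-big.tar')` is true instead of passing it to extraction. This check is only a size heuristic and does not identify sparse members specifically, so the thresholds would need tuning for real codebases.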

goekDil commented on June 9, 2024

Thank you very much! After further analysis, I also came to the conclusion that this is a sparse file issue.

pombredanne commented on June 9, 2024

I may still want to keep this open for now and transfer the issue to extractcode, as we may want special processing for sparse files.

pombredanne commented on June 9, 2024

Done... now in extractcode!
