
commoncode's Introduction

CommonCode

Commoncode provides a set of common functions and utilities for handling paths, dates, files and hashes. It started as a library in scancode-toolkit. Visit https://aboutcode.org and https://github.com/nexB/ for support and download.

To install this package use:

pip install commoncode

Alternatively, to set up a development environment:

./configure --dev
source venv/bin/activate

To run unit tests:

pytest -vvs -n 2

To clean up the development environment:

./configure --clean

commoncode's People

Contributors

a-tinsmith, abhishek-dev09, agustinhenze, arijitde92, armintaenzertng, arnav-mandal1234, ayansinhamahapatra, cco3, chinyeungli, georgthegreat, jdaguil, jonoyang, keshav-space, knobix, pihu1998, pombredanne, pratikrocks, pyagni, saravananoffl, sschuberth, steven-esser, swastkk, tdruez, tg1999


commoncode's Issues

Testing in GitHub CI

At the moment, it seems that only the docs are checked when a PR is submitted, and only for Python 3.9. This does not seem sufficient, because test breakages are not surfaced automatically, for example when they only occur on another Python version in the supported range 3.7 to 3.12.

While at it: this repository does not seem to have any contribution docs like the ones in ScanCode Toolkit: https://github.com/nexB/scancode-toolkit/blob/develop/CONTRIBUTING.rst Do the same rules apply here?

Commoncode fails with Click < 8

See in #30 by @vznncv:

Pull request #27 adds usage of the update_min_steps option of click._termui_impl.ProgressBar, which is available from click 8.0. But the existing setup.cfg contains the following restriction: click >= 6.7, !=7.0. This breaks scancode installations with click<8. For example: https://github.com/ARMmbed/mbed-os/pull/14981/checks?check_run_id=3470879205

This pull request updates the minimal click version in setup.cfg according to the changes in pull request #27.

codebase_attributes are ignored when creating a VirtualCodebase from a scan

I am using commoncode 31.0.0 to create a VirtualCodebase from a scan using the following code:

    vc = VirtualCodebase(
        location=scan_file_location,
        codebase_attributes=dict(
            packages=attr.ib(default=attr.Factory(list))
        ),
        resource_attributes=dict(
            packages=attr.ib(default=attr.Factory(list)),
            for_packages=attr.ib(default=attr.Factory(list))
        )
    )

The resulting VirtualCodebase has an attribute named packages, but its value is None instead of a list.

The issue is at https://github.com/nexB/commoncode/blob/main/src/commoncode/resource.py#L1842, where we try to get the codebase attribute value from the scan passed into VirtualCodebase. If there is no codebase attribute with the same name in the scan, then None is assigned to the codebase attribute instead of the default value that was passed in when VirtualCodebase was instantiated.
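The expected behaviour can be illustrated with a minimal sketch. The helper name resolve_codebase_attribute and the plain-dict scan data are assumptions made for illustration only; this is not the code in resource.py.

```python
# Hypothetical helper, not part of commoncode: use the value from the scan
# when the attribute is present, otherwise build a fresh default.
def resolve_codebase_attribute(scan_attributes, name, default_factory):
    if name in scan_attributes:
        return scan_attributes[name]
    # Fall back to the declared default instead of assigning None.
    return default_factory()


scan_attributes = {}  # the scan carries no "packages" codebase attribute
packages = resolve_codebase_attribute(scan_attributes, "packages", list)
print(packages)
```

Calling the factory each time, rather than sharing one list, keeps the default from being mutated across codebases.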

commoncode breaks with click 8.0.1

  File "/usr/lib/python3.9/site-packages/commoncode/cliutils.py", line 373, in __init__
    super(PluggableCommandLineOption, self).__init__(
  File "/usr/lib/python3.9/site-packages/click/core.py", line 2482, in __init__
    super().__init__(param_decls, type=type, multiple=multiple, **attrs)
  File "/usr/lib/python3.9/site-packages/click/core.py", line 2041, in __init__
    self.type = types.convert_type(type, default)
  File "/usr/lib/python3.9/site-packages/click/types.py", line 1019, in convert_type
    return FuncParamType(ty)
  File "/usr/lib/python3.9/site-packages/click/types.py", line 158, in __init__
    self.name = func.__name__
AttributeError: 'str' object has no attribute '__name__'

FYI

Issue importing a scancode.io JSON results into VirtualCodebase

We're encountering the issue where we cannot walk a VirtualCodebase created from a scancode.io scan (prior to v31.0.0). When the scan is loaded into VirtualCodebase, and we call walk(), only the virtual_root is returned. This is due to the scancode.io Resource names not containing the extensions. The full filename is required for a VirtualCodebase to work since we keep the filenames of the children of each Resource (https://github.com/nexB/commoncode/blob/main/src/commoncode/resource.py#L671). If the filename isn't the complete filename, then the Resource will never find the child Resource when walking.

For example, in the problematic scan, the two resources at the root have the paths example.zip and example.zip-extract, but their names are both example. When loaded into a VirtualCodebase, the virtual_root has two children, but their names are both example, when it should be example.zip and example.zip-extract. Since virtual_root/example does not exist in resources_by_path of the VirtualCodebase, then we cannot continue the walk.
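One possible workaround, sketched below, is to recompute each name from the last segment of its path before loading the scan. The function name name_from_path and the dict-based resources are assumptions for illustration; they are not part of commoncode or scancode.io.

```python
import posixpath


def name_from_path(path):
    """Return the last segment of a POSIX-style path, extension included."""
    return posixpath.basename(path.rstrip("/"))


# Resource entries shaped like the problematic scan described above.
resources = [
    {"path": "example.zip", "name": "example"},
    {"path": "example.zip-extract", "name": "example"},
]

for resource in resources:
    # Replace the truncated name with the full file name.
    resource["name"] = name_from_path(resource["path"])

print([resource["name"] for resource in resources])
```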

Test failure with Python 3.10

While the tests pass with Python 3.9, they fail with Python 3.10.

============================= test session starts ==============================
platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /build/commoncode-30.0.0, configfile: pyproject.toml
plugins: xdist-2.5.0, forked-1.4.0
collected 415 items / 2 errors / 413 selected                                  

==================================== ERRORS ====================================
___________________ ERROR collecting src/commoncode/fetch.py ___________________
src/commoncode/fetch.py:12: in <module>
    import requests
<frozen importlib._bootstrap>:1027: in _find_and_load
    ???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:688: in _load_unlocked
    ???
/nix/store/chwjpc7aq97ihs7isn966dwcw5qzn6ah-python3.10-pytest-6.2.5/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
    exec(co, module.__dict__)
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/__init__.py:133: in <module>
    from . import utils
<frozen importlib._bootstrap>:1027: in _find_and_load
    ???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:688: in _load_unlocked
    ???
/nix/store/chwjpc7aq97ihs7isn966dwcw5qzn6ah-python3.10-pytest-6.2.5/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
    exec(co, module.__dict__)
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/utils.py:41: in <module>
    DEFAULT_CA_BUNDLE_PATH = certs.where()
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/certs.py:30: in where
    return certifi.where()
/nix/store/bnxl1yg8nzqvhlg3sh57dffddj9zz8s6-python3.10-certifi-2021.10.08/lib/python3.10/site-packages/certifi/core.py:37: in where
    _CACERT_PATH = str(_CACERT_CTX.__enter__())
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/_common.py:89: in _tempfile
    os.write(fd, reader())
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/abc.py:371: in read_bytes
    with self.open('rb') as strm:
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/_adapters.py:54: in open
    raise ValueError()
E   ValueError
___________________ ERROR collecting src/commoncode/fetch.py ___________________
src/commoncode/fetch.py:12: in <module>
    import requests
<frozen importlib._bootstrap>:1027: in _find_and_load
    ???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:688: in _load_unlocked
    ???
/nix/store/chwjpc7aq97ihs7isn966dwcw5qzn6ah-python3.10-pytest-6.2.5/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
    exec(co, module.__dict__)
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/__init__.py:133: in <module>
    from . import utils
<frozen importlib._bootstrap>:1027: in _find_and_load
    ???
<frozen importlib._bootstrap>:1006: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:688: in _load_unlocked
    ???
/nix/store/chwjpc7aq97ihs7isn966dwcw5qzn6ah-python3.10-pytest-6.2.5/lib/python3.10/site-packages/_pytest/assertion/rewrite.py:170: in exec_module
    exec(co, module.__dict__)
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/utils.py:41: in <module>
    DEFAULT_CA_BUNDLE_PATH = certs.where()
/nix/store/92b2sgh6v6dflkwnb3csg2ks0nbgr3ix-python3.10-requests-2.26.0/lib/python3.10/site-packages/requests/certs.py:30: in where
    return certifi.where()
/nix/store/bnxl1yg8nzqvhlg3sh57dffddj9zz8s6-python3.10-certifi-2021.10.08/lib/python3.10/site-packages/certifi/core.py:37: in where
    _CACERT_PATH = str(_CACERT_CTX.__enter__())
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/_common.py:89: in _tempfile
    os.write(fd, reader())
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/abc.py:371: in read_bytes
    with self.open('rb') as strm:
/nix/store/b798fp24zf2fdafmyyc4sxfr48ly5yy9-python3-3.10.1/lib/python3.10/importlib/_adapters.py:54: in open
    raise ValueError()
E   ValueError
=========================== short test summary info ============================
ERROR src/commoncode/fetch.py - ValueError
ERROR src/commoncode/fetch.py - ValueError
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 2 errors in 3.43s ===============================

VirtualCodebase fails with Python 3.9

The json.load call in VirtualCodebase._get_scan_data_helper provides an encoding='utf-8' argument that was removed in Python 3.9. See https://github.com/nexB/commoncode/blob/main/src/commoncode/resource.py#L762

https://docs.python.org/3.8/library/json.html#json.loads

Deprecated since version 3.1, will be removed in version 3.9: encoding keyword argument.

This argument was simply ignored by all Python 3 versions before 3.9, which now raises a TypeError.

We should simply remove this argument.

Traceback (most recent call last):
  File "lib/python3.9/site-packages/commoncode/resource.py", line 1464, in __init__
    scan_data = self._get_scan_data(location)
  File "lib/python3.9/site-packages/commoncode/resource.py", line 1507, in _get_scan_data
    return self._get_scan_data_helper(location)
  File "lib/python3.9/site-packages/commoncode/resource.py", line 1478, in _get_scan_data_helper
    scan_data = json.load(f, object_pairs_hook=OrderedDict, encoding='utf-8')
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
TypeError: __init__() got an unexpected keyword argument 'encoding'
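A minimal sketch of the suggested fix: drop the encoding keyword from json.load and, if an explicit encoding is wanted, pass it to open() instead. The function name load_scan_data and the temporary file are assumptions for illustration, not the actual commoncode code.

```python
import json
import os
import tempfile
from collections import OrderedDict


def load_scan_data(location):
    # The encoding is given to open(), not to json.load().
    with open(location, encoding="utf-8") as f:
        return json.load(f, object_pairs_hook=OrderedDict)


# Write a tiny stand-in scan file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    location = os.path.join(tmp, "scan.json")
    with open(location, "w", encoding="utf-8") as f:
        json.dump({"files": [{"path": "a.c"}]}, f)
    scan_data = load_scan_data(location)

print(scan_data["files"][0]["path"])
```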

Refactor how strip_root argument works for Codebase

A proposed change in scancode.io is that the root Resource of a project codebase will be named . and all other Resources of that codebase will not have the root path prefix in their paths. We should change how the strip_root argument for Codebase() works so that the root resource is renamed to . with its path set to ., and the root path prefix is removed from the remaining Resource paths.
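The proposed path handling could look roughly like the sketch below. It works on plain path strings; the function name strip_root_prefix is a hypothetical stand-in and does not reflect how Codebase() is implemented.

```python
def strip_root_prefix(paths, root):
    """Map the root path to "." and drop the root prefix from other paths."""
    root = root.rstrip("/")
    prefix = root + "/"
    stripped = []
    for path in paths:
        if path.rstrip("/") == root:
            stripped.append(".")
        elif path.startswith(prefix):
            stripped.append(path[len(prefix):])
        else:
            # Paths outside the root are left untouched in this sketch.
            stripped.append(path)
    return stripped


paths = ["project", "project/src", "project/src/a.c"]
result = strip_root_prefix(paths, "project")
print(result)
```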

Rethinking the relationship between Codebases and Resources

It is not easy to determine a root for a codebase when a scan contains many different codebases within it.

One way to resolve this is to consider the group of codebases as a Project, similar to what we do in scancode.io.

A Project would then have multiple "starting paths" that would be the individual roots of the different codebases in a scan.
A Project would keep track of the leading path segment of Resources as a "starting path".

For example, consider that we have the following Resource paths in a scan:

codebase1/a.c
codebase2/foo.c
codebase3/do.c

In this case, the Project would track "codebase1", "codebase2", and "codebase3" as starting paths.

If the input to a Project is a file, then the starting path will just be the file name of the single file.
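As a rough sketch of this idea, the starting paths can be collected from the leading segment of each Resource path. The function name starting_paths is an assumption for illustration; no such function is known to exist in commoncode.

```python
def starting_paths(resource_paths):
    """Return the distinct leading path segments, in first-seen order."""
    seen = []
    for path in resource_paths:
        segment = path.strip("/").split("/", 1)[0]
        if segment and segment not in seen:
            seen.append(segment)
    return seen


paths = ["codebase1/a.c", "codebase2/foo.c", "codebase3/do.c"]
roots = starting_paths(paths)
print(roots)

# A single-file input yields the file name as its only starting path.
single = starting_paths(["README.md"])
print(single)
```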

datetime.datetime.utcfromtimestamp is deprecated in Python 3.12

Using commoncode.filetype.get_last_modified_date on Python 3.12 (for example indirectly through ScanCode Toolkit) will report a deprecation warning (in this case as part of unit/integration tests of my own package which uses ScanCode):

test_file_path (test_retrieval.RunTestCase.test_file_path) ... /opt/hostedtoolcache/Python/3.12.1/x64/lib/python3.12/site-packages/commoncode/filetype.py:167: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
  datetime.utcfromtimestamp(os.path.getmtime(location))
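A timezone-aware replacement, as the warning suggests, could look like the sketch below. It uses timezone.utc, which is also available on Python versions that predate datetime.UTC. The function name last_modified_utc is hypothetical, and whether callers rely on a naive datetime is not covered here.

```python
import os
import tempfile
from datetime import datetime, timezone


def last_modified_utc(location):
    """Return the modification time of location as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(location), tz=timezone.utc)


with tempfile.NamedTemporaryFile() as tmp:
    modified = last_modified_utc(tmp.name)

print(modified.isoformat())
```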

The root of the problem is the root: How to get a root at all times in a codebase (or no root at all)

The root problem is this:

  1. ScanCode TK demands a single root named Resource in a Codebase
  2. ScanCode TK creates a fake "virtual_root" name if there is no common root in several scans (--from-json /VirtualCodebases)
  3. ScanCode.io does not have a single root for a project (or rather it has one, which might be the "codebase" directory, but never reports it).
  4. On Windows there is no single root: each "drive" is its own root.
  5. On POSIX, there is a single root, but it has no real name: it is only named "/" and it can have files and directories as children.
  6. ScanCode TK and CommonCode generally ignore or even strip a leading slash, therefore making the POSIX root moot.
  7. When you "--strip-root" in ScanCode TK (leaving aside the possible loss of data attached to a root dir) it is potentially problematic to further read this with --from-json because we have no root anymore.
  8. Scanning a subset of paths or a collection of paths in ScanCode is problematic because of this need for a root.
  9. The Codebase classes expect some sort order when creating Resources, which may not make sense in all cases and may be overly restrictive as we cannot predict this sort order at all times.
  10. Some things are not entirely clear:
  • What is the difference between a Codebase and a root Resource?
  • Why could a Codebase not be just a collection of paths? And why do we even need a root?

We need to define a clean and well-specified way to handle this across all projects.

Implement `for_packages_append` on `Resource`

We are updating the application package scanning process on scancode.io in nexB/scancode.io#447. We are implementing the package assembly step from scancode-toolkit in scancode.io. The assembly methods from packagedcode associate Resources with packages by appending the package_uid to the for_packages attribute on Resources. This method of associating Resources with Packages does not work on scancode.io because for_packages is a property on CodebaseResource that generates a list of purls from the DiscoveredPackages associated with a CodebaseResource.

A solution would be to create a method on the Resource class named for_packages_append that appends a package_uid to Resource.for_packages. This extra level of indirection allows us to create a different implementation on CodebaseResource for associating Packages to Resources using the same interface.
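The indirection can be sketched with two simplified classes. Both are stand-ins written for illustration: the Resource below is not the real commoncode class, RegistryResource is a hypothetical backend that stores the association elsewhere, and the package_uid value is made up.

```python
class Resource:
    """Simplified stand-in for a Resource with a for_packages list."""

    def __init__(self, path):
        self.path = path
        self.for_packages = []

    def for_packages_append(self, package_uid):
        # Default behaviour: store the uid directly on the resource.
        self.for_packages.append(package_uid)


class RegistryResource(Resource):
    """Hypothetical variant that records associations in an external mapping."""

    def __init__(self, path, registry):
        super().__init__(path)
        self.registry = registry

    def for_packages_append(self, package_uid):
        # Same interface, different storage.
        self.registry.setdefault(self.path, []).append(package_uid)


package_uid = "pkg:pypi/example@1.0?uuid=hypothetical"
plain = Resource("setup.py")
plain.for_packages_append(package_uid)

registry = {}
tracked = RegistryResource("setup.py", registry)
tracked.for_packages_append(package_uid)
print(plain.for_packages, registry)
```

Assembly code that only calls for_packages_append does not need to know which of the two storage strategies is in use.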

VirtualCodebase walk issue

When I create a VirtualCodebase from the following data:

[{'is_file': False, 'path': 'to', 'sha1': '', 'size': 4096},
 {'is_file': False, 'path': 'to/to', 'sha1': '', 'size': 4096},
 {'is_file': True,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar',
  'sha1': '07bfa85a425faacf3f1dcbda3ac13c9ff0a00f43',
  'size': 10424},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal',
  'sha1': '',
  'size': 4096},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator',
  'sha1': '',
  'size': 4096},
 {'is_file': True,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator/TikaBundleActivator.class',
  'sha1': '04bce1304882d0d1d1f851d4b0484bfd22df9956',
  'size': 853},
 {'is_file': False,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/META-INF',
  'sha1': '',
  'size': 4096},
 {'is_file': True,
  'path': 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/META-INF/MANIFEST.MF',
  'sha1': 'b259041906b9cc9db46a87ac5db538c8b2e59cce',
  'size': 32212}]

I get into a strange loop when I try to use walk() on it.

to
to/to
to/com.liferay.portal.tika-1.0.22.jar-extract
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator
...

We never end up visiting the resource 'to/to/com.liferay.portal.tika-1.0.22.jar-extract/com.liferay.portal.tika-1.0.22/com/liferay/portal/tika/internal/activator/TikaBundleActivator.class', and most of the paths start with to instead of to/to.

Looking into the problem, I see in commoncode.resource._get_parent_directory() that we are not properly appending the root path prefix when we are seeing if a path segment resource exists: https://github.com/nexB/commoncode/blob/main/src/commoncode/resource.py#L2037
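To illustrate what keeping the full prefix means, the sketch below lists every ancestor directory of a path while accumulating all leading segments, so that the second to is resolved as to/to rather than to. The function name ancestor_paths is an assumption for illustration and is not the code of _get_parent_directory.

```python
def ancestor_paths(path):
    """Return every ancestor directory path, each with its full prefix."""
    segments = path.strip("/").split("/")
    # Join the first i segments so repeated names keep their own prefix.
    return ["/".join(segments[:i]) for i in range(1, len(segments))]


ancestors = ancestor_paths("to/to/com.liferay.portal.tika-1.0.22.jar")
print(ancestors)
```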

Failure of tests/test_paths.py::TestPortablePath::test_safe_path_posix_style_chinese_char

Environment:

  • Python 3.12.0~rc1
  • Fedora Rawhide
  • commoncode 31.0.2

The following test fails:

___________ TestPortablePath.test_safe_path_posix_style_chinese_char ___________

self = <test_paths.TestPortablePath testMethod=test_safe_path_posix_style_chinese_char>

    def test_safe_path_posix_style_chinese_char(self):
        test = paths.safe_path(b'/includes/webform.compon\xd2\xaants.inc/')
        expected = 'includes/webform.componNSnts.inc'
>       assert test == expected
E       AssertionError: assert 'includes/web...mponS_nts.inc' == 'includes/web...mponNSnts.inc'
E         - includes/webform.componNSnts.inc
E         ?                        -
E         + includes/webform.componS_nts.inc
E         ?                         +

tests/test_paths.py:74: AssertionError

tests/test_paths.py::TestPortablePath::test_safe_path_posix_style_chinese_char

Add warnings field to codebase headers

We are deprecating some of the summary features of scancode and we need a good place to put these messages. We are not using the errors field of the headers to store these messages since deprecation messages aren't errors. It would be nice to have a warnings field for this situation.
