nexb / aboutcode-toolkit

AboutCode Toolkit provides a simple way to document provenance metadata (origin and license) about third-party code that you use in your project. It includes utilities to generate an inventory/BOM or attribution documentation.

HTML 19.68% Python 77.13% Batchfile 1.26% Shell 1.48% Dockerfile 0.19% Makefile 0.26%

aboutcode-toolkit's Issues

Break down the pre_process function

The current pre_process() is doing way too many things. We should break it down into a couple of smaller functions instead.

Create an output file error.txt when generating attribution

When creating attribution, an error log should be created containing the warnings and errors
For example:
Mandatory fields are missing from the .ABOUT files.
The error log would show which components are missing the mandatory fields and specifically which mandatory fields are missing.

Define how we are packaging ABOUT

For now a simple .py script is best.
Later we need to have extra libs and the single file approach will no longer work.
We need to pick what packaging works knowing that target users may not be python developers. Some possibilities include:

sdist on Pypi for Python developers. see #4
pyz zip for general usage that requires a Python installation
py2exe, pyinstaller or similar for Windows/Mac/Linux that could work without a Python installed

Should ABOUT be system-wide installed? with man pages? as .deb or .rpm packages?

Make warning and error messages easier to read

For now we dump the python repr... we can do better

Do not check format if field is null

('download_url', '', 'This optional field has no value.'),('download_url', '', 'URL is either not in a valid format, or it is not reachable.')

If a field doesn't contain any value, we should not check its format. Only the "no value" message should be reported.
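A minimal sketch of the intended behavior, assuming Python 3 and a hypothetical helper named check_download_url (not a function from the tool). The "no value" message is copied from the output above; the format message and the accepted URL schemes are assumptions.

```python
from urllib.parse import urlparse


def check_download_url(value):
    """Return a list of (field, value, message) tuples for download_url.

    Hypothetical helper: an empty value is reported once and the format
    check is skipped entirely.
    """
    field = "download_url"
    if not value or not value.strip():
        return [(field, value, "This optional field has no value.")]
    parsed = urlparse(value.strip())
    # Assumption: only these schemes count as a valid download URL.
    if parsed.scheme not in ("http", "https", "ftp") or not parsed.netloc:
        return [(field, value, "URL is not in a valid format.")]
    return []
```

With this shape, an empty download_url yields one message instead of two.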

genabout.py : try to rearrange the key,value row into a better order

For instance,

all the mandatory fields should be at the first few lines, and then probably all the vcs* should be grouped together etc.

genabout.py : bug for the --action 1

1 - Overwrites the current ABOUT field value if existed

The current behavior is more like replacing the whole ABOUT file.

For instance,
if my ABOUT file already has a version value but my input csv doesn't have any value in the version column, the version field value will be removed, which is not correct.
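One way the expected "--action 1" behavior could look, as a sketch only. merge_overwrite is a hypothetical name, and both the existing ABOUT fields and the CSV row are assumed to be plain dicts of strings.

```python
def merge_overwrite(existing, csv_row):
    """Overwrite existing ABOUT fields only where the CSV row has a value.

    Hypothetical helper: empty CSV cells leave the existing value untouched.
    """
    merged = dict(existing)
    for field, value in csv_row.items():
        if value is not None and value.strip():
            merged[field] = value.strip()
    return merged
```

An empty version column then keeps the version already present in the ABOUT file.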

"hyphen" should not be considered as an invalid character in field names

WARNING: [Field: date-retrieved, Value: date-retrieved: 2013-01-16, Message: Field name contains invalid characters: '-': line ignored.]
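A sketch of a field-name check that accepts hyphens. The allowed character set (ASCII letters, digits, underscore, hyphen) is an assumption for illustration, and is_valid_field_name is a hypothetical name, not the tool's own function.

```python
import re

# Assumption: letters, digits, underscores and hyphens are allowed.
FIELD_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_field_name(name):
    """Return True if the whole field name uses only allowed characters."""
    return FIELD_NAME.fullmatch(name) is not None
```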

genabout.py : "Update" the ABOUT file if it already exists and has different content.

The current design is to do nothing if the ABOUT file exists.

If the ABOUT file exists, the tool should be able to look into it to see if there is anything that needs to be updated.

Provide options, such as:

overwrite the current field value
keep the current field value and only add the "new" field and field value
replace the ABOUT file with the current generation

In addition, if no option is set, the tool will not touch the ABOUT file and prompt "ABOUT file already existed."

Review FAILING tests

... we should move to unittest2 and use unittest2's @expectedFailure

genabout.py : create an error log file to log all the errors during the generation

Create an error.txt if error exists.

Spaces on the left of the colon are not trimmed

For example:

description : Some description.

is silently ignored.
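A possible fix, sketched with a hypothetical parse_field_line helper: split on the first colon and trim whitespace on both sides, so "description : Some description." is accepted. Continuation lines and other details of the ABOUT format are not handled here.

```python
def parse_field_line(line):
    """Split 'name : value' into (name, value), trimming both sides.

    Returns None when the line has no colon. Only the first colon is used
    as the separator, so colons inside the value (e.g. URLs) are kept.
    """
    name, sep, value = line.partition(":")
    if not sep:
        return None
    return name.strip(), value.strip()
```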

Test test_return_path_is_not_abspath writes to test data dir

It is usually not a good idea to write files in the testdata directories that are under version control. The test should use a temp file/dir instead.

add list of supported SCM tools

add list of supported SCM tools and raise warning if not found: cvs, svn, git, arch, hg, bzr, darcs, clearcase, perforce
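A sketch of the check, using the list above. scm_tool_warning is a hypothetical name, and the case-insensitive comparison and the message wording are assumptions.

```python
SUPPORTED_SCM_TOOLS = frozenset([
    "cvs", "svn", "git", "arch", "hg", "bzr", "darcs", "clearcase", "perforce",
])


def scm_tool_warning(tool):
    """Return a warning string for an unknown SCM tool, or None if supported."""
    normalized = tool.strip().lower()
    if normalized in SUPPORTED_SCM_TOOLS:
        return None
    return "Unsupported SCM tool: %r" % tool
```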

Resolve paths in _file fields as POSIX path first then as local os.path

The ABOUT spec states that the paths are POSIX paths, so we should use the posixpath module for this. To be resolved to actual files, the paths would then need to be converted to local OS paths (i.e. Windows or POSIX conventions internally).

A test should easily show if we do it right first
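A sketch of the two-step resolution, assuming a hypothetical resolve_about_path helper: normalize with posixpath first, then rebuild the path with os.path.join so the local separator is used.

```python
import os
import posixpath


def resolve_about_path(about_file_dir, posix_rel_path):
    """Resolve a POSIX-style path from an ABOUT file to a local OS path.

    The path is normalized as POSIX first, then joined segment by segment
    with os.path.join. Note: a leading '/' is dropped by this sketch, so
    every path is treated as relative to about_file_dir.
    """
    normalized = posixpath.normpath(posix_rel_path)
    parts = [part for part in normalized.split("/") if part]
    return os.path.join(about_file_dir, *parts)
```

A test along these lines passes on both Windows and POSIX, because the expected value is built with os.path.join as well: resolve_about_path("base", "docs/LICENSE.txt") should equal os.path.join("base", "docs", "LICENSE.txt").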

Generate attribution

Given ABOUT files or a list of ABOUT files in a CSV and a base directory, I want to generate an attribution notice for all these components:

supported output should be text and HTML
the output should be based on a template. Mustache or Jinja2 are good candidates
for this to work, some mandatory fields need to be present
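To show the shape of template-based output, here is a sketch that uses string.Template from the standard library as a stand-in for Mustache or Jinja2. The field names name, version and license_text, and the render_attribution helper, are assumptions for illustration.

```python
from string import Template

# Assumption: these three fields are the mandatory ones for attribution.
ENTRY = Template("$name $version\n$license_text\n")


def render_attribution(components):
    """Render a plain-text attribution notice from a list of field dicts.

    Template.substitute raises KeyError when a field used by the template
    is missing, which is why mandatory fields must be present.
    """
    return "\n".join(ENTRY.substitute(component) for component in components)
```

An HTML output would use the same data with a different template.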

Add setup.py and release on Pypi

Implement command line options using argparse module

The argparse module allows easy implementation of optional and positional parameters and also handles the output formatting.
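A sketch of what the parser could look like. The two positional parameters mirror the way about.py is invoked in this tracker (an input location and an output CSV), and --verbosity is the option proposed in the "Verbosity option for about.py" issue. The names input_path and output_path are assumptions.

```python
import argparse


def build_parser():
    """Build a command line parser with positional and optional parameters."""
    parser = argparse.ArgumentParser(
        prog="about.py",
        description="Collect ABOUT files into a CSV inventory.",
    )
    parser.add_argument("input_path", help="directory or ABOUT file to process")
    parser.add_argument("output_path", help="CSV file to write")
    parser.add_argument(
        "--verbosity",
        type=int,
        choices=[0, 1, 2],
        default=0,
        help="0: quiet (default), 1: errors, 2: errors and warnings",
    )
    return parser
```

argparse generates the usage and --help output from these declarations.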

genabout.py : split function into smaller functions

possibly create another function called "process_input"

The test for check_url with network flag true may not be correct.

For instance,

self.assertTrue(about_file.check_url("http://www.google.com", True))

check_url only returns the request status code, such as 200 or 404, instead of a boolean value. Since any non-zero integer is truthy, assertTrue passes even for a 404.
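A sketch of a boolean conversion the check could apply before returning. is_success_status is a hypothetical helper, and treating 2xx and 3xx as "reachable" is an assumption.

```python
def is_success_status(status_code):
    """Return True only for 2xx and 3xx HTTP status codes."""
    return 200 <= status_code < 400
```

With a boolean result, assertTrue fails for a 404 as expected, whereas bool(404) is True.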

Generate attribution from an input list

Generate attribution from ABOUT files for a specific list of components. User should be able to pass a list of components they would specifically like to generate attribution for.

In other words, from a given set of about files and an input list (that contains component names which map to about file field "name") be able to generate attribution.

Pass as an option a list of fields that are mandatory

I would like to have a configurable command line option to pass a list of fields considered as mandatory overriding the basics... This is important because for a certain usage certain fields will be needed. For instance generating attribution or redistribution requires certain fields to be set which may be otherwise considered as optional.

Generate an ABOUT "Scorecard"

If an engineering organization were to take up using ABOUT in a serious, ongoing way, it probably would be very helpful to provide a tool/utility that would generate a "scorecard" on a target codebase directory. Some of the scorecard points would include:

Percentage/counts of the directories that (1) have no ABOUT info, (2) have some ABOUT info, (3) appear to be fully-documented in ABOUT.

The value of this would be that as new code is checked into a codebase, the scorecard could help determine if it's time to review the ABOUT files.

The scorecard tool might also identify potentially obsolete ABOUT files, where the date (timestamp) on the ABOUT file is older (by a number of days, perhaps parameter-driven) than the file/directory that it describes.
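The obsolete-file check could be sketched as below, assuming modification times are the timestamps to compare. is_stale, is_about_file_stale and the max_lag_days parameter are hypothetical names.

```python
import os

SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(about_mtime, resource_mtime, max_lag_days=0):
    """True if the resource changed more than max_lag_days after the ABOUT file."""
    return resource_mtime - about_mtime > max_lag_days * SECONDS_PER_DAY


def is_about_file_stale(about_path, resource_path, max_lag_days=0):
    """Compare modification times on disk.

    Caveat: a directory's mtime only changes when entries are added or
    removed, so a real tool would probably walk the tree instead.
    """
    return is_stale(
        os.path.getmtime(about_path),
        os.path.getmtime(resource_path),
        max_lag_days,
    )
```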

Spec how to define a network flag on the command line for URL live checks

The default should be to not check over the network if a URL exists.

Extract license text from a license_url

The tool should be able to extract the license text based on the license_url provided by the user in the ABOUT file and save the license text in a user-specified directory.

Collect redistributable sources

Given a directory with ABOUT files OR a csv and directory as an input, I want to collect all the source code archives that have the redistribute flag set, and bundle all these in a zip or tar

How to handle "non-supported" field?

These for now are just ignored...
Should these be returned in the output?

genabout.py : if the "about_resource" is ".", the generated about file will become ..ABOUT

The "..ABOUT" is not correct. If this is the case, maybe we want to use the "name" field as the filename?
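A sketch of the suggested fallback, assuming the ABOUT file name is normally the resource's base name plus ".ABOUT" (which is what produces "..ABOUT" for "."). about_file_name is a hypothetical helper.

```python
import posixpath


def about_file_name(about_resource, name):
    """Return the ABOUT file name for a resource.

    Falls back to the 'name' field when the resource is '.', '..' or has
    no usable base name.
    """
    base = posixpath.basename(about_resource.rstrip("/"))
    if base in ("", ".", ".."):
        base = name
    return base + ".ABOUT"
```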

Switch from getopt to argparse

This will also make implementing subcommands easier.

http://docs.python.org/2.7/howto/argparse.html
http://docs.python.org/2.7/library/argparse.html

Return error if a directory is not readable

Currently the tool will skip a directory without returning any error message if a directory is not readable

write test code for genabout.py

genabout.py : Implement --verbosity to show errors in stdout

all the "sub-directories" are pruned out in the output csv

For instance, if I run about.py on

tmp/t1/t2/test.ABOUT

The generated csv's about_file field will return
tmp/test.ABOUT

the t1/t2/ are pruned

integrate gen_tests.py into the main tests.py

as title

Problem if I use '..' in the project input

For instance,
if I run it like the following:
$ python about.py ../about-code-tool/testdata/ /cygdrive/z/tmp/t2.csv

The output of the about_file is
../about-code-tool/testdata/

which is incorrect.

generate ABOUT files from a CSV

user should be able to generate ABOUT files from a CSV as an input

Current proposals:

A. An interactive prompt.

For instance, user only needs to type

python about.py
Then the script will prompt:

Collect ABOUT files context
Generate ABOUT files
Exit
Select:
something like the above, and then ask for input/output and all kinds of options.

B. Create another script, maybe, called generate_about_file.py

genabout.py : update the syntax to include the location path of where the user want the ABOUT file be generated

The syntax should be:

python genabout.py [input_csv] [generated location]

genabout.py : make sure to generate all the mandatory fields even if they don't have a value in the input csv

as title.

Check that everything in a dir is documented by ABOUT files

We should have a feature such that given a directory, it will parse ABOUT files, and ensure that all files in the directory are directly or indirectly (ie from a parent directory) documented or referenced in an ABOUT file
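A sketch of the core check, assuming the file list and the list of documented paths have already been collected as POSIX-style relative paths. undocumented_files is a hypothetical name. A file counts as documented if it, or any of its parent directories, appears in the documented paths.

```python
import posixpath


def undocumented_files(all_files, documented_paths):
    """Return the files not covered directly or through a parent directory."""
    documented = {posixpath.normpath(path) for path in documented_paths}
    missing = []
    for path in all_files:
        current = posixpath.normpath(path)
        covered = False
        # Walk up from the file through each parent directory.
        while current not in ("", ".", "/"):
            if current in documented:
                covered = True
                break
            current = posixpath.dirname(current)
        if not covered:
            missing.append(path)
    return missing
```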

Setup travis CI for tests

genabout.py : Create log to indicate which ABOUT file have been ignored/changed.

Copy "License" file indicated in the ABOUT to the specific directory

For instance, in the ABOUT file, the user has specified the location of the license file. The tool should be able to copy this license file into a user-specified location.

Verbosity option for about.py

It would be nice to offer the possibility to log the warnings and errors in the stdout.

Probably a new option "--verbosity"

0 is default
1 is errors
2 is errors + warnings
So I can see the issues in my terminal without having to open the output file as I'm fixing the issues.
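The three levels could map to output as in this sketch. messages_to_print is a hypothetical helper that takes the already-collected errors and warnings as lists of strings.

```python
def messages_to_print(errors, warnings, verbosity=0):
    """Select which messages go to stdout for a given verbosity level.

    0: nothing, 1: errors only, 2: errors followed by warnings.
    """
    if verbosity >= 2:
        return list(errors) + list(warnings)
    if verbosity == 1:
        return list(errors)
    return []
```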
