aboutcode's Introduction

AboutCode

What is AboutCode?

AboutCode is a family of FOSS projects to uncover data ... about software:

  • where does the code come from? which software package?
  • what is its license? copyright?
  • is the code vulnerable, maintained, well coded?
  • what are its dependencies, are there vulnerabilities/licensing issues?

All these are questions that are important to answer: there are millions of free and open source software components available on the web for reuse.

Knowing where a software package comes from, what its license is, and whether it is vulnerable should be a solved problem, so that everyone can safely consume more free and open source software. We support not only open source software, but also open data, generated and curated by our applications.

NOTE: This is a repository with information on aboutcode open source activities and not the actual code repository. See the projects section below for links to all the code repositories of our projects with a brief overview and our wiki if you are looking to participate.

Important Links

Our homepage is at http://aboutcode.org

Our documentation (in progress) is at https://aboutcode.readthedocs.io/en/latest/

Join the chat online at app.gitter.im in the aboutcode-org#discuss room. If you're using the Element app, set the homeserver to gitter.im and then join the aboutcode-org#discuss chatroom. Introduce yourself and start the discussion!

Look at our wiki for information about our participation in the GSoC and GSoD programs.

We have a weekly meeting, see more details here.

Projects

Each AboutCode project has its own repository:

  • ScanCode Toolkit: a set of code scanning tools to detect the origin and license of code and dependencies. ScanCode now uses a plug-in architecture to run a series of scan-related tools in one process flow. This is the most popular project and is used by hundreds of software teams. The lead maintainer is @pombredanne

  • ScanCode.io: a web-based application and API to run and review scans in rich scripted pipelines on different kinds of containers, Docker images, package archives, manifests, etc., to get information on licenses, copyrights, sources and vulnerabilities. The lead maintainer is @tdruez

  • VulnerableCode: a web-based API and database to collect and track all the known software package vulnerabilities, with affected and fixed packages and references, plus a standalone tool, Vulntotal, to compare this vulnerability information across similar tools. This is maintained by @tg1999 and @pombredanne

  • univers: a package to parse and compare package versions and version ranges.

  • purlDB: tools to create and expose a database of purls (Package URLs), along with package data for these packages created from scans. This is maintained by @jyang

  • FetchCode: a library to reliably fetch any code via HTTP, FTP and version control systems such as git.

  • ScanCode Workbench: a desktop application based on TypeScript and React to visualize and review scan results from ScanCode scans.

  • AboutCode Toolkit: a set of command line tools to document the provenance of your code and generate attribution notices. AboutCode Toolkit uses small yaml files to document code provenance inside a codebase. The lead maintainer is @chinyeungli

  • container-inspector: a tool to analyze the structure and provenance of software components in Docker images using static analysis. Maintained by @pombredanne

  • python-inspector and nuget-inspector: tools that inspect manifests and code to resolve dependencies (vulnerable and non-vulnerable) for Python and NuGet packages respectively.

  • license-expression: a library to parse, analyze, compare and normalize SPDX and SPDX-like license expressions using a boolean logic expression engine. See https://spdx.org/spdx-specification-21-web-version#h.jxpfx0ykyb60 to understand what an expression is. See https://github.com/nexB/license-expression for the code. The underlying boolean engine is live at https://github.com/bastikr/boolean.py. Both are co-maintained by @pombredanne

  • ABCD aka AboutCode Data: a simple set of conventions to define data structures that all the AboutCode tools can understand and use to exchange data. The details are at AboutCode Data. ABOUT files and ScanCode Toolkit data are examples of this approach. Other projects such as https://libraries.io and OSS Review Toolkit are also using these conventions.

  • TraceCode Toolkit: a set of tools to trace files from your deployment or distribution packages back to their origin in a development codebase or repository. The primary tool uses strace https://github.com/strace/strace/ to trace system calls on Linux and construct a build graph from syscalls to show which files are used to build a binary. We are contributors to strace. Maintained by @pombredanne

We also co-started and worked closely with other FOSS orgs and projects:

  • Package URL: a widely used standard to reference software packages of all types with simple, readable and concise URLs.

  • SPDX: aka Software Package Data Exchange, a spec to document the origin and licensing of packages.

  • CycloneDX: aka OWASP CycloneDX, a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction.

  • ClearlyDefined: a project to review and help FOSS projects improve their licensing and documentation clarity. This project is incubating with https://opensource.org

aboutcode's People

Contributors

agustinhenze, arijitde92, arnav-mandal1234, ayansinhamahapatra, chinyeungli, dennisclark, dependabot[bot], farialima, johnmhoran, jonoyang, kartiksibal, keshav-space, kevinji22, lf32, mjherzog, naveen-singla, omkarph, pombredanne, pratikrocks, saransh-cpp, singh1114, steven-esser, swastkk, tg1999, yash-varshney, ziadhany

aboutcode's Issues

RFC: Specify how header-level data are returned in ABCD

Today we have header-level data in ScanCode data that is a tad ad-hoc. For instance:

{
  "scancode_notice": "Generated with ScanCode...",
  "scancode_version": "2.1.0.post55.ff2948e",
  "scancode_options": {
    "--info": true,
    "--format": "json-pp"
  },
  "files_count": 1,

  ..files, etc ... regular ABC Data.....
}

We should normalize this and support having multiple tools providing some log that they touched the data.
Here is what I suggest: store these in a top level "header" attribute. This attribute would contain a list. Each list item would be an object.
With this in mind the new ScanCode output would look like this:

{ 
  "header" : [
    { 
      "tool": "scancode-toolkit",
      "tool_version": "2.1.0.post55.ff2948e",
      "date": "2017-09-12T12:23:12",

      "scancode_notice": "Generated with ScanCode...",
      "scancode_options": {
        "--info": true,
        "--format": "json-pp"
      },
      "files_count": 1,
      [.... any other attributes that a tool may want to add, such as a scanned path, etc] ,
    }
  ]
  ..files, etc ... regular ABC Data.....
}

And with several tools having "touched" the data:

{ 
  "header" : [
    { 
      "tool": "scancode-toolkit",
      "tool_version": "2.1.0.post55.ff2948e",
      "date": "2017-09-12T12:23:12",

      "scancode_notice": "Generated with ScanCode...",
      "scancode_options": {
        "--info": true,
        "--format": "json-pp"
      },
      "files_count": 1,
      [.... any other attributes that a tool may want to add, such as a scanned path, etc] ,
    },
    {
      "tool": "aboutcode-manager",
      "tool_version": "3.1.0",
      "date": "2017-09-13T15:23:12",
      [.... any other attributes that a tool may want to add, such as a scanned path, etc] ,
    },
    { 
      "tool": "vulnerablecode",
      "tool_version": "0.1.0",
      "date": "2017-09-13T16:23:12",
      [.... any other attributes that a tool may want to add, such as a scanned path, etc] ,
    }
  ]
  ..files, component, packages etc ... e.g. regular ABC Data.....
}

In this context, these would be the only fields expected in every header item:

      "tool": "aboutcode-manager",
      "tool_version": "3.1.0",
      "date": "2017-09-13T15:23:12",

and the convention would be that each tool exporting ABCD data would:

  1. preserve the previous header
  2. add one "log" entry to the header

The benefits of all this are:

  1. clear header data, no longer mixed with other regular code-related data
  2. a minimal trail/log of which tools touched and possibly transformed the data, which is useful for tracing and documentation
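
The convention above (preserve the previous header, then add one log entry) can be sketched in a few lines of Python. This is only an illustration of the proposal, not code from any AboutCode tool: the function name add_header_entry and the sample data are hypothetical, and only the "tool", "tool_version" and "date" keys come from the RFC text.

```python
import copy
from datetime import datetime, timezone


def add_header_entry(data, tool, tool_version, **extra):
    """Return a copy of ``data`` with one new log entry appended to its header.

    Hypothetical helper: existing header entries are kept untouched and the
    new entry always carries the three fields named in the RFC.
    """
    updated = copy.deepcopy(data)
    # Preserve whatever earlier tools recorded; create the list if missing.
    header = updated.setdefault("header", [])
    # Tool-specific attributes go in first so they cannot mask the
    # three expected fields set below.
    entry = dict(extra)
    entry.update(
        {
            "tool": tool,
            "tool_version": tool_version,
            "date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
        }
    )
    header.append(entry)
    return updated


# Sample input shaped like the proposed ScanCode output.
scan = {
    "header": [
        {
            "tool": "scancode-toolkit",
            "tool_version": "2.1.0.post55.ff2948e",
            "date": "2017-09-12T12:23:12",
        }
    ],
    "files": [],
}
result = add_header_entry(scan, "vulnerablecode", "0.1.0")
```

Returning a copy rather than mutating the input is a design choice of this sketch; a tool that streams large scans might prefer to append in place.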

Update ReadTheDocs page

The AboutCode pages on ReadTheDocs are outdated and due for a refresh. The major changes to make are:

  • Update the main page based on changes outlined in:
    AboutCode-RTD-changes-22.03.04.docx
  • Archive the GSoC and GSoD pages and remove them from the TOC and index because we have decided to use the GitHub wikis for our participation in those programs. We may want to move some of this content to the GH wikis - TBD.

Automate posting project releases to aboutcode.org

We need to automate posting project releases to the News feed on aboutcode.org. Aboutcode.org is the primary website for information about any and all of our many projects so it would be a good place to post project releases so that a viewer can see consolidated information across projects.
The idea is to create a News template that would be a requirement for any release. The information should be very simple with a simple title like "Project version x.y.z released" with links to the Releases page for each project.
We already seem to have a basic template as with https://www.aboutcode.org/news/2021-12-17-scancode.io.html, but the posting process is manual.

Improve how we reference issues in commit messages and code

Description

Today we use #12 as a reference to an issue.
When the code is moved to another repo, #12 will point to a different issue.
We should consider switching to full URLs in code comments,
and to a line such as "Reference: https://github.com/nexB/dejacode/issues/1198" just above the signoff in commits.
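
As a rough illustration of the switch, short references could be expanded mechanically before code moves to another repo. The sketch below is an assumption-laden example, not an existing AboutCode tool: the function name expand_issue_refs and the regular expression are hypothetical, and it assumes the usual GitHub URL layout of <repo>/issues/<number>.

```python
import re

# Matches "#12" only when it is not preceded by a word character or a slash,
# so URL fragments such as "page.html#12" are left alone.
SHORT_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def expand_issue_refs(text, repo_url):
    """Replace short "#N" issue references in ``text`` with full issue URLs."""
    base = repo_url.rstrip("/")
    return SHORT_REF.sub(lambda m: f"{base}/issues/{m.group(1)}", text)


message = "Fix parser crash, see #12"
expanded = expand_issue_refs(message, "https://github.com/nexB/dejacode")
```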

Readme.md needs correction

Description

Tiny correction in the readme file

Select Category

  • Inconsistency []
  • New Section Request []
  • General Improvement []
  • Typo/Mistakes [✅]
  • Other []

Scan command shows error

Scan for all clues:

To scan for licenses, copyrights, urls, emails, package information, and file information

./scancode -clip -e -u --format html-app samples samples.html

Scan for license and copyright clues:

./scancode -cl --format html-app samples samples.html

Scan for emails and URLs:

./scancode -e -u --format html-app samples samples.html

Scan for package information:

./scancode -p --format html-app samples samples.html

Scan for file information:

./scancode -i --format html-app samples samples.html

To see more example scans:

./scancode --examples

All these commands are showing an error.

(base) rituraj@rituraj-G7-7588:~/Documents/git/scancode-toolkit$ ./scancode -clip -e -u --format html-app samples samples.html
Error: no such option: --format

I think --format is an additional option to the command.

Improvements in README.md file

Description

In the Projects section of the README.md file, the usernames of the lead maintainers do not link to their profiles.

Add links to aboutcode.readthedocs.io from the wiki

Modify the aboutcode wiki pages to point to corresponding pages in aboutcode.readthedocs.io, while deleting content already present in the RTD pages.

Pages to be modified:

  • Home
  • Contributor Project Ideas
  • Google Season of Documents (GSOD) 2019
  • GSOC 2017
  • GSOC 2018
  • GSOC 2019
  • Writing good commit messages

The GSOC 2020 wiki page is left as it is, as this will be linked from the GSoC official website.

Test RTD subprojects

We need to determine whether / how to use AboutCode as a top-level site/directory for all of our project documentation. The Subprojects feature - https://docs.readthedocs.io/en/stable/subprojects.html - may support this, but it is not clear how it works and how much it would impact the structure and operation of the RTD pages for subprojects like SCTK, SCIO, etc.
I think that we are looking for something that would allow us to easily maintain a directory of links to the various projects with minimal impact on how we manage the doc for each subproject.
I propose that aboutcode-toolkit would be a good project to test with.

Space between title and subtopics (UI, UX)

Short Description

The space between a title and its subtopics (UI, UX) can be adjusted to improve readability for the end user. For example, in the community section, the space between the "community" heading and the subtopics that follow can be increased.

How This Feature will help you/others

It improves readability for the end user by differentiating headings from subheadings.

Can you help with this Feature

sure

Document a way to make org-wide enhancement proposals

Description

We have things that are org-wide such as coding standards and more. We should have a simple way to document and adopt these.
I suggest a process similar to Python PEPs, where we have a simple text document that goes through a PR and lands in this repo.
These could be numbered ABC-XXXX

Give examples for documenting package dependencies

When describing software packages, like Java libraries, it's quite essential to also capture any dependencies / relationships between packages. The current ABCD spec seems to be quite loosely defined in this regard. A bit too loose for my taste probably. Could you give a concrete example of how e.g. the dependency of mockito-core on junit would be represented in ABCD in YAML format?

Publish PDF version of RTD documentation

We need a better way to review project documentation to suggest improvements and corrections to our project/product documentation. It is slow and difficult to review doc changes in a PR format.

This page - https://docs.readthedocs.io/en/stable/downloadable-documentation.html - seems to indicate that we could generate a PDF (or ePub or HTML) format document by adding it to a Sphinx-based workflow.

PDF output would be very useful for offline use and it could be a good way to print a review copy that anyone could annotate with a PDF editing tool.

Scancode-Toolkit Doc Improvements

The following tasks are remaining in terms of improving the Scancode-Toolkit Docs
[Before Next Release and Migration]

This part I'll complete.

  1. Major Improvements
  • Getting Started Section [ 214b8e8 ]
  • Support for First-Timers [ cdc4976 ]
  • Document support For pip install [ 94b50ca ]
  • Document python 3 related installations and others [ 94b50ca ]
  2. Linking Sphinx Docs
  • Add Intersphinx to nexB/aboutcode [ 6dbe59f ]
  • Add Intersphinx to nexB/scancode-toolkit [ 3187b34 ]
  • Add Intersphinx related Docs [ f1b2414 ]
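
For readers unfamiliar with Intersphinx, a linking setup of this kind usually amounts to a few lines in a Sphinx conf.py. The fragment below is a minimal sketch and not the content of the commits listed above: the mapping keys are arbitrary labels chosen for illustration, and the scancode-toolkit documentation URL is an assumption.

```python
# conf.py fragment (sketch): enable cross-project links between Sphinx docs.

# sphinx.ext.intersphinx ships with Sphinx itself.
extensions = [
    "sphinx.ext.intersphinx",
]

# Each key is a local label (hypothetical names here); each value is the
# base URL of the target docs and the inventory location, where None means
# "fetch objects.inv from the base URL".
intersphinx_mapping = {
    "aboutcode": ("https://aboutcode.readthedocs.io/en/latest/", None),
    # Assumed URL for the ScanCode Toolkit docs.
    "scancode-toolkit": ("https://scancode-toolkit.readthedocs.io/en/latest/", None),
}
```

With a mapping like this in place, a page in one project can reference a labelled section in the other using the standard Sphinx cross-reference roles.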

The following sections I'd need help on:-

  1. CLI Options Marked as [ToDo] (For Improvement)
  • --license-score [Basic Options] #28
  • --generated [Basic Options]
  • --reindex-licenses [Core Options]
  • --license-clarity-score [Post Scan Options]
  2. Related Bugs Open

Inconsistent Format in Readme.md

Description

In the Projects section of Readme.md, the project descriptions are formatted inconsistently: the colon is missing in some places, and one project entry does not start with a capital letter.

Link to Documentation Page

Where the confusion/inconsistency/incomplete documentation is.

Select Category

  • Inconsistency []
  • General Improvement []

Error

image

Can someone suggest how to fix this error and explain what it means?

Adjust TOC for https://aboutcode.readthedocs.io

We currently have the GSoC projects hidden under: https://aboutcode.readthedocs.io/en/latest/archive.html#table-of-contents.
Please move GSoC to the TOC top level after Contributing to AboutCode. It is not intuitive to look for these under Archived Pages.

At some point we will want to also promote GSoD, but we can leave it and Contributor Project Ideas (old) under the Archived Pages for now. We should remove the TOC level under Archived Pages if that is easy to do.
It would be nice to make the change quickly so that we can post the better link with the News item for completing GSoC 2022 on aboutcode.org.

migrating to aboutcode-org

Short Description

Migrate FOSS projects under nexB organization on GitHub to aboutcode-org on GitHub.

How This Feature will help you/others

Transferring ownership of AboutCode projects on GitHub from nexB to aboutcode-org will demonstrate to the community our FOSS for FOSS commitment, with the projects in a FOSS organization on GitHub instead of a commercial entity.

This will facilitate displaying individual projects on the new aboutcode.org site. We will also revisit the plan to better index the docs for AboutCode projects: #129

Possible Solution/Implementation Details

GitHub published docs on how to transfer ownership between organizations: https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

GitHub should create automatic redirects following their docs:
All links to the previous repository location are automatically redirected to the new location. When you use git clone, git fetch, or git push on a transferred repository, these commands will redirect to the new repository location or URL. However, to avoid confusion, we strongly recommend updating any existing local clones to point to the new repository URL.
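
Updating a local clone after such a transfer comes down to changing its remote URL. The sketch below shows one way to do that with the standard git remote get-url and git remote set-url commands. The helper names migrated_url and update_origin are hypothetical, the remote is assumed to be called origin, and the match on the organization name is case-sensitive in this sketch.

```python
import subprocess

OLD_ORG = "nexB"
NEW_ORG = "aboutcode-org"


def migrated_url(url, old_org=OLD_ORG, new_org=NEW_ORG):
    """Return ``url`` with the GitHub organization swapped, or unchanged."""
    # Handle both HTTPS and SSH style GitHub remotes.
    for prefix in ("https://github.com/", "git@github.com:"):
        if url.startswith(prefix + old_org + "/"):
            rest = url[len(prefix) + len(old_org) + 1:]
            return prefix + new_org + "/" + rest
    return url


def update_origin(repo_dir):
    """Point the ``origin`` remote of the clone in ``repo_dir`` at the new org."""
    current = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", "origin"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    new = migrated_url(current)
    if new != current:
        subprocess.run(
            ["git", "-C", repo_dir, "remote", "set-url", "origin", new],
            check=True,
        )
    return new
```

Calling update_origin on the path of a local clone leaves remotes that do not belong to the old organization untouched.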

There could be some issues with integrations and their tokens, including:

  • PyPI deployment
  • Azure DevOps
  • GitHub Actions
  • Read the Docs
  • Slack
  • Others?

There could be some issues with migrating permissions and collaborators in the new organization.

Example/Links if Any

More documentation on transferring a repo owned by your org: https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository#transferring-a-repository-owned-by-your-organization

Can you help with this Feature

I can help with some testing but this would require someone with advanced git (and git troubleshooting) skills.

Actual repos to migrate

This list is sorted so we start with a few smaller, less critical repos to validate the move process:

Documentation is not Formatted Correctly (don't copy blindly from wiki)

I think the AboutCode documentation is in a messy state, and we should not blindly copy content from the wiki into it. Many of the docs use an old version of the scancode-toolkit command, so we should correct those before pushing. Content that is simply copied and pasted looks messy (the font is wrong and the spacing is off), and it will cause problems later because every page will have to be reviewed again before it can be modified. So whenever we update the AboutCode docs, we should fix the content at that point and only then push it.

Second, the wiki docs are not well defined and are mostly quick tutorials. When moving content from the wiki to the AboutCode docs, we should account for this as well. Otherwise, when modifications are needed later, the previous work will be wasted and will have to be removed.

Testing page

Description

A brief description of the Documentation Improvement or New Section request.
add the testing page for aboutcode. cc4b496

Link to Documentation Page

Where the confusion/inconsistency/incomplete documentation is.

https://aboutcode.readthedocs.io/en/latest/contributing/testing.html#testing

Select Category

  • Inconsistency []
  • New Section Request []
  • General Improvement []
  • Typo/Mistakes []
  • Other []
