Giter VIP home page Giter VIP logo

gh-pyxplorer's Introduction

GitHub pyxplorer

You have a GitHub organization and you want to explore it? With this simple interactive crawler you can explore a GitHub {organization, repository, or a list of repositories} and extract some insightful information. Please note: it uses GitHub's API.

Details

This tool is a fundamental asset for any Open Source Program Office (OSPO). In the complex landscape of open source software, it's crucial to have a clear understanding of your organization's repositories. This tool provides a comprehensive overview of your repositories, including details such as the repository name, license, and primary language.

Regular scans of your repositories are essential for maintaining compliance and ensuring the health of your open source projects. This tool makes it easy to perform these scans on a regular basis, providing you with up-to-date information and helping you to identify any potential issues early.

How to run

Install

pip3 install -r requirements.txt

Launch

export GH_TOKEN=<your_github_token>
python3 crawler.py -i org <organization_name> -o print

Options

The crawler accepts the following command-line arguments:

  • -i or --input [required]: Specifies the input type. This should be either:

    • org for a GitHub organization,
    • repo for a specific repository,
    • list for a list of repositories.
  • <name> [required]: The name of the organization, repository or file name containing a list of repositories to explore.

  • -o or --output [required]: Specifies the output type. This can be:

    • print [default] to print the results to the console,
    • file to write the results to a file.
  • -f or --fields: Specifies the fields to include in the output. This should be a space-separated list of field names. The default is 'name license language'. The available fields are:

    • name for the repository name,
    • license for the repository license,
    • language for the repository's primary language,
    • private for private repositories [NOT IMPLEMENTED YET],
    • public for public repositories [NOT IMPLEMENTED YET]
    • archived for archived repositories

For example, if you want just the name and license fields as output:

python3 crawler.py -i org <organization_name> -o print -f name license

NOTE: GH heuristics catches if a repo has a license by checking the LICENSE.md or LICENSE files. Here we are digging a bit deeper in order to see if there are other files that "may" contain licensing info in the root of the repo.

Output

Depending on the type of parameter passed to the -o option, the output will be printed to the console or written to a file.

The output is a CSV-like structure with the following fields:

repo_name,license,primary_language

In case of file, this is the output (JSON-like):

{
    "group name": {
        "count": total number of repositories for the group,
        "private-repos": [
            "name of private repository 1",
            "name of private repository 2",
            ...
        ],
        "public-repos": [
            "name of public repository 1",
            "name of public repository 2",
            ...
        ]
    },
    ...
}

Testing

Run

python3 -m unittest tests.py

License

MIT License, see LICENSE

gh-pyxplorer's People

Contributors

dependabot[bot] avatar libremente avatar

Stargazers

 avatar

Watchers

 avatar

gh-pyxplorer's Issues

Fix tests

Tests should be rewritten given the refactoring

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.