jenkins-infra / jenkins-contribution-stats

License: MIT License

Shell 84.57% jq 1.49% Dockerfile 13.93%

jenkins-contribution-stats's Introduction

Jenkins contribution statistics

A set of tools to extract and analyze the number of software contributions and their submitters. This is a strict interpretation of "contribution". Other statistics (like those from the Linux Foundation) analyse all interactions with a project (PR, comment, issue creation, review).

Suggested usage

Retrieve data since the last time the script ran with ./collect-missing-data.sh.
Update/create the consolidated data file with ./consolidate-data.sh.
Update/create global summary file (nbr submitters and submissions per month) with ./submission-submitter-report.sh.

All the above operations can be performed with update-stats.sh

Script list

The following scripts are available:

check-prerequisites.sh checks whether all required programs are available on the system
extract-monthly-submissions.sh extracts the monthly data from GitHub and stores it in the ./data/ directory as a CSV file. It also generates a list with the number of PRs for each submitter in the given month.
consolidate-data.sh takes all the available monthly data and creates a single data file, consolidated_data/submissions.csv. If a data file already exists, it is backed up.
collect-missing-data.sh will extract all the monthly data files since JAN-2020. If the output already exists, it will skip that particular month.
submission-submitter-report.sh uses the existing monthly data to generate a summary CSV with the number of submissions and the number of submitters. The resulting output is stored in consolidated_data/summary_counts.csv
update-stats.sh is the script that performs the necessary update operation in sequence
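
The skip-if-present behaviour described for collect-missing-data.sh can be illustrated with a minimal sketch. This is not the script's actual code: the function name list_missing_months and its arguments are hypothetical, and only the data/submissions-YYYY-MM.csv naming and the January 2020 start come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every month (YYYY-MM) from January 2020 up to
# the given end year and month for which no monthly submissions file exists.
# Pass the end month without a leading zero (for example 4, not 04).
list_missing_months() {
    local data_dir=$1 end_year=$2 end_month=$3
    local year=2020 month=1 label
    while [ "$year" -lt "$end_year" ] ||
          { [ "$year" -eq "$end_year" ] && [ "$month" -le "$end_month" ]; }; do
        label=$(printf '%04d-%02d' "$year" "$month")
        # A month is skipped when its output file is already present.
        if [ ! -f "$data_dir/submissions-$label.csv" ]; then
            echo "$label"
        fi
        month=$((month + 1))
        if [ "$month" -gt 12 ]; then
            month=1
            year=$((year + 1))
        fi
    done
}
```

Calling `list_missing_months data 2024 6` would then print the months that still need an extraction run. The loop uses plain integer arithmetic so that it does not depend on GNU date.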

Produced datafiles

File name	Comment	produced by
`data/submissions-YYYY-MM.csv`	List of PRs created in a given month	`extract-monthly-submissions.sh`
`data/pr_per_submitter-YYYY-MM.csv`	Nbr of PRs submitted by a user for a given month	`extract-monthly-submissions.sh`
`data/comments_YYYY-MM.csv`	List of comments created in a given month	`extract-monthly-submissions.sh`
`data/comments_per__commenter-YYYY-MM.csv`	Nbr of comments made by a user for a given month	`extract-monthly-submissions.sh`
`consolidated_data/submissions.csv`	All extracted submissions (since Jan 2020)	`consolidate-data.sh submissions`
`consolidated_data/submissions_overview.csv`	Global submissions pivot table (user/month -> nbr prs)	`consolidate-data.sh submissions`
`consolidated_data/top_submissions.csv`	35 top submitters over the last 12 months	`consolidate-data.sh submissions`
`consolidated_data/top_submissions_evolution.csv`	New or churned top submitters (compared to 3 months before)	`consolidate-data.sh submissions`
`consolidated_data/comments.csv`	All extracted comments (since Jan 2020)	`consolidate-data.sh comments`
`consolidated_data/comments_overview.csv`	Global comments pivot table (user/month -> nbr comments)	`consolidate-data.sh comments`
`consolidated_data/top_submissions.csv`	35 top commenters over the last 12 months	`consolidate-data.sh comments`
`consolidated_data/top_submissions_evolution.csv`	New or churned top commenters (compared to 3 months before)	`consolidate-data.sh comments`

Prerequisites

Prerequisites are checked with check-prerequisites.sh. The following executables have to be installed for the automation to work.

gh : the GitHub command line utility
jq : JSON query tool
datamash : data manipulation tool (CSV pivots)
jenkins-contribution-aggregator : extracts the top submitters or commenters from the global pivot tables
jenkins-contribution-extractor : various extraction and Jenkins data handling tools
gdate : GNU date manipulation tool for macOS (part of coreutils, installable with brew)

Data and process flow

Note to self: to generate the mermaid graphic by hand, run docker run -i -t --rm -v "$PWD:/data" jmmeessen/render-md-mermaid:v2

diagram source

This details block is collapsed by default when viewed in GitHub. This hides the mermaid graph definition, while the rendered image linked above is shown. The details tag has to follow the image tag. (newlines allowed)

flowchart TD
	start1(("`Start
	(others)
	 `"))

    start2(("`Start
    (jenkins)
     `"))

    extract_end((End))

    %% Processes

	A[[update-benchmark-stats.sh]]
	B[[update-stats.sh]]
    C[[collect-missing-data.sh]]
    D[[consolidate-data.sh submissions]]
    E[[consolidate-data.sh comments]]
    F[[submission-submitter-report.sh]]
    G[[comment-commenter-report.sh]]
    extracData[[extract-monthly-submissions.sh]]
    get_submitters{{"jenkins-contribution-extractor get submitters {org}"}}
    get_commenters{{"jenkins-contribution-extractor get commenters"}}
    top_extract{{jenkins-contribution-aggregator </br> extract}}
    top_compare{{jenkins-contribution-aggregator </br>compare}}

    %% data files
    submission_month[(submission_YYMM.csv)]
    monththlyPivot_submit[(pr_per_submitter.csv)]
    comments_month[(comments_YYMM.csv)]
    monththlyPivot_comment[(comments_per_</br>_commenter.csv)]
    global_submissions[(submissions.csv)]
    global_submissionsOverview[(submissions_overview.csv)]
    top_submission[(top_submissions.csv)]
    top_submission_evol[(top_submissions_evolution.csv)]

    global_comments[(comments.csv)]
    global_commentsOverview[(comments_overview.csv)]

    %% legend
    legend_app[[Application </br>or script]]
    legend_sub{{sub routine}}
    legend_data[(data file)]
    legend_app --> legend_sub -.-> legend_data

    %% pivot processes
    monthlypivot_subm{{pivot monthly data}}
    monthlypivot_comment{{pivot monthly data}}
    subm_overview_pivot{{pivot}}
    comment_overview_pivot{{pivot}}

    
    %% flow
	start1 --> A -- loops through org --> B
	start2 --> B
    B --> C -- monthly data missing ? --> extracData  --> get_submitters
    get_submitters -.-> submission_month --> monthlypivot_subm -.-> monththlyPivot_submit --> extract_end --> C
    submission_month --> get_commenters -.-> comments_month --> monthlypivot_comment -.-> monththlyPivot_comment --> extract_end
    B --> D --> global_submissions
    global_submissions --> subm_overview_pivot -.-> global_submissionsOverview
    global_submissions --> top_extract --> top_submission
    global_submissions --> top_compare --> top_submission_evol
    B --> E --> global_comments --> comment_overview_pivot -.-> global_commentsOverview
    B --> F 
    B --> G

jenkins-contribution-stats's People

jwhite0409 gounthar alyssat jmmeessen lemeurherve

jenkins-contribution-stats's Issues

./update-stats.sh ends with an error locally.

Locally, I get:

creating pivot table

Computing top comments
2024/06/03 17:47:35 Unexpected error loading./consolidated_data/comments_overview.csv
%!(EXTRA <nil>)
Error: Invalid input file.
Usage:
  jenkins-top-submitters extract [input file] [flags]

Flags:
  -h, --help           help for extract
      --history        Outputs the available activity history for the top submitters
  -m, --month string   Month to extract top submitters. (default "latest")
  -o, --out string     Output file name. Using the ".md" extension will generate a markdown file  (default "top-submitters_YYYY-MM.csv")
  -p, --period int     Number of months to accumulate. (default 12)
  -t, --topSize int    Number of top submitters to extract. (default 35)
      --type string    The type of data being analyzed. Can be either "submitters" or "commenters" (default "submitters")
  -v, --verbose        Displays useful info during the extraction

But the script works fine in a GitHub Action:
https://github.com/gounthar/jenkins-submitter-stats/actions/runs/9353654387/job/25744673444

Could it be linked to the version of one of your binaries being outdated on my machine?
If that's the case, the check-prerequisites.sh should have warned me. 🤔
And my brew packages seem to be up to date:

brew update && brew doctor
==> Updating Homebrew...
Updated 1 tap (homebrew/core).
==> New Formulae
vexctl
Your system is ready to brew.

Add bot exclusion to collect-missing-data

Create command to accumulate PR and Comment counts

This should be weighted with a coefficient, as a comment is not "worth" the same as creating a PR.
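
A minimal sketch of such a weighted accumulation, under loud assumptions: the function name weighted_scores is hypothetical, and both input files are assumed to be header-less CSV files with one `user,count` pair per line, which is not necessarily the layout of the files produced by this repository. The weight itself is a free parameter; the issue does not propose a value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine PR counts and comment counts per user.
# Each PR counts for 1, each comment for the given weight.
weighted_scores() {
    local pr_file=$1 comment_file=$2 comment_weight=$3
    LC_ALL=C awk -F',' -v w="$comment_weight" '
        # Lines from the first file are PR counts.
        FILENAME == ARGV[1] { score[$1] += $2; next }
        # Lines from the second file are comment counts, scaled by the weight.
        { score[$1] += w * $2 }
        END { for (user in score) printf "%s,%.2f\n", user, score[user] }
    ' "$pr_file" "$comment_file" | LC_ALL=C sort
}
```

With a weight of 0.5, a user with 2 PRs and 4 comments gets a score of 4.00, the same as a user with 4 PRs and no comments.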

Revert to initial mermaid GHA

Integrate new top-submitters tools to generate MD output

required updates caused by repository ownership change and rename

What feature do you want to see added?

With the rename of the tooling used by this processing chain, several items must be corrected so that the daily and monthly processing continues to work.

Upstream changes

Tasks

replace all occurrences of jenkins-stats with jenkins-contribution-extractor in scripts and documentation
replace all occurrences of jenkins-top-contributor with jenkins-contribution-aggregator in scripts and documentation
check automatic updaters (dependabot/updateCLI)
Check docker image and automated run (GitHub action)
ask Jmm to remove old Brew Tap for full validation
Allow direct commit to main branch

The monthly update commit title is wrong

Reproduction steps

Expected Results

Monthly changes computed on 06/01/2024.

Actual Results

Monthly changes computed on $(date +'%m/%d/%Y').

Anything else?

No response
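
The issue does not state the cause. One common way to get this symptom is a title built inside single quotes (or passed through a workflow field that no shell evaluates), so that the command substitution is never expanded. The following sketch illustrates that assumption only; the variable names are illustrative and do not come from the repository.

```shell
#!/usr/bin/env bash
# Single quotes keep the command substitution as literal text.
literal_title='Monthly changes computed on $(date +%m/%d/%Y).'

# Double quotes let the shell run date and insert its output.
expanded_title="Monthly changes computed on $(date +'%m/%d/%Y')."

echo "$literal_title"
echo "$expanded_title"
```

If the title is set in a workflow file rather than in a script, the date would have to be computed in a shell step first and then passed on as a value.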

Minimal version check should be moved to `check-prerequisites.sh`

So that we have an early failure and a single place to maintain it
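
A minimal sketch of a version comparison that such a central check could use. The function name version_ge is hypothetical, and the sketch assumes a sort that supports the -V (version sort) option, as GNU sort does. How the installed version string is obtained differs per tool and is deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the installed version ($1) is greater
# than or equal to the minimal version ($2). A leading "v" is ignored.
version_ge() {
    local have=${1#v} want=${2#v}
    # After a version sort, the minimal version must come first (or be equal).
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

A prerequisite check could then fail early with something like `version_ge "$installed" "1.2.8" || exit 1`, where `$installed` is a placeholder for the detected version.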

Add a message at the end of the main bash files indicating that the process ended successfully

It is not possible to see clearly whether the script ended with an error or successfully.
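
One possible approach, sketched here as an assumption rather than as the repository's implementation, is an EXIT trap that reports the final status. The function name final_status_message and the message wording are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a closing message based on an exit code.
final_status_message() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "Process ended successfully."
    else
        # Errors go to stderr so they stand out in logs.
        echo "Process ended with an error (exit code $status)." >&2
    fi
}

# Report the status whenever the script exits, whatever the reason.
trap 'final_status_message "$?"' EXIT
```

Placing the trap near the top of a main script covers both the normal end of the script and an early exit caused by a failing command.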

Add data file with honored contributor

In order to display the name of an honored contributor at the bottom of the https://contributors.jenkins.io/ page, data is required.

These are the proposed specifications for that new data extraction feature:

a GitHub action will run the extraction process on a regular basis (daily or weekly). It will
- generate a data file (CSV format) with always the same name and at the same location (consolidated_data)
- commit and push it to the GitHub repository
the process will pick, randomly, a GitHub user that submitted at least one PR during the previous month
the resulting data will show
- the time stamp of the extraction
- the month examined
- the GitHub handle of the contributor
- the URL of the GitHub user's page
- the number of PRs submitted in the last month
- the list of repositories where the PRs were submitted

The proposed format would be:

RUN_DATE, MONTH, GH_HANDLE, GH_HANDLE_URL, NBR_PR, REPOSITORIES
2024-05-15T13:02:24, 2024-04, olamy, https://github.com/olamy, 14, "jenkinsci/myproject jenkinsci/mysecondprj"

The file name will be "https://github.com/jmMeessen/jenkins-submitter-stats/tree/main/consolidated_data/honored_contributor.csv". A prototype file will be made available asap to allow concurrent work on the UI. Note that the org and repository will change as it will be moved to the JenkinsCi org.

Notes

The process will be built using bash and data files already extracted for the Jenkins Submitter Stats. Should some key features or information not be available using this strategy, a specific Go extractor or a new option to jenkins-stats will be written.
This specification proposal is for discussion and subject to change based on feasibility. The objective is to quickly deliver a proof of concept of the full feature.
It can and will be enhanced in later phases once the initial version is running.
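
The random pick described in the specification could be sketched as follows. Everything here is an assumption for illustration: the function name pick_random_submitter is hypothetical, and the input is assumed to be a header-less CSV file with one `user,pr_count` pair per line, which may differ from the real monthly files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one randomly chosen line among the users that
# submitted at least one PR. $1 is the input file, $2 an optional seed.
pick_random_submitter() {
    local csv_file=$1 seed=${2:-$(date +%s)}
    awk -F',' -v seed="$seed" '
        BEGIN { srand(seed) }
        # Reservoir sampling: the n-th eligible line replaces the current
        # pick with probability 1/n, so every eligible line is equally likely.
        $2 >= 1 { n++; if (rand() * n < 1) pick = $0 }
        END { if (n) print pick }
    ' "$csv_file"
}
```

The remaining columns of the proposed format (run date, month, URL, repositories) would still have to be added around the selected handle.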

Make `date` platform independent

The scripts rely on GNU date. The date application on macOS does not behave the same way, so gdate has to be installed there.

The scripts' date manipulations must work on both platforms.
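
One way to approach this, sketched under the assumption that gdate is present on macOS and that date is GNU date elsewhere, is to select the command once and use it everywhere. The variable DATE_CMD and the helper previous_month are hypothetical names.

```shell
#!/usr/bin/env bash
# Prefer gdate (GNU date from coreutils) when it is installed;
# otherwise assume that the plain date command is GNU date.
if command -v gdate >/dev/null 2>&1; then
    DATE_CMD=gdate
else
    DATE_CMD=date
fi

# Hypothetical helper: print the month preceding the given YYYY-MM month.
previous_month() {
    "$DATE_CMD" -d "$1-01 -1 month" +%Y-%m
}
```

This keeps the platform test in a single place, but it still requires GNU date to be available on every platform.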

Commenters for listed PRs may change

The number of comments may change run after run

comments can spill over month boundaries
comments can be added over time

Todo

Check the commenter's collection logic
create script to update commenters with a second run

UpdateCLI generated PR comment is not correct

When UpdateCLI generates a PR, the additional commit details refer to a demo program

 ... he 'Hello World!' tutorial

Integrate new top-submitter tool version (plot functionality)

Fix issue where endDate for compare was incorrect

Should be solved with version v1.2.8 of jenkins-top-submitters

Make the parameters used in the top calculation obvious

"The devil is in defaults"

Also update readme

Add bash file to produce the honored submitter datafile

Add documentation in readme about produced files

Review data storage policy (and format)

What feature do you want to see added?

This application, still in the exploration phase for what data to extract, has evolved in an iterative way. Some parts are less than optimal by choice. The data storage is one of them: the way data is stored has been kept simple. It is based on CSV or Markdown files stored in the Git repository itself.

The migration from a private (lab) environment to a public community organisation (jenkins-infra) has shown these shortcomings. In order for the daily "Honored Contributor" automated extraction to work, the main branch had to be left unprotected, which is not a recommended pattern and not part of the jenkins-infra policies.

The data storage policy should be reviewed, alternative strategies proposed, and implementation prioritized.

cc: @gounthar @lemeurherve

Upstream changes

No response

Something is wrong with the STATS_GET => empty users

It appears that empty users exist in the collected data. They appear at the top of the pivot table.
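
A quick way to locate such rows, sketched as an assumption: the function name list_empty_user_rows is hypothetical, the column holding the user has to be supplied by the caller, and the simple comma split does not handle quoted fields that contain commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line number and content of every row whose
# user column ($2, 1-based) is empty once spaces and quotes are stripped.
list_empty_user_rows() {
    local csv_file=$1 user_column=$2
    awk -F',' -v col="$user_column" '
        {
            value = $col
            gsub(/^[ \t"]+|[ \t"]+$/, "", value)
            if (value == "") print FNR ": " $0
        }
    ' "$csv_file"
}
```

Running it on a monthly data file would show whether the empty users come from the extraction itself or appear later, during consolidation.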

Honor contributor GitHub action now fails.

Running Homebrew as root is extremely dangerous and no longer supported.
As Homebrew does not drop privileges on installation you would be giving all 0.318 build scripts full access to your system.

I'll work on a PR to solve that.

Update graph arrows (top-submitters)

The top submitter file is computed from the overview CSV.

Move the repositories to a Jenkins organisation

The following repositories should be moved:

jmMeessen/jenkins-submitter-stats
jmMeessen/jenkins-stats
jmMeessen/jenkins-top-submitters
jmMeessen/homebrew-tap

A GitHub App is also needed to provide a token to update the homebrew-tap repository from GitHub Actions running in the jenkins-stats and jenkins-top-submitters repositories.
