kevinjalbert / git_statistics
A gem that allows you to get detailed statistics of a git repository.
Home Page: https://rubygems.org/gems/git_statistics
License: MIT License
Installed the latest git_statistics (v0.8.0):
gem install git_statistics
Fetching: rugged-0.23.3.gem (100%)
Building native extensions. This could take a while...
Successfully installed rugged-0.23.3
Fetching: language_sniffer-1.0.2.gem (100%)
Successfully installed language_sniffer-1.0.2
Fetching: git_statistics-0.8.0.gem (100%)
Successfully installed git_statistics-0.8.0
3 gems installed
When running from the command line I get the following error:
git_statistics-0.8.0/lib/git_statistics/collector.rb:17:in `each': Object not found - no matching loose object
It would be totally ballin' if you got some ASCII art graphs going on. :)
Just a thought!
While the move to mojombo/grit has been great (performance-wise and ease of collection), it currently has an issue dealing with commits which have GPG signing. The solution to this would be to move to libgit2/rugged.
In addition, grit is also not ready for Ruby 2.0.x. The switch over to rugged will allow us to also expand to Ruby 2.0.x.
Using abiczo/github-notifier, it seems that the -m flag doesn't really influence the end results.
There are multiple merges in the repo, and none seem to be correctly tracked in the commit JSON files.
The builds for Travis CI are failing now, even though nothing is explicitly breaking our specs. It seems like the culprit is the Pipe spec:
GitStatistics::Pipe#io
Failure/Error: it { pipe.io.should be_an IO }
Errno::ENOENT:
No such file or directory - time
# ./lib/git_statistics/pipe.rb:24:in `open'
# ./lib/git_statistics/pipe.rb:24:in `io'
# ./spec/pipe_spec.rb:56:in `block (3 levels) in <top (required)>'
The only thing I really noticed is that Travis CI has changed its gem --system version from 1.8.x to 2.0.x. I updated my system gem version to 2.0.x and I still can't reproduce the error.
I'm going to make another issue for this, so we can resolve this problem. It looks like it might be interpreting 'time' as a directory/file instead of a command, though I'm not sure why this would be the case now.
A major refactoring was needed. New features were added as well (e.g., #1).
These changes rendered the old specs useless. This issue is to bring back specs for the old and new functionality. In addition, if any refactoring can be done to improve the maintainability then it should also be done in accordance with the new specs.
In some situations it might be useful to know which branch a commit belongs to. This information can be further expressed in the summary results (e.g., which branch is most actively developed). This issue is about collecting the branch information and storing it within the commit JSON files.
As commits can reside in multiple branches at the same time (e.g., merged/rebased commits), this information will list the source branch (according to the git log --source command, which indicates how the commit can be reached).
It might be worth having some static analysis tool to ensure that the code base (and future contributions) share a similar style. For example, tailor will check style and report any violation from a set of rules.
Currently puts is used to log specific situations. A proper logging gem should be used instead, possibly one that writes to a file.
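As a minimal sketch of what replacing puts could look like, the following uses Logger from Ruby's standard library rather than a third-party gem. The build_logger helper, the progname, and the log messages are hypothetical and not part of git_statistics.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: builds a logger that writes to the given target,
# which can be a file path or any IO-like object.
def build_logger(target, level = Logger::INFO)
  logger = Logger.new(target)
  logger.level = level
  logger.progname = 'git_statistics'
  logger
end

# An in-memory buffer is used here so the example has no side effects;
# pass a path such as 'git_statistics.log' to write to a file instead.
buffer = StringIO.new
log = build_logger(buffer)

log.debug('Not written, because the level is INFO')
log.warn('Skipping commit with no diff information')
```

Unlike puts, a logger lets the severity threshold be changed in one place, so debug output can be switched on without touching the call sites.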
We want to identify the languages of the files which are involved in each commit. This will allow authors to be associated with the normal attributes, but now for specific languages (e.g., find the author who wrote the most Python).
We could simply use the file extensions as a language indicator, though there are likely to be many problems with this approach. We can use a language detection tool like github/linguist instead, as their approach is more sophisticated and accurate. The only problem is that they require the actual file (git blob?). This means we essentially have to iterate over every commit and revert the files to the state of the commit, then identify the language. The mentioned library can also detect vendor/generated files, so we can exclude those from our calculations.
Right now the languages are unsorted (they appear in order of appearance). Ideally they should be sorted either by the sort criterion (which would result in different orderings for each author) or alphabetically (consistent ordering for every author).
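The two orderings can be sketched as follows. The shape of the languages hash (language name mapped to a hash of counts) and both helper names are assumptions made for illustration, not the gem's actual data structure.

```ruby
# Hypothetical per-author language data: language name => statistics.
languages = {
  'Ruby'   => { additions: 120, deletions: 30 },
  'C'      => { additions: 400, deletions: 10 },
  'Python' => { additions: 15,  deletions: 5 }
}

# Alphabetical: the same ordering for every author.
def sort_alphabetically(languages)
  languages.sort_by { |name, _stats| name.downcase }.to_h
end

# By criterion, descending: the ordering can differ for each author.
def sort_by_criterion(languages, criterion)
  languages.sort_by { |_name, stats| -stats.fetch(criterion) }.to_h
end

alphabetical = sort_alphabetically(languages)
by_additions = sort_by_criterion(languages, :additions)

puts alphabetical.keys.join(', ')
puts by_additions.keys.join(', ')
```

Sorting alphabetically makes it easier to compare authors side by side, while sorting by the criterion highlights each author's dominant language first.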
On large projects (e.g., torvalds/linux) there are so many commits that git_statistics will run out of memory. This is because all the commit data is held in memory.
To avoid this problem the commits should be processed such that we only keep x number of commits in memory at a time. We can do this by only keeping cumulative data of the commits we process (i.e., read a commit and then toss it), though this will not allow us to save any of the data. Alternatively, we can append commits to a file in batches. This way only x number of commits are in memory, yet we still have a complete collection of commits in a JSON file for loading/updating.
In the second approach, during the results phase (when cumulative data is gathered) we would read x number of commits into memory and collect the data, then read the next x and so forth.
Overall this approach should reduce the memory consumption, though at a small I/O cost.
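A minimal sketch of the batched approach follows, assuming one JSON object per line rather than a single JSON document, so that the file can be appended to and read back incrementally. The method names, the commit fields (author, additions), and the file name are hypothetical.

```ruby
require 'json'
require 'tmpdir'

# Append commits to the file in batches of batch_size, one JSON object per
# line. `commits` can be any Enumerable, including an Enumerator that
# produces commits on demand, so at most batch_size commits are held here.
def write_in_batches(commits, path, batch_size)
  File.open(path, 'a') do |file|
    commits.each_slice(batch_size) do |batch|
      file.write(batch.map { |commit| JSON.generate(commit) }.join("\n") + "\n")
    end
  end
end

# Results phase: read batch_size lines at a time and keep only the
# cumulative totals (additions per author in this sketch).
def accumulate_totals(path, batch_size)
  totals = Hash.new(0)
  File.foreach(path).each_slice(batch_size) do |lines|
    lines.each do |line|
      commit = JSON.parse(line)
      totals[commit['author']] += commit['additions']
    end
  end
  totals
end

# Stand-in for a real commit walker: yields ten fake commits one at a time.
commits = Enumerator.new do |yielder|
  1.upto(10) do |i|
    yielder << { 'author' => i.even? ? 'alice' : 'bob', 'additions' => i }
  end
end

path = File.join(Dir.mktmpdir, 'commits.jsonl')
write_in_batches(commits, path, 3)
totals = accumulate_totals(path, 3)
```

The line-per-commit layout is a design choice of this sketch: a single JSON array would have to be parsed as a whole, which would bring back the memory problem during loading.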
If the repository to be analyzed is in a detached HEAD state (e.g., switched to a specific tag or commit), the following will occur:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `git --no-pager log (no branch) general-cleanup issue_5 master --date=iso --reverse --no-color --find-copies-harder --numstat --encoding=utf-8 --summary --format="%H,%an,%ae,%ad,%p"'
/Users/jalbert/.rvm/gems/ruby-1.9.3-p362/gems/git_statistics-0.5.1/lib/git_statistics/formatters/console.rb:17:in `prepare_result_summary': Parameter for --sort is not valid (RuntimeError)
from /Users/jalbert/.rvm/gems/ruby-1.9.3-p362/gems/git_statistics-0.5.1/lib/git_statistics/formatters/console.rb:35:in `print_summary'
from /Users/jalbert/.rvm/gems/ruby-1.9.3-p362/gems/git_statistics-0.5.1/lib/git_statistics.rb:47:in `execute'
from /Users/jalbert/.rvm/gems/ruby-1.9.3-p362/gems/git_statistics-0.5.1/bin/git_statistics:5:in `<top (required)>'
from /Users/jalbert/.rvm/gems/ruby-1.9.3-p362/bin/git_statistics:23:in `load'
from /Users/jalbert/.rvm/gems/ruby-1.9.3-p362/bin/git_statistics:23:in `<main>'
As we can see, the (no branch) entry from the git branch command is being used and is causing the problem. We should probably just ignore the (no branch) entry and continue to grab git statistics from the other valid branches.
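One way to ignore that entry is to filter the git branch output before building the git log command. The following is a sketch under the assumption that any entry starting with an opening parenthesis is a placeholder rather than a real branch; the valid_branches helper is hypothetical and the sample output is modelled on the error above.

```ruby
# Sample `git branch` output in a detached HEAD state. The branch names are
# taken from the error message above.
sample_output = <<~OUTPUT
  * (no branch)
    general-cleanup
    issue_5
    master
OUTPUT

# Hypothetical helper: strips the current-branch marker and indentation,
# then drops empty lines and parenthesised placeholder entries.
def valid_branches(branch_output)
  branch_output.each_line
               .map { |line| line.sub(/\A\*?\s+/, '').strip }
               .reject { |name| name.empty? || name.start_with?('(') }
end

branches = valid_branches(sample_output)
puts branches.join(' ')
```

Matching on the leading parenthesis instead of the exact text (no branch) keeps the filter working if Git words the placeholder differently.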