repostat's People

Contributors

cswimbound, evanpurkhiser, hmm34

repostat's Issues

Rename limit on big projects

$ time python gitty.py ../../collected-repos/git-repos/linux.git
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 434 and retry the command.
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 1414 and retry the command.
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 581 and retry the command.
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 559 and retry the command.

real    49m27.307s
user    13m5.675s
sys     35m11.321s

Add support for merge commits

Right now, we don't support data analysis of certain commits. Or at least, we try to analyze them, and they return incorrect information that could skew our results. This appears to happen only on merge commits.

For example - feeding Spoon-Knife to repostat gives us the following information for commit bdd3996:
FILES: 0
HUNKS: 0
LINES: 0
However, this commit changed 1 file, 3 hunks, and 26+2 lines.

More examples can be found by running repostat on faker, which has more merge commits, where 57e677f provides the output:
FILES: 0
HUNKS: 0
LINES: 0
However, this commit changed 8 files, 7 hunks, and 830+1 lines.

Note that this issue does not affect all merge commits. See commits e77b223 and b5dad5c: repostat shows files, lines, and hunks > 0 for these merge commits. However, the file and hunk counts are off. The expected file counts are 2 and 3, respectively, yet repostat outputs 14 and 15.

Configurable data fields

Make each stored data field a flag, so that fields can be combined with bitwise OR operations for greater configurability. For example, setFields( AUTHOR | COMMITTER | MSG ); would only read and store the author, committer, and commit message from the repository.
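A minimal sketch of how such flags could look, assuming one bit per field. Only AUTHOR, COMMITTER, MSG, and setFields come from the proposal above; the Collector class, the wants() helper, and the FILES/HUNKS/LINES flags are hypothetical names used for illustration, not existing repostat code.

```cpp
#include <cstdint>

// Hypothetical field flags: each value occupies its own bit,
// so any combination can be expressed with bitwise OR.
enum Field : std::uint32_t {
    AUTHOR    = 1u << 0,
    COMMITTER = 1u << 1,
    MSG       = 1u << 2,
    FILES     = 1u << 3,  // assumed extra field, not named in the issue
    HUNKS     = 1u << 4,  // assumed extra field, not named in the issue
    LINES     = 1u << 5,  // assumed extra field, not named in the issue
};

// Hypothetical holder for the selected fields.
class Collector {
public:
    // Replace the current selection with the given OR-ed mask.
    void setFields(std::uint32_t mask) { fields_ = mask; }

    // True if the given field was selected via setFields().
    bool wants(Field f) const { return (fields_ & f) != 0; }

private:
    std::uint32_t fields_ = 0;  // nothing selected by default
};
```

With this shape, the traversal could check something like `collector.wants(LINES)` before doing the comparatively expensive diff work, and skip it when only commit metadata was requested.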

Blue skying features

  • Improve the CLI to be more robust in terms of what to collect and how to output the data
  • Add threading to cut down on processing time
  • Provide timing metrics to see if the tool scales
  • D3 Reports

Output statistics into an interactive report

Output the information gathered from the repostat traversal as an interactive HTML report that renders charts (d3 is an option) and supports drilling down into the data. It should be generic enough that it can be customized to present statistics at different granularities as more information becomes available.

Some suggestions for useful charts:

  • lines/files/hunks/commits (LFHC) over time
  • LFHC within a branch
  • correlations between LFHC within a commit

LICENCE?

Does anyone have a preferred licence? I'm a fan of the MIT Licence since it's straightforward and easy to understand.

Segfault when using the diff struct

I'm stuck trying to use the git_diff struct that's returned when we call git_diff_tree_to_tree. Maybe someone better at C++ can figure it out?

I was testing by calling the git_diff_print function like so:

git_diff_print(diff, GIT_DIFF_FORMAT_PATCH, NULL, NULL);

Once this works we can convert it to a patch (which is already in there and seems to work ok even though printing segfaults).

Then we can get the following:
