Defect Prediction
Scientific Computation Software
EMBLEM Paper repository:
The open source code for this work is maintained here: https://github.com/sillywalk/defect-prediction/tree/dev
License: GNU General Public License v3.0
Traditional approach:
1/ release-based: every row is a file in the project; all bugs are counted once per release, and all the independent metrics are collected for every file within the project.
2/ Learning from all the available examples in the past[:i_release]
=> build a defect prediction model by training on all the files from past releases.
Modern approach:
1/ just-in-time based: every row is a file updated by a commit within a release, and all the independent metrics are collected only for those updated files.
2/ incremental learning: learn from release i, predict release i + 1.
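The "learn from release i, predict release i + 1" loop can be sketched as below. This is a hypothetical illustration, not the paper's implementation: `releases` is an assumed list of per-release `(X, y)` feature/label pairs, and the learner here is a plain scikit-learn logistic regression stand-in.

```python
# Hypothetical sketch of release-wise incremental evaluation:
# train on release i, predict defects in release i + 1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

def incremental_eval(releases):
    """releases: list of (X, y) pairs, one per release, in time order."""
    results = []
    for i in range(len(releases) - 1):
        X_train, y_train = releases[i]
        X_test, y_test = releases[i + 1]
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append({
            "train": i + 1,
            "test": i + 2,
            "prec": precision_score(y_test, pred, zero_division=0),
            "pd": recall_score(y_test, pred, zero_division=0),  # probability of detection
            "f1": f1_score(y_test, pred, zero_division=0),
        })
    return results
```

Each entry of `results` corresponds to one Train/Test row in the tables below.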
*** Notes: say abinit_core.F90 is updated 3 times; the dataset for that release will contain 3 defective_abinit_core.F90 rows and 3 clean_abinit_core.F90 rows. If the file abinit_miscellaneous.F90 is updated in release 1 but not in release 2, the dataset for release 2 will contain no information on abinit_miscellaneous.F90.
Intuitions:
1. Not all files are updated from release to release (only the files needed for bug fixes and new functionality).
2. With incremental learning, the previous release is enough to understand and predict whether a future file change is a bug fix or not.
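The notes above can be made concrete with a toy dataset builder. This is a hypothetical sketch (the commit records and their fields are invented for illustration): each commit contributes one row per updated file, so a file untouched in a release simply never appears in that release's data.

```python
# Hypothetical sketch of just-in-time dataset construction: one row per
# (commit, updated file); files not touched in a release never appear.
from collections import defaultdict

def build_jit_dataset(commits):
    """commits: dicts like {"release": 1, "file": "abinit_core.F90", "buggy": True}."""
    per_release = defaultdict(list)
    for c in commits:
        per_release[c["release"]].append((c["file"], c["buggy"]))
    return dict(per_release)

commits = [
    {"release": 1, "file": "abinit_core.F90", "buggy": True},
    {"release": 1, "file": "abinit_core.F90", "buggy": False},
    {"release": 1, "file": "abinit_miscellaneous.F90", "buggy": True},
    {"release": 2, "file": "abinit_core.F90", "buggy": True},
]
ds = build_jit_dataset(commits)
# abinit_miscellaneous.F90 appears in release 1 but is absent from release 2.
```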
Train | Test | Prec | Pd | Pf | F1 | IFA | PCI20
---|---|---|---|---|---|---|---
1 | 2 | 67 | 66 | 52 | 67 | 0 | 20
2 | 3 | 59 | 63 | 64 | 61 | 2 | 21
3 | 4 | 59 | 37 | 37 | 45 | 0 | 19
4 | 5 | 81 | 62 | 58 | 71 | 1 | 16
5 | 6 | 53 | 62 | 60 | 57 | 1 | 21
6 | 7 | 86 | 48 | 31 | 62 | 0 | 17
7 | 8 | 82 | 45 | 40 | 58 | 4 | 16
8 | 9 | 53 | 57 | 58 | 54 | 0 | 23
9 | 10 | 65 | 57 | 49 | 61 | 0 | 20
Train | Test | Prec | Pd | Pf | F1 | IFA | PCI20
---|---|---|---|---|---|---|---
1 | 2 | 52 | 53 | 49 | 52 | 0 | 18
2 | 3 | 50 | 62 | 60 | 56 | 0 | 16
3 | 4 | 52 | 51 | 52 | 52 | 3 | 16
4 | 5 | 50 | 44 | 45 | 47 | 0 | 16
5 | 6 | 51 | 40 | 41 | 45 | 0 | 15
6 | 7 | 54 | 43 | 40 | 48 | 3 | 17
7 | 8 | 52 | 48 | 46 | 50 | 0 | 15
8 | 9 | 54 | 46 | 46 | 50 | 1 | 16
9 | 10 | 51 | 50 | 49 | 51 | 1 | 16
Train | Test | Prec | Pd | Pf | F1 | IFA | PCI20
---|---|---|---|---|---|---|---
1 | 2 | 55 | 46 | 50 | 51 | 0 | 21
2 | 3 | 52 | 47 | 45 | 50 | 13 | 23
3 | 4 | 46 | 56 | 68 | 50 | 0 | 21
4 | 5 | 54 | 56 | 56 | 55 | 3 | 22
5 | 6 | 57 | 59 | 54 | 58 | 1 | 21
6 | 7 | 52 | 52 | 51 | 52 | 0 | 24
7 | 8 | 48 | 46 | 47 | 47 | 5 | 23
8 | 9 | 47 | 44 | 58 | 46 | 0 | 20
9 | 10 | 62 | 54 | 44 | 58 | 1 | 21
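For reference, the column metrics in the tables above can be computed from a binary confusion matrix. A minimal sketch (rounding to whole percentages as the tables do; IFA and PCI20 need the ranked prediction list and are omitted here):

```python
# Sketch of the evaluation metrics used in the tables: Prec, Pd (recall),
# Pf (probability of false alarm), and F1, from confusion-matrix counts.
def eval_metrics(tp, fp, tn, fn):
    prec = tp / (tp + fp) if tp + fp else 0.0
    pd = tp / (tp + fn) if tp + fn else 0.0   # probability of detection (recall)
    pf = fp / (fp + tn) if fp + tn else 0.0   # false alarms among actual negatives
    f1 = 2 * prec * pd / (prec + pd) if prec + pd else 0.0
    return {"prec": round(prec * 100), "pd": round(pd * 100),
            "pf": round(pf * 100), "f1": round(f1 * 100)}
```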
RQ1: Does the traditional method of keyword search over commits, with defect prediction at the release level, perform well?
RQ3: Is keyword searching over commits consistent with our mechanical turks? How about the human-in-the-loop AI bug-report reading method?
KEYWORDS:
Project | Precision | Recall | F1
---|---|---|---
abinit | 58.94% | 90.13% | 71.20%
libmesh | 43% | 92% | 59.12%
lammps | 13.38% | 89.62% | 23.28%
mdanalysis | 51.43% | 89.62% | 31.28%
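The keyword-search baseline amounts to flagging a commit as bug-fixing when its message contains a defect-related keyword. A minimal sketch, assuming a typical keyword list (the study's exact list may differ):

```python
# Hypothetical sketch of the keyword-search baseline: a commit is labeled
# bug-fixing if its message contains any defect-related keyword.
import re

# This keyword list is an assumption, not necessarily the one used in the study.
BUG_KEYWORDS = ("bug", "fix", "error", "fault", "defect", "patch")

def is_bug_fix(message):
    """Label a commit message by keyword match (prefix match on word boundaries)."""
    msg = message.lower()
    return any(re.search(r"\b" + kw, msg) for kw in BUG_KEYWORDS)
```

As the low precision for lammps suggests, such matching flags many non-defect commits (e.g. "fix typo in docs"), which is what FASTREAD's human-in-the-loop reading is meant to improve on.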
FASTREAD:
Project | Precision | Recall | F1
---|---|---|---
abinit | 72.63% | 87.83% | 79.51%
libmesh | 49.89% | 90.06% | 64.20%
lammps | 23.85% | 97.53% | 34.27%
mdanalysis | 41.44% | 94.43% | 57.60%
For both precision and F1, FASTREAD achieved better performance than keyword search alone; the human-in-the-loop AI bug-report reading method is more consistent with the results of our mechanical turks than keyword search is.
From RQ2 + RQ1, we know our method trained on FASTREAD-labeled data would perform well at predicting our ideal human-labeled defects.
We can then generalize/scale it up to predict the next release using its own labeling method.
25th, 50th, and 75th percentiles of the absolute difference between our method and Commit.Guru.
Commit.Guru does not use SMOTE, so class imbalance is a significant problem for it.
4 treatments in total, each repeated 15 times:
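SMOTE rebalances the training data by synthesizing new minority-class (defective) examples between existing ones. A pipeline would normally use a library implementation (e.g. imbalanced-learn's `SMOTE`); the hand-rolled sketch below just illustrates the idea and is not the study's code:

```python
# Minimal hand-rolled SMOTE-style oversampling sketch: synthesize minority-class
# points by interpolating between a minority sample and one of its k nearest
# minority neighbors. Real pipelines would use imbalanced-learn's SMOTE.
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """X_min: (n, d) array of minority-class samples; returns (n_new, d) synthetics."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from X_min[i] to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]          # skip the point itself
        j = rng.choice(neighbors)
        gap = rng.random()                          # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)
```

Without such rebalancing, a learner trained on mostly-clean commits tends toward predicting "clean" everywhere, which is the imbalance problem noted above for Commit.Guru.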
PF results summary:
https://docs.google.com/spreadsheets/d/1UNOxsWn_eDygba70HaNikUcZwWYsyeVeisOjPk6z2GY/edit?usp=sharing
Raw:
_all results: https://docs.google.com/spreadsheets/d/1iOpCQSixeIyofm1-GdizgWoueCc89NcCQefZC3f8UJY/edit?usp=sharing
_incremental results: https://docs.google.com/spreadsheets/d/1tmrfi3lbcgreN7WF2XhpUjD5OQFIiJEeLbLsdokGj6E/edit?usp=sharing
_reduce_1: https://docs.google.com/spreadsheets/d/10e0c7obf4RnI10gqOKhBiMbKINh0mL6aJqFEofAW09k/edit?usp=sharing
_reduce_2: https://docs.google.com/spreadsheets/d/1KVAwkxvZtwfgUenracYQJrR6DVTDxfFmiDLiL0cfrB4/edit?usp=sharing
RQ2: Does the traditional method of keyword search over commits, with defect prediction at the commit level, perform well?
RQ1: Does the traditional method of keyword search over commits, with defect prediction at the release level, perform well?
mdanalysis - 775 bugs found, with 792 estimated bugs, in 1274 reviewed commits (3303 total).
lammps - 573 bugs found, with 601 estimated bugs, in 858 reviewed commits (7324 total).
libmesh - 1399 bugs found, with 1495 estimated bugs, in 2221 reviewed commits (8679 total).
abinit - 676 bugs found, with 698 estimated bugs, in 1121 reviewed commits (5392 total).
The figures below indicate the numerical improvements (beyond the statistical-test results) from using F3T as a buggy-commit identification and prediction system instead of the standard Commit.Guru system. In these figures, the taller the vertical bar, the better F3T performs compared to another learning method. Let X be the F3T score and Y the score from another data mining method; then the height of each bar is the median X - Y seen across all tests in a project:
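The bar-height computation described above is just a median of paired score differences; a one-function sketch (score lists are illustrative, not the study's data):

```python
# Sketch of the bar heights: for each project, the bar is the median of
# (F3T score - other method's score) across all paired test results.
import statistics

def median_delta(f3t_scores, other_scores):
    """Paired scores from the same train/test splits; positive means F3T is better."""
    return statistics.median(x - y for x, y in zip(f3t_scores, other_scores))
```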
RQ4: Can we do better at defect prediction on the commit level with human-in-the-loop AI bug-report reading?