Comparative Analysis of Text Mining and Clustering Techniques for Assessing Functional Dependency between Manual Test Cases

Appendix

The supplementary appendix materials for the article "Performance comparison of Different Text Mining and Clustering Techniques for Functional Dependency" are provided below.

Plotting UMAP results

Figures 1 and 2 illustrate the results of using 7 different string distance algorithms for text mining; in Figure 1, the Agglomerative algorithm is used for clustering. A total of 5 clusters were obtained by the Agglomerative clustering algorithm, as shown in Figures 1a, 1b, 1c, and 1d. The results of using several normalized compression distance algorithms for text mining together with the DBSCAN and HDBSCAN clustering algorithms are presented in Figures 3 and 4. As emphasized before, the HDBSCAN algorithm can place the non-clusterable data points in a separate cluster, which can be interpreted as independent test cases in this study. In general, the HDBSCAN algorithm produces more clusters than all the other clustering algorithms used. As Figures 3a, 3b, 3c, 4a, and 4b show, more than 200 clusters are generated, where each color represents a unique cluster. However, combining the same text mining method with DBSCAN places all test cases inside one cluster, as shown in Figure 3d. The visualization results of employing two machine learning approaches are shown in Figure 5, where Figure 5a represents the combination of Doc2Vec with Agglomerative and Figure 5b the combination of SBERT with Affinity.
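The interpretation of non-clusterable points as independent test cases can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes the clustering step returns one integer label per test case and that noise points carry the label `-1`, which is the convention used by scikit-learn's DBSCAN and HDBSCAN. The helper name `split_by_cluster` and the test case ids are hypothetical.

```python
from collections import defaultdict

# Assumption: noise (non-clusterable) points are labelled -1,
# as scikit-learn's DBSCAN and HDBSCAN implementations do.
NOISE_LABEL = -1


def split_by_cluster(test_case_ids, labels):
    """Group test case ids by cluster label and collect noise points separately."""
    if len(test_case_ids) != len(labels):
        raise ValueError("one label is required per test case")
    clusters = defaultdict(list)
    independent = []
    for case_id, label in zip(test_case_ids, labels):
        if label == NOISE_LABEL:
            # Not assigned to any cluster: treated as an independent test case.
            independent.append(case_id)
        else:
            clusters[label].append(case_id)
    return dict(clusters), independent


# Hypothetical ids and labels, for illustration only.
ids = ["TC-01", "TC-02", "TC-03", "TC-04", "TC-05"]
labels = [0, 0, -1, 1, -1]
clusters, independent = split_by_cluster(ids, labels)
print(clusters)
print(independent)
```

Test cases that share a label are candidates for functional dependency, while those in the `independent` list are not grouped with any other test case.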

(a) Overlap coefficient with Agglomerative. (b) Ratcliff-Obershelp with Agglomerative.
(c) Jaro with Agglomerative. (d) Levenshtein with Agglomerative.
Figure 1 - String distance algorithms are employed for text mining.

(a) Jaccard with Affinity. (b) Sorensen–Dice coefficient with Affinity.
(c) $q$-gram with DBSCAN.
Figure 2 - String distance algorithms are employed for text mining.

(a) bzip with HDBSCAN. (b) Deflate with HDBSCAN.
(c) gzip with HDBSCAN. (d) XZ with DBSCAN.
Figure 3 - Normalized compression distance algorithms are employed for text mining.

(a) Zlib with HDBSCAN. (b) Zstd with HDBSCAN.
Figure 4 - Normalized compression distance algorithms are employed for text mining.

(a) Doc2Vec with Agglomerative. (b) SBERT with Affinity.
Figure 5 - Machine learning algorithms are employed for text mining.
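For readers unfamiliar with the normalized compression distance (NCD) behind Figures 3 and 4, the following sketch shows how a pairwise distance matrix could be built from test case texts. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The compressors come from the Python standard library (`zlib`, `bz2`, `lzma`); the implementation used in the article is not stated here, and the example texts are hypothetical.

```python
import bz2
import lzma
import zlib

# Standard-library compressors standing in for the zlib, bzip and XZ variants.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bzip": bz2.compress,
    "xz": lzma.compress,
}


def ncd(x, y, compress=zlib.compress):
    """Normalized compression distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = len(compress(bx)), len(compress(by))
    cxy = len(compress(bx + by))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(texts, compress=zlib.compress):
    """Pairwise NCD matrix; only i < j is computed and mirrored, diagonal is 0."""
    n = len(texts)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(texts[i], texts[j], compress)
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical test case descriptions, for illustration only.
texts = [
    "Verify that the brake signal is activated when the driver presses the pedal",
    "Verify that the brake signal is deactivated when the driver releases the pedal",
    "Open the diagnostics menu and confirm firmware version 4.2 is displayed",
]
for name, compress in COMPRESSORS.items():
    print(name, distance_matrix(texts, compress))
```

Mirroring the upper triangle is a simplification: NCD is not exactly symmetric, because compressing `xy` and `yx` can give slightly different lengths. A matrix of this shape can be passed to any clustering algorithm that accepts precomputed distances.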

The results of sensitivity analysis using the Mantel model

The matrix of Mantel correlations between the employed text mining algorithms is presented in Figure 6.

Figure 6 - Matrix of Mantel correlations between the distances produced by the employed text mining algorithms (both tokenized and non-tokenized versions) for all pairs of 784 source points. The rows and columns of the matrix represent each of the 28 text mining algorithms. The color of each cell corresponds to the magnitude of the Mantel $r_M$ correlation between the distances of the two algorithms indicated by the intersection of the row and column.
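As a rough illustration of how one cell of such a matrix could be obtained, the sketch below computes a Mantel statistic between two distance matrices: the Pearson correlation of their upper-triangle entries, with a one-sided permutation test that shuffles the rows and columns of one matrix together. It is a pure-Python sketch under those assumptions, not the procedure used in the article; the function names, the number of permutations, and the toy matrices are all hypothetical.

```python
import random
from statistics import mean


def upper_triangle(matrix, order=None):
    """Flatten the entries above the diagonal, optionally after reordering points."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mantel(a, b, permutations=999, seed=0):
    """Return (r_M, p_value) for two symmetric distance matrices of equal size."""
    rng = random.Random(seed)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    n = len(a)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of `a` simultaneously
        if pearson(upper_triangle(a, order), flat_b) >= observed:
            hits += 1
    # One-sided p-value; the observed arrangement counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Toy symmetric distance matrices, for illustration only.
a = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
b = [[2 * v for v in row] for row in a]
r_m, p_value = mantel(a, b)
print(r_m, p_value)
```

Repeating this for every pair of algorithms would fill a symmetric matrix like the one in Figure 6, with one row and column per algorithm.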

Contributors

leohatvani, sahartahvili
