GeSumGenEval

Automatic Summary Generation and Evaluation for German Language

Datasets

The models were trained on the following datasets:

baselines.ipynb contains code to load and preprocess the datasets (MLSUM and GeWiki), generate baseline summaries (Random, Lead, TextRank), evaluate them with several metrics (ROUGE, BLEU, METEOR, BERT-Score, Mover-Score, BLANC, JS divergence, SUPERT), and save the results to the file system.
bertsum.ipynb contains code to preprocess data for the BertSum model, train the model, predict both Oracle and BertSum summaries for the test set, and write them to the file system.
matchsum.ipynb contains code to preprocess data for the MatchSum model, train the model, predict summaries for the test set, and write them to the file system.
quality_estimation.ipynb contains code to train our Quality Estimation models, report their accuracy on the test set, and save all trained models to the file system.
data_analysis.ipynb contains code to predict scores for 60 selected summaries from the MLSUM dataset using our trained Quality Estimation models, and to statistically analyse the previously saved evaluation results.
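
The Random and Lead baselines named above can be illustrated with a minimal sketch. This is not the code from baselines.ipynb: the function names, the regex-based sentence splitter, and the choice to keep randomly picked sentences in document order are all assumptions made for illustration.

```python
import random
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_n(text, n=3):
    """Lead-n baseline: the first n sentences of the document."""
    return " ".join(split_sentences(text)[:n])


def random_n(text, n=3, seed=0):
    """Random-n baseline: n sentences sampled without replacement.

    Keeping the sampled sentences in document order is an assumption,
    as is the fixed seed used to make the output reproducible.
    """
    sentences = split_sentences(text)
    rng = random.Random(seed)
    k = min(n, len(sentences))
    chosen = sorted(rng.sample(range(len(sentences)), k))
    return " ".join(sentences[i] for i in chosen)


article = "Erster Satz. Zweiter Satz! Dritter Satz? Vierter Satz."
print(lead_n(article))
print(random_n(article))
```

A real pipeline would likely replace the regex splitter with a German-aware sentence tokenizer, since abbreviations and dates containing periods break this simple rule.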

Summary	ROUGE-1	ROUGE-2	ROUGE-L	BLEU	METEOR	BERT-Score	Mover-Score	BLANC	JS
Random-3	0.143727	0.052665	0.127002	0.034043	0.10386	0.565969	0.51261	0.070865	0.359637
Lead-3	0.366559	0.276914	0.330138	0.173749	0.238569	0.668897	0.572056	0.069841	0.358985
TextRank	0.201283	0.084224	0.168468	0.048354	0.114291	0.579553	0.521592	0.05805	0.387468
BertSum	0.38795	0.286524	0.34782	0.174862	0.246264	0.673198	0.563852	0.07273	0.349151
MatchSum	0.419047	0.33233	0.389656	0.326003	0.241537	0.690606	0.607968	0.03716	0.430846
Oracle	0.552275	0.434004	0.513012	0.379408	0.320451	0.760874	0.676651	0.043763	0.417275
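
For readers unfamiliar with the ROUGE-1 column, the sketch below shows the idea behind it: unigram overlap between a candidate summary and a reference. It is a simplified illustration, not the implementation used to produce the numbers above. Whitespace tokenization, lowercasing, and reporting the F1 variant are assumptions, and the function name rouge1_f1 is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two texts (simplified ROUGE-1).

    Tokenization by whitespace and lowercasing are assumptions;
    real ROUGE toolkits typically add stemming and other options.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("der hund", "der hund schläft heute"))
```

In this example the candidate shares both of its tokens with the reference (precision 1.0) but covers only half of the reference tokens (recall 0.5), so the score falls between the two.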

Summary	Overall Quality	Coherence	Readability	Fluency	Informativeness
TextRank	3.37	3.25	3.68	3.51	2.67
BertSum	3.44	3.23	3.77	3.39	3.2
MatchSum	3.85	3.49	4.22	3.78	3.11
Expert	3.84	3.65	4.33	3.98	3.07

Summary	ROUGE-1	ROUGE-2	ROUGE-L	BLEU	METEOR	BERT-Score	Mover-Score	BLANC	JS
Random-3	0.186969	0.061311	0.148257	2.353784	0.118362	0.568394	0.513921	0.136810	0.338470
Lead-3	0.212807	0.076299	0.1672088	2.719979	0.127271	0.5871192	0.516711	0.133763	0.343669
TextRank	0.237437	0.086741	0.1761551	2.761368	0.140073	0.5919869	0.520563	0.133907	0.333639
BertSum	0.286785	0.124316	0.2222190	4.758008	0.152617	0.6245011	0.527420	0.136509	0.336441
MatchSum	0.252041	0.100066	0.2017617	3.991012	0.128695	0.6085708	0.524253	0.106179	0.378945
Oracle	0.383839	0.201093	0.3066306	10.48782	0.163482	0.6645835	0.543972	0.092569	0.398871

Coherence	Consistency	Fluency	Relevance
41.56%	80.94%	70.00%	54.37%

Coherence	Consistency	Fluency	Relevance
43.75%	41.56%	40.94%	53.44%

The project dependencies are listed in requirements.txt. This project borrows code and datasets from the following repositories: