sumto_financial_summarization's Introduction

SumTO @ FNS 2020

This repository contains the evaluation code for the SumTO summarization system proposed for the FNS 2020 Shared Task.

Evaluation script

  • Summarizer.py contains the code for the Summarizer Python object. It initializes the model and performs the summarization through the .summarize() function.
  • summarize.py contains the code to initialize the model and apply it to pre-parsed input data collections.
  • In summarize.py, DATA_DIR and TEST_DIR should be set according to your environment configuration.
  • In summarize.py, YourSystemID should be set to the name of your output folder (it will contain the summarized documents at the end of the summarization process).
  • components/Dataset.py contains the Dataset class used by the summarization algorithm to predict the summaries.
  • create_dataset.py contains the instructions to create and store the Dataset object (this version is intended explicitly for the test set).
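
Before running create_dataset.py and summarize.py, it can help to confirm that the configured folders exist. The following is a minimal sketch, not part of the repository: the function name check_layout and the example paths are hypothetical, and it assumes the test articles are stored as plain-text .txt files directly inside TEST_DIR.

```python
from pathlib import Path

# Hypothetical example values; the README names these settings
# but does not state their defaults.
DATA_DIR = Path("./Data")
TEST_DIR = DATA_DIR / "test_articles"


def check_layout(data_dir, test_dir):
    """Return a list of problems; an empty list means the layout looks usable."""
    data_dir, test_dir = Path(data_dir), Path(test_dir)
    problems = []
    if not data_dir.is_dir():
        problems.append(f"DATA_DIR does not exist: {data_dir}")
    if not test_dir.is_dir():
        problems.append(f"TEST_DIR does not exist: {test_dir}")
    elif not any(test_dir.glob("*.txt")):
        # Assumption: input documents are .txt files, not PDFs.
        problems.append(f"TEST_DIR contains no .txt files: {test_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_layout(DATA_DIR, TEST_DIR):
        print("WARNING:", problem)
```

If the script prints nothing, both folders exist and TEST_DIR holds at least one .txt file; it does not verify the content of those files.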

Pre-trained Financial model

Available at https://huggingface.co/morenolq/SumTO_FNS2020, or through the transformers Python library using the model ID morenolq/SumTO_FNS2020.

Citation (Coming Soon)

La Quatra, M., & Cagliero, L. (2020, December). End-to-end Training For Financial Report Summarization. In Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (pp. 118-123).

https://www.aclweb.org/anthology/2020.fnp-1.20.pdf

sumto_financial_summarization's Issues

Unable to generate results

I ran create_dataset.py on a PDF and got Datatest_parsed.bkp. When I run summarize.py, I get an error: ValueError: attempt to get argmax of an empty sequence

Complete Error:
2022-06-20 15:59:26,118 - root - INFO: Dataset - Parsing Test Data
0 / 10000
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/multiprocess/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/mayyankg/Desktop/newspulse/summarization/SumTO_financial_summarization/components/Dataset.py", line 118, in job_each_file_test
d["raw_text"] = open(self.test_dir + k, "r", encoding="utf-8").read()
FileNotFoundError: [Errno 2] No such file or directory: './Data/test_articles/Annual Report for the year ended 31 December 2021.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "create_dataset.py", line 28, in <module>
data.parse_test_data()
File "/home/mayyankg/Desktop/newspulse/summarization/SumTO_financial_summarization/components/Dataset.py", line 133, in parse_test_data
p.map(self.job_each_file_test, list_keys)
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/multiprocess/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/multiprocess/pool.py", line 657, in get
raise self._value
FileNotFoundError: [Errno 2] No such file or directory: './Data/test_articles/Annual Report for the year ended 31 December 2021.txt'
(finance_summ) mayyankg@D562:~/Desktop/newspulse/summarization/SumTO_financial_summarization$ python create_dataset.py
2022-06-20 16:00:35,075 - root - INFO: Dataset - Parsing Test Data
(finance_summ) mayyankg@D562:~/Desktop/newspulse/summarization/SumTO_financial_summarization$ python create_dataset.py
2022-06-20 16:01:37,503 - root - INFO: Dataset - Parsing Test Data
(finance_summ) mayyankg@D562:~/Desktop/newspulse/summarization/SumTO_financial_summarization$ python summarize.py
2022-06-20 16:01:56,973 - root - INFO: Summarizer - initializing summarizer
2022-06-20 16:01:56,973 - root - INFO: Summarizer - Loading model (auto)
Traceback (most recent call last):
File "summarize.py", line 25, in <module>
summy = Summarizer(test_set, "morenolq/SumTO_FNS2020")
File "/home/mayyankg/Desktop/newspulse/summarization/SumTO_financial_summarization/Summarizer.py", line 41, in __init__
free_gpu = int(self.get_freer_gpu())
File "/home/mayyankg/Desktop/newspulse/summarization/SumTO_financial_summarization/Summarizer.py", line 53, in get_freer_gpu
return np.argmax(memory_available)
File "<array_function internals>", line 6, in argmax
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1195, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out)
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/home/mayyankg/.conda/envs/finance_summ/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
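
The final ValueError is what np.argmax raises when it receives an empty sequence, which suggests that get_freer_gpu collected no free-memory readings, for example on a machine with no visible NVIDIA GPU. The following is a minimal sketch of a guard for that case, not the repository's implementation: the names query_free_memory, pick_freest_gpu, and choose_device are hypothetical, and the sketch assumes that free memory can be read with nvidia-smi. Whether the rest of Summarizer.py runs on CPU is not confirmed by the source.

```python
import shutil
import subprocess


def query_free_memory():
    """Return free memory (MiB) per visible NVIDIA GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # nvidia-smi is not on PATH, so no GPU can be inspected
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []  # the tool exists but could not report any GPU
    return [int(line) for line in out.splitlines() if line.strip().isdigit()]


def pick_freest_gpu(memory_available):
    """Index of the GPU with the most free memory, or None for an empty list."""
    if not memory_available:
        return None  # avoids the argmax-on-empty-sequence error
    return max(range(len(memory_available)), key=memory_available.__getitem__)


def choose_device():
    """Return a device string such as 'cuda:0', falling back to 'cpu'."""
    index = pick_freest_gpu(query_free_memory())
    return "cpu" if index is None else f"cuda:{index}"


if __name__ == "__main__":
    print("Selected device:", choose_device())
```

On a machine without a usable GPU this prints "Selected device: cpu" instead of raising an exception.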
