gregdurrett / berkeley-doc-summarizer Goto Github PK

The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploits syntactic information to compress it, and uses coreference constraints to ensure clarity.

License: GNU General Public License v3.0

Scala 65.34% Perl 21.20% Shell 0.43% Java 13.03%

berkeley-doc-summarizer's People

Contributors

Stargazers

Watchers

berkeley-doc-summarizer's Issues

Issue with formats used

Hi @gregdurrett

I am currently using the Entity Preprocessing Driver main method to turn my regular .txt files into the (Conll?) format understood by this summarizer however I am getting issues at the moment with the ConllReader class used in the Summarizer class unable to parse some of the generated lines (in the assembleConstTree method because some lines appear to be missing a "*")

Would you be able to shed more light on the Conll format that the summarizer is expecting?

Thanks,
Harry

Preparing the dataset

Your instructions mention:

To prepare the dataset, first you need to extract all the XML files from 2003-2007 and flatten them into a single directory

Is 2003-2007 referring to train_corefner_standoff or train_abstracts_standoff?

Within each of these directories, the files contained don't seem to have an XML format.

Not sure how to do the aforementioned step...

Is there a demo for checking your repository?

Kindly share a demo link of your project. I would like to test the summarization. Please let me know where I can get to see your project demo.
Thank you in advance.

get exceptions with running run-summarizer.sh

Hi,

I am trying to use your summarizer and refer your paper in my paper, but I got an exception as the following:

I am using mac os. I am trying to set java jni but I always got an error. There will be /usr/local/lib/jni in mac os wrote in the readme, but I can't find any folder with jni in my mac. Could you please tell me how to set jni with mac? I appreciate your help. Thank you.

Questions about NYT datasets processing

Thanks for your code sharing. Can you tell me what should I put into train_corefner_standoff and train_abstracts_standoff directory?

Alex

The joint model (COREF+NER+WIKI) of the Berkeley Entity Resolution System combines the output for all input documents (e.g. government.txt and music.txt) into a single file output.conll.
While the output produced by other models does not exactly match the test files in the Berkeley Document Summarizer (e.g. the last two columns of government.txt are off).
Would appreciate a clarification on the assumed data interface between the Berkeley Entity Resolution System and the Berkeley Document Summarizer.

gregdurrett / berkeley-doc-summarizer Goto Github PK

berkeley-doc-summarizer's People

Contributors

Stargazers

Watchers

Forkers

berkeley-doc-summarizer's Issues

Issue with formats used

Preparing the dataset

Is there a demo for checking your repository?

get exceptions with running run-summarizer.sh

Questions about NYT datasets processing

Alex

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent