gregdurrett / berkeley-doc-summarizer

The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploits syntactic information to compress it, and uses coreference constraints to ensure clarity.

License: GNU General Public License v3.0

Scala 65.34% Perl 21.20% Shell 0.43% Java 13.03%

berkeley-doc-summarizer's People

Contributors

gregdurrett

berkeley-doc-summarizer's Issues

Issue with formats used

Hi @gregdurrett

I am currently using the Entity Preprocessing Driver main method to turn my regular .txt files into the (CoNLL?) format understood by this summarizer. However, the ConllReader class used in the Summarizer class is unable to parse some of the generated lines. The failure occurs in the assembleConstTree method, because some lines appear to be missing a "*".

Would you be able to shed more light on the CoNLL format that the summarizer is expecting?

Thanks,
Harry
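
One way to narrow down a problem like this is to scan the generated file for rows whose parse field has no "*". The following is a minimal sketch, not part of the summarizer: it assumes the CoNLL-2011/2012 column layout, in which the sixth whitespace-separated field is the parse bit, and the function name `find_bad_parse_bits` and the `PARSE_COL` constant are hypothetical. If the preprocessing output uses a different layout, the column index needs adjusting.

```python
from typing import Iterable, List, Tuple

# Assumption: CoNLL-2011/2012 layout, where the parse bit is the
# sixth whitespace-separated field (0-based index 5).
PARSE_COL = 5


def find_bad_parse_bits(
    lines: Iterable[str], parse_col: int = PARSE_COL
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose parse field is absent or lacks '*'."""
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines separate sentences; '#' lines open or close a document.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) <= parse_col or "*" not in fields[parse_col]:
            bad.append((lineno, line))
    return bad
```

Calling `find_bad_parse_bits(open(path, encoding="utf-8"))` on a generated file (with `path` pointing at that file) lists the rows that a reader expecting a "*" in the parse field would likely reject.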

Preparing the dataset

Your instructions mention:

To prepare the dataset, first you need to extract all the XML files from 2003-2007 and flatten them into a single directory

Is 2003-2007 referring to train_corefner_standoff or train_abstracts_standoff?

The files within each of these directories don't seem to be in XML format.

I'm not sure how to carry out that step.
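
For the flattening part of the quoted instruction, a minimal sketch is shown here. It assumes the XML files have already been extracted into some nested directory tree; it does not settle which corpus directory the instruction refers to. The function name `flatten_xml` and both path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def flatten_xml(src_root: Path, dest_dir: Path) -> int:
    """Copy every .xml file found under src_root into dest_dir; return the count."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    # sorted() materializes the listing before any file is copied.
    for path in sorted(src_root.rglob("*.xml")):
        target = dest_dir / path.name
        if target.exists():
            # Two source files share a name; stop rather than overwrite one.
            raise FileExistsError(f"name collision: {target}")
        shutil.copy2(path, target)
        copied += 1
    return copied
```

The sketch copies rather than moves, so the extracted tree stays intact, and it raises on duplicate file names instead of silently overwriting. Keeping `dest_dir` outside `src_root` avoids picking up already-copied files on a second run.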

Exceptions when running run-summarizer.sh

Hi,

I am trying to use your summarizer and cite your paper in my own paper, but I got the following exception:
[screenshot, 2017-04-03 11:54:50 PM]
I am using macOS. I have been trying to set up Java JNI, but I always get an error. The README says there will be a /usr/local/lib/jni directory on macOS, but I can't find any folder named jni on my Mac. Could you please tell me how to set up JNI on macOS? I appreciate your help. Thank you.

Alex

The joint model (COREF+NER+WIKI) of the Berkeley Entity Resolution System combines the output for all input documents (e.g. government.txt and music.txt) into a single file output.conll.
The output produced by the other models, however, does not exactly match the test files in the Berkeley Document Summarizer (e.g. the last two columns of government.txt are off).
I would appreciate a clarification of the assumed data interface between the Berkeley Entity Resolution System and the Berkeley Document Summarizer.
