discopy's People

Contributors

rknaebel

discopy's Issues

Extend implicit relation classification

Based on the provided implicit sense classifier, add more features such as dependency rules and word pairs to improve results.
It might also be useful to select the most informative features from the full extracted set, as shown in Lin et al.
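A minimal, self-contained sketch of one such extension: word-pair features with a frequency cutoff and a PMI-style informativeness score. All names and the toy data below are illustrative, not discopy's API; Lin et al. additionally use production rules and dependency rules.

```python
# Sketch of word-pair feature extraction plus frequency/MI-based
# selection for an implicit sense classifier. Toy data only.
from collections import Counter
from math import log

def word_pair_features(arg1_tokens, arg2_tokens):
    """Cross-product word pairs between the two arguments."""
    return [f"{w1}|{w2}" for w1 in arg1_tokens for w2 in arg2_tokens]

def select_features(samples, min_count=2):
    """Score features by pointwise mutual information with the sense label,
    keeping only features seen at least min_count times."""
    feat_counts, joint, sense_counts = Counter(), Counter(), Counter()
    n = 0
    for arg1, arg2, sense in samples:
        for f in set(word_pair_features(arg1, arg2)):
            feat_counts[f] += 1
            joint[(f, sense)] += 1
        sense_counts[sense] += 1
        n += 1
    scores = {}
    for (f, s), c in joint.items():
        if feat_counts[f] < min_count:
            continue  # rare features are dropped outright
        scores[(f, s)] = log(c * n / (feat_counts[f] * sense_counts[s]))
    return scores

samples = [
    (["prices", "fell"], ["demand", "dropped"], "Contingency"),
    (["prices", "fell"], ["profit", "rose"], "Comparison"),
    (["prices", "fell"], ["demand", "dropped"], "Contingency"),
]
scores = select_features(samples)
```

In a real setup one would rank features by this score and keep the top k, then feed them into the existing classifier.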

train data json explanation

Hi! Thank you so much for this shallow discourse parser. I would like to adapt it to the Ukrainian language. It would be very helpful if you could give some step-by-step recommendations on how to create the JSON file, how to conduct the BERT training, and so on.
Thank you in advance!

Run on GPU

Hi Rene,

Can I run this on GPU? CUDA_VISIBLE_DEVICES is set to an empty string in the Dockerfile.

One more question: can I parse explicit connectives only?
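For reference, a minimal sketch of re-enabling GPU visibility; an empty CUDA_VISIBLE_DEVICES, as in the Dockerfile, hides all GPUs. The image name in the comment is a placeholder, and a working NVIDIA container runtime is assumed.

```python
# Make a GPU visible again by setting CUDA_VISIBLE_DEVICES *before*
# TensorFlow / PyTorch is imported. When running the Docker image, the
# equivalent override would be something like:
#   docker run --gpus all -e CUDA_VISIBLE_DEVICES=0 <image> ...
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

# any framework imported after this point sees GPU 0
```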

alternative for connective head mapper

This function comes from the CoNLL shared task, which provided a mapping from arbitrary connectives in the PDTB to their heads. This limits the application to PDTB samples and should be replaced with a more general approach.
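One possible generalization, sketched below: a small hand-written table plus a last-token fallback for unseen modifier+connective patterns. The table entries are examples, not the full PDTB mapping, and the fallback is a heuristic, not a guarantee.

```python
# Illustrative replacement for the CoNLL connective head mapper:
# normalize a raw connective string to its head connective.
HEAD_MAP = {
    "shortly after": "after",
    "even though": "though",
    "in addition": "in addition",   # multiword heads stay intact
    "five minutes before": "before",
}

def connective_head(raw):
    raw = raw.lower().strip()
    if raw in HEAD_MAP:
        return HEAD_MAP[raw]
    # heuristic fallback: assume the final token carries the head,
    # which covers patterns like "<temporal modifier> + after/before"
    return raw.split()[-1]
```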

Convert PDTB data into conll

Data Conversion and Split Inquiry

Hello @RKnaeble,

I'm reaching out to seek guidance on a couple of points related to the pdtb2 dataset:

  1. Conversion to CoNLL Format: Could you provide insights or steps on how to convert the pdtb2 data into the CoNLL format?

  2. Data Splits: I'm interested in understanding the train, test, and dev splits you've chosen for this dataset.

For context, the pdtb2 data structure is as follows:

  • Folders labeled from 0 to 23.
  • Each folder contains multiple files with the naming convention wsj_XXX.pdtb.
  • I also have the data in pdtb2.csv format.
  • Additionally, we have the pdtb3 dataset structured into gold and raw data categories.

I'd appreciate any guidance or references you can provide on these topics.

Thank you for your time and assistance.

regards
Muhammed

Format of input data

Hi!

I would like to run your parser, but from the instructions it is not really clear what the input format is supposed to be (besides the fact that it should be json). How should I transform the PDTB data to make it work with your parser?

Thank you!
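For what it is worth, discopy follows the CoNLL 2015/2016 shared-task input layout. Below is a hedged sketch of a single-document parses.json; the field names are reproduced from the shared task as best I recall and should be verified against its documentation before use.

```python
# Sketch of one document in the CoNLL-2015/16 `parses.json` layout:
# document id -> list of sentences, each with tokens (word plus
# character offsets and POS), a constituent tree, and dependencies.
import json

parses = {
    "wsj_0001": {
        "sentences": [
            {
                "words": [
                    ["Prices", {"CharacterOffsetBegin": 0,
                                "CharacterOffsetEnd": 6,
                                "PartOfSpeech": "NNS"}],
                    ["fell", {"CharacterOffsetBegin": 7,
                              "CharacterOffsetEnd": 11,
                              "PartOfSpeech": "VBD"}],
                ],
                "parsetree": "( (S (NP (NNS Prices)) (VP (VBD fell))))",
                "dependencies": [["nsubj", "fell-2", "Prices-1"]],
            }
        ]
    }
}
line = json.dumps(parses)
```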

Installation problem

Installation problem with the new version 1.1.0.
The dependency installation runs for a very long time and never completes.
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/base_command.py", line 180, in _main
    status = self.run(options, args)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/req_command.py", line 199, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/commands/install.py", line 319, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 128, in resolve
    requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 473, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 384, in resolve
    raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2000000
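ResolutionTooDeep means pip's backtracking resolver ran out of rounds exploring dependency combinations. One common workaround is to pre-pin the heavy dependencies in a constraints file so there is nothing left to backtrack over. The version pins below are placeholders; use the versions listed in discopy's own setup files.

```shell
# Pin the large dependencies up front so pip's resolver has no search
# space left (placeholder versions - take the real ones from setup.py).
cat > constraints.txt <<'EOF'
tensorflow==2.4.1
transformers==4.3.2
EOF
pip install -c constraints.txt discopy==1.1.0
# On pip 20.3-21.x, falling back to the old resolver is another option:
#   pip install --use-deprecated=legacy-resolver discopy==1.1.0
```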

prediction doesn't work

There is an error
/bin/bash: discopy-predict: command not found

Output of the command
!compgen -ac | grep discopy
is:
discopy-nn-predict
discopy-add-parses
discopy-eval
discopy-parse
discopy-train
discopy-extract
discopy-nn-parse
discopy-add-annotations
discopy-tokenize
discopy-nn-train

There is no discopy-predict

training data

Hi,
can you please share the training data? I need it for an educational project.

Convert PDTB2 Data to JSON Format for Discourse Parser Training

Issue Description:
Hello Rknaebel,

I am working on adapting a discourse parser to work with the Penn Discourse Treebank version 2 (PDTB2) dataset and require assistance in converting the PDTB data into the specific JSON format used by your discourse parser.

Specific Needs:

  1. Conversion of PDTB2 Data: I have the PDTB2 dataset in CSV format (pdtb2.csv) as well as the WSJ texts and gold files. I need to convert these into en.train, en.dev, en.test, parses.json, and relations.json files. Could you provide guidance on the conversion process? Moreover, which splits did you use for train, dev, and test?

  2. Format Specifications: What are the specific format requirements for each of these files? For example, what should be the structure and headings in the relations.json file?

  3. Example Code or Scripts: If you have any example scripts or code snippets that could aid in this conversion process, it would be greatly beneficial.
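Regarding the structure of relations.json: discopy follows the CoNLL-2015/16 shared-task layout, where the file contains one JSON object per line. The sketch below reproduces the field names as I recall them from the shared task; verify them against the task documentation before relying on them.

```python
# Sketch of one line of `relations.json` in the CoNLL-2015/16 layout.
import json

relation = {
    "DocID": "wsj_0001",
    "ID": 1,
    "Type": "Explicit",                      # or Implicit, AltLex, EntRel
    "Sense": ["Comparison.Contrast"],
    "Connective": {
        "RawText": "but",
        "CharacterSpanList": [[12, 15]],
        # TokenList rows: [charBegin, charEnd, tokenDocIdx, sentIdx, tokenSentIdx]
        "TokenList": [[12, 15, 3, 0, 3]],
    },
    "Arg1": {"RawText": "Prices fell",
             "CharacterSpanList": [[0, 11]],
             "TokenList": [[0, 6, 0, 0, 0], [7, 11, 1, 0, 1]]},
    "Arg2": {"RawText": "demand rose",
             "CharacterSpanList": [[16, 27]],
             "TokenList": [[16, 22, 4, 0, 4], [23, 27, 5, 0, 5]]},
}
line = json.dumps(relation)  # one such line per relation in the file
```

For implicit relations, the Connective fields would be empty and Type would be "Implicit".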

Attempted Solutions:

I have tried adapting the available resources, but I am still facing challenges in fitting them to the specific needs of the PDTB2 dataset.

Any Assistance Would Be Highly Appreciated:
Your expertise in this area would be immensely helpful for correctly formatting the PDTB2 data for use with the discourse parser.

Thank you for your time and consideration.

Best regards,

Error running

Hi,
I want to run this parser for connective extraction only; I do not want to train the model.
The following is the error I am getting. Am I doing something wrong? Any help would be appreciated.
[error screenshot attached in the original issue]

Unsatisfactory results on argument extraction

The current implementation of argument extraction uses the inner node with the highest class probability for its prediction. There may be an issue in the implementation, e.g. with the features or the prediction selection.
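For clarity, a toy illustration of the selection strategy in question: among the inner nodes of the constituent tree, pick the one the classifier scores highest. The nodes and scores below are made up, and node_prob stands in for the trained model.

```python
# Pick the inner tree node with the highest predicted probability of
# being an argument node (argmax selection, as described above).
def best_argument_node(nodes, node_prob):
    """Return the inner node whose classifier score is highest."""
    return max(nodes, key=node_prob)

nodes = ["S", "VP", "SBAR"]                  # hypothetical inner nodes
fake_scores = {"S": 0.20, "VP": 0.70, "SBAR": 0.50}
choice = best_argument_node(nodes, fake_scores.get)  # picks "VP"
```

One alternative worth testing would be scoring whole span candidates rather than single nodes, or merging the spans of the top-k nodes.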
