discopy's People

Contributors

rknaebel

discopy's Issues

Extend implicit relation classification

Based on the provided implicit sense classifier, add more features such as dependency rules and word pairs to improve results.
It might also be useful to select the most informative features from the full extracted set, as shown in Lin et al.
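A minimal, self-contained sketch of one such extension: word-pair features with a frequency cutoff and a PMI-style informativeness score. All names and the toy data below are illustrative, not discopy's API; Lin et al. additionally use production rules and dependency rules.

```python
# Sketch of word-pair feature extraction plus frequency/MI-based
# selection for an implicit sense classifier. Toy data only.
from collections import Counter
from math import log

def word_pair_features(arg1_tokens, arg2_tokens):
    """Cross-product word pairs between the two arguments."""
    return [f"{w1}|{w2}" for w1 in arg1_tokens for w2 in arg2_tokens]

def select_features(samples, min_count=2):
    """Score features by pointwise mutual information with the sense label,
    keeping only features seen at least min_count times."""
    feat_counts, joint, sense_counts = Counter(), Counter(), Counter()
    n = 0
    for arg1, arg2, sense in samples:
        for f in set(word_pair_features(arg1, arg2)):
            feat_counts[f] += 1
            joint[(f, sense)] += 1
        sense_counts[sense] += 1
        n += 1
    scores = {}
    for (f, s), c in joint.items():
        if feat_counts[f] < min_count:
            continue  # rare features are dropped outright
        scores[(f, s)] = log(c * n / (feat_counts[f] * sense_counts[s]))
    return scores

samples = [
    (["prices", "fell"], ["demand", "dropped"], "Contingency"),
    (["prices", "fell"], ["profit", "rose"], "Comparison"),
    (["prices", "fell"], ["demand", "dropped"], "Contingency"),
]
scores = select_features(samples)
```

In a real setup one would rank features by this score and keep the top k, then feed them into the existing classifier.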

train data json explanation

Hi! Thank you so much for this shallow discourse parser. I would like to adapt it to the Ukrainian language. It would be very helpful if you could give some step-by-step recommendations on how to create the JSON file, how to conduct the BERT training, and so on.
Thank you in advance!

Run on GPU

Hi Rene,

Can I run this on GPU? CUDA_VISIBLE_DEVICES is set to an empty string in the Dockerfile.

One more question: can I parse explicit connectives only?
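For reference, a minimal sketch of re-enabling GPU visibility; an empty CUDA_VISIBLE_DEVICES, as in the Dockerfile, hides all GPUs. The image name in the comment is a placeholder, and a working NVIDIA container runtime is assumed.

```python
# Make a GPU visible again by setting CUDA_VISIBLE_DEVICES *before*
# TensorFlow / PyTorch is imported. When running the Docker image, the
# equivalent override would be something like:
#   docker run --gpus all -e CUDA_VISIBLE_DEVICES=0 <image> ...
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

# any framework imported after this point sees GPU 0
```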

alternative for connective head mapper

This function comes from the CoNLL shared task, which provided a mapping from arbitrary connectives in the PDTB to their heads. This limits the application to PDTB samples and should be replaced with a more general approach.
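One possible generalization, sketched below: a small hand-written table plus a last-token fallback for unseen modifier+connective patterns. The table entries are examples, not the full PDTB mapping, and the fallback is a heuristic, not a guarantee.

```python
# Illustrative replacement for the CoNLL connective head mapper:
# normalize a raw connective string to its head connective.
HEAD_MAP = {
    "shortly after": "after",
    "even though": "though",
    "in addition": "in addition",   # multiword heads stay intact
    "five minutes before": "before",
}

def connective_head(raw):
    raw = raw.lower().strip()
    if raw in HEAD_MAP:
        return HEAD_MAP[raw]
    # heuristic fallback: assume the final token carries the head,
    # which covers patterns like "<temporal modifier> + after/before"
    return raw.split()[-1]
```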

Convert PDTB data into conll

Data Conversion and Split Inquiry

Hello @RKnaeble,

I'm reaching out to seek guidance on a couple of points related to the pdtb2 dataset:

  1. Conversion to CoNLL Format: Could you provide insights or steps on how to convert the pdtb2 data into the CoNLL format?

  2. Data Splits: I'm interested in understanding the train, test, and dev splits you've chosen for this dataset.

For context, the pdtb2 data structure is as follows:

  • Folders labeled from 0 to 23.
  • Each folder contains multiple files with the naming convention wsj_XXX.pdtb.
  • I also have the data in pdtb2.csv format.
  • Additionally, we have the pdtb3 dataset structured into gold and raw data categories.

I'd appreciate any guidance or references you can provide on these topics.

Thank you for your time and assistance.

regards
Muhammed

Format of input data

Hi!

I would like to run your parser, but from the instructions it is not really clear what the input format is supposed to be (besides the fact that it should be json). How should I transform the PDTB data to make it work with your parser?

Thank you!
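For what it is worth, discopy follows the CoNLL 2015/2016 shared-task input layout. Below is a hedged sketch of a single-document parses.json; the field names are reproduced from the shared task as best I recall and should be verified against its documentation before use.

```python
# Sketch of one document in the CoNLL-2015/16 `parses.json` layout:
# document id -> list of sentences, each with tokens (word plus
# character offsets and POS), a constituent tree, and dependencies.
import json

parses = {
    "wsj_0001": {
        "sentences": [
            {
                "words": [
                    ["Prices", {"CharacterOffsetBegin": 0,
                                "CharacterOffsetEnd": 6,
                                "PartOfSpeech": "NNS"}],
                    ["fell", {"CharacterOffsetBegin": 7,
                              "CharacterOffsetEnd": 11,
                              "PartOfSpeech": "VBD"}],
                ],
                "parsetree": "( (S (NP (NNS Prices)) (VP (VBD fell))))",
                "dependencies": [["nsubj", "fell-2", "Prices-1"]],
            }
        ]
    }
}
line = json.dumps(parses)
```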

Installation problem

Installation problem with the new version 1.1.0.
The dependency installation runs for a very long time and never completes.
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/base_command.py", line 180, in _main
    status = self.run(options, args)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/req_command.py", line 199, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/commands/install.py", line 319, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/usr/local/lib/python3.7/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 128, in resolve
    requirements, max_rounds=try_to_avoid_resolution_too_deep
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 473, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/usr/local/lib/python3.7/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 384, in resolve
    raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2000000
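ResolutionTooDeep means pip's backtracking resolver ran out of rounds exploring dependency combinations. One common workaround is to pre-pin the heavy dependencies in a constraints file so there is nothing left to backtrack over. The version pins below are placeholders; use the versions listed in discopy's own setup files.

```shell
# Pin the large dependencies up front so pip's resolver has no search
# space left (placeholder versions - take the real ones from setup.py).
cat > constraints.txt <<'EOF'
tensorflow==2.4.1
transformers==4.3.2
EOF
pip install -c constraints.txt discopy==1.1.0
# On pip 20.3-21.x, falling back to the old resolver is another option:
#   pip install --use-deprecated=legacy-resolver discopy==1.1.0
```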

prediction doesn't work

There is an error
/bin/bash: discopy-predict: command not found

Output of the command
!compgen -ac | grep discopy
is:
discopy-nn-predict
discopy-add-parses
discopy-eval
discopy-parse
discopy-train
discopy-extract
discopy-nn-parse
discopy-add-annotations
discopy-tokenize
discopy-nn-train

There is no discopy-predict

training data

Hi,
can you please share the training data? I need it for an educational project.

Convert PDTB2 Data to JSON Format for Discourse Parser Training

Issue Description:
Hello Rknaebel,

I am working on adapting a discourse parser to work with the Penn Discourse Treebank version 2 (PDTB2) dataset and require assistance in converting the PDTB data into the specific JSON format used by your discourse parser.

Specific Needs:

  1. Conversion of PDTB2 Data: I have the PDTB2 dataset in CSV format (pdtb2.csv) as well as the WSJ texts and gold files. I need to convert these into en.train, en.dev, en.test, parses.json, and relations.json files. Could you provide guidance on the conversion process? Moreover, which splits did you use for train, dev, and test?

  2. Format Specifications: What are the specific format requirements for each of these files? For example, what should be the structure and headings in the relations.json file?

  3. Example Code or Scripts: If you have any example scripts or code snippets that could aid in this conversion process, it would be greatly beneficial.
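Regarding the structure of relations.json: discopy follows the CoNLL-2015/16 shared-task layout, where the file contains one JSON object per line. The sketch below reproduces the field names as I recall them from the shared task; verify them against the task documentation before relying on them.

```python
# Sketch of one line of `relations.json` in the CoNLL-2015/16 layout.
import json

relation = {
    "DocID": "wsj_0001",
    "ID": 1,
    "Type": "Explicit",                      # or Implicit, AltLex, EntRel
    "Sense": ["Comparison.Contrast"],
    "Connective": {
        "RawText": "but",
        "CharacterSpanList": [[12, 15]],
        # TokenList rows: [charBegin, charEnd, tokenDocIdx, sentIdx, tokenSentIdx]
        "TokenList": [[12, 15, 3, 0, 3]],
    },
    "Arg1": {"RawText": "Prices fell",
             "CharacterSpanList": [[0, 11]],
             "TokenList": [[0, 6, 0, 0, 0], [7, 11, 1, 0, 1]]},
    "Arg2": {"RawText": "demand rose",
             "CharacterSpanList": [[16, 27]],
             "TokenList": [[16, 22, 4, 0, 4], [23, 27, 5, 0, 5]]},
}
line = json.dumps(relation)  # one such line per relation in the file
```

For implicit relations, the Connective fields would be empty and Type would be "Implicit".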

Attempted Solutions:

I have tried adapting the available resources, but I am still facing challenges in fitting them to the specific needs of the PDTB2 dataset.

Any Assistance Would Be Highly Appreciated:
Your expertise in this area would be immensely helpful for correctly formatting the PDTB2 data for use with the discourse parser.

Thank you for your time and consideration.

Best regards,

Error running

Hi,
I want to run this parser for connective extraction only; I do not want to train the model.
The following is the error I am getting. Am I doing something wrong? Any help would be appreciated.
[error screenshot attached in the original issue]

Unsatisfactory results on argument extraction

The current implementation of argument extraction uses the inner node with the highest class probability for its prediction. There may be an issue in the implementation, e.g. with the features or the prediction selection.
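For clarity, a toy illustration of the selection strategy in question: among the inner nodes of the constituent tree, pick the one the classifier scores highest. The nodes and scores below are made up, and node_prob stands in for the trained model.

```python
# Pick the inner tree node with the highest predicted probability of
# being an argument node (argmax selection, as described above).
def best_argument_node(nodes, node_prob):
    """Return the inner node whose classifier score is highest."""
    return max(nodes, key=node_prob)

nodes = ["S", "VP", "SBAR"]                  # hypothetical inner nodes
fake_scores = {"S": 0.20, "VP": 0.70, "SBAR": 0.50}
choice = best_argument_node(nodes, fake_scores.get)  # picks "VP"
```

One alternative worth testing would be scoring whole span candidates rather than single nodes, or merging the spans of the top-k nodes.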
