Hi, I want to run this parser for only connective extraction, I do not want to tra

Error running about discopy HOT 3 CLOSED

rknaebel commented on August 18, 2024

Error running

Comments (3)

rknaebel commented on August 18, 2024

Unfortunately, you cannot easily select individual components of a stored pipeline. However, you could edit the config.json file of the corresponding model so that it includes only the connective component (just remove the other entries that follow it).
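As a rough illustration of that edit, here is a minimal Python sketch. The layout of config.json is not shown in this thread, so everything about the structure here is an assumption: the key name `components`, the `name` field, and the idea that the connective component is the first entry in a list are all hypothetical. Inspect the real file before adapting this, and keep a backup.

```python
import json
from pathlib import Path


def keep_only_first_entry(config: dict, list_key: str) -> dict:
    """Return a copy of `config` whose list under `list_key` keeps only its first entry.

    ASSUMPTION: the components are stored as a list and the connective
    component comes first. The real discopy config.json may differ.
    """
    trimmed = dict(config)  # shallow copy, the input dict is left untouched
    trimmed[list_key] = list(config[list_key][:1])
    return trimmed


def trim_config_file(path: Path, list_key: str) -> None:
    """Rewrite the JSON file at `path` in trimmed form, saving a .bak copy first."""
    original_text = path.read_text(encoding="utf-8")
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original_text, encoding="utf-8")
    trimmed = keep_only_first_entry(json.loads(original_text), list_key)
    path.write_text(json.dumps(trimmed, indent=2), encoding="utf-8")


# Toy config with a HYPOTHETICAL layout, used only to show the effect.
example = {
    "model": "toy",
    "components": [{"name": "connective"}, {"name": "arguments"}],
}
trimmed_example = keep_only_first_entry(example, "components")
print(json.dumps(trimmed_example, indent=2))
```

If the real file uses a different key or nests the components differently, only the `list_key` argument or the slicing step should need to change.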

For data preparation, I recommend using discopy-data in combination with the parser as follows (the interface may have changed over the last few days):

traditional feature-based parser: discopy-tokenize -i examples/wsj_0336 | discopy-add-parses -c | discopy-parse lin models/lin
neural parser: discopy-tokenize --tokenize-only -i examples/wsj_0336 | discopy-nn-parse bert-base-cased models/pipeline-bert-2

You could also process your documents first, store them in a single file, and then parse that file with either parser pipeline:

discopy-tokenize -i examples/wsj_0336 | discopy-add-parses -c > dataset.json
discopy-tokenize -i examples/wsj_034X | discopy-add-parses -c >> dataset.json
discopy-nn-parse bert-base-cased models/pipeline-bert-2 -i dataset.json

Note: discopy-tokenize splits text files into multiple documents at every run of three or more consecutive newlines (r'\n\n\n+').
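To see what that splitting rule means for an input file, here is a small Python sketch that applies the same regular expression with the standard `re` module. It only illustrates the pattern quoted in the note and is not discopy-tokenize's actual code; the function name `split_documents` and the sample text are made up for the example.

```python
import re
from typing import List

# The pattern quoted in the note: three or more consecutive newlines.
DOC_BOUNDARY = re.compile(r"\n\n\n+")


def split_documents(text: str) -> List[str]:
    """Split raw text into documents at runs of three or more newlines."""
    parts = DOC_BOUNDARY.split(text)
    # Drop empty chunks that come from leading or trailing blank lines.
    return [part.strip() for part in parts if part.strip()]


sample = (
    "First document.\n\nStill the first document."
    "\n\n\n"
    "Second document."
    "\n\n\n\n\n"
    "Third document.\n"
)
docs = split_documents(sample)
for number, doc in enumerate(docs, start=1):
    print(number, repr(doc))
```

A single blank line (two newlines) stays inside a document, so paragraphs are preserved; only two or more blank lines in a row start a new document.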

anamm1025 commented on August 18, 2024

Thanks for your response. I was able to run the first two steps successfully. However, the repository contains neither a 'models' folder nor the config files. Could you please upload those too?

rknaebel commented on August 18, 2024

To avoid bloating the repository with the models' histories, I put them under releases:
https://github.com/rknaebel/discopy/releases
