
e2e-tbsa's People

Contributors

lixin4ever


e2e-tbsa's Issues

cannot find this dataset

No such file or directory: '/projdata9/info_fil/lixin/Research/OTE/embeddings/glove_840B_300d.txt'
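The path above is hard-coded to the author's machine. You need to download glove.840B.300d from the Stanford NLP site yourself and point the embedding path in the code at your local copy. For reference, a minimal loader sketch assuming the standard GloVe text format of one token followed by 300 floats per line (the function name and path here are illustrative, not from the repo):

import numpy as np

def load_glove(path, dim=300):
    """Return a dict mapping word -> np.ndarray of shape (dim,)."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            # glove.840B.300d contains a few multi-word tokens, so take the
            # last `dim` fields as the vector and rejoin the rest as the word.
            word = ' '.join(parts[:-dim])
            vectors[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return vectors

# glove = load_glove('./embeddings/glove_840B_300d.txt')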

Reported results vs. the original paper

Hi,

Your model returns two sets of results, one for OTE and the other for TS. However, in your paper (https://arxiv.org/pdf/1811.05082.pdf) you report a single set of results without specifying the task (e.g., for the laptop dataset, 61.27, 54.89, and 57.90 for precision, recall, and F1-score, respectively). My question is: are these the OTE or the TS results from your code?

Thank you

Prediction values

Hi,

Thank you for sharing your code with us. Could you please tell me what OTE and TS mean in the running results?

Exceed: test performance: ote: f1: 0.6512, ts: precision: 0.6156, recall: 0.4958, micro-f1: 0.5492

Does OTE mean the results of aspect target extraction, and TS the results of aspect sentiment classification?
So in the example above, did the model achieve an F1-score of 65.12% on the OTE task and a micro-F1 of 54.92% on aspect sentiment classification?

Thanks in advance
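As a quick arithmetic check on the log line above, the TS micro-F1 is the harmonic mean of the printed precision and recall:

# micro-F1 = 2PR / (P + R), using the precision and recall from the log.
p, r = 0.6156, 0.4958
print(round(2 * p * r / (p + r), 4))  # 0.5492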

A question about the output

I ran the program, and the output txt file contains four columns of results: ote_tag, ote_tag_gold, ts_tag, ts_tag_gold.
Could you briefly explain these outputs, and which one should be taken as the final result?
Thanks!
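A minimal sketch for inspecting such an output file, assuming one token per row with the four whitespace-separated columns named above (the file name and delimiter are assumptions):

# Tally token-level agreement between predicted and gold TS tags.
hits, total = 0, 0
with open('output.txt', encoding='utf-8') as f:
    for line in f:
        cols = line.split()
        if len(cols) < 4:
            continue  # skip blank or malformed rows
        ote_tag, ote_gold, ts_tag, ts_gold = cols[:4]
        hits += int(ts_tag == ts_gold)
        total += 1
print('token-level TS accuracy: %.4f' % (hits / max(total, 1)))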

Why do words like xxxASPECTxxx appear in sentences?

Hi lixin, thanks for your great work. I have a question about the dataset. For example, in laptop14_test.txt:

In fact I still use manyLegacy programs (Appleworks, FileMaker Pro, Quicken, Photoshop etc)!####In=O fact=O I=O still=O use=O manyASPECT0=O Appleworks=T-NEU ,=O FileMaker=T-NEU Pro=T-NEU ,=O Quicken=T-NEU ,=O Photoshop=T-NEU etc=O !=O

In laptop14_train.txt:

With the macbook pro it comes with freesecuritysoftware to protect it from viruses and other intrusive things from downloads and internet surfing or emails.####With=O the=O macbook=O pro=O it=O comes=O with=O freeASPECT0=O to=O protect=O it=O from=O viruses=O and=O other=O intrusive=O things=O from=O downloads=O and=O internet=O surfing=O or=O emails=O .=O

The Apple applications (ex.iPhoto) are fun, easy, and really cool to use (unlike the competition)!####The=O Apple=T-POS applications=T-POS exASPECT1=O are=O fun=O ,=O easy=O ,=O and=O really=O cool=O to=O use=O unlike=O the=O competition=O !=O .=O

Why do 'manyLegacy', 'freesecuritysoftware', and 'ex.iPhoto' become 'manyASPECT0', 'freeASPECT0', and 'exASPECT1'? This happens many times in the Laptop and Restaurant datasets.
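One plausible (unconfirmed) explanation: the raw SemEval text is missing some spaces (e.g. 'manyLegacy', 'freesecuritysoftware'), so if the preprocessing substitutes each annotated aspect substring with a placeholder such as ASPECT0 via a plain string replace, the placeholder fuses with the neighboring characters. A minimal illustration of that failure mode; the aspect string and placeholder name are hypothetical:

import re

text = "In fact I still use manyLegacy programs"
aspect = "Legacy programs"  # hypothetical annotated span

# A plain substring replace fuses the placeholder with 'many' because the
# space before 'Legacy' is missing in the raw text:
print(text.replace(aspect, "ASPECT0"))
# -> In fact I still use manyASPECT0

# A boundary-aware substitution refuses to match inside a token:
print(re.sub(r"\b%s\b" % re.escape(aspect), "ASPECT0", text))
# -> unchanged, since there is no word boundary between 'many' and 'Legacy'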

A question about the Twitter dataset

Hello, I downloaded the Twitter dataset from the Mitchell et al. paper published at EMNLP 2013, and found that it contains 3288 aspects labeled Person or Organization, whereas your paper reports a total of 3199 aspects. Looking at your data, you do not use BIO tags; you use a single T tag instead. So I would like to ask: is the smaller aspect count in your paper partly due to this difference in tagging, or is there some other reason?
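One thing worth noting about the scheme difference: collapsing BIO into a single T tag merges two adjacent entities into one span, because a B tag opens a new entity even immediately after another entity, while consecutive T tags read as one span. This alone can lower the aspect count. A small illustration with hypothetical tag sequences:

def count_spans_bio(tags):
    # every B- tag opens a new span
    return sum(t.startswith('B') for t in tags)

def count_spans_t(tags):
    # a span is a maximal run of consecutive T tags
    return sum(t == 'T' and (i == 0 or tags[i - 1] != 'T')
               for i, t in enumerate(tags))

bio = ['B-PER', 'I-PER', 'B-PER']  # two adjacent entities
t   = ['T', 'T', 'T']              # the same tokens after collapsing
print(count_spans_bio(bio), count_spans_t(t))  # 2 1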

Trained model raises an error on a modified test set

Hello, I trained a model on the laptop14 dataset and wanted to see how it performs on my own data. I annotated a few sentences following the format of the test set, but testing with the trained model raises the error below:
(screenshot of the error omitted)
I don't think this is a problem with my data: I also deleted a few lines from the original test set and reran, and it failed the same way. Does the number of test sentences need to be configured somewhere?
It then occurred to me to replace rather than delete, so I substituted the last 6 sentences of the original test set with my own annotated data; running that raises the following error:
(screenshot of the error omitted)
What could be causing this?

error while using a trained model on different datasets

I am facing the following error while computing predictions on the laptop14 dataset with a model trained on the rest_total dataset. Why does loading the model parameters depend on the size of the training-corpus vocabulary?

File "_dynet.pyx", line 1461, in _dynet.ParameterCollection.populate File "_dynet.pyx", line 1516, in _dynet.ParameterCollection.populate_from_textfile RuntimeError: Dimensions of lookup parameter /_0/_0 lookup up from file ({300,6465}) do not match parameters to be populated ({300,4738})

I cannot find these datasets; please give me some info about them

'yelp_rest1': '/projdata9/info_fil/lixin/Research/yelp/yelp_vec_200_2_win5_sent.txt',
'yelp_rest2': '/projdata9/info_fil/lixin/Research/yelp/yelp_vec_200_2_new.txt',
'amazon_laptop': '/projdata9/info_fil/lixin/Resources/amazon_full/vectors/amazon_laptop_vec_200_5.txt'
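These appear to be domain word vectors the author trained on Yelp and Amazon corpora; they are not shipped with the repo. If you cannot obtain them, one option is to train a substitute yourself. A minimal gensim sketch, where the corpus file and the hyperparameters (dimension 200, window 5, min count 2, guessed from the filenames) are assumptions:

from gensim.models import Word2Vec

# Assumes one tokenized sentence per line in a local review corpus.
sentences = [line.split() for line in open('yelp_reviews.txt', encoding='utf-8')]

model = Word2Vec(sentences, vector_size=200, window=5, min_count=2, workers=4)
model.wv.save_word2vec_format('yelp_vec_200_2_win5_sent.txt', binary=False)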

No such file or directory: './raw_data/laptop14_train.xml'

Hello, when running the process_data script I get FileNotFoundError: [Errno 2] No such file or directory: './raw_data/laptop14_train.xml'.
The relevant code is:
import string

def extract_text(dataset_name):
    """
    extract textual information from the xml file
    :param dataset_name: dataset name
    """
    delset = string.punctuation  # punctuation characters to strip later
    fpath = './raw_data/%s.xml' % dataset_name
    print("Process %s..." % fpath)
Aren't the datasets all in txt format? Should the path './raw_data/%s.xml' be changed to point at the txt files under data/?
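For what it's worth, the files this function looks for appear to be the raw SemEval-2014 XML distributions, which are obtained separately, rather than the processed txt files under data/. Assuming the standard SemEval ABSA layout, a sketch of what extract_text presumably parses:

import xml.etree.ElementTree as ET

# Standard SemEval-2014 ABSA layout (assumed):
# <sentences><sentence><text>...</text>
#   <aspectTerms><aspectTerm term="..." polarity="..." from="..." to="..."/></aspectTerms>
# </sentence></sentences>
tree = ET.parse('./raw_data/laptop14_train.xml')
for sent in tree.getroot().iter('sentence'):
    text = sent.find('text').text
    aspects = [(a.get('term'), a.get('polarity')) for a in sent.iter('aspectTerm')]
    print(text, aspects)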

Running main.py raises NotImplementedError

In Epoch 1 / 50 (current lr: 0.0010):
Traceback (most recent call last):
  File "main.py", line 257, in <module>
    final_res_string, model_path = run(dataset=[train, val, test], model=model, params=args)
  File "main.py", line 51, in run
    loss, pred_ote_labels, pred_ts_labels = model.forward(x=train_set[i], is_train=True)
  File "/data/E2E-TBSA/model.py", line 302, in forward
    stm_lm_hs = [self.stm_lm(h) for h in ote_hs]
  File "/data/E2E-TBSA/model.py", line 302, in <listcomp>
    stm_lm_hs = [self.stm_lm(h) for h in ote_hs]
  File "/data/E2E-TBSA/model.py", line 139, in __call__
    Wx = self._W * x
  File "_dynet.pyx", line 1859, in _dynet.Expression.__mul__
NotImplementedError

Thanks
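A common cause of NotImplementedError in Expression.__mul__ is a DyNet version mismatch: older DyNet builds do not implicitly convert a Parameters object (self._W here) to an Expression, and operands of unexpected types are rejected. Whether that is the cause for this repo is unconfirmed, but a sketch of the explicit-conversion workaround:

import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((50, 300))
dy.renew_cg()
x = dy.inputVector([0.0] * 300)

# Explicitly convert the Parameters object before multiplying; on recent
# DyNet versions `W * x` also works via implicit conversion.
Wx = dy.parameter(W) * x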

A question about the tp / hit computation

import numpy as np


def match_ot(gold_ote_sequence, pred_ote_sequence):
    """Count predicted opinion-target spans that exactly match a gold span."""
    n_hit = 0
    for t in pred_ote_sequence:
        if t in gold_ote_sequence:
            n_hit += 1
    return n_hit


def match_ts(gold_ts_sequence, pred_ts_sequence):
    """Count hits, gold spans, and predicted spans per sentiment class."""
    # positive, negative and neutral
    tag2tagid = {'POS': 0, 'NEG': 1, 'NEU': 2}
    hit_count, gold_count, pred_count = np.zeros(3), np.zeros(3), np.zeros(3)
    for t in gold_ts_sequence:
        ts_tag = t[2]  # sentiment tag, the third field of the span triple
        tid = tag2tagid[ts_tag]
        gold_count[tid] += 1
    for t in pred_ts_sequence:
        ts_tag = t[2]
        tid = tag2tagid[ts_tag]
        if t in gold_ts_sequence:
            hit_count[tid] += 1
        pred_count[tid] += 1
    return hit_count, gold_count, pred_count

Doesn't the use of 'if t in ...' here fail to account for the positions of the tags?
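Note that, as the indexing t[2] suggests, each element of these sequences is a span tuple ((begin, end) for OTE, (begin, end, sentiment) for TS), so the membership test compares positions as well as tags. A quick illustration:

gold = [(3, 4, 'POS')]

print((3, 4, 'POS') in gold)  # True: same span, same sentiment -> a hit
print((7, 8, 'POS') in gold)  # False: same sentiment, wrong span -> no hit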
