yuewang-cuhk / takg

The official implementation of ACL 2019 paper "Topic-Aware Neural Keyphrase Generation for Social Media Language"

Home Page: https://www.aclweb.org/anthology/P19-1240

License: MIT License

Python 100.00%

nlp social-media topic-modeling keyphrase-generation

takg's Issues

how to get model_path

Hello, I am a beginner and I don't know how to get "the warmed up ntm model path" in
python train.py -data_tag Weibo_s100_t10 -copy_attention -use_topic_represent -load_pretrain_ntm -joint_train -topic_attn -check_pt_ntm_model_path [the warmed up ntm model path]
or the "seq2seq model path" and "ntm model path" in
python predict.py -model [seq2seq model path] (-ntm_model [ntm model path])

training in One2many mode

Hi,
I see that we can train in One2one mode and evaluate in One2many mode.
How can I train in One2many mode? It seems a lot of code would need to be changed for this.

Thanks

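Question about sparsity

Hello, I have a question. The x_bow input in the model has a length equal to the whole vocabulary, so it is very sparse. Is it fed into the model directly as is? Are there any tricks for handling it? Does the sparsity have a large impact?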
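
To make the question concrete, here is a minimal sketch of what a vocabulary-length bag-of-words vector looks like. It assumes the document is stored as sparse (word_id, count) pairs, the format produced by gensim's doc2bow. The helper name bow_to_dense and all numbers are hypothetical, and this is not taken from the repository's code.

```python
def bow_to_dense(sparse_bow, vocab_size):
    """Expand sparse (word_id, count) pairs into a dense count vector."""
    dense = [0.0] * vocab_size
    for word_id, count in sparse_bow:
        dense[word_id] += count
    return dense


# Hypothetical short post: three distinct words out of a 10-word vocabulary.
sparse_bow = [(1, 2), (4, 1), (7, 1)]
x_bow = bow_to_dense(sparse_bow, vocab_size=10)

# Fraction of entries that are zero.
sparsity = x_bow.count(0.0) / len(x_bow)
print(x_bow, sparsity)
```

With a real vocabulary of tens of thousands of words and posts of a few dozen tokens, almost every entry of such a vector is zero, which is the sparsity the question refers to.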

Jointly train error?

When I run
python train.py -data_tag Weibo_s100_t10 -copy_attention -use_topic_represent -load_pretrain_ntm -joint_train -topic_attn -check_pt_ntm_model_path [my ntm model path]
I get the error
RuntimeError: Invalid index in scatterAdd at /tmp/pip-req-build-4baxydiv/aten/src/TH/generic/THTensorEvenMoreMath.cpp:721
Is 'my ntm model path' the path of the NTM model I trained in advance?

OSError: [WinError 126]

Excuse me, I have spent a long time trying to solve this problem. Is it caused by the library version? Do you have documentation of the requirements?

How to get theta and phi?

Thanks for your code. Could you please tell me how to get θ (topic-document) and φ (topic-words)? (Only for the topic model.)
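
As background for this question, here is a minimal sketch of how these two distributions are commonly obtained in VAE-style neural topic models: θ as a softmax over a per-document topic vector, and each row of φ as a softmax over one topic's row of decoder weights. This is an assumption about the general technique, not a description of this repository's code, and all names and numbers below are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical latent topic vector for one document (2 topics).
z = [0.5, -1.0]
theta = softmax(z)  # document-topic distribution

# Hypothetical decoder weights: one row per topic, one column per word (4 words).
decoder_weights = [
    [2.0, 0.1, -1.0, 0.3],
    [-0.5, 1.5, 0.2, 0.0],
]
phi = [softmax(row) for row in decoder_weights]  # topic-word distributions

# Index of the highest-probability word for each topic.
top_words = [max(range(len(row)), key=row.__getitem__) for row in phi]
print(theta, top_words)
```

In a trained model, z and decoder_weights would come from the encoder output and the decoder layer; which attributes hold them in this codebase is not confirmed here.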

twitter dataset

How can I get the Twitter dataset?

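Question about training set annotation

Thank you very much for providing such a great model, and sorry to bother you. We would like to extract keyphrases from web pages returned by Google searches. Regarding the Weibo training set, how were its keyphrases annotated? Could you share your annotation process or the method used to obtain the keyphrases? Thank you!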

the ntm model loss is very large

Hi,
Thanks for sharing. The code is high-quality and clear. However, when I try to use your model with my own dataset, the ntm_loss is very large. I don't know why. Could you give me some help?
The bow_dictionary_vocab_size is set to 10000.

09/14/2021 11:22:43 [INFO] train_mixture: ====>Train epoch: 109 Average loss: 6429827.4577
09/14/2021 11:22:43 [INFO] train_mixture: Overall sparsity = 0.999, l1 strength = 81.71844
09/14/2021 11:22:43 [INFO] train_mixture: Target sparsity = 0.850
09/14/2021 11:22:44 [INFO] train_mixture: ====> Test epoch: 109 Average loss: 6389828.5666
09/14/2021 11:22:51 [INFO] train_mixture: ====>Train epoch: 110 Average loss: 9138395.1334
09/14/2021 11:22:51 [INFO] train_mixture: Overall sparsity = 0.999, l1 strength = 73.68993
09/14/2021 11:22:51 [INFO] train_mixture: Target sparsity = 0.850
09/14/2021 11:22:52 [INFO] train_mixture: ====> Test epoch: 110 Average loss: 7285204.7357
09/14/2021 11:22:52 [INFO] train_mixture:
09/14/2021 11:23:01 [INFO] train_mixture: ====>Train epoch: 111 Average loss: 4194444.3072
09/14/2021 11:23:01 [INFO] train_mixture: Overall sparsity = 0.999, l1 strength = 66.44983
09/14/2021 11:23:01 [INFO] train_mixture: Target sparsity = 0.850
09/14/2021 11:23:01 [INFO] train_mixture: ====> Test epoch: 111 Average loss: 15234667.0275

neural topic model

Hello, when I train only the neural topic model (100 epochs), I cannot tell which original texts correspond to the generated top words (e.g., topwords_e10.txt). Where are they? Thank you.

OSError: [WinError 126]

Excuse me, I have spent a long time trying to solve this problem. Is it caused by the library version? Do you have a requirements.txt?
Traceback (most recent call last):
  File "C:/study/TAKG-master/preprocess.py", line 6, in <module>
    import gensim
  File "C:\Users\yuqiang\AppData\Roaming\Python\Python36\site-packages\gensim\__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
  File "C:\Users\yuqiang\AppData\Roaming\Python\Python36\site-packages\gensim\parsing\__init__.py", line 4, in <module>
    from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2, # noqa:F401
  File "C:\Users\yuqiang\AppData\Roaming\Python\Python36\site-packages\gensim\parsing\preprocessing.py", line 42, in <module>
    from gensim import utils
  File "C:\Users\yuqiang\AppData\Roaming\Python\Python36\site-packages\gensim\utils.py", line 40, in <module>
    import scipy.sparse
  File "C:\APP\Python\lib\site-packages\scipy\__init__.py", line 104, in <module>
    from . import _distributor_init
  File "C:\APP\Python\lib\site-packages\scipy\_distributor_init.py", line 61, in <module>
    WinDLL(os.path.abspath(filename))
  File "C:\APP\Python\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

Twitter dataset

How can I get the Twitter dataset?

Error in pred_evaluate

When I use pred_evaluate.py to evaluate a prediction, something goes wrong.
The command: python pred_evaluate.py -pred pred\predict__Weibo_s100_t10.copy.seed9527.emb150.vs50000.dec300.20200517-170932__e4.val_loss=5.464.model-0h-03m/predictions.txt -src data/Weibo/test_src.txt -trg data/Weibo/test_trg.txt
The error: UnicodeDecodeError: 'gbk' codec can't decode byte 0xb4 in position 33: illegal multibyte sequence
What should I do to resolve it?
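
A common cause of this kind of error on Windows is that Python's open() falls back to the locale encoding (here gbk) when no encoding is given, while the file on disk uses a different encoding. The sketch below assumes the data files are UTF-8 encoded, which is not confirmed in this thread. read_lines is a hypothetical helper for illustration, not a function from pred_evaluate.py.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Demonstration with a temporary UTF-8 file containing Chinese text.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_src.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("话题 关键词\n")

lines = read_lines(demo_path)
print(lines)
```

If the assumption holds, passing encoding="utf-8" to the open() calls that read the prediction, source, and target files should avoid the gbk decoding step.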

how to predict model

Hi, when I run prediction with this command:
python predict.py -model ./model/Weibo_s100_t10.copy.seed9527.emb150.vs50000.dec300.20191111-204118/e4.val_loss=1.432.model-0h-04m
I get this error:
assert opt.model.count('/') == 2 and all([tag in opt.model for tag in ['vs', 'emb', 'dec', 'model']])
Should I add some options?
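
The assertion quoted above requires exactly two '/' characters in the model path, and the path in the command contains three because of the leading './'. The sketch below re-implements only that quoted check to show the difference. model_path_ok is a hypothetical name, and whether dropping './' is sufficient for the rest of predict.py is not confirmed here.

```python
def model_path_ok(model_path):
    """Replicate the check quoted from the assertion error."""
    tags = ["vs", "emb", "dec", "model"]
    return model_path.count("/") == 2 and all(tag in model_path for tag in tags)


with_prefix = (
    "./model/Weibo_s100_t10.copy.seed9527.emb150.vs50000.dec300.20191111-204118"
    "/e4.val_loss=1.432.model-0h-04m"
)
without_prefix = with_prefix[len("./"):]

# Print the slash count and the result of the check for each form of the path.
print(with_prefix.count("/"), model_path_ok(with_prefix))
print(without_prefix.count("/"), model_path_ok(without_prefix))
```

Under this check, the path written as model/Weibo_s100_t10.../e4.val_loss=1.432.model-0h-04m (without './') passes, while the original form does not.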
