Comments (8)
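We did not run a similar experiment on English, because no other language has a database comparable to HowNet. Training slows down considerably once sememes are added, because the model is much more complex than skip-gram.
The script that generates the word-sense-sememe mapping is in the data process directory. The sememe file only needs to list all sememes on a single line, and the vocabulary only needs to follow the same word order as the word-sense-sememe file. Do you still run into problems under these conditions? I did not keep those data-processing files; if you need them, I can write them again.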
from se-wrl.
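I ran the file that extracts HowNet with Python 3, and it raises an error.
Please do write the data-processing files for me. I found that even a stray newline character makes the program fail, so with a template I could compare and see where my own processing goes wrong.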
from se-wrl.
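I hit the same problem: switching to a different vocab or training corpus causes errors such as a segmentation fault. In addition, changing the training option from -read-vocab to -save-vocab also causes an error. The original word2vec code runs without any problem.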
from se-wrl.
Also, are there any requirements on the encoding of the data files?
from se-wrl.
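From reading the code, the program assumes that the words in the vocab are the same as the words in Word_Sense_Sememe_File. Otherwise, some words may be left uninitialized when syn0 is initialized, and the values read from them later are random.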
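A quick way to test that assumption before training is to walk both files in parallel and report the first line where the words differ. The sketch below is only illustrative: it assumes the vocab file has one word per line followed by its count (the layout word2vec writes with -save-vocab) and that each line of the word-sense-sememe file starts with the word. That second layout is a guess, not something confirmed in this thread. The names same_word and first_mismatch are hypothetical helpers, not part of the repository.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 65536  /* assumed upper bound on line length; longer lines get split */

/* Return 1 if both lines start with the same whitespace-delimited token.
   Trailing "\r" is treated as a delimiter, so CRLF files compare cleanly. */
int same_word(const char *x, const char *y) {
    x += strspn(x, " \t");
    y += strspn(y, " \t");
    size_t nx = strcspn(x, " \t\r\n");
    size_t ny = strcspn(y, " \t\r\n");
    return nx == ny && strncmp(x, y, nx) == 0;
}

/* Compare the two files line by line.
   Returns 0 if every line pair starts with the same word and both files
   have the same number of lines; otherwise returns the 1-based number of
   the first line that differs (or is missing in one file). */
long first_mismatch(FILE *vocab, FILE *word_sense_sememe) {
    static char a[MAX_LINE], b[MAX_LINE];
    long line = 0;
    for (;;) {
        char *ra = fgets(a, sizeof a, vocab);
        char *rb = fgets(b, sizeof b, word_sense_sememe);
        if (!ra && !rb) return 0;      /* both ended together: aligned */
        line++;
        if (!ra || !rb) return line;   /* one file is shorter */
        if (!same_word(a, b)) return line;
    }
}
```

Calling first_mismatch on the two opened files and printing a non-zero result gives the line to inspect first. A mismatch at line 1 may simply mean that one file contains the `</s>` entry and the other does not.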
from se-wrl.
In SAT.c:

    if (sentence_length == 0) {
      while (1) {
        word = ReadWordIndex(fi);
        if (feof(fi)) { // Read all the words until the end of file???? --lml
          break;
        }
        if (word == -1) { // word not in vocab.
          continue;
        }
        word_count++;
        if (sample > 0) {
          real ran = (sqrt(vocab[word].cn / (sample * train_words)) + 1) * (sample * train_words) / vocab[word].cn;
          next_random = next_random * (unsigned long long)25214903917 + 11;
          if (ran < (next_random & 0xFFFF) / (real)65536) continue;
        }
        sen[sentence_length] = word;
        sentence_length++;
        if (sentence_length >= MAX_SENTENCE_LENGTH) break;
      }
      sentence_position = 0;
    }

Does this keep reading from the file until MAX_SENTENCE_LENGTH is reached? If so, the words read in do not necessarily belong to the same sentence. word2vec.c has the line if (word == 0) break; at this point. What is the reason for leaving it out of SAT.c?
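For comparison, here is a minimal sketch of the sentence-boundary behaviour that the check in word2vec.c produces. It is not code from SAT.c or a proposed patch. To stay self-contained it reads word indices from an in-memory array instead of calling ReadWordIndex(fi), it leaves out the subsampling step, and next_sentence is a hypothetical name. It relies on the word2vec.c convention that the end-of-sentence token `</s>` has index 0 in the vocabulary.

```c
#include <stddef.h>

#define MAX_SENTENCE_LENGTH 1000
#define SENTENCE_END 0     /* index of "</s>" under the word2vec.c convention */
#define OUT_OF_VOCAB (-1)  /* value returned for words missing from the vocab */

/* Copy the next sentence from stream[*pos .. n) into sen and advance *pos.
   Stops at the end-of-sentence index, at MAX_SENTENCE_LENGTH words, or at
   the end of the stream. Returns the number of words stored in sen. */
size_t next_sentence(const long long *stream, size_t n, size_t *pos,
                     long long *sen) {
    size_t len = 0;
    while (*pos < n) {
        long long word = stream[(*pos)++];
        if (word == OUT_OF_VOCAB) continue;  /* skip unknown words */
        if (word == SENTENCE_END) break;     /* the check missing from SAT.c */
        sen[len++] = word;
        if (len >= MAX_SENTENCE_LENGTH) break;
    }
    return len;
}
```

Without the SENTENCE_END check, the loop only stops at MAX_SENTENCE_LENGTH or at the end of the input, so context windows can span line breaks in the corpus. Whether that is intended in SAT.c is a question for the authors.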
from se-wrl.
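We will release a new version soon that is much better than the current one in readability and usability; we are running validation experiments on it now. Once they are done, we will publish it on our group's GitHub and add a link to it in this version's README.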
from se-wrl.
Also, which gcc version are you using? Could many of these errors be caused by differences between compiler versions?
from se-wrl.