Comments (8)

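heyLinsir commented on June 14, 2024

We have not run similar experiments on English, because other languages have no database comparable to HowNet. Training becomes much slower once sememes are added, because the model is considerably more complex than skip-gram.

The script that generates the word-sense-sememe mapping is in the data process directory. The sememe file only needs to list all sememes on a single line, and the vocabulary only needs to follow the same word order as the word-sense-sememe file. Do you still run into problems under these conditions? I did not keep these data processing files; if you need them, I can write them again.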

from se-wrl.

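guoyinwang commented on June 14, 2024

I ran the file that extracts HowNet under Python 3, and it raises an error.

Please do write the data processing files for me. I found that something as small as a missing newline makes the program fail, so with a template I can compare and see where my own processing goes wrong.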

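MingleiLI commented on June 14, 2024

I ran into the same problem: switching to a different vocab or training corpus causes errors such as a segmentation fault. In addition, changing the training argument from -read-vocab to -save-vocab also causes an error. Running the original word2vec code works without problems.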

MingleiLI commented on June 14, 2024

Also, are there any requirements on the encoding of the data files?

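MingleiLI commented on June 14, 2024

From reading the code, I found that the program assumes the words in the vocab are the same as the words in Word_Sense_Sememe_File. Otherwise, some words may be left uninitialized when syn0 is initialized, so the values read back later are random numbers.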
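
The assumption described above can be checked before training. The following is a minimal sketch, not code from the repository: `first_mismatch` is a hypothetical helper, and it assumes both word lists have already been loaded into arrays of strings in file order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of SE-WRL): returns the index of the first
   position where the two word lists differ, or -1 if they contain the same
   words in the same order. Both lists are assumed to be loaded already. */
long first_mismatch(const char *const *vocab, size_t vocab_n,
                    const char *const *wss, size_t wss_n) {
    size_t n = vocab_n < wss_n ? vocab_n : wss_n;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(vocab[i], wss[i]) != 0) return (long)i;
    }
    /* All shared positions match; a length difference is still a mismatch. */
    if (vocab_n != wss_n) return (long)n;
    return -1;
}
```

A return value of -1 means every vocab entry has a counterpart at the same position, so no syn0 row would be skipped for that reason; any other value points at the first word to inspect.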

MingleiLI commented on June 14, 2024

In SAT.c:
    if (sentence_length == 0) {
        while (1) {
            word = ReadWordIndex(fi);
            if (feof(fi)) { // Read all the words until the end of file???? --lml
                break;
            }
            if (word == -1) { // word not in vocab.
                continue;
            }
            word_count++;
            if (sample > 0) {
                real ran = (sqrt(vocab[word].cn / (sample * train_words)) + 1) * (sample * train_words) / vocab[word].cn;
                next_random = next_random * (unsigned long long)25214903917 + 11;
                if (ran < (next_random & 0xFFFF) / (real)65536) continue;
            }
            sen[sentence_length] = word;
            sentence_length++;
            if (sentence_length >= MAX_SENTENCE_LENGTH) break;
        }
        sentence_position = 0;
    }
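Does this read the file until MAX_SENTENCE_LENGTH is reached? If so, the words read in do not necessarily belong to the same sentence. word2vec.c has the line if (word == 0) break; here. What is the reasoning for leaving it out of SAT.c?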
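
To make the difference concrete, here is a minimal sketch of a reading loop that stops at a sentence boundary. It is not the SAT.c or word2vec.c code: `read_sentence` is a hypothetical function that reads from an in-memory array instead of a file, and it assumes, as in word2vec, that index 0 is the end-of-sentence token and -1 marks a word that is not in the vocab.

```c
#include <stddef.h>

/* Hypothetical sketch: copies word indices from stream[*pos..n) into sen
   until a sentence boundary (index 0), the length cap max_len, or the end
   of the stream. Advances *pos and returns the sentence length. */
size_t read_sentence(const long long *stream, size_t n, size_t *pos,
                     long long *sen, size_t max_len) {
    size_t len = 0;
    while (*pos < n && len < max_len) {
        long long word = stream[(*pos)++];
        if (word == -1) continue;  /* word not in vocab: skip it */
        if (word == 0) break;      /* sentence boundary: stop here */
        sen[len++] = word;
    }
    return len;
}
```

Without the `word == 0` check, the loop would keep filling `sen` across sentence boundaries until the cap is reached, which is the behavior the question above asks about.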

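heyLinsir commented on June 14, 2024

We will release a new version soon that is much better than the current one in readability and usability; we are running validation experiments on it now. Once they are done, we will publish it on our group's GitHub and add a link to it in this version's README.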

MingleiLI commented on June 14, 2024

Also, which version of gcc did you use? Could many of these errors be caused by a difference in compiler versions?
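
When comparing setups, it can help to record the compiler version inside the binary itself. The sketch below relies on the predefined macros `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__`; `compiler_version` is a hypothetical helper name, and note that clang defines these macros too, so the output only identifies a gcc-compatible compiler.

```c
#include <stdio.h>

/* Hypothetical helper: writes a description of the compiler used for this
   build into buf and returns the value returned by snprintf. */
int compiler_version(char *buf, size_t size) {
#if defined(__GNUC__)
    return snprintf(buf, size, "gcc-compatible %d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return snprintf(buf, size, "unknown compiler");
#endif
}
```

Printing this string at startup makes it easier to tell whether two people reporting different behavior are actually running builds from different compilers.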
