First of all, great and easy code, written in an extremely simple way. Just a

Question? about inltk HOT 1 CLOSED

oltip commented on August 17, 2024

Question?

from inltk.

Comments (1)

goru001 commented on August 17, 2024

Thank you for the appreciation.
Language models have been trained on their respective Wikipedia corpora, which are available for download in the language-specific repositories linked in the README. The embeddings that iNLTK provides are essentially the weights of the first layer in the encoder. For classification, the LM has been fine-tuned on the classification dataset, so it didn't make sense to me to use those embeddings: they would be biased towards the classification dataset, whereas the Wikipedia dataset represents, more or less, the whole universe. This was the thinking behind choosing the LM for embeddings.
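To make "embeddings are the weights of the first layer" concrete, here is a minimal sketch of what such a lookup amounts to. It is not iNLTK's actual code: the toy vocabulary, the random weight matrix, and the `embed` helper are all hypothetical stand-ins for a trained encoder.

```python
import numpy as np

# Hypothetical toy vocabulary and weight matrix (assumptions for illustration);
# a real language model would load both from its trained encoder.
vocab = {"▁the": 0, "▁cat": 1, "▁sat": 2}
rng = np.random.default_rng(0)
first_layer_weights = rng.normal(size=(len(vocab), 4))  # (vocab_size, embedding_dim)


def embed(tokens):
    """Return one vector per token by indexing rows of the weight matrix."""
    ids = [vocab[t] for t in tokens]
    return first_layer_weights[ids]


vectors = embed(["▁cat", "▁sat"])
print(vectors.shape)  # (2, 4)
```

Because the lookup is just row indexing, whatever the first layer learned during training is exactly what comes out as the embedding, which is why the choice of training corpus matters.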

As I explained, because embeddings are essentially the weights of the first layer, I expect a BERT LM to do slightly better than ULMFiT, and hence its embeddings might be slightly better, but I'm not aware of any techniques by which we can quantitatively compare embeddings. Currently we only do this qualitatively, by visualizing them in 3 dimensions.
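For anyone who wants to try the qualitative check themselves, here is a rough sketch of reducing embeddings to 3 dimensions. The thread does not say which projection is used, so PCA (via SVD) is an assumption here, and `project_to_3d` and the random `toy_embeddings` are hypothetical placeholders for real embedding vectors.

```python
import numpy as np


def project_to_3d(embeddings):
    """Project row vectors onto their top 3 principal components (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


# Random stand-in data (assumption): 10 vectors of dimension 8.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(10, 8))

points = project_to_3d(toy_embeddings)
print(points.shape)  # (10, 3)
```

The resulting 3D points can then be fed to any plotting tool; nearby points suggest tokens the model treats as similar, but this remains a visual impression rather than a score.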

Hope this answers your question. I'll close this issue, but feel free to reopen if there's anything!
