Hi rsimd, thank you for reaching out!
Your advice was very helpful! While reviewing the code, I found some errors in both metrics.
To make the code easier to read and to fix the problems, I rewrote both metrics, closely following the formulas in the reference paper:
https://doi.org/10.18653/v1/d18-1096
To test the code, you can clone the repo and run `python setup.py install`.
Let me know if it works now!
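For readers following along, here is a minimal NumPy sketch of the two word-embedding topic coherence (WETC) scores. It was written for this thread and is not OCTIS's implementation; the function names are hypothetical. It assumes each topic's top words are already mapped to vectors, L2-normalizes them, averages cosine similarity over distinct word pairs for the pairwise score, and uses the unweighted mean of the normalized vectors as the centroid. The paper's exact normalization (for example, weighting the centroid by topic-word weights) may differ.

```python
import numpy as np


def _unit_rows(vectors):
    """Stack word vectors into an N x D matrix whose rows have unit L2 norm."""
    E = np.asarray(vectors, dtype=float)
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def wetc_pairwise(vectors):
    """Mean cosine similarity over all ordered pairs (i, j) with i != j."""
    E = _unit_rows(vectors)
    n = E.shape[0]
    if n < 2:
        raise ValueError("need at least two word vectors")
    sims = E @ E.T                          # N x N cosine similarities
    off_diag = sims.sum() - np.trace(sims)  # remove the N self-similarities
    return float(off_diag / (n * (n - 1)))


def wetc_centroid(vectors):
    """Mean cosine similarity between each word vector and the unit-length centroid."""
    E = _unit_rows(vectors)
    t = E.mean(axis=0)                      # unweighted centroid (an assumption)
    t = t / np.linalg.norm(t)
    return float((E @ t).mean())


def average_over_topics(metric, topics, embeddings, topk=10):
    """Average a per-topic score over topics, skipping out-of-vocabulary words."""
    scores = []
    for topic in topics:
        vecs = [embeddings[w] for w in topic[:topk] if w in embeddings]
        if len(vecs) >= 2:  # a single known word gives no pairs
            scores.append(metric(vecs))
    return float(np.mean(scores))


if __name__ == "__main__":
    emb = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
    topics = [["cat", "dog", "unknown"]]
    print(average_over_topics(wetc_pairwise, topics, emb))  # approximately 0.8
    print(average_over_topics(wetc_centroid, topics, emb))  # approximately 0.9487
```

Working on the similarity matrix `E @ E.T` keeps the pairwise score a single matrix product, which is cheap for the usual top-10 words per topic.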
from octis.
Thank you for fixing it.
Since I don't have time to rerun the experiments behind the figure shown above, I will instead show the values of `WECoherencePairwise` and `WECoherenceCentroid` during training of a single model.
In the output shown below, `wetc_c` means `WECoherenceCentroid` and `wetc_pw` means `WECoherencePairwise`. I think these are roughly the expected scores.
```
Run RecurrentStickBreakingModel num_topics=10, embed_dim=300
epoch td train_loss train_ppl valid_loss valid_ppl wetc_c wetc_pw dur
------- ------ ------------ -------------- ------------ ------------- -------- --------- ------
1 0.9600 73.9698 320736165.8069 68.5748 75777356.2360 0.6372 0.4447 0.5634
2 0.9600 63.6994 43644642.7318 52.8878 12319750.3907 0.6233 0.4431 0.5034
3 0.9700 52.4110 12532657.1333 42.7649 4528677.4107 0.6238 0.4450 0.5192
4 0.9700 42.5961 4518471.2740 35.0613 1901227.4254 0.6257 0.4432 0.5077
5 0.9700 35.2624 1956250.7926 29.6098 993814.0761 0.6372 0.4486 0.5189
6 0.9700 29.7505 971985.6298 25.0100 529332.6143 0.6415 0.4428 0.5199
7 0.9700 25.6807 527041.6114 21.7965 332640.1859 0.6406 0.4484 0.5142
8 0.9800 21.7055 300979.1405 17.8883 191409.8175 0.6378 0.4453 0.5131
9 0.9800 19.0901 189746.0816 16.8559 128860.5507 0.6419 0.4464 0.5159
10 0.9800 16.9686 117649.4644 13.5188 79297.2655 0.6484 0.4468 0.5327
11 0.9800 13.6051 76754.6987 10.2547 53539.9720 0.6469 0.4449 0.5236
12 0.9700 10.4957 51914.8373 7.6733 37806.9141 0.6395 0.4488 0.5249
13 0.9700 7.8220 36336.7206 5.2881 27193.4501 0.6402 0.4453 0.5001
14 0.9600 5.7267 26571.4181 3.5589 20158.4579 0.6469 0.4491 0.5000
15 0.9600 3.7754 19824.1490 1.6373 15315.5823 0.6482 0.4502 0.4965
16 0.9700 1.8117 15046.2791 -0.4119 11771.6308 0.6714 0.4546 0.6522
17 0.9800 -0.0840 11651.8547 0.9896 11116.5698 0.6808 0.4586 0.5274
18 0.9800 -1.9134 9199.4690 -2.5322 7608.3783 0.6812 0.4589 0.5376
19 0.9800 -3.5271 7390.6316 -5.3993 5911.0744 0.6841 0.4573 0.5124
20 0.9900 -5.0035 6062.4598 -6.8578 4966.2950 0.6846 0.4595 0.5195
21 1.0000 -6.1741 5071.2266 -7.8498 4222.3710 0.6839 0.4607 0.5133
22 1.0000 -7.3060 4278.6677 -9.2016 3589.5463 0.6956 0.4570 0.5238
23 1.0000 -8.6377 3651.7721 -10.4976 3053.1089 0.6804 0.4590 0.5193
24 1.0000 -9.9328 3156.5071 -11.4241 2732.4987 0.6938 0.4623 0.5381
25 1.0000 -10.9590 2768.7093 -12.5331 2381.2803 0.6950 0.4638 0.5272
26 1.0000 -11.8022 2445.4564 -13.3054 2106.8487 0.6919 0.4634 0.5341
27 1.0000 -12.6172 2195.8951 -13.7952 1943.4262 0.6913 0.4596 0.5179
28 1.0000 -13.2920 1984.6121 -14.4775 1773.9326 0.7012 0.4627 0.5317
29 1.0000 -13.8840 1820.3619 -15.2518 1593.0502 0.6997 0.4611 0.5196
30 1.0000 -14.5299 1673.6254 -15.6372 1497.4241 0.6988 0.4591 0.5276
31 1.0000 -15.1335 1556.9596 -16.4473 1381.1824 0.6988 0.4591 0.5255
32 1.0000 -15.6565 1451.4193 -16.8722 1308.9742 0.6968 0.4615 0.5157
33 1.0000 -16.0427 1378.1303 -17.2128 1228.9282 0.6992 0.4619 0.5229
34 1.0000 -16.3261 1305.7176 -17.2658 1193.9426 0.6960 0.4582 0.5010
35 1.0000 -16.6525 1243.8213 -17.7837 1104.6458 0.6982 0.4548 0.5249
36 1.0000 -16.8720 1195.2415 -17.7987 1078.8660 0.6994 0.4562 0.5151
37 1.0000 -17.1660 1147.1929 -17.8148 1083.4223 0.7003 0.4583 0.5334
38 1.0000 -17.4919 1113.5443 -18.3335 1044.0485 0.7005 0.4584 0.5047
39 1.0000 -17.7039 1079.5254 -17.9807 1010.1541 0.7005 0.4560 0.5252
40 1.0000 -17.8705 1050.6049 -18.1552 987.7849 0.7015 0.4571 0.5175
41 1.0000 -18.0920 1023.6774 -18.9372 947.5119 0.7015 0.4571 0.5163
42 1.0000 -18.2971 1005.2088 -19.2902 935.5594 0.7142 0.4637 0.5064
43 1.0000 -18.4749 989.1840 -19.4979 899.5156 0.7150 0.4652 0.5139
44 1.0000 -18.6154 970.7104 -19.5591 888.7733 0.7144 0.4650 0.5148
45 1.0000 -18.7052 957.0750 -19.4846 890.6475 0.7112 0.4629 0.5117
46 1.0000 -18.8325 944.8661 -19.8619 875.3139 0.7107 0.4636 0.5284
47 1.0000 -18.9852 933.2427 -20.0767 859.9098 0.7124 0.4664 0.5268
48 1.0000 -19.0465 926.8892 -16.0793 1348.9520 0.6984 0.4589 0.5065
49 1.0000 -19.0715 925.1578 -19.7939 868.3454 0.6971 0.4616 0.5204
50 1.0000 -19.1469 914.9127 -19.9731 849.2092 0.6967 0.4614 0.5154
```
Note that here we used glove.42B.300d (downloaded from https://nlp.stanford.edu/projects/glove/) to calculate the scores.
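The relative size of the two coherence columns in the log is what the definitions lead one to expect. A short derivation, assuming unit-normalized embeddings $e_1, \dots, e_N$ of a topic's $N$ top words, an unweighted mean centroid $\bar{t} \neq 0$, and a pairwise average over distinct pairs (simplifying assumptions made here, not taken from OCTIS or the paper), shows that the centroid score can never fall below the pairwise score:

```latex
\begin{aligned}
\bar{t} &= \frac{1}{N}\sum_{i=1}^{N} e_i, \qquad \lVert e_i \rVert = 1,\\
\mathrm{WETC}_{\mathrm{C}} &= \frac{1}{N}\sum_{i=1}^{N} \frac{e_i^{\top}\bar{t}}{\lVert \bar{t}\rVert}
  = \frac{\lVert \bar{t}\rVert^{2}}{\lVert \bar{t}\rVert} = \lVert \bar{t}\rVert,\\
\mathrm{WETC}_{\mathrm{PW}} &= \frac{1}{N(N-1)}\sum_{i\neq j} e_i^{\top}e_j
  = \frac{N^{2}\lVert \bar{t}\rVert^{2} - N}{N(N-1)}
  = \frac{N\lVert \bar{t}\rVert^{2} - 1}{N-1},\\
\mathrm{WETC}_{\mathrm{C}} - \mathrm{WETC}_{\mathrm{PW}}
  &= \frac{(1 - \lVert \bar{t}\rVert)\,(N\lVert \bar{t}\rVert + 1)}{N-1} \;\ge\; 0 .
\end{aligned}
```

The last step uses $\lVert \bar{t} \rVert \le 1$, which follows from the triangle inequality; the two scores are equal only when all $N$ vectors coincide.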
Hi, thank you for testing. The way the score is computed should not depend on which word embeddings are used.
In case you find any problems, please do not hesitate to reach out again!
Pietro
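Because the metrics only need a word-to-vector lookup, switching to glove.42B.300d or any other embedding set changes the inputs, not the formula. As a hedged illustration (not part of OCTIS), a minimal loader for GloVe's plain-text files could look like this; `load_glove_text` and its `dim`/`vocab` parameters are hypothetical names chosen for this sketch.

```python
import io

import numpy as np


def load_glove_text(lines, dim=300, vocab=None):
    """Parse GloVe's plain-text format ("word v1 ... v_dim" per line) into a dict.

    The word is taken as everything before the last `dim` fields, which
    tolerates tokens that contain spaces. If `vocab` is given, only those
    words are kept.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # blank or malformed line
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        embeddings[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return embeddings


if __name__ == "__main__":
    sample = io.StringIO("cat 1.0 0.0\ndog 0.8 0.6\nnew york 0.5 0.5\n")
    emb = load_glove_text(sample, dim=2, vocab={"cat", "dog", "new york"})
    print(sorted(emb))  # ['cat', 'dog', 'new york']
```

For the real file, one would pass an open handle such as `open("glove.42B.300d.txt", encoding="utf-8")` (the file name is an assumption about where the download was unzipped) and set `vocab` to the topic words, so the full vocabulary is never held in memory.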