Comments (6)
Is the file encoded as UTF-8?
from dict_build.
My other files contain data; only words_sort.data is empty. I have confirmed the encoding is UTF-8, but nothing I try works. Could someone please help?
@ChingO22 Could you send me a small sample corpus file?
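https://pan.baidu.com/s/1pL3L15T This is a txt file of Romance of the Three Kingdoms (三国演义). I changed the encoding with Notepad++, but none of my attempts worked. A humble question: if a UTF-8 file converted this way is not acceptable, what does the file need to look like to meet the requirements? Thanks!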
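Editor's sketch, not part of the original thread: one way to confirm what an editor actually wrote to disk is to inspect the raw bytes. The thread never identifies the cause of the empty output, so this only rules the encoding in or out. The helper names `check_utf8` and `convert_to_utf8` are hypothetical, and the GB18030 default is an assumption about the source file. Whether dict_build is affected by a UTF-8 byte order mark is not stated in the thread; the check simply reports it.

```python
import codecs
from pathlib import Path


def check_utf8(path):
    """Return (is_valid_utf8, has_bom) for the file at `path`.

    Hypothetical helper: reads the raw bytes and tries a strict UTF-8 decode.
    """
    raw = Path(path).read_bytes()
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")  # strict decoding raises on any invalid byte sequence
    except UnicodeDecodeError:
        return False, has_bom
    return True, has_bom


def convert_to_utf8(src, dst, src_encoding="gb18030"):
    """Re-encode `src` as UTF-8 without a BOM and write it to `dst`.

    Assumption: the source is in a GB-family encoding; change `src_encoding`
    if the file was saved in something else.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

Calling `check_utf8` on the corpus file before running the tool shows whether the bytes are valid UTF-8 and whether a BOM is present; if the first value is `False`, `convert_to_utf8` can produce a clean copy to try instead.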
└─[0] <> head words_sort.data
玄德 1813 7.820178962415189 2.231933219804196 0.23144031741479038
孔明 1689 7.851749041416057 2.11310026563853 0.32517726954958553
将军 942 4.754887502163469 5.557031866849143 0.2581047381546135
曹操 940 6.39231742277876 4.286433350117893 0.25901833234772326
二人 561 5.08746284125034 4.615099271958424 0.2655577860625179
引兵 486 5.044394119358453 5.8267604364693 0.39905700814402056
云长 443 7.607330313749611 3.0663407009774173 0.21072623362141066
蜀兵 392 5.7279204545632 4.063440037814358 0.5778421433743663
夏侯 387 8.930737337562887 3.0015972148199035 0.18920373027259685
如此 385 5.459431618637297 3.405855803459433 0.2234513274336283
...
Take a careful look at it.
I could burst into tears 😭… just my luck.
Related Issues (20)
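- Could you explain the idea behind your algorithm? HOT 1
- What does each column of the output mean? HOT 4
- A stop-word list should be included directly HOT 1
- How is the fifth column, the positional word-formation probability, calculated? HOT 11
- Question about the left/right information entropy calculation HOT 3
- windows HOT 2
- words_sort.data has no results HOT 20
- Linux and Windows give different results on the same data HOT 2
- About the PMI calculation HOT 8
- words.data and words_sort.data are empty HOT 5
- Is it reasonable for the final ranking to use word frequency only? HOT 7
- Beginner question: what does step 3 mean? It closes immediately when I open it HOT 1
- No space left on device HOT 5
- The empty words.data and words_sort.data problem is solved, and tested on Windows, macOS, and Linux HOT 1
- Question about the calculation of total and freq HOT 4
- About the character encoding range in isChinese HOT 1
- How do I build this with Gradle? HOT 2
- Extraction results differ somewhat from the example HOT 4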
- found a boundary condition problem HOT 7