kepmov / elasticsearch-analysis-hanlp
An Elasticsearch word-segmentation plugin based on HanLP
Home Page: https://github.com/hankcs/HanLP
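The README mentions "downloading the plugin". Is a precompiled plugin available for download?
How do I specify the analyzer that is used only at query time?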
GET /product/_search
{
"query": {
"multi_match" : {
"query": "松鼠",
"fields": [ "name", "keywords"]
}
}
}
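With no query analyzer specified like this, the query text still seems to be split into single characters.
The following step has already passed: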
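One option worth trying is the `analyzer` parameter of `multi_match`, which overrides the analyzer applied to the query text. The sketch below only builds the request body. The helper name is hypothetical, and the analyzer name `hanlp_standard` is an assumption taken from another report in this thread; the name your plugin version registers may differ.

```python
import json


def build_multi_match(query_text, fields, analyzer):
    """Build a multi_match body whose query text is analyzed with `analyzer`."""
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": list(fields),
                # Overrides the analyzer used for the query text only.
                "analyzer": analyzer,
            }
        }
    }


# "hanlp_standard" is an assumed analyzer name, not confirmed for every plugin version.
body = build_multi_match("松鼠", ["name", "keywords"], "hanlp_standard")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the body of `GET /product/_search`. Setting `search_analyzer` on the field mapping is the other common route when every query against a field should use the same analyzer.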
GET /_analyze
{
"analyzer":"hanlp-index",
"text":"三只松鼠"
}
Error:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[node45-1][ip:9300][indices:admin/analyze[s]]"
}
],
"type": "illegal_argument_exception",
"reason": "failed to find global analyzer [hanlp-index]"
},
"status": 400
}
Installed on ES 7.3.1 following the tutorial with no errors, but the plugin has no effect.
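When `failed to find global analyzer` appears, a first check is whether every node actually loaded the plugin. A minimal sketch, assuming the output of `GET _cat/plugins?format=json` has been saved as text and that the plugin's component name is `analysis-hanlp` (inferred from the plugin directory name in this thread). The sample rows and the second node name are made up for illustration.

```python
import json


def nodes_missing_plugin(cat_plugins_json, component="analysis-hanlp"):
    """Return the nodes listed in _cat/plugins output that lack `component`.

    Note: a node with no plugins at all does not appear in _cat/plugins,
    so compare the result against your full node list as well.
    """
    rows = json.loads(cat_plugins_json)
    all_nodes = {row["name"] for row in rows}
    nodes_with = {row["name"] for row in rows if row["component"] == component}
    return sorted(all_nodes - nodes_with)


# Hypothetical sample output; real output comes from GET _cat/plugins?format=json
sample = json.dumps([
    {"name": "node45-1", "component": "analysis-hanlp", "version": "7.3.1"},
    {"name": "node45-2", "component": "analysis-icu", "version": "7.3.1"},
])
print(nodes_missing_plugin(sample))
```

The analyze request can be routed to any node, so one node without the plugin is enough to produce the error intermittently.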
Jul 02, 2018 5:29:06 PM com.hankcs.hanlp.HanLP$Config
SEVERE: 没有找到HanLP.properties,可能会导致找不到data
========Tips========
请将HanLP.properties放在下列目录:
Web项目则请放到下列目录:
Webapp/WEB-INF/lib
Webapp/WEB-INF/classes
Appserver/lib
JRE/lib
并且编辑root=PARENT/path/to/your/data
现在HanLP将尝试从/opt/elasticsearch-6.2.4读取data……
data/dictionary/CoreNatureDictionary.txt
核心词典data/dictionary/CoreNatureDictionary.txt加载失败
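The log above suggests HanLP resolves `data/dictionary/CoreNatureDictionary.txt` relative to the `root` entry of the properties file. A minimal sketch for checking that layout before starting the node. The helper names are hypothetical, the parser handles only simple `key=value` lines (not the full Java properties syntax), and the demo runs in a throwaway directory.

```python
import tempfile
from pathlib import Path


def read_root(properties_path):
    """Return the value of `root` from a simple key=value properties file."""
    for line in Path(properties_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "root":
            return value.strip()
    return None


def core_dictionary_exists(properties_path):
    """True if <root>/data/dictionary/CoreNatureDictionary.txt is a file."""
    root = read_root(properties_path)
    if root is None:
        return False
    return (Path(root) / "data" / "dictionary" / "CoreNatureDictionary.txt").is_file()


# Demo: the check fails before the dictionary exists and passes afterwards.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    props = root / "hanlp.properties"
    props.write_text(f"root={root.as_posix()}\n", encoding="utf-8")
    before = core_dictionary_exists(props)
    dictionary = root / "data" / "dictionary" / "CoreNatureDictionary.txt"
    dictionary.parent.mkdir(parents=True)
    dictionary.write_text("", encoding="utf-8")
    after = core_dictionary_exists(props)
print(before, after)
```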
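Hello, I downloaded the source code locally and would like to study it. How can I debug this code? Many thanks.
Please help with this problem; I keep getting this error. I'm using ES 7.4.2: failed to find global analyzer [hanlp-index]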
SEVERE: 没有找到HanLP.properties,可能会导致找不到data
The content of my plugin-security.policy file is:
grant {
permission java.util.PropertyPermission "*", "read,write";
};
I also modified the jvm.options file in the ES config directory.
ES loads -Djava.security.policy=../plugins/analysis-hanlp/plugin-security.policy at startup.
But when tokenizing with hanlp, the ES backend reports an error:
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "*" "read,write")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_66]
at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_66]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_66]
at java.lang.SecurityManager.checkPropertiesAccess(SecurityManager.java:1262) ~[?:1.8.0_66]
at java.lang.System.getProperties(System.java:630) ~[?:1.8.0_66]
at com.hankcs.hanlp.HanLP$Config.(HanLP.java:240) ~[?:?]
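This problem has tormented me for a long time. I have gone through a lot of material without finding the cause, so I logged in specifically to ask. Any pointers would be greatly appreciated.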
[2020-03-11T12:45:03,997][INFO ][o.e.x.s.a.s.FileRolesStore] [RANYINGCOMPUTER] parsed [0] roles from file [D:\MyDevelopmentTool\elasticsearch\elasticsearch-7.4.2\elasticsearch-7.4.2\config\roles.yml]
[2020-03-11T12:45:05,143][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [RANYINGCOMPUTER] [controller/17916] [Main.cc@110] controller (64 bit): Version 7.4.2 (Build 473f61b8a5238b) Copyright (c) 2019 Elasticsearch BV
[2020-03-11T12:45:05,800][DEBUG][o.e.a.ActionModule ] [RANYINGCOMPUTER] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2020-03-11T12:45:06,429][INFO ][o.e.d.DiscoveryModule ] [RANYINGCOMPUTER] using discovery type [zen] and seed hosts providers [settings]
[2020-03-11T12:45:07,501][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [RANYINGCOMPUTER] fatal error in thread [main], exiting
java.lang.NoSuchMethodError: org.elasticsearch.rest.BaseRestHandler.<init>(Lorg/elasticsearch/common/settings/Settings;)V
at com.hankcs.rest.RestDicOperateAction.<init>(RestDicOperateAction.java:36) ~[?:?]
at org.elasticsearch.plugin.analysis.hanlp.AnalysisHanlpPlugin.getRestHandlers(AnalysisHanlpPlugin.java:71) ~[?:?]
at org.elasticsearch.action.ActionModule.initRestHandlers(ActionModule.java:692) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.node.Node.<init>(Node.java:609) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.node.Node.<init>(Node.java:255) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:221) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:221) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:349) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:125) ~[elasticsearch-cli-7.4.2.jar:7.4.2]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) ~[elasticsearch-7.4.2.jar:7.4.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.4.2.jar:7.4.2]
java.lang.IllegalArgumentException: plugin policy [/usr/share/elasticsearch/plugins/analysis-hanlp/plugin-security.policy] contains illegal permission ("java.io.FilePermission" "-#plus" "read,write") in global grant
at org.elasticsearch.bootstrap.PolicyUtil.validatePolicyPermissionsForJar(PolicyUtil.java:303)
at org.elasticsearch.bootstrap.PolicyUtil.validatePolicyPermissions(PolicyUtil.java:313)
at org.elasticsearch.bootstrap.PolicyUtil.getPluginPolicyInfo(PolicyUtil.java:324)
at org.elasticsearch.bootstrap.Security.getPluginAndModulePermissions(Security.java:142)
at org.elasticsearch.bootstrap.Security.configure(Security.java:106)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399)
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159)
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:116)
at org.elasticsearch.cli.Command.main(Command.java:79)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81)
For complete error details, refer to the log at /usr/share/elasticsearch/logs/elasticsearch.log
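I ran into a problem with Japanese personal-name recognition. The indexed document is "xxx藤田淑子", and searching for "藤田淑子" tokenizes differently (藤田淑 fails to match):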
POST _analyze
{
"analyzer": "hanlp_index",
"text": "日本声优藤田淑子逝世"
}
{
"tokens": [
{
"token": "日本",
"start_offset": 0,
"end_offset": 2,
"type": "ns",
"position": 0
},
{
"token": "声优",
"start_offset": 2,
"end_offset": 4,
"type": "nz",
"position": 1
},
{
"token": "藤田",
"start_offset": 4,
"end_offset": 6,
"type": "nr",
"position": 2
},
{
"token": "淑",
"start_offset": 6,
"end_offset": 7,
"type": "ng",
"position": 3
},
{
"token": "子",
"start_offset": 7,
"end_offset": 8,
"type": "ng",
"position": 4
},
{
"token": "逝世",
"start_offset": 8,
"end_offset": 10,
"type": "vi",
"position": 5
}
]
}
Query-time tokenization:
POST _analyze
{
"analyzer": "hanlp_standard",
"text": "藤田淑子"
}
{
"tokens": [
{
"token": "藤田淑",
"start_offset": 0,
"end_offset": 3,
"type": "nr",
"position": 0
},
{
"token": "子",
"start_offset": 0,
"end_offset": 1,
"type": "ng",
"position": 1
}
]
}
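Is there any way to intervene in this kind of inconsistency? Reading the HanLP documentation, I'm not sure whether enabling Japanese name recognition would solve it.
I'm using ES 5.5.0; after configuring the plugin, startup fails with an error.
Does it support Elasticsearch 6.5.4 and other versions? If not, what needs to change, and where? I'm not familiar with this, so could the author please provide a detailed guide? It would also help later users. Much appreciated!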
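To see exactly which query tokens can never match, the two `_analyze` responses above can be compared directly. A minimal sketch; the function names are hypothetical, and the token lists are copied from the two responses in this report.

```python
def tokens(analyze_response):
    """Extract the token strings from an _analyze response."""
    return [t["token"] for t in analyze_response["tokens"]]


def unmatched_query_tokens(index_response, query_response):
    """Query-time tokens that never occur among the index-time tokens."""
    indexed = set(tokens(index_response))
    return [t for t in tokens(query_response) if t not in indexed]


# Token lists copied from the hanlp_index and hanlp_standard responses above.
index_response = {"tokens": [{"token": t} for t in ["日本", "声优", "藤田", "淑", "子", "逝世"]]}
query_response = {"tokens": [{"token": t} for t in ["藤田淑", "子"]]}

print(unmatched_query_tokens(index_response, query_response))  # ['藤田淑']
```

Running this over a sample of real queries is a cheap way to find names that need a custom dictionary entry, or to decide whether indexing and searching should use the same analyzer.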
[2018-08-29T14:19:56,487][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[vFHEl-z][generic][T#3]], exiting
java.lang.ExceptionInInitializerError: null
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:363) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:177) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.index.IndexService.(IndexService.java:162) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:367) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:453) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:497) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.gateway.Gateway.performStateRecovery(Gateway.java:127) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.gateway.GatewayService$1.doRun(GatewayService.java:223) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:725) ~[elasticsearch-6.3.2.jar:6.3.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.3.2.jar:6.3.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.security.AccessControlException: access denied ("java.util.PropertyPermission" "*" "read,write")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_131]
at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_131]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_131]
at java.lang.SecurityManager.checkPropertiesAccess(SecurityManager.java:1262) ~[?:1.8.0_131]
at java.lang.System.getProperties(System.java:630) ~[?:1.8.0_131]
at org.elasticsearch.index.analysis.HanLPTokenizerFactory.(HanLPTokenizerFactory.java:22) ~[?:?]
... 14 more
The permissions are clearly already 777, yet the exception keeps being thrown. Help!
If a custom analyzer includes a synonym filter:
"hanlp_syno_index": {
"type": "custom",
"tokenizer": "hanlp-index",
"filter": [
"my_synonym_filter"
]
},
an error is reported when creating the index, as shown below:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "failed to build synonyms"
}
],
"type": "illegal_argument_exception",
"reason": "failed to build synonyms",
"caused_by": {
"type": "parse_exception",
"reason": "Invalid synonym rule at line 1",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "term: 美国和伊拉克 analyzed to a token (伊拉克) with position increment != 1 (got: 2)"
}
}
},
"status": 400
}
How can this kind of problem be solved? Any advice would be appreciated, thanks!
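The error says the synonym parser expects every token of a rule to advance the position by exactly 1. One way to find which terms will break is to run them through `_analyze` with the same tokenizer and look for position jumps. A minimal sketch; the function name is hypothetical, and the sample response is an assumption reconstructed from the error message (a token such as 和 appears to be dropped, leaving a gap), not actual output.

```python
def bad_position_increments(analyze_response):
    """Return (token, increment) pairs whose position increment is not 1."""
    problems = []
    previous = -1  # so a first token at position 0 has increment 1
    for tok in analyze_response["tokens"]:
        increment = tok["position"] - previous
        if increment != 1:
            problems.append((tok["token"], increment))
        previous = tok["position"]
    return problems


# Assumed _analyze output for "美国和伊拉克", inferred from the error text above.
assumed_response = {
    "tokens": [
        {"token": "美国", "position": 0},
        {"token": "伊拉克", "position": 2},
    ]
}
print(bad_position_increments(assumed_response))  # [('伊拉克', 2)]
```

If the gap does come from a removed stopword, parsing the synonym rules with a tokenizer that keeps every token may avoid it. The synonym filter's `lenient` option skips rules that fail to parse, at the cost of silently dropping them.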
As the title says; thank you very much!
Startup fails with an error:
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: plugin policy [D:\search\7.17.15\elasticsearch-7.17.15\plugins\analysis-hanlp\plugin-security.policy] contains illegal permission ("java.io.FilePermission" "D:/search/7.17.15/elasticsearch-7.17.15/plugins/analysis-hanlp/data/-" "read,write,delete") in global grant
After commenting out the relevant content in plugin-security.policy and restarting, a different error is reported:
java.lang.IllegalStateException: failed to load plugin class [org.elasticsearch.plugin.analysis.hanlp.AnalysisHanLPPlugin]
Likely root cause: java.lang.ClassNotFoundException: org.elasticsearch.common.io.PathUtils
Is this a version compatibility issue? How can it be fixed?
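A quick way to test the version-mismatch theory is to compare the `elasticsearch.version` entry in the plugin's `plugin-descriptor.properties` with the version of the running node. A minimal sketch; the function names are hypothetical, and the descriptor content below is a made-up sample, not the file shipped with this plugin.

```python
def descriptor_es_version(descriptor_text):
    """Read elasticsearch.version from plugin-descriptor.properties text."""
    for line in descriptor_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "elasticsearch.version":
            return value.strip()
    return None


def built_for(descriptor_text, running_version):
    """True if the descriptor declares exactly the running ES version."""
    return descriptor_es_version(descriptor_text) == running_version


# Hypothetical descriptor content for illustration only.
sample_descriptor = """\
# Elasticsearch plugin descriptor file
name=analysis-hanlp
version=7.4.2
elasticsearch.version=7.4.2
"""
print(built_for(sample_descriptor, "7.17.15"))  # False
```

Editing the descriptor to match the running version only gets past the startup check. Errors such as `ClassNotFoundException` or `NoSuchMethodError` indicate the plugin classes were compiled against a different Elasticsearch API and need a rebuild for that version.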
SEVERE: 没有找到hanlp.properties,可能会导致找不到data
========Tips========
请将hanlp.properties放在下列目录:
Web项目则请放到下列目录:
Webapp/WEB-INF/lib
Webapp/WEB-INF/classes
Appserver/lib
JRE/lib
并且编辑root=PARENT/path/to/your/data
I can't add you on QQ; verification is required.
Why isn't the custom dictionary being loaded?
[2018-12-25T09:34:00,228][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [k1dzoLq] fatal error in thread [elasticsearch[k1dzoLq][analyze][T#1]], exiting
java.lang.ExceptionInInitializerError: null
at com.hankcs.hanlp.seg.Segment.seg(Segment.java:338) ~[?:?]
at com.hankcs.lucene4.HanlpSegmenter.next(HanlpSegmenter.java:65) ~[?:?]
at com.hankcs.lucene4.HanLPTokenizer.incrementToken(HanLPTokenizer.java: ~[?:?]
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:267) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:244) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:165) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:81) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$1.doRun(TransportSingleShardAction.java:112) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[elasticsearch-6.5.3.jar:6.5.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.5.3.jar:6.5.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.security.AccessControlException: access denied ("java.util.PropertyPermission" "*" "read,write")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_171]
at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_171]
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_171]
at java.lang.SecurityManager.checkPropertiesAccess(SecurityManager.java:1262) ~[?:1.8.0_171]
at java.lang.System.getProperties(System.java:630) ~[?:1.8.0_171]
at com.hankcs.hanlp.HanLP$Config.(HanLP.java:240) ~[?:?]
... 13 more
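I want to call native HanLP to produce the same segmentation results as hanlp-index (index mode) and hanlp-smart (smart mode). Which function should I call? HanLP.segment() doesn't seem to be the right one.
I've installed the plugin following the instructions, but an error prevents startup. Could you help find the cause?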
[2020-06-03T17:05:23,142][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-local] fatal error in thread [main], exiting
java.lang.NoSuchMethodError: org.elasticsearch.rest.BaseRestHandler.<init>(Lorg/elasticsearch/common/settings/Settings;)V
at com.hankcs.rest.RestDicOperateAction.<init>(RestDicOperateAction.java:36) ~[?:?]
at org.elasticsearch.plugin.analysis.hanlp.AnalysisHanlpPlugin.getRestHandlers(AnalysisHanlpPlugin.java:71) ~[?:?]
at org.elasticsearch.action.ActionModule.initRestHandlers(ActionModule.java:781) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.node.Node.<init>(Node.java:633) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.node.Node.<init>(Node.java:264) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:227) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:227) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.7.0.jar:7.7.0]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.7.0.jar:7.7.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.7.0.jar:7.7.0]
java.security.AccessControlException: access denied ("java.io.FilePermission" "data/dictionary/CoreNatureDictionary.tr.txt" "read")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_181]
at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_181]
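Did you run into permission errors when persisting data to ES? This takes the ES service offline. How did you handle it?
@pengcong90 If it's convenient, please add me on WeChat (1042185520); I have a few questions to ask. Or email me at [email protected]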
-Djava.security.policy=F:/soft/servers/elasticsearch-6.2.4/plugins/analysis-hanlp/plugin-security.policy
Setting a relative path here causes a permission-denied problem; after switching to an absolute path, the problem no longer occurred.
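Following the observation above, the `jvm.options` line can be generated with an absolute path so it does not depend on the directory Elasticsearch is started from. A minimal sketch; the function name is hypothetical, and the `plugins/analysis-hanlp` layout is assumed from the paths quoted in this thread.

```python
from pathlib import Path


def policy_jvm_option(es_home):
    """Build the -Djava.security.policy option with an absolute policy path."""
    policy = Path(es_home).resolve() / "plugins" / "analysis-hanlp" / "plugin-security.policy"
    # as_posix() keeps forward slashes, matching the Windows example above.
    return "-Djava.security.policy=" + policy.as_posix()


# "." stands in for the Elasticsearch home directory.
option = policy_jvm_option(".")
print(option)
```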
GET /_analyze
{
"analyzer":"hanlp-index",
"text":"三只松鼠"
}
The result is a 504:
{
"statusCode": 504,
"error": "Gateway Time-out",
"message": "Client request timeout"
}
How should this be configured? The operating system is Win10 with JDK 1.8.