Comments (13)
Remove nGram from the analyzer configuration and try again.
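On August 12, 2015, at 8:50 AM, starckgates wrote:
Hi medcl, I'm back looking at the pinyin plugin and have a question. How should I handle forward (prefix) matching by pinyin? For example, say I want to search “我们来自**” ("We come from **") and “**是个大国” ("** is a big country"): when I type 'zg', the entry where the match comes at the front should be returned. Do you see what I mean? In other words, I want prefix matches ranked first, or something like SQL's LIKE 'zg%'.
Also, my analyzer settings are as follows: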
################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard,nGram]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "
The mapping is set as follows:
POST /tag/keywords/_mapping
{
  "keywords": {
    "properties": {
      "kwname": {
        "type": "multi_field",
        "fields": {
          "kwname": {
            "type": "string",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10
          },
          "primitive": {
            "type": "string",
            "store": "yes",
            "analyzer": "keyword"
          }
        }
      }
    }
  }
}
http://localhost:9200/tag/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer
The result is:
{"tokens":[{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ld","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"dh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hl","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"li","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"i","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"iu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ud","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"e","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"eh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"a","start_offset":0,"end_offset":3,"type":"word","position":1}]}
This is too fine-grained. How can I make it coarser?
For example, I only want:
{"token":"liu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},
How should I configure this?
Thanks a lot!
from elasticsearch-analysis-pinyin.
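For readers following along: the 21 tokens in the question look like what a character n-gram filter with lengths 1 to 2 produces over the string `ldhliudehua`, which would explain why removing nGram is the suggested fix. The sketch below is plain Python, not the Lucene implementation; the helper name `char_ngrams` and the 1-to-2 length range are assumptions inferred from the posted output.

```python
def char_ngrams(text, min_gram=1, max_gram=2):
    """Return character n-grams ordered by start position, then by length."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            # Skip n-grams that would run past the end of the text.
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# "ldhliudehua" is the first-letter prefix "ldh" followed by "liu", "de", "hua".
tokens = char_ngrams("ldhliudehua")
print(tokens)
```

Comparing the printed list with the `_analyze` result above shows the same sequence of tokens, so the fine granularity appears to come from the filter chain rather than from the pinyin tokenizer itself.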
After that change, the result becomes:
{"tokens":[{"token":"ldh liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}
So the pinyin syllables are not each indexed separately.
And now the search returns nothing:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
Try a prefix query. Isn't that exactly what you need?
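A minimal sketch of what that suggestion might look like, assuming the `kwname` field from the mapping above and the single token `ldh liu de hua ` from the earlier `_analyze` output. The helper names `prefix_query` and `matches_prefix` are made up for illustration, and `matches_prefix` is only a local stand-in for the term-level check, not Elasticsearch code.

```python
import json


def prefix_query(field, prefix):
    """Build a search body that uses the prefix query on a single field."""
    return {"query": {"prefix": {field: prefix}}}


def matches_prefix(indexed_token, prefix):
    """Local stand-in: a prefix query matches terms that start with the prefix."""
    return indexed_token.startswith(prefix)


# Request body that could be sent to the index's _search endpoint.
body = prefix_query("kwname", "ld")
print(json.dumps(body))

# The single indexed token reported earlier in the thread.
token = "ldh liu de hua "
print(matches_prefix(token, "ld"))  # the first letters sit at the front of the token
print(matches_prefix(token, "dh"))  # "dh" occurs inside the token, not at its start
```

Because the first letters are placed at the front of one unsplit token, a prefix match behaves much like SQL's `LIKE 'ld%'` on that token.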
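Hmm, that looks complicated. I'll give it a try, thanks!
Oh, one more thing: I see that the pinyin examples create the index with settings rather than defining them in the configuration file. Can this be written directly in the configuration file instead, the way it is done for the ik analyzer, with analysis defined in the .yml config file?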
Did you remove the standard filter as well?
No, I didn't remove the filter.
I put it in the .yml file, as follows:
################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "
The pinyin analyzer's type is not set to custom.
################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        type: custom
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "
Like this?
Yes. Mind the formatting.
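On the earlier question about per-index settings versus the .yml file: the same analysis definition can presumably also be expressed as a JSON body supplied when the index is created. The sketch below only builds and prints that body. The function name `pinyin_index_settings` is hypothetical, the option names are copied from the YAML in this thread, and whether a given Elasticsearch or plugin version still accepts `standard`, `first_letter` and `padding_char` is not something this thread confirms.

```python
import json


def pinyin_index_settings():
    """Mirror the YAML from this thread as an index-creation settings body."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "pinyin_analyzer": {
                            "type": "custom",  # the key that was missing earlier
                            "tokenizer": "my_pinyin",
                            "filter": ["standard"],
                        }
                    },
                    "tokenizer": {
                        "my_pinyin": {
                            "type": "pinyin",
                            "first_letter": "prefix",
                            "padding_char": " ",
                        }
                    },
                }
            }
        }
    }


body = pinyin_index_settings()
print(json.dumps(body, indent=2))
```

Unlike YAML, the JSON form does not depend on indentation, which sidesteps the formatting problem mentioned here.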
Hmm, it seems that whenever I paste code here the alignment gets messed up...
I'll give it a try.
It works now. Thank you very much!
cool~