Forward (prefix) search with pinyin tokenization (elasticsearch-analysis-pinyin, closed)

medcl commented on July 30, 2024

Forward (prefix) search with pinyin tokenization

from elasticsearch-analysis-pinyin.

Comments (13)

medcl commented on July 30, 2024

Remove the nGram filter from the analyzer configuration and try again.

On August 12, 2015, at 8:50 AM, starckgates wrote:

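Hi medcl, I'm back looking at the pinyin plugin and have a question. How should I handle forward (prefix) search by pinyin? For example, with documents such as “我们来自**” and “**是个大国”, typing 'zg' should bring up the one where the match is at the front. In other words, I want forward matches ranked first, or behavior like SQL's LIKE 'zg%'.

Also, my analyzer settings are as follows: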

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard, nGram]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "

The mapping is set up as follows:

POST /tag/keywords/_mapping
{
  "keywords": {
    "properties": {
      "kwname": {
        "type": "multi_field",
        "fields": {
          "kwname": {
            "type": "string",
            "store": "no",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10
          },
          "primitive": {
            "type": "string",
            "store": "yes",
            "analyzer": "keyword"
          }
        }
      }
    }
  }
}

http://localhost:9200/tag/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer

The result is:

{"tokens":[{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ld","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"dh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hl","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"l","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"li","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"i","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"iu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ud","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"d","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"e","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"eh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"h","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"u","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"a","start_offset":0,"end_offset":3,"type":"word","position":1}]}

This is too fine-grained. How can I make it coarser?
For example, I only want:
{"token":"liu","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"de","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"hua","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},

How should I configure this?

Thanks a lot!
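The token list in the question is what an n-gram filter with a minimum length of 1 and a maximum length of 2 (the Elasticsearch defaults for the nGram token filter) would emit over a single token. The sketch below reproduces it in plain Python. The input string "ldhliudehua" is an assumption inferred from the posted tokens, and `ngram_filter` is a hypothetical helper, not part of the plugin or of Elasticsearch.

```python
def ngram_filter(token, min_gram=1, max_gram=2):
    """Emit every substring of length min_gram..max_gram, position by position."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            # Skip grams that would run past the end of the token.
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


# Assumed input: the single token the pinyin tokenizer appears to have produced
# for 刘德华 (first letters "ldh" followed by "liu", "de", "hua").
tokens = ngram_filter("ldhliudehua")
print(tokens)
```

Because no gram is longer than two characters, whole syllables such as "liu" or "hua" never appear as tokens, which is why the output looks too fine-grained.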


starckgates commented on July 30, 2024

With that change, the output becomes:

{"tokens":[{"token":"ldh liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}

So the pinyin syllables are not indexed as separate tokens.


starckgates commented on July 30, 2024

Now the search returns nothing:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}


medcl commented on July 30, 2024

Try a prefix query. Isn't that exactly what you need?

(Sent by email in reply to starckgates's message of August 12, 2015, 9:21 AM, which reported the empty search result.)
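To illustrate the prefix query suggestion: once nGram is removed, the field holds one token such as "ldh liu de hua ", so an exact term lookup for a syllable finds nothing, while a prefix lookup on the leading first letters succeeds. The sketch below builds a prefix query body in the Elasticsearch query DSL and mimics the two matching rules locally. The field name "kwname" comes from the mapping in the question; the helper names are hypothetical, and the local matching is only a simplified model of how the index behaves.

```python
import json


def build_prefix_query(field, prefix):
    """Return a search body using the query DSL's prefix query."""
    return {"query": {"prefix": {field: prefix}}}


def term_matches(indexed_token, text):
    # A term lookup needs the whole indexed token to equal the search text.
    return indexed_token == text


def prefix_matches(indexed_token, text):
    # A prefix lookup only needs the indexed token to start with the search text.
    return indexed_token.startswith(text)


# Token reported in the thread after the nGram filter was removed.
indexed_token = "ldh liu de hua "

body = build_prefix_query("kwname", "ld")
print(json.dumps(body))
print(term_matches(indexed_token, "liu"), prefix_matches(indexed_token, "ld"))
```

The prefix text is compared against the indexed token as-is, so it should already be in the lowercase pinyin form stored in the index.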

starckgates commented on July 30, 2024

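Hmm, that sounds complicated. I'll give it a try, thanks!

Oh, one more thing: I see that the pinyin plugin's examples define the analyzer through settings at index creation rather than in the configuration file. Can it be written directly in the configuration file instead, with the analysis section defined in the .yml file as is done for the ik analyzer?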


medcl commented on July 30, 2024

Did you remove the standard filter as well?

(Sent by email in reply to starckgates's message of August 12, 2015, 9:19 AM, which showed the single-token output.)

starckgates commented on July 30, 2024

No, I kept the filter.
I put the configuration in the .yml file, as follows:

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "


medcl commented on July 30, 2024

The pinyin analyzer's type is not set to custom.

(Sent by email in reply to starckgates's message of August 12, 2015, 10:00 AM, which posted the .yml configuration.)

starckgates commented on July 30, 2024

################################## Pinyin ###################################
index:
  analysis:
    analyzer:
      pinyin_analyzer:
        type: custom
        tokenizer: my_pinyin
        filter: [standard]
    tokenizer:
      my_pinyin:
        type: pinyin
        first_letter: "prefix"
        padding_char: " "

Like this?


medcl commented on July 30, 2024

Yes, but mind the formatting.
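Since nesting is what makes the configuration valid, it can help to see the same structure with explicit braces. The sketch below expresses the confirmed configuration as a nested Python dict and prints it as JSON, the shape one would pass as index settings at creation time. Whether a given Elasticsearch version accepts exactly this body is an assumption to check against its documentation, and `pinyin_index_settings` is a hypothetical helper name.

```python
import json


def pinyin_index_settings():
    """Nested form of the .yml configuration confirmed in the thread."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "pinyin_analyzer": {
                        "type": "custom",  # the setting that was missing
                        "tokenizer": "my_pinyin",
                        "filter": ["standard"],
                    }
                },
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "first_letter": "prefix",
                        "padding_char": " ",
                    }
                },
            }
        }
    }


settings = pinyin_index_settings()
print(json.dumps(settings, indent=2))
```

In the .yml file the same hierarchy is expressed purely by indentation, so "analyzer" and "tokenizer" must sit at the same level under "analysis".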


starckgates commented on July 30, 2024

Right, it seems that any code I paste here loses its indentation and ends up aligned to one side...
I'll give it a try.


starckgates commented on July 30, 2024

It works now. Thank you!


medcl commented on July 30, 2024

cool~
