zhparser's Introduction

Zhparser

Zhparser is a PostgreSQL extension for full-text search of Chinese (Mandarin). It implements a Chinese-language parser based on Simple Chinese Word Segmentation (SCWS).

Project home page: http://blog.amutu.com/zhparser/

Note: if you are not satisfied with the segmentation results, or need to debug them, you can try them out on this page: http://www.xunsearch.com/scws/demo/v48.php

Docker quick start

Run the container:

docker run --name pgzhparser -d -e POSTGRES_PASSWORD=somepassword zhparser/zhparser:bookworm-16

Log in to the postgres database as user postgres:

docker exec -it pgzhparser psql postgres postgres

Create the extension and use it:

CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
SELECT * FROM ts_parse('zhparser', 'hello world! 2010年保障房建设在全国范围内获全面启动');

You will get:

 tokid | token
-------+-------
   101 | hello
   101 | world
   117 | !
   101 | 2010
   113 | 年
   118 | 保障
   110 | 房建
   118 | 设在
   110 | 全国
   110 | 范围
   102 | 内
   118 | 获
    97 | 全面
   118 | 启动
(14 rows)

For more information about the Docker images, see zhparser's Docker Hub page.
The zhparser Docker image is built on the official PostgreSQL Docker image; for more usage details, see: https://hub.docker.com/_/postgres

INSTALL

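0. Prerequisites

zhparser supports PostgreSQL 9.2 and later; make sure your PostgreSQL version meets this requirement. On Red Hat/CentOS Linux, make sure the required libraries and header files are installed; they are usually provided by the postgresql-devel package.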

1. Install SCWS

 wget -q -O - http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2 | tar xjf -

 cd scws-1.2.3 ; ./configure ; make install

Note: on FreeBSD release 10 and later, add the --with-pic option when running configure.

If you downloaded the SCWS source from GitHub, first run the following command to generate the configure script:

 touch README;aclocal;autoconf;autoheader;libtoolize;automake --add-missing

2. Download the zhparser source

 git clone https://github.com/amutu/zhparser.git

3. Build and install zhparser

 make && make install

If SCWS is not installed under the default /usr/local, set SCWS_HOME, for example: SCWS_HOME=/usr make && make install

If you have several PostgreSQL versions installed, set PG_CONFIG to build the extension for a specific version:

 PG_CONFIG=/usr/lib/postgresql/9.5/bin/pg_config make && make install

Note: when building on *BSD, use gmake instead of make.

4. Create the extension

 psql dbname superuser -c 'CREATE EXTENSION zhparser'

CONFIGURATION

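The following settings are available on PostgreSQL 9.2 and later. They control dictionary loading and word segmentation. None of them is required, and all default to false (that is, if an option is not set in the configuration file, zhparser behaves as if it were set to false).

Ignore all punctuation and other special symbols: zhparser.punctuation_ignore = f

Automatically aggregate stray characters using two-character (duality) segmentation: zhparser.seg_with_duality = f

Load the entire dictionary into memory: zhparser.dict_in_memory = f

Short-word compounding: zhparser.multi_short = f

Duality compounding of stray characters: zhparser.multi_duality = f

Important single-character compounding: zhparser.multi_zmain = f

All single-character compounding: zhparser.multi_zall = f

In addition to the dictionary bundled with zhparser, users can add custom dictionaries, which take precedence over the bundled one. Custom dictionary files must be placed in the share/tsearch_data directory. zhparser determines the dictionary format from the file extension: .txt means a text-format dictionary and .xdb means an xdb-format dictionary. Separate multiple files with commas; segmentation priority runs from low to high in the order listed, for example:

zhparser.extra_dicts = 'dict_extra.txt,mydict.xdb'

Note: zhparser.extra_dicts and zhparser.dict_in_memory must be set before the backend starts (you can change them in the configuration file and reload; new connections will then pick them up). The other options can be set at any time within a session. zhparser's options correspond to SCWS options; for their meaning, see the SCWS documentation: http://www.xunsearch.com/scws/docs.php#libscws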
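
As a minimal sketch of the session-level options described above, the statements below turn on two of them for the current session only and then reset them. It assumes the zhparser extension is already installed in the current database; the option names are the ones listed above and the sample text is borrowed from this README. How the tokens change depends on the dictionary in use, so no output is shown.

```sql
-- Session-only changes; nothing is written to the configuration file.
SET zhparser.punctuation_ignore = on;  -- drop punctuation tokens
SET zhparser.multi_short = on;         -- enable short-word compounding

-- Inspect the effect on the raw parser output.
SELECT * FROM ts_parse('zhparser', 'hello world! 保障房资金压力');

-- Return both options to their configured defaults.
RESET zhparser.punctuation_ignore;
RESET zhparser.multi_short;
```

zhparser.extra_dicts and zhparser.dict_in_memory are deliberately left out of this sketch, since the note above says they must be set before the backend starts.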

EXAMPLE

-- create the extension

CREATE EXTENSION zhparser;

-- make test configuration using parser

CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);

-- add token mapping

ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- ts_parse

SELECT * FROM ts_parse('zhparser', 'hello world! 2010年保障房建设在全国范围内获全面启动,从**到地方纷纷加大 了保障房的建设和投入力度 。2011年,保障房进入了更大规模的建设阶段。住房城乡建设部党组书记、部长姜伟新去年底在全国住房城乡建设工作会议上表示,要继续推进保障性安居工程建设。');

-- test to_tsvector

SELECT to_tsvector('testzhcfg','“今年保障房新开工数量虽然有所下调,但实际的年度在建规模以及竣工规模会超以往年份,相对应的对资金的需求也会创历史纪录。”陈国强说。在他看来,与2011年相比,2012年的保障房建设在资金配套上的压力将更为严峻。');

-- test to_tsquery

SELECT to_tsquery('testzhcfg', '保障房资金压力');
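
To tie the two functions together, the sketch below checks whether a document matches a query using the @@ operator from PostgreSQL's built-in text search. It assumes the testzhcfg configuration created above; the sample sentence is adapted from this example. Whether it matches depends on how the dictionary segments the text, so no result is shown.

```sql
-- Assumes testzhcfg was created and mapped as in the steps above.
-- Returns a single boolean column named "matches".
SELECT to_tsvector('testzhcfg', '2012年的保障房建设在资金配套上的压力将更为严峻')
       @@ to_tsquery('testzhcfg', '资金') AS matches;
```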

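Custom dictionaries

** TXT dictionary format in detail (TXT dictionaries are now compatible with the text dictionaries used by cli/scws_gen_dict) **

  1. One record per line; lines starting with # or a semicolon are comments and are skipped.

  2. Each line has four fields, in order: the word (made up of Chinese characters, or of fewer than 3 letters), TF, IDF, and the part-of-speech attribute. Fields are separated by spaces or tabs; any number of separators may be used, so you can align columns for readability.

  3. All fields except the word may be omitted. If omitted, TF and IDF default to 1.0 and the attribute defaults to "@".

  4. Because TXT dictionaries are loaded dynamically (the file modification time is checked internally and the file is automatically converted to xdb and stored in the system temporary directory), TXT dictionaries should be kept small.

  5. To delete a word, set its attribute to "!". The word is then treated as invalid, even if it exists in another core dictionary.

Note:

  1. A custom dictionary can be in TXT (text) format or in binary XDB format. XDB is more efficient and suits large dictionaries. You can convert a text dictionary to XDB with scws-gen-dict, a tool that ships with SCWS.

  2. zhparser's default dictionary is Simplified Chinese. If you need Traditional Chinese, a pre-built XDB-format dictionary is available for download.

  3. For an example custom dictionary, see dict_extra.txt. For more information, see the official SCWS documentation.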

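Custom dictionary 2.1

** Custom dictionary 2.1 makes custom dictionaries easier to use and remains compatible with the features provided in 1.0 **

Managing custom words requires superuser privileges. The custom dictionary is per database (not per instance): each database has its own custom words, stored under base/<database ID> in the data directory (version 2.0 stored them under share/tsearch_data).

Upgrading an existing production environment (a new environment only needs a fresh install): alter extension zhparser update ;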

test=# SELECT * FROM ts_parse('zhparser', '保障房资金压力');
 tokid | token
-------+-------
   118 | 保障
   110 | 房
   110 | 资金
   110 | 压力

test=# insert into zhparser.zhprs_custom_word values('资金压力');
-- To delete a word: insert into zhprs_custom_word(word, attr) values('word', '!');
-- Run \d zhprs_custom_word to view the table structure; TF and IDF are supported
test=# select sync_zhprs_custom_word();
 sync_zhprs_custom_word
------------------------

(1 row)

test=# \q -- reconnect after the sync
[lzzhang@lzzhang-pc bin]$ ./psql -U lzzhang -d test -p 1600
test=# SELECT * FROM ts_parse('zhparser', '保障房资金压力');
 tokid |  token
-------+----------
   118 | 保障
   110 | 房
   120 | 资金压力
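
The session above relies on column defaults. As a hedged sketch, the statements below spell out all four columns instead. The column names word, tf, idf and attr are taken from the COPY statement quoted in the "Permission denied" issue further down; the numeric type of tf and idf, and the two sample words, are assumptions made for illustration only.

```sql
-- Add a custom word with explicit TF/IDF and the default attribute "@".
-- (Assumes tf and idf accept numeric values.)
INSERT INTO zhparser.zhprs_custom_word (word, tf, idf, attr)
VALUES ('保障房', 1.0, 1.0, '@');

-- Mark a word as invalid by giving it the attribute "!".
-- The word chosen here is only illustrative.
INSERT INTO zhparser.zhprs_custom_word (word, attr)
VALUES ('房建', '!');

-- Write the table out to the dictionary file; reconnect afterwards,
-- as in the session above, before testing with ts_parse.
SELECT sync_zhprs_custom_word();
```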

COPYRIGHT

zhparser

Portions Copyright (c) 2012-2017, Jov([email protected])

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

zhparser's People

Contributors

amutu, dreamsxin, zlianzhuang

zhparser's Issues

Search is slow; how can I optimize it?

I have roughly 500,000 rows, and searches are very slow.

master=> SELECT id FROM question            WHERE to_tsvector('testzhcfg', content) @@ to_tsquery('testzhcfg', 'sdfdsfsdf') order by id desc limit 10;
 id
----
(0 rows)

master=> select count(*) from question;
 count
--------
 504440
(1 row)

How can I optimize this?
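
The report does not say whether the table has a text search index. One standard PostgreSQL approach, sketched here as a possibility rather than a confirmed fix for this case, is a GIN expression index whose expression matches the WHERE clause exactly. The table and column names come from the query above; the index name is hypothetical.

```sql
-- Expression index on the same to_tsvector(...) call used in the query.
CREATE INDEX question_content_fts_idx
    ON question USING gin (to_tsvector('testzhcfg', content));

-- Check whether the planner actually uses the index for the query.
EXPLAIN
SELECT id FROM question
WHERE to_tsvector('testzhcfg', content) @@ to_tsquery('testzhcfg', 'sdfdsfsdf')
ORDER BY id DESC LIMIT 10;
```

The index stores tokens as they were segmented when each row was indexed, so it would probably need a REINDEX after custom dictionary changes.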

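Is this project still maintained?

I am planning to use PostgreSQL for Chinese search, and the most recommended options I found are pg_jieba and zhparser. However, neither GitHub repository has been updated for a year or two. May I ask the author why maintenance stopped? Is it safe to use zhparser in an existing project? If the project will not be maintained going forward, what are good alternatives? Elasticsearch? Sorry for the trouble, and thank you very much.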

select sync_zhprs_custom_word() fails

When I run select sync_zhprs_custom_word() to synchronize the custom dictionary, I get the error below. I don't know the cause. Because this is a cloud-hosted database, I have no permission to patch the source or the dictionary text files. Please help me find a solution. Thank you!

2021-01-18 19:24:49

STATEMENT: /* Query from DMS-WEBSQL-0-Qid_1610969088976 by user 1943298325287307 */ select sync_zhprs_custom_word()
2021-01-18 19:24:49

CONTEXT: PL/pgSQL function sync_zhprs_custom_word() line 17 at EXECUTE
2021-01-18 19:24:49

ERROR: 22004: query string argument of EXECUTE is null

zhparser install

SCWS_HOME=/usr/local make

zhparser.c: In function ‘init’:
zhparser.c:106: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:118: error: too many arguments to function ‘DefineCustomStringVariable’
zhparser.c:130: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:143: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:155: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:167: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:179: error: too many arguments to function ‘DefineCustomBoolVariable’
zhparser.c:191: error: too many arguments to function ‘DefineCustomBoolVariable’
make: *** [zhparser.o] Error 1

Permission denied

Running create extension fails with Permission denied.

jupiter=# create extension zhparser ;
ERROR:  could not open file "/usr/share/postgresql/10/tsearch_data/qc_dict_jupiter.txt" for writing: Permission denied
HINT:  COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
CONTEXT:  SQL statement "copy (select word, tf, idf, attr from zhparser.zhprs_custom_word) to '/usr/share/postgresql/10/tsearch_data/qc_dict_jupiter.txt' encoding 'utf8'"
PL/pgSQL function sync_zhprs_custom_word() line 11 at EXECUTE

Build error on Mac OS X

Mac OS X version: 10.12.1
PostgreSQL version: 10.1.3
Steps:
Download: http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
./configure --with-pic
sudo make install

Download: https://github.com/amutu/zhparser
sudo gmake
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2 -bundle -multiply_defined suppress -o zhparser.so zhparser.o -L/mizi/pgsql-10.1.3/lib -L/opt/local/Current/lib -L/opt/local/20151229/lib -Wl,-dead_strip_dylibs -lscws -L/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib -bundle_loader /mizi/pgsql-10.1.3/bin/postgres
ld: warning: directory not found for option '-L/opt/local/Current/lib'
ld: warning: directory not found for option '-L/opt/local/20151229/lib'
ld: warning: ignoring file /usr/local/lib/libscws.dylib, file was built for x86_64 which is not the architecture being linked (i386): /usr/local/lib/libscws.dylib
Undefined symbols for architecture i386:
"_scws_add_dict", referenced from:
_zhprs_start in zhparser.o
"_scws_free", referenced from:
_zhprs_start in zhparser.o
"_scws_free_result", referenced from:
_zhprs_getlexeme in zhparser.o
"_scws_get_result", referenced from:
_zhprs_start in zhparser.o
_zhprs_getlexeme in zhparser.o
"_scws_new", referenced from:
_zhprs_start in zhparser.o
"_scws_send_text", referenced from:
_zhprs_start in zhparser.o
"_scws_set_charset", referenced from:
_zhprs_start in zhparser.o
"_scws_set_dict", referenced from:
_zhprs_start in zhparser.o
"_scws_set_duality", referenced from:
_zhprs_start in zhparser.o
"_scws_set_ignore", referenced from:
_zhprs_start in zhparser.o
"_scws_set_multi", referenced from:
_zhprs_start in zhparser.o
"_scws_set_rule", referenced from:
_zhprs_start in zhparser.o
ld: symbol(s) not found for architecture i386

The build appears to succeed on MacOS 10.9, but CREATE EXTENSION zhparser cannot be executed

scws-1.2.2 installation:

cd scws-1.2.2 ; ./configure ; make install
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /usr/bin/sed
checking whether ln -s works... yes
checking whether make sets $(MAKE)... (cached) yes
checking build system type... i386-apple-darwin13.4.0
checking host system type... i386-apple-darwin13.4.0
checking for a sed that does not truncate output... (cached) /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /Library/Developer/CommandLineTools/usr/bin/ld
checking if the linker (/Library/Developer/CommandLineTools/usr/bin/ld) is GNU ld... no
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm
checking the name lister (/usr/bin/nm) interface... BSD nm
checking the maximum length of command line arguments... 196608
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking for /Library/Developer/CommandLineTools/usr/bin/ld option to reload object files... -r
checking how to recognize dependent libraries... pass_all
checking for ar... ar
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm output from gcc object... ok
checking for dsymutil... dsymutil
checking for nmedit... nmedit
checking for lipo... lipo
checking for otool... otool
checking for otool64... no
checking for -single_module linker flag... yes
checking for -exported_symbols_list linker flag... yes
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... yes
checking for gcc option to produce PIC... -fno-common -DPIC
checking if gcc PIC flag -fno-common -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/Library/Developer/CommandLineTools/usr/bin/ld) supports shared libraries... yes
checking dynamic linker characteristics... darwin13.4.0 dyld
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
checking for logf in -lm... yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking math.h usability... yes
checking math.h presence... yes
checking for math.h... yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking sys/file.h usability... yes
checking sys/file.h presence... yes
checking for sys/file.h... yes
checking sys/param.h usability... yes
checking sys/param.h presence... yes
checking for sys/param.h... yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for an ANSI C-conforming const... yes
checking for inline... inline
checking whether time.h and sys/time.h may both be included... yes
checking size of int... 4
checking size of float... 4
checking for struct flock...
checking whether lstat correctly handles trailing slash... no
checking whether lstat accepts an empty string... no
checking whether lstat correctly handles trailing slash... (cached) no
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... (cached) yes
checking for getpagesize... yes
checking for working mmap... yes
checking for working memcmp... yes
checking for flock... yes
checking for gettimeofday... yes
checking for malloc... yes
checking for memset... yes
checking for munmap... yes
checking for pow... yes
checking for realpath... yes
checking for strcasecmp... yes
checking for strchr... yes
checking for strdup... yes
checking for strrchr... yes
checking for strndup... yes
checking for strtok_r... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating cli/Makefile
config.status: creating etc/Makefile
config.status: creating libscws/Makefile
config.status: creating libscws/version.h
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands
Making install in .
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
Making install in libscws
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT charset.lo -MD -MP -MF .deps/charset.Tpo -c -o charset.lo charset.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT charset.lo -MD -MP -MF .deps/charset.Tpo -c charset.c  -fno-common -DPIC -o .libs/charset.o
mv -f .deps/charset.Tpo .deps/charset.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT crc32.lo -MD -MP -MF .deps/crc32.Tpo -c -o crc32.lo crc32.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT crc32.lo -MD -MP -MF .deps/crc32.Tpo -c crc32.c  -fno-common -DPIC -o .libs/crc32.o
mv -f .deps/crc32.Tpo .deps/crc32.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT pool.lo -MD -MP -MF .deps/pool.Tpo -c -o pool.lo pool.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT pool.lo -MD -MP -MF .deps/pool.Tpo -c pool.c  -fno-common -DPIC -o .libs/pool.o
mv -f .deps/pool.Tpo .deps/pool.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT scws.lo -MD -MP -MF .deps/scws.Tpo -c -o scws.lo scws.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT scws.lo -MD -MP -MF .deps/scws.Tpo -c scws.c  -fno-common -DPIC -o .libs/scws.o
scws.c:349:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                txt = (char *) _mem_ndup(s->txt + start, wlen);
                                         ^~~~~~~~~~~~~~
/usr/include/string.h:132:27: note: passing argument to parameter here
char    *strndup(const char *, size_t) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
                             ^
scws.c:349:7: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                txt = (char *) _mem_ndup(s->txt + start, wlen);
                    ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scws.c:350:16: warning: passing 'unsigned char *' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                _str_toupper(txt, txt);
                             ^~~
scws.c:208:32: note: passing argument to parameter 'src' here
static void _str_toupper(char *src, char *dst)
                               ^
scws.c:350:21: warning: passing 'unsigned char *' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                _str_toupper(txt, txt);
                                  ^~~
scws.c:208:43: note: passing argument to parameter 'dst' here
static void _str_toupper(char *src, char *dst)
                                          ^
scws.c:351:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (SCWS_IS_SPECIAL(txt, wlen))
                                    ^~~
scws.c:28:54: note: expanded from macro 'SCWS_IS_SPECIAL'
#define SCWS_IS_SPECIAL(x,l)    scws_rule_checkbit(s->r,x,l,SCWS_RULE_SPECIAL)
                                                        ^
./rule.h:75:46: note: passing argument to parameter 'str' here
int scws_rule_checkbit(rule_t r, const char *str, int len, unsigned int bit);
                                             ^
scws.c:794:30: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        query = xdict_query(s->d, txt + start, clen);
                                                  ^~~~~~~~~~~
./xdict.h:67:44: note: passing argument to parameter 'key' here
word_t xdict_query(xdict_t xd, const char *key, int len);
                                           ^
scws.c:830:30: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        query = xdict_query(s->d, txt + zmap[i].start, zmap[j].end - zmap[i].start);
                                                  ^~~~~~~~~~~~~~~~~~~
./xdict.h:67:44: note: passing argument to parameter 'key' here
word_t xdict_query(xdict_t xd, const char *key, int len);
                                           ^
scws.c:855:11: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
                        if ((k == (i+1)))
                             ~~^~~~~~~~
scws.c:855:11: note: remove extraneous parentheses around the comparison to silence this warning
                        if ((k == (i+1)))
                            ~  ^       ~
scws.c:855:11: note: use '=' to turn this equality comparison into an assignment
                        if ((k == (i+1)))
                               ^~
                               =
scws.c:880:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                r1 = scws_rule_get(s->r, txt + zmap[i].start, zmap[i].end - zmap[i].start);
                                         ^~~~~~~~~~~~~~~~~~~
./rule.h:72:49: note: passing argument to parameter 'str' here
rule_item_t scws_rule_get(rule_t r, const char *str, int len);
                                                ^
scws.c:895:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:908:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:946:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:959:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:994:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                r1 = scws_rule_get(s->r, txt + zmap[i].start, zmap[k].end - zmap[i].start);
                                         ^~~~~~~~~~~~~~~~~~~
./rule.h:72:49: note: passing argument to parameter 'str' here
rule_item_t scws_rule_get(rule_t r, const char *str, int len);
                                                ^
scws.c:1005:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:1018:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:1046:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:1059:5: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ___ZRULE_CHECKER3___
                                ^~~~~~~~~~~~~~~~~~~~
scws.c:752:32: note: expanded from macro '___ZRULE_CHECKER3___'
if (!scws_rule_check(s->r, r1, txt + zmap[j].start, zmap[j].end - zmap[j].start))       \
                               ^~~~~~~~~~~~~~~~~~~
./rule.h:81:59: note: passing argument to parameter 'str' here
int scws_rule_check(rule_t r, rule_item_t cr, const char *str, int len);
                                                          ^
scws.c:1375:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                word = _mem_ndup(s->txt + cur->off, cur->len);
                                                 ^~~~~~~~~~~~~~~~~
/usr/include/string.h:132:27: note: passing argument to parameter here
char    *strndup(const char *, size_t) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
                             ^
scws.c:1386:31: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if (!(top = xtree_nget(xt, s->txt + cur->off, cur->len, NULL)))
                                                   ^~~~~~~~~~~~~~~~~
./xtree.h:47:42: note: passing argument to parameter 'key' here
void *xtree_nget(xtree_t xt, const char *key, int len, int *vlen);
                                         ^
scws.c:1392:54: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                xtree_nput(xt, top, sizeof(struct scws_topword), s->txt + cur->off, cur->len);
                                                                                 ^~~~~~~~~~~~~~~~~
./xtree.h:44:64: note: passing argument to parameter 'key' here
void xtree_nput(xtree_t xt, void *value, int vlen, const char *key, int len);
                                                               ^
scws.c:1517:31: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if (!(top = xtree_nget(xt, s->txt + cur->off, cur->len, NULL)))
                                                   ^~~~~~~~~~~~~~~~~
./xtree.h:47:42: note: passing argument to parameter 'key' here
void *xtree_nget(xtree_t xt, const char *key, int len, int *vlen);
                                         ^
scws.c:1523:35: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                top->word = (char *)_mem_ndup(s->txt + cur->off, cur->len);
                                                              ^~~~~~~~~~~~~~~~~
/usr/include/string.h:132:27: note: passing argument to parameter here
char    *strndup(const char *, size_t) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
                             ^
scws.c:1533:54: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                xtree_nput(xt, top, sizeof(struct scws_topword), s->txt + cur->off, cur->len);
                                                                                 ^~~~~~~~~~~~~~~~~
./xtree.h:44:64: note: passing argument to parameter 'key' here
void xtree_nput(xtree_t xt, void *value, int vlen, const char *key, int len);
                                                               ^
24 warnings generated.
mv -f .deps/scws.Tpo .deps/scws.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT xdict.lo -MD -MP -MF .deps/xdict.Tpo -c -o xdict.lo xdict.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT xdict.lo -MD -MP -MF .deps/xdict.Tpo -c xdict.c  -fno-common -DPIC -o .libs/xdict.o
xdict.c:200:14: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
                                if (part = _strtok_r(NULL, delim, &last))
                                    ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xdict.c:200:14: note: place parentheses around the assignment to silence this warning
                                if (part = _strtok_r(NULL, delim, &last))
                                         ^
                                    (                                   )
xdict.c:200:14: note: use '==' to turn this assignment into an equality comparison
                                if (part = _strtok_r(NULL, delim, &last))
                                         ^
                                         ==
xdict.c:244:3: warning: implicit declaration of function 'unlink' is invalid in C99 [-Wimplicit-function-declaration]
                unlink(tmpfile);
                ^
2 warnings generated.
mv -f .deps/xdict.Tpo .deps/xdict.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT darray.lo -MD -MP -MF .deps/darray.Tpo -c -o darray.lo darray.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT darray.lo -MD -MP -MF .deps/darray.Tpo -c darray.c  -fno-common -DPIC -o .libs/darray.o
mv -f .deps/darray.Tpo .deps/darray.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT rule.lo -MD -MP -MF .deps/rule.Tpo -c -o rule.lo rule.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT rule.lo -MD -MP -MF .deps/rule.Tpo -c rule.c  -fno-common -DPIC -o .libs/rule.o
rule.c:55:15: warning: passing 'unsigned char [512]' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
        while (fgets(buf, sizeof(buf) - 1, fp))
                     ^~~
/usr/include/stdio.h:236:30: note: passing argument to parameter here
char    *fgets(char * __restrict, int, FILE *);
                                ^
rule.c:57:39: warning: passing 'unsigned char [512]' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (buf[0] != '[' || !(ptr = strchr(buf, ']')))
                                                    ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:57:30: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (buf[0] != '[' || !(ptr = strchr(buf, ']')))
                                           ^ ~~~~~~~~~~~~~~~~
rule.c:62:53: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (ptr == str || (ptr - str) > 15 || !strcasecmp(str, "attrs"))
                                                                  ^~~
/usr/include/strings.h:78:29: note: passing argument to parameter here
int      strcasecmp(const char *, const char *);
                                ^
rule.c:65:26: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (_rule_index_get(r, str) >= 0)
                                       ^~~
rule.c:21:57: note: passing argument to parameter 'name' here
static inline int _rule_index_get(rule_t r, const char *name)
                                                        ^
rule.c:68:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                strcpy(r->items[i].name, str);
                                         ^~~
/usr/include/secure/_string.h:83:33: note: expanded from macro 'strcpy'
  __builtin___strcpy_chk (dest, src, __darwin_obsz (dest))
                                ^
rule.c:72:19: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                if (!strcasecmp(str, "special"))
                                ^~~
/usr/include/strings.h:78:29: note: passing argument to parameter here
int      strcasecmp(const char *, const char *);
                                ^
rule.c:74:24: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                else if (!strcasecmp(str, "nostats"))
                                     ^~~
/usr/include/strings.h:78:29: note: passing argument to parameter here
int      strcasecmp(const char *, const char *);
                                ^
rule.c:94:15: warning: passing 'unsigned char [512]' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
        while (fgets(buf, sizeof(buf) - 1, fp))
                     ^~~
/usr/include/stdio.h:236:30: note: passing argument to parameter here
char    *fgets(char * __restrict, int, FILE *);
                                ^
rule.c:104:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((ptr = strchr(str, ']')) != NULL)
                                          ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:104:13: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((ptr = strchr(str, ']')) != NULL)
                                 ^ ~~~~~~~~~~~~~~~~
rule.c:107:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if (!strcasecmp(str, "attrs"))
                                                ^~~
/usr/include/strings.h:78:29: note: passing argument to parameter here
int      strcasecmp(const char *, const char *);
                                ^
rule.c:111:38: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                else if ((i = _rule_index_get(r, str)) >= 0)
                                                                 ^~~
rule.c:21:57: note: passing argument to parameter 'name' here
static inline int _rule_index_get(rule_t r, const char *name)
                                                        ^
rule.c:126:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((ptr = strchr(str, '+')) == NULL) continue;
                                          ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:126:13: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((ptr = strchr(str, '+')) == NULL) continue;
                                 ^ ~~~~~~~~~~~~~~~~
rule.c:128:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((qtr = strchr(ptr, '=')) == NULL) continue;
                                          ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:128:13: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if ((qtr = strchr(ptr, '=')) == NULL) continue;
                                 ^ ~~~~~~~~~~~~~~~~
rule.c:137:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        a->ratio = (short) atoi(qtr);
                                                ^~~
/usr/include/stdlib.h:132:23: note: passing argument to parameter here
int      atoi(const char *);
                          ^
rule.c:150:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((qtr = strchr(str, ')')) != NULL)
                                                  ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:150:14: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((qtr = strchr(str, ')')) != NULL)
                                         ^ ~~~~~~~~~~~~~~~~
rule.c:153:41: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                        a->npath[0] = (unsigned char) atoi(str);
                                                                           ^~~
/usr/include/stdlib.h:132:23: note: passing argument to parameter here
int      atoi(const char *);
                          ^
rule.c:171:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((qtr = strchr(str, ')')) != NULL)
                                                  ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:171:14: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((qtr = strchr(str, ')')) != NULL)
                                         ^ ~~~~~~~~~~~~~~~~
rule.c:174:41: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                        a->npath[1] = (unsigned char) atoi(str);
                                                                           ^~~
/usr/include/stdlib.h:132:23: note: passing argument to parameter here
int      atoi(const char *);
                          ^
rule.c:204:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if (!(ptr = strchr(str, '=')))
                                           ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:204:14: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if (!(ptr = strchr(str, '=')))
                                  ^ ~~~~~~~~~~~~~~~~
rule.c:215:16: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        if (!strcmp(ptr, "line"))
                                    ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:217:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "tf"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:218:27: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                cr->tf = (float) atof(str);
                                                      ^~~
/usr/include/stdlib.h:131:26: note: passing argument to parameter here
double   atof(const char *);
                          ^
rule.c:219:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "idf"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:220:28: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                cr->idf = (float) atof(str);
                                                       ^~~
/usr/include/stdlib.h:131:26: note: passing argument to parameter here
double   atof(const char *);
                          ^
rule.c:221:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "attr"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:222:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                strncpy(cr->attr, str, 2);
                                                  ^~~
/usr/include/secure/_string.h:119:34: note: expanded from macro 'strncpy'
  __builtin___strncpy_chk (dest, src, len, __darwin_obsz (dest))
                                 ^
rule.c:223:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "znum"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:225:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((ptr = strchr(str, ',')) != NULL)
                                                  ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:225:14: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if ((ptr = strchr(str, ',')) != NULL)
                                         ^ ~~~~~~~~~~~~~~~~
rule.c:229:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                        cr->zmax = atoi(ptr);
                                                        ^~~
/usr/include/stdlib.h:132:23: note: passing argument to parameter here
int      atoi(const char *);
                          ^
rule.c:232:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                cr->zmin = atoi(str);
                                                ^~~
/usr/include/stdlib.h:132:23: note: passing argument to parameter here
int      atoi(const char *);
                          ^
rule.c:234:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "type"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:236:18: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if (!strncmp(str, "prefix", 6))
                                             ^~~
/usr/include/string.h:84:26: note: passing argument to parameter here
int      strncmp(const char *, const char *, size_t);
                             ^
rule.c:238:23: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                else if (!strncmp(str, "suffix", 6))
                                                  ^~~
/usr/include/string.h:84:26: note: passing argument to parameter here
int      strncmp(const char *, const char *, size_t);
                             ^
rule.c:241:21: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "include") || !strcmp(ptr, "exclude"))
                                         ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:241:48: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        else if (!strcmp(ptr, "include") || !strcmp(ptr, "exclude"))
                                                                    ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:245:17: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if (!strcmp(ptr, "include"))
                                            ^~~
/usr/include/string.h:77:25: note: passing argument to parameter here
int      strcmp(const char *, const char *);
                            ^
rule.c:256:26: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                while ((ptr = strchr(str, ',')) != NULL)
                                                     ^~~
/usr/include/string.h:76:26: note: passing argument to parameter here
char    *strchr(const char *, int);
                            ^
rule.c:256:17: warning: assigning to 'unsigned char *' from 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                while ((ptr = strchr(str, ',')) != NULL)
                                            ^ ~~~~~~~~~~~~~~~~
rule.c:260:34: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                        if ((i = _rule_index_get(r, str)) >= 0)
                                                                    ^~~
rule.c:21:57: note: passing argument to parameter 'name' here
static inline int _rule_index_get(rule_t r, const char *name)
                                                        ^
rule.c:267:18: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                ptr = strlen(str) + str;
                                             ^~~
/usr/include/string.h:82:28: note: passing argument to parameter here
size_t   strlen(const char *);
                            ^
rule.c:270:46: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                if (ptr > str && (i = _rule_index_get(r, str)))
                                                                         ^~~
rule.c:21:57: note: passing argument to parameter 'name' here
static inline int _rule_index_get(rule_t r, const char *name)
                                                        ^
rule.c:279:22: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                ptr = str + strlen(str);
                                   ^~~
/usr/include/string.h:82:28: note: passing argument to parameter here
size_t   strlen(const char *);
                            ^
rule.c:288:59: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        xtree_nput(r->tree, cr, sizeof(struct scws_rule_item), str, ptr - str);
                                                                               ^~~
./xtree.h:44:64: note: passing argument to parameter 'key' here
void xtree_nput(xtree_t xt, void *value, int vlen, const char *key, int len);
                                                               ^
rule.c:300:60: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                                xtree_nput(r->tree, cr, sizeof(struct scws_rule_item), str, j);
                                                                                       ^~~
./xtree.h:44:64: note: passing argument to parameter 'key' here
void xtree_nput(xtree_t xt, void *value, int vlen, const char *key, int len);
                                                               ^
52 warnings generated.
mv -f .deps/rule.Tpo .deps/rule.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT lock.lo -MD -MP -MF .deps/lock.Tpo -c -o lock.lo lock.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT lock.lo -MD -MP -MF .deps/lock.Tpo -c lock.c  -fno-common -DPIC -o .libs/lock.o
mv -f .deps/lock.Tpo .deps/lock.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT xdb.lo -MD -MP -MF .deps/xdb.Tpo -c -o xdb.lo xdb.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT xdb.lo -MD -MP -MF .deps/xdb.Tpo -c xdb.c  -fno-common -DPIC -o .libs/xdb.o
xdb.c:335:12: warning: passing 'unsigned char *' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign]
                        strncpy(buf + 17, key, len);
                                ^~~~~~~~
/usr/include/secure/_string.h:119:28: note: expanded from macro 'strncpy'
  __builtin___strncpy_chk (dest, src, len, __darwin_obsz (dest))
                           ^
xdb.c:504:36: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
        nodes[i].key = (char *) _mem_ndup(buf + 17, buf[16]);
                                          ^~~~~~~~
/usr/include/string.h:132:27: note: passing argument to parameter here
char    *strndup(const char *, size_t) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
                             ^
xdb.c:604:41: warning: passing 'unsigned char *' to parameter of type 'const char *' converts between pointers to integer types with different sign [-Wpointer-sign]
        xtree_nput(xt, value, ptr->len - voff, buf + 17, buf[16]);
                                               ^~~~~~~~
./xtree.h:44:64: note: passing argument to parameter 'key' here
void xtree_nput(xtree_t xt, void *value, int vlen, const char *key, int len);
                                                               ^
3 warnings generated.
mv -f .deps/xdb.Tpo .deps/xdb.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT xtree.lo -MD -MP -MF .deps/xtree.Tpo -c -o xtree.lo xtree.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT xtree.lo -MD -MP -MF .deps/xtree.Tpo -c xtree.c  -fno-common -DPIC -o .libs/xtree.o
mv -f .deps/xtree.Tpo .deps/xtree.Plo
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=link gcc  -g -O2 -no-undefined -version-info 2:0:1  -o libscws.la -rpath /usr/local/lib charset.lo crc32.lo pool.lo scws.lo xdict.lo darray.lo rule.lo lock.lo xdb.lo xtree.lo  -lm
libtool: link: gcc -dynamiclib  -o .libs/libscws.1.dylib  .libs/charset.o .libs/crc32.o .libs/pool.o .libs/scws.o .libs/xdict.o .libs/darray.o .libs/rule.o .libs/lock.o .libs/xdb.o .libs/xtree.o   -lm    -install_name  /usr/local/lib/libscws.1.dylib -compatibility_version 3 -current_version 3.0 -Wl,-single_module
libtool: link: dsymutil .libs/libscws.1.dylib || :
libtool: link: (cd ".libs" && rm -f "libscws.dylib" && ln -s "libscws.1.dylib" "libscws.dylib")
libtool: link: (cd ".libs" && rm -f "libscws.1.1.0.dylib" && ln -s "libscws.1.dylib" "libscws.1.1.0.dylib")
libtool: link: ( cd ".libs" && rm -f "libscws.la" && ln -s "../libscws.la" "libscws.la" )
test -z "/usr/local/lib" || .././install-sh -c -d "/usr/local/lib"
 /bin/sh ../libtool --preserve-dup-deps  --mode=install /usr/bin/install -c  'libscws.la' '/usr/local/lib/libscws.la'
libtool: install: /usr/bin/install -c .libs/libscws.1.dylib /usr/local/lib/libscws.1.dylib
libtool: install: (cd /usr/local/lib && { ln -s -f libscws.1.dylib libscws.dylib || { rm -f libscws.dylib && ln -s libscws.1.dylib libscws.dylib; }; })
libtool: install: (cd /usr/local/lib && { ln -s -f libscws.1.dylib libscws.1.1.0.dylib || { rm -f libscws.1.1.0.dylib && ln -s libscws.1.dylib libscws.1.1.0.dylib; }; })
libtool: install: /usr/bin/install -c .libs/libscws.lai /usr/local/lib/libscws.la
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `DYLD_LIBRARY_PATH' environment variable
     during execution

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
test -z "/usr/local/include/scws" || .././install-sh -c -d "/usr/local/include/scws"
 /usr/bin/install -c -m 644 'charset.h' '/usr/local/include/scws/charset.h'
 /usr/bin/install -c -m 644 'crc32.h' '/usr/local/include/scws/crc32.h'
 /usr/bin/install -c -m 644 'pool.h' '/usr/local/include/scws/pool.h'
 /usr/bin/install -c -m 644 'scws.h' '/usr/local/include/scws/scws.h'
 /usr/bin/install -c -m 644 'xdict.h' '/usr/local/include/scws/xdict.h'
 /usr/bin/install -c -m 644 'darray.h' '/usr/local/include/scws/darray.h'
 /usr/bin/install -c -m 644 'rule.h' '/usr/local/include/scws/rule.h'
 /usr/bin/install -c -m 644 'xdb.h' '/usr/local/include/scws/xdb.h'
 /usr/bin/install -c -m 644 'xtree.h' '/usr/local/include/scws/xtree.h'
 /usr/bin/install -c -m 644 'version.h' '/usr/local/include/scws/version.h'
Making install in cli
gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../libscws    -g -O2 -MT scws_cmd.o -MD -MP -MF .deps/scws_cmd.Tpo -c -o scws_cmd.o scws_cmd.c
mv -f .deps/scws_cmd.Tpo .deps/scws_cmd.Po
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=link gcc  -g -O2   -o scws scws_cmd.o ../libscws/libscws.la -lm
libtool: link: gcc -g -O2 -o .libs/scws scws_cmd.o  ../libscws/.libs/libscws.1.1.0.dylib -lm
gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I../libscws    -g -O2 -MT gen_dict.o -MD -MP -MF .deps/gen_dict.Tpo -c -o gen_dict.o gen_dict.c
gen_dict.c:135:8: warning: assigning to 'char *' from 'unsigned char *' converts between pointers to integer types with different sign [-Wpointer-sign]
        mblen = charset_table_get(charset);
              ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
gen_dict.c:165:12: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
                        if (ptr = strtok(NULL, delim))
                            ~~~~^~~~~~~~~~~~~~~~~~~~~
gen_dict.c:165:12: note: place parentheses around the assignment to silence this warning
                        if (ptr = strtok(NULL, delim))
                                ^
                            (                        )
gen_dict.c:165:12: note: use '==' to turn this assignment into an equality comparison
                        if (ptr = strtok(NULL, delim))
                                ^
                                ==
2 warnings generated.
mv -f .deps/gen_dict.Tpo .deps/gen_dict.Po
/bin/sh ../libtool --preserve-dup-deps  --tag=CC   --mode=link gcc  -g -O2   -o scws-gen-dict gen_dict.o ../libscws/libscws.la -lm
libtool: link: gcc -g -O2 -o .libs/scws-gen-dict gen_dict.o  ../libscws/.libs/libscws.1.1.0.dylib -lm
test -z "/usr/local/bin" || .././install-sh -c -d "/usr/local/bin"
  /bin/sh ../libtool --preserve-dup-deps  --mode=install /usr/bin/install -c 'scws' '/usr/local/bin/scws'
libtool: install: /usr/bin/install -c .libs/scws /usr/local/bin/scws
  /bin/sh ../libtool --preserve-dup-deps  --mode=install /usr/bin/install -c 'scws-gen-dict' '/usr/local/bin/scws-gen-dict'
libtool: install: /usr/bin/install -c .libs/scws-gen-dict /usr/local/bin/scws-gen-dict
make[2]: Nothing to be done for `install-data-am'.
Making install in etc
test -z "/usr/local/etc" || .././install-sh -c -d "/usr/local/etc"
 /usr/bin/install -c -m 644 'rules.ini' '/usr/local/etc/rules.ini'
 /usr/bin/install -c -m 644 'rules.utf8.ini' '/usr/local/etc/rules.utf8.ini'
 /usr/bin/install -c -m 644 'rules_cht.utf8.ini' '/usr/local/etc/rules_cht.utf8.ini'
make[2]: Nothing to be done for `install-data-am'.

Cannot build for postgresql-12

On CentOS, export PG_CONFIG=/usr/pgsql-12/bin/pg_config
then run
make && make install
This fails with:
Makefile:20: /usr/pgsql-12/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target `/usr/pgsql-12/lib/pgxs/src/makefiles/pgxs.mk'. Stop.
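
The path in this error is the one `pg_config --pgxs` reports, so a quick check is whether that file exists at all. The sketch below is only a diagnostic aid; `check_pgxs` is a hypothetical helper name, not part of zhparser. If the file is missing, the likely cause (an assumption based on the README's note about the postgresql-devel package) is that the development package for that PostgreSQL version is not installed.

```shell
# Hypothetical helper: report whether the PGXS makefile exists
# for the pg_config binary passed as the first argument.
check_pgxs() {
    pg_config_bin="$1"
    # pg_config --pgxs prints the location of the extension makefile (pgxs.mk)
    pgxs_mk="$("$pg_config_bin" --pgxs 2>/dev/null)"
    if [ -n "$pgxs_mk" ] && [ -f "$pgxs_mk" ]; then
        echo "pgxs found: $pgxs_mk"
        return 0
    fi
    echo "pgxs missing: ${pgxs_mk:-unknown}"
    return 1
}
```

For the setup in this report it would be called as `check_pgxs /usr/pgsql-12/bin/pg_config`.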

Build error during installation

PG_CONFIG=/usr/pgsql-9.5/bin/pg_config make && make install
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_ADJ=0 -fPIC -shared -o zhparser.so zhparser.o -L/usr/pgsql-9.5/lib -Wl,--as-needed  -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-9.5/lib',--enable-new-dtags  -lscws -L/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib
/usr/bin/ld: cannot find -lscws
collect2: error: ld returned 1 exit status
make: *** [zhparser.so] Error 1
[root@preproduction zhparser]#
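
The link line above searches /usr/local/lib for libscws, so the linker error suggests SCWS is either not installed or installed under another prefix. A minimal sketch, assuming the library lives in a `lib` subdirectory of its prefix: the hypothetical helper `find_scws_home` prints the first candidate prefix that contains a libscws shared library, which can then be passed as SCWS_HOME as the README describes.

```shell
# Hypothetical helper: print the first prefix whose lib/ directory holds
# a libscws shared library (.so on Linux, .dylib on macOS).
find_scws_home() {
    for prefix in "$@"; do
        for lib in "$prefix"/lib/libscws.so* "$prefix"/lib/libscws.*dylib; do
            # An unmatched glob stays literal, so test for real existence.
            if [ -e "$lib" ]; then
                echo "$prefix"
                return 0
            fi
        done
    done
    return 1
}
```

Usage: `SCWS_HOME="$(find_scws_home /usr/local /usr)" make && make install`. If the helper finds nothing, SCWS probably still needs to be built and installed.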

Build error on macOS Big Sur

System environment

# uname -a
Darwin MacbookPro 20.1.0 Darwin Kernel Version 20.1.0: Sat Oct 31 00:07:11 PDT 2020; root:xnu-7195.50.7~2/RELEASE_X86_64 x86_64

# sw_vers
ProductName:	Mac OS X
ProductVersion:	10.16
BuildVersion:	20B29

# scws -v
scws (scws-cli/1.2.3: Simpled Chinese Words Segment - Command line usage)

# gmake --version
GNU Make 4.3
Built for x86_64-apple-darwin20.1.0
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

# make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

PG version

The local PG was built and installed with pgenv.

PostgreSQL 10.13
# path
/Users/someone/.pgenv/pgsql-10.13/bin/psql

Error message

Both gmake and make fail with the same error.

$ PG_CONFIG=/Users/someone/.pgenv/pgsql-10.13/bin/pg_config gmake

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2  -I/usr/local/include/scws  -I. -I./ -I/Users/someone/.pgenv/pgsql-10.13/include/server -I/Users/someone/.pgenv/pgsql-10.13/include/internal    -c -o zhparser.o zhparser.c
zhparser.c:220:10: error: implicit declaration of function 'SplitIdentifierString' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
            if(!SplitIdentifierString(extra_dicts,',',&elemlist)){
                ^
1 error generated.
gmake: *** [<builtin>: zhparser.o] Error 1
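
The compiler sees no declaration for SplitIdentifierString. As far as I know, PostgreSQL 10 moved that declaration from utils/builtins.h to utils/varlena.h, so a source tree that includes only the older header would fail this way. A minimal sketch of a hypothetical helper, `find_decl_header`, that lists which installed server headers mention a symbol (it relies on `grep -r`, which common grep implementations provide):

```shell
# Hypothetical helper: list header files under an include directory that
# mention the given symbol as a whole word.
find_decl_header() {
    symbol="$1"
    incdir="$2"
    # -r: recurse, -l: print file names only, -w: match whole words.
    grep -rlw "$symbol" "$incdir"
}
```

Usage: `find_decl_header SplitIdentifierString "$(pg_config --includedir-server)"`. If the result is utils/varlena.h, checking whether the zhparser checkout includes that header (or updating to a newer checkout) is a reasonable next step.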

How to support version 11?

test=# create extension zhparser ;
ERROR: incompatible library "/usr/pgsql-11/lib/zhparser.so": version mismatch
DETAIL: Server is version 11, library is version 9.2.
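
This error generally means zhparser.so was compiled against headers from a different major version than the server loading it. A minimal sketch, assuming release-style version strings such as "PostgreSQL 11.4": the hypothetical helper `pg_major` extracts the major version from `pg_config --version` output, so the pg_config used for the build can be compared with the server before running make.

```shell
# Hypothetical helper: reduce a "PostgreSQL X.Y[.Z]" string to its major
# version. From version 10 on the major version is the first number;
# before that it is the first two numbers (for example 9.2).
pg_major() {
    ver="${1#PostgreSQL }"
    first="${ver%%.*}"
    if [ "$first" -ge 10 ] 2>/dev/null; then
        echo "$first"
    else
        rest="${ver#*.}"
        echo "$first.${rest%%.*}"
    fi
}

# pg_major "PostgreSQL 11.4"    prints 11
# pg_major "PostgreSQL 9.2.24"  prints 9.2
```

Usage: `pg_major "$(pg_config --version)"`. If this does not print 11, rebuilding with PG_CONFIG pointing at the version 11 pg_config, as the README describes, is the likely fix.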

[Feature Request] Release to pgxn

Hi,

Is there any plan to release it on pgxn? Currently it is a bit difficult to install via all those commands, will be much better if it is available on pgxn.

Thanks!

Error when creating the extension

OS: Ubuntu
PG version: 9.2
postgres=# create extension zhparser;
ERROR: could not load library "/opt/PostgreSQL/9.2/lib/postgresql/zhparser.so": /opt/PostgreSQL/9.2/lib/postgresql/zhparser.so: undefined symbol: palloc

Build error on Mac 10.10

clang -I/usr/local/Cellar/ossp-uuid/1.6.2/include -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv  -bundle -multiply_defined suppress -o zhparser.so zhparser.o -L/usr/local/Cellar/postgresql/9.3.5/lib -L/usr/local/Cellar/ossp-uuid/1.6.2/lib  -Wl,-dead_strip_dylibs   -lscws -L/usr/local/lib -Wl,-rpath=/usr/local/lib -bundle_loader /usr/local/Cellar/postgresql/9.3.5/bin/postgres
ld: unknown option: -rpath=/usr/local/lib
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [zhparser.so] Error 1

to_tsvector fails to segment certain text

[SQL]select to_tsvector('chinese'::regconfig, '自从用了液体,别的再也用不了了,太好用,毫无感觉,超级好用,大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱大爱')

[Err] ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0xa4

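Any content in which the same two characters repeat over and over triggers this kind of error, so the GIN index cannot be built. (chinese here is the testzhcfg configuration from the example.)

create extension zhparser fails when a tablespace is in use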

localhost fimatrix@cnstock=# create extension zhparser;
ERROR: could not open file "/home/postgres/12/data/base/35282/zhprs_dict_cnstock.txt" for writing: No such file or directory
HINT: COPY TO instructs the PostgreSQL server process to write a file. You may want a client-side facility such as psql's \copy.
CONTEXT: SQL statement "copy (select word, tf, idf, attr from zhparser.zhprs_custom_word) to '/home/postgres/12/data/base/35282/zhprs_dict_cnstock.txt' encoding 'utf8'"
PL/pgSQL function sync_zhprs_custom_word() line 17 at EXECUTE
The problem occurs when a tablespace is in use. My database environment is as follows:
localhost fimatrix@cnstock=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-------------+----------+----------+------------+------------+-----------------------+---------+-----------------+--------------------------------------------
cnstock | fimatrix | UTF8 | en_US.utf8 | en_US.utf8 | | 581 MB | tblspc_cnstock |
cnstock_old | fimatrix | UTF8 | en_US.utf8 | en_US.utf8 | | 443 MB | tblspc_cnstock |
fimatrix | fimatrix | UTF8 | en_US.utf8 | en_US.utf8 | | 12 MB | tblspc_fimatrix |
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 | | 8025 kB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +| 7825 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +| 7825 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(6 rows)

localhost fimatrix@cnstock=#
localhost fimatrix@cnstock=# \db
List of tablespaces
Name | Owner | Location
-----------------+----------+-----------------------------------------------
pg_default | postgres |
pg_global | postgres |
tblspc_cnstock | fimatrix | /home/postgres/12/tablespaces/tblspc_cnstock
tblspc_fimatrix | fimatrix | /home/postgres/12/tablespaces/tblspc_fimatrix
(4 rows)

localhost fimatrix@cnstock=# select current_database();
 current_database
------------------
 cnstock
(1 row)

[postgres@dev ~]$ ll 12/tablespaces/tblspc_cnstock/PG_12_201909212/
total 68
drwx------. 2 postgres postgres 12288 Jun 7 17:10 16389
drwx------. 2 postgres postgres 24576 Jun 26 22:21 35282
drwx------. 2 postgres postgres 6 May 30 15:28 pgsql_tmp

Searching for a single character fails

As expected:

postgres=# SELECT to_tsvector('testzhcfg','水果') @@ to_tsquery('testzhcfg', '水果');
 ?column?
----------
 t
(1 row)

Not as expected:

postgres=# SELECT to_tsvector('testzhcfg','水果') @@ to_tsquery('testzhcfg', '果');
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
 ?column?
----------
 f
(1 row)

A quick question, if I may

select to_tsvector('testzhcfg','卫生纸巾') ;
The only tokens I get are 卫生 and 纸巾. I expected to get 卫生纸. What should I do?

Build fails on postgresql 9.6

The error message is:

zhparser.c:15:27: fatal error: utils/varlena.h: No such file or directory
 #include "utils/varlena.h"
                           ^
compilation terminated.
<builtin>: recipe for target 'zhparser.o' failed
make: *** [zhparser.o] Error 1

From a quick Google search, it seems this header is only available in version 10 and later. How can I work around this?

Error: SSL SYSCALL error: EOF detected

After running

-- make test configuration using parser

CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);

-- add token mapping

ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;

and then reconnecting to the database with pgadmin, clicking the corresponding schema produces this error every time.

text search configuration name "zhparser" must be schema-qualified

Created a trigger, but the INSERT statement fails

  • text search configuration name "zhparser" must be schema-qualified
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS zhparser;
CREATE TEXT SEARCH CONFIGURATION zhparser (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION zhparser ADD MAPPING FOR n,v,a,i,e,l WITH simple;
CREATE TABLE new_table (
    id varchar(32) primary key,
    title varchar(50) not null,
    content text not null,
    title_content_full_text tsvector  -- new column that stores the to_tsvector output; a GIN index is then built on it
);
CREATE INDEX new_table_title_content_filed_full_text_index ON new_table USING gin(title_content_full_text);
-- create a trigger that updates the index automatically whenever data changes
CREATE TRIGGER new_table_title_content_full_text_index_trigger BEFORE INSERT OR UPDATE ON new_table FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(title_content_full_text, 'zhparser', title);
INSERT INTO "public"."new_table"("id", "title", "content") VALUES ('1', '搜索', '内容');
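
PostgreSQL's built-in tsvector_update_trigger expects its configuration-name argument to be schema-qualified, which matches the error reported here. A minimal sketch, assuming the configuration was created in the public schema (an assumption; adjust to the actual schema): the hypothetical helper `trigger_sql` prints a corrected CREATE TRIGGER statement and refuses an unqualified name.

```shell
# Hypothetical helper: print a CREATE TRIGGER statement that passes a
# schema-qualified text search configuration name to
# tsvector_update_trigger. Table and column names come from the report.
trigger_sql() {
    config="${1:-public.zhparser}"
    case "$config" in
        *.*) ;;  # looks schema-qualified
        *)
            echo "configuration name must be schema-qualified: $config" >&2
            return 1
            ;;
    esac
    cat <<EOF
CREATE TRIGGER new_table_title_content_full_text_index_trigger
BEFORE INSERT OR UPDATE ON new_table
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(title_content_full_text, '$config', title);
EOF
}
```

Usage: drop the existing trigger first, then pipe the output to psql, for example `trigger_sql public.zhparser | psql -d mydb`, where mydb is a placeholder database name.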

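Custom dictionary: to_tsquery and to_tsvector both drop custom words

With the custom dictionary in 2.1 I ran into a problem I don't know how to solve: to_tsvector and to_tsquery both drop custom words, and only ts_parse behaves correctly.
For example, after defining the word "石业":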

select ts_parse('zhparser','金时石业');
(110,金时)
(120,石业)

SELECT to_tsquery('zhcfg', '金时石业');
'金时'
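
In the ts_parse output above, the custom word 石业 has tokid 120 while 金时 has 110. The tokid values in the README's sample output look like ASCII codes of the SCWS attribute letters (110 is n, 118 is v), which would make 120 the letter x. If that reading is right, a configuration that maps only n,v,a,i,e,l would discard the token in to_tsvector and to_tsquery while ts_parse still shows it. A minimal sketch of two hypothetical helpers that decode a tokid and check it against a mapping list:

```shell
# Hypothetical helper: convert a numeric tokid to its ASCII character,
# assuming tokids are the character codes of SCWS attribute letters.
tokid_to_attr() {
    # The inner printf yields the octal code; the outer printf expands it.
    printf "\\$(printf '%03o' "$1")\n"
}

# Hypothetical helper: succeed if the tokid's letter appears in a
# comma-separated mapping list such as "n,v,a,i,e,l".
is_mapped() {
    attr="$(tokid_to_attr "$1")"
    case ",$2," in
        *",$attr,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `is_mapped 120 "n,v,a,i,e,l" || echo "token type $(tokid_to_attr 120) is not mapped"`. If the custom word really is tagged x, possible remedies are adding x to the mapping with ALTER TEXT SEARCH CONFIGURATION ... ADD MAPPING FOR x WITH simple, or giving the word a mapped attribute in the custom dictionary; both are untested suggestions.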

Asking for help

Makefile:17: /usr/pgsql-9.5/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
gmake: *** No rule to make target `/usr/pgsql-9.5/lib/pgxs/src/makefiles/pgxs.mk'. Stop.

Cannot load the extension after building and installing

#./bin/gd_config --libdir
/home/greatdbdata/greatdb_p/lib

#ll /home/greatdbdata/greatdb_p/lib/zhparser.so
/home/greatdbdata/greatdb_p/lib/zhparser.so

#/home/greatdbdata/greatdb_p/bin/psql -Ubenchmarksql -p5432 -h127.0.0.1 -d great_db
psql (11.4)
Type "help" for help.

great_db=# CREATE EXTENSION zhparser;
ERROR: could not access file "$libdir/zhparser": No such file or directory
great_db=# CREATE EXTENSION zhparser;
ERROR: could not access file "$libdir/zhparser": No such file or directory
great_db=# exit
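
In stock PostgreSQL, `$libdir` in this message refers to the package library directory reported by `pg_config --pkglibdir`, which is not always the same as `--libdir`. Whether that holds for the gd_config build used here is an assumption. A minimal sketch of a hypothetical helper, `check_ext_lib`, that looks for the module where the server would search:

```shell
# Hypothetical helper: check that <ext>.so exists in the directory that
# "$libdir" resolves to for the given pg_config binary.
check_ext_lib() {
    pg_config_bin="$1"
    ext="$2"
    # --pkglibdir is where loadable modules are expected to be installed.
    pkglibdir="$("$pg_config_bin" --pkglibdir)" || return 2
    if [ -f "$pkglibdir/$ext.so" ]; then
        echo "ok: $pkglibdir/$ext.so"
    else
        echo "not found: $pkglibdir/$ext.so" >&2
        return 1
    fi
}
```

Usage: `check_ext_lib ./bin/gd_config zhparser`. If the file is reported missing while it exists under `--libdir`, rebuilding with PG_CONFIG set to the same binary should install it in the expected place.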

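Questions about custom dictionaries

If I modify a dictionary, do the indexes need to be rebuilt?
Can the amount of memory zhparser uses be configured?
If there are several custom dictionaries, each with a few hundred thousand words and a small number of words duplicated between them, will performance be OK?

Handling of underscores

By default, text is not split on _. After I enabled zhparser.multi_duality=on in postgresql.conf, the resulting tokens also include the original text, and I don't know why.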

vt=# SELECT * FROM ts_parse('zhparser', 'hello world');
 tokid | token
-------+-------
   101 | hello
   101 | world
(2 rows)

vt=# SELECT * FROM ts_parse('zhparser', 'hello_world');
 tokid |    token
-------+-------------
   101 | hello_world
   101 | hello
   101 | world
(3 rows)

After yum install clang, rerunning make still fails

After yum install clang, rerunning make still fails:
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -I/usr/local/include/scws -I. -I./ -I/usr/pgsql-11/include/server -I/usr/pgsql-11/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -c -o zhparser.o zhparser.c
zhparser.c: In function ‘init’:
zhparser.c:223:6: warning: implicit declaration of function ‘SplitIdentifierString’ [-Wimplicit-function-declaration]
if(!SplitIdentifierString(extra_dicts,',',&elemlist)){
^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -shared -o zhparser.so zhparser.o -L/usr/pgsql-11/lib -Wl,--as-needed -L/usr/lib64/llvm5.0/lib -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-11/lib',--enable-new-dtags -lscws -L/usr/local/lib -Wl,-rpath -Wl,/usr/local/lib
/opt/rh/llvm-toolset-7/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I/usr/local/include/scws -I. -I./ -I/usr/pgsql-11/include/server -I/usr/pgsql-11/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -flto=thin -emit-llvm -c -o zhparser.bc zhparser.c
make: /opt/rh/llvm-toolset-7/root/usr/bin/clang: Command not found
make: *** [zhparser.bc] Error 127

Originally posted by @zyd in #29 (comment)

CREATE EXTENSION zhparser fails (CentOS (core) 7.7.1908 x64, PostgreSQL 12.1)

Building and installing zhparser

Command: PG_CONFIG=/usr/pgsql-12/bin/pg_config make && make install
Makefile:20: /usr/pgsql-12/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target `/usr/pgsql-12/lib/pgxs/src/makefiles/pgxs.mk'. Stop.

Then I tried:
Command: yum install postgresql-server-dev-all
No package postgresql-server-dev-all available.
Command: yum install postgresql-common
No package postgresql-common available.

I tried many approaches and none of them solved the problem. My system: CentOS (core) 7.7.1908 x64, PostgreSQL 12.1

can't update to 2.1 version

Hi there:

I have zhparser of version 1.0, and I exec the command in psql: alter extension zhparser update, but I only get the NOTICE: NOTICE: version "1.0" of extension "zhparser" is already installed.

How can I upgrade to version 2.1?

Cannot build for postgresql-13

export PG_CONFIG=/usr/lib/postgresql/13/bin/pg_config
make & make install

The error is:

r.o zhparser.c
zhparser.c:10:10: fatal error: postgres.h: No such file or directory
 #include "postgres.h"
          ^~~~~~~~~~~~
compilation terminated.
<builtin>: recipe for target 'zhparser.o' failed

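Adding a dictionary has no effect

postgres12, zhparser2.1: using dict_extra.txt as an example, placing it in the tsearch_data directory or under base/<database ID> has no effect either way.

Build error with Visual Studio on Windows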

I added zhparser.h and zhparser.c to a new DLL project (VS2010, postgre 13, scws-1.2.3). After adding the required headers and libs, the build reports the errors below. What should I do?
1>d:\program files\postgresql\13\include\server\port.h(41): error C2061: syntax error : identifier 'pg_set_noblock'
1>d:\program files\postgresql\13\include\server\port.h(41): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(41): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(42): error C2061: syntax error : identifier 'pg_set_block'
1>d:\program files\postgresql\13\include\server\port.h(42): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(42): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(46): error C2061: syntax error : identifier 'has_drive_prefix'
1>d:\program files\postgresql\13\include\server\port.h(46): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(46): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(55): error C2061: syntax error : identifier 'path_contains_parent_reference'
1>d:\program files\postgresql\13\include\server\port.h(55): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(55): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(56): error C2061: syntax error : identifier 'path_is_relative_and_below_cwd'
1>d:\program files\postgresql\13\include\server\port.h(56): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(56): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(57): error C2061: syntax error : identifier 'path_is_prefix_of_path'
1>d:\program files\postgresql\13\include\server\port.h(57): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(57): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(71): error C2061: syntax error : identifier 'get_home_path'
1>d:\program files\postgresql\13\include\server\port.h(71): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(71): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(218): error C2146: syntax error : missing ')' before identifier 'echo'
1>d:\program files\postgresql\13\include\server\port.h(218): error C2081: '_Bool' : name in formal parameter list illegal
1>d:\program files\postgresql\13\include\server\port.h(218): error C2061: syntax error : identifier 'echo'
1>d:\program files\postgresql\13\include\server\port.h(218): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(218): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\port.h(259): error C2061: syntax error : identifier 'pgwin32_is_junction'
1>d:\program files\postgresql\13\include\server\port.h(259): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(259): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(265): error C2061: syntax error : identifier 'rmtree'
1>d:\program files\postgresql\13\include\server\port.h(265): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(265): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(487): error C2146: syntax error : missing ')' before identifier 'write_message'
1>d:\program files\postgresql\13\include\server\port.h(487): error C2081: '_Bool' : name in formal parameter list illegal
1>d:\program files\postgresql\13\include\server\port.h(487): error C2061: syntax error : identifier 'write_message'
1>d:\program files\postgresql\13\include\server\port.h(487): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(487): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\port.h(498): error C2061: syntax error : identifier 'pg_strong_random'
1>d:\program files\postgresql\13\include\server\port.h(498): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(498): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(521): error C2061: syntax error : identifier 'wait_result_is_signal'
1>d:\program files\postgresql\13\include\server\port.h(521): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(521): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\port.h(522): error C2061: syntax error : identifier 'wait_result_is_any_signal'
1>d:\program files\postgresql\13\include\server\port.h(522): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\port.h(522): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\utils\elog.h(149): error C2061: syntax error : identifier 'errstart'
1>d:\program files\postgresql\13\include\server\utils\elog.h(149): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(149): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\utils\elog.h(191): error C2146: syntax error : missing ')' before identifier 'hide_stmt'
1>d:\program files\postgresql\13\include\server\utils\elog.h(191): error C2061: syntax error : identifier 'hide_stmt'
1>d:\program files\postgresql\13\include\server\utils\elog.h(191): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(191): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(192): error C2146: syntax error : missing ')' before identifier 'hide_ctx'
1>d:\program files\postgresql\13\include\server\utils\elog.h(192): error C2061: syntax error : identifier 'hide_ctx'
1>d:\program files\postgresql\13\include\server\utils\elog.h(192): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(192): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(354): error C2061: syntax error : identifier '_Bool'
1>d:\program files\postgresql\13\include\server\utils\elog.h(355): error C2061: syntax error : identifier 'output_to_client'
1>d:\program files\postgresql\13\include\server\utils\elog.h(355): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(356): error C2061: syntax error : identifier 'show_funcname'
1>d:\program files\postgresql\13\include\server\utils\elog.h(356): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(357): error C2061: syntax error : identifier 'hide_stmt'
1>d:\program files\postgresql\13\include\server\utils\elog.h(357): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(358): error C2061: syntax error : identifier 'hide_ctx'
1>d:\program files\postgresql\13\include\server\utils\elog.h(358): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(384): error C2059: syntax error : '}'
1>d:\program files\postgresql\13\include\server\utils\elog.h(387): error C2143: syntax error : missing '{' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(388): error C2143: syntax error : missing ')' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(388): error C2143: syntax error : missing '{' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(388): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(390): error C2143: syntax error : missing ')' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(390): error C2143: syntax error : missing '{' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(390): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(391): error C2143: syntax error : missing ')' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(391): error C2143: syntax error : missing '{' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(391): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(397): error C2143: syntax error : missing ')' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(397): error C2143: syntax error : missing '{' before '*'
1>d:\program files\postgresql\13\include\server\utils\elog.h(397): error C2059: syntax error : ')'
1>d:\program files\postgresql\13\include\server\utils\elog.h(398): error C2061: syntax error : identifier 'emit_log_hook'
1>d:\program files\postgresql\13\include\server\utils\elog.h(398): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(414): error C2061: syntax error : identifier 'syslog_sequence_numbers'
1>d:\program files\postgresql\13\include\server\utils\elog.h(414): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(415): error C2061: syntax error : identifier 'syslog_split_messages'
1>d:\program files\postgresql\13\include\server\utils\elog.h(415): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(426): error C2061: syntax error : identifier 'in_error_recursion_trouble'
1>d:\program files\postgresql\13\include\server\utils\elog.h(426): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\utils\elog.h(426): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\postgres.h(380): error C2061: syntax error : identifier '_Bool'
1>d:\program files\postgresql\13\include\server\postgres.h(382): error C2059: syntax error : '}'
1>d:\program files\postgresql\13\include\server\pgtime.h(57): error C2061: syntax error : identifier 'pg_interpret_timezone_abbrev'
1>d:\program files\postgresql\13\include\server\pgtime.h(57): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\pgtime.h(57): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\pgtime.h(62): error C2061: syntax error : identifier 'pg_get_timezone_offset'
1>d:\program files\postgresql\13\include\server\pgtime.h(62): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\pgtime.h(62): error C2059: syntax error : 'type'
1>d:\program files\postgresql\13\include\server\pgtime.h(64): error C2061: syntax error : identifier 'pg_tz_acceptable'
1>d:\program files\postgresql\13\include\server\pgtime.h(64): error C2059: syntax error : ';'
1>d:\program files\postgresql\13\include\server\pgtime.h(64): error C2059: syntax error : 'type'

./check.sh

1a2

psql:sql/zhparser.sql:1: ERROR: could not access file "$libdir/zhparser": No such file or directory
3a5
psql:sql/zhparser.sql:5: ERROR: text search parser "zhparser" does not exist
4a7
psql:sql/zhparser.sql:7: ERROR: text search configuration "testzhcfg" does not exist
7,81c10

A small Makefile problem when building on Mac OS X 10.8.2

OS: Mac OS X 10.8.2

$ clang -v
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

Symptom: building according to the instructions, make fails with:
ld: unknown option: --rpath
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Result: after changing --rpath to -rpath on line 13 of the Makefile, the build succeeds.
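
The edit described above can be sketched as a one-line text filter. This is a minimal illustration, assuming the flag appears literally as `--rpath` in the Makefile; the hypothetical helper `fix_rpath_flag` reads lines on standard input and rewrites the flag, leaving everything else untouched.

```shell
# Hypothetical helper: rewrite the GNU-style --rpath linker flag to the
# -rpath spelling that the macOS linker accepts.
fix_rpath_flag() {
    sed 's/--rpath/-rpath/g'
}
```

Usage: `fix_rpath_flag < Makefile > Makefile.new`, then review the difference before replacing the original. The sample line in a quick check such as `echo '-Wl,--rpath -Wl,/usr/local/lib' | fix_rpath_flag` is illustrative and not copied from the zhparser Makefile.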

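Is it possible to support multiple dictionaries?

I had a look and the zhparser code uses scws_set_dict. We now need to define some custom words, so I wonder whether multiple dictionaries can be supported. If not, the only option is to regenerate the xdb file every time.

Thanks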

How to exclude dot from separators?

Hello!
How to exclude dot and % sign from separators?
SELECT token FROM ts_parse('testzhcfg', '肖特基整流 99.02.7%060.98')

token
肖特基
整流
99
.
02
.
7
%
060.98

Expected:

token
肖特基
整流
99.02.7%060.98

Thank you.

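Please specify version numbers

Thanks for this package, which I plan to use. I intend to install zhparser with STARMAN, a package manager I wrote myself, but because zhparser does not publish a current version number, it is hard to manage.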
