befaq's People

Contributors

xiaoyichao

befaq's Issues

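How can the data be updated incrementally?

With several hundred thousand question-answer records, every training run takes a very long time. Is incremental updating or incremental training possible?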
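
One possible direction, sketched under assumptions rather than taken from BEFAQ: cache each question's vector under a hash of its text, so that only new or changed questions are re-encoded on the next run. `encode` is a hypothetical stand-in for whatever sentence encoder is in use, and the cache is a plain dict. Note that an Annoy index cannot accept new items once it has been built, so it would still have to be rebuilt from the cached vectors, whereas Faiss indexes generally accept further vectors through `add`.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def text_key(text: str) -> str:
    """Stable cache key for a question: SHA-1 of its UTF-8 bytes."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def update_cache(
    questions: Sequence[str],
    cache: Dict[str, Vector],
    encode: Callable[[List[str]], List[Vector]],  # hypothetical encoder
) -> List[str]:
    """Encode only the questions missing from `cache`; return those encoded."""
    new_questions: List[str] = []
    seen = set()
    for question in questions:
        key = text_key(question)
        # Skip questions already cached or already queued in this call.
        if key not in cache and key not in seen:
            seen.add(key)
            new_questions.append(question)
    if new_questions:
        for question, vector in zip(new_questions, encode(new_questions)):
            cache[text_key(question)] = vector
    return new_questions
```

Persisting `cache` between runs (for example in a file next to the index) is left out here; the point is only that encoding cost then scales with the number of changed questions rather than with the whole dataset.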

Is this a problem with my ES? It does not seem to be connecting

想要删除的索引 index_faq_1 不存在 (The index to be deleted, index_faq_1, does not exist)
Traceback (most recent call last):
  File "/Users/hellozhang/Desktop/BEFAQ/es/write_data2es.py", line 90, in <module>
    es_faq.create_index(index_name=new_index)
  File "/Users/hellozhang/Desktop/BEFAQ/es/es_operate.py", line 253, in create_index
    self.es.indices.create(index=index_name, body=mappings_cn)
  File "/Users/hellozhang/opt/anaconda3/envs/fqa/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 92, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/Users/hellozhang/opt/anaconda3/envs/fqa/lib/python3.6/site-packages/elasticsearch/client/indices.py", line 103, in create
    "PUT", _make_path(index), params=params, headers=headers, body=body
  File "/Users/hellozhang/opt/anaconda3/envs/fqa/lib/python3.6/site-packages/elasticsearch/transport.py", line 362, in perform_request
    timeout=timeout,
  File "/Users/hellozhang/opt/anaconda3/envs/fqa/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 252, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/hellozhang/opt/anaconda3/envs/fqa/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 282, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'Custom Analyzer [text_ik] failed to find tokenizer under name [ik_smart]')
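
A 400 response with `illegal_argument_exception` comes back from Elasticsearch itself, so the connection was established. The message says the node knows no tokenizer named `ik_smart`, which is provided by the IK analysis plugin (`analysis-ik`). A minimal sketch for checking this, assuming the plugin list has been fetched as JSON (for example with `es.cat.plugins(format="json")` in the Python client, which returns one record per installed plugin with a `component` field); `has_ik_plugin` is a hypothetical helper name:

```python
from typing import Dict, Iterable


def has_ik_plugin(plugins: Iterable[Dict[str, str]]) -> bool:
    """Return True if any plugin record names the IK analysis plugin."""
    return any(p.get("component") == "analysis-ik" for p in plugins)


# Illustrative records shaped like the output of the cat plugins API.
sample = [
    {"name": "node-1", "component": "analysis-icu"},
]
print(has_ik_plugin(sample))  # False: ik_smart would not be available
```

If the plugin is missing, it has to be installed on every node in a release matching the Elasticsearch version, followed by a node restart, before the index mapping can reference `ik_smart`.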

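How is Sentence-BERT applied?

How do you use Sentence-BERT to generate the word vectors? Do you fine-tune it on your own question-answer pair dataset (positive and negative examples), save the model, and then use that model to generate the vectors?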

sentence-bert

Which part of the BEFAQ architecture diagram uses Sentence-BERT?

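Various encoding errors with python3.6.16 in Docker

Reading es.ini, reading sheetname.conf, and reading the Excel file with a Chinese file name all raised Unicode errors. I fixed the first two by specifying UTF-8 as the encoding when reading the config files. For the Excel file, specifying an encoding did not help; it only worked after I renamed the file.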
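
The config-file part of this fix can be sketched as follows, assuming the files are parsed with the standard `configparser` module (the report does not say how BEFAQ reads them); `read_ini` is a hypothetical helper. Passing `encoding` explicitly avoids depending on the container's locale, which is often `C`/`POSIX` in Docker images and makes Python 3.6 fall back to ASCII.

```python
import configparser


def read_ini(path: str) -> configparser.ConfigParser:
    """Parse an INI file as UTF-8 regardless of the process locale."""
    parser = configparser.ConfigParser()
    # Open the file ourselves so the encoding never comes from the locale.
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser
```

The file-name error may share the same root cause, since Python 3.6 also derives its file-system encoding from the locale. Setting a UTF-8 locale in the image (for example `LANG=C.UTF-8`) is a common remedy, though whether it applies to this setup is an assumption.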

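Cannot start main_faq.py in Docker

After entering the Docker container, I cd into the src folder and run
nohup python -u main_faq.py > "logs/log$(date +"%Y-%m-%d-%H").txt" 2>&1 &
The following error is shown:
[image]

However, the associative_questions_server.py service starts normally, and calling its API also works:
[image]

What could be preventing main_faq from starting?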

Some erroneous code in the latest version

def search_annoy(self, owner_name, question, num=5):
    '''
    Author: xiaoyichao
    param {type}
    Description: recall using Annoy
    '''
    sentences = read_vec2bin.read_bert_sents(owner_name=owner_name)
    annoy_index_path = os.path.join(
        dir_name, '../es/search_model/%s_annoy.index' % owner_name)
    encodearrary = self.sentenceBERT.get_bert([question])  # highlighted by the reporter
    tc_index = AnnoyIndex(f=512, metric='angular')
    tc_index.load(annoy_index_path)
    items = tc_index.get_nns_by_vector(
        encodearrary[0], num, include_distances=True)
    sim_questions = [sentences[num_annoy] for num_annoy in items[0]]
    # sims = items[1]
    # index_nums = items[0]
    return sim_questions

def search_faiss(self, owner_name, question, num=5):
    '''
    Author: xiaoyichao
    param {type}
    Description: recall using Faiss
    '''
    sentences = read_vec2bin.read_bert_sents(owner_name=owner_name)
    faiss_index_path = os.path.join(
        dir_name, '../es/search_model/%s_faiss.index' % owner_name)
    index = faiss.read_index(faiss_index_path)
    question_vec = np.array(bc.encode([question])).astype('float32')  # highlighted by the reporter
    index.nprobe = 1
    sims, index_nums = index.search(question_vec, num)
    sim_questions = [sentences[num_faiss] for num_faiss in index_nums[0]]
    # index_nums = index_nums[0].tolist()
    # sims = sims[0].tolist()
    return sim_questions

Error when recalling with ES 7.10.1

Traceback (most recent call last):
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/sanic/app.py", line 937, in handle_request
    response = await response
  File "/Users/cgxw/PycharmProjects/pythonProject1/FAQ/BEFAQ/faq/main_faq.py", line 93, in myfaq
    owner_name=owner_name, question=orgin_query, query_word_list=query_word_list, use_faiss=use_faiss, use_annoy=use_annoy, engine_limit_num=engine_num, ES_limit_num=ES_num, use_other_when_es_none=use_other_when_es_none)
  File "/Users/cgxw/PycharmProjects/pythonProject1/FAQ/BEFAQ/faq/retrieval_es.py", line 200, in search_merge
    owner_name=owner_name, query_word_list=query_word_list, ES_limit_num=ES_limit_num)
  File "/Users/cgxw/PycharmProjects/pythonProject1/FAQ/BEFAQ/faq/retrieval_es.py", line 63, in search_es
    index_name=index_name, owner_name=owner_name, query_word_list=query_word_list, limit_num=ES_limit_num)
  File "/Users/cgxw/PycharmProjects/pythonProject1/FAQ/BEFAQ/es/es_operate.py", line 332, in search_data
    index=index_name, body=doc)
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 92, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 1627, in search
    body=body,
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/elasticsearch/transport.py", line 362, in perform_request
    timeout=timeout,
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/elasticsearch/connection/http_urllib3.py", line 248, in perform_request
    self._raise_error(response.status, raw_data)
  File "/Users/cgxw/miniconda3/envs/faq/lib/python3.7/site-packages/elasticsearch/connection/base.py", line 244, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'x_content_parse_exception', '[1:27] [bool] failed to parse field [must]')
INFO:sanic.access:
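
The message means Elasticsearch could not parse the value supplied for `must`. The query body built in es_operate.py is not shown in the report, so the cause cannot be pinned down from the traceback alone. For comparison, here is a sketch of a body in which `must` is a list of complete query clauses, a shape Elasticsearch 7.x accepts. The field names `process_question` and `owner_name` are assumptions borrowed from names that appear in these reports, and `build_faq_query` is a hypothetical helper:

```python
import json
from typing import Any, Dict, List


def build_faq_query(owner_name: str, words: List[str], limit: int = 10) -> Dict[str, Any]:
    """Build a search body whose bool/must value is a list of query clauses."""
    if not words:
        # Design choice for this sketch: refuse to build a query with no terms.
        raise ValueError("words must not be empty")
    return {
        "size": limit,
        "query": {
            "bool": {
                # Each element of "must" is a full clause: {query_type: {field: value}}.
                "must": [{"match": {"process_question": " ".join(words)}}],
                # Exact-match restriction that does not contribute to scoring.
                "filter": [{"term": {"owner_name": owner_name}}],
            }
        },
    }


body = build_faq_query("demo_owner", ["password", "reset"])
print(json.dumps(body, ensure_ascii=False))
```

Printing the body that is actually sent and comparing it against this shape is usually the quickest way to spot a clause that is missing its query type or is nested one level too deep.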

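Question about tokenization at query time

Hello, I would like to ask: at search time, the query is segmented with jieba and stop words are removed, and the result is then matched against process_question. But ES tokenizes with IK. Isn't that a problem?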
