
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Home Page: https://ragflow.io

License: Apache License 2.0


RAGFlow's Introduction

English | 简体中文 | 日本語

docker pull infiniflow/ragflow:v0.2.0

💡 What is RAGFlow?

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

🌟 Key Features

🍭 "Quality in, quality out"

  • Deep document understanding-based knowledge extraction from unstructured data with complicated formats.
  • Finds the "needle in a data haystack" across documents with virtually unlimited token counts.

🍱 Template-based chunking

  • Intelligent and explainable.
  • Plenty of template options to choose from.

🌱 Grounded citations with reduced hallucinations

  • Visualization of text chunking to allow human intervention.
  • Quick view of the key references and traceable citations to support grounded answers.

🍔 Compatibility with heterogeneous data sources

  • Supports Word documents, slides, Excel spreadsheets, plain text, images, scanned copies, structured data, web pages, and more.

🛀 Automated and effortless RAG workflow

  • Streamlined RAG orchestration that caters to both individuals and large enterprises.
  • Configurable LLMs as well as embedding models.
  • Multiple recall paired with fused re-ranking.
  • Intuitive APIs for seamless integration with business.

📌 Latest Features

  • 2024-04-16 Add an embedding model 'bce-embedding-base_v1' from BCEmbedding.
  • 2024-04-16 Add FastEmbed, which is designed specifically for light and speedy embedding.
  • 2024-04-11 Support Xinference for local LLM deployment.
  • 2024-04-10 Add a new layout recognition model for analyzing legal documents.
  • 2024-04-08 Support Ollama for local LLM deployment.
  • 2024-04-07 Support Chinese UI.

🔎 System Architecture

🎬 Get Started

📝 Prerequisites

  • CPU >= 2 cores
  • RAM >= 8 GB
  • Docker >= 24.0.0 & Docker Compose >= v2.26.1

    If you have not installed Docker on your local machine (Windows, Mac, or Linux), see Install Docker Engine.

🚀 Start up the server

  1. Ensure vm.max_map_count >= 262144 (more):

    To check the value of vm.max_map_count:

    $ sysctl vm.max_map_count

    If it is lower, reset vm.max_map_count to at least 262144.

    # In this case, we set it to 262144:
    $ sudo sysctl -w vm.max_map_count=262144

    This change will be reset after a system reboot. To ensure your change remains permanent, add or update the vm.max_map_count value in /etc/sysctl.conf accordingly:

    vm.max_map_count=262144
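
    As a convenience, the following one-liner appends the setting and reloads it without a reboot (a sketch; the paths assume a standard Linux layout):

    $ echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
    $ sudo sysctl -p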
  2. Clone the repo:

    $ git clone https://github.com/infiniflow/ragflow.git
  3. Pull the pre-built Docker images and start up the server:

    $ cd ragflow/docker
    $ chmod +x ./entrypoint.sh
    $ docker compose up -d

    The core image is about 9 GB in size and may take a while to pull.

  4. Check the server status once it is up and running:

    $ docker logs -f ragflow-server

    The following output confirms a successful launch of the system:

        ____                 ______ __
       / __ \ ____ _ ____ _ / ____// /____  _      __
      / /_/ // __ `// __ `// /_   / // __ \| | /| / /
     / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
    /_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
                  /____/
    
     * Running on all addresses (0.0.0.0)
     * Running on http://127.0.0.1:9380
     * Running on http://x.x.x.x:9380
     INFO:werkzeug:Press CTRL+C to quit
  5. In your web browser, enter the IP address of your server and log in to RAGFlow.

    With the default configurations, you only need to enter http://IP_OF_YOUR_MACHINE (sans port number), as the default HTTP serving port 80 can be omitted.

  6. In service_conf.yaml, select the desired LLM factory in user_default_llm and update the API_KEY field with the corresponding API key.

    See ./docs/llm_api_key_setup.md for more information.

    The show is now on!

🔧 Configurations

System configurations are managed through the following files: .env, service_conf.yaml, and docker-compose.yml.

You must ensure that changes to the .env file are in line with those in the service_conf.yaml file.

The ./docker/README file provides a detailed description of the environment settings and service configurations. You are REQUIRED to keep all environment settings listed in ./docker/README aligned with the corresponding configurations in service_conf.yaml.

To update the default HTTP serving port (80), go to docker-compose.yml and change 80:80 to <YOUR_SERVING_PORT>:80.
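
For example, to serve on host port 8080 instead (the service name below is illustrative; check your own docker-compose.yml for the exact name):

    services:
      ragflow:
        ports:
          - "8080:80"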

Updates to system configurations require restarting the containers to take effect:

$ docker compose up -d

🛠️ Build from source

To build the Docker images from source:

$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:v0.2.0 .
$ cd docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d

📚 Documentation

📜 Roadmap

See the RAGFlow Roadmap 2024.

🏄 Community

🙌 Contributing

RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part of it, review our Contribution Guidelines first.


ragflow's Issues

[Question]: Can't update token usage for ***/EMBEDDING

Describe your problem

  • The info in ERROR.log as follows:
Fail put 29f4f2dcf21b11ee97630242c0a80006/AcademicGPT.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /29f4f2dcf21b11ee97630242c0a80006, request_id: 17C2EC907DD3BC46, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 29f4f2dcf21b11ee97630242c0a80006
Can't update token usage for d11309c4f0c111eea3da0242ac150005/EMBEDDING
Object of type ndarray is not JSON serializable
Traceback (most recent call last):
  File "/ragflow/api/apps/conversation_app.py", line 172, in completion
    ans = chat(dia, msg, **req)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/api/apps/conversation_app.py", line 215, in chat
    kbinfos = retrievaler.retrieval(" ".join(questions), embd_mdl, dialog.tenant_id, dialog.kb_ids, 1, dialog.top_n,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/nlp/search.py", line 314, in retrieval
    sres = self.search(req, index_name(tenant_id), embd_mdl)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/nlp/search.py", line 115, in search
    es_logger.info("【Q】: {}".format(json.dumps(s)))
                                    ^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py11/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/ragflow/api/utils/__init__.py", line 128, in default
    return json.JSONEncoder.default(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py11/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ndarray is not JSON serializable
Can't update token usage for d11309c4f0c111eea3da0242ac150005/EMBEDDING
Object of type ndarray is not JSON serializable

How can I solve this?
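
The TypeError above comes from json.dumps receiving a raw embedding vector whose type the encoder's default hook does not handle. A minimal sketch of a tolerant encoder (not RAGFlow's actual code; it uses the stdlib array.array as a stand-in for numpy.ndarray, since both expose a .tolist() method):

```python
import array
import json

class ArrayFriendlyEncoder(json.JSONEncoder):
    """Fallback encoder: any object exposing .tolist() (such as a
    numpy.ndarray or an array.array) is converted to a plain list
    before serialization instead of raising TypeError."""
    def default(self, obj):
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)

# Stand-in for an embedding vector:
vec = array.array("d", [0.5, 0.25])
print(json.dumps({"embedding": vec}, cls=ArrayFriendlyEncoder))
# → {"embedding": [0.5, 0.25]}
```

Passing cls=ArrayFriendlyEncoder at the json.dumps call site in rag/nlp/search.py (or extending the hook in api/utils/__init__.py) would avoid the crash.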

change language

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

change language

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

[Question]: Dockerfile for ragflow-base image?

Describe your problem

Building the project truly from source would involve building all resources, including base image. Any chance ragflow-base dockerfile could be included in repo?

a problem while trying to process a search request in Elasticsearch

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

d0a1ffe

Other environment information

This happens on both demo site and a local deployment instance.

Actual behavior

On the page https://demo.ragflow.io/knowledge/dataset?id=<...>:
After adding a dataset and then trying to add text chunks to it via the UI, the following error message is encountered.
The likely issue is that the field 'create_time' in the index ragflow_15b4f374f2e011eeae1b0242ac180006 is a text field, and operations like sorting or aggregating require field data; however, field data is disabled by default on text fields to optimize performance.

BadRequestError(
"search_phase_execution_exception",
meta=ApiResponseMeta(
status=400,
http_version="1.1",
headers={
"X-elastic-product": "Elasticsearch",
"content-type": "application/vnd.elasticsearch+json;compatible-with=8",
"content-length": "2231",
},
duration=0.0018017292022705078,
node=NodeConfig(
scheme="http",
host="es01",
port=9200,
path_prefix="",
headers={
"user-agent": "elasticsearch-py/8.12.1 (Python/3.11.0; elastic-transport/8.12.0)"
},
connections_per_node=10,
request_timeout=10.0,
http_compress=False,
verify_certs=True,
ca_certs=None,
client_cert=None,
client_key=None,
ssl_assert_hostname=None,
ssl_assert_fingerprint=None,
ssl_version=None,
ssl_context=None,
ssl_show_warn=True,
_extras={},
),
),
body={
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": True,
"failed_shards": [
{
"shard": 0,
"index": "ragflow_15b4f374f2e011eeae1b0242ac180006",
"node": "90aM0LzhTSqdYA-X6yX5mg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
},
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on [create_time] in [ragflow_15b4f374f2e011eeae1b0242ac180006]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [create_time] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
},
},
},
"status": 400,
},
)
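
As the error text suggests, one remedy (a sketch, not a change RAGFlow has made) is to add a keyword sub-field to create_time in the index mapping and sort on that sub-field instead. Elasticsearch allows adding a multi-field to an existing text field without recreating the index, though existing documents must be re-processed (e.g. via _update_by_query) before the sub-field is populated:

    PUT ragflow_<index>/_mapping
    {
      "properties": {
        "create_time": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword" } }
        }
      }
    }

Sort clauses would then reference "create_time.keyword" rather than "create_time".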

Expected behavior

No response

Steps to reproduce

Add a new dataset via the WebUI (successful)
Add a new chunk to the newly created dataset (error).

This happens on both official demo site and a local deployment testing environment.

Additional information

No response

[Question]: Redundant database.log

Describe your problem

We've observed that the size of the database.log file increases rapidly, reaching gigabytes in a very short span of time. By using tail -f to monitor the file, we noticed it generates numerous entries similar to the ones below. Is there a way to suppress these logs?

Returning 140199553734352 to pool.
('SELECT t1.id, t1.doc_id, t1.from_page, t1.to_page, t2.kb_id, t2.parser_id, t2.parser_config, t2.name, t2.type, t2.location, t2.size, t3.tenant_id, t3.language, t4.embd_id, t4.img2txt_id, t4.asr_id, t1.update_time FROM task AS t1 INNER JOIN document AS t2 ON (t1.doc_id = t2.id) INNER JOIN knowledgebase AS t3 ON (t2.kb_id = t3.id) INNER JOIN tenant AS t4 ON (t3.tenant_id = t4.id) WHERE ((((((t2.status = %s) AND (t2.run = %s)) AND NOT (t2.type = %s)) AND (t1.progress = %s)) AND (t1.update_time >= %s)) AND ((t1.create_time %% %s) = %s)) ORDER BY t1.update_time ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 1712592641735, 2, 1, 64, 0])
('SELECT t1.id, t1.doc_id, t1.from_page, t1.to_page, t2.kb_id, t2.parser_id, t2.parser_config, t2.name, t2.type, t2.location, t2.size, t3.tenant_id, t3.language, t4.embd_id, t4.img2txt_id, t4.asr_id, t1.update_time FROM task AS t1 INNER JOIN document AS t2 ON (t1.doc_id = t2.id) INNER JOIN knowledgebase AS t3 ON (t2.kb_id = t3.id) INNER JOIN tenant AS t4 ON (t3.tenant_id = t4.id) WHERE ((((((t2.status = %s) AND (t2.run = %s)) AND NOT (t2.type = %s)) AND (t1.progress = %s)) AND (t1.update_time >= %s)) AND ((t1.create_time %% %s) = %s)) ORDER BY t1.update_time ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 0, 2, 0, 64, 0])
Returning 139777349632080 to pool.
Returning 140199553734352 to pool.
('SELECT t1.id, t1.doc_id, t1.from_page, t1.to_page, t2.kb_id, t2.parser_id, t2.parser_config, t2.name, t2.type, t2.location, t2.size, t3.tenant_id, t3.language, t4.embd_id, t4.img2txt_id, t4.asr_id, t1.update_time FROM task AS t1 INNER JOIN document AS t2 ON (t1.doc_id = t2.id) INNER JOIN knowledgebase AS t3 ON (t2.kb_id = t3.id) INNER JOIN tenant AS t4 ON (t3.tenant_id = t4.id) WHERE ((((((t2.status = %s) AND (t2.run = %s)) AND NOT (t2.type = %s)) AND (t1.progress = %s)) AND (t1.update_time >= %s)) AND ((t1.create_time %% %s) = %s)) ORDER BY t1.update_time ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 1712592641735, 2, 1, 64, 0])
('SELECT t1.id, t1.doc_id, t1.from_page, t1.to_page, t2.kb_id, t2.parser_id, t2.parser_config, t2.name, t2.type, t2.location, t2.size, t3.tenant_id, t3.language, t4.embd_id, t4.img2txt_id, t4.asr_id, t1.update_time FROM task AS t1 INNER JOIN document AS t2 ON (t1.doc_id = t2.id) INNER JOIN knowledgebase AS t3 ON (t2.kb_id = t3.id) INNER JOIN tenant AS t4 ON (t3.tenant_id = t4.id) WHERE ((((((t2.status = %s) AND (t2.run = %s)) AND NOT (t2.type = %s)) AND (t1.progress = %s)) AND (t1.update_time >= %s)) AND ((t1.create_time %% %s) = %s)) ORDER BY t1.update_time ASC LIMIT %s OFFSET %s', ['1', '1', 'virtual', 0.0, 0, 2, 0, 64, 0])
Returning 140199553734352 to pool.
Returning 139777349632080 to pool.
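
The SQL statements and "Returning … to pool." lines are emitted at debug level by peewee's loggers. If RAGFlow routes them into database.log unchanged, raising the logger level should suppress them (a sketch; the exact logger wiring inside RAGFlow is an assumption):

```python
import logging

# peewee logs every query on the "peewee" logger; the pooled-connection
# "Returning ... to pool." messages come from the "peewee.pool" logger
# used by playhouse.pool.
for name in ("peewee", "peewee.pool"):
    logging.getLogger(name).setLevel(logging.WARNING)
```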

[Feature Request]: Support for Apple Silicon Mac

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Hi.
It looks like the Docker image and instructions are for Linux. I tried to run docker compose on my M2 Mac, but I get errors related to MySQL.

! mysql The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
 runtime: failed to create new OS thread (have 2 already; errno=22)


Describe the feature you'd like

Apple Silicon support and clear instructions for installing on a Mac.

Describe implementation you've considered

I tried adding `platform: linux/amd64` to the MySQL image definition in docker compose, but this didn't help.

Documentation, adoption, use case

No response

Additional information

No response

[Bug]: Parsing stuck at 0.62%

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

main

Other environment information

No response

Actual behavior

I signed up on the demo site and uploaded PDF and DOCX files. Both have been stuck at 0.62% for over 10 minutes now and are not moving.
(screenshot omitted)

Here is my config:
(screenshot omitted)

Expected behavior

I would expect parsing to finish, I guess.

Steps to reproduce

1. Create an account on https://demo.ragflow.io/
2. Upload a document

Additional information

No response

[Bug]: WARNING: can't find /ragflow/rag/res/broker.tm

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

7a36d25

Other environment information

pop_os 22.04
docker 26.0
Intel i7-12800h
32gb

Actual behavior

(base) hitesh@whiskey:~/ragflow/docker$ docker logs -f ragflow-server
[HUQIE]:Build default trie
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt
WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!

    ____                 ______ __
   / __ \ ____ _ ____ _ / ____// /____  _      __
  / /_/ // __ `// __ `// /_   / // __ \| | /| / /
 / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
              /____/

ERROR:dashscope:Request: https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation failed, status: 401, message: Invalid API-key provided.
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9380
 * Running on http://172.18.0.5:9380
INFO:werkzeug:Press CTRL+C to quit
WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm
WARNING: can't find /ragflow/rag/res/broker.tm

Expected behavior

    ____                 ______ __
   / __ \ ____ _ ____ _ / ____// /____  _      __
  / /_/ // __ `// __ `// /_   / // __ \| | /| / /
 / _, _// /_/ // /_/ // __/  / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/    /_/ \____/ |__/|__/
              /____/

Steps to reproduce

I followed the instructions for docker:

$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/docker
$ docker compose up -d

Additional information

No response

[Question]: Document process blocked at 80%

Describe your problem

Local Docker with the latest image; document processing is blocked at 80%. The LLM is ChatGLM and the API key is set in the web UI.
Error log is:
(screenshot omitted)
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
ES create index error ragflow_0196ca84f5a111ee80170242ac150006 ----BadRequestError(400, 'resource_already_exists_exception', 'index [ragflow_0196ca84f5a111ee80170242ac150006/gHToUXxJSNSqdLG9Yo0mNA] already exists')
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '372033f2f5a111eea71b0242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '371b7de4f5a111eea0880242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '367e6c34f5a111ee9a840242ac150006'}}
Fail put 0beab266f5a111eeab0c0242ac150006/附件1:《好烤漆金牌造》销售工具话术.pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /0beab266f5a111eeab0c0242ac150006/%E9%99%84%E4%BB%B61%EF%BC%9A%E3%80%8A%E5%A5%BD%E7%83%A4%E6%BC%86%E9%87%91%E7%89%8C%E9%80%A0%E3%80%8B%E9%94%80%E5%94%AE%E5%B7%A5%E5%85%B7%E8%AF%9D%E6%9C%AF.pdf, request_id: 17C44D75219FE3C5, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 0beab266f5a111eeab0c0242ac150006, object_name: 附件1:《好烤漆金牌造》销售工具话术.pdf
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '9b962552f5a211eea4b10242ac150005'}}
Fail put 8568867ef5a411eebc050242ac150005/附件1:《好烤漆金牌造》销售工具话术.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /8568867ef5a411eebc050242ac150005, request_id: 17C44E3BAF64631C, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 8568867ef5a411eebc050242ac150005
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
ES updateByQuery deleteByQuery: ApiError(503, 'search_phase_execution_exception')【Q】:{'match': {'doc_id': '97e2aa3cf5a411eeac6d0242ac150005'}}
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING
Can't update token usage for 0196ca84f5a111ee80170242ac150006/EMBEDDING

What might be the error?

[Bug]: Empty excel file will raise exception

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

e3c24e6

Other environment information

No response

Actual behavior

None

Expected behavior

No response

Steps to reproduce

Import an empty Excel file into the knowledge base.

Additional information

No response

[Bug]: Historical chats appear in the new user's chat box

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

d0a1ffe

Other environment information

No response

Actual behavior

I registered user yh and had a chat, as shown in pic1. After that, I registered a new user, yh01, and found the chat history of user yh (pic2). I think this is a bug.

pic1 (screenshot 2024-04-08 15:59:10, omitted)

pic2 (screenshot 2024-04-08 15:59:29, omitted)

Expected behavior

No response

Steps to reproduce

1. Start RAGFlow.
2. Configure API keys.
3. Register user A, configure the model, and start to chat.
4. Register user B and start to chat.

Additional information

By the way, there are still some error messages in ERROR.log, as follows:

Fail put 77dd584cf57a11eebdea0242ac190005/LOMO.pdf: S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /77dd584cf57a11eebdea0242ac190005, request_id: 17C43DDAB4AB4F18, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 77dd584cf57a11eebdea0242ac190005
Can't update token usage for c4c74360f54e11ee863b0242ac190005/EMBEDDING
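Whatever the actual cause, the symptom above is what a missing owner filter looks like: conversations must be scoped to the authenticated user on every query. A minimal sketch (the field names are hypothetical, not RAGFlow's actual schema):

```python
# In-memory stand-in for a conversation table.
conversations = [
    {"id": "c1", "user_id": "yh",   "title": "first chat"},
    {"id": "c2", "user_id": "yh01", "title": "second chat"},
]

def list_conversations(user_id):
    """Return only the rows owned by `user_id`; returning all rows
    unconditionally is exactly the bug described above."""
    return [c for c in conversations if c["user_id"] == user_id]
```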

[Feature Request]: Better local LLM support

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Local LLMs, especially the LLaMA family, should be easy to integrate.

Describe the feature you'd like

Support ollama

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response
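Ollama exposes a plain HTTP API (by default on port 11434), so an integration can be sketched with nothing but the standard library. The endpoint and payload fields follow Ollama's documented `/api/generate` interface; the wrapper itself is a hypothetical sketch, not RAGFlow code:

```python
import json
from urllib import request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address

def build_generate_request(prompt, model="llama2"):
    """JSON body for Ollama's /api/generate endpoint (streaming disabled)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt, model="llama2"):
    body = json.dumps(build_generate_request(prompt, model)).encode()
    req = request.Request(f"{OLLAMA_HOST}/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # requires a running Ollama server
        return json.loads(resp.read())["response"]
```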

[Question]: API docs?

Describe your problem

At first glance I couldn't find any Swagger/OpenAPI docs for using this locally as a RAG engine with an external code base.

How can answers be traced back to their sources?

Describe your problem

If retrieval recalls 3 chunks and all of them are passed to the LLM, how is it finally determined which chunk the answer is based on?
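One common way to attribute an answer to its supporting chunk is to score each retrieved chunk against the answer and cite the best match. The token-overlap scorer below is only an illustration of the idea; production systems typically compare embeddings rather than raw tokens:

```python
def attribute_chunk(answer_sentence, chunks):
    """Return the id of the chunk with the greatest token overlap with the
    answer sentence. A naive sketch of answer-to-source attribution."""
    def tokens(s):
        return set(s.lower().split())
    ans = tokens(answer_sentence)
    return max(chunks, key=lambda c: len(ans & tokens(c["text"])))["id"]
```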

[Bug]: unable to connect to es01 cluster

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

7def208

Other environment information

No response

Actual behavior

tail -f -n 100 logs/rag/es.log

Elasticsearch version: (8, 12, 1)
Fail to connect to es: Connection error caused by: ConnectionError(Connection error caused by: NameResolutionError(<urllib3.connection.HTTPConnection object at 0x7ffa7bbce100>: Failed to resolve 'es01' ([Errno -3] Temporary failure in name resolution)))

Expected behavior

No response

Steps to reproduce

After running the docker

Additional information

No response
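The `es01` hostname only resolves on Docker's internal network, and it may not be resolvable until the es01 container has registered with Docker's DNS. Rather than failing immediately, a startup routine can poll until the host is reachable. A stdlib sketch (not RAGFlow's actual startup code):

```python
import socket
import time

def wait_for_host(host, port, timeout=60.0, interval=3.0):
    """Poll until `host` resolves and accepts TCP connections, or give up.
    Returns True once connected, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # name-resolution failures and refused connections
            time.sleep(interval)
    return False
```

For example, `wait_for_host("es01", 9200)` would block until Elasticsearch is reachable inside the compose network.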

Index failure when parsing documents

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

c3b2d1

Other environment information

No response

Actual behavior

I test on https://demo.ragflow.io/, upload a pdf file. Index failure every time.

Page(13~25): [ERROR]Index failure!

Expected behavior

No response

Steps to reproduce

I test on https://demo.ragflow.io/,  upload a pdf file. Index failure every time.


Page(13~25): [ERROR]Index failure!

Additional information

No response

[Question]: Performance of OCR

Describe your problem

RAGFlow integrates the OCR model from InfiniFlow/deepdoc. How does its text extraction and table structure recognition compare with commercial OCR tools, such as the text extraction services of Azure and AWS?

[Bug]: WARNING: can't find /ragflow/rag/res/broker.tm

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

c3b2a1

Other environment information

No response

Actual behavior

These warnings are printed continuously.

Expected behavior

No response

Steps to reproduce

The first time you start the system with:
docker compose up

Additional information

No response
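`broker.tm` appears to be a timestamp file the task broker writes after its first pass, so the warning on a fresh install is harmless. A sketch of loading such a file with a silent default instead of a repeated warning (hypothetical helper, not RAGFlow's actual code):

```python
import os

def load_timestamp(path):
    """Return the stored timestamp, or 0.0 if the file doesn't exist yet
    (which is normal on first startup)."""
    if not os.path.exists(path):
        return 0.0
    with open(path) as f:
        content = f.read().strip()
    return float(content) if content else 0.0
```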

For any type of file, if the parsing method is General, the chunk token count should be displayed.

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

963533b

Other environment information

No response

Actual behavior

image

Expected behavior

For any type of file, if the parsing method is General, the chunk token count should be displayed.

Steps to reproduce

![image](https://github.com/infiniflow/ragflow/assets/8089971/640577f4-a7ad-4394-a22c-4ab4db336491)

Additional information

No response

[Bug]: Lost embedding model config in knowledgebase config

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

2673be8

Other environment information

No response

Actual behavior

As the title describes.

Expected behavior

No response

Steps to reproduce

1. Save the knowledge base configuration.
2. Load it again.
3. The embedding model configuration is gone.

Additional information

No response

[Bug]: Missing CONTRIBUTING.md

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

6cf0889

Other environment information

No response

Actual behavior

README.md mentions:

RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to be a part, review our Contribution Guidelines first.

But the main branch of ragflow does not contain CONTRIBUTING.md.

image

Expected behavior

CONTRIBUTING.md should exist.

Steps to reproduce

NA

Additional information

No response

[Bug]: Documents stop processing after uploading a PDF on demo.ragflow.io

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

36f2d7b

Other environment information

No response

Actual behavior

Documents stop processing after uploading a PDF on demo.ragflow.io

Expected behavior

No response

Steps to reproduce

Upload a PDF on demo.ragflow.io

Additional information

No response

[Question]: pymysql.err.ProgrammingError: (1146, "Table 'rag_flow.task' doesn't exist")

Describe your problem

image
The command "docker compose -f docker-compose-CN.yml up -d" can run normally, but when I execute the command " docker logs -f ragflow-server". The exception occurred. Has anyone encountered a similar situation before?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/ragflow/rag/svr/task_broker.py", line 180, in
dispatch()
File "/ragflow/rag/svr/task_broker.py", line 64, in dispatch
rows = collect(tm)
^^^^^^^^^^^
File "/ragflow/rag/svr/task_broker.py", line 38, in collect
docs = DocumentService.get_newly_uploaded(tm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/ragflow/api/db/services/document_service.py", line 101, in get_newly_uploaded
return list(docs.dicts())
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 7243, in iter
self.execute()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2011, in inner
return method(self, database, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2082, in execute
return self._execute(database)
^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 2255, in _execute
cursor = database.execute(self)
^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3299, in execute
return self.execute_sql(sql, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3289, in execute_sql
with exception_wrapper:
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3059, in exit
reraise(new_type, new_type(exc_value, *exc_args), traceback)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 192, in reraise
raise value.with_traceback(tb)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
cursor.execute(sql, params or ())
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
result = self._query(query)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 558, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 822, in _read_query_result
result.read()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 1200, in read
first_packet = self.connection._read_packet()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 772, in _read_packet
packet.raise_for_error()
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
peewee.ProgrammingError: (1146, "Table 'rag_flow.document' doesn't exist")
[WARNING] Load term.freq FAIL!
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 114044.52it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 26564.91it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
    cursor.execute(sql, params or ())
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 558, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 822, in _read_query_result
    result.read()
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 1200, in read
    first_packet = self.connection._read_packet()
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/connections.py", line 772, in _read_packet
    packet.raise_for_error()
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/protocol.py", line 221, in raise_for_error
    err.raise_mysql_exception(self._data)
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/err.py", line 143, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1146, "Table 'rag_flow.task' doesn't exist")
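Error 1146 here usually means the server queried MySQL before the schema had been created on first startup. A generic retry wrapper illustrates the mitigation (a sketch, not RAGFlow's actual fix):

```python
import time

def retry(fn, attempts=10, delay=3.0, exceptions=(Exception,)):
    """Call `fn`, retrying while it raises (e.g. ProgrammingError 1146
    while tables are still being created); re-raise on the last attempt."""
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise
            time.sleep(delay)
```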

[Feature Request]: Hello, if possible, we'd like to customize responses when there's no relevant content in the knowledge base.

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

When a user asks a question and there's no relevant content in the knowledge base, we can reply with a custom message.

For example:
Knowledge base: 1. Professional knowledge
User input: Hello, xxxxxx?
[No relevant content found]
Assistant output: Sorry, I'm unable to answer your question. You can submit a ticket at https://xxx.com.

Describe the feature you'd like

When a user asks a question and there's no relevant content in the knowledge base, we can reply with a custom message.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response
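The requested behavior can be sketched as a threshold check in front of the LLM call; the names, threshold, and default message below are illustrative, not RAGFlow's actual configuration:

```python
def answer_or_fallback(chunks, generate, threshold=0.2,
                       fallback="Sorry, I'm unable to answer your question. "
                                "You can submit a ticket at https://xxx.com."):
    """Return the configured fallback when no retrieved chunk clears the
    similarity threshold; otherwise pass the relevant chunks to the LLM."""
    relevant = [c for c in chunks if c["score"] >= threshold]
    return generate(relevant) if relevant else fallback
```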

Refresh the login page and the language setting becomes invalid.

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

c829799

Other environment information

No response

Actual behavior

Refresh the login page and the language setting becomes invalid.

Expected behavior

No response

Steps to reproduce

Refresh the login page and the language setting becomes invalid.

Additional information

No response

All documents in the knowledge base cannot be selected if they have not been parsed.

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

080cbd9

Other environment information

No response

Actual behavior

image

image

Expected behavior

All documents in the knowledge base cannot be selected if they have not been parsed.

Steps to reproduce

![image](https://github.com/infiniflow/ragflow/assets/8089971/e580162d-149d-42ed-881d-7123beb35458)

![image](https://github.com/infiniflow/ragflow/assets/8089971/cea5d535-8613-4f3f-8cd7-b5b19d43ecea)

Additional information

No response

[Bug]: docker-compose failed!

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

newest

Other environment information

No response

Actual behavior

I pulled the images successfully and ran docker compose -f docker-compose-CN.yml up -d.

Expected behavior

No response

Steps to reproduce

[+] Running 6/8
 ⠿ Network docker_ragflow                                                                                                 Created                                                                       0.1s
 ⠿ Container ragflow-es-01                                                                                                Healthy                                                                      21.2s
 ⠿ Container ragflow-mysql                                                                                                Healthy                                                                      11.2s
 ⠿ Container ragflow-minio                                                                                                Started                                                                       1.7s
 ⠇ es01 Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.                                                                                 0.0s
 ⠿ Container ragflow-kibana                                                                                               Started                                                                      21.6s
 ⠿ Container ragflow-server                                                                                               Started                                                                      21.8s
 ⠇ kibana Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.                                                                               0.0s
(base) lk@lk:/media/lk/disk1/lk_git/6_NLPandCNN/LLM/ragflow/docker$ docker logs -f ragflow-server
[HUQIE]:Build default trie
[HUQIE]:Build trie /ragflow/rag/res/huqie.txt



WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
pytorch_model.bin:   7%|▋         | 94.4M/1.30G [00:29<06:09, 3.27MB/s]WARNING:root:Realtime synonym is disabled, since no redis connection.
[WARNING] Load term.freq FAIL!
Traceback (most recent call last):
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/peewee.py", line 3291, in execute_sql
    cursor.execute(sql, params or ())
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 153, in execute
    result = self._query(query)
             ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py11/lib/python3.11/site-packages/pymysql/cursors.py", line 322, in _query

Can anyone help? Thanks!



Additional information

![screenshot1](https://github.com/infiniflow/ragflow/assets/20237650/246876fb-4737-4066-bae1-57605561a678)

It shows {"data":null,"retcode":100,"retmsg":"<NotFound '404: Not Found'>"} in the website.

Add support for ollama

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

Add support for ollama

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

More context of README

  • System requirements: hardware, operating system.

  • How to get RAGFlow from Docker Hub

  • How to configure RAGFlow

  • Community

  • Roadmap

  • License

[Bug]: call ChatGLM failed

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

d0a1ffe

Other environment information

No response

Actual behavior

Configure the LLM as ChatGLM and chat, got
ERROR: Completions.create() got an unexpected keyword argument 'presence_penalty'
image

Expected behavior

No response

Steps to reproduce

1. Deploy the local Docker environment.
2. Create a knowledge base and upload some docs.
3. Configure ChatGLM as the LLM and set the API key.
4. Create an assistant with ChatGLM and chat with it; the error occurs.

Additional information

No response
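ChatGLM's SDK rejects OpenAI-style keyword arguments it doesn't support, such as `presence_penalty`. One fix is to filter the generation config through a per-provider whitelist before the call; the whitelists below are illustrative, not an authoritative list of each SDK's parameters:

```python
SUPPORTED_PARAMS = {
    # Hypothetical per-provider whitelists of accepted completion kwargs.
    "chatglm": {"model", "messages", "temperature", "top_p", "max_tokens"},
    "openai":  {"model", "messages", "temperature", "top_p", "max_tokens",
                "presence_penalty", "frequency_penalty"},
}

def filter_gen_conf(provider, conf):
    """Drop keys the provider's completion API doesn't accept."""
    allowed = SUPPORTED_PARAMS[provider]
    return {k: v for k, v in conf.items() if k in allowed}
```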

ROADMAP 2024

Features

  • Prompt templates for different languages. @KevinHuSh
  • Product documents. @writinwaters
  • UI in different languages. #246
  • URL support: Capable of web crawling and the corresponding content extraction. #315
  • Support x-inference as model provider #299
  • OpenAI API compatibility #287
  • ETA of parsing files #328
  • Support doc files

RAG flows

Model integration

  • Ollama integration. #221
  • BCE embedding model #326
  • Cohere Command R embedding model #367
  • AWS Bedrock models #308

[Bug]: Index Not Found - when testing retrieval

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

2673be8

Other environment information

I've uploaded docs to the dataset; they parsed and chunked successfully, but retrieval testing fails consistently. Using an OpenAI model.

Actual behavior

Error in top right - 'Index Not Found'

Expected behavior

It should produce output from the LLM.

Steps to reproduce

Test anything in retrieval testing.

Additional information

No response

[Bug]: Discord Invalid Invite

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

newest

Other environment information

No response

Actual behavior

The Discord link failed.
image

Expected behavior

No response

Steps to reproduce

Just click here:
https://github.com/infiniflow/ragflow?tab=readme-ov-file#-community
The Discord link does not work.

Additional information

No response

[Question]: Failed to parse any local file

Describe your problem

After deploying with the pre-built Docker images and starting the RAGFlow server, I could access the RAGFlow web page successfully, but failed to parse any PDF file in the knowledge base.

All configurations follow the official configuration, except for the service port of Minio in the ./docker/docker-compose.yml

minio:
  image: quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z
  container_name: ragflow-minio
  command: server --console-address ":9001" /data
  ports:
    - 19000:9000
    - 19011:9001
  environment:
    - MINIO_ROOT_USER=${MINIO_USER}

ERROR msg:

ES updateByQuery deleteByQuery: NotFoundError(404, 'index_not_found_exception', 'no such index [ragflow_0eea0066f16411eeadae0242ac150006]', ragflow_0eea0066f16411eeadae0242ac150006, index_or_alias)【Q】:{'match': {'doc_id': '9f5aac32f18611eeb9eb0242ac150006'}}
Fail put 55d6b0f0f16411ee90d40242ac150006/xxxxxx_my_test_file.pdf: S3 operation failed; code: NoSuchKey, message: Object does not exist, resource: /55d6b0f0f16411ee90d40242ac150006/26-Tesla%20Model%20X%E8%AF%8A%E6%96%AD%E5%AF%B9%E6%A0%87%E6%8A%A5%E5%91%8A20171013.pdf, request_id: 17C2B3E1E8FCF95A, host_id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, bucket_name: 55d6b0f0f16411ee90d40242ac150006, object_name: -xxxxxx_my_test_file.pdf
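The `NoSuchKey`/`NoSuchBucket` errors suggest the per-knowledge-base bucket or object was never written, which may happen when the MinIO port mapping is changed without updating the address the server writes through. A defensive upload wrapper can create the bucket on demand; `client` here is any object with MinIO-style `bucket_exists`/`make_bucket`/`put_object` methods (a hypothetical wrapper, not RAGFlow's actual storage layer):

```python
def put_object_safe(client, bucket, name, data):
    """Upload, creating the tenant's bucket first if it doesn't exist."""
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    return client.put_object(bucket, name, data)
```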

[Bug]: pip install -r requirements.txt error!

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch name

main

Commit ID

aae84f6

Other environment information

Collecting accelerate==0.27.2 (from -r requirements.txt (line 1))
  Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.0/280.0 kB 6.4 MB/s eta 0:00:00
Requirement already satisfied: aiohttp==3.9.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (3.9.3)
Requirement already satisfied: aiosignal==1.3.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: annotated-types==0.6.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (0.6.0)
Collecting anyio==4.3.0 (from -r requirements.txt (line 5))
  Downloading anyio-4.3.0-py3-none-any.whl (85 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.6/85.6 kB 13.8 MB/s eta 0:00:00
Requirement already satisfied: argon2-cffi==23.1.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (23.1.0)
Requirement already satisfied: argon2-cffi-bindings==21.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (21.2.0)
Collecting Aspose.Slides==24.2.0 (from -r requirements.txt (line 8))
  Downloading Aspose.Slides-24.2.0-py3-none-manylinux1_x86_64.whl (88.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.7/88.7 MB 2.6 MB/s eta 0:00:00
Requirement already satisfied: attrs==23.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (23.2.0)
Collecting blinker==1.7.0 (from -r requirements.txt (line 10))
  Downloading blinker-1.7.0-py3-none-any.whl (13 kB)
Collecting cachelib==0.12.0 (from -r requirements.txt (line 11))
  Downloading cachelib-0.12.0-py3-none-any.whl (20 kB)
Requirement already satisfied: cachetools==5.3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (5.3.3)
Requirement already satisfied: certifi==2024.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (2024.2.2)
Requirement already satisfied: cffi==1.16.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (1.16.0)
Requirement already satisfied: charset-normalizer==3.3.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: click==8.1.7 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (8.1.7)
Collecting coloredlogs==15.0.1 (from -r requirements.txt (line 17))
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.0/46.0 kB 7.2 MB/s eta 0:00:00
Requirement already satisfied: cryptography==42.0.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 18)) (42.0.5)
Collecting dashscope==1.14.1 (from -r requirements.txt (line 19))
  Downloading dashscope-1.14.1-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 86.1 MB/s eta 0:00:00
Collecting datasets==2.17.1 (from -r requirements.txt (line 20))
  Downloading datasets-2.17.1-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.7/536.7 kB 57.4 MB/s eta 0:00:00
Collecting datrie==0.8.2 (from -r requirements.txt (line 21))
  Downloading datrie-0.8.2.tar.gz (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 kB 9.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting demjson==2.2.4 (from -r requirements.txt (line 22))
  Downloading demjson-2.2.4.tar.gz (131 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.5/131.5 kB 17.2 MB/s eta 0:00:00
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Actual behavior

pip install -r requirements.txt

error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Expected behavior

pip install -r requirements.txt should succeed.

Steps to reproduce

1. pip install -r requirements.txt
2. The error occurs.

Additional information

Collecting accelerate==0.27.2 (from -r requirements.txt (line 1))
Using cached accelerate-0.27.2-py3-none-any.whl (279 kB)
Requirement already satisfied: aiohttp==3.9.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (3.9.3)
Requirement already satisfied: aiosignal==1.3.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: annotated-types==0.6.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (0.6.0)
Collecting anyio==4.3.0 (from -r requirements.txt (line 5))
Using cached anyio-4.3.0-py3-none-any.whl (85 kB)
Requirement already satisfied: argon2-cffi==23.1.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (23.1.0)
Requirement already satisfied: argon2-cffi-bindings==21.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (21.2.0)
Collecting Aspose.Slides==24.2.0 (from -r requirements.txt (line 8))
Using cached Aspose.Slides-24.2.0-py3-none-manylinux1_x86_64.whl (88.7 MB)
Requirement already satisfied: attrs==23.2.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (23.2.0)
Collecting blinker==1.7.0 (from -r requirements.txt (line 10))
Using cached blinker-1.7.0-py3-none-any.whl (13 kB)
Collecting cachelib==0.12.0 (from -r requirements.txt (line 11))
Using cached cachelib-0.12.0-py3-none-any.whl (20 kB)
Requirement already satisfied: cachetools==5.3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (5.3.3)
Requirement already satisfied: certifi==2024.2.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (2024.2.2)
Requirement already satisfied: cffi==1.16.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (1.16.0)
Requirement already satisfied: charset-normalizer==3.3.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (3.3.2)
Requirement already satisfied: click==8.1.7 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (8.1.7)
Collecting coloredlogs==15.0.1 (from -r requirements.txt (line 17))
Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Requirement already satisfied: cryptography==42.0.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 18)) (42.0.5)
Collecting dashscope==1.14.1 (from -r requirements.txt (line 19))
Using cached dashscope-1.14.1-py3-none-any.whl (1.2 MB)
Collecting datasets==2.17.1 (from -r requirements.txt (line 20))
Using cached datasets-2.17.1-py3-none-any.whl (536 kB)
Collecting datrie==0.8.2 (from -r requirements.txt (line 21))
Using cached datrie-0.8.2.tar.gz (63 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting demjson==2.2.4 (from -r requirements.txt (line 22))
Using cached demjson-2.2.4.tar.gz (131 kB)
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
