xixihahaliu commented on September 27, 2024

It looks like the Milvus container is failing to start. You can try the following docker-compose.yaml, which is the official Milvus startup file. If this also fails, then Milvus itself cannot start properly on your machine.

```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.3
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

networks:
  default:
    name: milvus
```
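To confirm whether Milvus actually came up, you can poll the same endpoint the compose healthcheck uses. This is a minimal sketch in Python, assuming the stack is running and port 9091 is published on localhost as in the `standalone` service; the function name `is_healthy` is just illustrative.

```python
import urllib.request

# Assumption: the compose stack is up and publishes Milvus's health port 9091
# on localhost (see the "ports" list of the standalone service).
MILVUS_HEALTH_URL = "http://localhost:9091/healthz"


def is_healthy(url: str = MILVUS_HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so connection refused,
        # timeouts and non-2xx responses all end up here as "not healthy".
        return False


if __name__ == "__main__":
    print("Milvus healthy:", is_healthy())
```

If this keeps returning False while `docker ps` shows the container as running, `docker logs milvus-standalone` is the next place to look.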

from qanything.

qmhl1 commented on September 27, 2024

Bug? Here is /model_repos/QAEnsemble/QAEnsemble.log:

```
I0105 12:26:15.434683 85 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1416000000' with size 268435456
I0105 12:26:15.435029 85 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0105 12:26:15.439093 85 model_lifecycle.cc:462] loading: rerank:1
I0105 12:26:15.439135 85 model_lifecycle.cc:462] loading: embed:1
I0105 12:26:15.439166 85 model_lifecycle.cc:462] loading: base:1
I0105 12:26:15.442887 85 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime
I0105 12:26:15.442939 85 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12
I0105 12:26:15.442967 85 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12
I0105 12:26:15.442993 85 onnxruntime.cc:2550] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0105 12:26:15.477325 85 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: embed (version 1)
I0105 12:26:15.477402 85 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: rerank (version 1)
I0105 12:26:15.477980 85 onnxruntime.cc:666] skipping model configuration auto-complete for 'embed': inputs and outputs already specified
I0105 12:26:15.478452 85 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: embed (GPU device 0)
I0105 12:26:15.478770 85 onnxruntime.cc:666] skipping model configuration auto-complete for 'rerank': inputs and outputs already specified
I0105 12:26:15.480427 85 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: rerank (GPU device 0)
[1704457575.790988] [739a2c81d313:85   :0]       **ib_device.c:1173 UCX  ERROR   ibv_create_ah(dlid=49152 sl=0 port=1 src_path_bits=0 dgid=fe80::f652:14ff:feda:687e sgid_index=0 traffic_class=0) for UD verbs connect on mlx4_0 failed: Cannot allocate memory**
[739a2c81d313:00085] pml_ucx.c:424  Error: ucp_ep_create(proc=0) failed: Address not valid
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[739a2c81d313:00085] *** An error occurred in MPI_Init_thread
[739a2c81d313:00085] *** reported by process [682491905,0]
[739a2c81d313:00085] *** on a NULL communicator
[739a2c81d313:00085] *** Unknown error
[739a2c81d313:00085] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[739a2c81d313:00085] ***    and potentially your MPI job)
```

successren commented on September 27, 2024

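I ran into this too, but it doesn't seem to be the important part; the bug seems to be somewhere else. In my case tritonserver failed to start: I brewed coffee for a whole day and never saw it finish. After some digging it looked like a memory problem, and adding a few things to docker-compose.yaml seemed to fix it.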

How much memory does your machine have?
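If you're on Linux, here is a quick sketch for answering that: it reads `MemTotal` from `/proc/meminfo` (reported in "kB", which is actually KiB) and converts it to GiB. `free -h` shows the same information; the helper name `mem_total_gib` is illustrative.

```python
from pathlib import Path


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo contents, converted from KiB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like "MemTotal:       32768000 kB" ("kB" here means KiB).
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal line not found")


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # only exists on Linux
    if meminfo.exists():
        print(f"Total RAM: {mem_total_gib(meminfo.read_text()):.1f} GiB")
    else:
        print("/proc/meminfo is not available on this OS")
```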

qmhl1 commented on September 27, 2024

After deleting the models folder, I can start it. Did something go wrong when I decompressed the models?

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:65:00.0 Off |                  N/A |
| 22%   26C    P8              15W / 250W |     15MiB / 22528MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1174      G   /usr/lib/xorg/Xorg                            9MiB |
|    0   N/A  N/A      1418      G   /usr/bin/gnome-shell                          4MiB |
+---------------------------------------------------------------------------------------+
```
~/QAnything/models/...
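One way to check the decompression, assuming the models came from a `.zip` archive (the thread doesn't say which format was used), is Python's `zipfile.ZipFile.testzip()`, which re-reads every member and verifies its CRC. A minimal sketch; pass the archive path as the first argument (the `check_archive` helper is hypothetical):

```python
import sys
import zipfile
from typing import Optional


def check_archive(path: str) -> Optional[str]:
    """Return the name of the first corrupt member, or None if every CRC matches.

    zipfile.BadZipFile is raised if the file is not a zip at all, which is
    also what a truncated download typically looks like.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_archive.py <path-to-archive.zip>")
    else:
        bad = check_archive(sys.argv[1])
        print("archive OK" if bad is None else f"corrupt member: {bad}")
```

If the archive checks out, re-extracting into an empty folder rules out a half-finished earlier extraction.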

successren commented on September 27, 2024

😂 So, have you got it running now?

qmhl1 commented on September 27, 2024

Could the problem be in /opt/tritonserver/backends/qa_ensemble? That might explain why the base model fails to load.
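To see which model Triton actually failed to load, you can ask the server directly through its HTTP readiness endpoint (`/v2/models/<name>/versions/<v>/ready`). This sketch assumes Triton's default HTTP port 8000 is reachable from where you run it (QAnything may map it to a different port) and uses the model names and version `1` printed in QAEnsemble.log; adjust both as needed. The helper names are illustrative.

```python
import urllib.request
from typing import Optional

# Assumptions: Triton's HTTP endpoint is on localhost:8000 (Triton's default),
# and these are the model names/version shown in QAEnsemble.log.
TRITON_URL = "http://localhost:8000"
MODELS = ["rerank", "embed", "base"]


def model_ready_url(base: str, model: str, version: Optional[str] = None) -> str:
    """Build a readiness URL: <base>/v2/models/<model>[/versions/<v>]/ready."""
    path = f"/v2/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    return base.rstrip("/") + path + "/ready"


def is_ready(url: str, timeout: float = 5.0) -> bool:
    """Triton answers 200 for a ready model; anything else counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError (e.g. 400 for an unloaded model) land here
        return False


if __name__ == "__main__":
    for name in MODELS:
        ok = is_ready(model_ready_url(TRITON_URL, name, "1"))
        print(f"{name}: {'ready' if ok else 'NOT ready'}")
```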

xixihahaliu commented on September 27, 2024

> | 22% 26C P8 15W / 250W | 15MiB / 22528MiB | 0% Default |

Why does your 2080 Ti have 22 GB of VRAM? A stock RTX 2080 Ti has 11 GB, so is this an unofficial (modified) card? That could cause unexpected errors.
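To double-check what the driver reports for the card, `nvidia-smi` can print just the name and total memory as CSV (`--query-gpu=name,memory.total --format=csv,noheader,nounits`). A small sketch that runs that query and parses the result; the helper names are illustrative.

```python
import shutil
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> List[Tuple[str, int]]:
    """Parse lines like 'NVIDIA GeForce RTX 2080 Ti, 22528' into (name, MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the name can't break parsing.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem)))
    return gpus


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for name, mib in parse_gpus(out):
            print(f"{name}: {mib} MiB")
```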
