
Comments (6)

wangting0128 commented on June 1, 2024

Different scenario, same error.

argo task:lru-fouramf-k2jqn
milvus image:milvus-io-lru-dev-9234a94-20240506

deploy_config:fouramf-server-lazyload-cluster-qn1-2c8g
case_params:fouramf-client-lazyload-49m-ddl-dql

server:

lru-scene14-etcd-0                                                1/1     Running       0              25h     10.104.30.140   4am-node38   <none>           <none>
lru-scene14-etcd-1                                                1/1     Running       0              25h     10.104.24.169   4am-node29   <none>           <none>
lru-scene14-etcd-2                                                1/1     Running       0              25h     10.104.26.102   4am-node32   <none>           <none>
lru-scene14-milvus-datacoord-6c48c5dbc5-vdpml                     1/1     Running       0              25h     10.104.15.19    4am-node20   <none>           <none>
lru-scene14-milvus-datanode-66968dfb74-77np9                      1/1     Running       1 (25h ago)    25h     10.104.13.33    4am-node16   <none>           <none>
lru-scene14-milvus-indexcoord-65b574dcdb-mxlrx                    1/1     Running       0              25h     10.104.5.213    4am-node12   <none>           <none>
lru-scene14-milvus-indexnode-7b9d46c5bf-5pp5v                     1/1     Running       0              25h     10.104.9.41     4am-node14   <none>           <none>
lru-scene14-milvus-indexnode-7b9d46c5bf-xcjsp                     1/1     Running       0              25h     10.104.34.128   4am-node37   <none>           <none>
lru-scene14-milvus-proxy-64c99dcb44-v288s                         1/1     Running       1 (25h ago)    25h     10.104.30.133   4am-node38   <none>           <none>
lru-scene14-milvus-querycoord-64bbb99c85-z8qwq                    1/1     Running       1 (25h ago)    25h     10.104.4.17     4am-node11   <none>           <none>
lru-scene14-milvus-querynode-5bfd875d8-g58vs                      1/1     Running       0              25h     10.104.15.20    4am-node20   <none>           <none>
lru-scene14-milvus-rootcoord-65cff5755-w69dj                      1/1     Running       0              25h     10.104.6.38     4am-node13   <none>           <none>
lru-scene14-minio-0                                               1/1     Running       0              25h     10.104.15.22    4am-node20   <none>           <none>
lru-scene14-minio-1                                               1/1     Running       0              25h     10.104.26.99    4am-node32   <none>           <none>
lru-scene14-minio-2                                               1/1     Running       0              25h     10.104.30.139   4am-node38   <none>           <none>
lru-scene14-minio-3                                               1/1     Running       0              25h     10.104.24.170   4am-node29   <none>           <none>
lru-scene14-pulsar-bookie-0                                       1/1     Running       0              25h     10.104.21.198   4am-node24   <none>           <none>
lru-scene14-pulsar-bookie-1                                       1/1     Running       0              25h     10.104.17.66    4am-node23   <none>           <none>
lru-scene14-pulsar-bookie-2                                       1/1     Running       0              25h     10.104.30.144   4am-node38   <none>           <none>
lru-scene14-pulsar-bookie-init-sr9x8                              0/1     Completed     0              25h     10.104.9.40     4am-node14   <none>           <none>
lru-scene14-pulsar-broker-0                                       1/1     Running       0              25h     10.104.33.104   4am-node36   <none>           <none>
lru-scene14-pulsar-proxy-0                                        1/1     Running       0              25h     10.104.5.215    4am-node12   <none>           <none>
lru-scene14-pulsar-pulsar-init-fv5sj                              0/1     Completed     0              25h     10.104.5.214    4am-node12   <none>           <none>
lru-scene14-pulsar-recovery-0                                     1/1     Running       0              25h     10.104.6.39     4am-node13   <none>           <none>
lru-scene14-pulsar-zookeeper-0                                    1/1     Running       0              25h     10.104.30.141   4am-node38   <none>           <none>
lru-scene14-pulsar-zookeeper-1                                    1/1     Running       0              25h     10.104.26.104   4am-node32   <none>           <none>
lru-scene14-pulsar-zookeeper-2                                    1/1     Running       0              25h     10.104.24.172   4am-node29   <none>           <none>

client pod name: lru-fouramf-k2jqn-1135445338
client log:
Screenshot 2024-05-07 16 10 10

test result:

[2024-05-07 01:22:19,801 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-05-07 01:22:19,802 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-05-07 01:22:19,802 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 01:22:19,802 -  INFO - fouram]: grpc     query                                                                            590 590(100.00%) | 450499   10562 1051505 420000 |    0.01        0.01 (stats.py:789)
[2024-05-07 01:22:19,802 -  INFO - fouram]: grpc     scene_search_test                                                                587    43(7.33%) |1449912  296138 7748629 957000 |    0.01        0.00 (stats.py:789)
[2024-05-07 01:22:19,802 -  INFO - fouram]: grpc     search                                                                           575 575(100.00%) | 299558   21955  813003 283000 |    0.01        0.01 (stats.py:789)
[2024-05-07 01:22:19,802 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 01:22:19,803 -  INFO - fouram]:          Aggregated                                                                      1752 1208(68.95%) | 735810   10562 7748629 462000 |    0.04        0.03 (stats.py:789)
[2024-05-07 01:22:19,803 -  INFO - fouram]:  (stats.py:790)
[2024-05-07 01:22:19,805 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_8c16m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '2',
                                                              'memory': '8Gi',
                                                              'ephemeral-storage': '70Gi'},
                                                   'requests': {'cpu': '2',
                                                                'memory': '8Gi'}},
                                     'replicas': 1,
                                     'extraEnv': [{'name': 'LOCAL_STORAGE_SIZE',
                                                   'value': '70'}]},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '8Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '5Gi'}},
                                     'replicas': 2},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '8Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '8Gi'}},
                                    'replicas': 1},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}},
                                 'persistence': {'size': '320Gi'}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'extraConfigFiles': {'user.yaml': 'queryNode:\n'
                                                         '  '
                                                         'diskCacheCapacityLimit: '
                                                         '51539607552\n'
                                                         '  mmap:\n'
                                                         '    mmapEnabled: '
                                                         'true\n'
                                                         '  lazyloadEnabled: '
                                                         'true\n'
                                                         '  '
                                                         'useStreamComputing: '
                                                         'true\n'
                                                         '  cache:\n'
                                                         '    warmup: sync\n'
                                                         '  '
                                                         'lazyloadWaitTimeout: '
                                                         '300000\n'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'milvus-io-lru-dev-9234a94-20240506'}}},
            'host': 'lru-scene14-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '49m',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'num_partitions': 64},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 30,
                                                       'during_time': '12h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 10,
                                                                  'search_param': {'ef': 64},
                                                                  'expr': 'int64_1 '
                                                                          '>= '
                                                                          '1',
                                                                  'timeout': 3000,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'expr': 'int64_1 '
                                                                          '>  '
                                                                          '20000',
                                                                  'timeout': 3000,
                                                                  'offset': 0,
                                                                  'limit': 20,
                                                                  'random_data': True,
                                                                  'random_count': 10,
                                                                  'random_range': [1000,
                                                                                   10000]}},
                                                      {'type': 'scene_search_test',
                                                       'weight': 1,
                                                       'params': {'index_type': 'HNSW',
                                                                  'index_param': {'M': 30,
                                                                                  'efConstruction': 360},
                                                                  'search_param': {'ef': 64}}}]},
            'run_id': 2024050687861433,
            'datetime': '2024-05-06 06:59:46.105290',
            'client_version': '2.2'},
 'result': {'test_result': {'index': {'RT': 6850.5675},
                            'insert': {'total_time': 14834.9288,
                                       'VPS': 3303.0155,
                                       'batch_time': 3.0275,
                                       'batch': 10000},
                            'flush': {'RT': 7.6264},
                            'load': {'RT': 5.571},
                            'Locust': {'Aggregated': {'Requests': 1752,
                                                      'Fails': 1208,
                                                      'RPS': 0.04,
                                                      'fail_s': 0.69,
                                                      'RT_max': 7748629.49,
                                                      'RT_avg': 735810.14,
                                                      'TP50': 462000.0,
                                                      'TP99': 7221000.0},
                                       'query': {'Requests': 590,
                                                 'Fails': 590,
                                                 'RPS': 0.01,
                                                 'fail_s': 1.0,
                                                 'RT_max': 1051505.09,
                                                 'RT_avg': 450499.33,
                                                 'TP50': 420000.0,
                                                 'TP99': 1031000.0},
                                       'scene_search_test': {'Requests': 587,
                                                             'Fails': 43,
                                                             'RPS': 0.01,
                                                             'fail_s': 0.07,
                                                             'RT_max': 7748629.49,
                                                             'RT_avg': 1449912.53,
                                                             'TP50': 957000.0,
                                                             'TP99': 7747000.0},
                                       'search': {'Requests': 575,
                                                  'Fails': 575,
                                                  'RPS': 0.01,
                                                  'fail_s': 1.0,
                                                  'RT_max': 813003.68,
                                                  'RT_avg': 299558.43,
                                                  'TP50': 283000.0,
                                                  'TP99': 765000.0}}}}} 
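
The `user.yaml` override in the report above is printed as adjacent Python string fragments, which makes the effective queryNode settings hard to read. A minimal sketch that joins the fragments exactly as they appear in the report and prints the resulting YAML (the variable name `user_yaml` is only for illustration):

```python
# Fragments copied verbatim from the 'extraConfigFiles' entry of the report.
# Python concatenates adjacent string literals, so this rebuilds the file content.
user_yaml = (
    'queryNode:\n'
    '  '
    'diskCacheCapacityLimit: '
    '51539607552\n'
    '  mmap:\n'
    '    mmapEnabled: '
    'true\n'
    '  lazyloadEnabled: '
    'true\n'
    '  '
    'useStreamComputing: '
    'true\n'
    '  cache:\n'
    '    warmup: sync\n'
    '  '
    'lazyloadWaitTimeout: '
    '300000\n'
)

print(user_yaml)
# queryNode:
#   diskCacheCapacityLimit: 51539607552
#   mmap:
#     mmapEnabled: true
#   lazyloadEnabled: true
#   useStreamComputing: true
#   cache:
#     warmup: sync
#   lazyloadWaitTimeout: 300000
```

The same override is used in both runs reported in this thread (cluster and standalone).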

test steps:

1. create a collection with fields: id(primary key), float_vector(768dim), int64_1(partitionKey=64)
2. build HNSW index
3. insert 49m data
4. flush collection
5. build index with the same param again
6. load collection
7. concurrent requests: <- raises error
   - scene_search_test (collection: create->insert->flush->index->load->search->drop)
   - search
   - query  
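
For context on why lazy loading is under pressure in this case, here is a rough sizing sketch. It assumes that "49m" means 49 million vectors and that each vector is stored as 768 float32 values; index overhead (HNSW graph links) and scalar fields are ignored, so the real footprint is likely larger. The numbers for the cache limit and memory limit come from the report above.

```python
# Back-of-the-envelope sizing; NUM_VECTORS is an assumption ("49m" = 49 million).
NUM_VECTORS = 49_000_000
DIM = 768                      # 'dim' in dataset_params
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3

raw_vector_bytes = NUM_VECTORS * DIM * BYTES_PER_FLOAT32

# Values taken from the report: queryNode.diskCacheCapacityLimit and the
# queryNode memory limit ('8Gi').
disk_cache_limit_bytes = 51539607552
query_node_memory_bytes = 8 * GIB

raw_gib = raw_vector_bytes / GIB
cache_gib = disk_cache_limit_bytes / GIB

print(f"raw vectors:      {raw_gib:.1f} GiB")
print(f"disk cache limit: {cache_gib:.0f} GiB")
print(f"memory limit:     {query_node_memory_bytes / GIB:.0f} GiB")
print("fits in cache + memory:",
      raw_vector_bytes <= disk_cache_limit_bytes + query_node_memory_bytes)
```

Under these assumptions the raw vectors alone are roughly 140 GiB against a 48 GiB disk cache and 8 GiB of memory, so segments have to be evicted and reloaded during the concurrent search and query phase.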

from milvus.

wangting0128 commented on June 1, 2024

Different scenario, same error.

argo task:lru-fouramf-bgswl
milvus image:milvus-io-lru-dev-9234a94-20240506

deploy_config:fouramf-server-global-lazyload-standalone-2c8g
case_params:fouramf-client-lazyload-49m-load-search-release

server:

NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-scene18-etcd-0                                                1/1     Running       0               43h     10.104.34.125   4am-node37   <none>           <none>
lru-scene18-milvus-standalone-7fd4c8c946-ntmg2                    1/1     Running       0               43h     10.104.19.239   4am-node28   <none>           <none>
lru-scene18-minio-59cfbd7588-7fzrd                                1/1     Running       0               43h     10.104.15.16    4am-node20   <none>           <none>
lru-scene18-pulsar-bookie-0                                       1/1     Running       0               43h     10.104.34.127   4am-node37   <none>           <none>
lru-scene18-pulsar-bookie-1                                       1/1     Running       0               43h     10.104.17.59    4am-node23   <none>           <none>
lru-scene18-pulsar-bookie-2                                       1/1     Running       0               43h     10.104.24.165   4am-node29   <none>           <none>
lru-scene18-pulsar-bookie-init-jr58p                              0/1     Completed     0               43h     10.104.5.202    4am-node12   <none>           <none>
lru-scene18-pulsar-broker-0                                       1/1     Running       0               43h     10.104.6.31     4am-node13   <none>           <none>
lru-scene18-pulsar-proxy-0                                        1/1     Running       0               43h     10.104.5.203    4am-node12   <none>           <none>
lru-scene18-pulsar-pulsar-init-4b8nd                              0/1     Completed     0               43h     10.104.5.199    4am-node12   <none>           <none>
lru-scene18-pulsar-recovery-0                                     1/1     Running       0               43h     10.104.5.200    4am-node12   <none>           <none>
lru-scene18-pulsar-zookeeper-0                                    1/1     Running       0               43h     10.104.33.102   4am-node36   <none>           <none>
lru-scene18-pulsar-zookeeper-1                                    1/1     Running       0               43h     10.104.21.192   4am-node24   <none>           <none>
lru-scene18-pulsar-zookeeper-2                                    1/1     Running       0               43h     10.104.30.129   4am-node38   <none>           <none>

client pod name: lru-fouramf-bgswl-1500130857
client log:
Screenshot 2024-05-08 10 34 07

test result:

[2024-05-07 10:58:35,848 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: grpc     load_search_release                                                              123 123(100.00%) |  87373   29793  119500  96000 |    0.01        0.01 (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 10:58:35,849 -  INFO - fouram]:          Aggregated                                                                       123 123(100.00%) |  87373   29793  119500  96000 |    0.01        0.01 (stats.py:789)
[2024-05-07 10:58:35,849 -  INFO - fouram]:  (stats.py:790)
[2024-05-07 10:58:35,851 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '2',
                                                               'memory': '8Gi',
                                                               'ephemeral-storage': '70Gi'},
                                                    'requests': {'cpu': '2',
                                                                 'memory': '8Gi'}},
                                      'messageQueue': 'pulsar',
                                      'extraEnv': [{'name': 'LOCAL_STORAGE_SIZE',
                                                    'value': '70'}],
                                      'disk': {'size': {'enabled': True}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}},
                                 'persistence': {'size': '320Gi'}},
                       'pulsar': {'enabled': True},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'extraConfigFiles': {'user.yaml': 'queryNode:\n'
                                                         '  '
                                                         'diskCacheCapacityLimit: '
                                                         '51539607552\n'
                                                         '  mmap:\n'
                                                         '    mmapEnabled: '
                                                         'true\n'
                                                         '  lazyloadEnabled: '
                                                         'true\n'
                                                         '  '
                                                         'useStreamComputing: '
                                                         'true\n'
                                                         '  cache:\n'
                                                         '    warmup: sync\n'
                                                         '  '
                                                         'lazyloadWaitTimeout: '
                                                         '300000\n'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'milvus-io-lru-dev-9234a94-20240506'}}},
            'host': 'lru-scene18-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'extra_partitions': {'partitions': 49,
                                                                         'data_repeated': False},
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '49m',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'collection_name': 'fouram_49m'},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 1,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'load_search_release',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 10,
                                                                  'search_param': {'ef': 64},
                                                                  'timeout': 3000,
                                                                  'random_data': True}}]},
            'run_id': 2024050685376027,
            'datetime': '2024-05-06 06:55:37.675353',
            'client_version': '2.2'},
 'result': {'test_result': {'index': {'RT': 75245.953},
                            'insert': {'total_time': 13795.7049,
                                       'VPS': 3556.3144,
                                       'batch_time': 2.8154,
                                       'batch': 10000.0},
                            'flush': {'RT': 2.7891},
                            'load': {'RT': 13.2519},
                            'Locust': {'Aggregated': {'Requests': 123,
                                                      'Fails': 123,
                                                      'RPS': 0.01,
                                                      'fail_s': 1.0,
                                                      'RT_max': 119500.21,
                                                      'RT_avg': 87373.07,
                                                      'TP50': 96000.0,
                                                      'TP99': 119000.0},
                                       'load_search_release': {'Requests': 123,
                                                               'Fails': 123,
                                                               'RPS': 0.01,
                                                               'fail_s': 1.0,
                                                               'RT_max': 119500.21,
                                                               'RT_avg': 87373.07,
                                                               'TP50': 96000.0,
                                                               'TP99': 119000.0}}}}}

test steps:

1. create a collection with fields: id(primary key), float_vector(768dim), int64_1 
2. build HNSW index
3. insert 49m data to 49 partitions
4. flush collection
5. build index with the same param again
6. load collection
7. serial exec load_search_release <- raises error


chyezh commented on June 1, 2024

In lru-verify-32453, a handoff is triggered at 10:45:37.121, and the next search operation reports insufficient resources to load segments.

[2024/05/06 10:45:37.121 +00:00] [INFO] [task/executor.go:217] ["load segments..."] [taskID=1714976567165] [collectionID=449570775331504258] [replicaID=449570817077936129] [segmentID=449570775331709638] [node=1] [source=segment_checker] [shardLeader=1]

load segment failed, OOM if load, maxSegmentSize = 2903.265687942505 MB,  memUsage = 4674.25 MB, predictMemUsage = 7577.515687942505 MB, totalMem = 8192 MB thresholdFactor = 0.900000
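
The numbers in this error line can be checked directly. The sketch below parses the fields from the message and assumes, based only on the field names, that the load is rejected when `memUsage + maxSegmentSize` exceeds `totalMem * thresholdFactor`; the helper `parse_fields` is hypothetical and not part of Milvus.

```python
import re

# Error line copied from the log above.
LOG = (
    "load segment failed, OOM if load, maxSegmentSize = 2903.265687942505 MB,  "
    "memUsage = 4674.25 MB, predictMemUsage = 7577.515687942505 MB, "
    "totalMem = 8192 MB thresholdFactor = 0.900000"
)


def parse_fields(line):
    """Extract every 'name = number' pair from the log line as floats."""
    return {k: float(v) for k, v in re.findall(r"(\w+) = ([0-9.]+)", line)}


fields = parse_fields(LOG)

# Assumed rule: reject the load if predicted usage exceeds the thresholded total.
limit = fields["totalMem"] * fields["thresholdFactor"]
predicted = fields["memUsage"] + fields["maxSegmentSize"]
would_oom = predicted > limit
headroom = limit - fields["memUsage"]

print(f"limit     = {limit:.2f} MB")
print(f"predicted = {predicted:.2f} MB")
print(f"headroom  = {headroom:.2f} MB")
print("would OOM:", would_oom)
```

With these values the limit is about 7372.8 MB while the predicted usage is about 7577.5 MB, so the load exceeds the limit by roughly 205 MB. Only about 2698.5 MB of headroom is left, less than the 2903.3 MB segment.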

After the compacted-from segments 449570775336766134 and 449570775331708928 are released:

[2024/05/06 10:45:48.120 +00:00] [INFO] [task/executor.go:283] ["release segment..."] [taskID=1714976567166] [collectionID=449570775331504258] [replicaID=449570817077936129] [segmentID=449570775331709141] [node=1] [source=segment_checker] [shardLeader=1]
[2024/05/06 10:45:48.120 +00:00] [INFO] [task/executor.go:283] ["release segment..."] [taskID=1714976567167] [collectionID=449570775331504258] [replicaID=449570817077936129] [segmentID=449570775336766135] [node=1] [source=segment_checker] [shardLeader=1]

The query operation then recovers.
We can optimize this in the future with mmap for growing segments.


wangting0128 commented on June 1, 2024

Okay, I'll rerun this case once the growing mmap segment feature is ready.


sunby commented on June 1, 2024

Different scenario, same error.

argo task:lru-fouramf-bgswl milvus image:milvus-io-lru-dev-9234a94-20240506

deploy_config:fouramf-server-global-lazyload-standalone-2c8g case_params:fouramf-client-lazyload-49m-load-search-release

server:

NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-scene18-etcd-0                                                1/1     Running       0               43h     10.104.34.125   4am-node37   <none>           <none>
lru-scene18-milvus-standalone-7fd4c8c946-ntmg2                    1/1     Running       0               43h     10.104.19.239   4am-node28   <none>           <none>
lru-scene18-minio-59cfbd7588-7fzrd                                1/1     Running       0               43h     10.104.15.16    4am-node20   <none>           <none>
lru-scene18-pulsar-bookie-0                                       1/1     Running       0               43h     10.104.34.127   4am-node37   <none>           <none>
lru-scene18-pulsar-bookie-1                                       1/1     Running       0               43h     10.104.17.59    4am-node23   <none>           <none>
lru-scene18-pulsar-bookie-2                                       1/1     Running       0               43h     10.104.24.165   4am-node29   <none>           <none>
lru-scene18-pulsar-bookie-init-jr58p                              0/1     Completed     0               43h     10.104.5.202    4am-node12   <none>           <none>
lru-scene18-pulsar-broker-0                                       1/1     Running       0               43h     10.104.6.31     4am-node13   <none>           <none>
lru-scene18-pulsar-proxy-0                                        1/1     Running       0               43h     10.104.5.203    4am-node12   <none>           <none>
lru-scene18-pulsar-pulsar-init-4b8nd                              0/1     Completed     0               43h     10.104.5.199    4am-node12   <none>           <none>
lru-scene18-pulsar-recovery-0                                     1/1     Running       0               43h     10.104.5.200    4am-node12   <none>           <none>
lru-scene18-pulsar-zookeeper-0                                    1/1     Running       0               43h     10.104.33.102   4am-node36   <none>           <none>
lru-scene18-pulsar-zookeeper-1                                    1/1     Running       0               43h     10.104.21.192   4am-node24   <none>           <none>
lru-scene18-pulsar-zookeeper-2                                    1/1     Running       0               43h     10.104.30.129   4am-node38   <none>           <none>

client pod name: lru-fouramf-bgswl-1500130857
client log: screenshot 2024-05-08 10 34 07

test result:

[2024-05-07 10:58:35,848 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: grpc     load_search_release                                                              123 123(100.00%) |  87373   29793  119500  96000 |    0.01        0.01 (stats.py:789)
[2024-05-07 10:58:35,848 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-07 10:58:35,849 -  INFO - fouram]:          Aggregated                                                                       123 123(100.00%) |  87373   29793  119500  96000 |    0.01        0.01 (stats.py:789)
[2024-05-07 10:58:35,849 -  INFO - fouram]:  (stats.py:790)
[2024-05-07 10:58:35,851 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '2',
                                                               'memory': '8Gi',
                                                               'ephemeral-storage': '70Gi'},
                                                    'requests': {'cpu': '2',
                                                                 'memory': '8Gi'}},
                                      'messageQueue': 'pulsar',
                                      'extraEnv': [{'name': 'LOCAL_STORAGE_SIZE',
                                                    'value': '70'}],
                                      'disk': {'size': {'enabled': True}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}},
                                 'persistence': {'size': '320Gi'}},
                       'pulsar': {'enabled': True},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'extraConfigFiles': {'user.yaml': 'queryNode:\n'
                                                         '  '
                                                         'diskCacheCapacityLimit: '
                                                         '51539607552\n'
                                                         '  mmap:\n'
                                                         '    mmapEnabled: '
                                                         'true\n'
                                                         '  lazyloadEnabled: '
                                                         'true\n'
                                                         '  '
                                                         'useStreamComputing: '
                                                         'true\n'
                                                         '  cache:\n'
                                                         '    warmup: sync\n'
                                                         '  '
                                                         'lazyloadWaitTimeout: '
                                                         '300000\n'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'milvus-io-lru-dev-9234a94-20240506'}}},
            'host': 'lru-scene18-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'extra_partitions': {'partitions': 49,
                                                                         'data_repeated': False},
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '49m',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'collection_name': 'fouram_49m'},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 1,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'load_search_release',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 10,
                                                                  'search_param': {'ef': 64},
                                                                  'timeout': 3000,
                                                                  'random_data': True}}]},
            'run_id': 2024050685376027,
            'datetime': '2024-05-06 06:55:37.675353',
            'client_version': '2.2'},
 'result': {'test_result': {'index': {'RT': 75245.953},
                            'insert': {'total_time': 13795.7049,
                                       'VPS': 3556.3144,
                                       'batch_time': 2.8154,
                                       'batch': 10000.0},
                            'flush': {'RT': 2.7891},
                            'load': {'RT': 13.2519},
                            'Locust': {'Aggregated': {'Requests': 123,
                                                      'Fails': 123,
                                                      'RPS': 0.01,
                                                      'fail_s': 1.0,
                                                      'RT_max': 119500.21,
                                                      'RT_avg': 87373.07,
                                                      'TP50': 96000.0,
                                                      'TP99': 119000.0},
                                       'load_search_release': {'Requests': 123,
                                                               'Fails': 123,
                                                               'RPS': 0.01,
                                                               'fail_s': 1.0,
                                                               'RT_max': 119500.21,
                                                               'RT_avg': 87373.07,
                                                               'TP50': 96000.0,
                                                               'TP99': 119000.0}}}}}
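
The `user.yaml` override in the report above is printed as adjacent string fragments, which makes the nesting hard to read. The sketch below joins the same fragments (copied verbatim from `extraConfigFiles`) and prints the resulting YAML. The GiB conversion assumes `diskCacheCapacityLimit` is expressed in bytes, which the report does not state.

```python
# Fragments copied verbatim from 'extraConfigFiles' -> 'user.yaml' in the report.
# Python concatenates adjacent string literals, so this is the text as deployed.
user_yaml = (
    'queryNode:\n'
    '  '
    'diskCacheCapacityLimit: '
    '51539607552\n'
    '  mmap:\n'
    '    mmapEnabled: '
    'true\n'
    '  lazyloadEnabled: '
    'true\n'
    '  '
    'useStreamComputing: '
    'true\n'
    '  cache:\n'
    '    warmup: sync\n'
    '  '
    'lazyloadWaitTimeout: '
    '300000\n'
)
print(user_yaml)

# Assumption: the cache limit is given in bytes.
disk_cache_limit_bytes = 51539607552
disk_cache_limit_gib = disk_cache_limit_bytes / 1024**3
print(f"diskCacheCapacityLimit = {disk_cache_limit_gib:.0f} GiB (if the unit is bytes)")
```

Read this way, the query node runs with mmap and lazy load enabled, synchronous cache warmup, and a disk cache limit that fits inside the 70Gi ephemeral-storage limit of the pod.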

test steps:

1. Create a collection with fields: id (primary key), float_vector (768 dim), int64_1
2. Build an HNSW index
3. Insert 49m rows into 49 partitions
4. Flush the collection
5. Build the index again with the same parameters
6. Load the collection
7. Run load_search_release serially <- raises error

Concurrent loading of multiple segments leads to insufficient memory resources.

wangting0128 avatar wangting0128 commented on June 1, 2024

Verification passed for insert and DQL.

image: master-20240513-9e3f3d99
argo task: lru-fouramf-q2hm8-0513
