
Comments (21)

yanliang567 commented on August 16, 2024

Checking the logs, I did not see anything suspicious. The default max segment size changed from 512MB to 1024MB; that's the only suspect I can think of. @artinshahverdian quick question: how did you observe the index memory usage?
@xiaocai2333 please help to double check.
/assign @xiaocai2333
/unassign

from milvus.

xiaocai2333 commented on August 16, 2024
[2024/06/28 02:26:52.193 +00:00] [INFO] [indexnode/indexnode_service.go:56] ["IndexNode building index ..."] [traceID=54bc5ee90b17f46a0dbf953e1779ce67] [clusterID=by-dev] [indexBuildID=450766058328658389] [collectionID=447757774238601435] [indexID=0] [indexName=] [indexFilePrefix=index_files] [indexVersion=18] [dataPaths="[]"] [typeParams="[{\"key\":\"dim\",\"value\":\"1536\"}]"] [indexParams="[{\"key\":\"M\",\"value\":\"16\"},{\"key\":\"index_type\",\"value\":\"HNSW\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"efConstruction\",\"value\":\"50\"}]"] [numRows=398898] [current_index_version=4] [storepath=] [storeversion=0] [indexstorepath=] [dim=0]
[2024/06/28 02:26:52.920 +00:00] [INFO] [indexnode/task.go:516] ["index params are ready"] [buildID=450766058328658389] ["index params"="{\"M\":\"16\",\"dim\":\"1536\",\"efConstruction\":\"50\",\"index_type\":\"HNSW\",\"metric_type\":\"L2\"}"]

According to the log, the raw vector data of the segment to be indexed is 398898*1536*4/1024/1024 = 2337.29MB. An 8GB indexnode is not sufficient for such a large segment. Please check whether you changed the segment's MaxSize configuration during the upgrade, which might have caused compaction to generate larger segments.
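
The arithmetic above can be reproduced with a short sketch. It assumes float32 vectors (4 bytes per value) and counts only the vector field; the function name is illustrative and not part of Milvus.

```python
def raw_vector_size_mb(num_rows: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size in MB of the raw vector data for num_rows vectors of the given dimension."""
    return num_rows * dim * bytes_per_value / 1024 / 1024


# numRows and dim are taken from the indexnode log lines above.
size_mb = raw_vector_size_mb(num_rows=398898, dim=1536)
print(f"{size_mb:.2f} MB")  # prints "2337.29 MB"
```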

from milvus.

yanliang567 commented on August 16, 2024

/assign @artinshahverdian
/unassign @xiaocai2333

from milvus.

artinshahverdian commented on August 16, 2024

can confirm the segment size default value is changed in 2.4.4:

segment:
    maxSize: 1024 # Maximum size of a segment in MB
    diskSegmentMaxSize: 2048 # Maximun size of a segment in MB for collection which has Disk index

these are my configs now. If I reduce these to:

segment:
    maxSize: 512 # Maximum size of a segment in MB
    diskSegmentMaxSize: 1024 # Maximun size of a segment in MB for collection which has Disk index

and trigger compaction, will I get smaller segments and can I use an 8GB machine for indexNode or the existing segments cannot change anymore?
cc: @xiaocai2333

from milvus.

xiaocai2333 commented on August 16, 2024

There is no way to reduce the segment size through compaction. The recommended approach is to scale up the indexnode memory to 10GB; for a 2.3GB segment, 10GB of memory should be sufficient for building the index.
But it is strange, your index type is HNSW, but the segment size is 2GB.
@artinshahverdian please confirm whether you have changed the segment.MaxSize or if you have ever built a DISKANN index.

from milvus.

artinshahverdian commented on August 16, 2024

@artinshahverdian I have not changed the segment size or built a disk index. Is there any way I can find the big segment and verify its size?
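
One way to look for such a segment, sketched under assumptions: given per-segment row counts (however they are obtained), estimate the raw float32 vector size of each segment and flag those above the configured `maxSize`. The helper name and the second segment in the example are hypothetical, and the estimate ignores non-vector fields, so the size in object storage may differ.

```python
from typing import Dict, List, Tuple


def oversized_segments(
    rows_by_segment: Dict[int, int], dim: int, max_size_mb: float
) -> List[Tuple[int, float]]:
    """Return (segment_id, estimated_mb) for segments whose raw vectors exceed max_size_mb."""
    flagged = []
    for segment_id, num_rows in rows_by_segment.items():
        estimated_mb = num_rows * dim * 4 / 1024 / 1024  # float32 = 4 bytes per value
        if estimated_mb > max_size_mb:
            flagged.append((segment_id, estimated_mb))
    return flagged


# The first entry uses the segment ID and row count reported in this thread;
# the second entry is a made-up small segment for contrast.
rows = {450766058328436096: 398898, 1: 100000}
for segment_id, mb in oversized_segments(rows, dim=1536, max_size_mb=1024):
    print(f"segment {segment_id}: ~{mb:.0f} MB exceeds maxSize")
```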

from milvus.

artinshahverdian commented on August 16, 2024

@xiaocai2333 do you see any downside of changing the segment size back to 512?

from milvus.

xiaocai2333 commented on August 16, 2024

@xiaocai2333 do you see any downside of changing the segment size back to 512?

Reverting the configuration poses no issues. However, once a segment has been generated, it cannot be reduced from 2GB back to 512MB.

from milvus.

xiaocai2333 commented on August 16, 2024

@artinshahverdian It would be great if you have more logs from datacoord/datanode. We can try to investigate how the large segment was generated.

from milvus.

xiaofan-luan commented on August 16, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.4
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    kafka
- SDK version(e.g. pymilvus v2.0.0rc2): N/A
- OS(Ubuntu or CentOS): 
- CPU/Memory: 4vCPU/8GB
- GPU: N/A
- Others: N/A

Current Behavior

I am running Milvus 2.4.4 in cluster mode on AWS EKS. I am seeing the indexnode crash while it is trying to build an index. I have just upgraded from 2.3.12 to 2.4.4 and have a dedicated nodegroup for the indexnode. The machine has 8GB of memory. Why would the indexnode work fine in 2.3.12 with the same memory and hit OOM after upgrading to 2.4.4? Is there anything I'm missing? Logs for the indexnode are included from start until the crash. Logs are set at info level. After upgrading to a 16GB node, the memory usage didn't go above 6GB, and it dropped and grew again multiple times. I suspect Milvus is not monitoring memory usage and doesn't kick off a garbage collection before using more memory.

My segment size and max segment size are the default and I have not overridden anything.

indexnode.log

Expected Behavior

Indexnode should work fine as it was in 2.3.12 with an 8GB machine and run garbage collection periodically.

Steps To Reproduce

No response

Milvus Log

indexnode.log

Anything else?

No response

@xiaocai2333
Can we investigate how much memory it takes to build an index for a 1G segment?
8GB should be more than enough for a 1G segment, I guess.

@artinshahverdian
what is your current index parameter?

from milvus.

xiaofan-luan commented on August 16, 2024

And why does this segment size become 2.3G with a 1G setting?
@xiaocai2333

from milvus.

artinshahverdian commented on August 16, 2024

@xiaofan-luan my index is HNSW, efConstruction: 50, M: 16. I looked at the files stored in S3 and cannot find any segment that is close to 2GB; all of them are less than 1GB.

from milvus.

xiaofan-luan commented on August 16, 2024

@artinshahverdian It would be great if you have more logs from datacoord/datanode. We can try to investigate how the large segment was generated.

Can you verify how much memory it takes to build an index for a 1G segment?

If it takes more than 4G, maybe a 1G segment size is too large as the default for 2c8g users.
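
As a rough starting point for that question, here is a back-of-the-envelope sketch of the HNSW index size. It assumes an hnswlib-style layout (each element stores its float32 vector, up to 2*M level-0 neighbour IDs plus a count, and an 8-byte label) and ignores upper-layer links. Whether Milvus's index builder matches this layout exactly is an assumption, and the build process may also hold the loaded raw data at the same time, so peak memory can be noticeably higher than this figure.

```python
def hnsw_index_size_mb(num_rows: int, dim: int, m: int, bytes_per_value: int = 4) -> float:
    """Rough lower bound on HNSW index size in MB, assuming an hnswlib-style layout."""
    vector_bytes = dim * bytes_per_value  # copy of the vector kept inside the index
    link_bytes = 2 * m * 4 + 4            # up to 2*M level-0 neighbour ids plus a count
    label_bytes = 8                       # external label per element
    return num_rows * (vector_bytes + link_bytes + label_bytes) / 1024 / 1024


# Rows that fit in a 1024MB segment of 1536-dim float32 vectors.
rows_in_1g_segment = 1024 * 1024 * 1024 // (1536 * 4)

print(f"1G segment ({rows_in_1g_segment} rows): ~{hnsw_index_size_mb(rows_in_1g_segment, 1536, 16):.0f} MB")
print(f"398898-row segment: ~{hnsw_index_size_mb(398898, 1536, 16):.0f} MB")
```

With M=16 the graph links add only about 132 bytes per row against 6144 bytes of vector data, so under these assumptions the index is dominated by the vectors themselves.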

from milvus.

artinshahverdian commented on August 16, 2024

@artinshahverdian It would be great if you have more logs from datacoord/datanode. We can try to investigate how the large segment was generated.

Can you verify how much memory it takes to build an index for a 1G segment?

If it takes more than 4G, maybe a 1G segment size is too large as the default for 2c8g users.

If the question is addressed to me: we are in the middle of a data reset and have reset most of our data, so I can't really run this experiment. But I have reduced the segment size to 512 now and will see whether we can use 8GB of RAM for the index node in the future.

from milvus.

xiaocai2333 commented on August 16, 2024

and why does this segment size becomes 2.3G with 1G setting? @xiaocai2333

From the indexnode logs, it can be seen that an index is being built for a segment with 398898 rows and a dimension of 1536.
There are no further logs to investigate how this segment was generated.

from milvus.

xiaocai2333 commented on August 16, 2024

@xiaofan-luan my index is HNSW, efConstruction: 50, M: 16. I looked at the files stored in S3 and cannot find any segment that is close to 2GB; all of them are less than 1GB.

@artinshahverdian
please verify this segment with collectionID: 447757774238601435 and segmentID: 450766058328436096

from milvus.

artinshahverdian commented on August 16, 2024

The size of that segment in S3 is ~800 MB.
Screenshot 2024-07-08 at 10 06 39 AM

from milvus.

xiaocai2333 commented on August 16, 2024

@artinshahverdian Could you provide the datacoord logs from the upgrade process? Or could you download this segment's data and send it to my email [email protected]?

from milvus.

artinshahverdian commented on August 16, 2024

Sorry, I can't share the segment since there is sensitive data in there. I have restarted the pod after the migration was done, so I don't really have the logs anymore.

from milvus.

xiaocai2333 commented on August 16, 2024

Okay, were you able to successfully build the index after migrating your cluster?

from milvus.

stale commented on August 16, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

from milvus.
