
Comments (12)

aureliar8 avatar aureliar8 commented on May 21, 2024

You probably need to install the snappy lib/packages on your machine.

from grocksdb.

Akhilesh53 avatar Akhilesh53 commented on May 21, 2024

Thanks @aureliar8 I just need to mention the path separately for all the folders. It worked then

from grocksdb.

Akhilesh53 avatar Akhilesh53 commented on May 21, 2024

@aureliar8 @yihuang @kingster
I am facing one problem. I am using rocksdb for clustering of strings for similarity/dedupe purposes. When I start the clustering process, memory consumption is near zero, but as the process proceeds, memory usage increases slowly and steadily until it reaches the maximum limit, at which point the OS kills the process.
RocksDB Details.zip
You can see the logs and profiling details in this zip file. Can you suggest something to resolve this issue?

from grocksdb.

aureliar8 avatar aureliar8 commented on May 21, 2024

What's the max limit value in your case?

According to the Go profiles you sent, your pure Go program seems to make a lot of short-lived allocations (high alloc_space & low inuse_space), so this shouldn't negatively impact the memory footprint of the process.
This indicates that most of the memory footprint comes from cgo code that the Go profiler can't observe, so probably rocksdb itself.

I can see in the rocksdb logs that you use an LRU BlockCache with a capacity of 3 GB.
I can't say whether this is a correct value, but it can explain a memory increase of 3 GB after the start of the process.

from grocksdb.

yihuang avatar yihuang commented on May 21, 2024

Maybe it's lots of SST files; some amount of memory is needed for each opened SST file. You can set the max open files option.
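For reference, capping open files in grocksdb is a one-liner. A minimal sketch; the value 256 is an arbitrary illustration, not a recommendation:

```go
package main

import "github.com/linxGnu/grocksdb"

// newCappedOptions returns Options that bound the number of simultaneously
// open SST files. Each open SST file keeps table-reader metadata (index and
// filter blocks) in memory by default, so limiting open files bounds that
// part of the footprint, at the cost of re-opening files on access.
func newCappedOptions() *grocksdb.Options {
	opts := grocksdb.NewDefaultOptions()
	opts.SetMaxOpenFiles(256) // the default -1 means "keep all files open"
	return opts
}
```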

from grocksdb.

Akhilesh53 avatar Akhilesh53 commented on May 21, 2024

Thanks @aureliar8 @yihuang

  1. What's the max limit value in your case ?
    Around 62 GB of memory is free. One process can use 100% of the memory available.

  2. I am using the below-mentioned configuration to create rocksdb.

        bbto := grocksdb.NewDefaultBlockBasedTableOptions()
        // todo: check the value for LRUCache and options
        bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB
        filter := grocksdb.NewBloomFilter(10)
        bbto.SetFilterPolicy(filter)
        opts := grocksdb.NewDefaultOptions()
        opts.SetBlockBasedTableFactory(bbto)
        opts.SetCreateIfMissing(true)
        opts.EnableBlobFiles(true)
        opts.EnableBlobGC(true)
        opts.IncreaseParallelism(4)
        opts.SetMaxWriteBufferNumber(4)
        opts.SetMinWriteBufferNumberToMerge(1)
        opts.SetRecycleLogFileNum(4)
        opts.SetWriteBufferSize(134217728) // 128 MiB
        opts.SetWritableFileMaxBufferSize(0)
        opts.CompactionReadaheadSize(2097152)
        opts.SetMaxBackgroundJobs(2)
        opts.SetMaxTotalWalSize(1073741824) // 1 GiB
        opts.SetBlobCompactionReadaheadSize(2097152)
        opts.SetDbLogDir(dataDir + "/" + name)
        opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
        opts.SetStatsDumpPeriodSec(180)
        opts.EnableStatistics()
        opts.SetLevelCompactionDynamicLevelBytes(false)
        opts.SetMaxOpenFiles(5)
  3. Also, I forgot to mention that we are creating around 200 tables (100 iterations × 2 instances) with this configuration.

        for i := 0; i <= 99; i++ {
                idx := strconv.Itoa(i)
                db, err := NewRocksDB(BasePath+"/table_"+idx, "Pentagram")
                PentagramDB[i] = db

                db1, err := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
                ClusterDB[i] = db1
        }

It would be very helpful if you could recommend optimal values for these options, considering 62 GB of free memory.

  4. In the documentation, it is mentioned that
    This fork contains no defer in codebase (my side project requires as less overhead as possible). This introduces a loose convention of how/when to free c-mem, thus breaking the rule of [tecbot/gorocksdb](https://github.com/tecbot/gorocksdb).

    Is this in any way affecting memory consumption? If yes, what will be the alternative to this?

from grocksdb.

aureliar8 avatar aureliar8 commented on May 21, 2024

I find it hard to believe that this Go code creates a rocksdb instance that generates the logs you previously sent.

In the go code

bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30MiB

In the rocksdb logs

Block cache LRUCache@0x24cd180#2984523 capacity: 3.00 GB ...

If each rocksdb instance indeed has a cache of 3.00 GB, then the total memory needed by these LRU caches is 200 × 3 GB = 600 GB.

You could try rerunning your experiment with a single rocksdb instance and see where the memory usage stops. Then you'll need to get this low enough that it can be multiplied by 200.

Alternatively, you can change the architecture of your code a bit by having fewer rocksdb instances. The column family feature
might help you create disjoint "tables" in a single rocksdb instance.

Or, I think it's also possible to make these 200 rocksdb instances share their resources (caches, buffers), but you'd have to look at the documentation.
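For the shared-resources route, one common pattern is to create a single block cache and pass it to every instance's table options. A minimal sketch; the paths and sizes below are illustrative, not recommendations:

```go
package main

import "github.com/linxGnu/grocksdb"

func main() {
	// One cache object shared by all instances: total block-cache memory
	// is then bounded by this single capacity instead of growing per DB.
	sharedCache := grocksdb.NewLRUCache(3 << 30) // 3 GiB total, not per instance

	newOptions := func() *grocksdb.Options {
		bbto := grocksdb.NewDefaultBlockBasedTableOptions()
		bbto.SetBlockCache(sharedCache) // every instance reuses the same cache
		opts := grocksdb.NewDefaultOptions()
		opts.SetBlockBasedTableFactory(bbto)
		opts.SetCreateIfMissing(true)
		return opts
	}

	dbA, _ := grocksdb.OpenDb(newOptions(), "/data/table_0")
	dbB, _ := grocksdb.OpenDb(newOptions(), "/data/table_1")
	defer dbA.Close()
	defer dbB.Close()
}
```

Write buffers can be shared similarly; check the grocksdb documentation for what your version exposes.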

from grocksdb.

Akhilesh53 avatar Akhilesh53 commented on May 21, 2024

@aureliar8 Based on your comments, I changed the configuration. The logs I shared previously had different configs, as mentioned in the rocksdb log file.

Also, I am setting this in the read options:
ro := grocksdb.NewDefaultReadOptions()
ro.SetFillCache(false)

from grocksdb.

aureliar8 avatar aureliar8 commented on May 21, 2024
  4. In the documentation, it is mentioned that [...]
    Is this in any way affecting memory consumption? If yes, what will be the alternative to this?

This should have no significant impact.

from grocksdb.

Akhilesh53 avatar Akhilesh53 commented on May 21, 2024

I have experimented with 5 different approaches for a single table (one table creates two rocksdb instances) which has a number of records:

Part 1: Flush after all records are processed. Quick, but memory also increases rapidly.
Part 2: Flush after every 1000 records. Quick, but memory also increases rapidly.
Part 3: Flush after every insert and after every 1000 records. Too slow.
Part 4: Flush after every insert. Slow, and memory increases steadily.
Part 5: No manual flush. Quick, but memory also increases rapidly.

You can see the logs in the attached file.
RocksDB Details.zip

RocksDB configuration:

        bbto := grocksdb.NewDefaultBlockBasedTableOptions()
        // todo: check the value for LRUCache and options
        bbto.SetBlockCache(grocksdb.NewLRUCache(31457280)) // 30 MiB

        filter := grocksdb.NewBloomFilter(10)
        bbto.SetFilterPolicy(filter)

        opts := grocksdb.NewDefaultOptions()
        opts.SetBlockBasedTableFactory(bbto)
        opts.SetCreateIfMissing(true)
        opts.EnableBlobFiles(true)
        opts.EnableBlobGC(true)
        opts.IncreaseParallelism(4)
        opts.SetMaxWriteBufferNumber(4)
        opts.SetMinWriteBufferNumberToMerge(1)
        opts.SetRecycleLogFileNum(4)
        opts.SetWriteBufferSize(64 << 20) // 64 MiB
        opts.SetWritableFileMaxBufferSize(0)
        opts.CompactionReadaheadSize(2097152)
        opts.SetMaxBackgroundJobs(2)
        opts.SetMaxTotalWalSize(1073741824) // 1 GiB
        opts.SetBlobCompactionReadaheadSize(2097152)
        opts.SetDbLogDir(dataDir + "/" + name)
        opts.SetInfoLogLevel(grocksdb.InfoInfoLogLevel)
        opts.SetStatsDumpPeriodSec(180)
        opts.EnableStatistics()
        opts.SetLevelCompactionDynamicLevelBytes(false)
        opts.SetMaxOpenFiles(5)

from grocksdb.

yihuang avatar yihuang commented on May 21, 2024

We are also experiencing a suspected memory leak with our rocksdb-based app; we haven't investigated deeply yet, though.

from grocksdb.

linxGnu avatar linxGnu commented on May 21, 2024

@Akhilesh53

Please use Column Family instead of creating 100 instances of RocksDB like this:

for i := 0; i <= 99; i++ {
		idx := strconv.Itoa(i)
		db, err := NewRocksDB(BasePath+"/table_"+idx, "Pentagram")
		PentagramDB[i] = db

		db1, err := NewRocksDB(BasePath+"/table_"+idx, "Cluster")
		ClusterDB[i] = db1
}
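The column-family approach can be sketched as below: one DB instance with one column family per logical "table", so the block cache, write buffers, and file handles are shared instead of multiplied by 200. The names and the path are illustrative assumptions, not values from this project:

```go
package main

import (
	"log"
	"strconv"

	"github.com/linxGnu/grocksdb"
)

func main() {
	opts := grocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	opts.SetCreateIfMissingColumnFamilies(true)

	// One column family per logical table, all inside a single DB.
	cfNames := []string{"default"}
	for i := 0; i <= 99; i++ {
		cfNames = append(cfNames, "pentagram_"+strconv.Itoa(i), "cluster_"+strconv.Itoa(i))
	}
	cfOpts := make([]*grocksdb.Options, len(cfNames))
	for i := range cfOpts {
		cfOpts[i] = opts
	}

	db, handles, err := grocksdb.OpenDbColumnFamilies(opts, "/data/clusterdb", cfNames, cfOpts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Reads and writes then go through the per-family handle, e.g.:
	wo := grocksdb.NewDefaultWriteOptions()
	_ = db.PutCF(wo, handles[1], []byte("key"), []byte("value"))
}
```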

from grocksdb.
