Comments (16)

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024 1

@guoshuai2016 , @lukatera , @MetalRex101 , the memory leak is not fixed yet, but we fixed #239 in the 0.9.1 release, so monitoring is no longer lost when the metrics exporter pod is restarted. The memory leak itself is the next priority.

from clickhouse-operator.

guoshuai2016 avatar guoshuai2016 commented on June 15, 2024 1

@alex-zaitsev After profiling the heap and goroutines with pprof, we found that the memory leak is caused by a goroutine leak, specifically by too many open sql.DB instances: a new one is created for each query and never closed. Every open sql.DB holds two goroutines, like:

database/sql.(*DB).connectionResetter(0xc0000b66c0, 0x135d160, 0xc0000b2f80)
        /usr/local/go/src/database/sql/sql.go:1013 +0xfb
created by database/sql.OpenDB
        /usr/local/go/src/database/sql/sql.go:671 +0x194

goroutine 47 [select, 3528 minutes]:
database/sql.(*DB).connectionOpener(0xc0000b6600, 0x135d160, 0xc0000b2600)
        /usr/local/go/src/database/sql/sql.go:1000 +0xe8
created by database/sql.OpenDB
        /usr/local/go/src/database/sql/sql.go:670 +0x15e

Thus the quick fix is to close the sql.DB after each query; however, this is not recommended, as the database/sql documentation notes:

// and maintains its own pool of idle connections. Thus, the Open
// function should be called just once. It is rarely necessary to
// close a DB.
func Open(driverName, dataSourceName string) (*DB, error) {

So I keep the sql.DB in a global map, which works as a singleton. [#270]

Apart from the leak, we also found that clickHouseQueryScanRows may return partial data. After investigation, we found this happens because sql.Rows is closed when the deadline context is cancelled after the query but before the scan. So we changed the code to cancel the deadline context only after the scan. [#270]

These changes have been tested in our staging and production environments, and the memory usage of the metrics exporter is much more stable now.
Please help review the merge request.

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024 1

Fixed in https://github.com/Altinity/clickhouse-operator/releases/tag/0.9.2

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024

@MetalRex101 , could you tell us your Kubernetes version and ClickHouse cluster setup? We cannot reproduce the memory leak in our environment. Could you check whether memory is growing in the 'clickhouse-operator' or the 'metrics-exporter' container? It can be checked with a command like:

# kubectl top pod <your_operator_pod_name> --containers=true -n <your_namespace>

MetalRex101 avatar MetalRex101 commented on June 15, 2024

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.9-gke.15", GitCommit:"b48a8d693e191192e27c2f807daa51b54d0b0a61", GitTreeState:"clean", BuildDate:"2019-08-12T17:49:30Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}

Operator pod top:

POD                                    NAME                  CPU(cores)   MEMORY(bytes)
clickhouse-operator-6b7f548688-m5857   clickhouse-operator   8m           36Mi

ClickHouse cluster configuration:

Name:         chi-clickhouse-db-common-configd
Namespace:    clickhouse
Labels:       clickhouse.altinity.com/app=chop
              clickhouse.altinity.com/chi=clickhouse-db
              clickhouse.altinity.com/chop=0.6.0
Annotations:  <none>

Data
====
remote_servers.xml:
----
<yandex>
    <remote_servers>
        <!-- User-specified clusters -->
        <default>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-db-default-0-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-1-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-db-default-1-1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </default>
        <!-- Autogenerated clusters -->
        <all-replicated>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-0-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-db-default-0-1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-db-default-1-0</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>chi-clickhouse-db-default-1-1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </all-replicated>
        <all-sharded>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-0-0</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-0-1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-1-0</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>chi-clickhouse-db-default-1-1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </all-sharded>
    </remote_servers>
</yandex>

zookeeper.xml:
----
<yandex>
    <zookeeper>
        <node>
            <host>zookeeper-0.zookeeper-headless.clickhouse</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper-1.zookeeper-headless.clickhouse</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper-2.zookeeper-headless.clickhouse</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <distributed_ddl>
        <path>/clickhouse/clickhouse-db/task_queue/ddl</path>
    </distributed_ddl>
</yandex>

01-clickhouse-operator-listen.xml:
----
<yandex>
    <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
    <listen_host>::</listen_host>
    <listen_host>0.0.0.0</listen_host>
    <listen_try>1</listen_try>
</yandex>

02-clickhouse-operator-logger.xml:
----
<yandex>
    <logger>
        <console>1</console>
    </logger>
</yandex>

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024

Thanks, @MetalRex101 , do you have the metrics exporter pod running? If you upgraded from 0.5.0, it may be missing. You can check your operator logs as well: if the metrics exporter is missing, the operator complains a lot. This is a bug that is already fixed.

Also, please list your CHI spec, if possible.

We are going to release 0.7.0 later this week, so we are mainly testing that version. The memory issue is probably already fixed there.

MetalRex101 avatar MetalRex101 commented on June 15, 2024

@alex-zaitsev, what is the CHI spec? How can I get it?
If the new release is coming soon, we could wait for it and check for memory leaks. If no leaks appear, we can close this issue. Is that OK with you?

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024

@MetalRex101 , the CHI spec is your ClickHouseInstallation resource specification, that is, 'clickhouse-db' in your example.

 avatar commented on June 15, 2024

We have a similar issue, but with version 0.8.0. Our operator pod is constantly getting evicted because of excessive resource usage. Now that metrics have been separated from the operator, we can see that it is the metrics pod that uses a lot of memory. It uses over 2GiB before getting evicted, which seems far too much just for metrics.

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024

@lukatera , what is the size of your cluster, and do you actually use monitoring (i.e. Prometheus integration)?

 avatar commented on June 15, 2024

The cluster has 8 shards with 2 replicas each. We use Prometheus to collect metrics from clickhouse-metrics.

 avatar commented on June 15, 2024

Here's the graph of container_memory_usage_bytes{container_name="metrics-exporter"} over the past day.
[Screenshot: Screen Shot 2020-02-04 at 8 49 36 AM]

alex-zaitsev avatar alex-zaitsev commented on June 15, 2024

@lukatera , thanks. We are making some fixes to the metrics exporter now; we will look into the possible memory leak and provide a fix.

guoshuai2016 avatar guoshuai2016 commented on June 15, 2024

Any update here? We are also encountering the same issue.

sunsingerus avatar sunsingerus commented on June 15, 2024

Fix merged into 0.9.2, thanks @guoshuai2016.

sunsingerus avatar sunsingerus commented on June 15, 2024

Please take a look.
