
kine's People

Contributors

brandond, bruth, carlosedp, caroline-suse-rancher, dependabot[bot], dereknola, dweomer, erikwilson, galal-hussein, github-actions[bot], gshilei, ibrokethecloud, ibuildthecloud, macedogm, matttrach, moio, mortenlj, neoaggelos, oz123, simonferquel, testwill, toonsevrin, twz123, tylergillson, vitorsavian, wenerme, yardenshoham, yue9944882, yushiqie, zqzten


kine's Issues

Where do we follow Kine?

Given:

NOTE: this repository has been recently (2020-11-19) moved out of the github.com/rancher org to github.com/k3s-io supporting the acceptance of K3s as a CNCF sandbox project.

Please advise where active development happens, i.e. which git repo is the correct one for Kine. Thanks!

Compilation failure with NOCGO

Compilation fails when exporting NOCGO, due to missing changes to the New() function signature in the sqlite stub backend:


# github.com/rancher/kine/pkg/endpoint
--
168 | pkg/endpoint/endpoint.go:129:28: too many arguments in call to sqlite.New
169 | have (context.Context, string, generic.ConnectionPoolConfig)
170 | want (context.Context, string)
171 | time="2020-08-03T19:11:51Z" level=fatal msg="exit status 2"
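For reference, here is a rough sketch of what keeping the no-CGO stub in sync with the CGO build might look like. The import paths and return types are assumptions inferred from the error message above, not the actual kine source:

//go:build !cgo
// +build !cgo

package sqlite

// Hypothetical stub for builds without CGO: it only needs to match the
// CGO build's New() signature so that pkg/endpoint compiles either way.

import (
    "context"
    "errors"

    "github.com/rancher/kine/pkg/drivers/generic"
    "github.com/rancher/kine/pkg/server"
)

func New(ctx context.Context, dataSourceName string, connPoolConfig generic.ConnectionPoolConfig) (server.Backend, error) {
    return nil, errors.New("sqlite is disabled: this binary was built without CGO")
}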

[feature request] go-memdb backend

I want to know if we can add a new backend using go-memdb as an in-memory database (a tiny usage sketch follows the quoted description and feature list below).

https://github.com/hashicorp/go-memdb

Provides the memdb package that implements a simple in-memory database
built on immutable radix trees. The database provides Atomicity, Consistency
and Isolation from ACID. Being that it is in-memory, it does not provide durability.
The database is instantiated with a schema that specifies the tables and indices
that exist and allows transactions to be executed.

The database provides the following:

  • Multi-Version Concurrency Control (MVCC) - By leveraging immutable radix trees
    the database is able to support any number of concurrent readers without locking,
    and allows a writer to make progress.

  • Transaction Support - The database allows for rich transactions, in which multiple
    objects are inserted, updated or deleted. The transactions can span multiple tables,
    and are applied atomically. The database provides atomicity and isolation in ACID
    terminology, such that until commit the updates are not visible.

  • Rich Indexing - Tables can support any number of indexes, which can be simple like
    a single field index, or more advanced compound field indexes. Certain types like
    UUID can be efficiently compressed from strings into byte indexes for reduced
    storage requirements.

  • Watches - Callers can populate a watch set as part of a query, which can be used to
    detect when a modification has been made to the database which affects the query
    results. This lets callers easily watch for changes in the database in a very general
    way.
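For illustration, a tiny go-memdb sketch of how kine-style key/value rows could be stored and read back. The table name, struct fields and key used here are assumptions, not a proposed schema:

package main

import (
    "fmt"

    memdb "github.com/hashicorp/go-memdb"
)

// kv is a stand-in for a kine row; the fields are made up for this sketch.
type kv struct {
    Name  string
    Value []byte
}

func main() {
    schema := &memdb.DBSchema{
        Tables: map[string]*memdb.TableSchema{
            "kine": {
                Name: "kine",
                Indexes: map[string]*memdb.IndexSchema{
                    "id": {
                        Name:    "id",
                        Unique:  true,
                        Indexer: &memdb.StringFieldIndex{Field: "Name"},
                    },
                },
            },
        },
    }
    db, err := memdb.NewMemDB(schema)
    if err != nil {
        panic(err)
    }

    txn := db.Txn(true) // write transaction
    if err := txn.Insert("kine", &kv{Name: "/registry/pods/default/nginx", Value: []byte("...")}); err != nil {
        panic(err)
    }
    txn.Commit()

    read := db.Txn(false) // read-only transaction
    obj, err := read.First("kine", "id", "/registry/pods/default/nginx")
    if err != nil {
        panic(err)
    }
    fmt.Println("found:", obj.(*kv).Name)
}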

postgres db backend connection pooling

Hello,
I'm using k3s (a single server + 2 agents) with an external PostgreSQL database.
I'm encountering an (unrelated) issue with very frequent DNS lookups to our DNS servers.

Zooming in, I see kine opening/closing many connections. Is there a way to configure the datastore to pool PostgreSQL connections?
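As background, Go's database/sql (which kine's generic driver is built on) does expose pooling knobs; whether and how kine surfaces them in its configuration is exactly what this issue asks about. A minimal sketch with a placeholder DSN:

package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
    // Placeholder connection string; the point is the three pooling knobs below.
    db, err := sql.Open("postgres", "postgres://user:pass@db:5432/kubernetes?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(10)                  // cap concurrent connections to the server
    db.SetMaxIdleConns(5)                   // keep connections warm instead of redialing (and re-resolving DNS)
    db.SetConnMaxLifetime(10 * time.Minute) // recycle long-lived connections periodically
}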

Using bigint for MySQL/MariaDB id column

Since installing a K3s cluster (2 control planes, 3 workers) with MariaDB 10.3 as the datastore backend, I can see the id column incrementing faster than expected. After several weeks I have already hit 65 million with about 6000 rows.
Of course, 65 million is not 2 billion, which would be the maximum for a signed int column in MariaDB, but I wonder if it would be safe to change the column type of id, create_revision and prev_revision to bigint unsigned.

Could you think of unintended side effects or is there anything I'm not aware of that would make this undesirable?
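For concreteness, this is the kind of migration being asked about, expressed as plain database/sql calls. The column list comes from the question above; whether this is safe against a live kine datastore is the open question, so treat it as a sketch only:

package main

import (
    "database/sql"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    db, err := sql.Open("mysql", "user:pass@tcp(db:3306)/kubernetes")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Widen the three columns mentioned above; take a backup first.
    stmts := []string{
        "ALTER TABLE kine MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT",
        "ALTER TABLE kine MODIFY COLUMN create_revision BIGINT UNSIGNED",
        "ALTER TABLE kine MODIFY COLUMN prev_revision BIGINT UNSIGNED",
    }
    for _, s := range stmts {
        if _, err := db.Exec(s); err != nil {
            log.Fatal(err)
        }
    }
}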

Best way to benchmark?

Hey, really like the tool. Any thoughts on the best way to benchmark so performance can be compared directly with etcd?

I'm having some success with the etcd benchmark tool, but since not everything is implemented, only some of the tests complete. Anything that requires a put will fail. Thoughts? (A minimal client-side timing sketch follows after the output below.)

$ benchmark watch
 1000 / 1000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 0s
Watch creation summary:

Summary:
  Total:	0.1874 secs.
  Slowest:	0.0177 secs.
  Fastest:	0.0001 secs.
  Average:	0.0018 secs.
  Stddev:	0.0023 secs.
  Requests/sec:	5335.9095

Response time histogram:
  0.0001 [1]	|
  0.0019 [757]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.0036 [162]	|∎∎∎∎∎∎∎∎
  0.0054 [28]	|∎
  0.0072 [18]	|
  0.0089 [7]	|
  0.0107 [7]	|
  0.0124 [8]	|
  0.0142 [3]	|
  0.0160 [0]	|
  0.0177 [9]	|

Latency distribution:
  10% in 0.0005 secs.
  25% in 0.0008 secs.
  50% in 0.0012 secs.
  75% in 0.0019 secs.
  90% in 0.0029 secs.
  95% in 0.0054 secs.
  99% in 0.0125 secs.
  99.9% in 0.0177 secs.
 0 / 1000000 B !   0.00%
{"level":"warn","ts":"2020-05-23T16:03:59.596-0400","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-a774489f-5514-4646-acaf-f9d93b860e8c/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = put is not supported"}
panic: rpc error: code = Unknown desc = put is not supported

goroutine 2210 [running]:
go.etcd.io/etcd/tools/benchmark/cmd.benchPutWatches.func3(0xc000634480, 0xc00021cb90, 0xc0002289c0)
	/home/dave/go/src/go.etcd.io/etcd/tools/benchmark/cmd/watch.go:222 +0x1f9
created by go.etcd.io/etcd/tools/benchmark/cmd.benchPutWatches
	/home/dave/go/src/go.etcd.io/etcd/tools/benchmark/cmd/watch.go:216 +0x2d4
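Since the etcd benchmark tool panics on put, one workaround for getting read-path numbers is a small clientv3 loop that times range requests directly against the kine endpoint. A rough sketch, assuming kine is listening on 127.0.0.1:2379:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    const n = 1000
    start := time.Now()
    for i := 0; i < n; i++ {
        // Keys-only prefix range, roughly what the apiserver does on list.
        if _, err := cli.Get(context.Background(), "/registry/", clientv3.WithPrefix(), clientv3.WithKeysOnly()); err != nil {
            panic(err)
        }
    }
    fmt.Printf("%d range requests in %s\n", n, time.Since(start))
}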

kine always causes 1032 errors in mysql slave mode

Dear all,

Recently we found that when we run k3s with MySQL in replica mode, kine always shows errors like this:

          Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table kube_prod.kine; Can't find record in 'kine', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000051, end_log_pos 19635315
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 19486733
              Relay_Log_Space: 22739548
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table kube_prod.kine; Can't find record in 'kine', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000051, end_log_pos 19635315
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1

Currently we use the skip-slave-errors argument to skip this error, but we want to know what happened and how to avoid this problem.

Fail to run k3s using mysql with SSL (... it doesn't contain any IP SANs)

CockroachDB causes problems with K3s using Postgres driver

I'm able to connect to CockroachDB with K3s and Kine; however, K3s will not work with CockroachDB. I'm not sure what I can provide here besides the output.

I do get a lot of RBAC errors like this:

heduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.312697   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.312887   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.325938   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.333258   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.342171   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.356777   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.362120   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.367089   14160 reflector.go:153] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:246: Failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.387236   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.391674   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope

After a while, I get some pq errors:

Jul 11 10:16:13 virt-0 k3s[14160]: time="2020-07-11T10:16:13.396427502-04:00" level=error msg="error while range on /registry/deployments /registry/deployments: pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834086756-04:00" level=error msg="error while range on /registry/configmaps/kube-system/k3s : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834596643-04:00" level=error msg="error while range on /registry/ranges/servicenodeports : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834881953-04:00" level=error msg="error while range on /registry/namespaces/default : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.836634   14160 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"pq: internal error while retrieving user account"
, Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.835201049-04:00" level=error msg="error while range on /registry/ranges/serviceips : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.838544   14160 repair.go:100] unable to refresh the service IP block: rpc error: code = Unknown desc = pq: internal error while retrieving user account
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.837511   14160 repair.go:73] unable to refresh the port allocations: rpc error: code = Unknown desc = pq: internal error while retrieving user account
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.838010   14160 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"pq: internal error while retrieving user account"
, Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.840814   14160 leaderelection.go:331] error retrieving resource lock kube-system/k3s: rpc error: code = Unknown desc = pq: internal error while retrieving user account

Additionally, when using certificate auth, K3s will eventually restart while waiting for some CRDs to complete; I'm not sure whether that is specific to CockroachDB, though.

Compaction not working on Cockroach DB?

I am running a kube-apiserver via kine on top of a CockroachDB. In the kine logs I see a lot of those errors:

time="2021-05-25T13:41:47Z" level=error msg="Compact failed: failed to compact to revision 82: pq: at or near \"where\": syntax error: unimplemented: this syntax"
time="2021-05-25T13:46:47Z" level=error msg="Compact failed: failed to compact to revision 290: pq: at or near \"where\": syntax error: unimplemented: this syntax"
time="2021-05-25T13:51:47Z" level=error msg="Compact failed: failed to compact to revision 498: pq: at or near \"where\": syntax error: unimplemented: this syntax"
time="2021-05-25T13:56:47Z" level=error msg="Compact failed: failed to compact to revision 705: pq: at or near \"where\": syntax error: unimplemented: this syntax"

I am running the latest master build of kine and CockroachDB version v21.1.1. The sql.defaults.serial_normalization setting in my CockroachDB cluster is set to sql_sequence.

can we implement the watch by trigger for PostgresDB?

Hi,
Very glad to see kine now supports PostgreSQL! Reading the code, if I understand correctly, watch is still implemented by repeatedly selecting from the table. Since PostgreSQL supports triggers, my question is: would it be possible to define a trigger on the kine table so that any insert/update/delete fires an action/event, and we don't have to poll all the time? Are there any disadvantages to this approach? Thank you.
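As a sketch of the idea (not kine's implementation), a PostgreSQL trigger can publish NOTIFY events that a Go client picks up with lib/pq's listener instead of polling the table. All names here (function, trigger, channel, DSN) are made up; EXECUTE FUNCTION needs PostgreSQL 11+ (use EXECUTE PROCEDURE on older versions):

package main

import (
    "database/sql"
    "fmt"
    "time"

    "github.com/lib/pq"
)

const conninfo = "postgres://user:pass@db:5432/kubernetes?sslmode=disable"

// Illustrative DDL: emit a NOTIFY for every insert into the kine table.
var ddl = []string{
    `CREATE OR REPLACE FUNCTION notify_kine_change() RETURNS trigger AS $$
     BEGIN
       PERFORM pg_notify('kine_changes', NEW.id::text);
       RETURN NEW;
     END;
     $$ LANGUAGE plpgsql`,
    `DROP TRIGGER IF EXISTS kine_notify ON kine`,
    `CREATE TRIGGER kine_notify AFTER INSERT ON kine
       FOR EACH ROW EXECUTE FUNCTION notify_kine_change()`,
}

func main() {
    db, err := sql.Open("postgres", conninfo)
    if err != nil {
        panic(err)
    }
    for _, stmt := range ddl {
        if _, err := db.Exec(stmt); err != nil {
            panic(err)
        }
    }

    // Block on notifications instead of polling the table.
    listener := pq.NewListener(conninfo, time.Second, time.Minute, nil)
    if err := listener.Listen("kine_changes"); err != nil {
        panic(err)
    }
    for n := range listener.Notify {
        if n != nil { // nil notifications signal a re-established connection
            fmt.Println("kine row inserted, id:", n.Extra)
        }
    }
}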

PostgreSQL: consider rewriting the subquery as a nested query with ORDER BY - 1000x+ performance improvement

Original query:

SELECT (
           SELECT MAX(rkv.id) AS id
           FROM kine AS rkv),
       (
           SELECT MAX(crkv.prev_revision) AS prev_revision
           FROM kine AS crkv
           WHERE crkv.name = 'compact_rev_key'),
       kv.id AS theid,
       kv.name,
       kv.created,
       kv.deleted,
       kv.create_revision,
       kv.prev_revision,
       kv.lease,
       kv.value,
       kv.old_value
FROM kine AS kv
         JOIN (
    SELECT MAX(mkv.id) AS id
    FROM kine AS mkv
    WHERE mkv.name LIKE '/registry/events/%'
    GROUP BY mkv.name) maxkv
              ON maxkv.id = kv.id
WHERE (kv.deleted = 0 OR 'f')
ORDER BY kv.id ASC
LIMIT 2;

Executes in 53 s.

Nested query (same result):

select *
from (SELECT (
                 SELECT MAX(rkv.id) AS id
                 FROM kine AS rkv),
             (
                 SELECT MAX(crkv.prev_revision) AS prev_revision
                 FROM kine AS crkv
                 WHERE crkv.name = 'compact_rev_key'),
             kv.id AS theid,
             kv.name,
             kv.created,
             kv.deleted,
             kv.create_revision,
             kv.prev_revision,
             kv.lease,
             kv.value,
             kv.old_value
      FROM kine AS kv
               JOIN (
          SELECT MAX(mkv.id) AS id
          FROM kine AS mkv
          WHERE mkv.name LIKE '/registry/events/%'
          GROUP BY mkv.name) maxkv
                    ON maxkv.id = kv.id
      WHERE (kv.deleted = 0 OR 'f')) t
ORDER BY id ASC
LIMIT 2;

Executes in 4 ms.

Full analysis: https://explain.depesz.com/s/HUEf

metrics support

In production we want to collect some key metrics from kine for monitoring and alerting, such as the DBStats of the Go SQL backend used by kine, SQL operation latencies and errors, etc. Would it be acceptable to add some Prometheus metrics for them?

For library mode, we can add an injectable registerer to the endpoint config, and for stand-alone mode we can set up our own metrics handler, like etcd does.
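For the DBStats part specifically, prometheus/client_golang already ships a collector that could be registered against kine's *sql.DB. A minimal stand-alone sketch (the DSN and listen port are placeholders, not kine configuration):

package main

import (
    "database/sql"
    "log"
    "net/http"

    _ "github.com/go-sql-driver/mysql"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/collectors"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    db, err := sql.Open("mysql", "user:pass@tcp(db:3306)/kubernetes")
    if err != nil {
        log.Fatal(err)
    }

    reg := prometheus.NewRegistry()
    // Exposes sql.DBStats (open/idle connections, wait counts, etc.) as metrics.
    reg.MustRegister(collectors.NewDBStatsCollector(db, "kine"))

    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
    log.Fatal(http.ListenAndServe(":9090", nil))
}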

[feature request] Backup support for kine

I use k3s with sqlite. But I would like to migrate to embedded etcd. I tried:
ETCDCTL_API=3 ./etcdctl --endpoints unix:///var/lib/rancher/k3s/server/kine.sock snapshot save snapsht.etcd

But I received an error:

{"level":"warn","ts":"2021-05-25T22:13:17.511+0200","caller":"clientv3/maintenance.go:210","msg":"failed to receive from snapshot stream; closing","error":"rpc error: code = Unimplemented desc = unknown service etcdserverpb.Maintenance"}
Error: rpc error: code = Unimplemented desc = unknown service etcdserverpb.Maintenance

It would be great if this method were implemented, as migration from one storage backend to another would then be quite easy.

Illegal resource version from storage: 0

I have been running k3s with cockroachdb for just over a month now and it has been working well.

Today, I restarted the cockroach cluster during a rolling upgrade to v20.2.4 and restarted k3s, and when it came back up, I got this error spamming my k3s log (for a ton of different types, but for example):

cacher (*core.ConfigMap): unexpected ListAndWatch error: failed to list *core.ConfigMap: illegal resource version from storage: 0; reinitializing...

And the kubectl client will often receive:

Error from server: illegal resource version from storage: 0

I tried rolling my database back to v20.2.3 and restoring state from a backup taken a week prior, to no avail. It is important to note that it does not seem to affect every object, as I can still list many resources. However, it is a consistent subset that results in this error.

cockroachdb: v20.2.3
k3s: v1.20.2-k3s1-arm64

Usage documentation

Hi, I am evaluating this repo to act as a Kubernetes backend store. Is there more user documentation or a best-practices guide?

It seems kine doesn't create an upstream factory for creating clients here: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory

Instead, it expects Kubernetes to use the etcd client to talk to the kine server? In that case, does kine need to replicate some etcd server features to make it work as expected?

Why not implement storage.Interface directly? https://github.com/kubernetes/kubernetes/blob/4e72a35b35796af5e65992bb6e586403c87930de/staging/src/k8s.io/apiserver/pkg/storage/interfaces.go#L159

Superfluous index definition

Just scanning through the repo, I found a superfluous index definition in most of the drivers:

CREATE INDEX IF NOT EXISTS kine_name_index ON kine (name), // Superfluous
CREATE INDEX IF NOT EXISTS kine_name_id_index ON kine (name,id),

The first index, 'kine_name_index', is superfluous, as RDBMSes can use the second one, 'kine_name_id_index', efficiently when searching just by 'name'. Removing this index could noticeably improve kine's insert performance, and search-by-name performance would not be affected.

Add BadgerDB driver

BadgerDB is an embeddable, fast, pure-Go key-value database. At least from a brief look, it feels like it might fit well as an alternative to sqlite:

  • it's pure Go, meaning it doesn't (necessarily) require cgo, which looks attractive in cases similar to the one described in #14
  • it supports concurrent ACID transactions (I haven't dug deep into the implementation of logstructured.Log, so I can't say whether that's important for a kine driver).

See "Design" secsion of BadgerDB's README.

Note, BadgerDB v2 uses optional zstd compression, which currently requires cgo. Refer to dgraph-io/badger#1162 on switching to pure Go zstd.

Nevertheless, I don't think zstd-level compression is required for kine/k3s use cases.
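For a feel of the API, here is a minimal BadgerDB v2 round trip with a kine-style key; this is illustrative only, not a proposed driver design:

package main

import (
    "fmt"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    db, err := badger.Open(badger.DefaultOptions("/tmp/kine-badger"))
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // Write inside an ACID transaction.
    if err := db.Update(func(txn *badger.Txn) error {
        return txn.Set([]byte("/registry/pods/default/nginx"), []byte("..."))
    }); err != nil {
        panic(err)
    }

    // Read it back in a read-only transaction.
    if err := db.View(func(txn *badger.Txn) error {
        item, err := txn.Get([]byte("/registry/pods/default/nginx"))
        if err != nil {
            return err
        }
        return item.Value(func(val []byte) error {
            fmt.Printf("value: %s\n", val)
            return nil
        })
    }); err != nil {
        panic(err)
    }
}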

Retries in generic driver

Thank you for kine! I have a couple of questions on the retry policy used.

  1. I am trying to understand the retry time interval set in [1]. Trying the back-off strategy used (backoff.Linear(100 + time.Millisecond)), it seems too aggressive to me [2], but I do not have any data to prove this. It seems we will retry the same operation 20 times within 171 milliseconds. backoff.Linear() takes a Duration as its parameter, and we currently pass in 1 millisecond plus 100 nanoseconds. Aren't the 100 nanoseconds shadowed by the 1 millisecond? I suspect the intention was to have something like 100 * time.Millisecond (see the small arithmetic check after the links below).

  2. We retry only the execute() call [3]. What about the query() [4] and queryRow() [5] calls? Shouldn't these be retried as well?

Thanks

[1] https://github.com/k3s-io/kine/blob/master/pkg/drivers/generic/generic.go#L256
[2] https://play.golang.org/p/pBxe-DUqhy-
[3] https://github.com/k3s-io/kine/blob/master/pkg/drivers/generic/generic.go#L250
[4] https://github.com/k3s-io/kine/blob/master/pkg/drivers/generic/generic.go#L240
[5] https://github.com/k3s-io/kine/blob/master/pkg/drivers/generic/generic.go#L245
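A small arithmetic check for question 1; this only demonstrates the Duration math, not the actual retry behaviour of the backoff package:

package main

import (
    "fmt"
    "time"
)

func main() {
    // What the issue says is currently passed to backoff.Linear:
    // 100 (i.e. 100 nanoseconds) + time.Millisecond.
    current := time.Duration(100) + time.Millisecond
    // What was probably intended: 100 * time.Millisecond.
    intended := 100 * time.Millisecond

    fmt.Println(current)  // 1.0001ms - the 100ns are effectively lost in the millisecond
    fmt.Println(intended) // 100ms
}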

Don't use public schema

I created this issue in rancher/k3s, but it may be more appropriate here. It would be nice if we could override the schema used by kine without having to run SQL queries on the database.

Expected disk write rate.

Hi all,

Just wondering how much disk I/O would be expected from a two-server setup.

When I inspect my NAS running MariaDB, it seems to sit pretty constantly at around 5 MB/s of writes. Is this normal or high?

When running on things like Raspberry Pis, you can wear out SD cards and the like. Is the disk usage similar for a single server with sqlite?

Microsoft SQL support

Would it be desirable to add SQL Server as a backend? Kine seems really handy for situations in which running etcd is tricky.

I think it's not too dissimilar to the existing SQL backends, so hopefully it isn't too bad, but are there any philosophical reasons against it? Or is it just that no one has needed it yet?

[Help Wanted] Watch paused when encountering a large gap in the incremental id

Currently we are using OceanBase as a MySQL-flavored DB backend for kine. This DB behaves almost the same as MySQL, with one key difference: it cannot guarantee that the incremental id increases with a step of 1. In normal cases the id column of the kine table will be 1->2->3->... as rows are inserted, but when OceanBase switches its master (which may happen several times a day), the next id jumps to a number roughly a million higher, resulting in an id column like 1->2->1000001->1000002->... This causes kine to end up inserting numerous gap-fill records and pausing the whole watch.

So here's my questions:

  1. In normal cases, which situations lead to a MODREVISION GAP?
  2. What's the purpose of filling gaps (as they will soon be deleted by compaction)?
  3. In the case above, is there a way to avoid the watch pause?

Thoughts on scaling kine

Thanks for building this @ibuildthecloud !

  • After reading up on the etcd data model, I wonder if it would be possible to run kine in an HA setup. Have you tried that? Any thoughts?

  • Do you think it would be possible to store different data types in different tables so that it can scale well? Also, I am only curious about Postgres: have you tried using the jsonb data type so one can write rich queries against the table directly?

Not possible to specify sqlite db location

It seems like it is impossible to specify the location of the sqlite database. It would be nice if one or both of the following could work:

  • kine --endpoint /var/lib/kine/state.db?more=rwc&_journal=WAL&cache=shared
  • kine --endpoint file:///var/lib/kine/state.db?more=rwc&_journal=WAL&cache=shared

Investigate unexpected resource disappearance

We have observed an instance of a single Kubernetes resource unexpectedly disappearing from the apiserver. Due to lack of apiserver audit logs it is unclear if it was removed at the Kubernetes level by a client operation, or if it was somehow incorrectly deleted by compaction code. K3s logs do not show any errors during the period in which we suspect the object was removed.

The backend datastore was a two-node AWS RDS Aurora MySQL cluster running version 5.7.mysql_aurora.2.09.0. K3s version was k3s v1.18.10+k3s2.

We need to do additional testing of multi-node database clusters to ensure that all operations are safe when multiple kine instances are connected to a database cluster. An initial review of the code does not turn up any remaining obviously unsafe operations. There were some questions about the transaction safety of inserts, but an initial cursory review of the go-mysql code suggests that it should be safe to use the way we're using it.

gz#13317

Scale to zero with NATS and K3s

I use NATS, and it's also a messaging system, just like Kubernetes is at its core.

It’s easily to use it to scale to zero to achieve google cloud run like architecture . This is because if all rpc and events go through nats, you can check if a resource is “up” before the nats consumer is “there” to consume the rpc or event.

kine is probably not the right repo to raise this issue; however, I am seeking advice as to where I should look to help work on this feature in the k3s ecosystem of systems.

Move value and old_value blobs out of main table

Best practice for most databases suggests not storing blobs alongside other data. The current value and old_value columns should be moved out to another table.

The current model also sees us make a copy of the blob value on every update - the value from the previous revision is re-inserted into the database as old_value on the new revision. This could be optimized out if we only had to insert the new value, and old_value was simply a pointer to an existing row in the blob table.

Allow for custom (proprietary) drivers

I would like to implement custom drivers with Kine. Unfortunately, it's not trivial for someone to extend Kine without forking, because the driver selection logic is hard-coded: https://github.com/k3s-io/kine/blob/v0.6.0/pkg/endpoint/endpoint.go#L120

I believe it would be fairly easy to move to an "inverted" model where the drivers have an init function that registers them (à la how SQL drivers work in Go). I was curious whether people would be interested in this approach?
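A rough sketch of what that inverted registration could look like, loosely modelled on database/sql driver registration. All names here (Register, Driver, Backend, the scheme map) are hypothetical, not an existing kine API:

package driver

import (
    "context"
    "fmt"
    "sync"
)

// Backend stands in for kine's backend interface.
type Backend interface{}

// Driver builds a backend from a datastore endpoint string.
type Driver func(ctx context.Context, dsn string) (Backend, error)

var (
    mu      sync.RWMutex
    drivers = map[string]Driver{}
)

// Register is called by each driver package from its init() function,
// e.g. Register("mysql", mysql.New).
func Register(scheme string, d Driver) {
    mu.Lock()
    defer mu.Unlock()
    drivers[scheme] = d
}

// New looks up the driver for a scheme instead of hard-coding a switch
// statement in pkg/endpoint.
func New(ctx context.Context, scheme, dsn string) (Backend, error) {
    mu.RLock()
    d, ok := drivers[scheme]
    mu.RUnlock()
    if !ok {
        return nil, fmt.Errorf("no kine driver registered for scheme %q", scheme)
    }
    return d(ctx, dsn)
}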

Any special considerations in implementing PutRequest?

kine/pkg/server/kv.go

Lines 100 to 102 in 27bd5e7

func (k *KVServerBridge) Put(ctx context.Context, r *etcdserverpb.PutRequest) (*etcdserverpb.PutResponse, error) {
    return nil, fmt.Errorf("put is not supported")
}

Hi, @brandond

I am reading the code of Kine and I noticed that the Put operation is not implemented.
I know you can do write ops by using Txn, which is already implemented in Kine (a minimal Txn-based create is sketched below).
I am just curious about the reason for not implementing Put.
(1) Is there any special challenge in implementing Put?
Or (2) is it just that the Txn implementation is already sufficient, i.e. kubeadm will never issue a plain Put?

Thanks
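For context on the Txn path, this is roughly the create-if-absent transaction an etcd client issues instead of a plain Put; the endpoint and key below are placeholders:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    key, value := "/registry/configmaps/default/example", "payload"
    resp, err := cli.Txn(context.Background()).
        If(clientv3.Compare(clientv3.ModRevision(key), "=", 0)). // only if the key does not exist yet
        Then(clientv3.OpPut(key, value)).
        Commit()
    if err != nil {
        panic(err)
    }
    fmt.Println("created:", resp.Succeeded)
}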

Primary key conflicts in an active-active MySQL cluster because the auto_increment_increment parameter is not honored

I set up an active-active MySQL cluster to use as the K3s datastore. To avoid primary key conflicts, I added the following parameters to MySQL:
Mysql 1 my.cnf:
auto_increment_offset = 1
auto_increment_increment = 2

Mysql 2 my.cnf:
auto_increment_offset = 2
auto_increment_increment = 2
When I test with plain INSERT statements, this works well.
But when I use K3s, the parameter has no effect.
Why does kine make the parameter not work?

K3s does not work with nats 2.8.3

Unsure what the issue is, but K3s fails to come up when using docker.io/library/nats:2.8.3; docker.io/library/nats:2.7.4 seems to work fine, as does docker.io/library/nats:2.8.2.

The terminal error is:

FATA[0064] failed to start controllers: failed to create new server context: failed to register CRDs: the server was unable to return a response in the time allotted, but may still be processing the request (get customresourcedefinitions.apiextensions.k8s.io)

I am just using the nats docker image: docker run --rm -it docker.io/library/nats:2.8.3 -js

Is there a beginner's guide/tutorial to run Kubernetes with Kine?

Update: I think I found https://github.com/k3s-io/kine/blob/master/examples/minimal.md
Let me first try this tutorial to see whether I can run it.

I have a similar question as #112

Since Kine is supposed to "be ran standalone so any k8s (not just k3s) can use Kine", I am wondering whether the development team could provide a Quick Start Guide for readers/users to launch a demo easily, e.g. how to run a small Kubernetes cluster with MySQL/etcd/dqlite as the backend using Kine.

I have searched the markdown files in this repo, and it seems there is no such tutorial.

Publish official kine images

We should hook up CI on this repo and push rancher/kine images so people can use them as a sidecar for k8s apiservers.

Vault backend

Since Vault's key-value store is API-based, would a Vault driver backend be possible/easy?

Investigate removing goroutine-based TTL expiry model

The current model of starting a goroutine for each entry with a TTL, and then waking the goroutine to attempt deletion, is non-optimal. The goroutines are not cancelled when a newer revision is created. It creates a large number of goroutines when k3s is starting up, including for revisions that are no longer current. It also resets the TTL on startup, since the creation time is not tracked.

An alternative approach involving a dedicated TTL expiry goroutine may be more efficient. Fixing the TTL reset issue probably requires altering the schema to include a creation time.
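A very rough sketch of the dedicated-sweeper idea, with hypothetical types and a stubbed delete callback (kine's real interfaces and schema are not used here):

package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// ttlSweeper tracks expiry deadlines in one place and deletes expired keys
// from a single goroutine, instead of one goroutine per TTL'd key.
type ttlSweeper struct {
    mu        sync.Mutex
    deadlines map[string]time.Time
    deleteKey func(ctx context.Context, key string) error
}

func (s *ttlSweeper) track(key string, ttl time.Duration) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.deadlines[key] = time.Now().Add(ttl)
}

func (s *ttlSweeper) run(ctx context.Context) {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case now := <-ticker.C:
            s.mu.Lock()
            for key, deadline := range s.deadlines {
                if now.After(deadline) {
                    _ = s.deleteKey(ctx, key)
                    delete(s.deadlines, key)
                }
            }
            s.mu.Unlock()
        }
    }
}

func main() {
    s := &ttlSweeper{
        deadlines: map[string]time.Time{},
        deleteKey: func(ctx context.Context, key string) error {
            fmt.Println("expired:", key) // stand-in for the real delete
            return nil
        },
    }
    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()
    s.track("/registry/leases/example", time.Second)
    s.run(ctx)
}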

Heavy usage against MariaDB and unoptimized queries

I am running a multi-master k3s setup on three Pis to see how it works, but I have noticed that my MariaDB server is getting a lot more traffic than I would expect for such a setup. My MariaDB instance only has k3s/kine traffic at the moment, yet the average for the last 14 days is 101 queries per second, primarily SELECTs (90%) and INSERTs (10%).

There is also a rather large number of queries using table scans and table sorting, which makes it seem that the indexes are not optimal.

The database itself is only about 9 MB, so I expected it to fit nicely into the disk cache, but the frequent writes make that moot. I also activated the query cache, but that is of course also invalidated quickly due to the updates.

Is this expected behavior? If not, what kind of information would be of interest for me to gather?

rancher/kine:v0.6.0 - sqlite disabled

rancher/kine:v0.6.0@8ad8a9a04bd6d01d64dc2cbbe4daceacfa532768872a288614fa6659267e7b46 returns:

# docker run --rm -ti rancher/kine:v0.6.0 --endpoint sqlite:///db.sqlite
FATA[0000] building kine: this binary is built without CGO, sqlite is disabled

Am I using the wrong image?

API translation to OrbitDB

Hi Folks,

I wonder if anyone has already considered doing the API translation to OrbitDB. Maybe a maintainer closer to the project can answer this briefly. Thanks

Cheers, Flavio

Implement etcd version endpoint

Hi Rancher team!

I am testing this etcd shim using kubeadm. It works!!!

But every time I use it with kubeadm I have to pass the following flag: --ignore-preflight-errors=ExternalEtcdVersion, because kine does not seem to implement this endpoint.

Is this intended? What version of etcd is implemented? 3.x? I can send a PR if it should be implemented.
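For reference, etcd's /version endpoint returns a small JSON document; a sketch of what answering it could look like is below. The version strings and listen address are placeholders, and this is not how kine is actually wired up:

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

type etcdVersion struct {
    Server  string `json:"etcdserver"`
    Cluster string `json:"etcdcluster"`
}

func main() {
    http.HandleFunc("/version", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        // Advertise whatever etcd API level the shim claims to implement.
        _ = json.NewEncoder(w).Encode(etcdVersion{Server: "3.5.0", Cluster: "3.5.0"})
    })
    log.Fatal(http.ListenAndServe(":2379", nil))
}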

Lessons learned in using a different storage backend for Kubernetes

Hi, I find this a very interesting project and am wondering if you have a write-up or similar document somewhere with your lessons learned in supporting a different storage backend for Kubernetes.

I'm using Kine as a starting point to create a NoSQL driver to support other key-value databases next to etcd (e.g. Riak), and it can be difficult to determine why exactly some things are not working yet while developing. Specifically, I'm curious which etcd API endpoints (Watch, Lease, Deletes, Transaction, ...) and primitives (mod_revision, version, ...) really need to be implemented properly to get a minimal cluster up (1 master, 1 worker, 1 nginx Pod running).

Very interested in your experiences! That would also help me with estimating if this is viable in the short time available to me.

Currently I added a minimal Create, Get, Update, and List, but that does not appear to be enough:

Dec 28 20:13:04 master-1 k3s[29278]: E1228 20:13:04.326211   29278 kubelet.go:2291] "Error getting node" err="node \"master-1\" not found"

Thanks! And again, great project!

Compaction should run in a database transaction

Right now compaction is handled by executing a sequence of SQL commands as isolated statements. Performance could be increased, and the likelihood of issues causing compaction failure reduced, if the deletions and the compact_rev_key update were done in bulk inside a transaction.
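A minimal sketch of the transactional shape being proposed, using database/sql. The exact DELETE/UPDATE statements kine runs are not reproduced here; the statements below are simplified stand-ins to show BeginTx/Commit around both steps:

package main

import (
    "context"
    "database/sql"

    _ "github.com/go-sql-driver/mysql"
)

// compactTo deletes compactable rows and bumps compact_rev_key in a single
// database transaction, so either both happen or neither does.
func compactTo(ctx context.Context, db *sql.DB, revision int64) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit has succeeded

    // Simplified stand-in for the real compaction DELETE.
    if _, err := tx.ExecContext(ctx,
        "DELETE FROM kine WHERE id <= ? AND deleted != 0 AND name != 'compact_rev_key'", revision); err != nil {
        return err
    }
    // Record the new compact revision in the same transaction.
    if _, err := tx.ExecContext(ctx,
        "UPDATE kine SET prev_revision = ? WHERE name = 'compact_rev_key'", revision); err != nil {
        return err
    }
    return tx.Commit()
}

func main() {} // sketch only; wiring up a real *sql.DB is omitted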

The boundary of Kine (or K3S) compared with K8S

IIUC, Kine can be regarded as a shim layer (or an adaptor) which enables the replacement of etcd with other storage backends.
I know etcd is much heavier and provides tons of APIs. I am just not sure:
(1) Which etcd APIs are used by Kubernetes?
(2) Among the APIs used by Kubernetes, which have been covered (hooked) by Kine, and which have not?

Do you have a list of the etcd APIs used by Kubernetes and the list of etcd APIs that are already covered by Kine? I think that would be very helpful for understanding the boundary of K3s (i.e. what K8s can do but K3s may have problems with).
Thanks
