
kmesh-net / kmesh


High Performance ServiceMesh Data Plane Based on Programmable Kernel

Home Page: https://kmesh.net

License: Apache License 2.0

Makefile 1.48% C 48.42% Go 44.65% Shell 4.99% Dockerfile 0.06% CMake 0.19% Smarty 0.19%
ebpf high-performance kernel kubernetes low-overhead microservice networking resiliency service-mesh traffic-management


kmesh's Issues

Add CI Checks

This is an umbrella issue for CI tasks.
There are many checks we should run in CI:

  • Boilerplate header check
  • Lint check
  • Build
  • Unit test
  • Integration test

Add an indication of whether a pod is managed by Kmesh

Currently, whether Kmesh takes effect on a pod depends not only on the namespace label istio.io/dataplane: kmesh, but also on whether the pod has a sidecar injected. So I propose we add a label or annotation to the pods that Kmesh manages.

This could be done in the Kmesh CNI plugin via a pod update, as sketched below.
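
A minimal sketch of how the CNI plugin could patch such an annotation using client-go; the annotation key kmesh.net/managed and the helper name are assumptions for illustration, not an existing Kmesh API:

package cniplugin

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// annotatePodManagedByKmesh marks a pod as managed by Kmesh via a strategic
// merge patch, so the CNI plugin does not need to read-modify-write the pod.
func annotatePodManagedByKmesh(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	patch := []byte(`{"metadata":{"annotations":{"kmesh.net/managed":"true"}}}`)
	if _, err := client.CoreV1().Pods(namespace).Patch(ctx, name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("failed to annotate pod %s/%s: %w", namespace, name, err)
	}
	return nil
}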

deserial_update_elem failed

What happened:
After the Kmesh daemon started up, I saw this error:

time="2024-01-16T03:56:12Z" level=info msg="options InitDaemonConfig successful" subsys=manager
time="2024-01-16T03:56:13Z" level=info msg="bpf Start successful" subsys=manager
time="2024-01-16T03:56:13Z" level=info msg="controller Start successful" subsys=manager
time="2024-01-16T03:56:13Z" level=info msg="command StartServer successful" subsys=manager
time="2024-01-16T03:56:13Z" level=info msg="start write CNI config\n" subsys=cniplugin
time="2024-01-16T03:56:13Z" level=info msg="kmesh cni use chained\n" subsys=cniplugin
time="2024-01-16T03:56:13Z" level=info msg="Copied /usr/bin/kmesh-cni to /opt/cni/bin." subsys=cniplugin
time="2024-01-16T03:56:13Z" level=info msg="kubeconfig either does not exist or is out of date, writing a new one" subsys=cniplugin
time="2024-01-16T03:56:13Z" level=info msg="wrote kubeconfig file /opt/cni/bin/kmesh-cni-kubeconfig" subsys=cniplugin
time="2024-01-16T03:56:13Z" level=info msg="command Start cni successful" subsys=manager
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2
RouteConfiguration is not set
time="2024-01-16T03:56:22Z" level=error msg="RouteConfigUpdate deserial_update_elem failed" subsys=cache/v2

I am not sure why the bpf map write failed.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:

Kmesh CNI overwrites the CNI conflist with an empty value

What happened:

After running Kmesh for some time, we could not restart the Kmesh daemon and were also unable to start new application pods.

By looking into the Kmesh logs, we see:

time="2024-03-06T10:06:41Z" level=info msg="Copied /usr/bin/kmesh-cni to /opt/cni/bin." subsys="cni installer"
time="2024-03-06T10:06:41Z" level=info msg="failed to read conflist: /etc/cni/net.d/10-calico.conflist, error parsing configuration list: unexpected end of JSON input" subsys="cni installer"
time="2024-03-06T10:06:41Z" level=error msg="can not found the valid cni config!\n" subsys="cni installer"
time="2024-03-06T10:06:41Z" level=error msg="can not found the valid cni config!\n" subsys=manager

What you expected to happen:

Kmesh should never write back an invalid or empty conflist; a validation sketch is shown below.
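
A minimal sketch, assuming the installer builds the new conflist in memory as a byte slice; the helper name and the temp-file-plus-rename approach are illustrative, not the existing installer code:

package cniinstall

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeConflistIfValid refuses to touch the existing conflist unless the new
// content parses as JSON and contains a non-empty "plugins" list.
func writeConflistIfValid(path string, newConflist []byte) error {
	var parsed struct {
		Plugins []json.RawMessage `json:"plugins"`
	}
	if err := json.Unmarshal(newConflist, &parsed); err != nil {
		return fmt.Errorf("refusing to write invalid conflist %s: %w", path, err)
	}
	if len(parsed.Plugins) == 0 {
		return fmt.Errorf("refusing to write conflist %s with an empty plugin list", path)
	}
	// Write to a temp file and rename, so a crash mid-write cannot leave a
	// truncated conflist behind (the "unexpected end of JSON input" symptom above).
	tmp := filepath.Join(filepath.Dir(path), "."+filepath.Base(path)+".tmp")
	if err := os.WriteFile(tmp, newConflist, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}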

How to reproduce it (as minimally and precisely as possible):

Not sure how to reproduce.

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:

Support mtls for service to service communication

What would you like to be added:

For east-west service communication, we should encrypt the traffic to implement zero-trust network.

As for ingress traffic through the ingress gateway, we should do something similar, but IMO that can be done in v0.4.

Why is this needed:

HashName memory leak

What happened:

In the workload model, we introduced many bpf maps keyed by a string's hash value, and we use an internal object, HashName, to provide unique hash values; it stores a map internally. But the keys are never deleted, even when a workload or service is removed.

FYI: https://github.com/kmesh-net/kmesh/blob/main/pkg/controller_workload/workload/workload_hash.go
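
For illustration, a minimal sketch of what a hash-name store with deletion support could look like; the field names and collision handling are assumptions, not the actual workload_hash.go implementation:

package hashname

import "hash/fnv"

type HashName struct {
	strToNum map[string]uint32
	numToStr map[uint32]string
}

func NewHashName() *HashName {
	return &HashName{
		strToNum: make(map[string]uint32),
		numToStr: make(map[uint32]string),
	}
}

// Hash returns a stable uint32 for str, probing past collisions with other strings.
func (h *HashName) Hash(str string) uint32 {
	if num, ok := h.strToNum[str]; ok {
		return num
	}
	f := fnv.New32a()
	f.Write([]byte(str))
	num := f.Sum32()
	for {
		existing, ok := h.numToStr[num]
		if !ok || existing == str {
			break
		}
		num++
	}
	h.strToNum[str] = num
	h.numToStr[num] = str
	return num
}

// Delete is the piece that is missing today: it should be called when a
// workload or service is removed so the maps do not grow without bound.
func (h *HashName) Delete(str string) {
	if num, ok := h.strToNum[str]; ok {
		delete(h.numToStr, num)
		delete(h.strToNum, str)
	}
}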

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:

Problems with libbpf in make build

What would you like to be added:
Add a note about the libbpf version used in the make image versus the libbpf version on the host.
Why is this needed:
The libbpf used in the make image is version 0.8.0, and the signature of bpf_map_create has changed across libbpf versions.
If the host version differs from the one in the make image, an error will occur.

Image Tag Mismatch Between Helm Chart and Repository

What happened:
Image Pull Failure When Installing Kmesh with Helm

Normal   Pulling      28m (x5 over 31m)      kubelet  Pulling image "ghcr.io/kmesh-net/kmesh:0.2.0"
Normal   BackOff      3m21s (x121 over 31m)  kubelet  Back-off pulling image "ghcr.io/kmesh-net/kmesh:0.2.0"

Then I found an image tag mismatch between the Helm chart and the image registry.

In Helm Chart:

    image:
      repository: ghcr.io/kmesh-net/kmesh
      tag: 0.2.0
    imagePullPolicy: IfNotPresent

In Image Repository:

docker pull ghcr.io/kmesh-net/kmesh:v0.2.0

What you expected to happen:
Sync Image Tags Between Helm Chart and Registry
How to reproduce it (as minimally and precisely as possible):
Prepare the cluster environment, then run the following command:

helm install kmesh ./deploy/helm -n kmesh-system --create-namespace

Anything else we need to know?:
Perhaps we should modify the script that generates the image so that it removes the v prefix from the tag.
Environment:

  • Kmesh version: v0.2.0
  • Others: istio :1.20.1

Kmesh support Workload model

Currently, Kmesh has implemented traffic governance functions for L4 and L7 through XDS protocol. However, in some scenarios, microservice applications focus more on L4 traffic governance, and L7 governance can be deployed as needed. The Istio community has launched a Workload model to provide lightweight L4 traffic governance functions, which Kmesh needs to consider supporting.

In Kmesh v0.1.0, enable L7 Routing on Kubernetes 1.27 and Istio 1.19

What would you like to be added:
We would like Kmesh to be compatible with Kubernetes 1.27 and Istio 1.19 for L7 routing.

Why is this needed:
Our production environment is running Kubernetes 1.27 and Istio 1.19. Currently, the latest Kubernetes version for which L7 routing works is 1.20, and the latest Istio version is 1.14.5 (see the table at the bottom of this issue for reference).

Steps To Reproduce Issue

Setup

  1. OS: OpenEuler 23.03
  2. Kubernetes Version: 1.27
  3. Istio Version: 1.19
  4. Kmesh Version: v0.1.0
  5. Backend Services: 2 HTTP Services httpecho-a and httpecho-b
  6. HTTP Client: Netutils

Step 1:
Deploy two backend services httpecho-a and httpecho-b using the below configs

  • Yaml for httpecho-a
# kubectl  apply -f service-a.yaml
apiVersion: v1
kind: Service
metadata:
  name: httpecho-a
  labels:
    app: httpecho-a
    service: httpecho-a
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: httpecho-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpecho-a
  labels:
    app: httpecho-a
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpecho-a
      version: v1
  template:
    metadata:
      labels:
        app: httpecho-a
        version: v1
    spec:
      containers:
      - name: httpecho-a
        image: docker.io/istio/examples-helloworld-v1
        resources:
          requests:
            cpu: "100m"
        imagePullPolicy: IfNotPresent #Always
        ports:
        - containerPort: 5000
  • Yaml for httpecho-b
# kubectl apply -f service-b.yaml
apiVersion: v1
kind: Service
metadata:
  name: httpecho-b
  labels:
    app: httpecho-b
    service: httpecho-b
spec:
  ports:
  - port: 5000
    name: httpb
  selector:
    app: httpecho-b
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpecho-b
  labels:
    app: httpecho-b
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpecho-b
      version: v2
  template:
    metadata:
      labels:
        app: httpecho-b
        version: v2
    spec:
      containers:
      - name: httpecho-b
        image: docker.io/istio/examples-helloworld-v1
        resources:
          requests:
            cpu: "100m"
        imagePullPolicy: IfNotPresent #Always
        ports:
        - containerPort: 5000

Step 2: Deploy netutils client for sending http requests

kubectl run --image=hwchiu/netutils netutils

Send requests to httpecho-a and httpecho-b with the netutils client. Note the difference in pod names in the responses from each service.

$ kubectl exec -it netutils -- curl httpecho-a:5000
Hello V1, routed from pod httpecho-a-gu83s


$ kubectl exec -it netutils -- curl httpecho-b:5000
Hello V1, routed from pod httpecho-b-t7wx5

Step 3: Deploy Virtual Service for L7 Url Routing

This rule should route all requests sent to httpecho-b to httpecho-a instead

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: route-b-to-a
spec:
  hosts:
  - httpecho-b
  http:
  - match:
    - port: 5000
    route:
    - destination:
        host: httpecho-a.default.svc.cluster.local
        port:
          number: 5000

Step 4: Send request to httpecho-b

Note the pod name in the response. If routing works, it should be httpecho-a-gu83s, and NOT httpecho-b-t7wx5.

$ kubectl exec -it netutils -- curl  httpecho-b:5000/hello
Hello V1, routed from pod httpecho-b-t7wx5

If you run the above reproduction steps on Kubernetes 1.20 and Istio 1.19, the request gets successfully routed to httpecho-a.

Additional Context

As part of our investigation, we tried running the same tests on other versions of Istio and Kubernetes. Below is an overview of our findings.

[kmesh_matrix: version compatibility matrix image attached to the original issue]

Support systemd cgroup driver

What would you like to be added:

In enableKmeshControl we write a mark ENABLE_KMESH_MARK = "0x1000" into the net cgroup of a pod. Currently this only works with the cgroupfs driver, but recent Kubernetes releases make systemd the default cgroup driver.
So I suggest we automatically detect which cgroup driver the kubelet uses and set the mark accordingly (a detection sketch is shown below).
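
A heuristic sketch only: it guesses the driver from the pod cgroup directory layout (systemd-managed kubelets use kubepods.slice, cgroupfs uses a plain kubepods directory). Reading the kubelet configuration would be more reliable; the function name and paths are assumptions:

package cgroupdriver

import "os"

// detectSystemdCgroupDriver returns true when the node appears to use the
// systemd cgroup driver, based on the presence of kubepods.slice.
func detectSystemdCgroupDriver() bool {
	// cgroup v1 net_cls hierarchy.
	if _, err := os.Stat("/sys/fs/cgroup/net_cls/kubepods.slice"); err == nil {
		return true
	}
	// cgroup v2 unified hierarchy.
	if _, err := os.Stat("/sys/fs/cgroup/kubepods.slice"); err == nil {
		return true
	}
	return false
}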

Why is this needed:

Support remote L7 traffic management

What would you like to be added:

As discussed, in Kmesh we want to implement a proxy (maybe called a waypoint) that is able to do L7 traffic management, telemetry, and security enforcement. It can be deployed on any node of the Kubernetes cluster or outside of it.

The waypoint can serve a single service, a namespace, or even be shared.

Why is this needed:

Currently Kmesh L7 depends on kernel enhancements, which are coupled to the OS version, and it cannot handle HTTPS, HTTP/2, and gRPC well. So we need a userspace proxy.

Support build kmesh from container

When I first ran build.sh to build Kmesh, it installed a lot of dependencies on my VM. Istio, by contrast, has an image named build-tools: a developer can run make build, make docker, make lint, make gen, and many other commands inside the container. That provides a good contributing experience for new contributors.

So let's provide a build-tools image and support building the Kmesh binary and image from the container.

Unable to compile Kmesh

Testing on openEuler 22.03 SP1 with the kernel already modified successfully, but running ./build.sh -b produces the following error:

BUILD   kmesh-daemon
# oncn.io/mesh/pkg/bpf
In file included from pkg/bpf/bpf_kmesh.go:23:
/root/kmesh/bpf/kmesh/include/kmesh_common.h:56:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma'
   56 | } outer_map SEC(".maps");
      | ^~~
/root/kmesh/bpf/kmesh/include/kmesh_common.h:56:1: error: expected identifier or '(' before '#pragma'
/root/kmesh/bpf/kmesh/include/kmesh_common.h:64:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma'
   64 | } inner_map SEC(".maps");
      | ^~~
/root/kmesh/bpf/kmesh/include/kmesh_common.h:64:1: error: expected identifier or '(' before '#pragma'
/root/kmesh/bpf/kmesh/include/kmesh_common.h: In function 'kmesh_get_ptr_val':
/root/kmesh/bpf/kmesh/include/kmesh_common.h:128:46: error: 'outer_map' undeclared (first use in this function)
  128 |  inner_map_instance = kmesh_map_lookup_elem(&outer_map, &idx);
      |                                              ^~~~~~~~~
/root/kmesh/bpf/kmesh/include/kmesh_common.h:128:46: note: each undeclared identifier is reported only once for each function it appears in
make: *** [Makefile:42: all] Error 2

Is this a version problem with one of the dependencies? How should it be fixed?

Feature: decouple dependency on kubeconfig file

Currently the Kmesh daemon needs a kubeconfig file path mounted at startup, and kmesh-cni has a similar requirement.

On real Kubernetes nodes there is normally no kubeconfig file available for ordinary pods to use.

For the Kmesh daemon, we can use the mounted service account to communicate with kube-apiserver (see the sketch below). For kmesh-cni, the Kmesh daemon should generate a kubeconfig file from the service account and pass its path to kmesh-cni.

Refer to Istio: https://github.com/istio/istio/blob/master/manifests/charts/istio-cni/templates/configmap-cni.yaml#L28-L31
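
A minimal sketch of the daemon side, assuming client-go: the in-cluster config is built from the mounted service account, so no kubeconfig file is needed (the kubeconfig generation for kmesh-cni is not shown):

package kubeclient

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newInClusterClient builds a Kubernetes client from the service account token
// and CA mounted at /var/run/secrets/kubernetes.io/serviceaccount.
func newInClusterClient() (kubernetes.Interface, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	return clientset, nil
}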

Missing null pointer check in ads_loader.go

What happened:
Missing null pointer check in ads_loader.go
e.g.:

func newApiSocketAddress(address *config_core_v3.Address) *core_v2.SocketAddress {
	var addr *config_core_v3.SocketAddress
	switch address.GetAddress().(type) {
	case *config_core_v3.Address_SocketAddress:
		addr = address.GetSocketAddress()
	default:
		return nil
	}
	if addr == nil || !nets.GetConfig().IsEnabledProtocol(addr.GetProtocol().String()) {
		return nil
	}
	return &core_v2.SocketAddress{
		// Protocol: core_v2.SocketAddress_Protocol(addr.GetProtocol()),
		Port: nets.ConvertPortToBigEndian(addr.GetPortValue()),
		Ipv4: nets.ConvertIpToUint32(addr.GetAddress()),
	}
}

What you expected to happen:
Add nil pointer checks in ads_loader.go; a sketch of the fix is shown below.
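
An illustrative sketch of the requested fix; the only change from the snippet above is the explicit nil guard on the address argument (generated Get* methods already tolerate a nil receiver, so the guard mainly makes the intent obvious):

func newApiSocketAddress(address *config_core_v3.Address) *core_v2.SocketAddress {
	// Guard against a nil Address before inspecting its oneof field.
	if address == nil {
		return nil
	}
	var addr *config_core_v3.SocketAddress
	switch address.GetAddress().(type) {
	case *config_core_v3.Address_SocketAddress:
		addr = address.GetSocketAddress()
	default:
		return nil
	}
	if addr == nil || !nets.GetConfig().IsEnabledProtocol(addr.GetProtocol().String()) {
		return nil
	}
	return &core_v2.SocketAddress{
		Port: nets.ConvertPortToBigEndian(addr.GetPortValue()),
		Ipv4: nets.ConvertIpToUint32(addr.GetAddress()),
	}
}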

Environment:

  • Kmesh version:0.2.0
  • Others:

Split manifest out from ./build

The target is to separate the deployment manifests from the build scripts; in the long run we may need to support deploying Kmesh using Helm or other tools like Kustomize.

Support label or annotate the pod to indicate that Kmesh is in charge of traffic management.

What would you like to be added:
Support label or annotate the pod to indicate that Kmesh is in charge of traffic management.

Why is this needed:
Kmesh supports collaborating with an existing mesh data plane. Consider the following scenario: a namespace has already been injected with a sidecar, and then the Kmesh data plane is injected into this namespace.

According to the design, existing pods in the namespace will continue to have their traffic governed by the sidecar. However, the traffic of new pods created in the namespace will be taken over by Kmesh, even though a new pod will still have a sidecar created (see the diagram in the original issue).

In this situation, there needs to be a way to tell the operations team which pods' traffic is being managed by Kmesh; otherwise, confusion may arise.

Kmesh v0.2.0 needs to be restarted for configuration changes to take effect (using Istio 1.19 and Kubernetes 1.27)

What happened:

With Kmesh v0.2.0 already running, I deployed a VirtualService for L7 URL routing. Below are my two observations:

  1. The exported bpf config was NOT updated with the routing rule
  2. The request did NOT get routed as expected.

However, after restarting Kmesh, the following was observed

  1. The exported bpf config was updated with the routing rule
  2. The request got routed as expected.

For reference, see these comments from another GitHub issue:

#133 (comment)
#133 (comment)

What you expected to happen:

After deploying the virtual service, the bpf map should get updated and the routing should take effect WITHOUT having to restart Kmesh

How to reproduce it (as minimally and precisely as possible):

Run the below steps with Kmesh v0.2.0 already running

Step 1

Deploy 2 Backend Services httpecho-a and httpecho-b

  • yaml for service httpecho-a
# kubectl  apply -f service-a.yaml
apiVersion: v1
kind: Service
metadata:
  name: httpecho-a
  labels:
    app: httpecho-a
    service: httpecho-a
spec:
  ports:
  - port: 5000
    name: http
  selector:
    app: httpecho-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpecho-a
  labels:
    app: httpecho-a
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpecho-a
      version: v1
  template:
    metadata:
      labels:
        app: httpecho-a
        version: v1
    spec:
      containers:
      - name: httpecho-a
        image: docker.io/istio/examples-helloworld-v1
        resources:
          requests:
            cpu: "100m"
        imagePullPolicy: IfNotPresent #Always
        ports:
        - containerPort: 5000
  • yaml for httpecho-b
# kubectl apply -f service-b.yaml
apiVersion: v1
kind: Service
metadata:
  name: httpecho-b
  labels:
    app: httpecho-b
    service: httpecho-b
spec:
  ports:
  - port: 5000
    name: httpb
  selector:
    app: httpecho-b
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpecho-b
  labels:
    app: httpecho-b
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpecho-b
      version: v2
  template:
    metadata:
      labels:
        app: httpecho-b
        version: v2
    spec:
      containers:
      - name: httpecho-b
        image: docker.io/istio/examples-helloworld-v1
        resources:
          requests:
            cpu: "100m"
        imagePullPolicy: IfNotPresent #Always
        ports:
        - containerPort: 5000

Step 2: Deploy netutils client for sending http requests

kubectl run --image=hwchiu/netutils netutils

Send requests to httpecho-a and httpecho-b with the netutils client. Note the difference in pod names in the responses from each service.

$ kubectl exec -it netutils -- curl httpecho-a:5000
Hello V1, routed from pod httpecho-a-gu83s


$ kubectl exec -it netutils -- curl httpecho-b:5000
Hello V1, routed from pod httpecho-b-t7wx5

Step 3: Deploy Virtual Service for L7 Url Routing

This rule should route all requests sent to httpecho-b to httpecho-a instead

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: route-b-to-a
spec:
  hosts:
  - httpecho-b
  http:
  - match:
    - port: 5000
    route:
    - destination:
        host: httpecho-a.default.svc.cluster.local
        port:
          number: 5000

Step 4: Send request to httpecho-b

Note the pod name in the response. If routing works, it should be httpecho-a-gu83s, and NOT httpecho-b-t7wx5.

$ kubectl exec -it netutils -- curl  httpecho-b:5000/hello
Hello V1, routed from pod httpecho-b-t7wx5

Also, export the bpf config using curl GET /bpf/kmesh/maps:15200. The newly added rule is NOT a part of the bpf config.

Step 5: Restart Kmesh and run the above command once again.

The request gets routed to httpecho-a

$ kubectl exec -it netutils -- curl  httpecho-b:5000/hello
Hello V1, routed from pod httpecho-a-gu83s

Export the bpf config again using curl GET /bpf/kmesh/maps:15200. The newly added rule is now part of the exported bpf config.

[bpf_config_image: screenshot of the exported bpf config from the original issue]

The same issue occurs when we try to delete the routing rule: the request keeps getting routed to httpecho-a. Once we restart Kmesh, the request gets routed to httpecho-b again.

Anything else we need to know?:

Environment:

  • Kmesh version: v0.2.0
  • Istio: 1.19.0
  • Kubernetes: 1.27.0

Improve ADS with istiod

The bootstrap config is currently almost static; we can only update the ADS server address, and it cannot communicate with the server over a secure port.

IMO, there are several steps to make ADS more secure:

  • Support secure ads communication with kmesh token
  • Support dynamic node ID generation; the node ID is the identity corresponding to the pod: "nodeType~" + ip + "~" + podName + "." + namespace + "~" + namespace + ".svc.cluster.local" (see the sketch after this list)
  • Bring nonce when ack/nack
  • We do not need to request listeners every time after CDS is handled; IMO we only need to send the request once per stream, otherwise istiod will push duplicate listeners
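
A minimal sketch of building the node ID in the format described above (Istio's "<type>~<ip>~<podName>.<namespace>~<namespace>.svc.cluster.local"); the function name and how the pod metadata is obtained are assumptions:

package nodeid

import "fmt"

// buildNodeID assembles the ADS node identity from pod metadata.
func buildNodeID(nodeType, podIP, podName, namespace string) string {
	return fmt.Sprintf("%s~%s~%s.%s~%s.svc.cluster.local",
		nodeType, podIP, podName, namespace, namespace)
}

// Example: buildNodeID("sidecar", "10.0.0.5", "kmesh-abc12", "kmesh-system")
// returns "sidecar~10.0.0.5~kmesh-abc12.kmesh-system~kmesh-system.svc.cluster.local".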

Refactor: ADS and WDS should share a connection

What would you like to be added:

With the introduction of the workload controller, we almost duplicated the client implementation from ADS. To make it simpler to maintain, we should refactor the client.

Why is this needed:

compile error: /usr/include/bpf/bpf_helpers.h:74:2: error: '#' is not followed by a macro parameter

  1. env:
    [root@localhost kmesh]# uname -r
    5.10.0-60.18.0.50.oe2203.x86_64
    [root@localhost kmesh]# yum info libbpf
    Last metadata expiration check: 22:58:20 ago on Fri 17 Nov 2023 11:45:50 AM CST.
    Installed Packages
    Name : libbpf
    Epoch : 2
    Version : 0.3
    Release : 1.h0.oe2203
    Architecture : x86_64
    Size : 239 k
    Source : libbpf-0.3-1.h0.oe2203.src.rpm
    Repository : @System
    From repo : everything
    Summary : Libbpf library
    URL : https://github.com/libbpf/libbpf
    License : LGPLv2 or BSD
    Description : A mirror of bpf-next linux tree bpf-next/tools/lib/bpf directory plus its
    : supporting header files. The version of the package reflects the version of
    : ABI.

[root@localhost kmesh]#

  2. Compile failed: /usr/include/bpf/bpf_helpers.h:74:2: error: '#' is not followed by a macro parameter
    [root@localhost kmesh]# ./build.sh
    ......
    In file included from ../../include/common.h:28:
    /usr/include/bpf/bpf_helpers.h:74:2: error: '#' is not followed by a macro parameter
    #if GNUC && !clang
    ^
    /usr/include/bpf/bpf_helpers.h:76:2: error: #else after #else
    #else
    ^
    /usr/include/bpf/bpf_helpers.h:80:2: error: #else after #else

  3. After the preceding error is temporarily worked around, the following error message is displayed when compilation continues:
    /home/wcy/code/kmesh/oncn-mda/ebpf_src/sock_redirect.c:23:31: error: use of undeclared identifier 'NULL'
    struct sock_key *redir_key = NULL;
    It looks like some basic header files are not included. Please check it, thanks.

Kmesh as the server-side proxy has a problem when working with an Envoy client

When Kmesh collaborates with Envoy, Kmesh on the client side can bypass Envoy and manage the traffic itself, which greatly improves performance.
However, when Kmesh works on the server side, a problem occurs: if the client sends messages through Envoy, the server side short-circuits its Envoy when Kmesh is present, but the client-side Envoy may be using mTLS encryption, so an error occurs when the server receives the messages.

Kmesh image start failed, cannot connect to istiod

What happened:
When I applied kmesh.yaml, an issue occurred:

[root@master config]# kubectl logs -f -n kmesh-system kmesh-deploy-gcdsn
time="2024-02-04T02:29:14Z" level=error msg="failed to get service istiod in namespace istio-system!" subsys=controller/envoy
time="2024-02-04T02:29:14Z" level=info msg="services "istiod" is forbidden: User "system:serviceaccount:kmesh-system:kmesh" cannot get resource "services" in API group "" in the namespace "istio-system"" subsys=controller/envoy
time="2024-02-04T02:29:14Z" level=info msg="options InitDaemonConfig successful" subsys=manager
time="2024-02-04T02:29:15Z" level=info msg="bpf Start successful" subsys=manager
time="2024-02-04T02:29:35Z" level=error msg="ads StreamAggregatedResources failed, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.0.1:15010: i/o timeout"" subsys=manager

And when applying the fortio pod:

Events:
Type Reason Age From Message


Normal Scheduled 15s default-scheduler Successfully assigned default/fortio-client-deployment-5dd87867f8-gtnkm to master
Warning FailedCreatePodSandBox 13s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9eda64cbd12cc5096acf1f30b6c6a8729cf233a7e1adf1dbcdb2cc74cd98e505" network for pod "fortio-client-deployment-5dd87867f8-gtnkm": networkPlugin cni failed to set up pod "fortio-client-deployment-5dd87867f8-gtnkm_default" network: failed to get k8s client: stat kmesh-cni-kubeconfig: no such file or directory

What you expected to happen:

[root@master config]# kubectl logs -f -n kmesh-system kmesh-deploy-wnpmr
time="2024-02-04T02:26:59Z" level=info msg="options InitDaemonConfig successful" subsys=manager
time="2024-02-04T02:27:00Z" level=info msg="bpf Start successful" subsys=manager
time="2024-02-04T02:27:00Z" level=info msg="controller Start successful" subsys=manager
time="2024-02-04T02:27:00Z" level=info msg="command StartServer successful" subsys=manager
time="2024-02-04T02:27:00Z" level=info msg="start write CNI config\n" subsys="cni installer"
time="2024-02-04T02:27:00Z" level=info msg="kmesh cni use chained\n" subsys="cni installer"
time="2024-02-04T02:27:02Z" level=info msg="Copied /usr/bin/kmesh-cni to /opt/cni/bin." subsys="cni installer"
time="2024-02-04T02:27:02Z" level=info msg="kubeconfig either does not exist or is out of date, writing a new one" subsys="cni installer"
time="2024-02-04T02:27:02Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-02-04T02:27:02Z" level=info msg="command Start cni successful" subsys=manager

How to reproduce it (as minimally and precisely as possible):

kubectl apply -f kmesh.yaml -n kmesh-system

kubectl apply -f fortio-client.yaml in kmesh/test/performance/long_test/config

Anything else we need to know?:

Environment: oe2303

  • Kmesh version:
  • Others:

Removed resources in WDS may contain both Workload and Service types

What happened:

Currently we assume the removed resources are either all pods or all services. This is not right; for more details we can look at how the control plane (istiod) handles it. The current handling is:

func handleDeleteResponse(rsp *service_discovery_v3.DeltaDiscoveryResponse) error {
	var err error
	if strings.Contains(strings.Join(rsp.RemovedResources, ""), "Kubernetes//Pod") {
		// delete as a workload
		if err = RemoveWorkloadResource(rsp.GetRemovedResources()); err != nil {
			log.Errorf("RemoveWorkloadResource failed: %s", err)
		}
	} else {
		// delete as a service
		if err = RemoveServiceResource(rsp.GetRemovedResources()); err != nil {
			log.Errorf("RemoveServiceResource failed: %s", err)
		}
	}
	return err
}
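
An illustrative sketch of per-resource dispatch instead of the all-or-nothing branch above; the helper names mirror the snippet, but the splitting logic is an assumption about the intended fix, not existing code:

func handleDeleteResponseSplit(rsp *service_discovery_v3.DeltaDiscoveryResponse) error {
	var workloads, services []string
	// A single delta response can remove workloads and services at the same time,
	// so classify each removed resource name individually.
	for _, name := range rsp.GetRemovedResources() {
		if strings.Contains(name, "Kubernetes//Pod") {
			workloads = append(workloads, name)
		} else {
			services = append(services, name)
		}
	}
	var err error
	if len(workloads) > 0 {
		if err = RemoveWorkloadResource(workloads); err != nil {
			log.Errorf("RemoveWorkloadResource failed: %s", err)
		}
	}
	if len(services) > 0 {
		if err = RemoveServiceResource(services); err != nil {
			log.Errorf("RemoveServiceResource failed: %s", err)
		}
	}
	return err
}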

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:

kubectl logs reports an error when starting Kmesh via a DaemonSet

On openEuler 23.03, after installing istio-1.19.3 and then installing Kmesh via kubectl apply -f kmesh.yaml, an error appears. The kmesh-deploy DaemonSet can run in the Kubernetes cluster,
but kubectl logs kmesh-deploy-t4zcj shows the following error:

[root@host-192-168-100-230 docker]# kubectl logs kmesh-deploy-t4zcj
time="2023-12-08T08:39:01Z" level=info msg="options InitDaemonConfig successful" subsys=manager
time="2023-12-08T08:39:01Z" level=info msg="bpf Start successful" subsys=manager
time="2023-12-08T08:39:01Z" level=info msg="controller Start successful" subsys=manager
time="2023-12-08T08:39:01Z" level=info msg="command StartServer successful" subsys=manager
panic: runtime error: index out of range [3] with length 0

goroutine 26 [running]:
encoding/binary.littleEndian.Uint32(...)
        /usr/lib/golang/src/encoding/binary/binary.go:80
oncn.io/mesh/pkg/nets.ConvertIpToUint32({0xc00044f8f0?, 0xc0004461b0?})
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/nets/nets.go:34 +0x4e
oncn.io/mesh/pkg/controller/envoy.newApiSocketAddress(0xb?)
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_loader.go:138 +0xf1
oncn.io/mesh/pkg/controller/envoy.newApiClusterLoadAssignment(0xc00047d1a0)
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_loader.go:107 +0x2ad
oncn.io/mesh/pkg/controller/envoy.(*AdsLoader).CreateApiClusterByCds(0xc000500f60, 0x2, 0xc00055f000)
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_loader.go:76 +0x258
oncn.io/mesh/pkg/controller/envoy.(*ServiceEvent).handleCdsResponse(0xc0005a1860, 0xc0000d4d80)
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_event.go:150 +0x393
oncn.io/mesh/pkg/controller/envoy.(*ServiceEvent).processAdsResponse(0xc0005a1860, 0xc0000d4d80)
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_event.go:112 +0x1a9
oncn.io/mesh/pkg/controller/envoy.(*AdsClient).runControlPlane(0xc000430690, {0x1b26a90, 0xc0008cf300})
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_client.go:142 +0x15c
created by oncn.io/mesh/pkg/controller/envoy.(*AdsClient).Run
        /root/kmesh/test/mugen-master/testcases/smoke-test/kmesh/oe_test_service_function/rpmbuild/BUILD/kmesh-0.0.1/pkg/controller/envoy/ads_client.go:160 +0x125
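
The panic above comes from ConvertIpToUint32 calling binary.LittleEndian.Uint32 on an empty slice. A hypothetical defensive variant (the real function lives in pkg/nets; this sketch only illustrates where the length guard belongs):

package netsutil

import (
	"encoding/binary"
	"fmt"
	"net"
)

func convertIpToUint32Safe(ip string) (uint32, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return 0, fmt.Errorf("invalid IP address %q", ip)
	}
	v4 := parsed.To4()
	if v4 == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", ip)
	}
	// binary.LittleEndian.Uint32 panics on slices shorter than 4 bytes, which
	// matches the "index out of range [3] with length 0" panic in the log above.
	return binary.LittleEndian.Uint32(v4), nil
}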

Trying the demo example, I also did not see Kmesh participating in traffic governance.

REQUEST: New membership for LiZhenCheng9527

GitHub Username

LiZhenCheng9527

Organization you are requesting membership in

Kmesh

Requirements

  • I have reviewed the community membership guidelines
  • I have joined in the community mailing group and slack
  • I have watched kmesh-net/kmesh on GitHub to subscribe to updates (my account appears in the watchers list)
  • I have enabled two-factor authentication on my GitHub account
  • I am actively contributing to 1 or more Kmesh subprojects
  • I have spoken to my sponsors ahead of this application, and they have agreed to sponsor my application

Sponsors

@hzxuzhonghu

List of contributions to the Kmesh project

Questions about error return in ads_event.go

Please provide an in-depth description of the question you have:

func (svc *ServiceEvent) handleCdsResponse(rsp *service_discovery_v3.DiscoveryResponse) error {
	var (
		err     error
		cluster = &config_cluster_v3.Cluster{}
	)

	current := sets.New[string]()
	for _, resource := range rsp.GetResources() {
		if err = anypb.UnmarshalTo(resource, cluster, proto.UnmarshalOptions{}); err != nil {
			continue
		}
		current.Insert(cluster.GetName())
		// compare part[0] CDS now
		// Cluster_EDS need compare tow parts, compare part[1] EDS in EDS handler
		apiStatus := core_v2.ApiStatus_UPDATE
		newHash := hash.Sum64String(resource.String())
		if newHash != svc.DynamicLoader.ClusterCache.GetCdsHash(cluster.GetName()) {
			svc.DynamicLoader.ClusterCache.SetCdsHash(cluster.GetName(), newHash)
			log.Debugf("[CreateApiClusterByCds] update cluster %s, status %d, cluster.type %v",
				cluster.GetName(), apiStatus, cluster.GetType())
			svc.DynamicLoader.CreateApiClusterByCds(apiStatus, cluster)
		} else {
			log.Debugf("unchanged cluster %s", cluster.GetName())
		}
	}

	removed := svc.DynamicLoader.ClusterCache.GetResourceNames().Difference(current)
	for key := range removed {
		svc.DynamicLoader.UpdateApiClusterStatus(key, core_v2.ApiStatus_DELETE)
	}

	// TODO: maybe we don't need to wait until all clusters ready before loading, like cluster delete

	if len(svc.DynamicLoader.clusterNames) > 0 {
		svc.rqt = newAdsRequest(resource_v3.EndpointType, svc.DynamicLoader.clusterNames)
		svc.DynamicLoader.clusterNames = nil
	} else {
		svc.DynamicLoader.ClusterCache.Flush()
	}
	return nil
}

In the above code Kmesh processes the CDS response.
However, errors during processing are not returned; the function just returns nil at the end.

func (svc *ServiceEvent) processAdsResponse(rsp *service_discovery_v3.DiscoveryResponse) {
	var err error

	log.Debugf("handle ads response, %#v\n", rsp.GetTypeUrl())

	svc.ack = newAckRequest(rsp)
	if rsp.GetResources() == nil {
		return
	}

	switch rsp.GetTypeUrl() {
	case resource_v3.ClusterType:
		err = svc.handleCdsResponse(rsp)
	case resource_v3.EndpointType:
		err = svc.handleEdsResponse(rsp)
	case resource_v3.ListenerType:
		err = svc.handleLdsResponse(rsp)
	case resource_v3.RouteType:
		err = svc.handleRdsResponse(rsp)
	default:
		err = fmt.Errorf("unsupport type url %s", rsp.GetTypeUrl())
	}

	if err != nil {
		log.Error(err)
	}
}

So in the above function, won't the final error always be nil?

What do you think about this question?:
I don't know if this is a bug, or if there is another way of passing errors in xds (a sketch of propagating the errors is shown below).
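
An illustrative sketch only, showing one way handleCdsResponse could propagate failures instead of always returning nil, by collecting the unmarshal errors; this is not the current Kmesh code (errors.Join requires Go 1.20+):

func (svc *ServiceEvent) handleCdsResponse(rsp *service_discovery_v3.DiscoveryResponse) error {
	var errs []error
	cluster := &config_cluster_v3.Cluster{}
	for _, resource := range rsp.GetResources() {
		if err := anypb.UnmarshalTo(resource, cluster, proto.UnmarshalOptions{}); err != nil {
			// Record the failure instead of silently skipping the resource.
			errs = append(errs, fmt.Errorf("unmarshal cluster failed: %w", err))
			continue
		}
		// ... hash comparison, cache update and removal handling as in the snippet above ...
	}
	// errors.Join returns nil when errs is empty, so callers only see an error
	// when something actually went wrong.
	return errors.Join(errs...)
}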

Environment:

  • Kmesh version: 0.2.0

Commit apis generated from protobuf

What would you like to be added:

In Kmesh we generate some Golang and C API files from the proto definitions. I propose we commit them into this repo and add a CI task to check that these files stay in line with the original protos.

Why is this needed:

These protos are not frequently updated.
Without these api files, we cannot even run go mod tidy successfully.

Improve project readme

  1. The project logo is too large, which wastes page space and results in a poor reading experience.
  2. Project-related tags and badges should be added, e.g. CI, SLSA level, releases, OpenSSF best practices, etc.

mini test can not pass

Previously, Kmesh took effect for all sockets by default. However, this behavior was changed by commit 84c496d: from the Kubernetes perspective, you need to label the Kubernetes namespace so that pods in that namespace are managed by Kmesh. This actually sets the classid in the pod cgroup.

The mini test simplifies the test environment: there is no Kubernetes. Therefore, the test needs to manually create and mark a cgroup, so that sockets created by processes in that cgroup are managed by Kmesh (a sketch is shown below).
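
A minimal sketch, assuming a cgroup v1 net_cls hierarchy mounted at /sys/fs/cgroup/net_cls; the cgroup name and helper are illustrative, and the 0x1000 value mirrors the ENABLE_KMESH_MARK mentioned above:

package minitest

import (
	"fmt"
	"os"
	"path/filepath"
)

// createKmeshTestCgroup creates a net_cls cgroup, marks it so Kmesh manages the
// sockets of processes inside it, and moves the current process into it.
func createKmeshTestCgroup(name string) error {
	dir := filepath.Join("/sys/fs/cgroup/net_cls", name)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(dir, "net_cls.classid"), []byte("0x1000"), 0o644); err != nil {
		return err
	}
	pid := fmt.Sprintf("%d", os.Getpid())
	return os.WriteFile(filepath.Join(dir, "cgroup.procs"), []byte(pid), 0o644)
}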

Refactor CacheFactory and CacheFlush

Why is this needed:

type CacheFactory interface {
	StatusFlush(status core_v2.ApiStatus) int
	StatusDelete(status core_v2.ApiStatus)
	StatusReset(old, new core_v2.ApiStatus)
}

CacheFactory is the interface used to update and delete xds resources received from the xds control plane.

But currently the cache flushing procedure is quite hard to understand, and there are other caveats:

  1. It does not support delta cache flush, which is useful for delta xds (one possible interface extension is sketched after this list).
  2. The underlying implementations of CacheFactory store redundant resources.
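
An illustrative sketch only of one possible extension of CacheFactory with a delta-aware flush; the method name and signature are assumptions, not an agreed design:

type CacheFactory interface {
	StatusFlush(status core_v2.ApiStatus) int
	StatusDelete(status core_v2.ApiStatus)
	StatusReset(old, new core_v2.ApiStatus)
	// DeltaFlush flushes only the named resources, so a delta xds push does not
	// require walking and rewriting the whole cache.
	DeltaFlush(names []string) int
}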

Do we need to add issue templates?

We only get a blank issue form when creating issues. Should we provide templates for the commonly used issue types, like bug report, question, security vulnerability, and so on?

Improve unit test coverage

What would you like to be added:

Currently there are no unit tests; we should add test coverage for each module. Basically I suggest doing this per package.

Everyone can take part in this; please list the packages you are interested in below.

Use Github Actions to Automatically Sync Code Pushed to Github to Gitee

What would you like to be added:
Use Github Actions to Automatically Sync Code Pushed to Github to Gitee.
Why is this needed:
In addition to the code repository on GitHub, the Kmesh project also has a code repository on Gitee. But the Gitee repository lags far behind and is missing a lot of code, and newly released versions have also not been synced to it. Therefore, we hope a GitHub Action can automatically push the new code to Gitee when a version is released.

Why can't I import kmesh's own package?

Please provide an in-depth description of the question you have:
Before contributing to the Kmesh project, I tried to set up the Kmesh development environment. When I executed go mod tidy, I encountered the following error:

kmesh.net/kmesh/api/v2/endpoint: cannot find module providing package kmesh.net/kmesh/api/v2/endpoint: unrecognized import path "kmesh.net/kmesh/api/v2/endpoint": reading https://kmesh.net/kmesh/api/v2/endpoint?go-get=1: 404 Not Found

I tried many methods but none of them resolved it. Do you have any suggestions on how to solve this?
What do you think about this question?:
Maybe there's something I'm not handling correctly that is causing this.
Environment:

  • Kmesh version:0.1.0

End to end tests

What would you like to be added:

This is to improve test coverage of features; we should test each feature in a black-box way. I am not sure whether GitHub Actions runners allow running Kmesh. This may be a blocker, so we should verify it first.

The tasks can be divided as:

  • Provide script or go code to install kubernetes and kmesh
  • Prepare apps and framework that we need for e2e test
  • Write the first test as example
  • Write the other cases one by one

Why is this needed:

make docker failed

What happened:

$ make docker
docker build --build-arg arch=amd64 -f build/docker/kmesh.dockerfile -t  .
ERROR: "docker buildx build" requires exactly 1 argument.
See 'docker buildx build --help'.

Usage:  docker buildx build [OPTIONS] PATH | URL | -

Start a build
make: *** [Makefile:134: docker] Error 1

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kmesh version:
  • Others:
# docker version
Client: Docker Engine - Community
Version:           23.0.3
API version:       1.42
Go version:        go1.19.7
Git commit:        3e7cbfd
Built:             Tue Apr  4 22:06:10 2023
OS/Arch:           linux/amd64
Context:           default

Server: Docker Engine - Community
Engine:
 Version:          23.0.3
 API version:      1.42 (minimum version 1.12)
 Go version:       go1.19.7
 Git commit:       59118bf
 Built:            Tue Apr  4 22:06:10 2023
 OS/Arch:          linux/amd64
 Experimental:     false
containerd:
 Version:          1.6.20
 GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
runc:
 Version:          1.1.5
 GitCommit:        v1.1.5-0-gf19387a
docker-init:
 Version:          0.19.0
 GitCommit:        de40ad0
