Comments (10)
It should work without resource requests (i.e. without asking the SR-IOV Device Plugin to select VFs). In that case the sriov CNI tries to assign a random VF to the Pod, if there are enough free VFs of the PF on that host.
Can you please provide the exact error message?
I guess the Pod was scheduled to a wrong host where SR-IOV VFs were not prepared in advance.
from danm.
I have tested it with two YAML files.
The first one (Pod is created successfully):
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: fakefhup
  namespace: cran1
spec:
  selector:
    matchLabels:
      name: fakefhup
  template:
    metadata:
      annotations:
        danm.k8s.io/interfaces: |
        danm.k8s.io/interfaces: |
          [
            { "network":"internal" },
            { "network":"bip" },
            { "network":"intmsg", "ip":"dynamic" },
            { "network":"fronthaulcu", "ip":"dynamic" }
          ]
      labels:
        name: fakefhup
    spec:
      dnsPolicy: ClusterFirst
      #nodeSelector:
      #  nodetype: caas_master
      containers:
      - name: fakefhup
        image: registry.kube-system.svc.nokia.net:5555/rcp/centos:7
        command: ['sh', '-c', 'while true; do echo Hello Kubernetes! && sleep 100;done']
        resources:
          requests:
            nokia.k8s.io/sriov_ens1f1: '1' #one P1 ens1f1 physical NIC based SR-IOV VF requested for bip
            nokia.k8s.io/sriov_ens11f1: '1' #one P1 ens11f1 physical NIC based SR-IOV VF requested for fronthaulcu
          limits:
            nokia.k8s.io/sriov_ens1f1: '1' # keep the same value as request
            nokia.k8s.io/sriov_ens11f1: '1' # keep the same value as request
The other one (Pod fails to be created):
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: fakefhup
  namespace: cran1
spec:
  selector:
    matchLabels:
      name: fakefhup
  template:
    metadata:
      annotations:
        danm.k8s.io/interfaces: |
        danm.k8s.io/interfaces: |
          [
            { "network":"internal" },
            { "network":"bip" },
            { "network":"intmsg", "ip":"dynamic" },
            { "network":"fronthaulcu", "ip":"dynamic" }
          ]
      labels:
        name: fakefhup
    spec:
      dnsPolicy: ClusterFirst
      #nodeSelector:
      #  nodetype: caas_master
      containers:
      - name: fakefhup
        image: registry.kube-system.svc.nokia.net:5555/rcp/centos:7
        command: ['sh', '-c', 'while true; do echo Hello Kubernetes! && sleep 100;done']
        #resources:
        #  requests:
        #    nokia.k8s.io/sriov_ens1f1: '1' #one P1 ens1f1 physical NIC based SR-IOV VF requested for bip
        #    nokia.k8s.io/sriov_ens11f1: '1' #one P1 ens11f1 physical NIC based SR-IOV VF requested for fronthaulcu
        #  limits:
        #    nokia.k8s.io/sriov_ens1f1: '1' # keep the same value as request
        #    nokia.k8s.io/sriov_ens11f1: '1' # keep the same value as request
The kubelet error log is:
Warning FailedCreatePodSandBox 32s kubelet, 192.168.87.22 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c199bde6c16e3e7ba9e029af70f8ed4a4bf47b6370bb5c170998f447bea8f205" network for pod "fakefhup-92s7b": NetworkPlugin cni failed to set up pod "fakefhup-92s7b_cran1" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
We have prepared enough VFs on all hosts.
from danm.
It seems like a bug.
Can you please share the DanmNet definition for the failing case? Thanks.
Anyway, why don't you want to use resource requests? (as that is the preferred method)
from danm.
The DanmNet definitions are as follows:
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: internal
  namespace: cran1
spec:
  NetworkID: internal
  NetworkType: flannel
  Options:
    allocation_pool:
      end: ""
      start: ""
    container_prefix: ""
    host_device: ""
    rt_tables: 254
---
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: intmsg
  namespace: cran1
spec:
  NetworkID: intmsg
  NetworkType: ipvlan
  Options:
    host_device: "eno1"
    cidr: "192.168.2.0/24"
    allocation_pool:
      start: "192.168.2.5"
      end: "192.168.2.100"
    container_prefix: "intmsg"
    rt_tables: 0
    vlan: 2
---
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: bip
  namespace: cran1
spec:
  NetworkID: bip
  NetworkType: sriov
  Options:
    allocation_pool:
      start: ""
      end: ""
    host_device: "ens1f1"
    container_prefix: "bip"
    vlan: 717
    device_pool: "xxx.k8s.io/sriov_ens1f1" # I hid the device pool name with xxx
from danm.
annotations:
danm.k8s.io/interfaces: |
danm.k8s.io/interfaces: |
[
Is this a typo in your comment, or an issue in your manifest?
from danm.
I ran the test again, and the issue is the same as before.
The two networks below are used: one is flannel, the other is SR-IOV.
[root@controller-3 YAML]# kubectl get danmnet internal -n=cran2 -o yaml
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  creationTimestamp: "2019-05-15T14:24:55Z"
  generation: 2
  name: internal
  namespace: cran2
  resourceVersion: "39274"
  selfLink: /apis/danm.k8s.io/v1/namespaces/cran2/danmnets/internal
  uid: 362a366f-771d-11e9-9606-d8c497cf132e
spec:
  NetworkID: internal
  NetworkType: flannel
  Options:
    allocation_pool:
      end: ""
      start: ""
    container_prefix: ""
    host_device: ""
    rt_tables: 254
  Validation: "True"
[root@controller-3 YAML]# kubectl get danmnet fronthaulmanagement -n=cran2 -o yaml
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  creationTimestamp: "2019-05-16T07:13:18Z"
  generation: 26
  name: fronthaulmanagement
  namespace: cran2
  resourceVersion: "406248"
  selfLink: /apis/danm.k8s.io/v1/namespaces/cran2/danmnets/fronthaulmanagement
  uid: 1481c4af-77aa-11e9-a2be-d8c497cf1308
spec:
  NetworkID: fronthaulmanagement
  NetworkType: sriov
  Options:
    alloc: gQ==
    allocation_pool:
      end: 10.70.31.38
      start: 10.70.31.34
    cidr: 10.70.31.32/29
    container_prefix: fhm
    device_pool: nokia.k8s.io/sriov_ens11f1
    host_device: ens11f1
    rt_tables: 0
    vlan: 705
  Validation: "True"
First, I used the YAML below to create the Pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zhoutong-pod
  namespace: cran2
spec:
  selector:
    matchLabels:
      app: pod-test
  replicas: 1
  template:
    metadata:
      labels:
        app: pod-test
      annotations:
        danm.k8s.io/interfaces: |
          [
            {
              "network":"internal"
            },
            {
              "network":"fronthaulmanagement"
            }
          ]
    spec:
      hostNetwork: false
      nodeSelector:
        nodename: caas_master1
      containers:
      - name: zhoutong-container
        image: registry.kube-system.svc.nokia.net:5555/rcp/centos:7
        command: ['/bin/bash']
        imagePullPolicy: IfNotPresent
        stdin: true
        tty: true
      restartPolicy: Always
As a result, the Pod cannot start up. The kubelet error logs are below:
E0520 02:32:02.002353 101682 cni.go:331] Error adding cran2_zhoutong-pod-5fb64758d-w5wn6/93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445 to network danm/meta_cni: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0520 02:32:02.446638 101682 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445" network for pod "zhoutong-pod-5fb64758d-w5wn6": NetworkPlugin cni failed to set up pod "zhoutong-pod-5fb64758d-w5wn6_cran2" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0520 02:32:02.446720 101682 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "zhoutong-pod-5fb64758d-w5wn6_cran2(4f456f2b-7aa7-11e9-a2be-d8c497cf1308)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445" network for pod "zhoutong-pod-5fb64758d-w5wn6": NetworkPlugin cni failed to set up pod "zhoutong-pod-5fb64758d-w5wn6_cran2" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0520 02:32:02.446768 101682 kuberuntime_manager.go:693] createPodSandbox for pod "zhoutong-pod-5fb64758d-w5wn6_cran2(4f456f2b-7aa7-11e9-a2be-d8c497cf1308)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445" network for pod "zhoutong-pod-5fb64758d-w5wn6": NetworkPlugin cni failed to set up pod "zhoutong-pod-5fb64758d-w5wn6_cran2" network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input
E0520 02:32:02.446870 101682 pod_workers.go:190] Error syncing pod 4f456f2b-7aa7-11e9-a2be-d8c497cf1308 ("zhoutong-pod-5fb64758d-w5wn6_cran2(4f456f2b-7aa7-11e9-a2be-d8c497cf1308)"), skipping: failed to "CreatePodSandbox" for "zhoutong-pod-5fb64758d-w5wn6_cran2(4f456f2b-7aa7-11e9-a2be-d8c497cf1308)" with CreatePodSandboxError: "CreatePodSandbox for pod \"zhoutong-pod-5fb64758d-w5wn6_cran2(4f456f2b-7aa7-11e9-a2be-d8c497cf1308)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445\" network for pod \"zhoutong-pod-5fb64758d-w5wn6\": NetworkPlugin cni failed to set up pod \"zhoutong-pod-5fb64758d-w5wn6_cran2\" network: netplugin failed but error parsing its diagnostic message \"\": unexpected end of JSON input"
W0520 02:32:03.294604 101682 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "zhoutong-pod-5fb64758d-w5wn6_cran2": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445"
W0520 02:32:03.337761 101682 pod_container_deletor.go:75] Container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445" not found in pod's containers
W0520 02:32:03.341558 101682 cni.go:309] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "93d2cd0cc9fd8bd8615caf6204a8b2bc6df78efa08e8c2804579def220d8b445"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xf5fdaa]
goroutine 1 [running]:
main.getAllocatedDevices(0xc000318750, 0x130b540, 0xc00011d860, 0xc00011eec0, 0x1a, 0xc000476b00, 0x4, 0xc00031c000)
/build/src/github.com/nokia/danm/pkg/danm/danm.go:194 +0x6a
main.setupNetworking(0xc000318750, 0x0, 0x0, 0x6)
/build/src/github.com/nokia/danm/pkg/danm/danm.go:233 +0x897
main.createInterfaces(0xc00032acb0, 0x1184102, 0x5)
/build/src/github.com/nokia/danm/pkg/danm/danm.go:82 +0x4ff
github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc000121ef8, 0xc00032acb0, 0x13191e0, 0xc000100540, 0x1201ab8, 0x0, 0x130d580)
/build/src/github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:162 +0x259
github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc000121ef8, 0x1201ab8, 0x1201ac0, 0x13191e0, 0xc000100540, 0x42d231)
/build/src/github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:173 +0x32e
github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)
/build/src/github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:210
github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel.PluginMain(0x1201ab8, 0x1201ac0, 0x13191e0, 0xc000100540)
/build/src/github.com/nokia/danm/pkg/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:222 +0xf3
main.main()
/build/src/github.com/nokia/danm/pkg/danm/danm.go:477 +0x8c
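One plausible reading of the logs above: the plugin panicked before it could print anything on stdout, so the caller had an empty string to decode as a JSON error object. The sketch below reproduces only that decoding step. The `pluginError` type and the `parseDiagnostic` helper are hypothetical names for illustration, and the assumption that the caller decodes stdout this way is inferred from the wording of the log message, not taken from the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginError is a hypothetical stand-in for the JSON error object a CNI
// plugin is expected to print on stdout when it fails.
type pluginError struct {
	Code uint   `json:"code"`
	Msg  string `json:"msg"`
}

// parseDiagnostic mimics how a caller might decode a failed plugin's stdout.
// The error text is modelled on the kubelet log line quoted in this thread.
func parseDiagnostic(stdout []byte) (pluginError, error) {
	var perr pluginError
	if err := json.Unmarshal(stdout, &perr); err != nil {
		return pluginError{}, fmt.Errorf(
			"netplugin failed but error parsing its diagnostic message %q: %v",
			string(stdout), err)
	}
	return perr, nil
}

func main() {
	// A plugin that panics writes its trace to stderr and nothing to stdout.
	_, err := parseDiagnostic([]byte(""))
	fmt.Println(err)

	// A plugin that fails gracefully prints a JSON error object instead.
	perr, err := parseDiagnostic([]byte(`{"code": 100, "msg": "no devices allocated"}`))
	fmt.Println(perr.Msg, err)
}
```

With empty input, `json.Unmarshal` reports "unexpected end of JSON input", which matches the tail of the kubelet message.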
Then I used the YAML below to create the Pod. This works and the Pod is running:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zhoutong-pod
  namespace: cran2
spec:
  selector:
    matchLabels:
      app: pod-test
  replicas: 1
  template:
    metadata:
      labels:
        app: pod-test
      annotations:
        danm.k8s.io/interfaces: |
          [
            {
              "network":"internal"
            },
            {
              "network":"fronthaulmanagement"
            }
          ]
    spec:
      hostNetwork: false
      nodeSelector:
        nodename: caas_master1
      containers:
      - name: zhoutong-container
        image: registry.kube-system.svc.nokia.net:5555/rcp/centos:7
        command: ['/bin/bash']
        imagePullPolicy: IfNotPresent
        stdin: true
        tty: true
        resources:
          requests:
            nokia.k8s.io/sriov_ens11f1: '1'
          limits:
            nokia.k8s.io/sriov_ens11f1: '1'
      restartPolicy: Always
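The only difference between the failing and the working manifest is the resource request for the network's device_pool. A minimal sketch of a pre-flight check for that mismatch follows. The `Network` struct and the `missingRequests` function are hypothetical and are not part of DANM; they model only the fields shown in this thread.

```go
package main

import "fmt"

// Network is a hypothetical, trimmed-down view of a DanmNet: only the fields
// needed for this check.
type Network struct {
	Name       string
	Type       string // e.g. "flannel", "ipvlan", "sriov"
	DevicePool string // e.g. "nokia.k8s.io/sriov_ens11f1"; empty if unused
}

// missingRequests returns the device pools that the Pod's SR-IOV networks
// refer to but that have no positive count in the Pod's resource requests.
func missingRequests(nets []Network, requests map[string]int) []string {
	var missing []string
	for _, n := range nets {
		if n.Type != "sriov" || n.DevicePool == "" {
			continue // only device-pool backed SR-IOV networks need a request
		}
		if requests[n.DevicePool] < 1 {
			missing = append(missing, n.DevicePool)
		}
	}
	return missing
}

func main() {
	nets := []Network{
		{Name: "internal", Type: "flannel"},
		{Name: "fronthaulmanagement", Type: "sriov", DevicePool: "nokia.k8s.io/sriov_ens11f1"},
	}

	// First manifest: no resources section at all.
	fmt.Println(missingRequests(nets, map[string]int{}))

	// Second manifest: one VF requested from the pool.
	fmt.Println(missingRequests(nets, map[string]int{"nokia.k8s.io/sriov_ens11f1": 1}))
}
```

This simplified check does not handle a Pod that attaches to two networks sharing one pool, where the requested count would need to be at least two.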
from danm.
Yep, there is definitely an error in the DANM code. I mean it cores, so :)
But besides improving the error handling for this scenario, the problem is that I really don't think DANM and the SR-IOV CNI should try to allocate anything on their own in a Device Plugin managed setup.
Let's say we expose 8 VFs from a PF. From that point onward, those 8 VFs are managed by the Device Manager inside kubelet, and NOT by the DP.
If we automatically allocated a VF to a Pod that did not specify resource requests, Device Manager would never know about this allocation happening behind its back.
As a result, it would continue to advertise 8 VFs worth of capacity when in reality only 7 are left.
So, yes, the resource request is actually mandatory. We just need to handle this scenario gracefully in the DANM code and return an explicit error rather than core.
from danm.
Yes, I agree with you. Please update the README to indicate that resource requests in the Pod YAML are mandatory.
from danm.
Definitely. I will keep this issue open to track both the documentation update and the improvement of the error-handling code.
from danm.
I'm fairly sure that the underlying code issue is fixed by #119.
Documentation was also updated: 9cd7db7
So I think this issue can be considered done.
from danm.