Comments (20)
Quite agree with Levo, it seems the problem comes from the configuration file for Calico. In my opinion, at least "etcd_endpoints", "etcd_key_file", "etcd_cert_file" and "etcd_ca_cert_file" are needed.
from danm.
Updated dn so metadata.name matches spec.NetworkID -- didn't know they have to match
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: calico-mgmt
  namespace: example-sriov
spec:
  NetworkID: calico-mgmt
  NetworkType: calico
The Pod still failed to start, but with a new error:
Warning FailedCreatePodSandBox 7s kubelet, mtx-huawei2-bld02 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "834f6e9a1d195a6d410a3e39d1ddb8333d71874801414ce84ba2c04b492086bf" network for pod "sriov-pod": NetworkPlugin cni failed to set up pod "sriov-pod_example-sriov" network: CNI network could not be set up: CNI operation for network:calico-mgmt failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:calico because:OS exec call failed:no etcd endpoints specified
from danm.
glad you really decided to try DANM :)
yes, generally speaking Calico should work, I think we had multiple users successfully using it in the past
in your case it is simply a typo:
name: cali-mgmt
danm.k8s.io/interfaces: |
  [
    {"network":"calico-mgmt", "ip":"dynamic"}
  ]
from danm.
ah sorry, only saw the update now. I'm still on my morning coffee :)
NetworkID and name: they don't need to match, but you need to provide the name of the network in the connection definition section of your Pod manifest. However you can name your networks anything!
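The naming rule above can be checked before a Pod is created. This is only a minimal sketch: it assumes the annotation is the JSON list shown in this thread and that the DanmNet names of the namespace are collected by hand. The helper `missing_networks` is hypothetical and not part of DANM.

```python
import json

def missing_networks(interfaces_annotation, danmnet_names):
    """Return the network names the Pod asks for that no DanmNet provides."""
    requested = [entry["network"] for entry in json.loads(interfaces_annotation)]
    return [name for name in requested if name not in danmnet_names]

annotation = '[{"network": "calico-mgmt", "ip": "dynamic"}]'

# The DanmNet was created as "cali-mgmt", so the Pod's reference is dangling.
print(missing_networks(annotation, {"cali-mgmt"}))    # ['calico-mgmt']

# Once the names agree, nothing is missing.
print(missing_networks(annotation, {"calico-mgmt"}))  # []
```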
and for the error: it is thrown by the Calico code, after DANM has delegated the operation. I guess Calico expects some configuration to be present in its backend which is missing. But I confess I'm not that big of a Calico expert, so not sure exactly what's missing.
But for sure the error is not coming from DANM.
summoning @rospring and @clivez , AFAIK they have some Calico experience: guys, any idea what could be the issue here?
from danm.
after some doc reading:
https://docs.projectcalico.org/v3.5/usage/calicoctl/configure/etcd
I guess you are missing the ETCD_ENDPOINTS environment variable, or config file option so the Calico CNI cannot find its own backend
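A rough way to see which datastore settings a Calico CNI config file supplies is sketched below. It rests on assumptions: that Calico falls back to an etcd datastore when "datastore_type" is absent, and that only the file is inspected (an ETCD_ENDPOINTS environment variable is not considered). The function name `missing_datastore_settings` is made up for illustration.

```python
def missing_datastore_settings(conf):
    """List datastore keys that a Calico CNI config dict does not provide."""
    # Assumption: without "datastore_type" Calico expects an etcd backend.
    datastore = conf.get("datastore_type", "etcdv3")
    if datastore == "kubernetes":
        has_kubeconfig = "kubeconfig" in conf.get("kubernetes", {})
        return [] if has_kubeconfig else ["kubernetes.kubeconfig"]
    return [] if conf.get("etcd_endpoints") else ["etcd_endpoints"]

# No datastore keys at the top level: the etcd endpoints are reported missing.
print(missing_datastore_settings({"type": "calico"}))  # ['etcd_endpoints']

# Kubernetes datastore with a kubeconfig: nothing is reported.
print(missing_datastore_settings({
    "type": "calico",
    "datastore_type": "kubernetes",
    "kubernetes": {"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"},
}))  # []
```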
from danm.
Thank you guys. Without DANM, Calico has been working on the k8s cluster as the overlay network, I just reused / renamed its config file to calico-mgmt.conf for danm, so wasn't sure why / where to add the additional config info when it's used as a delegate?
(btw, I've used calico with multus, reusing the same config file, didn't have this kind of issue...)
from danm.
Hmm, interesting. We need to go deeper then :)
Two things come to my mind:
- can you share with us how the etcd store is configured for Calico in your cluster? Is it through environment variables, or via config file / ConfigMap?
- can you try it with a CNI config file which purely contains Calico's config? the current one has plugin chaining which we don't really do, as we have a 1:1 mapping of interfaces and CNI delegation operations.
That might be the root cause
from danm.
I followed kubeadm doc to apply calico on k8s,
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Back then it used 2 files,
https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
In calico daemonset it specified k8s for datastore, not etcd.
# Use Kubernetes API as the backing datastore.
- name: DATASTORE_TYPE
  value: "kubernetes"
When running multus with calico, I used the same option, "datastore_type": "kubernetes",
cat /etc/cni/net.d/05-multus.conf
{
  "name": "multus-cni-network",
  "type": "multus",
  "delegates": [
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.0",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "mtx-huawei2-bld08",
          "mtu": 1440,
          "ipam": {
            "type": "host-local",
            "subnet": "usePodCidr"
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }
  ],
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig"
}
So DANM's way of delegating with calico is more restricted?
Thanks. -Jessica
from danm.
well, when handling static delegates you can say it like that. we don't support chaining together plugins, because chaining is usually simply not needed.
so, the question arises: what is the "portmap" CNI even used for? :)
Until now we never had a customer who needed the "standard" plugin chaining CNI feature to get something done, simply because we can configure all the features required by a user through our user-friendly management API. So, because we have something better, we don't do the less flexible approach of customizing interface provisioning.
If you tell me what the portmapping CNI is required for, I might give you an alternative which you only need to configure in the dynamic network management API, and not in static files.
Alternatively we can also support chaining if required.
When it comes to dynamic delegates everything is configured through the same dynamic, centralized REST API. Therefore I would say these delegates are actually way less restrictive than sticking to the component specific static CNI files.
So, trying to come up with some takeaways, and next steps:
- do you really need chaining, or is this just the default provisioning and "portmapping" is not really required?
- if it is required, maybe we already have a dynamically configurable feature substituting it in a friendlier way
- if not, maybe we can develop one :)
- or support chaining, if absolutely required for your use-case!
- but please try it out first with a CNI config which is not a chained one (i.e. without "plugins", only containing the Calico CNI config), because it is still just a hunch that the chaining is the root cause of your issue
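The last suggestion, trying a config that is not chained, can be sketched as a small conversion. It assumes the chained file has the usual "name", "cniVersion" and "plugins" keys seen in this thread; the helper name `unchain` is hypothetical and is not a DANM or Calico tool.

```python
import json

def unchain(conflist, plugin_type):
    """Build a single-plugin CNI config from a chained one.

    Copies the chosen entry of "plugins" and lifts the list-level
    "name" and "cniVersion" into it; the other plugins are dropped.
    """
    for plugin in conflist.get("plugins", []):
        if plugin.get("type") == plugin_type:
            single = {"name": conflist["name"], "cniVersion": conflist["cniVersion"]}
            single.update(plugin)
            return single
    raise ValueError(f"no plugin of type {plugin_type!r} in the chain")

chained = {
    "name": "k8s-pod-network",
    "cniVersion": "0.3.0",
    "plugins": [
        {"type": "calico", "datastore_type": "kubernetes"},
        {"type": "portmap", "snat": True, "capabilities": {"portMappings": True}},
    ],
}

# Prints a flat config with "type": "calico" and no "plugins" list.
print(json.dumps(unchain(chained, "calico"), indent=2))
```

Dropping "portmap" this way also drops hostPort support, so it only fits clusters whose Pods do not rely on it.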
from danm.
The portmap cni was there by default in calico-config, not sure why, k8s doc just says it's required to support hostPort. our apps don't use that. Gonna remove and see.
Thanks. -Jessica
from danm.
Thank you sir, worked w/o cni chaining,
cat calico-mgmt.conf
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.0",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "nodename": "mtx-huawei2-bld08",
  "mtu": 1440,
  "ipam": {
    "type": "host-local",
    "subnet": "usePodCidr"
  },
  "policy": {
    "type": "k8s"
  },
  "kubernetes": {
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  }
}
Gonna move on to add sriov network.
Thanks. -Jessica
ps. will soon be away for 2 weeks
from danm.
Cool!
We are not running anywhere, no worries :) Feel free to open follow-up issues if you encounter anything out of the ordinary during your SR-IOV trial!
from danm.
Please let me know if I should put this in a new issue.
Continuing to follow example/device_plugin_demo:
$ cat sriov_net.yaml
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: calico-mgmt
  namespace: example-sriov
spec:
  NetworkID: calico-mgmt
  NetworkType: calico
---
apiVersion: danm.k8s.io/v1
kind: DanmNet
metadata:
  name: sriov-a
  namespace: example-sriov
spec:
  NetworkID: sriov-a
  NetworkType: sriov
  Options:
    device_pool: "intel.com/sriov_net_A"
    container_prefix: data_net
    rt_tables: 250
    vlan: 300
    cidr: 10.100.20.0/24
    allocation_pool:
      start: 10.100.20.10
      end: 10.100.20.100
$ cat sriov_pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-pod
  namespace: example-sriov
  labels:
    env: test
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"network":"calico-mgmt", "ip":"dynamic"},
        {"network":"sriov-a", "ip":"none"}
      ]
spec:
  containers:
  - name: sriov-pod
    image: busybox:latest
    args:
    - sleep
    - "1000"
    resources:
      requests:
        intel.com/sriov_net_A: '1'
      limits:
        intel.com/sriov_net_A: '1'
  nodeSelector:
    sriov: enabled
Events:
Type Reason Age From Message
Normal Scheduled 4s default-scheduler Successfully assigned example-sriov/sriov-pod to mtx-huawei2-bld03
Warning FailedCreatePodSandBox 1s kubelet, mtx-huawei2-bld03 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b91e13fddfdcfeb7a421efbb1b592f24fe2ec5ebdf2862a25ddcff6a78c139af" network for pod "sriov-pod": NetworkPlugin cni failed to set up pod "sriov-pod_example-sriov" network: CNI network could not be set up: CNI operation for network:sriov-a failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:sriov because:OS exec call failed:failed to set up IPAM plugin type "fakeipam" from the device "eno31": No IP was passed to fake IPAM
Normal SandboxChanged 1s kubelet, mtx-huawei2-bld03 Pod sandbox changed, it will be killed and re-created.
$ kubectl get node mtx-huawei2-bld03 -o json | jq '.status.allocatable'
{
  "cpu": "64",
  "ephemeral-storage": "48294789041",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "intel.com/sriov_net_A": "16",
  "intel.com/sriov_net_B": "0",
  "memory": "196389160Ki",
  "pods": "110"
}
Tried "dynamic" instead of "none" in {"network":"sriov-a", "ip":"none"}.
The error was "IPv4 address cannot be dynamically allocated for an L2 network!"
Should it be static? how could the example have worked?
Thanks. -Jessica
from danm.
not picky when it comes to number of issues, no KPIs for it :) so we can continue it in this thread if you want!
so, two issues.
the first one is a regression we have introduced recently: "none" type IP allocation does not currently work with SR-IOV. See related issue: #107
It is scheduled to be corrected in DANM 4.1
The second is a config issue in the manifest, but it is actually the desired result: CIDR is not defined in the network manifest, meaning that the network represents an L2 network. So, if you want L3 VFs (with IP), add the "cidr" attribute to the manifest to define the subnet from which IPs can be allocated to a Pod.
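The L2 versus L3 distinction can be illustrated with a short check. This is a sketch of the rule as described here, not DANM's own validation code; both function names are hypothetical, and the option values are the ones from the sriov-a manifest in this thread.

```python
import ipaddress

def can_allocate_dynamic(options):
    """A network without "cidr" is L2 only, so no IP can be handed out."""
    return bool(options.get("cidr"))

def pool_inside_cidr(options):
    """Check that the allocation pool bounds, if given, lie within the subnet."""
    subnet = ipaddress.ip_network(options["cidr"])
    pool = options.get("allocation_pool", {})
    return all(
        ipaddress.ip_address(pool[key]) in subnet
        for key in ("start", "end")
        if key in pool
    )

l3_options = {
    "cidr": "10.100.20.0/24",
    "allocation_pool": {"start": "10.100.20.10", "end": "10.100.20.100"},
}
l2_options = {"vlan": 300}

print(can_allocate_dynamic(l3_options))  # True
print(pool_inside_cidr(l3_options))      # True
print(can_allocate_dynamic(l2_options))  # False
```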
from danm.
Is this line not enough in above dn?
cidr: 10.100.20.0/24
Could you find a complete example of sriov with dynamic? Even better if it also has routing across nodes...
Thanks. -Jessica
from danm.
ah my bad, did not notice yours already has a CIDR! yes it should be enough.
are you running 3.3, or 4.0? In 3.3 the networks were only validated after their creation, so it can happen that validation failed. In 4.0 we validate them already at the time of their creation with the "webhook" component.
However, if you run 4.0 "webhook" is a mandatory component. If you run 4.0, but without the webhook, that would explain this behaviour.
if you are running 3.3: can you send me the exact output of "kubectl describe danmnet sriov-a -n example-sriov", and the output of kubectl logs of any netwatcher Pod?
then I can tell you more
regarding routing: well, with SR-IOV you are basically building a good, old-fashioned L2 domain. so assuming you have configured the VLAN tag in the DanmNet for all of your PFs of all of your computes in your switch, connectivity between nodes is achieved by the simple in-subnet switching
if you want to connect to other IPs belonging to other subnets, you can provision IP routes via the "routes" parameter in the DanmNet, or policy-based IP routes via the "proutes" parameter in the connection annotation
from danm.
Meanwhile: if you do use 4.0 I corrected the "none" type issue in #110
If you change the CNI binary on your cluster to the new one you could give it a go
from danm.
Let's consider this thread closed from the perspective of the original issue, but if you still have any questions related to SR-IOV feel free to open a new one!
from danm.
Quite agree with Levo, it seems the problem comes from the configuration file for Calico. In my opinion, at least "etcd_endpoints", "etcd_key_file", "etcd_cert_file" and "etcd_ca_cert_file" are needed.
@clivez Hello there, I am now utilizing Danm to create the calico networks, but I am facing the same error "CNI operation for network:calico-1 failed with:CNI delegation failed due to error:Error delegating ADD to CNI plugin:calico because:OS exec call failed:no etcd endpoints specified".
You mentioned above that "etcd_endpoints", "etcd_key_file", "etcd_cert_file" and "etcd_ca_cert_file" are the minimum needed. In which config file should I set these arguments: /etc/cni/net.d/calico-1.conf or /etc/cni/net.d/calico-kubeconfig?
In the meantime, I tried setting the etcd_endpoints IP, taken from etcd_pod_kube_system, in both /etc/cni/net.d/calico-1.conf and /etc/cni/net.d/calico-kubeconfig, but it does not seem to work.
Sorry if I should not reply to a closed issue. I can open a new one or ask on Slack.
from danm.
I think the problem here was similar to what you have experienced with your Flannel config, i.e. the Calico config in this case was also in "chained" format
have you verified it yet?
from danm.