datashim-io / datashim
A Kubernetes-based framework for hassle-free handling of datasets
Home Page: http://datashim-io.github.io/datashim
License: Apache License 2.0
As reported here #106
In dataset definitions like this:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: kind-example-v0.2-try6-cp4d3f6d318f7c
spec:
  local:
    type: "COS"
    secret-name: "bucket-creds"
    secret-namespace: "m4d-system"
    endpoint: "http://s3.eu.cloud-object-storage.appdomain.cloud"
    provision: "true"
    bucket: "kind-example-v0.2-try6-cp4d3f6d318f7c"
the necessary error message should be surfaced in dataset.status.
FYI @shlomitk1
The current label format does not follow the best practices I have been seeing elsewhere (https://github.com/IBM/dataset-lifecycle-framework/blob/master/examples/hive/sampleapp/samplepod.yaml#L8).
Have you considered prefixing your labels similarly to https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
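For illustration only, a prefixed variant (the dlf.ibm.com prefix here is a hypothetical example, not the current API) could look like this on a pod:
metadata:
  labels:
    dlf.ibm.com/dataset.0.id: "example-dataset"
    dlf.ibm.com/dataset.0.useas: "mount"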
I am trying to mount a dataset for the 1000 Genome project - https://registry.opendata.aws/1000-genomes/
I have created the dataset object using:
---
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: 1000-genome-dataset
spec:
  local:
    type: "COS"
    accessKeyID: ""
    secretAccessKey: ""
    endpoint: "https://s3-us-east-1.amazonaws.com"
    bucket: "1000genomes"
    readonly: "true" # OPTIONAL, default is false
The PVC for the dataset gets provisioned, but when trying to mount it into a pod I get errors like this:
Warning FailedMount 92s kubelet MountVolume.SetUp failed for volume "pvc-a785f139-e992-4790-a2f7-57ad1efa5476" : rpc error: code = Unknown desc = Error fuseMount command: goofys
args: [--endpoint=https://s3-us-east-1.amazonaws.com --profile=pvc-a785f139-e992-4790-a2f7-57ad1efa5476 --type-cache-ttl 1s --stat-cache-ttl 1s --dir-mode 0777 --file-mode 0777 --http-timeout 5m -o allow_other -o ro 1000genomes /var/lib/kubelet/pods/256d5c70-5751-4aaa-8095-fc951047a3db/volumes/kubernetes.io~csi/pvc-a785f139-e992-4790-a2f7-57ad1efa5476/mount]
output: 2021/05/03 22:35:32.659638 main.FATAL Unable to mount file system, see syslog for details
Any pointers would be greatly appreciated.
https://bestpractices.coreinfrastructure.org/
This will help you as part of your CNCF application process
Need to investigate a bit, but the admission controller is not working on Kubernetes 1.19.x, as it complains about the self-signed certificates.
Hi,
I'd like to integrate your awesome project into my terraform script, using helm. I'm kind of a beginner with helm, so I was wondering if you could explain to me how to add the datashim charts as a repo. As far as I understand, it requires an index.yaml which I cannot find in the charts.
I could install it with kubectl and the yaml file, but I'd like to exclude the efs driver as I do not need it and I don't want it to waste resources.
If you intentionally didn't add an index.yaml, could you please point me in the right direction of how to handle this? Creating my own index.yaml? Thanks a lot in advance!
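In case it helps while an official repository is not published, one way to serve the existing charts as a Helm repo is to package them and generate the index yourself; the chart path and hosting URL below are assumptions, not documented values:
helm package ./chart -d ./repo                              # package the chart directory into ./repo
helm repo index ./repo --url https://example.com/charts     # generates the missing index.yaml
helm repo add datashim https://example.com/charts           # add it as a repo once hosted somewhere
The EFS driver could then be disabled through whatever value the chart exposes for it, if any.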
The use case is about working with data on IBM COS. I followed the guide here: https://github.com/IBM/dataset-lifecycle-framework/wiki/Data-Volumes-for-Notebook-Servers#create-a-dataset-for-the-s3-bucket
Where it creates a Dataset for a COS bucket, it needs:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "access_key_id"
    secretAccessKey: "secret_access_key"
    endpoint: "https://YOUR_ENDPOINT"
    bucket: "YOUR_BUCKET"
    region: "" # it can be empty
This requires a service credential to be created.
I wonder if it could support creating a dataset via:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    ibm_cloud_iam_apikey: "<base64 encoded api key>"
    bucket: "YOUR_BUCKET"
    region: "" # it can be empty
This would make a COS admin's life much easier, since secret management and rotation could be delegated to IBM Cloud IAM.
The manifest file does not configure the SecurityContextConstraints for the following ServiceAccounts:
csi-provisioner
csi-s3
csi-nodeplugin
csi-attacher
As a result, on OpenShift, containers which are expecting to run in privileged mode are unable to get access to features such as hostNetwork and hostPath. In turn, the DaemonSets csi-s3 and csi-nodeplugin-nfsplugin are unable to spawn pods on the cluster nodes because the ServiceAccounts csi-s3 and csi-nodeplugin are not registered as users in the privileged SecurityContextConstraints. Similar issues manifest due to the ServiceAccounts csi-attacher and csi-provisioner not being registered as users in the privileged SecurityContextConstraints.
Done when:
The instructions in the README.md file address the RBAC configuration of the service accounts. This can be done via:
oc adm policy add-scc-to-user privileged -n dlf -z csi-provisioner -z csi-s3 -z csi-nodeplugin -z csi-attacher
Alternatively, the instructions could use the JSON patch feature of the kubectl utility, like so:
kubectl patch scc privileged --type=json -p '[
{"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-provisioner"},
{"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-nodeplugin"},
{"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-attacher"},
{"op": "add", "path": "/users/-", "value": "system:serviceaccount:dlf:csi-s3"}]'
We want the MutatingWebhookConfiguration explicitly defined in the dlf.yaml deployment, so that we can also delete the webhook using the same dlf.yaml.
I noticed that at the moment DLF queries for the installed caching plugins and always uses the first result (if any) to cache the dataset. However, when multiple caching plugins are installed, it would come in handy to be able to specify which one should cache the dataset. There are also cases where the user may want to opt out of caching altogether.
As a solution to the above points, I am thinking that a new label on the dataset, with the key cache.plugin and the name of the caching plugin as its value, could be used to identify which of the installed plugins to use. Also, when the value of this label is None, the user could easily opt out of caching the dataset.
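A minimal sketch of what this could look like (cache.plugin is the proposed label, not an existing one; the COS fields just mirror the examples above):
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
  labels:
    cache.plugin: "None"   # or the name of an installed caching plugin
spec:
  local:
    type: "COS"
    accessKeyID: "access_key_id"
    secretAccessKey: "secret_access_key"
    endpoint: "https://YOUR_ENDPOINT"
    bucket: "YOUR_BUCKET"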
Any thoughts?
Thanks
Hi,
another question.
Have you thought about how to prevent users from mounting other users' PVCs in NFS?
We have one export share. When a user creates a Dataset, he/she needs to specify the path. Let's say the path is /nfs/export and the option is createDirPVC: "true". In this case, the user gets his/her own share at /nfs/export/myshare. However, nothing stops the user from mounting the whole export simply by specifying the path as /nfs/export and setting createDirPVC: "false".
I think this is a big issue in multi-tenant environments, making the feature unusable because of the security problem. Maybe, if the Helm chart were up and ready for use, the path could be configurable somewhere in the values, and the user wouldn't actually have to specify whether to create a directory; instead the default would be to create a directory named path + Dataset name and mount only the resulting path in the PVC.
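For context, a rough sketch of the kind of Dataset this refers to; apart from createDirPVC and the path, the NFS field names here are assumptions and may not match the actual API:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: myshare
spec:
  local:
    type: "NFS"
    server: "nfs.example.com"   # assumed field name
    share: "/nfs/export"        # the export path discussed above
    createDirPVC: "true"        # proposed default: create /nfs/export/<dataset name> and mount only that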
H3 is an embedded High speed, High volume, and High availability object store, backed by a high-performance key-value store (RocksDB, Redis, etc.). H3 also provides a FUSE implementation to allow object access using file semantics. The CSI H3 mount plugin (csi-h3 for short), allows you to use H3 FUSE for implementing persistent volumes in Kubernetes.
In practice, csi-h3 implements a fast and efficient filesystem on top of a key-value store. With csi-h3 deployed, and a Redis server running, you just need to specify the Redis endpoint and the bucket name you want to use, in order to get a mountpoint for your containers. H3 is embedded in csi-h3, so there are no other requirements to install.
H3 could be supported in DLF, with a dataset definition like the following:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "H3"
    storageUri: "redis://redis.default.svc:6379"
    bucket: "b1"
Note that H3 supports many additional key-value stores, but in the distributed environment of Kubernetes, you need a key-value store that can be accessed through a network protocol. For persistent storage, Ardb provides Redis connectivity over a range of key-value implementations, including RocksDB, LevelDB, and others. In that case, the storageUri used will still be in the form redis://..., but the actual service will be provided by Ardb.
Document an example of using Python's Kubernetes APIs, i.e. create_namespaced_custom_object.
cc @davidyuyuan
$ make minikube-install
results in
Installing NooBaa...done
Building NooBaa data loader...done
Creating test OBC...error: the server doesn't have a resource type "obc"
(the same error repeats several more times)
This is happening because of this line in examples/noobaa/noobaa_install.sh:
wget -P ${DIR} https://github.com/noobaa/noobaa-operator/releases/download/v2.0.10/noobaa-linux-v2.0.10 > /dev/null 2>&1
Running make minikube-install mostly seems to work, but between loading the images into minikube and applying the YAML, it spits out these errors:
/bin/bash: ./release-tools/generate-keys.sh: No such file or directory
/bin/bash: line 1: /tmp/tmp.w89e0ut2NV/ca.crt: No such file or directory
W0921 17:11:40.845263 162519 helpers.go:535] --dry-run is deprecated and can be replaced with --dry-run=client.
error: Cannot read file /tmp/tmp.w89e0ut2NV/webhook-server-tls.crt, open /tmp/tmp.w89e0ut2NV/webhook-server-tls.crt: no such file or directory
error: no objects passed to apply
/bin/bash: line 5: ./src/dataset-operator/deploy/webhook.yaml.template: No such file or directory
error: no objects passed to apply
It seems like the generate-keys.sh stuff happens in-cluster now, so maybe this is nothing to worry about? It's a bit disconcerting though :-)
I'm getting the following error while deploying datashim:
error: unable to recognize "https://raw.githubusercontent.com/datashim-io/datashim/master/release-tools/manifests/dlf.yaml": no matches for kind "CSIDriver" in version "storage.k8s.io/v1"
Earlier installations were successful. I suspect that #105 is the cause.
Running kubectl logs csi-attacher-nfsplugin-0 -c csi-attacher on the cluster showed that the volume could not be attached, as there was no patch permission for csi-attacher-nfs-plugin.
The problem lies in the lines with sleep 15, like here: https://github.com/IBM/dataset-lifecycle-framework/blob/10f3be95913aa9624245c9c48f750a4d7d9dbfc8/examples/noobaa/noobaa_install.sh#L28
We should wait until the objects are ready instead of sleeping for a fixed 15 seconds.
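As a sketch of the alternative (the resource names and namespace are assumptions, not taken from the script), the script could block on readiness instead of sleeping:
kubectl wait --for=condition=Available deployment/noobaa-operator -n noobaa --timeout=300s
kubectl wait --for=condition=Ready pod -l app=noobaa -n noobaa --timeout=300s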
On IKS, mounting S3 CSI drivers still shows the errors below. Although this does not block the pod from mounting, it takes the kubelet a few minutes to realize the message is bogus, which creates a bottleneck in mounting time.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m6s default-scheduler Successfully assigned default/nginx to 10.168.14.70
Warning FailedMount 5m kubelet MountVolume.SetUp failed for volume "pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Error fuseMount command: goofys
args: [--endpoint=http://minio-service.kubeflow:9000 --profile=pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b --type-cache-ttl 1s -f --stat-cache-ttl 1s --dir-mode 0777 --file-mode 0777 --http-timeout 5m -o allow_other -o ro e6dbfd34-1ed9-11eb-8b10-d62589704c0d /var/data/kubelet/pods/f3889d7b-0ee6-40d3-8add-87ddb82a1901/volumes/kubernetes.io~csi/pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b/mount]
output: 2020/11/04 20:11:37.734151 s3.ERROR code=NoCredentialProviders msg=no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors, err=<nil>
2020/11/04 20:11:37.734280 main.ERROR Unable to access 'e6dbfd34-1ed9-11eb-8b10-d62589704c0d': NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2020/11/04 20:11:37.734297 main.FATAL Mounting file system: Mount: initialization failed
Warning FailedMount 3m3s kubelet Unable to attach or mount volumes: unmounted volumes=[example-dataset], unattached volumes=[default-token-wspcg example-dataset]: timed out waiting for the condition
Warning FailedMount 2m59s kubelet MountVolume.SetUp failed for volume "pvc-ae703fc0-26d4-4ba2-bb92-2d709985e72b" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal Pulling 2m50s kubelet Pulling image "nginx"
Normal Pulled 2m49s kubelet Successfully pulled image "nginx"
Normal Created 2m49s kubelet Created container nginx
Normal Started 2m49s kubelet Started container nginx
If the dataset operator pod gets evicted for some reason a new instance is started. However, the evicted instance seems to be holding a lock causing a deadlock.
In the new operator instance logs you can only see the following line repeated indefinitely:
{"level":"info","ts":1589796411.546784,"logger":"leader","msg":"Not the leader. Waiting."}
The pod is not capable of continuing its execution.
This seems to be an issue with the operator-framework that is being triaged at the below link:
operator-framework/operator-sdk#1305
Hi,
I have a cluster with pod security policy enabled. When I try to deploy the operator I always get Error: container has runAsNonRoot and image will run as root in the dataset-operator Deployment. The issue is resolved by adding a security context under spec.template.spec. I used:
spec:
  securityContext:
    runAsUser: 1000
and the pod now starts.
Could a similar fix be added to the code?
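As a possible sketch (the deployment name and namespace come from the issues above, and UID 1000 is simply the value that worked for the reporter), the same fix could also be applied with a JSON patch:
kubectl patch deployment dataset-operator -n dlf --type=json -p '[{"op": "add", "path": "/spec/template/spec/securityContext", "value": {"runAsUser": 1000}}]'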
We want an option to turn on/off the extraction of the dataset tar file to S3.
Cache remote S3 buckets using Ceph.
Tracked branch: https://github.com/IBM/dataset-lifecycle-framework/commits/fixed-caching
Wiki with instructions: https://github.com/IBM/dataset-lifecycle-framework/wiki/Ceph-Caching
It might be easier to provide a Helm chart than custom scripts with envsubst.
When I use a dataset label to mount a dataset to a deployment, nothing happens.
Upon further inspection, it looks like the MutatingWebhookConfiguration does not trigger for deployments, but only for pods. For deployments, the mutate function should make exactly the same changes it does for pods, but work on the /spec/template/spec path instead of /spec.
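A rough sketch of how the webhook rules could be extended to also match Deployments (the webhook name is a placeholder, and clientConfig and the other required fields are omitted here):
webhooks:
- name: mutate.dlf.example.com   # placeholder name
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["deployments"]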
Hi there,
Thanks for the efforts; I'm finding this project very useful. One issue I'm having, however (apologies if it's obvious), is mounting an existing bucket, even though I am specifying the bucket in the secret. E.g. this is for a non-AWS S3 endpoint:
apiVersion: v1
data:
  accessKeyID: accessKey
  bucket: bucket
  endpoint: endpoint
  region: ""
  secretAccessKey: secretAccessKey
kind: Secret
metadata:
  name: csi-s3-pvc
  namespace: test-namespace
type: Opaque
Rather than mounting the specified bucket, it instead generates a new bucket with the name of the Kubernetes PVC. I just want to confirm whether I am doing things correctly, and if not, what I need to change.
Versions:
Attacher: 2.2.0
Provisioner: 1.6.0
If you do the full installation on a different namespace, the previous installation breaks.
We need to have 2 things fixed
FYI @davidyuyuan
Hi all,
I have a Kubernetes cluster on AWS (EKS).
We are currently using a workaround (a script run at node init) to be able to mount an S3 bucket in a pod.
I tried to use datashim which looks very promising.
I installed the setup with https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf.yaml
Here is my dataset config:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: archive-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "XXX"
    secretAccessKey: "XXX"
    endpoint: "https://s3.amazonaws.com"
    bucket: "bucket_name-ap-east-1"
    region: "ap-east-1"
But I end up with the error:
Warning ProvisioningFailed 3m11s (x9 over 7m32s) ch.ctrox.csi.s3-driver_csi-provisioner-s3-0_0ef0ce8b-2b1e-4a4e-8ebd-a82731d7ae1b failed to provision volume with StorageClass "csi-s3": rpc error: code = Unknown desc = failed to check if bucket bucket_name-ap-east-1 exists: 400 Bad Request
All the pods in the dlf namespace are running fine (Running status; I haven't dug into the logs yet).
I tried with different credentials. I can successfully mount the bucket locally with s3fs or goofys (with the same credentials).
Did I miss anything?
Thank you very much for your work.
Currently the user can create a dataset like this:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "{AWS_ACCESS_KEY_ID}"
    secretAccessKey: "{AWS_SECRET_ACCESS_KEY}"
    endpoint: "{S3_SERVICE_URL}"
    bucket: "{BUCKET_NAME}"
    region: "" # it can be empty
Then if they specify a pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: simple-nginx
  labels:
    dataset.0.id: "your-dataset"
    dataset.0.useas: "configmap"
spec:
  containers:
  - name: nginx
    image: nginx
It will be mutated as follows:
- configMapRef:
    name: your-dataset
  prefix: your-dataset_
- prefix: your-dataset_
  secretRef:
    name: your-dataset
As a result, the credentials would be available in the pod as env variables with the your-dataset_ prefix.
However, there are scenarios where we only want authorized images to access the credentials and not any pod.
We are designing with @mrsabath how this could be achieved with https://github.com/IBM/trusted-service-identity and this issue will capture this process.
From the DLF perspective, we need to upload the secrets to Vault once a Dataset is created. The key-values would look like this:
<cluster>/<namespace>/<dataset>/accessKeyID
<cluster>/<namespace>/<dataset>/secretAccessKey
....
Then we need to modify our admission controller to add the necessary labels to the user's pod, which would allow TSI to check whether this pod can use these credentials or not. Ideally it should work as before and expose them as env variables:
<dataset>_accessKeyID = xxxxx
<dataset>_secretAccessKey = xxxx
In case the image is not authorized, the credentials should not be injected
I am trying to set this up with an S3 bucket. I have provided my keys. I get:
failed to provision volume with StorageClass "csi-s3": rpc error: code = Unknown desc = failed to initialize S3 client: Endpoint: does not follow ip address or domain name standards.
I have:
endpoint=s3.eu-west-2.amazonaws.com
which is clearly wrong, but what would be RIGHT?
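For comparison, the other S3 examples in these issues pass the endpoint as a full URL including the scheme, e.g.:
endpoint: "https://s3.eu-west-2.amazonaws.com"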
The following apiVersions will be deprecated in v1.22 and are used in the dataset-operator:
I have done some manual tests and upgraded the admissionregistration apiVersion from
Anyway, the problem I am blocked on is the migration of apiextensions.k8s.io/v1beta1 from
Following these recommendations, I have tried to update the dataset-operator CRD files and successfully deployed them. Here is what I have for the file src/dataset-operator/chart/templates/crds/com.ie.ibm.hpsys_datasets_crd.yaml:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: datasets.com.ie.ibm.hpsys
spec:
  group: com.ie.ibm.hpsys
  names:
    kind: Dataset
    listKind: DatasetList
    plural: datasets
    singular: dataset
  scope: Namespaced
  versions:
  - name: v1alpha1
    subresources:
      status: {}
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        description: Dataset is the Schema for the datasets API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: DatasetSpec defines the desired state of Dataset
            properties:
              local:
                additionalProperties:
                  type: string
                description: 'INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
                  Important: Run "operator-sdk generate k8s" to regenerate code after
                  modifying this file Add custom validation using kubebuilder tags:
                  https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html
                  Conf map[string]string `json:"conf,omitempty"`'
                type: object
              remote:
                additionalProperties:
                  type: string
                type: object
            type: object
          status:
            description: DatasetStatus defines the observed state of Dataset
            properties:
              error:
                description: 'INSERT ADDITIONAL STATUS FIELD - define observed state
                  of cluster Important: Run "operator-sdk generate k8s" to regenerate
                  code after modifying this file Add custom validation using kubebuilder
                  tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html'
                type: string
            type: object
But when I try to deploy a new simple S3 Dataset:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: test
spec:
  local:
    type: "COS"
    accessKeyID: "KeyID"
    secretAccessKey: "Secret"
    endpoint: "https://s3.eu-west-1.amazonaws.com"
    bucket: "test-bucket"
    readonly: "true" # OPTIONAL, default is false
The controller sees the new resource:
...
{"level":"info","ts":1622124637.1511607,"logger":"controller_dataset","msg":"Reconciling Dataset","Request.Namespace":"default","Request.Name":"test"}
{"level":"info","ts":1622124637.1608975,"logger":"controller_dataset","msg":"Reconciling Dataset","Request.Namespace":"default","Request.Name":"test"}
But nothing happens.
I can describe the dataset resource but the PVC is not created.
If anyone can help with this, I'm ready to help and contribute, but I am blocked on this.
When trying to deploy a pod with DLF labels inside a namespace with Istio injection, I'm seeing the errors below. It looks like there are some conflicts between the DLF and Istio mutations.
The Pod "nginx" is invalid: spec.volumes[4].name: Duplicate value: "example-dataset"
Here is my pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
  labels:
    dataset.0.id: "example-dataset"
    dataset.0.useas: "mount"
spec:
  containers:
  - name: nginx
    image: nginx
EOF
Hi,
I would like to know if the container node-driver-registrar in the DaemonSet csi-nodeplugin-nfsplugin is expected to fail and not restart after a node problem. If a node reboots, the node-driver-registrar container stops working, with this log:
I0202 01:11:30.758475 1 node_register.go:58] Starting Registration Server at: /registration/nfs.csi.k8s.io-reg.sock
I0202 01:11:30.758743 1 node_register.go:67] Registration Server started at: /registration/nfs.csi.k8s.io-reg.sock
I0202 01:11:31.425742 1 main.go:77] Received GetInfo call: &InfoRequest{}
I0202 01:11:54.363628 1 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
E0202 14:07:00.314932 1 connection.go:129] Lost connection to unix:///plugin/csi.sock.
I found this out when I created a job which had to mount a PVC with the csi-nfs storage class, and it did not get scheduled on the node which was rebooted yesterday. Logs from the job:
Warning FailedMount 51s kubelet Unable to attach or mount volumes: unmounted volumes=[dest-volume], unattached volumes=[dest-volume default-token-9btxt]: timed out waiting for the condition
Warning FailedMount 45s (x9 over 2m53s) kubelet MountVolume.MountDevice failed for volume "pvc-bd3d6316-0342-45f0-981d-0cdc9ca165c3" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers
Shouldn't it somehow be periodically ensured that the daemon is alive? I can create a cronjob or something for my cluster, but I thought I would ask first.
Hi,
today I tried to deploy both S3 and NFS Datasets in our environment and they work flawlessly.
However, I found out that NFS doesn't set up a new directory for each deployment but uses the same one for all. Previously we had been using nfs-client-provisioner (a Helm chart), but it is deprecated now. With it, you configured the NFS path and server, and it created a new directory for each PVC (under the configured NFS path). This behaviour is very handy, because when you don't know in advance what you need, the deployment creates it for you and you don't have to worry about creating a new path for each Pod.
Could this be supported?
Scenario: created a Dataset CRD named "kind-example-v0.2-try6-cp4d3f6d318f7c".
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: kind-example-v0.2-try6-cp4d3f6d318f7c
spec:
  local:
    type: "COS"
    secret-name: "bucket-creds"
    secret-namespace: "m4d-system"
    endpoint: "http://s3.eu.cloud-object-storage.appdomain.cloud"
    provision: "true"
    bucket: "kind-example-v0.2-try6-cp4d3f6d318f7c"
Problem: A bucket has been successfully created. However, the Dataset status is stuck on "Pending".
Reason:
This is caused by a failure to reconcile a PVC resource. From the csi-provisioner-s3-0 log in the dlf namespace:
volume_store.go:144] error saving volume pvc-80abdaf5-bb2e-4f3b-a733-4ea96c0f1552: PersistentVolume "pvc-80abdaf5-bb2e-4f3b-a733-4ea96c0f1552" is invalid: spec.csi.name: Invalid value: "kind-example-v0.2-try6-cp4d3f6d318f7c": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'
Hi,
goofys needs syslog if a fatal event happens. Installing the package netcat-openbsd and running nc -k -l -U /var/run/syslog & fixes the issue, and the endpoint will not disconnect. I think the image quay.io/k8scsi/csi-node-driver-registrar:v1.2.0 needs the fix. (source)
Could you please share the design document, to understand how it mounts the bucket/NFS share...?
Storage buckets work nicely if they are empty. Existing files and directories are owned by root, so they are inaccessible to non-root users in a container. I have tried object stores on GCP, AWS, and a custom S3-compatible object store. Note that this applies to files and directories created outside of DLF via S3 APIs; the ones created via DLF have the correct ownership if a bucket is mounted a second time.
When we try to push a big dataset using the ARCHIVE type, it sometimes fails due to the large workload. After that, no other dataset created by the same DLF cluster can be mounted on any pod. Redeploying DLF and MinIO doesn't solve the issue.
Here are the events after a pod fails to mount DLF's PVC:
18m Warning ProvisioningFailed persistentvolumeclaim/example-dataset failed to provision volume with StorageClass "csi-s3": rpc error: code = DeadlineExceeded desc = context deadline exceeded
2s Warning FailedMount pod/nginx Unable to attach or mount volumes: unmounted volumes=[example-dataset], unattached volumes=[example-dataset default-token-7qlxh]: timed out waiting for the condition
3m11s Warning VolumeFailedDelete persistentvolume/pvc-6f6e9892-7fdf-4d8f-b1a2-c75d416c9b97 rpc error: code = Unknown desc = failed to initialize S3 client: Endpoint: does not follow ip address or domain name standards.
Here is the dataset that can ruin the whole DLF cluster:
cat <<EOF | kubectl apply -f -
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
  namespace: default
spec:
  type: "ARCHIVE"
  url: "https://dax-cdn.cdn.appdomain.cloud/dax-oil-reservoir-simulations/1.0.0/oil-reservoir-simulations.tar.gz"
  format: "application/x-tar"
EOF
I was trying to configure an S3 dataset with a separate secret definition and realized that Datashim works only when the secrets are in the stringData format. Since kubectl creates secrets with the values in data, it would be more convenient to allow both formats.
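To illustrate the two formats (the key names mirror the dataset examples above): stringData takes plain-text values, while data takes base64-encoded values, which is what a stored Secret ends up containing:
# stringData variant (plain-text values)
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-secret
stringData:
  accessKeyID: "access_key_id"
  secretAccessKey: "secret_access_key"
---
# data variant (base64-encoded values)
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-secret
data:
  accessKeyID: YWNjZXNzX2tleV9pZA==            # base64 of "access_key_id"
  secretAccessKey: c2VjcmV0X2FjY2Vzc19rZXk=    # base64 of "secret_access_key"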
The README states that addressing of datasets is done "using the unique ID defined at creation time". Looking at the example, it seems the addressing is actually done by the name of the Dataset CR. Can you clarify that in the README?
By default we are watching all namespaces; we should pass a list of namespaces to monitor instead.
Hello and thank you for this really cool project.
I am trying to create a dataset on a k8s cluster that is hosted on an OpenStack provider. It seems that every time I create a dataset I get a PVC and PV that are very large (9314Gi), even though the S3 bucket I am using only has dummy data totalling less than 1 GB.
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
example-dataset-gcs Bound pvc-85d52fa7-ab1e-4a4a-abb9-2ab687455188 9314Gi RWX csi-s3 22m
I thought this was happening because I was using OpenStack S3-compatible storage. However, the same thing occurred when using GCS (which is also supposed to be S3-compatible). I apologize that I could not try out AWS S3, because I do not have easy access to an account.
Is there a way to specify how big the PV/PVC can be?
This is my only issue. Everything else seems to work.
I followed the templates here for creating the dataset:
https://github.com/IBM/dataset-lifecycle-framework/blob/master/examples/templates/example-dataset-s3-secrets.yaml
https://github.com/IBM/dataset-lifecycle-framework/blob/master/examples/templates/example-s3-secret.yaml
I installed using this command:
kubectl apply -f https://raw.githubusercontent.com/IBM/dataset-lifecycle-framework/master/release-tools/manifests/dlf.yaml
For this I am using Kubernetes 1.19.6 on an RKE cluster deployed on an OpenStack provider.
Should this RBAC map to the namespace where the csi-nodeplugin SA is deployed? Looks like this is a typo.
Can Dataset support preexisting Secret where accessKeyID and secretAccessKey are stored? There may be two reasons:
Related: the rest of the information (endpoint, bucket, region) may be available in a ConfigMap. Please consider that as secondary.
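For reference, some of the issues above already show a Dataset pointing at an existing Secret via secret-name/secret-namespace; a sketch along those lines for plain S3 credentials (not a documented feature, just the shape this request implies):
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    secret-name: "s3-credentials"       # preexisting Secret holding accessKeyID/secretAccessKey
    secret-namespace: "default"
    endpoint: "https://YOUR_ENDPOINT"
    bucket: "YOUR_BUCKET"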
In opendatahub.io we use https://github.com/opendatahub-io/jupyterhub-singleuser-profiles to customize Jupyter instances for users.
I'd like to come up with a good integration between DLF and the profiles library.
As discussed with @YiannisGkoufas, we would need to be able to mount volumes read-only, to avoid users overwriting datasets of others. Then we could add a simple way to automatically label pods and add mounts to pods for users from a specific group.