Helm Charts
Documentation for deploying Dragonfly on kubernetes using helm.
For more integrations such as Docker, CRI-O, Singularity/Apptainer, Nydus, eStargz, Harbor, Git LFS, Hugging Face, TorchServe, Triton Server, etc., refer to Integrations.
Prerequisites
Name | Version | Document |
---|---|---|
Kubernetes cluster | 1.20+ | kubernetes.io |
Helm | v3.8.0+ | helm.sh |
containerd | v1.5.0+ | containerd.io |
Setup kubernetes cluster
Kind is recommended if no Kubernetes cluster is available for testing.
Create kind multi-node cluster configuration file kind-config.yaml
, configuration content is as follows:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
Create a kind multi-node cluster using the configuration file:
kind create cluster --config kind-config.yaml
Switch the context of kubectl to kind cluster:
kubectl config use-context kind-kind
Kind loads Dragonfly image
Pull Dragonfly latest images:
docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/client:latest
docker pull dragonflyoss/dfinit:latest
Kind cluster loads Dragonfly latest images:
kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/client:latest
kind load docker-image dragonflyoss/dfinit:latest
Create Dragonfly cluster based on helm charts
Create the Helm Charts configuration file values.yaml
, and set the container runtime to containerd
.
Please refer to the configuration documentation for details.
manager:
image:
repository: dragonflyoss/manager
tag: latest
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
scheduler:
image:
repository: dragonflyoss/scheduler
tag: latest
metrics:
enable: true
config:
verbose: true
pprofPort: 18066
seedClient:
image:
repository: dragonflyoss/client
tag: latest
metrics:
enable: true
config:
verbose: true
client:
image:
repository: dragonflyoss/client
tag: latest
metrics:
enable: true
config:
verbose: true
dfinit:
enable: true
image:
repository: dragonflyoss/dfinit
tag: latest
config:
containerRuntime:
containerd:
configPath: /etc/containerd/config.toml
registries:
- hostNamespace: docker.io
serverAddr: https://index.docker.io
capabilities: ['pull', 'resolve']
Create a Dragonfly cluster using the configuration file:
$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f values.yaml
NAME: dragonfly
LAST DEPLOYED: Thu Apr 18 19:26:39 2024
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"
2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.
3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/
Check that Dragonfly is deployed successfully:
$ kubectl get po -n dragonfly-system
NAME READY STATUS RESTARTS AGE
dragonfly-client-gvspg 1/1 Running 0 34m
dragonfly-client-kxrhh 1/1 Running 0 34m
dragonfly-manager-864774f54d-6t79l 1/1 Running 0 34m
dragonfly-mysql-0 1/1 Running 0 34m
dragonfly-redis-master-0 1/1 Running 0 34m
dragonfly-redis-replicas-0 1/1 Running 0 34m
dragonfly-redis-replicas-1 1/1 Running 0 32m
dragonfly-redis-replicas-2 1/1 Running 0 32m
dragonfly-scheduler-0 1/1 Running 0 34m
dragonfly-seed-client-0 1/1 Running 5 (21m ago) 34m
Containerd downloads images through Dragonfly
Pull alpine:3.19
image in kind-worker node:
docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19
Verify
You can execute the following command to check if the alpine:3.19
image is distributed via Dragonfly.
# Find pod name.
export POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=client" -o=jsonpath='{.items[?(@.spec.nodeName=="kind-worker")].metadata.name}' | head -n 1 )
# Find task id.
export TASK_ID=$(kubectl -n dragonfly-system exec ${POD_NAME} -- sh -c "grep -hoP 'library/alpine.*task_id=\"\K[^\"]+' /var/log/dragonfly/dfdaemon/* | head -n 1")
# Check logs.
kubectl -n dragonfly-system exec -it ${POD_NAME} -- sh -c "grep ${TASK_ID} /var/log/dragonfly/dfdaemon/* | grep 'download task succeeded'"
# Download logs.
kubectl -n dragonfly-system exec ${POD_NAME} -- sh -c "grep ${TASK_ID} /var/log/dragonfly/dfdaemon/*" > dfdaemon.log
The expected output is as follows:
{
2024-04-19T02:44:09.259458Z INFO
"download_task":"dragonfly-client/src/grpc/dfdaemon_download.rs:276":: "download task succeeded"
"host_id": "172.18.0.3-kind-worker",
"task_id": "a46de92fcb9430049cf9e61e267e1c3c9db1f1aa4a8680a048949b06adb625a5",
"peer_id": "172.18.0.3-kind-worker-86e48d67-1653-4571-bf01-7e0c9a0a119d"
}
Performance testing
Containerd pull image back-to-source for the first time through Dragonfly
Pull alpine:3.19
image in kind-worker
node:
time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19
When pull image back-to-source for the first time through Dragonfly, it takes 28.82s
to download the alpine:3.19
image.
Containerd pull image hits the cache of remote peer
Delete the client whose Node is kind-worker
to clear the cache of Dragonfly local Peer.
# Find pod name.
export POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=client" -o=jsonpath='{.items[?(@.spec.nodeName=="kind-worker")].metadata.name}' | head -n 1 )
# Delete pod.
kubectl delete pod ${POD_NAME} -n dragonfly-system
Delete alpine:3.19
image in kind-worker
node:
docker exec -i kind-worker /usr/local/bin/crictl rmi alpine:3.19
Pull alpine:3.19
image in kind-worker
node:
time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19
When pull image hits cache of remote peer, it takes 12.524s
to download the
alpine:3.19
image.
Containerd pull image hits the cache of local peer
Delete alpine:3.19
image in kind-worker
node:
docker exec -i kind-worker /usr/local/bin/crictl rmi alpine:3.19
Pull alpine:3.19
image in kind-worker
node:
time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19
When pull image hits cache of local peer, it takes 7.432s
to download the
alpine:3.19
image.
Preheat image
Expose manager's port 8080
:
kubectl --namespace dragonfly-system port-forward service/dragonfly-manager 8080:8080
Please create personal access Token before calling Open API, and select job
for access scopes, refer to personal-access-tokens.
Use Open API to preheat the image alpine:3.19
to Seed Peer, refer to preheat.
curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your_personal_access_token' \
--data-raw '{
"type": "preheat",
"args": {
"type": "image",
"url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
"filteredQueryParams": "Expires&Signature",
"username": "your_registry_username",
"password": "your_registry_password"
}
}'
The command-line log returns the preheat job id:
{
"id": 1,
"created_at": "2024-04-18T08:51:55Z",
"updated_at": "2024-04-18T08:51:55Z",
"task_id": "group_2717f455-ff0a-435f-a3a7-672828d15a2a",
"type": "preheat",
"state": "PENDING",
"args": {
"filteredQueryParams": "Expires&Signature",
"headers": null,
"password": "",
"pieceLength": 4194304,
"platform": "",
"tag": "",
"type": "image",
"url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
"username": ""
},
"scheduler_clusters": [
{
"id": 1,
"created_at": "2024-04-18T08:29:15Z",
"updated_at": "2024-04-18T08:29:15Z",
"name": "cluster-1"
}
]
}
Polling the preheating status with job id:
curl --request GET 'http://127.0.0.1:8080/oapi/v1/jobs/1' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer your_personal_access_token'
If the status is SUCCESS
, the preheating is successful:
{
"id": 1,
"created_at": "2024-04-18T08:51:55Z",
"updated_at": "2024-04-18T08:51:55Z",
"task_id": "group_2717f455-ff0a-435f-a3a7-672828d15a2a",
"type": "preheat",
"state": "SUCCESS",
"args": {
"filteredQueryParams": "Expires&Signature",
"headers": null,
"password": "",
"pieceLength": 4194304,
"platform": "",
"tag": "",
"type": "image",
"url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
"username": ""
},
"scheduler_clusters": [
{
"id": 1,
"created_at": "2024-04-18T08:29:15Z",
"updated_at": "2024-04-18T08:29:15Z",
"name": "cluster-1"
}
]
}
Pull alpine:3.19
image in kind-worker
node:
time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19
When pull image hits preheat cache, it takes 11.030s
to download the
alpine:3.19
image.