
v2.5

This document outlines the roadmap for Dragonfly v2.5, focusing on performance optimization, enhanced functionality, and expanded use cases in AI/ML workloads. Dragonfly v2.5 is scheduled for release on June 30, 2026.

Core Components

Manager

  • Enhance service performance and resource utilization while reducing CPU/Memory overhead.
  • Implement visualization for Persistent Task/Cache Task features.
  • Enhance user experience and UI design in the Manager Console.

Scheduler

  • Enhance service performance and resource utilization while reducing CPU/Memory overhead.
  • Optimize the scheduling algorithm to improve bandwidth utilization in the P2P network.

Client

  • Enhance service performance and resource utilization while reducing CPU/Memory overhead.
  • Implement the Dfstore command for persistent tasks.

Service Quality

  • Implement a client-side download task queue to cap the number of concurrent downloads.
  • Implement circuit breakers and rate limiting for each component to prevent cascading failures during sudden traffic spikes.
  • Implement centralized rate limiting for download tasks and back-to-origin traffic at the cluster level to prevent excessive load on the origin.
  • Support emergency plans and implement service degradation for specified requests.
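
To make the circuit-breaker item above concrete, here is a minimal sketch of the general pattern. It is not Dragonfly's implementation: the class name, thresholds, and the injectable `clock` parameter are all assumptions chosen for illustration. After a configurable number of consecutive failures the breaker rejects calls outright, and after a cool-down it lets one trial call through.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        # Reject immediately while open, so a struggling dependency gets no new load.
        if self.state == "open":
            raise CircuitOpenError("circuit open; request rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success (closed or half-open) fully resets the breaker.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self) -> None:
        if self._opened_at is not None:
            # A failed trial call in half-open state re-opens the circuit.
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would wrap each outbound request, for example `breaker.call(fetch_piece, piece_id)` with a hypothetical `fetch_piece`, and treat `CircuitOpenError` as a signal to fall back or degrade rather than retry.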

File Distribution

  • Implement a bandwidth-aware negotiation protocol to distribute requests across multiple parent nodes, preventing single-parent bottlenecks.
  • Optimize the Dragonfly Injector (Webhook) to support injecting the Dragonfly download tool into containers, thereby improving ease of use in cloud-native environments.
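
The multi-parent item above does not specify an algorithm, so the following is only a sketch of one possible allocation step, not the planned negotiation protocol. It assumes each parent advertises an estimated bandwidth, and it greedily hands each piece to the parent that would finish it earliest. The function name and the input shapes are hypothetical.

```python
from typing import Dict, List


def assign_pieces(piece_sizes: List[float],
                  parent_bandwidth: Dict[str, float]) -> Dict[str, List[int]]:
    """Greedy earliest-finish assignment of pieces to parents.

    piece_sizes: size of each piece (any unit, e.g. MiB).
    parent_bandwidth: estimated bandwidth per parent (same unit per second).
    Returns, for each parent, the indices of the pieces assigned to it.
    """
    if not parent_bandwidth:
        raise ValueError("at least one parent is required")
    if any(bw <= 0 for bw in parent_bandwidth.values()):
        raise ValueError("bandwidth estimates must be positive")

    busy_until = {parent: 0.0 for parent in parent_bandwidth}
    plan: Dict[str, List[int]] = {parent: [] for parent in parent_bandwidth}

    for index, size in enumerate(piece_sizes):
        # Pick the parent with the earliest estimated completion time;
        # ties are broken by parent name to keep the result deterministic.
        best = min(
            parent_bandwidth,
            key=lambda p: (busy_until[p] + size / parent_bandwidth[p], p),
        )
        busy_until[best] += size / parent_bandwidth[best]
        plan[best].append(index)
    return plan
```

With equal-sized pieces, a parent with twice the bandwidth ends up with roughly twice the pieces, so no single parent becomes the bottleneck. A real protocol would also need to refresh the bandwidth estimates as transfers progress.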

AI Model/Dataset Distribution

  • Implement RDMA-based distribution of model weights.
  • Support memory-level download tasks for cache tasks.

AI Agent

  • Enhance the Snapshotter's snapshot and restore performance.

Others

Observability

  • Improve and refine the monitoring metrics system.
  • Optimize the alerting mechanism and enhance issue diagnosis capabilities.

Security

  • Add a gRPC auth token authentication mechanism.
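
The roadmap does not say how tokens will be carried or validated. As an illustration only, the sketch below assumes the common convention of an `authorization: Bearer <token>` metadata entry and a single shared secret. The function name is hypothetical and no gRPC library is involved; in practice a check like this would sit in a server-side interceptor.

```python
import hmac
from typing import Iterable, Tuple


def is_authorized(metadata: Iterable[Tuple[str, str]], expected_token: str) -> bool:
    """Return True if the request metadata carries the expected bearer token.

    metadata: (key, value) pairs as a server would see them (assumed shape).
    """
    for key, value in metadata:
        if key.lower() != "authorization":
            continue
        scheme, _, token = value.partition(" ")
        # compare_digest runs in constant time, which avoids leaking how many
        # leading characters of a guessed token were correct.
        if scheme.lower() == "bearer" and hmac.compare_digest(
            token.encode(), expected_token.encode()
        ):
            return True
    return False
```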

Testing

  • Add more E2E tests and unit tests.

Documentation

  • Enhance the landing page UI.
  • Add more documentation on system interactions and implementation details.

Nydus

Testing

  • Containerize smoke tests.
  • Require that unit test coverage for medium to large PRs not fall below the project's current coverage rate.

Core Components

Nydusd

  • Integrate the Dragonfly SDK to request the Dragonfly cache service.
  • Support switching between RAFS V6 FUSE and EROFS. When all blob files are already cached locally, consider switching to EROFS to reduce FUSE overhead.
  • Address permanent expiration of nydusd tokens; consider supporting hot updates for configuration files and auth.
  • Remove support for external volumes for modelpack. This solution is no longer in use, so the related code should be removed.
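
The FUSE-to-EROFS switching condition above ("all local cached blob files exist") can be sketched as a simple check. This is an illustration under assumptions, not nydusd's logic: it assumes each blob is stored as a regular file named after its blob ID directly inside the cache directory, and the function names and the `"erofs"`/`"fuse"` return values are hypothetical.

```python
import os
from typing import Iterable


def all_blobs_cached(cache_dir: str, blob_ids: Iterable[str]) -> bool:
    """True if every blob ID has a regular file in cache_dir (assumed layout)."""
    return all(os.path.isfile(os.path.join(cache_dir, blob_id)) for blob_id in blob_ids)


def choose_mode(cache_dir: str, blob_ids: Iterable[str]) -> str:
    """Pick "erofs" only when the image has blobs and all of them are local."""
    ids = list(blob_ids)
    if ids and all_blobs_cached(cache_dir, ids):
        return "erofs"
    # Any missing blob means data may still need to be fetched on demand.
    return "fuse"
```

A production check would likely also verify that each cached blob is complete, not merely present.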

Snapshotter

  • Migrate the snapshotter Helm chart.

Nydusify

  • Support compacting images on commit.

Agent Sandbox

  • Write best-practice documentation for using Nydus in agent sandbox scenarios.