AI Model Distribution Challenges and Best Practices

Speakers: Wenbo Qi, Xiaoya Xia, Peng Tao, Wenpeng Li, Han Jiang

This video was posted on 2025-06-10.

As the demand for scalable AI/ML grows, efficiently distributing AI models across cloud-native infrastructure has become a pivotal challenge for enterprises. The panel dives into the technical and operational strategies for deploying models at scale, from optimizing model storage and transfer to ensuring consistency across clusters and regions. Experts from different companies and CNCF projects debate critical questions such as: How can Kubernetes-native workflows automate and accelerate model distribution while minimizing latency and bandwidth costs? How can huge models, hundreds of gigabytes or even terabytes in size, be distributed efficiently? What challenges are posed by distributed inference and the prefill-decode architecture? How are models updated in the reinforcement learning post-training paradigm? What role do standards like OCI artifacts or specialized registries play in streamlining versioned model delivery?
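One reason OCI artifacts come up in this discussion is content addressing: each model layer is identified by a digest, so every cluster and region can verify that it pulled exactly the bytes the manifest promises, even when the weights arrive through a cache or peer-to-peer distributor. The Python sketch below illustrates that idea in miniature. The manifest layout loosely follows the OCI artifact model, but the media type, file names, and digests are hypothetical placeholders, not a real registry API.

```python
import hashlib
from pathlib import Path

# Hypothetical OCI-style manifest for a versioned model: each "layer" is a
# shard of the weights, identified by its sha256 digest and size in bytes.
MANIFEST = {
    "artifactType": "application/vnd.example.model.manifest.v1+json",  # hypothetical media type
    "layers": [
        {"path": "model-00001-of-00002.safetensors", "digest": "sha256:<expected-digest-1>", "size": 123},
        {"path": "model-00002-of-00002.safetensors", "digest": "sha256:<expected-digest-2>", "size": 456},
    ],
}


def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash the file in streamed chunks so multi-GB shards never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_model(model_dir: Path, manifest: dict) -> bool:
    """Return True only if every shard matches the digest and size the manifest declares."""
    ok = True
    for layer in manifest["layers"]:
        shard = model_dir / layer["path"]
        if not shard.exists():
            print(f"missing shard: {shard}")
            ok = False
            continue
        if sha256_of(shard) != layer["digest"] or shard.stat().st_size != layer["size"]:
            print(f"digest or size mismatch for {shard}")
            ok = False
    return ok


if __name__ == "__main__":
    # In a real deployment the manifest would come from a registry and the
    # shards from a local cache or P2P network rather than a fixed directory.
    print(verify_model(Path("./model"), MANIFEST))
```

Because verification depends only on the manifest, the same check works regardless of which mirror, cache, or region actually served the bytes, which is what makes digest-addressed delivery attractive for multi-cluster consistency.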