Skip to main content

2 posts tagged with "container image"

View All Tags

· 4 min read

Containerd Accepted Nydus-snapshotter

Early January, Containerd community has taken in nydus-snapshotter as a sub-project. Check out the code, particular introductions and tutorial from its new repository. We believe that the donation to containerd will attract more users and developers for nydus itself and bring much value to the community users.

Nydus-snapshotter is a containerd's remote snapshotter, it works as a standalone process out of containerd, which only pulls nydus image's bootstrap from remote registry and forks another process called nydusd. Nydusd has a unified architecture, which means it works in form of a FUSE user-space filesystem daemon, a virtio-fs daemon or a fscache user-space daemon. Nydusd is responsible for fetching data blocks from remote storage like object storage or standard image registry, thus to fulfill containers' requests to read its rootfs.

Nydus is an excellent container image acceleration solution which significantly reduces time cost by starting container. It is originally developed by a virtual team from Alibaba Cloud and Ant Group and deployed in very large scale. Millions of containers are created based on nydus image each day in Alibaba Cloud and Ant Group. The underlying technique is a newly designed, container optimized and oriented read-only filesystem named Rafs. Several approaches are provided to create rafs format container image. The image can be pushed and stored in standard registry since it is compatible with OCI image and distribution specifications. A nydus image can be converted from a OCI source image where metadata and files data are split into a "bootstrap" and one or more "blobs" together with necessary manifest.json and config.json. Development of integration with Buildkit is in progress.

rafs disk layout

Nydus provides following key features:

  • Chunk level data de-duplication among layers in a single repository to reduce storage, transport and memory cost
  • Deleted(whiteout) files in certain layer aren't packed into nydus image, therefore, image size may be reduced
  • E2E image data integrity check. So security issues like "Supply Chain Attack" can be avoided and detected at runtime
  • Integrated with CNCF incubating project Dragonfly to distribute container images in P2P fashion and mitigate the pressure on container registries
  • Different container image storage backends are supported. For example, Registry, NAS, Aliyun/OSS and applying other remote storage backend like AWS S3 is also possible.
  • Record files access pattern during runtime gathering access trace/log, by which user's abnormal behaviors are easily caught. So we can ensure the image can be trusted

Beyond above essential features, nydus can be flexibly configured as a FUSE-base user-space filesystem or in-kernel EROFS with an on-demand loader user-space daemon and integrating nydus with VM-based container runtime is much easier.

  • Lightweight integration with VM-based containers runtime like KataContainers. In fact, KataContainers is considering supporting nydus as a native image acceleration solution.
  • Nydus closely cooperates with Linux in-kernel disk filesystem Containers' rootfs can directly be set up by EROFS with lazy pulling capability. The corresponding changes had been merged into Linux kernel since v5.16

To run with runc, nydusd works as FUSE user-space daemon:

runc nydus

To work with KataContainers, it works as a virtio-fs daemon:

kata nydus

Nydus community is working together with Linux Kernel to develop erofs+fscache based user-space on-demand read.

runc erofs nydus

Nydus and eStargz developers are working together on a new project named acceld in Harbor community to provide a general service to support the conversion from OCI v1 image to kinds of acceleration image formats for various accelerator providers, so that keep a smooth upgrade from OCI v1 image. In addition to the conversion service acceld and the conversion tool nydusify, nydus is also supporting buildkit to enable exporting nydus image directly from Dockerfile as a compression type.

In the future, nydus community will work closely with the containerd community on fast and efficient methods and solution of distributing container images, container image security, container image content storage efficiency, etc.

· 7 min read

Introducing Dragonfly Container Image Service

Small is Fast, Large is Slow

With containers, it is relatively fast to deploy web apps, mobile backends, and API services right out of the box. Why? Because the container images they use are generally small (hundreds of MBs).

A larger challenge is deploying applications with a huge container image (several GBs). It takes a good amount of time to have these images ready to use. We want the time spent shortened to a certain extent to leverage the powerful container abstractions to run and scale the applications fast.

Dragonfly has been doing well at distributing container images. However, users still have to download an entire container image before creating a new container. Another big challenge is arising security concerns about container image.

Conceptually, we pack application's environment into a single image that is more easily shared with consumers. Image is then put into a filesystem locally on top of which an application can run. The pieces that are now being launched as nydus are the culmination of the years of work and experience of our team in building filesystems. Here we introduce the dragonfly image service (codename nydus) as an extension to the Dragonfly project. It's software that minimizes download time and provides image integrity check across the whole lifetime of a container, enabling users to manage applications fast and safely.

nydus is co-developed by engineers from Alibaba Cloud and Ant Group. It is widely used in the internal production deployments. From our experience, we value its container creation speedup and image isolation enhancement the most. And we are seeing interesting use cases of it from time to time.

Nydus: Dragonfly Image Service

The nydus project designs and implements an user space filesystem on top of a container image format that improves over the current OCI image specification. Its key features include:

  • Container images are downloaded on demand
  • Chunk level data duplication
  • Flatten image metadata and data to remove all intermediate layers
  • Only usable image data is saved when building a container image
  • Only usable image data is downloaded when running a container
  • End-to-end image data integrity
  • Compactible with the OCI artifacts spec and distribution spec
  • Integrated with existing CNCF project dragonfly to support image distribution in large clusters
  • Different container image storage backends are supported

Nydus mainly consists of a new containier image format and a FUSE (Filesystem in USErspace) daemon to translate it into container accessible mountpoint.

nydus-architecture| center | 768x356

The FUSE daemon takes in either FUSE or virtiofs protocol to service POD created by conventional runc containers or Kata Containers. It supports pulling container image data from container image registry, OSS, NAS, as well as Dragonfly supernode and node peers. It can also optionally use a local directory to cache all container image data to speed up future container creation.

Internally, nydus splits a container image into two parts: a metadata layer and a data layer. The metadata layer is a self-verifiable merkle tree. Each file and directory is a node in the merkle tree with a hash aloneside. A file's hash is the hash of its file content, and a directory's hash is the hash of all of its descendents. Each file is divided into even sized chunks and saved in a data layer. File chunks can be shared among different container images by letting file nodes pointing inside them point to the same chunk location in the shared data layer.

nydus-format| center | 768x356

How can you benefit from nydus?

The immediate benefit of running nydus image service is that users can launch containers almost instantly. In our tests, we found out that nydus can boost container creation from minutes to seconds.

nydus-performance| center | 768x356

Another less-obvious but important benefit is runtime data integration check. With OCIv1 container images, the image data cannot be verified after being unpacked to local directory, which means if some files in the local directories are undermined either intentionally or not, containers will simply take them as is, incurring data leaking risk. In contrast, nydus image won't be unpacked to local directory at all, what's more, given that verification can be enforced on every data access to nydus image, the data leak risk can be completely avoided by forcing to fetch the data from the trusted image registry again.

nydus-integraty| center | 768x356

The Future of Nydus

The above examples showcase the power of nydus. For the last year, we've worked alongside the production team, laser-focused on making nydus stable, secure, easy to use.

Now, as the foundation for nydus has been laid, our new focus is the ecosystem it aims to serve broadly. We envision a future where users install dragonfly and nydus on their clusters, run containers with large image as fast they do with regular size image today, and feel confident about the safety of data on their container image.

For the community

While we have widely deployed nydus in our production, we believe a proper upgrade to OCI image spec shouldn’t be built without the community. To this end, we propose nydus as a reference implementation that aligns well with the OCI image spec v2 proposal [1], and we look forward to working with other industry leaders should this project come to fruition.

FAQ

Q: What are the challenges with oci image spec v1?

Q: How is this different than crfs?

  • The basic idea of the two are quite similar. Deep down, the nydus image format supports chunk level data deduplication and end-to-end data integraty at runtime, which is an improvement over the stargz format used by crfs.

Q: How is this different than Teleport of Azure?

  • Azure Teleport is like the current OCI image format plus a SMB-enabled snapshotter. It supports container image lazy-fetching and suffers from all the Tar format defects. OTOH, nydus deprecates the legacy Tar format and takes advantage of the merkle tree format to provide more advantages over the Tar format.

Q: What if network is down while container is running with nydus?

  • With OCIv1, container would fail to start at all should network be down while container image is not fully downloaded. Nydus has changed that a lot because it goes with lazy fetch/load mechanism, a failure in network may take down a running container. Nydus addresses the problem with a prefetch mechanism which can be configured to
  • run in background right after starting a container.

[1]:OCI Image Specification V2 Requirements

In the mean time, the OCI (Open Container Initiate) community has been actively discussing the emerging of OCI image spec v2 aiming to address new challenges with oci image spec v1.

Starting from June 2020, the OCI community spent more than a month discussing the requirements for OCI image specification v2. It is important to notice that OCIv2 is just a marketing term for updating the OCI specification to better address some use cases. It is not a brand new specification.

The discussion went from an email thread (Proposal Draft for OCI Image Spec V2) and a shared document to several OCI community online meetings, and the result is quite aspiring. The concluded OCIv2 requirements are:

  • Reduced Duplication
  • Canonical Representation (Reproducible Image Building)
  • Explicit (and Minimal) Filesystem Objects and Metadata
  • Mountable Filesystem Format
  • Bill of Materials
  • Lazy Fetch Support
  • Extensibility
  • Verifiability and/or Repairability
  • Reduced Uploading
  • Untrusted Storage

For detailed meaning of each requirement, please refer to the original shared document. We actively joined the community discussions and found out that the nydus project fits nicely to these requirements. It further encouraged us to opensource the nydus project to help the community discussion with a working code base.