
Why can't Harbor create a tag after pulling an image from the proxy cache?

stonezdj(Daojun Zhang) edited this page May 12, 2025 · 1 revision

First, Harbor does not guarantee that it creates a tag for every image it proxies. Do not depend on the tags in a proxy cache project in your solution; for example, do not replicate images from a proxy cache project to another remote Harbor registry based on the tag.

Harbor's proxy cache mechanism involves three parties: the image pull client (client), Harbor, and the upstream registry. The client is usually the Docker client, containerd, podman, or oras.

A typical image pull, for example with Docker, ctr, or ctrctl, consists of the following HTTP requests:

  1. The client sends a HEAD request (with the tag) to Harbor to get the digest of the current manifest.
  2. The client then sends the GET request to get the content of the manifest by digest.
  3. The client parses the manifest and GETs each layer's blob.

Harbor creates a tag based on request 1 in the background.

Some image pull clients cache image information locally, for example the tag-to-digest mapping. If the mapping is cached on the client side, the client skips request 1 and sends request 2 to Harbor directly. Harbor then caches the manifest without a tag.
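The request sequence and the effect of a client-side tag cache can be sketched as follows. This is only an illustrative model, not Harbor's code: the function names, the repository name, and the placeholder digests are assumptions, and `tag_would_be_created` models nothing more than the rule stated above (the tag comes from request 1). The URL paths follow the usual registry API layout.

```python
def pull_requests(repo, tag, manifest_digest, layer_digests,
                  tag_cached_locally=False, local_blobs=frozenset()):
    """Return the (method, path) requests a client would send for one pull."""
    requests = []
    if not tag_cached_locally:
        # Request 1: resolve the tag to a digest.
        requests.append(("HEAD", f"/v2/{repo}/manifests/{tag}"))
    # Request 2: fetch the manifest content by digest.
    requests.append(("GET", f"/v2/{repo}/manifests/{manifest_digest}"))
    # Request 3: fetch only the blobs that are missing locally.
    for digest in layer_digests:
        if digest not in local_blobs:
            requests.append(("GET", f"/v2/{repo}/blobs/{digest}"))
    return requests


def tag_would_be_created(requests, repo, tag):
    """Hypothetical model: a tag is created only if request 1 was sent."""
    return ("HEAD", f"/v2/{repo}/manifests/{tag}") in requests


# Hypothetical proxy cache project, repository, and placeholder digests.
repo, tag = "proxy-project/library/busybox", "latest"
manifest = "sha256:" + "a" * 64
layers = ["sha256:" + "b" * 64]

fresh = pull_requests(repo, tag, manifest, layers)
cached = pull_requests(repo, tag, manifest, layers, tag_cached_locally=True)

print(tag_would_be_created(fresh, repo, tag))   # True
print(tag_would_be_created(cached, repo, tag))  # False
```

In the second call the client already knows the digest, so Harbor only sees the GET by digest and has no tag to attach to the cached manifest.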

If some blobs already exist in the client's local cache, then in step 3 the client only requests the blobs it doesn't have locally and sends no GET request to Harbor for the others. Harbor therefore never caches those layers, and as a result it also fails to cache the artifact.

If a layer is too large to download within 5 minutes, Harbor fails to cache it, and the caching of the manifest that contains this layer fails as well.
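Both cases come down to the same condition: a manifest can only be cached when every layer it lists is already in the cache. A minimal sketch of that check, assuming a manifest shaped like an OCI image manifest (a `layers` list whose entries carry a `digest`); the helper names and digests are hypothetical and this is not Harbor's implementation:

```python
def missing_layers(manifest, cached_blobs):
    """Return the layer digests listed in the manifest but absent from the cache."""
    return [layer["digest"] for layer in manifest["layers"]
            if layer["digest"] not in cached_blobs]


def can_cache_manifest(manifest, cached_blobs):
    """The manifest is cacheable only when no listed layer is missing."""
    return not missing_layers(manifest, cached_blobs)


# Placeholder digests for a two-layer image.
layer_a = "sha256:" + "1" * 64
layer_b = "sha256:" + "2" * 64
image_manifest = {"layers": [{"digest": layer_a}, {"digest": layer_b}]}

# The client already had layer_b locally (or its download timed out),
# so only layer_a ever reached the proxy cache.
cached_blobs = {layer_a}

print(can_cache_manifest(image_manifest, cached_blobs))             # False
print(can_cache_manifest(image_manifest, cached_blobs | {layer_b})) # True
```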

Container images come in different types: basic container images and image indexes. An image index, for example busybox:latest, might contain images for different architectures, such as amd64 and arm64. All child images share the same tag, latest. If you pull the image only on an amd64 machine, only the amd64 image is pulled. When caching the full content of the original manifest busybox:latest, Harbor does an integrity check on the manifest to make sure every layer specified in the manifest exists on the Harbor server. If some layers weren't pulled, such as those for arm64, Harbor caches neither the manifest of that child image nor the image for that architecture, and it also fails to push the full content of the original manifest.

To work around this issue, Harbor trims the manifest, pushes the trimmed manifest, and tags it in the repository. It also caches the full content of the original manifest in the Redis server. That is why all image index digests created in the Harbor proxy cache differ from the original ones in the upstream registry.
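The digest change follows directly from the trimming: a digest is a hash of the manifest bytes, so removing entries from the index produces a different digest. The sketch below illustrates this with a made-up index. The helper names and digests are hypothetical, and serializing with `json.dumps` is an assumption for the example; a real registry hashes the exact bytes it stores.

```python
import hashlib
import json


def digest_of(document):
    """Hash a JSON document the way a registry hashes manifest bytes (sha256)."""
    payload = json.dumps(document, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def trim_index(index, cached_manifests):
    """Keep only the child manifests that are present in the cache."""
    trimmed = dict(index)
    trimmed["manifests"] = [m for m in index["manifests"]
                            if m["digest"] in cached_manifests]
    return trimmed


# Placeholder child digests for an index with two architectures.
amd64 = "sha256:" + "a" * 64
arm64 = "sha256:" + "c" * 64
index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": amd64, "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": arm64, "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

# Only the amd64 child image was pulled through the proxy cache.
trimmed = trim_index(index, cached_manifests={amd64})

print(len(trimmed["manifests"]))                # 1
print(digest_of(trimmed) == digest_of(index))   # False
```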

Some clients, such as skopeo, only send the GET request to Harbor, so Harbor does not create a tag for the image. This is already fixed in this PR: https://github.com/goharbor/harbor/pull/21141.

The reasons a tag is not created in the proxy cache fall into the following categories:

  • The pulled image is an image index or image list.
  • The client doesn't send the HEAD manifest request (with tag) to Harbor because it is already cached at the client.
  • The image is too large to cache in 5 minutes.
  • The client only sends the GET request to fetch the manifest.