feat(smb): add volume isolation and stage/unstage support to SMB CSI driver #943
base: master
Conversation
…driver

- Enables NodeServiceCapability_RPC_STAGE_UNSTAGE_VOLUME and ControllerServiceCapability_RPC_PUBLISH_UNPUBLISH_VOLUME
- Appends a SHA-256 hash of SMB credentials to the volume ID for credential-based volume isolation
- Updates NodePublishVolume to bind-mount from the staging path to the pod volume path
- Implements NodeUnstageVolume with a reference count check using /proc/mounts
- Preserves all existing SMB CSI functionality (Kerberos, GID, subDir handling)
Welcome @MattPOlson!
Hi @MattPOlson. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MattPOlson. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/ok-to-test
The SMB CSI driver does not implement ControllerPublishVolume or ControllerUnpublishVolume since SMB shares are mounted directly by nodes and do not require controller-side attach/detach. Removing the ControllerServiceCapability_RPC_PUBLISH_UNPUBLISH_VOLUME capability avoids advertising unsupported functionality and allows the sanity test suite to pass. This change does not impact node-side STAGE_UNSTAGE_VOLUME support, which remains fully functional.
…ort Windows and Darwin

Refactored NodeUnstageVolume to use a platform-aware HasMountReferences() helper, moving Linux-specific /proc/mounts parsing into smb_common_linux.go and stubbing it out for Windows and Darwin to prevent test failures on non-Linux environments.

- Fixes Windows e2e failures due to /proc/mounts not being available
- Ensures future compatibility for multi-platform CSI driver builds
- Preserves original Linux mount reference tracking behavior
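A minimal single-file sketch of the platform-aware split described above. The real driver splits this across per-OS files (smb_common_linux.go plus stubs) rather than branching on runtime.GOOS, and the prefix-based scan shown here reflects the initial heuristic only; names are illustrative, not the driver's exact code.

```go
package main

import (
	"os"
	"runtime"
	"strings"
)

// hasMountReferences reports whether any other mount still references the
// share staged at stagingTargetPath. On platforms without /proc/mounts
// (Windows, Darwin) it is effectively stubbed out and reports no references.
func hasMountReferences(stagingTargetPath string) (bool, error) {
	if runtime.GOOS != "linux" {
		// /proc/mounts does not exist off Linux; skip the reference check.
		return false, nil
	}
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		return false, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		// fields[1] is the mount point; look for mounts nested under the
		// staging path (the initial, prefix-based heuristic).
		if len(fields) >= 2 && fields[1] != stagingTargetPath &&
			strings.HasPrefix(fields[1], stagingTargetPath) {
			return true, nil
		}
	}
	return false, nil
}
```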
Pull Request Test Coverage Report for Build 14562959558

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls
@MattPOlson thanks for the fix, some comments:

> Adds credential-based hashing to the volume ID so PVCs using the same share + same credentials resolve to the same global mount.

From my understanding, this behavior change does not solve the issue, since in the common scenario one file share uses one credential.

> Uses NodeStageVolume + bind mounts to avoid duplicate mounts entirely.

I don't find any code change in the NodeStageVolume function in your PR.

> Updates NodeUnstageVolume to safely unmount the global share only when it's no longer referenced by any bind mount.

I think this is the most important fix that solves this issue! Could you make only this change in the PR? We could skip the Windows-related change, since Windows does not allow multiple mounts on the same SMB share.
Thanks
Thanks for the thoughtful feedback, really appreciate the detailed review.

You're absolutely right that the NodeUnstageVolume() enhancement is the core fix. That reference-checking logic is what ensures the driver does not prematurely unmount the global SMB share while bind mounts from other pods are still active. Without it, the kubelet can misinterpret active usage and trigger unmount errors, especially when multiple pods or workloads interact with the same SMB share.

Regarding the credential-based hashing: I added that to address a secondary concern mentioned in #353, specifically the risk of a new PVC unintentionally reusing a stale global mount that was originally mounted with different credentials. While that scenario is rare (most real-world use cases use a single credential per share), I wanted to guard against accidental credential leakage. That said, I'm completely fine with removing this from the PR to keep the scope focused.

On the NodeStageVolume() question: confirmed, there were no functional changes made to that method in this PR. All staging behavior remains as it was upstream. I originally had an idea around that but later changed my mind.

The Windows-related changes were added to ensure both build compatibility and correct runtime behavior across platforms. The reference-checking logic in NodeUnstageVolume() depends on /proc/mounts, which only exists on Linux, so the Linux-specific parsing was split into its own file and stubbed on other platforms to keep the logic portable and safe.
In summary, I'm happy to:

- Keep this PR focused on just the NodeUnstageVolume() fix (with required platform support)
- Remove the credential hashing logic

Let me know if you'd prefer this rebased down to just the essential fix, I can prep that immediately. Thanks again for helping guide this improvement!
```diff
@@ -48,3 +51,23 @@ func prepareStagePath(_ string, _ *mount.SafeFormatAndMount) error {
 func Mkdir(_ *mount.SafeFormatAndMount, name string, perm os.FileMode) error {
 	return os.Mkdir(name, perm)
 }
+
+func HasMountReferences(stagingTargetPath string) (bool, error) {
```
Can you add an explanation for this func? I still doubt how you solved the problem by only searching stagingTargetPath. In my testing environment (with 2 replicas referencing one PVC), the bind mounts look like the following; you need to count the mounts of the target share during NodeUnstageVolume to make sure there are no other mounts referencing the smb share.

```
# cat /proc/mounts | grep smb
//smb-server.default.svc.cluster.local/share/pvc-00a18b4a-977d-4ce8-a911-fc661a7332f9 /var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/085ffd65e2835034cdf2a23f67a498673427c7497def5d808bfb505c0df0b1a4/globalmount cifs rw,relatime,vers=3.1.1
//smb-server.default.svc.cluster.local/share/pvc-00a18b4a-977d-4ce8-a911-fc661a7332f9 /var/lib/kubelet/pods/4aad1717-a5cb-4f93-a28b-2a8a8c36d1bd/volumes/kubernetes.io~csi/pvc-00a18b4a-977d-4ce8-a911-fc661a7332f9/mount cifs rw,relatime,vers=3.1.1
//smb-server.default.svc.cluster.local/share/pvc-00a18b4a-977d-4ce8-a911-fc661a7332f9 /var/lib/kubelet/pods/2164da99-d68f-4886-9bcb-9bb9a42c844f/volumes/kubernetes.io~csi/pvc-00a18b4a-977d-4ce8-a911-fc661a7332f9/mount cifs rw,relatime,vers=3.1.1
```
@andyzhangx
You’re right that our initial implementation of HasMountReferences() was scanning /proc/mounts for paths prefixed by stagingTargetPath, which doesn't fully reflect how bind mounts work in real-world deployments. As you pointed out, kubelet performs bind mounts from the global path into pod-specific paths like /var/lib/kubelet/pods/.../volumes/.../mount, and these paths are not subdirectories of the staging path.
The correct approach here is to:

1. Parse /proc/mounts
2. Count the number of entries where the source (the SMB share URI) matches the mounted share
3. Only proceed with unmounting if stagingTargetPath is the last remaining mount target
I’ll update the implementation of HasMountReferences() to reflect this, by comparing entries that have the same mount source as the staging path. This aligns with how GetDeviceMountRefs() works internally in kubelet, and ensures that the global mount is only unmounted once all bind mounts are gone.
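As a sketch, that source-matching rule can be expressed as pure helpers over the mount-table text. The function names below are illustrative, not the driver's actual API, and the table is passed in as a string so the logic is testable without reading /proc/mounts.

```go
package main

import (
	"bufio"
	"strings"
)

// sourceOfMount returns the source (first field) of the mount-table entry
// whose mount point (second field) equals mountPoint, or "" if absent.
func sourceOfMount(mountTable, mountPoint string) string {
	sc := bufio.NewScanner(strings.NewReader(mountTable))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[1] == mountPoint {
			return fields[0]
		}
	}
	return ""
}

// countMountRefs counts entries whose source matches the given SMB share URI,
// regardless of where they are mounted (global mount or pod bind mounts).
func countMountRefs(mountTable, source string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(mountTable))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == source {
			n++
		}
	}
	return n
}

// safeToUnstage reports whether the staging mount is the last remaining
// reference to its share, i.e. whether NodeUnstageVolume may unmount it.
func safeToUnstage(mountTable, stagingTargetPath string) bool {
	source := sourceOfMount(mountTable, stagingTargetPath)
	if source == "" {
		return true // staging path no longer mounted; nothing to protect
	}
	return countMountRefs(mountTable, source) == 1
}
```

This mirrors the kubelet behavior the comment refers to: bind mounts share the staging mount's source, so counting by source catches pod mounts that are not path-prefixed by the staging directory.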
Thanks for flagging this — I’ll update the PR accordingly.
@MattPOlson thanks for the work, pls also provide the /proc/mounts examples when there are multiple PVCs using the same file share in the PR description, I think we only need this fix right now, thanks.
@andyzhangx
Thanks for the continued feedback and review.
After extensive testing and deeper inspection of /proc/mounts and /proc/self/mountinfo, I’ve confirmed that there is no reliable or portable way to determine which global mount path a pod bind mount is referencing when multiple global mounts exist for the same SMB share.
The Problem
When two PVCs reference the same SMB share (e.g., //smb-server/share) and get different volume handles, the driver ends up creating multiple global mounts like:
```
/var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/<guid1>/globalmount
/var/lib/kubelet/plugins/kubernetes.io/csi/smb.csi.k8s.io/<guid2>/globalmount
```
Then, all pod mounts — regardless of which global mount they bind from — show up in /proc/mounts with the same source:
```
source: //smb-server/share
device: 0:334
```
So:

- We can detect that the share is still in use.
- But we cannot detect which global mount path is actually being used.
- There's no way to know which global mount is safe to unmount.

This leads to orphaned global mounts, errors in GetDeviceMountRefs, and premature or blocked unmounts.
Why Normalizing the Global Mount is the Fix
If the driver normalizes volume handles (e.g., by hashing the share + credentials), then:

- Kubernetes will use the same volumeHandle for identical mounts
- Only one global mount path is created per node per share
- All bind mounts share that one staging path
- Cleanup is simple: once all bind mounts are gone, the staging path is unmounted
This aligns with the behavior of block devices and is consistent with Kubernetes’ expectations for how NodeStageVolume and NodePublishVolume interact.
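A hedged sketch of that normalization step. The PR describes appending a SHA-256 hash of the SMB credentials to the volume ID; the exact handle format and function name below are illustrative, not the driver's actual encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// normalizeVolumeHandle derives a volume handle from the share source plus a
// SHA-256 digest of the credentials. Identical share+credential pairs collapse
// to one handle (hence one global mount per node), while different credentials
// for the same share remain isolated from each other.
func normalizeVolumeHandle(source, username, password string) string {
	// NUL separator avoids ambiguity between e.g. ("ab","c") and ("a","bc").
	sum := sha256.Sum256([]byte(username + "\x00" + password))
	return fmt.Sprintf("%s#%s", source, hex.EncodeToString(sum[:8]))
}
```

Because the handle only carries a digest, the credentials themselves never appear in the volume ID, which also addresses the credential-leakage concern mentioned earlier in the thread.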
Summary
Without volumeHandle normalization, there's no reliable mechanism to correlate bind mounts back to the correct staging path — especially with CIFS, where device IDs and mount sources are always the same.
That’s why I strongly recommend restoring the normalization logic. It’s the only way to ensure safe, deterministic cleanup of global mounts in multi-PVC, multi-pod SMB scenarios.
@MattPOlson thanks for the detailed explanations, pls only make necessary changes to fix the original issue.
Thanks again for the review. I'd like to clarify why the volumeHandle normalization is essential to fully solving the problem we're seeing in production, even with the upstream behavior.

How This Happens with the Current Upstream Driver

Multiple PVCs, each with unique volumeHandles, stage the same SMB share. This results in multiple global mount paths, such as:

```
/plugins/.../<guid1>/globalmount
/plugins/.../<guid2>/globalmount
```

But from the Linux kernel's perspective, all those mounts reference the same CIFS device. When one PVC is deleted, the kubelet tries to unmount its corresponding globalmount path. However, GetDeviceMountRefs() detects that the same source is still mounted in other locations, and kubelet refuses to unmount, thinking it's still in use. We've confirmed this in kubelet logs.
This behavior becomes more common when Helm charts, templates, or automation generate many PVCs that point to the same SMB share.

How volumeHandle Normalization Solves This

- The same share + creds → same volumeHandle
- Only one global mount is created per node
- Additional PVCs referencing the same share use bind mounts only (via NodePublishVolume)
- NodeUnstageVolume only unmounts when no other pods are using the mount

This approach has several advantages.
TL;DR

- Linux sees them as the same device and blocks unmounts
- Our changes resolve this by ensuring the share is only mounted once per node
- The fix is safe, backward-compatible, and improves both reliability and security

Happy to reintroduce this logic in a scoped commit if you'd like to review it in isolation — thanks again for considering!
What type of PR is this?
/kind feature
/kind design
What this PR does / why we need it:
This PR enhances the SMB CSI driver to support STAGE_UNSTAGE_VOLUME, enabling a single SMB share mount per node with bind mounts for each pod. It also introduces credential-based volume ID hashing to isolate mounts for identical shares accessed with different credentials.
Which issue(s) this PR fixes:
Fixes #353
Fixes #935
Requirements:
Special notes for your reviewer:
This is a non-breaking change that enhances driver correctness and security without removing or deprecating existing functionality. Happy to add follow-up documentation or test case references.
Release note:
testresults.txt