Skip to content

Support certificate auto-rotation in Kestrel #32351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aelij opened this issue May 3, 2021 · 81 comments · Fixed by #49979
Closed

Support certificate auto-rotation in Kestrel #32351

aelij opened this issue May 3, 2021 · 81 comments · Fixed by #49979
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-kestrel partner-impact
Milestone

Comments

@aelij
Copy link
Contributor

aelij commented May 3, 2021

Is your feature request related to a problem? Please describe.

In Kubernetes, certificates are mounted as secret volumes, which can be configured to update automatically when the cert is rotated (e.g. from Key Vault). To achieve auto-rotation in Kestrel today, we need to hook up ServerCertificateSelector and listen to file changes (e.g. using IFileProvider.Watch).

Describe the solution you'd like

Add a HttpsConnectionAdapterOptions.ServerCertificatePath property that would watch for file changes.

@blowdart
Copy link
Contributor

blowdart commented May 3, 2021

This is a feature we're considering, above and beyond kubernetes. We're not at the design stage yet, but it's on the radar

@blowdart blowdart added this to the Backlog milestone May 3, 2021
@ghost
Copy link

ghost commented May 3, 2021

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

@davidfowl
Copy link
Member

davidfowl commented May 5, 2021

@blowdart this is something I think we should support natively. We did work in .NET 5 to make Kestrel respect configuration reload but that doesn't work well for things in configuration that change without configuration itself changing.

This is going to affect YARP as well @Tratcher.

@aelij Are you using certmgr?

@davidfowl davidfowl removed this from the Backlog milestone May 5, 2021
@aelij
Copy link
Contributor Author

aelij commented May 5, 2021

@davidfowl No, we're planning on using the new Secrets Store CSI Driver once it's out of preview, which natively supports auto-rotation.

@blowdart blowdart self-assigned this May 5, 2021
@BrennanConroy BrennanConroy added this to the Next sprint planning milestone May 5, 2021
@ghost
Copy link

ghost commented May 5, 2021

Thanks for contacting us.

We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@adityamandaleeka
Copy link
Member

@blowdart Removing this from 6. Please move it back if you think it should get done.

@ghost
Copy link

ghost commented Sep 1, 2021

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

@Tratcher Tratcher modified the milestones: Backlog, .NET 7 Planning Oct 28, 2021
@adityamandaleeka adityamandaleeka assigned amcasey and unassigned blowdart Mar 20, 2023
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
@amcasey
Copy link
Member

amcasey commented Jun 5, 2023

Certs can be configured either in code or via IConfiguration (e.g. in appsettings.json). There's already a mechanism for opting into reloading when the IConfiguration changes and it would be pretty straightforward to model a change to the certificate file as a change to the configuration. Code-based configuration is a little trickier, since it doesn't presently have a notion of "change".

@aelij @craigktreasure Is IConfiguration-based configuration sufficient for your use cases? I can imagine that you might be getting a dynamic path at runtime, making this impossible.

@aelij
Copy link
Contributor Author

aelij commented Jun 5, 2023

IConfiguration should be enough, provided it can support multiple certs. Just to be sure we're on the same page, the cert's file path would not change in our case.

@amcasey
Copy link
Member

amcasey commented Jun 5, 2023

@aelij Tell me more about the multiple certs? I assume you mean different certs for different endpoints?

Yes, we're talking about the cert file changing in-place without a corresponding change to the path or (e.g.) appsettings.json.

@aelij
Copy link
Contributor Author

aelij commented Jun 5, 2023

I assume you mean different certs for different endpoints?

Yes

@craigktreasure
Copy link

In my case, it was single cert and the path was specified on disk via IConfiguration through an environment variable.

@amcasey
Copy link
Member

amcasey commented Jun 5, 2023

Excellent. If the IConfiguration-based approach works for everyone, we should be able to do this without introducing new API surface area for consumers to reason about. I'll follow up on that approach and report back.

@amcasey
Copy link
Member

amcasey commented Aug 18, 2023

Expose symlink information from, e.g., IFileInfo.

@amcasey what kind of symlink information do you need? Getting the target of a symlink in order to query its metadata should be possible with FileSystemInfo.ResolveLinkTarget.

That's at the System.IO layer, but we're working at the Microsoft.Extensions.FileProviders layer. We don't want to dig down in case the backing implementation isn't a disk.

Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

Yes, I assume that's probably to make polling less costly. Were you expecting it to work in a different way like comparing the contents?

No, that's what I expected. Seeing the redundant event message (which indicates that the new and old mtimes were the same) made me think there might be some other criteria.

Interesting, it's still not working with DOTNET_USE_POLLING_FILE_WATCHER.

@aelij I tried to look at your repro https://github.com/aelij/kestrel-aks-kv-rotation but it is too minimal to see how the files are polled and I'm not very familiar with the internals of ASP.NET Core. Am I understanding correctly that there should be PhysicalFileWatchers for /certs/cert1.key and /certs/cert1.crt? Can you point me to the source where they are used? I'm trying to understand if this is really a bug in the dotnet/runtime.

Sorry, it's a long thread, but there are some notes above. The file layout is described here and the watch is on (in that example),
tls.key.

(Note that @aelij, who has been exceptionally helpful, is on European time and is, hopefully, enjoying the weekend.)

@davidfowl
Copy link
Member

davidfowl commented Aug 18, 2023

What we need is watching files that are actually sym linked to trigger the change token. It needs to work like a normal file change. I'm not sure we need to know if a file is a sym link or not, that might be useful in general but for this, it should just work if I watch a file and that file changes and it happens to be a link. If that behavior needs to be opt-in, it should be opt-in, but the complexity should be hidden in the watch implementation.

@jozkee
Copy link
Member

jozkee commented Aug 18, 2023

Seeing the redundant event message (which indicates that the new and old mtimes were the same) made me think there might be some other criteria.

That could be dotnet/runtime#55951 if it is executing fast enough, i.e: in milliseconds. But if this was tested manually then is unlikely.

> What we need is watching files that are actually sym linked to trigger the change token

Well that was addressed already on dotnet/runtime#55664 unless its buggy.
EDIT: I think you are to referring to what you want in aspnetcore 😅.

@pinkfloydx33
Copy link

Why does the reload work for normal AppSettings files mounted into a container in the same manner, but not in this case?

@jozkee
Copy link
Member

jozkee commented Aug 21, 2023

@pinkfloydx33 that's what I'm trying to understand too.

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

I'm wondering if this was for an actually redundant change event. Looking at the polling source, it looks like it makes its has-changed decision based solely on the last write time.

Figured this out over the weekend: the change event is for the target of the link and the timestamp check is on the link itself. (If there's some way to pull the path out of the event, I haven't found it, so I'm associating a path with each change token manually.)

I'd like to update the mtime check to consider the target-of-link mtime as well, but (AFAICT) IFileInfo doesn't expose ResolveLinkTarget so that's not possible. However, if we force it to use polling, we should be able to drop the mtime check entirely, since it doesn't send redundant events like FileSystemWatcher.

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

Why does the reload work for normal AppSettings files mounted into a container in the same manner, but not in this case?

Where do those files sit in the directory hierarchy you listed above? Is there an appsettings.json next to each tls.key?

@pinkfloydx33
Copy link

In that particular example no, it's just the key.

But generally that's exactly how I mount AppSettings files in a container (at /app/settings but same concepts apply). If I terminal into the pod and edit one of those files and look at the logs I can see my yarp gateway refresh within seconds.

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

If the app is finding its appsettings.json using a path that doesn't require symlink resolution, I'd expect it to just work. Alternatively, have you enabled polling file watching?

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

This thread is getting fairly long, so I'll attempt to summarize our current status.

Problem

The appsettings.json file for an aspnetcore app contains the path to a certificate file. We would like to reload endpoints using that certificate file if it changes on disk, even if there's no corresponding change to the appsettings file. The reload behavior will be the same as if the appsettings file had been updated with a new certificate path.

File Layout

app/
    tls.key -> ..data/tls.key
    ..data/ -> 2023_08_14/
    2023_08_14/
        tls.key
    2023_08_15/
        tls.key

That is, tls.key is a symlink to ..data/tls.key, which ultimately resolves to 2023_08_14/tls.key. (The actual timestamped directory is more precise - this is just an example.)

The file change we expect to see in practice is an update of ..data/ to point at 2023_08_15/, rather than 2023_08_14/.

Status

At present, aspnetcore adds a file watcher (either FileSystemWatcher or polling, according to the user's environment variables) to tls.key. There's no globbing - it's just that single file.

When using FileSystemWatcher, symlinks are not considered at all. When the ..data/ symlink is updated, the watcher produces no events and aspnetcore does not reload.

When using polling, the mtime of the resolved symlink target (i.e. 2023_08_14/tls.key before the change and 2023_08_15/tls.key afterward) is used. However, to compensate for duplicate events in the FileSystemWatcher case, aspnetcore drops events that don't update the mtime of the watched file. Unfortunately, as far as aspnetcore knows, it is only watching tls.key so that is the only mtime it checks. IFileSystem does not expose ResolveLinkTarget, so there's no way to consider symlinks.

As a result, aspnetcore does not reload the affected endpoints in either mode.

Ways Forward

Regardless of which path we take, I'll need to update the code to stop instantiating PhysicalFileWatcher directly since, as @davidfowl pointed out, we should be retrieving file watchers from the IHostEnvironment.

Option: Do Nothing

If we make no further changes (beyond the fix mentioned above), things won't work on k8s, which was the motivating scenario for this work. It will, however, result in a low-risk change with intelligible behavior.

Option: Adjust the mtime checks

We could detect that the user has opted for polling file watching and skip the mtime check in that case. That would be enough to unblock the k8s scenario, but would likely increase CPU usage by making all file watching poll-based.

Option: Remove the mtime checks

If we remove the mtime checks but allow and/or default to FileSystemWatcher, we will see redundant change events that cause unnecessary reloads. It's possible that debouncing at some other layer will mitigate this.

Option: Switch from mtime checks to SHAs

Obviously, this would use a lot more CPU. We can limit this by combining it with polling (i.e. so we hash at most every 4 seconds). Given that IFileProvider didn't feel the need to do this, I'm assuming it's overkill.

Draft: #50172

Option: Implement symlink support in kestrel

We have to (a) move watchers higher up the file hierarchy to detect changes to directories earlier in the path and write a bunch of abstraction-violating and error-prone code. It's doable but ugly and, as @davidfowl points out, at the wrong layer.

Draft: #50074

Option: Implement symlink support in the runtime

As complicated as the kestrel code is, it gets to make a bunch of simplifying assumptions. In the general case, we'd have to think about globbing and how to get a new IFileProvider to pointed-to files outside the scope of the current IFileProvider.

The biggest risk here is that we're quite late in the 8.0 cycle, so there wouldn't be a lot of time to let this bake. With so many file systems to consider, there's quite a large test matrix and there's likely to be a bug tail as we discover special cases.

Proposal

Personally, I think the most pragmatic solution would be to force polling for precisely certificate files (i.e. continuing to respect the user's environment variable for other files, including appsettings.json) and dropping the mtime check. In fairness, I should point out that @aelij suggested this long ago.

As mentioned above, we probably want to start pulling from IFileProvider regardless of which approach we select and I think this would be worthwhile. Frankly, the biggest change is likely to be to the tests.

Thoughts?

@adityamandaleeka
Copy link
Member

Thanks for the clear summary @amcasey. I support your proposal for .NET 8.

Longer term, improving symlink support in the framework is likely the best option but it's too late in this release for that as you noted.

I'd also be interested in whether removing the mtime checks actually works well, but would want us to understand all the debouncing logic fully if we want to go that route with confidence.

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

There's little advantage to dropping the mtime check without also forcing polling since FileSystemWatcher won't resolve symlinks. I guess the chief advantage would be being able to respect the polling environment variables.

While we're talking about runtime improvements, it would be nice if the non-polling watcher just didn't send duplicate events. 😉

@jozkee
Copy link
Member

jozkee commented Aug 21, 2023

it would be nice if the non-polling watcher just didn't send duplicate events. 😉

dotnet/runtime#24079

@amcasey
Copy link
Member

amcasey commented Aug 21, 2023

Regardless of which path we take, I'll need to update the code to stop instantiating PhysicalFileWatcher directly

I made this change here. Unfortunately, I don't think we'll be able to take it for 8.0 because it will prevent us from using polling for specific files (the file watcher is shared by several consumers). Not having it is an abstraction violation, but fixing it doesn't fix the abstraction violation because we still pass the path directly to X509Certificate, rather than retrieving a stream from the file provider.

amcasey added a commit to amcasey/aspnetcore that referenced this issue Aug 22, 2023
It's common for certificates to be rotated using symlinks and `FileSystemWatcher` doesn't resolve symlinks.  Use polling, which does, but only for certificates since it has a CPU cost.

Fixes dotnet#32351
amcasey added a commit to amcasey/aspnetcore that referenced this issue Aug 22, 2023
It's common for certificates to be rotated using symlinks and `FileSystemWatcher` doesn't resolve symlinks.  Use polling, which does, but only for certificates since it has a CPU cost.

Fixes dotnet#32351
@amcasey
Copy link
Member

amcasey commented Aug 22, 2023

PR for polling and ignoring mtime: #50251

@amcasey
Copy link
Member

amcasey commented Aug 22, 2023

It took a bit of massaging, but I got @aelij's repro script working (he was naively taking for granted that I would know when and how to log in to azure 😆). With #50251, I'm seeing

trce: Microsoft.AspNetCore.Server.Kestrel.Core.Internal.CertificatePathWatcher[15]
      Flagged 1 observers of '/certs/cert1.crt' as changed.
...
info: Microsoft.AspNetCore.Server.Kestrel[0]
      Config changed. Stopping the following endpoints: 'https://*:5001'
info: Microsoft.AspNetCore.Server.Kestrel[0]
      Config changed. Starting the following endpoints: 'https://*:5001'

🥳

@amcasey
Copy link
Member

amcasey commented Aug 22, 2023

@aelij I notice that your repro has the certs in /certs, which isn't under the /app, the content root. Is that the way things are normally laid out or was that just simpler for a toy repro? I don't believe it will cause a problem, but it suggests that a "fix" like #50246 would break people.

@pinkfloydx33
Copy link

pinkfloydx33 commented Aug 23, 2023

I don't think there's "a usual". For example at my job by convention we put things into /app/config, /app/secrets and /app/certs. But it could've been anywhere. I know some folks who just mount to /config while the program still runs from /app.

IOW mounting k8s certs within the content root is certainly possible but is no way guaranteed. The mount point may be dictated by devops, and/or those responsible for managing configuration and might not even be a developer concern.

@aelij
Copy link
Contributor Author

aelij commented Aug 23, 2023

Yes, there should be no relation to the app root.

@amcasey
Copy link
Member

amcasey commented Aug 24, 2023

It's in! @aelij I've already validated using your sample project (thanks again!), but it would be great if you could confirm that an upcoming nightly works for you.

@amcasey
Copy link
Member

amcasey commented Aug 25, 2023

Not sure why merging #50251 didn't close this...

@amcasey amcasey closed this as completed Aug 25, 2023
@aelij
Copy link
Contributor Author

aelij commented Aug 27, 2023

I can confirm it's working. Thanks @amcasey!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-kestrel partner-impact
Projects
None yet
Development

Successfully merging a pull request may close this issue.