Investigate shutdown delay option #20995
@philwebb actually pulled that from the Boot docs 🤫

I guess I missed those doc updates. They must have happened when I was away, so I'm glad I watched the webinar!
I think this would be a great feature, given that endpoint propagation and pod deletion happen asynchronously. Also, not all base images have a shell in which to run `sleep`. The 2.3 graceful shutdown support is great, but I believe in K8s we will still have to delay the shutdown to make sure that new requests do not end up at the endpoint of a pod that is already gone.
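When the image has no shell for a `sleep` command, the delay can live inside the application instead. The following is a minimal, framework-agnostic sketch, assuming a hypothetical `DelayedShutdown` class (not a Spring Boot API) and a caller-supplied `shutdownAction`. It first reports "not ready", then waits for the configured delay so that endpoint removal can propagate, and only then runs the real shutdown.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Spring Boot: delays the real shutdown so
// that routers/endpoint controllers can stop sending traffic first.
final class DelayedShutdown implements Runnable {

    private final AtomicBoolean ready = new AtomicBoolean(true);
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Duration delay;
    private final Runnable shutdownAction;

    DelayedShutdown(Duration delay, Runnable shutdownAction) {
        if (delay.isNegative()) {
            throw new IllegalArgumentException("delay must not be negative");
        }
        this.delay = delay;
        this.shutdownAction = shutdownAction;
    }

    // What a readiness endpoint would report.
    public boolean isReady() {
        return ready.get();
    }

    @Override
    public void run() {
        // Only the first caller performs the delay and the shutdown.
        if (!started.compareAndSet(false, true)) {
            return;
        }
        ready.set(false); // readiness now fails, so traffic can drain away
        try {
            Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
            // Preserve the interrupt status, then continue shutting down.
            Thread.currentThread().interrupt();
        }
        shutdownAction.run(); // e.g. stop accepting requests, finish in-flight ones
    }
}
```

For a plain Java server it could be registered with `Runtime.getRuntime().addShutdownHook(new Thread(new DelayedShutdown(Duration.ofSeconds(10), server::stop)))`, where `server` is hypothetical. The JDK runs registered shutdown hooks concurrently and in no particular order, so in a Spring Boot application such a hook would not hold back Spring's own context-closing hook; there the same logic belongs in a `ContextClosedEvent` listener, as in the health indicator examples later in this thread.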
We have an operational scenario in which we open a shell directly on the pod and restart the application. This allows a faster restart of the app, since it doesn't involve the K8s pod lifecycle. In that case, K8s does not know the application is restarting, and it still sends traffic to the pod during shutdown. If Spring Boot provided a delay (sleep) in its graceful shutdown logic, I could remove the sleep logic from our library, which would be great.
This would be a handy addition, as right now I always lose some metrics when the application shuts down because Prometheus hasn't had a chance to scrape the last 'meaningful' metrics before it's killed, causing a big (fake) dip in request rate for instance. If I could keep the application around for 30s while traffic is already being diverted to other instances, this would allow me to scrape those last metrics.
Waiting between switching the readiness state to `REFUSING_TRAFFIC` and actually starting to refuse traffic would also help in non-Kubernetes environments with standard load balancers that use health checks (monitors) to decide whether an instance should stay in the pool. Right now, even with graceful shutdown enabled, the load balancer will send a few (or many, depending on the request rate) requests to instances that are already shutting down, which makes deploying without downtime impossible.
@dimovelev Outside of a K8S environment, the expectation is that the load balancer is instructed to stop routing requests to the app first and that graceful shutdown of the application instance is then initiated. This should allow existing in-flight requests to complete while any new requests are routed to a different instance.
I understand that, ideally, a real load balancer should not rely solely on the readiness endpoint check. However, some architectures and environments do have components that have nothing else to base that decision on. It is also reasonable to assume that components that keep checking the readiness endpoint do so at some interval (say, every few seconds). Then, from a feature correctness point of view, the … Therefore, for me, an optional delay setting would make Spring Boot as a whole more correct. I am assuming that there is currently no safe way to implement a custom delay in a Spring Boot application. As far as I understand, one option might be to implement a …
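The "some interval" argument can be quantified. Assume a health checker that polls every `interval`, removes an instance only after `failureThreshold` consecutive failed checks, and allows each check up to `timeout` to answer. A check that started just before readiness flipped may still see "healthy", so the first failing check can start up to one `interval` later; the last required failing check starts `failureThreshold - 1` intervals after that and may take up to `timeout` to finish. The worst case is therefore `failureThreshold × interval + timeout`, a lower bound for any pre-shutdown delay. The `DrainDelay` helper here is hypothetical and only illustrates that arithmetic.

```java
import java.time.Duration;

// Hypothetical helper (not a Spring Boot or load balancer API) that estimates
// how long a polling health checker may keep routing traffic to an instance
// after its readiness endpoint starts failing.
final class DrainDelay {

    private DrainDelay() {
    }

    static Duration worstCaseDetection(Duration interval, int failureThreshold, Duration timeout) {
        if (interval.isNegative() || timeout.isNegative()) {
            throw new IllegalArgumentException("durations must not be negative");
        }
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be at least 1");
        }
        // Up to one interval until the first failing check starts,
        // (failureThreshold - 1) more intervals until the last required failing
        // check starts, plus up to `timeout` for that last check to complete.
        return interval.multipliedBy(failureThreshold).plus(timeout);
    }
}
```

For example, `DrainDelay.worstCaseDetection(Duration.ofSeconds(5), 3, Duration.ofSeconds(2))` gives 17 seconds (`PT17S`). Any time the balancer then needs to apply the routing change comes on top, so a real delay would add some margin.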
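Hi. I would like to propose a possible solution for this. The problem here is that the app enters graceful shutdown, and so rejects new requests, while the pod may still be in the load balancer for a bit. This means that for a brief time new requests may still be forwarded to the pod, but the application server within will reject them. The …

I tested a simple solution for this and ran some load tests with rolling restarts in the middle, and the issue stopped occurring (before, I was always able to reproduce it). My solution to the issue was to wrap the …

Here is an example of the code I tried:

```java
import static java.util.concurrent.TimeUnit.SECONDS;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.availability.LivenessStateHealthIndicator;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.availability.ApplicationAvailability;
import org.springframework.context.annotation.Profile;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Slf4j
@Component("livenessStateHealthIndicator")
@Profile("!dev")
public class GracefulLivenessStateHealthIndicator extends LivenessStateHealthIndicator {

    // Written by the thread closing the context, read by health-check request threads
    private volatile boolean shuttingDown;

    public GracefulLivenessStateHealthIndicator(ApplicationAvailability availability) {
        super(availability);
    }

    @EventListener(ContextClosedEvent.class)
    public void onContextClosedEvent() {
        if (shuttingDown) {
            return; // ContextClosedEvent may arrive more than once; only delay once
        }
        shuttingDown = true;
        try {
            log.info("Waiting 10s before starting shutdown");
            SECONDS.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            log.error("Wait before shutdown was interrupted", e);
        }
        log.info("Wait before shutdown finished");
    }

    @Override
    public Health getHealth(boolean includeDetails) {
        // Report OUT_OF_SERVICE as soon as shutdown starts so traffic is routed elsewhere
        return shuttingDown ? Health.outOfService().build() : super.getHealth(includeDetails);
    }
}
```

I should make a few notes about it: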
Hi @philwebb. Have you had a chance to analyse my proposal above?
@nhmarujo I'm afraid we've not had a chance to revisit this one yet.

Thanks @philwebb for the prompt answer.

Hi. Any moves on this one? 😄

I'm afraid not. Currently most of our focus is on ahead-of-time code generation and support for Graal native. I'm sorry it's been so long :(
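It's OK, I understand. Worth asking anyway. Thanks for the feedback!

Additional note: in my "proposal" in #20995 (comment) I used the wrong probe. I should in fact have extended `ReadinessStateHealthIndicator` instead:

```java
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.springframework.boot.cloud.CloudPlatform.KUBERNETES;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.availability.ReadinessStateHealthIndicator;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.availability.ApplicationAvailability;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Slf4j
@Component("readinessStateHealthIndicator")
public class GracefulReadinessStateHealthIndicator extends ReadinessStateHealthIndicator {

    // Written by the thread closing the context, read by health-check request threads
    private volatile boolean shuttingDown;

    public GracefulReadinessStateHealthIndicator(ApplicationAvailability availability) {
        super(availability);
    }

    @EventListener
    public void onContextClosedEvent(ContextClosedEvent event) {
        if (!KUBERNETES.isActive(event.getApplicationContext().getEnvironment()) || shuttingDown) {
            // Avoid sleeping if not inside K8s or if ContextClosedEvent was already received
            return;
        }
        shuttingDown = true;
        try {
            log.info("Readiness probe set as OUT_OF_SERVICE. Delay before commencing graceful shutdown initiated");
            SECONDS.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            log.error("Delay before commencing graceful shutdown interrupted", e);
        }
        log.info("Delay before commencing graceful shutdown finished");
    }

    @Override
    public Health getHealth(boolean includeDetails) {
        return shuttingDown ? Health.outOfService().build() : super.getHealth(includeDetails);
    }
}
```

☝️ Although the other version worked in my POC, this one is more accurate, since it is the readiness probe's responsibility to control whether traffic is routed to the pods. All the remaining comments in the original post still apply.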
This is still an issue.

@sigand can you please share your solution? Thanks
I've just found this: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3960-pod-lifecycle-sleep-action/README.md

If I understand it correctly, the latest K8s (1.32) already supports a `sleep` lifecycle action. We're still on 1.30, so I can't verify it at the moment.
#43830 documented this:

```yaml
lifecycle:
  preStop:
    sleep:
      seconds: 10
```

No need to have a shell.
For folks who aren't able to upgrade to Kubernetes 1.32, I humbly offer upmc-enterprises-graceful-shutdown-spring-boot-starter (an open source library I maintain), which adds an Actuator endpoint (…)
The webinar presented by @ryanjbaxter has an interesting bit of configuration to delay shutdown using a `preStop` command. This article has some interesting background. We might be able to offer a similar feature out-of-the-box in Boot and configure it automatically.