MLFlow issue: Every single run is marked as FINISHED never FAILED #20827
Closed
Unanswered
Saya47
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
-
I figured out that my code had a bug. Sorry if I took up anyone's time. I had struggled with this for too long before posting, and only realized what the bug was about 12 hours after I posted.
-
Hello, good day everybody.
I track my experiments using MLflow. My issue is that even when my code has a bug and raises an exception during a run, Lightning marks all my runs as FINISHED in MLflow. Below I asked an LLM to demonstrate this:
The run is marked with status 3 (FINISHED) even when an exception is raised, which is really bad.
I use the status to filter out bad runs, because I use MLflow to aggregate metrics and losses across epochs and runs, and to resume from checkpoints.
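For reference, a minimal sketch of setting the run status explicitly, so that filtering by status does not depend on how the logger finalizes the run. It assumes a client object with a `set_terminated(run_id, status=...)` method, which `mlflow.tracking.MlflowClient` provides. `RecordingClient`, `terminate_with_status`, `run_training`, and the run IDs are hypothetical names made up for illustration, and the stub client only records calls so the snippet runs without MLflow installed. This is not how Lightning handles it internally, and how it interacts with Lightning's own logger finalization is not verified here.

```python
from contextlib import contextmanager


class RecordingClient:
    """Hypothetical stand-in for mlflow.tracking.MlflowClient; it only records calls."""

    def __init__(self):
        self.statuses = {}

    def set_terminated(self, run_id, status=None):
        self.statuses[run_id] = status


@contextmanager
def terminate_with_status(client, run_id):
    """Mark the run FAILED if the wrapped block raises, FINISHED otherwise."""
    try:
        yield
    except Exception:
        client.set_terminated(run_id, status="FAILED")
        raise  # re-raise so the original error is not swallowed
    else:
        client.set_terminated(run_id, status="FINISHED")


def run_training(client, run_id, train_fn):
    # train_fn is a placeholder for whatever starts training,
    # e.g. a function that calls trainer.fit(...).
    with terminate_with_status(client, run_id):
        train_fn()


def broken_fit():
    raise RuntimeError("bug in training code")


client = RecordingClient()

try:
    run_training(client, "run-1", broken_fit)
except RuntimeError:
    pass  # the error still propagates; only the status was recorded

run_training(client, "run-2", lambda: None)

print(client.statuses)  # {'run-1': 'FAILED', 'run-2': 'FINISHED'}
```

With a real tracking server, the stub would be replaced by an `MlflowClient` instance and the run ID by the ID of the active run.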