License cache not always reused #4273
Comments
We used to have a procedure to check for and wait on concurrent (re)building of the cache, but that code was removed long ago because it was complex, brittle, slow, and error-prone: not a happy combo! ScanCode release builds now always ship with a pre-built, built-in cache, so this is no longer an issue. If you ever build an index yourself, you should do it in a single process with exclusive access to the index, IMHO.
Fix: aboutcode-org#4273 Signed-off-by: Markus Obendrauf <markus.obendrauf@tngtech.com>
Oh, thank you for the context! Is there an issue with the fix proposed in the PR? We don't have much control over how our stress tests are run, so we've had to make these changes on our branch to fix failing tests.
If the cache shouldn't be generated when using many threads, why is it possible to do so (as Markus demonstrated)? It would be better to fail very early in the process, unless ScanCode is run in a "cacheless mode" (if there is such a thing).
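Failing early, as suggested here, could look roughly like this: take a non-blocking exclusive lock and raise immediately if another process already holds it, instead of waiting until a timeout. This is a minimal sketch under stated assumptions: it is POSIX-only (it relies on `fcntl.flock` with `LOCK_NB`), and `CacheBusyError` and `acquire_build_lock_or_fail` are hypothetical names, not part of ScanCode.

```python
import fcntl


class CacheBusyError(RuntimeError):
    """Hypothetical error: another process is already building the cache."""


def acquire_build_lock_or_fail(lock_path):
    """Try to take an exclusive lock without waiting; fail at once if it is held.

    POSIX-only sketch. The caller keeps the returned file open while building
    the cache; closing it releases the lock.
    """
    lock = open(lock_path, "w")
    try:
        # LOCK_NB makes flock raise instead of blocking when the lock is taken.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock.close()
        raise CacheBusyError(f"cache build already in progress: {lock_path}")
    return lock
```

A second caller gets an immediate, explicit error rather than silently starting a duplicate build or waiting until a lock timeout.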
Description
The licensedcode cache is not always reused when running multiple processes in parallel. This was noticed while running stress tests on ScanCode: when multiple tests were started in separate processes at the same time, each process built its own cache instead of using the existing one. This had a considerable performance cost and eventually led to a LockTimeout.
The root cause is in licensedcode/cache.py: after a process obtains the lock, it should check whether another process has already built the cache, but it does not.
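The missing check is the classic double-checked locking pattern: check for the cache, take the lock, then check again before building. Here is a minimal sketch, assuming a POSIX system (it uses `fcntl.flock`) and a hypothetical cache layout. The file names `index.pickle` and `index.lock` and the stand-in `build_index()` are illustrative only, not the actual code in licensedcode/cache.py.

```python
import fcntl
import os
import pickle
import tempfile


def build_index():
    # Hypothetical stand-in for the expensive license index build.
    return {"licenses": ["mit", "apache-2.0"]}


def _load(cache_file):
    with open(cache_file, "rb") as f:
        return pickle.load(f)


def load_or_build_cache(cache_dir):
    """Return (index, built_here) using double-checked locking.

    The file names below are hypothetical, not ScanCode's real cache layout.
    """
    cache_file = os.path.join(cache_dir, "index.pickle")
    lock_file = os.path.join(cache_dir, "index.lock")

    # First check, without the lock: cheap fast path once the cache exists.
    if os.path.exists(cache_file):
        return _load(cache_file), False

    with open(lock_file, "w") as lock:
        # Block until this process holds an exclusive inter-process lock.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # Second check, under the lock: another process may have built
            # the cache while we were waiting. This is the step at issue.
            if os.path.exists(cache_file):
                return _load(cache_file), False

            index = build_index()
            # Write to a temp file, then rename atomically, so no reader
            # ever sees a partially written cache file.
            fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
            with os.fdopen(fd, "wb") as tmp:
                pickle.dump(index, tmp)
            os.replace(tmp_path, cache_file)
            return index, True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With this shape, when many processes start at once, only the first one to acquire the lock builds the cache; the others either take the fast path or find the finished cache after waiting on the lock.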
How To Reproduce
This was noticed when stress-testing a local test:
We ran this on 100 processes in parallel.
System configuration