Skip to content

Fix performance bug with large number of unnamed parameters #2050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tttomat19
Copy link

Preamble:
Consider the following sql script: select some from table where id in (:args)
Where args is a 10k items list
It does not look good, but people do it anyways, or maybe just when the system scales up, and it slowly grows from 100 items to 10k items

The problem
In such conditions QueryMapper.getUniqueName needs to figure out the names for each of the elements in the in (:a1,:a2,:a3, :a10000) list, it does so by generating a name by counter and tries to lookup the name in the parameters hashmap
But if the argument list is so large it traverses the map over and over again
Ultimately it results in O(N^2) complexity (where operation behind is a hashmap lookup)
The symptom is a very high CPU consumption, and it is really slow

The fix
The suggested fix is just to skip most of the counter-hashmap traversal, just make the counter go forward
We still need the loop to guarantee generated names are unique

More on the problem
While 10k case is just very obvious to see and detect but also
My guess that likely a lot of code out there, having 50+ of arguments invisibly suffer and the fix should probably help them too

I'm sorry about no tests, the problem here that the nature of the bug is whether it works slow, I'm not sure if there are any load tests or something for this repo

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label May 1, 2025
tttomat19 added 4 commits May 1, 2025 03:46
On some occasions where col in (:args) contain a really lot args, 10k+ for instance, this commit fixes a performance (high CPU) bug by NOT traversing the whole map in basically O(n^2) manner

Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
On some occasions where col in (:args) contain a really lot args, 10k+ for instance, this commit fixes a performance (high CPU) bug by NOT traversing the whole map in basically O(n^2) manner

Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
On some occasions where col in (:args) contain a really lot args, 10k+ for instance, this commit fixes a performance (high CPU) bug by NOT traversing the whole map in basically O(n^2) manner

Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
On some occasions where col in (:args) contain a really lot args, 10k+ for instance, this commit fixes a performance (high CPU) bug by NOT traversing the whole map in basically O(n^2) manner

Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
@tttomat19 tttomat19 force-pushed the feature/fix-cpu-in-large-list branch from ff376cc to 7c609c4 Compare May 1, 2025 00:46
On some occasions where col in (:args) contain a really lot args, 10k+ for instance, this commit fixes a performance (high CPU) bug by NOT traversing the whole map in basically O(n^2) manner

Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
@mp911de mp911de added type: task A general task and removed status: waiting-for-triage An issue we've not yet triaged labels May 5, 2025
@schauder
Copy link
Contributor

schauder commented May 6, 2025

I don't think your proposed change fixes the problem.

The problem is if you have a large array of arguments, for the n-th argument, you are still looping over n-1 potential names, before finding a unique one.

In order to avoid that you'd need to preserve the state of name generation, between invocation.

If you can provide an actual fix, please amend the PR.

@schauder schauder added the status: waiting-for-feedback We need additional information before we can continue label May 6, 2025
@tttomat19
Copy link
Author

tttomat19 commented May 6, 2025

Hi, thank you for reply!

The state of name generation is already somewhat preserved in params hashmap in the outher method calls
After each time getUniqueName is called, the values (unmodifiable view of the params hashmap) size gets increased (it gets new wrapped view each time it enters getUniqueName)

I can see some possible case when the suggested approach will not work, but it requires actively trying to attack this code by somehow injecting sql param names with numbers before.
In most other cases this should make better, I believe
Would it be sufficient if I could demonstrate it works in common case? (I mean not the text below, but maybe like some actual tracing)

Consider the example: select * from t where id in (:args) where args is 5 in size
Current version tries names (0-indexed correspondence with args array)
0 - test for presence - args - ok
1 - test for presence - args1 - ok
2 - test for presence - args1 - nok - args2 - ok
3 - test for presence - args1 - nok - args2 - nok - args3 - ok
4 - test for presence - args1 - nok - args2 - nok - args3 - nok - args4 - ok
so it's kinda triangular/half-square shape, thus N^2

New version works the other way:
0 - test for presence - get size = 0 - args0 - ok
1 - test for presence - get size=1 - args1 - ok
2 - test for presence - get size=2 - args2 - ok
3 - test for presence - get size=3 - args3 - ok
4 - test for presence - get size=4 - args4 - ok

also it seems version should work for multiple arg arrays ok

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels May 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: feedback-provided Feedback has been provided type: task A general task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants