Fix performance bug with large number of unnamed parameters #2050
In some cases where col in (:args) contains a very large number of arguments (10k+, for instance), this commit fixes a performance (high CPU) bug by not traversing the whole map in an essentially O(n^2) manner. Signed-off-by: Mikhail Fedorov <mfedorov761@gmail.com>
I don't think your proposed change fixes the problem. The problem is that with a large array of arguments, for the n-th argument you still loop over n-1 candidate names before finding a unique one. To avoid that, you'd need to preserve the state of name generation between invocations. If you can provide an actual fix, please amend the PR.
Hi, thank you for the reply! The state of name generation is already somewhat preserved in the params hashmap in the outer method calls. I can see a possible case where the suggested approach will not work, but it requires actively attacking this code by somehow injecting SQL parameter names that already end in numbers. Consider the example: The new version works the other way: Also, it seems the new version should work fine for multiple argument arrays.
Preamble:
Consider the following SQL statement:
select some from table where id in (:args)
where args is a list of 10k items.
It does not look good, but people do it anyway, or the list slowly grows from 100 items to 10k items as the system scales up.
The problem
Under these conditions, QueryMapper.getUniqueName needs to figure out a name for each of the elements in the
in (:a1, :a2, :a3, ..., :a10000)
list. It does so by generating a name from a counter and looking that name up in the parameters hashmap. But when the argument list is this large, it walks through the same already-taken names over and over again.
Ultimately this results in O(N^2) complexity (where the underlying operation is a hashmap lookup).
The symptom is very high CPU consumption, and the query is really slow.
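To make the quadratic cost concrete, here is a minimal sketch of the pattern described above. It is not the actual QueryMapper code: the class `NaiveNaming`, its methods, and the plain `Map` standing in for the parameter source are all hypothetical. Each call restarts the counter at zero, so the i-th call re-checks every name handed out before it, and n calls cost n(n+1)/2 lookups in total.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the naming logic described above; not the real QueryMapper.
final class NaiveNaming {

    // Counts hashmap lookups so the growth can be observed without timing anything.
    static long lookups = 0;

    // Restarts the counter at zero on every call (the assumed problematic behaviour).
    static String uniqueName(Map<String, Object> params, String base) {
        int counter = 0;
        String candidate = base;
        lookups++;
        while (params.containsKey(candidate)) {
            candidate = base + counter++;
            lookups++;
        }
        return candidate;
    }

    // Registers n list elements one by one and returns the total number of lookups.
    static long countLookups(int n) {
        lookups = 0;
        Map<String, Object> params = new HashMap<>();
        for (int i = 0; i < n; i++) {
            params.put(uniqueName(params, "args"), i);
        }
        return lookups;
    }
}
```

With this sketch, `NaiveNaming.countLookups(1000)` performs 500500 lookups, and 10k elements would need roughly 50 million.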
The fix
The suggested fix is to skip most of the counter-based hashmap traversal and simply keep the counter moving forward.
We still need the loop to guarantee that the generated names are unique.
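One possible reading of this idea, sketched under assumptions: the counter is seeded from the current size of the parameter map instead of zero, so it normally lands on a free name at the first try, and the loop remains only as a safety net for collisions. Whether the PR seeds the counter exactly this way is an assumption; `ForwardCounterNaming` and its methods are hypothetical names, and a plain `Map` again stands in for the parameter source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "counter only moves forward" idea; not the actual patch.
final class ForwardCounterNaming {

    // Counts hashmap lookups for comparison with a restart-from-zero approach.
    static long lookups = 0;

    // Assumption: start the counter at the number of parameters already registered.
    static String uniqueName(Map<String, Object> params, String base) {
        int counter = params.size();
        String candidate = base + counter;
        lookups++;
        // The loop stays so that a pre-existing name with the same suffix is skipped.
        while (params.containsKey(candidate)) {
            candidate = base + (++counter);
            lookups++;
        }
        return candidate;
    }

    // Registers n list elements one by one and returns the total number of lookups.
    static long countLookups(int n) {
        lookups = 0;
        Map<String, Object> params = new HashMap<>();
        for (int i = 0; i < n; i++) {
            params.put(uniqueName(params, "args"), i);
        }
        return lookups;
    }
}
```

In this sketch, n elements cost n lookups when nothing collides. If the map already holds a name such as `args1` while its size is 1, the loop steps forward and returns `args2`, which is the kind of case the remaining loop is there for.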
More on the problem
The 10k case is just the most obvious one to see and detect.
My guess is that a lot of code out there with 50+ arguments suffers from this invisibly, and the fix should probably help there too.
I'm sorry about the lack of tests. The problem is that the bug only shows up as slowness, and I'm not sure whether this repo has any load tests or similar.