Skip to content

Optimize String length computation. #1685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

vbabanin
Copy link
Member

JAVA-5842
Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just a few minor changes requests.

/**
* Detects the position of the first NULL (0x00) byte in a 64-bit word using SWAR technique.
* <a href="https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord">
*/
private int computeCStringLength(final int prevPos) {
ensureOpen();
Copy link
Collaborator

@jyemin jyemin Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this check, as it's a private method and duplicates a check from the public method that calls it?


result:
00000000 00000000 10000000 00000000 00000000 00000000 00000000 00000000
^^^^^^^^
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
^^^^^^^^
^^^^^^^^

result:
00000000 00000000 10000000 00000000 00000000 00000000 00000000 00000000
^^^^^^^^
The high bit is set only at the 0x00 byte position.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The high bit is set only at the 0x00 byte position.
The high bit is set only at the 0x00 byte position.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you meant to center this...

@@ -182,11 +182,58 @@ public void skipCString() {
buffer.position(pos + length);
}

/**
* Detects the position of the first NULL (0x00) byte in a 64-bit word using SWAR technique.
* <a href="https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have reason to believe that this link will still be around in, say, 5 years? If not, maybe just go with Wikipedia.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems Wiki URL like https://en.wikipedia.org/wiki/SWAR is better

pos += 8;
}

// Process remaining bytes one-by-one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems one-by-one is more commonly used as one by one

long word = buffer.getLong(pos);
/*
Subtract 0x0101010101010101L to cause a borrow on 0x00 bytes.
if original byte is 00000000 -> 00000000 - 00000001 = 11111111 (borrow causes high bit set to 1).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional. A little bit less clear than

if original byte is 00000000, then 00000000 - 00000001 = 11111111 (borrow causes high bit set to 1).

result:
00000000 00000000 10000000 00000000 00000000 00000000 00000000 00000000
^^^^^^^^
The high bit is set only at the 0x00 byte position.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional. the above comment seems duplicated and could be deleted

00000000 00000000 11111111 00000000 00000001 00000001 00000000 00000111

ANDing mask with 0x8080808080808080 isolates the high bit (0x80) in positions where
the original byte was 0x00, setting the high bit to 1 only at the 0x00 byte position.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional. maybe the following verbiage is little bit clearer:

, by setting the high bit to 1 only at the 0x00 byte position.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants