Skip to content

RUST-871 Support direct serialization to BSON bytes #279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 28, 2021

Conversation

patrickfreed
Copy link
Contributor

@patrickfreed patrickfreed commented Jul 22, 2021

RUST-871

This PR introduces the to_writer and to_vec methods, which can serialize a generic T directly to BSON bytes without going through Document. This should enable substantial performance improvements in the driver, as well as be generally useful in the BSON library.

As part of testing this feature, I enabled decimal128 serde testing on evergreen (RUST-446). decimal128 tests were originally added to evergreen in #279.

@@ -92,16 +92,6 @@ functions:
${PREPARE_SHELL}
.evergreen/run-tests-u2i.sh

"run serde tests":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than the serde tests being their own task in evergreen, I updated each of the other tasks to run it too with their associated feature flags on.

@@ -83,6 +89,24 @@ fn write_f128<W: Write + ?Sized>(writer: &mut W, val: Decimal128) -> Result<()>
writer.write_all(&raw).map_err(From::from)
}

#[inline]
fn write_binary<W: Write>(mut writer: W, bytes: &[u8], subtype: BinarySubtype) -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just copied from below

@@ -0,0 +1,458 @@
mod document_serializer;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend starting the review in this file and then branching out to the others as you encounter them here.

}

fn serialize_struct(self, name: &'static str, _len: usize) -> Result<Self::SerializeStruct> {
let value_type = match name {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSON types which don't fit into serde's data model (e.g. ObjectId) are serialized as structs matching the extended JSON format for that type. The ValueSerializer is then in charge of interpreting the extended JSON format and producing actual BSON from it. The regular serializer works this way too, but it goes from Bson to Bson rather than serde data model -> BSON.

The "name" of the struct is used to incidate which BSON type, and the values are basically just the BSON type's extended JSON wrapper name. The only divergence is we also use $codeWithScope to differentiate from just a regular $code with no scope.

If the name doesn't match any of the special strings, the struct will just be serialized as a document.

@@ -698,6 +698,20 @@ impl Bson {
}
}

["$numberDecimalBytes"] => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintuitively, this function is used in the existing serializer to convert between the intermediate serialized form (i.e. and "extended document") and the final serialized form. Now that decimal128 can serialize this way, this needed to be updated to interpret it.


/// A serializer used specifically for serializing the serde-data-model form of a BSON type (e.g.
/// `Binary`) to raw bytes.
pub(crate) struct ValueSerializer<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is used to interpret any extended JSON formatted types in serde's data model and serialize them to actual BSON. It keeps track of where it's at via a state machine.

type Ok = ();
type Error = Error;

fn serialize_field<T: ?Sized>(&mut self, key: &'static str, value: &T) -> Result<()>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method handles all the state transitions.

@@ -551,8 +607,12 @@ impl Serialize for Decimal128 {
where
S: ser::Serializer,
{
let value = Bson::Decimal128(self.clone());
value.serialize(serializer)
let mut state = serializer.serialize_struct("$numberDecimal", 1)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the decimal128 flag was enabled, this used to produce actual extended JSON format (i.e. { "$numberDecimal": <decimal as a string> }), but per the bug we discussed in triage, the current implementation's to_string is lossy. For that reason, it seems safer to produce this unique format that does round trip properly until we have a complete decimal128 implementation.

For users relying on the output of this, their code will break, but that's probably better than them silently losing data.
Also note that we classify the decimal128 feature flag as "experimental" and subject to breaking changes, so this is within our contract. Code that omits the flag will not break.

@@ -61,63 +61,95 @@ fn run_test(test: TestFile) {

let canonical_bson = hex::decode(&valid.canonical_bson).expect(&description);

let bson_to_native_cb =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and updated the naming scheme of the different things here. Let me know if this makes sense. I know the old one got the point of being unreadable to me at least, so hopefully this is better.

@patrickfreed patrickfreed marked this pull request as ready for review July 22, 2021 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants