Serialize/Deserialize implementations are very slow to compile #313
Thanks for reporting this! That is definitely substantially longer than expected. Looks like practically all of that time is spent in LLVM:

$ cargo rustc --release -- -Z time-passes
time: 1.637; rss: 275MB translation
time: 0.000; rss: 275MB assert dep graph
time: 0.000; rss: 275MB serialize dep graph
time: 1.474; rss: 276MB llvm function passes [0]
time: 52.386; rss: 317MB llvm module passes [0]
time: 8.498; rss: 318MB codegen passes [0]
time: 0.001; rss: 318MB codegen passes [0]
time: 62.364; rss: 318MB LLVM passes
time: 0.000; rss: 318MB serialize work products
time: 0.247; rss: 170MB running linker
time: 0.248; rss: 170MB linking
    Finished release [optimized] target(s) in 64.62 secs

I will try to narrow down whether a specific pattern on our end is triggering slow behavior.
On my computer the compile time behaves extremely predictably. It is uniformly 2.6 seconds per Deserialize impl that is instantiated. I was able to predict compile time to +/-0.1 seconds as I added and removed types in the AST. Nothing else matters: number of variants, recursive ADTs, size of types, etc. I think this is the same root cause as serde-rs/serde#286. There is no inherent reason it needs to be slow; we just need to work through and eliminate the unnecessary instantiations listed in serde-rs/serde#286 (comment).
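To make the scaling concrete, here is a tiny standalone illustration (unrelated to serde's internals; the function name is made up) of why compile time tracks the number of instantiations: each distinct type parameter produces its own machine-code copy of a generic function, so a generic function used with N types is compiled and optimized N times.

```rust
// Each distinct T produces a separate monomorphized copy of this function;
// rustc and LLVM process every copy independently, so total work scales
// with the number of instantiations, not with the source size.
fn instantiated_for<T>() -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    // Three separate instantiations of the same one-line source function:
    assert_eq!(instantiated_for::<u8>(), "u8");
    println!("{}", instantiated_for::<String>());
    println!("{}", instantiated_for::<Vec<u32>>());
}
```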
Here are the size (first column) and count (second column) of instantiations resulting from the following line:

    let _ = serde_json::from_str::<Pandoc>("");

Release mode
Debug mode
@Rufflewind would you be interested in helping us eliminate these? The number parsing code (parse_integer, parse_decimal, etc.) is probably the easiest place to get started and would be super impactful. Those functions are all generic on the Visitor but do nothing with the Visitor until the number is fully parsed, so all of that should be factored out into a helper that can be instantiated once.
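The refactor being suggested can be sketched as follows (a minimal standalone illustration, not serde_json's actual code; the trait and function names here are simplified stand-ins): all Visitor-independent work moves into a non-generic helper that the compiler instantiates exactly once, leaving only a thin generic shim per Visitor.

```rust
// Simplified stand-in for serde's Visitor trait.
trait Visitor {
    type Value;
    fn visit_u64(self, n: u64) -> Self::Value;
}

// Non-generic helper: instantiated once, no matter how many Visitor types
// exist in the program. This is where the bulk of the parsing logic lives.
fn parse_integer_parts(input: &str) -> Option<u64> {
    let mut n: u64 = 0;
    for b in input.bytes() {
        if !b.is_ascii_digit() {
            return None;
        }
        n = n.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    if input.is_empty() { None } else { Some(n) }
}

// Generic shim: only this small wrapper is duplicated per Visitor type.
fn deserialize_u64<V: Visitor>(input: &str, visitor: V) -> Option<V::Value> {
    parse_integer_parts(input).map(|n| visitor.visit_u64(n))
}

struct U64Visitor;
impl Visitor for U64Visitor {
    type Value = u64;
    fn visit_u64(self, n: u64) -> u64 {
        n
    }
}

fn main() {
    assert_eq!(deserialize_u64("12345", U64Visitor), Some(12345));
    assert_eq!(deserialize_u64("12a", U64Visitor), None);
}
```

With this shape, adding another Deserialize impl only instantiates the shim again, not the parsing loop.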
Here’s an attempt: #314. I started off with the easiest cases, where the visitor method being called is unconditionally known. Here’s the call graph of the original code for reference. (Hopefully I didn’t miss anything?)
The next avenue I would look at would be reducing instantiation of Visitor methods. Currently in serde_json, if you have a Deserialize impl that calls deserialize_struct, it gets forwarded to deserialize_any, which means it instantiates every possible method on the Visitor. We know that structs can only be deserialized successfully from map and seq, so only those should need to be instantiated. The same applies to almost all of the other Deserializer methods. We need to implement them explicitly in serde_json rather than forwarding, and have them instantiate only the relevant Visitor methods. It doesn't show up in the symbol profile above because each individual method of each individual visitor is instantiated only once, but there are so many of them that I expect it adds up.
Assuming I understood correctly, I tried to add a specialization called parse_aggregate:

--- src/de.rs
+++ src/de.rs
@@ -174,6 +174,32 @@ impl<'de, R: Read<'de>> Deserializer<R> {
}
}
+ fn parse_aggregate<V>(&mut self, visitor: V) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ let value = self.parse_aggregate_inner(visitor);
+ match value {
+ Ok(value) => Ok(value),
+ Err(err) => Err(self.fix_error(err)),
+ }
+ }
+
+ fn parse_aggregate_inner<V>(&mut self, visitor: V) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ let agg = match try!(self.parse_value_begin()) {
+ Value::Aggregate(agg) => agg,
+ _ => return Err(self.error(ErrorCode::ExpectedSomeValue)),
+ };
+ let ret = match agg {
+ Aggregate::Seq => visitor.visit_seq(SeqAccess::new(self)),
+ Aggregate::Map => visitor.visit_map(MapAccess::new(self)),
+ };
+ self.parse_value_end(agg, ret)
+ }
+
fn parse_value<V>(&mut self, visitor: V) -> Result<V::Value>
where
V: de::Visitor<'de>,
@@ -780,9 +806,22 @@ impl<'de, 'a, R: Read<'de>> de::Deserializer<'de> for &'a mut Deserializer<R> {
self.deserialize_bytes(visitor)
}
+ #[inline]
+ fn deserialize_struct<V>(
+ self,
+ name: &'static str,
+ fields: &'static [&'static str],
+ visitor: V
+ ) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ self.parse_aggregate(visitor)
+ }
+
forward_to_deserialize_any! {
bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string unit
- unit_struct seq tuple tuple_struct map struct identifier ignored_any
+ unit_struct seq tuple tuple_struct map identifier ignored_any
}
}
Any update on what’s needed to solve this? Willing to help however needed, as this currently affects my workflow.
Practically all the time is spent in LLVM passes, so we just need to find a way to make less work for LLVM. I made a little tool. The first column is the total number of lines of LLVM IR; the second column is the number of copies of the function that were instantiated with different generic parameters. @Coding-Doctors maybe try this on your codebase and tackle whatever is resulting in the most IR.
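Absent that tool, a rough way to get similar per-function IR line counts is to emit LLVM IR and tally it yourself. This is a sketch (the awk tally and the demo.ll file are illustrative); for a real crate you would run `cargo rustc --release -- --emit=llvm-ir` and point the awk at the .ll files under target/release/deps/.

```shell
# Demonstrated on a tiny synthetic .ll file; the same awk works on the
# .ll output of `cargo rustc --release -- --emit=llvm-ir`.
cat > demo.ll <<'EOF'
define i32 @big(i32 %x) {
  %a = add i32 %x, 1
  %b = add i32 %a, 2
  ret i32 %b
}
define i32 @small() {
  ret i32 7
}
EOF
# Count lines per `define ... }` block, largest functions first.
awk '/^define /{hdr=$0; n=0} hdr{n++} /^}/{if (hdr) print n, hdr; hdr=""}' demo.ll | sort -rn
```

On real output, mangled symbol names make it easy to see which generic function the copies came from.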
Have you checked whether building with MIR optimizations makes a difference? They run on generic code, instead of on the monomorphized versions.
Good idea! I tried on rustc 1.24.0-nightly (0a2e9ade8 2017-11-29):
with MIR opt level 3 and 4, with and without |
@oli-obk suggested in IRC looking at the generated MIR and filing suggestions for MIR optimizations that could have an impact.
(then look in |
I found a 20% improvement in #388.

#389 was another 30% improvement.
I tried factoring out the non-visitor parts of lines 1480 to 1511 and lines 1430 to 1459 in 92ddbdf.
Thanks @Marwes. I tried those as well and saw the same thing. It would be worth looking into why that does not result in an improvement, as it is a fairly large chunk of logic that is instantiated quite a lot. Understanding why the current implementations compile fast, or why the refactored implementations compile slow, would help prioritize what to look at next.
I’m currently trying out the pandoc_ast crate, which contains a handful of recursive algebraic data types with the Serialize and Deserialize traits derived. When I tried the minimal example below with optimizations on (--release), it took a whole minute to compile!

I cloned the pandoc_ast repo and messed around a bit: it seems that the library itself takes practically no time to compile, but if I add a concrete function that instantiates serde_json::from_str::<Pandoc>, the compile time spikes to 60 seconds or so. I'm not sure if this is caused by serde, serde_derive, or serde_json.

For the smaller nonrecursive enums (e.g. pandoc_ast::MathType) with only a few variants, it takes about 2-3 seconds to instantiate. The larger ones (e.g. pandoc_ast::Block, which contains many variants, as well as pandoc_ast::Inline, which itself contains even more variants) take about 50 sec.

Environment: cargo (build|test) --release