Serialize/Deserialize implementations are very slow to compile #313
Thanks for reporting this! That is definitely substantially longer than expected. Looks like practically all of that time is spent in LLVM:

$ cargo rustc --release -- -Z time-passes
time: 1.637; rss: 275MB translation
time: 0.000; rss: 275MB assert dep graph
time: 0.000; rss: 275MB serialize dep graph
time: 1.474; rss: 276MB llvm function passes [0]
time: 52.386; rss: 317MB llvm module passes [0]
time: 8.498; rss: 318MB codegen passes [0]
time: 0.001; rss: 318MB codegen passes [0]
time: 62.364; rss: 318MB LLVM passes
time: 0.000; rss: 318MB serialize work products
time: 0.247; rss: 170MB running linker
time: 0.248; rss: 170MB linking
    Finished release [optimized] target(s) in 64.62 secs

I will try to narrow down whether a specific pattern on our end is triggering slow behavior.
On my computer the compile time behaves extremely predictably. It is uniformly 2.6 seconds per Deserialize impl that is instantiated. I was able to predict compile time to +/-0.1 seconds as I added and removed types in the AST. Nothing else matters: number of variants, recursive ADTs, size of types, etc. I think this is the same root cause as serde-rs/serde#286. There is no inherent reason it needs to be slow; we just need to work through and eliminate the unnecessary instantiations listed in serde-rs/serde#286 (comment).
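To make the scaling concrete, here is a tiny standalone illustration (unrelated to serde's internals; the function name is made up) of why compile time tracks the number of instantiations: each distinct type parameter produces its own machine-code copy of a generic function, so a generic function used with N types is compiled and optimized N times.

```rust
// Each distinct T produces a separate monomorphized copy of this function;
// rustc and LLVM process every copy independently, so total work scales
// with the number of instantiations, not with the source size.
fn instantiated_for<T>() -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    // Three separate instantiations of the same one-line source function:
    assert_eq!(instantiated_for::<u8>(), "u8");
    println!("{}", instantiated_for::<String>());
    println!("{}", instantiated_for::<Vec<u32>>());
}
```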
Here are the size (first column) and count (second column) of instantiations resulting from the following line:

    let _ = serde_json::from_str::<Pandoc>("");

Release mode
Debug mode
@Rufflewind would you be interested in helping us eliminate these? The number parsing code (parse_integer, parse_decimal, etc.) is probably the easiest place to get started and would be super impactful. Those functions are all generic on the Visitor but do nothing with the Visitor until the number is fully parsed, so all of that should be factored out into a helper that can be instantiated once.
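The refactor being suggested can be sketched as follows (a minimal standalone illustration, not serde_json's actual code; the trait and function names here are simplified stand-ins): all Visitor-independent work moves into a non-generic helper that the compiler instantiates exactly once, leaving only a thin generic shim per Visitor.

```rust
// Simplified stand-in for serde's Visitor trait.
trait Visitor {
    type Value;
    fn visit_u64(self, n: u64) -> Self::Value;
}

// Non-generic helper: instantiated once, no matter how many Visitor types
// exist in the program. This is where the bulk of the parsing logic lives.
fn parse_integer_parts(input: &str) -> Option<u64> {
    let mut n: u64 = 0;
    for b in input.bytes() {
        if !b.is_ascii_digit() {
            return None;
        }
        n = n.checked_mul(10)?.checked_add(u64::from(b - b'0'))?;
    }
    if input.is_empty() { None } else { Some(n) }
}

// Generic shim: only this small wrapper is duplicated per Visitor type.
fn deserialize_u64<V: Visitor>(input: &str, visitor: V) -> Option<V::Value> {
    parse_integer_parts(input).map(|n| visitor.visit_u64(n))
}

struct U64Visitor;
impl Visitor for U64Visitor {
    type Value = u64;
    fn visit_u64(self, n: u64) -> u64 {
        n
    }
}

fn main() {
    assert_eq!(deserialize_u64("12345", U64Visitor), Some(12345));
    assert_eq!(deserialize_u64("12a", U64Visitor), None);
}
```

With this shape, adding another Deserialize impl only instantiates the shim again, not the parsing loop.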
Here’s an attempt: #314. I started off with the easiest cases, where the visitor method being called is unconditionally known. Here’s the call graph of the original code for reference. (Hopefully I didn’t miss anything?)
The next avenue I would look at would be reducing instantiation of Visitor methods. Currently in serde_json, if you have a Deserialize impl that calls deserialize_struct, it gets forwarded to deserialize_any, which means it instantiates every possible method on the Visitor. We know that structs can only be deserialized successfully from map and seq, so only those should need to be instantiated. The same applies to almost all of the other Deserializer methods. We need to implement them explicitly in serde_json rather than forwarding, and have them instantiate only the relevant Visitor methods. It doesn't show up in the symbol profile above because each individual method of each individual visitor is instantiated only once, but there are so many of them that I expect it adds up.
Assuming I understood correctly, I tried to add a specialization called parse_aggregate:

--- src/de.rs
+++ src/de.rs
@@ -174,6 +174,32 @@ impl<'de, R: Read<'de>> Deserializer<R> {
}
}
+ fn parse_aggregate<V>(&mut self, visitor: V) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ let value = self.parse_aggregate_inner(visitor);
+ match value {
+ Ok(value) => Ok(value),
+ Err(err) => Err(self.fix_error(err)),
+ }
+ }
+
+ fn parse_aggregate_inner<V>(&mut self, visitor: V) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ let agg = match try!(self.parse_value_begin()) {
+ Value::Aggregate(agg) => agg,
+ _ => return Err(self.error(ErrorCode::ExpectedSomeValue)),
+ };
+ let ret = match agg {
+ Aggregate::Seq => visitor.visit_seq(SeqAccess::new(self)),
+ Aggregate::Map => visitor.visit_map(MapAccess::new(self)),
+ };
+ self.parse_value_end(agg, ret)
+ }
+
fn parse_value<V>(&mut self, visitor: V) -> Result<V::Value>
where
V: de::Visitor<'de>,
@@ -780,9 +806,22 @@ impl<'de, 'a, R: Read<'de>> de::Deserializer<'de> for &'a mut Deserializer<R> {
self.deserialize_bytes(visitor)
}
+ #[inline]
+ fn deserialize_struct<V>(
+ self,
+ name: &'static str,
+ fields: &'static [&'static str],
+ visitor: V
+ ) -> Result<V::Value>
+ where
+ V: de::Visitor<'de>,
+ {
+ self.parse_aggregate(visitor)
+ }
+
forward_to_deserialize_any! {
bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string unit
- unit_struct seq tuple tuple_struct map struct identifier ignored_any
+ unit_struct seq tuple tuple_struct map identifier ignored_any
}
}
Any update on what’s needed to solve this? Willing to help however needed, as this currently affects my workflow.
Practically all the time is spent in LLVM passes, so we just need to find a way to make less work for LLVM. I made a little tool. The first column is the total number of lines of LLVM IR; the second column is the number of copies of the function that were instantiated with different generic parameters. @Coding-Doctors maybe try this on your codebase and tackle whatever is resulting in the most IR.
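Absent that tool, a rough way to get similar per-function IR line counts is to emit LLVM IR and tally it yourself. This is a sketch (the awk tally and the demo.ll file are illustrative); for a real crate you would run `cargo rustc --release -- --emit=llvm-ir` and point the awk at the .ll files under target/release/deps/.

```shell
# Demonstrated on a tiny synthetic .ll file; the same awk works on the
# .ll output of `cargo rustc --release -- --emit=llvm-ir`.
cat > demo.ll <<'EOF'
define i32 @big(i32 %x) {
  %a = add i32 %x, 1
  %b = add i32 %a, 2
  ret i32 %b
}
define i32 @small() {
  ret i32 7
}
EOF
# Count lines per `define ... }` block, largest functions first.
awk '/^define /{hdr=$0; n=0} hdr{n++} /^}/{if (hdr) print n, hdr; hdr=""}' demo.ll | sort -rn
```

On real output, mangled symbol names make it easy to see which generic function the copies came from.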
Have you checked whether building with MIR optimizations makes a difference? They run on generic code, instead of on the monomorphized versions.
Good idea! I tried on rustc 1.24.0-nightly (0a2e9ade8 2017-11-29):
with MIR opt level 3 and 4, with and without |
@oli-obk suggested in IRC looking at the generated MIR and filing suggestions for MIR optimizations that could have an impact.
(then look in |
I found a 20% improvement in #388.

#389 was another 30% improvement.
I tried factoring out the non-visitor parts of lines 1480 to 1511 and lines 1430 to 1459 in 92ddbdf.
Thanks @Marwes. I tried those as well and saw the same thing. It would be worth looking into why that does not result in an improvement, as it is a fairly large chunk of logic that is instantiated quite a lot. Understanding why the current implementations compile fast, or why the refactored implementations compile slow, would help prioritize what to look at next.
I’m currently trying out the pandoc_ast crate, which contains a handful of recursive algebraic data types with the Serialize and Deserialize traits derived. When I tried the minimal example below with optimizations on (--release), it took a whole minute to compile!

I cloned the pandoc_ast repo and messed around a bit: it seems that the library itself takes practically no time to compile, but if I add a concrete function that instantiates serde_json::from_str::<Pandoc>, the compile time spikes to 60 seconds or so. I'm not sure if this is caused by serde, serde_derive, or serde_json.

For the smaller nonrecursive enums (e.g. pandoc_ast::MathType) with only a few variants, it takes about 2-3 seconds to instantiate. The larger ones (e.g. pandoc_ast::Block, which contains many variants, as well as pandoc_ast::Inline, which itself contains even more variants) take about 50 sec.

Environment: cargo (build|test) --release