[Serialization] Delta-encode consecutive SourceLocations in TypeLoc
Much of the size of PCH/PCM files comes from stored SourceLocations. These are encoded using (almost) their raw value, VBR-encoded. Absolute SourceLocations can be relatively large numbers, so this commonly takes 20-30 bits per location.

We can reduce this by exploiting redundancy: many SourceLocations stored near each other differ only slightly, so they can be delta-encoded. Random-access loading of AST nodes constrains how long these sequences can be, but we can at least do it within a node that is always deserialized as an atomic unit.

This patch implements the technique for TypeLoc, as that is a relatively small change that shows most of the API. It saves ~3.5% of PCH size. I have local changes applying this technique further that save another 3%, and I think it's possible to get to 10% total.

Differential Revision: https://reviews.llvm.org/D125403
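To illustrate the idea, here is a minimal sketch of delta-encoding a run of locations. It is not the code in this patch: the names `zigzagEncode`, `zigzagDecode`, `vbrBits`, `encodeSequence` and `decodeSequence` are hypothetical, locations are modelled as plain 32-bit integers, and the 6-bit VBR chunk width is an assumption chosen only to show the size effect. The first location is stored as-is, and each later one is stored as a signed difference from its predecessor, mapped to an unsigned value so that small negative deltas stay small.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map a signed delta to an unsigned value: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
inline uint64_t zigzagEncode(int64_t v) {
  return (static_cast<uint64_t>(v) << 1) ^
         (v < 0 ? ~uint64_t{0} : uint64_t{0});
}

// Inverse of zigzagEncode.
inline int64_t zigzagDecode(uint64_t u) {
  return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
}

// Bits used by a VBR encoding with the given chunk width, where each chunk
// carries (width - 1) payload bits plus one continuation bit.
inline unsigned vbrBits(uint64_t value, unsigned width = 6) {
  assert(width >= 2 && "need at least one payload bit per chunk");
  unsigned chunks = 1;
  for (value >>= (width - 1); value != 0; value >>= (width - 1))
    ++chunks;
  return chunks * width;
}

// Hypothetical encoder: first location raw, the rest as zigzag deltas
// against the previous location in the sequence.
inline std::vector<uint64_t> encodeSequence(const std::vector<uint32_t> &locs) {
  std::vector<uint64_t> out;
  out.reserve(locs.size());
  uint32_t prev = 0;
  bool first = true;
  for (uint32_t loc : locs) {
    if (first) {
      out.push_back(loc);
      first = false;
    } else {
      int64_t delta = static_cast<int64_t>(loc) - static_cast<int64_t>(prev);
      out.push_back(zigzagEncode(delta));
    }
    prev = loc;
  }
  return out;
}

// Hypothetical decoder: reverses encodeSequence.
inline std::vector<uint32_t> decodeSequence(const std::vector<uint64_t> &enc) {
  std::vector<uint32_t> out;
  out.reserve(enc.size());
  uint32_t prev = 0;
  bool first = true;
  for (uint64_t word : enc) {
    uint32_t loc = first ? static_cast<uint32_t>(word)
                         : static_cast<uint32_t>(static_cast<int64_t>(prev) +
                                                 zigzagDecode(word));
    first = false;
    out.push_back(loc);
    prev = loc;
  }
  return out;
}
```

In this sketch only the first location of a sequence pays the full cost of a large absolute value. Each following location costs a single small chunk whenever it lies close to the previous one, which is why the sequence has to be read and written as one unit.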