Use boxed slices instead of Vec where appropriate
#5,879 opened on Feb 24, 2023
Description
Feature
When we store a Vec<T> inside another data structure, but never change its length, we can instead store a Box<[T]>.
Benefit
The size of Vec in memory is three machine words, but a boxed slice is only two machine words long. This doesn't really matter if the array is only stored on the stack. But if it's part of another data structure, switching to boxed slices can reduce the amount of heap memory we allocate. That may improve cache locality and have other performance benefits.
Implementation
I did a quick experiment along these lines several months ago in commit 68d2ac2a1471a096d021be7f2cb2734843648ab8. That commit has merge conflicts with main now, but it illustrates the basics of how to make this change for one type, Vec<ArgPair>. I suspect there are many other uses of Vec in Cranelift that could have equally simple changes. There may be some in Wasmtime as well, though I'm personally focused on cases in Cranelift.
Pull requests addressing this issue should ideally include some measurements showing what effect the change has, because it's theoretically possible that this could make performance worse. I think it's more likely that there won't be much visible effect at all, but these changes shouldn't make the implementation any more complicated and could potentially improve performance. We won't know unless we try it and measure the results.
There are two kinds of measurements that would be great to do, with Wasmtime built in --release mode.
-
Run
valgrind --tool=dhat target/release/wasmtime compileon some WebAssembly module, both with the version ofmainyou started from and with your patched version. I like to use our pulldown-cmark benchmark program for this because it's not too small but not too big, so even with the overhead of DHAT's instrumentation it doesn't take too long. -
Run our Sightglass benchmark suite to compare the speed of compiling some benchmark programs with and without your changes. See the instructions in that project's README. For testing this issue, the
--stop-after compilationoption is useful; these changes should have no effect on the generated code or execution time, so you don't need to measure those.
Alternatives
It's fine to keep using Vec.
There might be cases where either SmallVec, or our ListPool type from cranelift-entity, are more appropriate than a boxed slice. SmallVec is an especially good choice if most of the time the array would fit in about two machine words, but it occasionally needs to be bigger. ListPool may be a good choice if we have a lot of the same type of array allocated at the same time, and the array elements implement EntityRef, which generally means they're 32-bit indices into an array somewhere else.