描述
We spend a noticeable amount of time (in both Ruff and Red Knot) parsing, and a noticeable subset of that is in malloc. Our current AST representation is just a straight tree of nodes, which can put a lot of pressure on the memory allocator and isn't very cache-friendly when we go to consume the AST later.
In https://github.com/astral-sh/ruff/pull/12419, @MichaReiser explored using bumpalo. This would reduce malloc overhead, since all AST nodes would be allocated from the bump allocator. But the data structure itself was still directly storing the tree structure. In https://github.com/astral-sh/ruff/pull/12419#issuecomment-2262727929, Micha referenced this blog post about Zig's data-oriented "struct of arrays" representation. We should consider exploring a similar approach to see if we can achieve similar speedups and memory savings.
(This would be much easier with https://github.com/astral-sh/ruff/issues/15655, since we wouldn't have to copy/paste the data representation changes in each of the syntax nodes. But this will likely require extensive updates everywhere we consume ASTs, since all of that code currently assumes it can use direct field access. We might consider a separate first step that updates consumers to use accessor methods, which would just read the fields directly without changing the representation.)