golang/go

runtime, cmd/compile: increase tmpStringBufSize

Open

#75441 opened on Sep 12, 2025

Labels: Implementation, NeedsDecision, Performance, compiler/runtime, help wanted

Description

tmpStringBufSize is currently set to 32. The original CL (@dvyukov) says the following about the value:

Size of the buffer is 32 bytes. There is no fundamental theory
behind this number. Just an observation that on std lib
tests/benchmarks frequency of string allocation is inversely
proportional to string length; and there is significant number
of allocations up to length 32.
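To make the effect of the current threshold concrete, here is a minimal measurement sketch. It assumes the gc toolchain, and the helper names (lastByteOfConcat, allocsForConcat, sink) are made up for illustration. It concatenates two strings whose result does not escape and reports heap allocations per operation. With tmpStringBufSize at 32, results of up to 32 bytes should need no allocation and longer ones a single allocation, but the exact numbers depend on the compiler version and its escape analysis.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the compiler from discarding the measured work.
var sink byte

// lastByteOfConcat concatenates a and b and returns the last byte of the
// result. The concatenated string is only used locally, so escape analysis
// can let the runtime build it in its temporary stack buffer when it fits.
// The combined length must be at least 1.
//
//go:noinline
func lastByteOfConcat(a, b string) byte {
	s := a + b
	return s[len(s)-1]
}

// allocsForConcat reports the average number of heap allocations per call
// when concatenating two strings whose combined length is total bytes
// (total must be positive).
func allocsForConcat(total int) float64 {
	a := strings.Repeat("a", total/2)
	b := strings.Repeat("b", total-total/2)
	return testing.AllocsPerRun(1000, func() {
		sink = lastByteOfConcat(a, b)
	})
}

func main() {
	for _, n := range []int{16, 32, 33, 36, 64} {
		fmt.Printf("concat of %2d bytes: %.0f allocs/op\n", n, allocsForConcat(n))
	}
}
```

Running it before and after changing the constant in a locally built toolchain is one way to check which lengths move from the heap to the stack.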

There are very common cases in which a slightly larger buffer would allow the operation to complete without an allocation. Examples include UUIDs (36 bytes), IPv6 addresses (up to 45 bytes in text form), hex-encoded hashes such as SHA-256 (64 bytes), and URLs, file names and paths (difficult to estimate, but anecdotally, as well as from other sources, the average length is over 32 bytes).
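As a quick check of the lengths quoted above, the following sketch counts how many sample identifiers would fit in a temporary buffer of a given size. The sample values and the helper names (samples, countFitting) are illustrative assumptions; the "fits" test mirrors the idea that a result of length l fits when l is at most the buffer size.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sample is one example identifier of a kind mentioned in the issue.
type sample struct {
	kind, value string
}

// samples returns illustrative identifiers; the concrete values are made up.
func samples() []sample {
	sum := sha256.Sum256([]byte("example"))
	return []sample{
		{"UUID", "123e4567-e89b-12d3-a456-426614174000"},
		// Longest textual IPv6 form: six hex groups plus an embedded IPv4 address.
		{"IPv6", "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255"},
		{"SHA-256 (hex)", hex.EncodeToString(sum[:])},
	}
}

// countFitting returns how many samples are no longer than bufSize bytes.
func countFitting(bufSize int) int {
	n := 0
	for _, s := range samples() {
		if len(s.value) <= bufSize {
			n++
		}
	}
	return n
}

func main() {
	for _, s := range samples() {
		fmt.Printf("%-14s %2d bytes\n", s.kind, len(s.value))
	}
	for _, size := range []int{32, 64} {
		fmt.Printf("buffer of %d bytes fits %d of %d samples\n",
			size, countFitting(size), len(samples()))
	}
}
```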

While this will almost certainly increase stack usage in some cases, I would pragmatically suggest raising the value from 32 to 64 to at least cover those common identifiers. The only issue (which is already present today) is that tmpStringBufSize is also used as the capacity of the rune array in stringtoslicerune (32 runes = 128 bytes), so raising it to 64 would increase the stack allocation in that case by a further 128 bytes, to 256 bytes.
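The stack cost described above is simple arithmetic, sketched here with hypothetical helpers (byteBufBytes, runeBufBytes) that are not part of the runtime. A byte buffer costs one byte per element, while a rune buffer costs four bytes per element because rune is an alias for int32.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteBufBytes returns the size in bytes of a temporary buffer of n bytes.
func byteBufBytes(n int) uintptr {
	return uintptr(n) * unsafe.Sizeof(byte(0))
}

// runeBufBytes returns the size in bytes of a temporary buffer of n runes.
func runeBufBytes(n int) uintptr {
	return uintptr(n) * unsafe.Sizeof(rune(0))
}

func main() {
	for _, n := range []int{32, 64} {
		fmt.Printf("size %d: byte buffer %d bytes, rune buffer %d bytes\n",
			n, byteBufBytes(n), runeBufBytes(n))
	}
	fmt.Printf("extra rune buffer bytes when going from 32 to 64: %d\n",
		runeBufBytes(64)-runeBufBytes(32))
}
```

If the rune case turns out to matter, one option this arithmetic suggests is giving the rune buffer its own constant, although the issue itself does not propose that.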

In the preliminary tests below, raising the threshold from 32 to 64 bytes increases the share of concat operations eligible for stack allocation from 75-80% to 90-95%, while raising it to 128 brings it to ~98%.

It is worth noting that, while the situation is different, encoding/json/v2 uses 256 bytes as the limit for string interning specifically for some of the reasons listed above.

An alternative, though very likely much harder to implement, would be for the compiler to just inform the runtime that the string concatenation result does not escape; the runtime would then use whatever stack space is available and, basically like an alloca, adjust the stack pointer as necessary once the result size is known.

Another alternative, likely midway between the two in terms of complexity, is to allow the compiler to decide the size of the allocation based on heuristics or prove/constprop and pass the information to the runtime at each site, possibly allowing larger temporary buffers (e.g. up to 256 bytes) in places where they are needed while keeping smaller ones where they are not.
