Description
Intro
I was trying to measure the performance impact of a single cache miss when reading the middle element of an array of 1,000,000 integers. According to the benchmark results, the read took 0.437 ns. However, I can't tell whether the element stays cached across repeated iterations over s, since the loop accesses only that one element and nothing else.
If the data was already cached, the benchmark would be meaningless.
This is the section of the code I was analyzing.
for (auto _ : s)
{
    int duh = x[500000] + x[500000]; // read the middle element of the array
    benchmark::DoNotOptimize(duh);   // keep the compiler from discarding the read
}
There is some need for functions that trash the data cache and the instruction cache, and for methods that artificially cause branch mispredictions, L1/L2/L3 cache misses, and similar anomalies.
It would be very helpful to be able to simulate a noisy system, and to measure how code performs in its first few cycles, when the cache isn't hot yet. This is a very important case for many industries.
Such conditions occur in very low latency applications such as high-frequency trading, game engine development, rendering engine development, and numerical simulation codes. Being able to create such artificial hotspots with a single function or macro call would be extremely helpful.
Describe the solution you'd like
- Methods that would do a lot of busy work, without actually doing anything, thus trashing the instruction cache.
- Methods that would do enough memory accesses to overwrite everything in the L1/L2/L3 caches, or in whichever level the user wants to trash selectively. Perhaps the user could also trash only N% of a given cache level.
- Methods that artificially cause branch mispredictions in seemingly normal code, almost out of nowhere.
- Methods that cause CPU migration, context switches, and other common hotspots that suck for performance.
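To make the data-cache item above more concrete, here is a minimal sketch of what such a helper could look like. The name TrashDataCache and the 64-byte line size are my assumptions, not anything that exists in benchmark, and picking a scratch buffer size that actually exceeds a given cache level is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Google Benchmark.
// Writes one byte in every assumed-64-byte cache line of `scratch`, which
// pulls those lines into the cache and tends to push older data out.
// To target a given cache level, size `scratch` well above that level's
// capacity (the right size is machine-specific and is not detected here).
inline std::uint64_t TrashDataCache(std::vector<unsigned char>& scratch) {
    constexpr std::size_t kAssumedLineSize = 64;  // assumption, not queried
    std::uint64_t checksum = 0;
    for (std::size_t i = 0; i < scratch.size(); i += kAssumedLineSize) {
        scratch[i] = static_cast<unsigned char>(scratch[i] + 1);  // dirty the line
        checksum += scratch[i];
    }
    return checksum;  // the caller should consume this value
}
```

In a benchmark loop this would probably be called between state.PauseTiming() and state.ResumeTiming(), with the result passed to benchmark::DoNotOptimize. Pausing and resuming has its own overhead, so whether that is precise enough for a single cache miss is an open question.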
There could be other artificial hotspots that might interest people in other domains, but the few above would suit the needs of those interested in numerical computations, and many others.
Describe alternatives you've created
I have created and used artificial hotspots on my own, but that takes away from the fast, ready-to-go nature of using benchmark.
I'm already able to do all of this with my custom code, but it would be helpful if at least some of it were available in benchmark itself. It could be something as simple as benchmark::TrashL1Cache() or benchmark::BranchMispredictionEveryNIterations(1234), where 1234 is the number of iterations after which a branch misprediction would happen.
Some of these, like branch misprediction, are not difficult to code at all, but it would be helpful if they were available in stock benchmark, just a function call away.
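For illustration, a rough sketch of the branch misprediction idea. The names NextRandom and RunUnpredictableBranches are hypothetical. The approach assumed here is a branch on a pseudo-random bit (xorshift64), which a predictor should get wrong roughly half the time rather than exactly once every N iterations. The volatile stores are only there to discourage the compiler from turning the branch into a branchless select, which is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of Google Benchmark.

// One step of the xorshift64 generator; `state` must be nonzero.
inline std::uint64_t NextRandom(std::uint64_t& state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

// Runs `iterations` loop steps. On every `period`-th step it takes a branch
// whose direction depends on a pseudo-random bit, so the predictor has no
// stable pattern to learn. Returns how many times that branch was taken.
inline std::uint64_t RunUnpredictableBranches(std::uint64_t iterations,
                                              std::uint64_t period,
                                              std::uint64_t seed) {
    std::uint64_t state = (seed != 0) ? seed : 1;  // xorshift needs nonzero state
    volatile std::uint64_t sink_taken = 0;         // distinct side effects per arm
    volatile std::uint64_t sink_not_taken = 0;
    std::uint64_t taken = 0;
    for (std::uint64_t i = 1; i <= iterations; ++i) {
        if (period != 0 && i % period == 0) {
            if (NextRandom(state) & 1) {
                sink_taken = i;
                ++taken;
            } else {
                sink_not_taken = i;
            }
        }
    }
    return taken;
}
```

A stock version would need to check the generated assembly on each supported compiler to confirm the branch survives optimization.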
Thanks