[FR] ability to create artificial hotspots · google/benchmark#1154

(7 comments) (0 reactions) (0 assignees) C++ (1,539 forks)

Labels: enhancement, help wanted

Repository metrics

Stars: (7,968 stars)
PR merge metrics: (Avg merge 4d 2h) (19 merged PRs in 30d)

Description

Intro

I was trying to measure the performance impact of a single cache miss when reading the middle element of an array of 1,000,000 integers. According to benchmark's results, the read took 0.437 ns. However, I can't tell whether the element stays cached across repeated iterations over s, since the loop accesses only that single element and nothing else.

If the data was already cached, the benchmark would be meaningless.

This is the section of the code I was analyzing.

	for(auto _ : s)
	{
		int duh = x[500000]+x[500000]; // read the middle element (twice)
		benchmark::DoNotOptimize(duh); // keep the compiler from discarding the result
	}
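One way to check whether a loop like this measures a cached read is to time the same read right after walking a buffer much larger than the last-level cache. Below is a minimal stdlib-only sketch, not something benchmark provides. The names EvictCaches and TimeOneReadNs and the 64-byte line size are assumptions for illustration. The std::chrono timer overhead is itself likely larger than a cached read, so only the difference between a warm and a cold measurement is meaningful.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: write one byte per assumed 64-byte cache line of a
// large buffer, so that previously cached data is likely (not guaranteed)
// to be evicted. Returns the sum of the touched bytes so the walk is observable.
inline std::uint64_t EvictCaches(std::vector<unsigned char>& junk) {
  constexpr std::size_t kLine = 64;  // assumed cache-line size
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < junk.size(); i += kLine) {
    junk[i] = static_cast<unsigned char>(junk[i] + 1);  // the write pulls the line in
    sum += junk[i];
  }
  return sum;
}

// Hypothetical helper: time a single read of x[index], in nanoseconds.
inline double TimeOneReadNs(const std::vector<int>& x, std::size_t index, int& out) {
  assert(index < x.size());
  const volatile int* p = &x[index];  // volatile: the read cannot be elided
  const auto t0 = std::chrono::steady_clock::now();
  out = *p;
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(t1 - t0).count();
}
```

Calling TimeOneReadNs once after EvictCaches (cold) and once again immediately (warm) gives a rough comparison. Inside a benchmark loop the eviction has to stay out of the timed region, for example by reporting the measured time through State::SetIterationTime on a benchmark registered with UseManualTime().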

It would be useful to have functions that trash the data cache and the instruction cache and cause other anomalies, as well as methods that artificially cause branch mispredictions, L1/L2/L3 cache misses, etc.

This would make it possible to simulate a noisy system and to measure how code performs in its first few cycles, before the cache is hot. This is a very important case for many industries.

Such conditions occur in very low-latency applications such as high-frequency trading, game engine development, rendering engine development, and numerical simulation codes. Being able to simulate such artificial hotspots with a single function or macro call would be extremely helpful.

Describe the solution you'd like

Methods that would do a lot of busy work, without actually doing anything, thus trashing the instruction cache.
Methods that would do a lot of memory access that completely overwrites everything in L1-L2-L3 cache, or any level the user may want to selectively trash. Maybe allow the user to trash only N% of the L1/L2/L3 caches etc.
Methods that artificially cause branch mispredictions in seemingly normal code, almost out of nowhere.
Methods that cause CPU migration, context switches, and other common hotspots that suck for performance.

There could be other artificial hotspots that might interest people in other domains, but the few above would suit the needs of those interested in numerical computations, and many others.
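To make the cache-trashing item concrete, here is a rough sketch of what a "trash N% of a cache level" helper might look like. TrashCacheFraction is a hypothetical name, not an existing benchmark function. The cache size is passed in by the caller rather than detected, and the 64-byte line size is an assumption. Writing to a buffer does not guarantee that any particular line is evicted, since that depends on associativity and the replacement policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of benchmark: write to roughly `percent`
// percent of a cache of `cache_bytes` bytes, one write per assumed 64-byte
// line. Returns the number of lines touched.
inline std::size_t TrashCacheFraction(std::size_t cache_bytes, unsigned percent) {
  assert(percent <= 100);
  constexpr std::size_t kLine = 64;  // assumed cache-line size
  const std::size_t bytes = cache_bytes * percent / 100;

  // Reuse one buffer across calls so allocation cost is paid only on growth.
  static std::vector<unsigned char> buffer;
  if (buffer.size() < bytes) buffer.resize(bytes);

  // volatile access keeps the compiler from removing the writes.
  volatile unsigned char* p = buffer.data();
  std::size_t lines = 0;
  for (std::size_t i = 0; i < bytes; i += kLine) {
    p[i] = static_cast<unsigned char>(p[i] + 1);
    ++lines;
  }
  return lines;
}
```

For example, TrashCacheFraction(32 * 1024, 50) touches half of an assumed 32 KiB L1 data cache. Passing the size explicitly keeps the sketch portable, at the cost of the caller having to know the cache sizes of the target machine.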

Describe alternatives you've created

I have created and used artificial hotspots on my own, but that undermines the fast, ready-to-go nature of using benchmark.

I'm already able to do all of this through my custom code, but it would be helpful if at least some of it were available in benchmark. It could be something as simple as benchmark::TrashL1Cache() or benchmark::BranchMispredictionEveryNIterations(1234), where 1234 is the number of iterations after which a branch misprediction would happen, etc.

Some of these, such as branch misprediction, are not difficult to code at all, but it would be helpful if they were available in stock benchmark, just a function call away.
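As an illustration of how small the branch case can be, here is a sketch of a branch that takes its rare path once every N calls. PeriodicRareBranch is a hypothetical name and not part of benchmark. Whether the hardware actually mispredicts on the rare path depends on the CPU's branch predictor and on what the compiler emits, so this only makes a misprediction likely for large periods.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not part of benchmark: a branch that takes its rare
// path once every `period` calls.
class PeriodicRareBranch {
 public:
  explicit PeriodicRareBranch(std::uint64_t period) : period_(period) {
    assert(period > 0);
  }

  // Returns true on every `period`-th call and false otherwise.
  bool Step() {
    if (++count_ == period_) {  // rarely true, so usually predicted "not taken"
      count_ = 0;
      ++rare_hits_;
      return true;
    }
    return false;
  }

  // Number of times the rare path has been taken so far.
  std::uint64_t rare_hits() const { return rare_hits_; }

 private:
  std::uint64_t period_;
  std::uint64_t count_ = 0;
  std::uint64_t rare_hits_ = 0;
};
```

In a benchmark loop, the code under test would call Step() once per iteration and do different work when it returns true. A predictor with a long enough history can still learn short periods, which is why a large value such as 1234 is the more interesting case.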

Thanks

Contributor guide

Research direction: Study the benchmark library's architecture and investigate how to add new functions for cache flushing and branch misprediction. Look for existing hardware prefetching or cache control intrinsics.
Tech stack: cpp
Domain: performance
Issue type: Feature
Difficulty: 4
Estimated time: 3-5 days
Activity status: Needs maintainer response
Clarity: Mostly clear
Prerequisites: C++, Git
Newbie friendliness: 35
