The Marker API of likwid-perfctr lets you count hardware events on your CPU core(s) separately for different execution regions. For example, to count events for a loop, you would use it like this:
#include <likwid.h>

int main(int argc, char** argv) {
    // always required once
    LIKWID_MARKER_INIT;
    // ...
    LIKWID_MARKER_START("loop");
    for(int i=0; i<n; ++i) {
        do_some_work();
    }
    LIKWID_MARKER_STOP("loop");
    // ...
    LIKWID_MARKER_CLOSE;
    return 0;
}
An arbitrary number of regions is allowed, and you can use the LIKWID_MARKER_START and LIKWID_MARKER_STOP macros in parallel regions to get per-core readings. The events to be counted are configured on the likwid-perfctr command line. As with anything that is not part of the actual work in a code, one may ask about the cost of the marker API calls. Do they impact the runtime of the code? Does the number of cores play a role?
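To make the parallel-region usage concrete, here is a minimal sketch of the markers inside an OpenMP parallel region. It is an illustration under assumptions, not code from the measurements discussed here: the function names run_instrumented and do_some_work and the workload itself are hypothetical, and the fallback macro definitions are only there so that the file still compiles when LIKWID is not installed. Depending on the LIKWID version, the marker macros may live in likwid-marker.h instead of likwid.h.

```c
/* The marker macros are only active when LIKWID_PERFMON is defined.
 * Assumption: without it we define empty fallbacks ourselves, so the
 * file also builds on machines that do not have LIKWID installed. */
#ifdef LIKWID_PERFMON
#include <likwid.h>
#else
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical stand-in for the real per-iteration work. */
static double do_some_work(int i)
{
    return (double)i * (double)i;
}

/* Hypothetical driver: sums do_some_work(i) for i in [0, n) with the
 * loop wrapped in a marker region named "loop". */
double run_instrumented(int n)
{
    double sum = 0.0;

    /* Called once, from serial code. */
    LIKWID_MARKER_INIT;

#pragma omp parallel reduction(+:sum)
    {
        /* Every thread starts and stops the region itself, which is
         * what yields separate readings per core. */
        LIKWID_MARKER_START("loop");
#pragma omp for
        for (int i = 0; i < n; ++i) {
            sum += do_some_work(i);
        }
        LIKWID_MARKER_STOP("loop");
    }

    /* Called once, from serial code, after all regions are done. */
    LIKWID_MARKER_CLOSE;
    return sum;
}
```

If this were called from main() in a program built with -DLIKWID_PERFMON and OpenMP enabled, and linked against the LIKWID library, a run along the lines of likwid-perfctr -C 0-3 -g FLOPS_DP -m ./a.out would activate the markers via -m. The core list and the FLOPS_DP group are assumptions here; which groups exist depends on the CPU.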
