Simplest tool to measure C program cache hits/misses and CPU time in Linux?

Use perf:

    perf stat ./yourapp

See the kernel wiki perf tutorial for details. This uses the hardware performance counters of your CPU, so the overhead is very small.

Example from the wiki:

    perf stat -B dd if=/dev/zero of=/dev/null count=1000000

    Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

        5,099 cache-misses # 0.005 M/sec (scaled from 66.58%)
    …

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

Can someone explain in an easy-to-understand way the concept of a cache miss and its probable opposite (a cache hit)?

A cache miss, generally, is when something is looked up in the cache and is not found: the cache did not contain the item being looked up. A cache hit is when you look …
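To make the hit/miss distinction concrete, here is a minimal sketch of a toy direct-mapped cache in C. Everything in it is an illustrative assumption rather than a model of any real CPU: the names `toy_cache` and `toy_cache_access`, the 64-byte line, and the deliberately tiny 8-slot capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache line size */
#define NUM_LINES  8u  /* deliberately tiny so evictions are easy to trigger */

/* Hypothetical toy model: one tag per slot, plus hit/miss counters. */
typedef struct {
    bool     valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
    unsigned hits;
    unsigned misses;
} toy_cache;

/* Look up one byte address. Returns true on a hit, false on a miss.
   On a miss the line is loaded, replacing whatever occupied the slot. */
static bool toy_cache_access(toy_cache *c, uint64_t addr)
{
    assert(c != NULL);
    uint64_t line  = addr / LINE_BYTES;          /* which memory line */
    size_t   index = (size_t)(line % NUM_LINES); /* which cache slot */
    uint64_t tag   = line / NUM_LINES;           /* identifies the line in that slot */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->misses++;
    return false;
}
```

Starting from a zero-initialized `toy_cache`, accessing address 0 misses, address 8 then hits (same 64-byte line), address 512 misses and takes over slot 0, and a second access to address 0 misses again. That last step is roughly what happens when unrelated code touches memory in between: the line you had loaded is no longer there.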

Line size of L1 and L2 caches

The cache line size is (typically) 64 bytes. Moreover, take a look at this very interesting article about processor caches: Gallery of Processor Cache Effects. You will find the following chapters: Memory accesses and performance; Impact of cache lines; L1 and L2 cache sizes; Instruction-level parallelism; Cache associativity; False cache line sharing; Hardware complexities.
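If you would rather ask the system than rely on the typical value, the following is a small sketch for Linux with glibc. It assumes the glibc extension `_SC_LEVEL1_DCACHE_LINESIZE` is available; the helper name `l1d_line_size` and its `fallback` parameter are made up for illustration. On some systems the query reports 0 or -1, which is why a fallback is passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the L1 data cache line size in bytes as reported by sysconf(),
   or `fallback` when the value is unavailable on this system. */
static size_t l1d_line_size(size_t fallback)
{
    assert(fallback > 0);
    long reported = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; not part of POSIX, so guard its use. */
    reported = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    return reported > 0 ? (size_t)reported : fallback;
}
```

Calling `l1d_line_size(64)` uses the typical 64 bytes whenever the system does not report a value.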

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

The intent of these constants is indeed to get the cache-line size. The best place to read about the rationale for them is in the proposal itself: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html I’ll quote a snippet of the rationale here for ease-of-reading: […] the granularity of memory that does not interfere (to the first-order) [is] commonly referred to as …
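Those constants are C++17. C has no equivalent, so the same idea can only be sketched in C11 with a hardcoded value. The example below assumes a 64-byte line (`ASSUMED_LINE`); the struct names and the `same_cache_line` helper are hypothetical and exist only to show what keeping two fields from interfering looks like in a layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_LINE 64 /* assumption: typical line size, not queried */

/* Two counters packed next to each other: they usually share one line. */
struct packed_counters {
    long a;
    long b;
};

/* Each counter starts on its own assumed cache line, so threads updating
   `a` and `b` separately do not invalidate each other's line. */
struct padded_counters {
    alignas(ASSUMED_LINE) long a;
    alignas(ASSUMED_LINE) long b;
};

/* Compile-time check that the padding really separates the fields. */
static_assert(offsetof(struct padded_counters, b) -
              offsetof(struct padded_counters, a) >= ASSUMED_LINE,
              "counters must be at least one assumed line apart");

/* True when both addresses fall within the same assumed cache line. */
static bool same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / ASSUMED_LINE == (uintptr_t)q / ASSUMED_LINE;
}
```

The cost of the padded layout is size: `struct padded_counters` occupies two full lines, whereas the packed version needs only two `long`s.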

How does one write code that best utilizes the CPU cache to improve performance?

The cache is there to reduce the number of times the CPU would stall waiting for a memory request to be fulfilled (avoiding the memory latency), and as a second effect, possibly to reduce the overall amount of data that needs to be transferred (preserving memory bandwidth). Techniques for avoiding suffering from memory fetch latency …

Why does the order of the loops affect performance when iterating over a 2D array?

As others have said, the issue is the store to the memory location in the array: x[i][j]. Here’s a bit of insight why:

You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 …
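The mapping from two indices to one-dimensional memory can be sketched in C as follows. This is an illustration under the assumption of a row-major layout (which is what C uses for `x[i][j]`); the function names are invented for the example, and it reads rather than stores, though the access pattern is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of element (i, j) in a row-major array with `cols` columns. */
static size_t flat_index(size_t i, size_t j, size_t cols)
{
    assert(j < cols);
    return i * cols + j;
}

/* Inner loop walks j: consecutive iterations touch adjacent memory. */
static long sum_rows_first(const int *x, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += x[flat_index(i, j, cols)];
    return total;
}

/* Inner loop walks i: consecutive iterations jump `cols` elements apart,
   so for large arrays each access tends to land on a different cache line. */
static long sum_cols_first(const int *x, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += x[flat_index(i, j, cols)];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is visited differs. Running each under `perf stat` on a large array is one way to see the difference in cache misses.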

What is “cache-friendly” code?

Preliminaries

On modern computers, only the lowest level memory structures (the registers) can move data around in single clock cycles. However, registers are very expensive and most computer cores have fewer than a few dozen registers. At the other end of the memory spectrum (DRAM), the memory is very cheap (i.e. literally millions of times …
