cpu-cache – Page 3 – Tarik Billa

simplest tool to measure C program cache hit/miss and cpu time in linux?

February 15, 2023 by Tarik

Use perf:

    perf stat ./yourapp

See the kernel wiki perf tutorial for details. This uses the hardware performance counters of your CPU, so the overhead is very small. Example from the wiki:

    perf stat -B dd if=/dev/zero of=/dev/null count=1000000

    Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

        5,099 cache-misses # 0.005 M/sec (scaled from 66.58%)

… Read more

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

February 8, 2023 by Tarik

Can someone explain in an easy to understand way the concept of cache miss and its probable opposite (cache hit)? A cache miss, generally, is when something is looked up in the cache and is not found – the cache did not contain the item being looked up. The cache hit is when you look … Read more

Line size of L1 and L2 caches

January 19, 2023 by Tarik

The cache line size is (typically) 64 bytes. Moreover, take a look at this very interesting article about processor caches: Gallery of Processor Cache Effects. You will find the following chapters: Memory accesses and performance; Impact of cache lines; L1 and L2 cache sizes; Instruction-level parallelism; Cache associativity; False cache line sharing; Hardware complexities.

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

January 17, 2023 by Tarik

The intent of these constants is indeed to get the cache-line size. The best place to read about the rationale for them is in the proposal itself: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html I’ll quote a snippet of the rationale here for ease-of-reading: […] the granularity of memory that does not interfere (to the first-order) [is] commonly referred to as … Read more

Write-back vs Write-Through caching?

November 17, 2022 by Tarik

The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy of the line. So when a read is done, main memory can always reply with the requested data. If write-back is used, sometimes the up-to-date data is in … Read more

How does one write code that best utilizes the CPU cache to improve performance?

October 25, 2022 by Tarik

The cache is there to reduce the number of times the CPU would stall waiting for a memory request to be fulfilled (avoiding the memory latency), and as a second effect, possibly to reduce the overall amount of data that needs to be transferred (preserving memory bandwidth). Techniques for avoiding suffering from memory fetch latency … Read more

Approximate cost to access various caches and main memory?

October 15, 2022 by Tarik

Numbers everyone should know

0.5 ns – CPU L1 dCACHE reference
1 ns – speed-of-light (a photon) travels a 1 ft (30.5 cm) distance
5 ns – CPU L1 iCACHE Branch mispredict
7 ns – CPU L2 CACHE reference
71 ns – CPU cross-QPI/NUMA best case on XEON E5-46*
100 ns – MUTEX lock/unlock
100 ns … Read more

How much of ‘What Every Programmer Should Know About Memory’ is still valid?

October 13, 2022 by Tarik

Why does the order of the loops affect performance when iterating over a 2D array?

September 22, 2022 by Tarik

As others have said, the issue is the store to the memory location in the array: x[i][j]. Here’s a bit of insight why: You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 … Read more

What is a “cache-friendly” code?

September 7, 2022 by Tarik

Preliminaries

On modern computers, only the lowest level memory structures (the registers) can move data around in single clock cycles. However, registers are very expensive and most computer cores have fewer than a few dozen registers. At the other end of the memory spectrum (DRAM), the memory is very cheap (i.e. literally millions of times … Read more