cpu-cache – Page 2 – Tarik Billa

C++ cache aware programming

July 11, 2023 by Tarik

According to “What every programmer should know about memory” by Ulrich Drepper, you can do the following on Linux: Once we have a formula for the memory requirement we can compare it with the cache size. As mentioned before, the cache might be shared with multiple other cores. Currently {There definitely will sometime soon be …
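On Linux the cache geometry is exported through sysfs. The following is a minimal sketch that assumes the usual layout under /sys/devices/system/cpu/cpu0/cache/index*/, with the files level, size and shared_cpu_map. It picks the highest-level cache visible from cpu0 and divides its size by the number of CPUs in its sharing mask, which gives a rough per-CPU share of the shared cache. The function names are hypothetical, and the "size" format (a decimal number with a K or M suffix) is an assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Parse a sysfs cache "size" value such as "32K" or "8192K" into bytes.
// Assumes digits optionally followed by K or M; returns 0 if there are no digits.
std::size_t parse_cache_size(const std::string& s) {
    std::size_t value = 0, i = 0;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9')
        value = value * 10 + static_cast<std::size_t>(s[i++] - '0');
    if (i == 0) return 0;
    if (i < s.size() && s[i] == 'K') value *= 1024;
    else if (i < s.size() && s[i] == 'M') value *= 1024 * 1024;
    return value;
}

// Count the CPUs set in a hex bitmask such as "00000000,0000000f" (shared_cpu_map).
int count_mask_bits(const std::string& mask) {
    int bits = 0;
    for (char c : mask) {
        int nibble;
        if (c >= '0' && c <= '9') nibble = c - '0';
        else if (c >= 'a' && c <= 'f') nibble = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') nibble = c - 'A' + 10;
        else continue;  // skip the comma separators
        for (; nibble; nibble >>= 1) bits += nibble & 1;
    }
    return bits;
}

// First line of a file, or "" if it cannot be opened.
std::string read_line(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}

// Share of the highest-level cache seen from cpu0, per CPU in its sharing mask.
// Returns 0 if sysfs is unavailable.
std::size_t llc_bytes_per_cpu() {
    const std::string base = "/sys/devices/system/cpu/cpu0/cache/index";
    int best_level = 0;
    std::size_t best_share = 0;
    for (int i = 0;; ++i) {
        const std::string dir = base + std::to_string(i) + "/";
        const std::string level = read_line(dir + "level");
        if (level.empty()) break;  // no more cache indices
        const int lvl = std::stoi(level);
        if (lvl < best_level) continue;
        const int sharers = count_mask_bits(read_line(dir + "shared_cpu_map"));
        const std::size_t size = parse_cache_size(read_line(dir + "size"));
        best_level = lvl;
        best_share = sharers > 0 ? size / static_cast<std::size_t>(sharers) : size;
    }
    return best_share;
}
```

The parsing helpers are kept separate from the file access so that they can be checked without a Linux machine. Note that the CPU count in the mask is in logical CPUs, so SMT siblings count separately.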

How can I do a CPU cache flush in x86 Windows?

May 3, 2023 by Tarik

Fortunately, there is more than one way to explicitly flush the caches. The instruction “wbinvd” writes back modified cache content and marks the caches empty. It executes a bus cycle to make external caches flush their data. Unfortunately, it is a privileged instruction. But if it is possible to run the test program under something …
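One unprivileged option on x86 is CLFLUSH. It flushes a single cache line, addressed by its linear address, from every cache level, and it can be used from user mode. The excerpt is cut off before it lists its alternatives, so this is not necessarily what the full answer recommends. The following is a sketch that flushes only a given buffer rather than the whole cache. It assumes a 64-byte line size, and flush_range is a hypothetical name. It uses the SSE2 intrinsics _mm_clflush and _mm_mfence, which both MSVC and GCC/Clang provide.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // _mm_clflush, _mm_mfence (SSE2, x86 only)

// Assumed cache-line size (64 bytes on current mainstream x86 parts).
constexpr std::size_t kLineBytes = 64;

// Flush every cache line overlapping [p, p + n) with CLFLUSH.
// Dirty lines are written back first, so memory contents are preserved.
// Returns the number of lines flushed.
std::size_t flush_range(const void* p, std::size_t n) {
    if (n == 0) return 0;
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(kLineBytes - 1);
    const std::uintptr_t begin = start & ~mask;  // round down to a line boundary
    const std::uintptr_t end = start + n;
    std::size_t lines = 0;
    for (std::uintptr_t a = begin; a < end; a += kLineBytes, ++lines)
        _mm_clflush(reinterpret_cast<const void*>(a));
    _mm_mfence();  // make sure the flushes are ordered before later memory accesses
    return lines;
}
```

Unlike wbinvd, this cannot empty the whole cache in one step. It is usually enough for a benchmark that needs its own buffers to start "cold".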

What’s the difference between conflict miss and capacity miss

May 1, 2023 by Tarik

The important distinction here is between cache misses caused by the size of your data set and cache misses caused by the way your cache and data alignment are organized. Let's assume you have a 32 KiB direct-mapped cache, and consider the following two cases: You repeatedly iterate over a 128 KiB array. There’s no way …
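The two kinds of miss can be reproduced with a toy model of the 32 KiB direct-mapped cache from the paragraph above. The sketch assumes a hypothetical 64-byte line size; the class and function names are illustrative only. In the model, each memory line maps to exactly one slot, (address / line size) mod number of slots.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache: each memory line can live in exactly one slot.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t cache_bytes, std::size_t line_bytes)
        : line_bytes_(line_bytes), tags_(cache_bytes / line_bytes, UINT64_MAX) {}

    // Record one access; returns true on a hit, false on a miss.
    bool access(std::uint64_t addr) {
        const std::uint64_t line = addr / line_bytes_;
        const std::size_t slot = static_cast<std::size_t>(line % tags_.size());
        if (tags_[slot] == line) { ++hits_; return true; }
        tags_[slot] = line;  // evict whatever occupied this slot
        ++misses_;
        return false;
    }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::size_t line_bytes_;
    std::vector<std::uint64_t> tags_;  // line number held by each slot
    std::size_t hits_ = 0, misses_ = 0;
};

constexpr std::size_t kCacheBytes = 32 * 1024;
constexpr std::size_t kLine = 64;  // assumed line size

// Case 1: repeatedly sweep an array; misses once the array exceeds the cache.
std::size_t sweep_misses(std::size_t array_bytes, int passes) {
    DirectMappedCache c(kCacheBytes, kLine);
    for (int p = 0; p < passes; ++p)
        for (std::uint64_t a = 0; a < array_bytes; a += kLine) c.access(a);
    return c.misses();
}

// Case 2: alternate between two addresses exactly one cache size apart.
std::size_t pingpong_misses(int rounds) {
    DirectMappedCache c(kCacheBytes, kLine);
    for (int r = 0; r < rounds; ++r) {
        c.access(0);
        c.access(kCacheBytes);  // maps to the same slot as address 0
    }
    return c.misses();
}
```

Two passes over 128 KiB (sweep_misses(128 * 1024, 2)) miss on all 4096 accesses. A 16 KiB array misses only on its first pass. Ping-ponging between two lines misses every time, even though only two lines are live. A fully associative 32 KiB LRU cache would still miss on every access of the 128 KiB sweep, which makes those capacity misses. It would hit on the ping-pong after the first two accesses, so those misses are conflict misses.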

Where is the L1 memory cache of Intel x86 processors documented?

April 27, 2023 by Tarik

It is nearly impossible to find specs on Intel caches. When I was teaching a class on caches last year, I asked friends inside Intel (in the compiler group), and they couldn’t find specs. But wait!!! Jed, bless his soul, tells us that on Linux systems, you can squeeze lots of information out of the …
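Independently of what the OS exposes, Intel CPUs describe their own caches through CPUID leaf 4 ("deterministic cache parameters"), which is documented in Intel's Software Developer's Manual. The following is a sketch for GCC/Clang on x86 that uses `__get_cpuid_count` from `<cpuid.h>`. The struct and function names are hypothetical. Many AMD parts report their caches under a different leaf, so on those this loop may print nothing.

```cpp
#include <cpuid.h>  // GCC/Clang, x86 only
#include <cstdint>
#include <cstdio>

// Fields of one CPUID leaf 4 sub-leaf.
struct CacheInfo {
    unsigned type;   // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
    unsigned level;  // 1 = L1, 2 = L2, ...
    unsigned ways, partitions, line_bytes, sets;
    std::uint64_t size_bytes() const {
        return std::uint64_t{ways} * partitions * line_bytes * sets;
    }
};

// Decode EAX/EBX/ECX from CPUID(EAX=4, ECX=subleaf); each count field is stored minus one.
CacheInfo decode_leaf4(std::uint32_t eax, std::uint32_t ebx, std::uint32_t ecx) {
    CacheInfo c;
    c.type = eax & 0x1F;                        // EAX[4:0]
    c.level = (eax >> 5) & 0x7;                 // EAX[7:5]
    c.ways = ((ebx >> 22) & 0x3FF) + 1;         // EBX[31:22]
    c.partitions = ((ebx >> 12) & 0x3FF) + 1;   // EBX[21:12]
    c.line_bytes = (ebx & 0xFFF) + 1;           // EBX[11:0]
    c.sets = ecx + 1;                           // ECX[31:0]
    return c;
}

// Print every cache reported by leaf 4, stopping at the first "null" entry.
void print_caches() {
    for (unsigned sub = 0;; ++sub) {
        unsigned eax, ebx, ecx, edx;
        if (!__get_cpuid_count(4, sub, &eax, &ebx, &ecx, &edx)) break;  // leaf unsupported
        const CacheInfo c = decode_leaf4(eax, ebx, ecx);
        if (c.type == 0) break;
        std::printf("L%u %s: %llu bytes, %u-way, %u-byte lines\n", c.level,
                    c.type == 1 ? "data" : c.type == 2 ? "instruction" : "unified",
                    static_cast<unsigned long long>(c.size_bytes()), c.ways, c.line_bytes);
    }
}
```

Total size is ways × partitions × line size × sets. For example, 8 ways, 1 partition, 64-byte lines and 64 sets give 32 KiB.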

Why is linear read-shuffled write not faster than shuffled read-linear write?

April 25, 2023 by Tarik

This is a complex problem, closely related to the architectural features of modern processors, and your intuition that random reads are slower than random writes because the CPU has to wait for the read data does not hold (most of the time). There are several reasons for this, which I will detail. Modern processors are very efficient …
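The two access patterns in the question are a scatter (linear read, shuffled write) and a gather (shuffled read, linear write). The following is a minimal sketch that lets you time both on your own machine. The function names, element type and seed are arbitrary choices, and no particular result is implied: it depends on the processor and memory system.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Random permutation of 0..n-1.
std::vector<std::uint32_t> make_shuffle(std::size_t n, unsigned seed) {
    std::vector<std::uint32_t> p(n);
    std::iota(p.begin(), p.end(), 0u);
    std::shuffle(p.begin(), p.end(), std::mt19937(seed));
    return p;
}

// Linear read, shuffled write (scatter): out[p[i]] = in[i].
void linear_read_shuffled_write(const std::vector<float>& in, std::vector<float>& out,
                                const std::vector<std::uint32_t>& p) {
    for (std::size_t i = 0; i < in.size(); ++i) out[p[i]] = in[i];
}

// Shuffled read, linear write (gather): out[i] = in[p[i]].
void shuffled_read_linear_write(const std::vector<float>& in, std::vector<float>& out,
                                const std::vector<std::uint32_t>& p) {
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[p[i]];
}

// Wall-clock duration of one call, in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

To compare them, fill `in` and `out` with a few million elements (far more than the last-level cache holds), build `p = make_shuffle(in.size(), 1)`, and compare `time_ms([&] { linear_read_shuffled_write(in, out, p); })` with the gather version. Repeat the runs a few times, because the first one also pays for page faults.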

Do current x86 architectures support non-temporal loads (from “normal” memory)?

April 17, 2023 by Tarik

To answer the headline question specifically: Yes, recent¹ mainstream Intel CPUs support non-temporal loads on normal² memory, but only “indirectly” via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores, where you can just use the corresponding non-temporal store instructions³ directly. The basic …
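In C++, the non-temporal prefetch described above is reached through the `_mm_prefetch` intrinsic with the `_MM_HINT_NTA` hint, which emits `prefetchnta`. The following is a sketch of a streaming sum that issues one NTA prefetch per cache line, some distance ahead of the loads. The 64-byte line size and the 8-line prefetch distance are assumptions that would need tuning, and `sum_with_nta_prefetch` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_NTA (x86 only)

// Sum a large array that will not be reused, hinting that its lines should
// bypass (or minimally pollute) the cache hierarchy.
long long sum_with_nta_prefetch(const std::int32_t* data, std::size_t n) {
    constexpr std::size_t kPerLine = 64 / sizeof(std::int32_t);  // assumes 64-byte lines
    constexpr std::size_t kAhead = 8 * kPerLine;                 // hypothetical prefetch distance
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // One prefetch per line; stay inside the array to avoid out-of-bounds pointers.
        if (i % kPerLine == 0 && i + kAhead < n)
            _mm_prefetch(reinterpret_cast<const char*>(data + i + kAhead), _MM_HINT_NTA);
        sum += data[i];  // ordinary load; the line should already be on its way
    }
    return sum;
}
```

The loads themselves are ordinary loads. The "non-temporal" part is only the hint attached to the prefetch, which matches the indirect mechanism the answer describes.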

Are CPU registers and CPU cache different? [closed]

April 16, 2023 by Tarik

Yes. CPU registers are a small set of storage locations inside the core that instructions operate on directly. The CPU cache is a larger, high-speed volatile memory that holds copies of recently used main-memory data, so the processor has to go out to main memory less often.

How are cache memories shared in multicore Intel CPUs?

April 9, 2023 by Tarik

In a multiprocessor system or a multicore processor (Intel Quad Core, Core 2 Duo, etc.), does each CPU core/processor have its own cache memory (data and program cache)? Yes. It varies by the exact chip model, but the most common design is for each CPU core to have its own private L1 data and instruction …
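On Linux you can see which caches are private and which are shared on your own chip. The kernel lists the sharing CPUs for each cache in sysfs. The following is a sketch that assumes the standard files level, type and shared_cpu_list under /sys/devices/system/cpu/cpuN/cache/index*/. The function names are hypothetical. The counts are in logical CPUs, so a core's private L1 shows up as shared by two CPUs when SMT (Hyper-Threading) is enabled.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Count CPUs in a Linux cpu list such as "0", "0,4" or "0-3,8-11".
int count_cpu_list(const std::string& list) {
    int count = 0;
    std::stringstream ss(list);
    std::string range;
    while (std::getline(ss, range, ',')) {
        if (range.empty()) continue;
        const std::size_t dash = range.find('-');
        if (dash == std::string::npos) {
            ++count;  // single CPU
        } else {
            const int lo = std::stoi(range.substr(0, dash));
            const int hi = std::stoi(range.substr(dash + 1));
            count += hi - lo + 1;  // inclusive range
        }
    }
    return count;
}

// Print level, type and sharing set of every cache visible from one CPU.
void print_cache_sharing(int cpu) {
    const std::string base =
        "/sys/devices/system/cpu/cpu" + std::to_string(cpu) + "/cache/index";
    for (int i = 0;; ++i) {
        const std::string dir = base + std::to_string(i) + "/";
        std::ifstream level_f(dir + "level"), type_f(dir + "type"),
            list_f(dir + "shared_cpu_list");
        std::string level, type, list;
        if (!std::getline(level_f, level)) break;  // no more cache indices
        std::getline(type_f, type);
        std::getline(list_f, list);
        std::printf("L%s %-11s shared by %d CPU(s): %s\n", level.c_str(), type.c_str(),
                    count_cpu_list(list), list.c_str());
    }
}
```

On a typical desktop part you would expect the L1 and L2 entries to list only one core's logical CPUs, and the L3 entry to list all of them. The exact layout varies by model, as the answer says.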

Why does the speed of memcpy() drop dramatically every 4KB?

March 8, 2023 by Tarik

Memory is usually organized in 4 KiB pages (although there’s also support for larger sizes). The virtual address space your program sees may be contiguous, but that is not necessarily the case in physical memory. The OS, which maintains a mapping of virtual to physical addresses (in the page map), would usually try to keep the physical …
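Each 4 KiB boundary that a copy crosses is a point where the next virtual page may map to an unrelated physical frame. As a result, the number of distinct pages a copy touches increases by one each time the size grows past another multiple of the page size. The following is a small sketch of that arithmetic, with the page size passed in as a parameter. `pages_spanned` is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Number of distinct pages touched by the byte range [start, start + len).
std::size_t pages_spanned(std::uintptr_t start, std::size_t len, std::size_t page_bytes) {
    if (len == 0) return 0;
    const std::uintptr_t first = start / page_bytes;            // page of the first byte
    const std::uintptr_t last = (start + len - 1) / page_bytes;  // page of the last byte
    return static_cast<std::size_t>(last - first + 1);
}
```

For example, 200 bytes starting at offset 4000 already touch two 4 KiB pages. On POSIX systems the actual page size can be queried with `sysconf(_SC_PAGESIZE)` instead of assuming 4096.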

Which ordering of nested loops for iterating over a 2D array is more efficient [duplicate]

February 21, 2023 by Tarik

The first method is slightly better, as the cells being assigned to lie next to each other.

First method:
[ ][ ][ ][ ][ ] ….
 ^1st assignment
    ^2nd assignment
[ ][ ][ ][ ][ ] ….
 ^101st assignment

Second method:
[ ][ ][ ][ ][ ] ….
 ^1st assignment
    ^101st assignment
[ ][ ][ …
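The two loop orders can be written out over a row-major array. The diagram's "101st assignment" implies 100 columns, and the sketch assumes that layout. Each function records the 0-based order in which cells are assigned, which reproduces the diagram: in the first method, neighbouring cells are consecutive assignments, while in the second, consecutive assignments are a whole row apart in memory. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Row-major storage: element (i, j) lives at index i * cols + j.

// First method: the inner loop walks along a row, touching adjacent cells.
std::vector<int> fill_row_by_row(std::size_t rows, std::size_t cols) {
    std::vector<int> a(rows * cols);
    int order = 0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            a[i * cols + j] = order++;
    return a;
}

// Second method: the inner loop walks down a column, jumping `cols` elements each step.
std::vector<int> fill_column_by_column(std::size_t rows, std::size_t cols) {
    std::vector<int> a(rows * cols);
    int order = 0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            a[i * cols + j] = order++;
    return a;
}
```

With 100 columns, cell (0, 1) is assigned 2nd by the first method but 101st by the second. Each step of the second method's inner loop lands 100 elements away, usually on a different cache line.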