C++ cache-aware programming

According to “What Every Programmer Should Know About Memory” by Ulrich Drepper, you can do the following on Linux: Once we have a formula for the memory requirement we can compare it with the cache size. As mentioned before, the cache might be shared with multiple other cores. Currently {There definitely will sometime soon be … Read more
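The comparison described in this excerpt can be sketched roughly as follows. This is a minimal sketch, not the code from the full post: `l1d_cache_bytes`, `working_set_bytes`, and `fits_in_cache` are hypothetical helper names, the "n elements times element size" formula is only a placeholder for whatever formula applies to a given program, and `_SC_LEVEL1_DCACHE_SIZE` is a glibc extension that may be missing or report 0 on some systems (hence the fallback value, which is an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>  // sysconf (POSIX); _SC_LEVEL1_DCACHE_SIZE is a glibc extension

// Returns the L1 data cache size in bytes, or `fallback` when the system
// does not report one (the 32 KiB default is an assumption, not a measured value).
std::size_t l1d_cache_bytes(std::size_t fallback = 32 * 1024) {
#ifdef _SC_LEVEL1_DCACHE_SIZE
    long v = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    if (v > 0) return static_cast<std::size_t>(v);
#endif
    return fallback;
}

// Placeholder memory-requirement formula: n elements of elem_size bytes each.
std::size_t working_set_bytes(std::size_t n, std::size_t elem_size) {
    return n * elem_size;
}

// Compares the requirement with the cache size. `sharers` is the number of
// cores or threads assumed to share the cache, so each gets an equal slice.
bool fits_in_cache(std::size_t needed, std::size_t cache_bytes,
                   std::size_t sharers = 1) {
    assert(sharers > 0);
    return needed <= cache_bytes / sharers;
}
```

For example, `fits_in_cache(working_set_bytes(n, sizeof(float)), l1d_cache_bytes())` asks whether an array of `n` floats fits in the reported L1d cache when nothing else shares it.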

Why is linear read-shuffled write not faster than shuffled read-linear write?

This is a complex problem closely related to architectural features of modern processors, and your intuition that random reads are slower than random writes because the CPU has to wait for the read data is not verified (most of the time). There are several reasons for that, which I will detail. Modern processors are very efficient … Read more
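For readers unfamiliar with the two access patterns the question compares, here is a minimal sketch of both. The function names `scatter` and `gather` and the helper `make_permutation` are chosen for illustration and are not taken from the original question; timing code is left out because meaningful results depend on array size and hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds a random permutation of 0..n-1 (fixed seed for repeatability).
std::vector<std::size_t> make_permutation(std::size_t n, unsigned seed) {
    std::vector<std::size_t> p(n);
    std::iota(p.begin(), p.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(p.begin(), p.end(), rng);
    return p;
}

// Linear read, shuffled write: src is walked in order, dst is hit at random.
void scatter(const std::vector<int>& src, std::vector<int>& dst,
             const std::vector<std::size_t>& perm) {
    assert(src.size() == dst.size() && src.size() == perm.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[perm[i]] = src[i];
}

// Shuffled read, linear write: src is hit at random, dst is walked in order.
void gather(const std::vector<int>& src, std::vector<int>& dst,
            const std::vector<std::size_t>& perm) {
    assert(src.size() == dst.size() && src.size() == perm.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[perm[i]];
}
```

With the same permutation, `gather` undoes `scatter`, which makes the pair convenient for checking correctness before benchmarking either loop.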

Do current x86 architectures support non-temporal loads (from “normal” memory)?

To answer specifically the headline question: Yes, recent mainstream Intel CPUs support non-temporal loads on normal memory – but only “indirectly” via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores, where you can just use the corresponding non-temporal store instructions directly. The basic … Read more
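The distinction drawn here can be sketched with compiler intrinsics. This is an illustrative sketch only, assuming an x86 target with SSE2: the read side issues a non-temporal prefetch hint (`_mm_prefetch` with `_MM_HINT_NTA`) ahead of ordinary loads, while the write side uses a non-temporal store (`_mm_stream_si128`) directly. The function names and the prefetch distance of 64 elements are assumptions made for the example, and whether the hint changes caching behavior depends on the microarchitecture. On other targets the code falls back to plain loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <immintrin.h>
#endif

// Sums an array with ordinary loads, issuing a non-temporal prefetch hint
// `ahead` elements in front of the read position, once per 16 ints (64 bytes).
long long sum_with_nta_prefetch(const int* data, std::size_t n,
                                std::size_t ahead = 64) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__SSE2__)
        if (i % 16 == 0 && i + ahead < n)
            _mm_prefetch(reinterpret_cast<const char*>(data + i + ahead),
                         _MM_HINT_NTA);
#endif
        total += data[i];
    }
    return total;
}

// Copies n ints using non-temporal stores. Requires dst to be 16-byte
// aligned and n to be a multiple of 4.
void copy_with_nt_stores(const int* src, int* dst, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0 && n % 4 == 0);
#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_stream_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
    _mm_sfence();  // order the streaming stores before later stores
#else
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
#endif
}
```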

Why does the speed of memcpy() drop dramatically every 4KB?

Memory is usually organized in 4 KB pages (although there’s also support for larger sizes). The virtual address space your program sees may be contiguous, but it’s not necessarily the case in physical memory. The OS, which maintains a mapping of virtual to physical addresses (in the page map), would usually try to keep the physical … Read more
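To make the page arithmetic concrete, the sketch below splits a virtual address into a page number and an offset and counts how many pages a buffer touches. It assumes the 4 KB page size mentioned above (on a real system `sysconf(_SC_PAGESIZE)` reports the actual value), and the helper names are hypothetical. It only illustrates how a virtually contiguous buffer divides into pages; it does not model the physical mapping the OS maintains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KB pages, as in the text; larger page sizes also exist.
constexpr std::size_t kPageSize = 4096;

// Index of the page containing addr.
std::uintptr_t page_number(std::uintptr_t addr, std::size_t page = kPageSize) {
    assert(page > 0);
    return addr / page;
}

// Byte offset of addr within its page.
std::size_t page_offset(std::uintptr_t addr, std::size_t page = kPageSize) {
    assert(page > 0);
    return static_cast<std::size_t>(addr % page);
}

// Number of distinct pages touched by the byte range [addr, addr + len).
std::size_t pages_spanned(std::uintptr_t addr, std::size_t len,
                          std::size_t page = kPageSize) {
    if (len == 0) return 0;
    return static_cast<std::size_t>(
        page_number(addr + len - 1, page) - page_number(addr, page) + 1);
}
```

A 4096-byte buffer that starts exactly on a page boundary touches one page, while the same buffer shifted by a single byte touches two, each of which may map to an unrelated physical page.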
