What is a Warm-Up Cache?

The warm-up is simply the period during which a set of data is loaded so that the cache gets populated with valid data. If you’re doing performance testing against a system that usually has a high frequency of cache hits, then without the warm-up you’ll get misleading numbers, because what would normally be a cache hit …
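A minimal sketch of the idea, assuming a toy read-through cache: `SquareCache`, `warm_up`, and the squaring "slow" computation are all hypothetical names invented for illustration, not part of any real system. The warm-up pass touches every key in the expected working set and then discards the hit/miss counters, so a measurement that follows sees only steady-state behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical read-through cache, used only to illustrate warm-up.
class SquareCache {
public:
    // Returns key*key: computed and stored on a miss, served from the map on a hit.
    long long get(int key) {
        auto it = store_.find(key);
        if (it != store_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        long long value = slow_compute(key);
        store_.emplace(key, value);
        return value;
    }

    bool contains(int key) const { return store_.count(key) != 0; }
    void reset_counters() { hits_ = 0; misses_ = 0; }
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    // Stand-in for an expensive lookup (database, disk, remote call).
    static long long slow_compute(int key) {
        return static_cast<long long>(key) * key;
    }

    std::unordered_map<int, long long> store_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};

// Warm-up: load the expected working set, then drop the counters so that
// the cold misses do not pollute the measurement that follows.
void warm_up(SquareCache& cache, const std::vector<int>& working_set) {
    for (int key : working_set) {
        cache.get(key);
    }
    // Sanity check: every key in the working set is now resident.
    for (int key : working_set) {
        assert(cache.contains(key));
    }
    cache.reset_counters();
}
```

After `warm_up(cache, keys)`, a timed loop over the same keys should record only hits; without the warm-up call, the first pass over the keys would be counted entirely as misses.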

Why is MPI considered harder than shared memory and Erlang considered easier, when they are both message-passing?

I agree with all the previous answers, but one key point has not been made entirely clear: a reason MPI might be considered hard and Erlang easy is how well the model matches the domain. Erlang is based on a concept of local memory, asynchronous message passing, and shared state solved …

Is volatile bool for thread control considered wrong?

You don’t need a synchronized variable, but rather an atomic variable. Luckily, you can just use std::atomic<bool>. The key issue is that if more than one thread accesses the same memory simultaneously and at least one of them writes, then unless the access is atomic, your entire program ceases to be in a well-defined state. Perhaps you’re lucky with a bool, which …
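As a sketch of the std::atomic<bool> approach, the following runs a worker thread until it is told to stop. The function name `run_worker_until_stopped` and the iteration counter are assumptions added for illustration; only the use of std::atomic<bool> as the stop flag comes from the answer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs a worker thread until the calling thread asks it to stop, and
// returns how many loop iterations the worker completed.
unsigned long run_worker_until_stopped() {
    std::atomic<bool> stop{false};
    std::atomic<unsigned long> iterations{0};

    std::thread worker([&] {
        // Each load of `stop` is atomic, so it does not race with the
        // store performed by the other thread below.
        while (!stop.load(std::memory_order_acquire)) {
            iterations.fetch_add(1, std::memory_order_relaxed);
        }
    });

    // Wait until the worker has demonstrably run at least once.
    while (iterations.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }

    // Request the stop, then wait for the worker to observe it and exit.
    stop.store(true, std::memory_order_release);
    worker.join();

    unsigned long total = iterations.load();
    assert(total >= 1);
    return total;
}
```

Replacing `std::atomic<bool>` with `volatile bool` here would compile, but the concurrent read and write would then be a data race, which is undefined behaviour in C++.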

Multi-Core and Concurrency – Languages, Libraries and Development Techniques [closed]

I’d suggest two paradigm shifts. Software Transactional Memory: you may want to take a look at the concept of Software Transactional Memory (STM). The idea is to use optimistic concurrency: any operation that runs in parallel with others tries to complete its job in an isolated transaction; if at some point another transaction has been …

Which CPU architectures support Compare And Swap (CAS)?

PowerPC has more powerful primitives available: “lwarx” and “stwcx”. lwarx loads a value from memory but remembers the location. Any other thread or CPU that writes to that location will cause the “stwcx”, a conditional store instruction, to fail. So the lwarx/stwcx combo allows you to implement atomic increment / decrement, compare and swap, and …
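To illustrate how an atomic increment can be built from a conditional-store style primitive, here is a minimal sketch in portable C++ rather than PowerPC assembly. It uses `compare_exchange_weak`, which is allowed to fail spuriously precisely so that it can map onto load-reserve / store-conditional pairs; the helper names `cas_fetch_add` and `demo_cas_fetch_add` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Atomically adds `delta` to `counter` with a compare-and-swap retry loop
// and returns the value the counter held just before the update.
int cas_fetch_add(std::atomic<int>& counter, int delta) {
    int observed = counter.load(std::memory_order_relaxed);
    // On failure (a real conflict or a spurious one) compare_exchange_weak
    // writes the current value into `observed`, so the next attempt
    // recomputes `observed + delta` from fresh data.
    while (!counter.compare_exchange_weak(observed, observed + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // Retry until the conditional store succeeds.
    }
    return observed;
}

// Small single-threaded usage example.
void demo_cas_fetch_add() {
    std::atomic<int> counter{5};
    int before = cas_fetch_add(counter, 3);
    assert(before == 5);
    assert(counter.load() == 8);
}
```

In practice `std::atomic<int>::fetch_add` already does this; the explicit loop is shown only to make the retry structure visible.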

rdtsc accuracy across CPU cores

X86_FEATURE_CONSTANT_TSC + X86_FEATURE_NONSTOP_TSC bits in CPUID (leaf 0x80000007, EDX bit 8; check the unsynchronized_tsc function of the Linux kernel for more checks). Intel’s Software Developer’s Manual, vol. 3B, section 16.11.1, says the following: “16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated …
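A rough sketch of reading that CPUID bit from user space, assuming GCC or Clang on x86, where the `<cpuid.h>` header provides `__get_cpuid`. The function names `invariant_tsc_bit_set`, `cpu_reports_invariant_tsc`, and `demo_bit_decoding` are hypothetical, and on other targets or toolchains the query simply reports false. This checks only the single bit mentioned above, not the additional conditions the kernel looks at.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header (assumed toolchain)
#endif

// Decodes bit 8 of EDX as returned by CPUID leaf 0x80000007.
constexpr bool invariant_tsc_bit_set(std::uint32_t edx) {
    return ((edx >> 8) & 1u) != 0;
}

// Returns true only if the CPU reports invariant TSC. Returns false on
// non-x86 builds or when the extended leaf is not available.
bool cpu_reports_invariant_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) {
        return false;  // leaf 0x80000007 is not supported by this CPU
    }
    return invariant_tsc_bit_set(edx);
#else
    return false;
#endif
}

// Checks the bit decoding on known inputs.
void demo_bit_decoding() {
    assert(invariant_tsc_bit_set(1u << 8));
    assert(!invariant_tsc_bit_set(0u));
}
```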

Does Java have support for multicore processors/parallel processing?

Does Java have support for multicore processors/parallel processing? Yes. It has also been a platform for other programming languages whose implementations added a “true multithreading” or “real threading” selling point. The G1 Garbage Collector introduced in newer releases also makes use of multi-core hardware. Java Concurrency in Practice: try to get a copy of …

GPGPU vs. Multicore?

Interesting question. I have researched this very problem, so my answer is based on some references and personal experience. What types of problems are better suited to regular multicore and what types are better suited to GPGPU? As @Jared mentioned, GPGPUs are built for very regular throughput workloads, e.g., graphics, dense matrix-matrix multiply, simple Photoshop …
