x86 – Tarik Billa

x86, difference between BYTE and BYTE PTR

April 10, 2024 by Tarik

Summary: NASM/YASM requires word [ecx] when the operand size isn’t implied by the other operand (otherwise [ecx] is OK). MASM/TASM requires word ptr [ecx] when the operand size isn’t implied by the other operand (otherwise [ecx] is OK). They each choke on the other’s syntax. WARNING: This is a very strange area without any ISO standards or easy-to-find … Read more

What is it about CMOV which improves CPU pipeline performance?

April 7, 2024 by Tarik

Could somebody please help me understand how CMOV improves branching? Well, it does NOT improve branching, it removes it. A CMOV could be seen as two instructions in one, a MOV and a NOP. Which one is executed depends on the flags. So internally it may look like if (cond) { mov dst, src } … Read more
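To make the "MOV or NOP, chosen by the flags" picture concrete, here is a minimal C++ sketch. The function names are hypothetical, and the compiler is free to emit either a branch or a CMOV for any of these, so treat it as an illustration of the data-flow idea rather than a guarantee about generated code.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form: control flow depends on cond, so the CPU has to predict it.
std::uint32_t select_branchy(bool cond, std::uint32_t dst, std::uint32_t src) {
    if (cond) {
        dst = src;  // the "MOV" case; otherwise nothing happens (the "NOP" case)
    }
    return dst;
}

// Branchless form: the result is computed from cond as data, with no jump
// in the source. mask is all ones when cond is true and zero otherwise.
std::uint32_t select_branchless(bool cond, std::uint32_t dst, std::uint32_t src) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (src & mask) | (dst & ~mask);
}
```

Both functions return the same value for every input; what differs is whether the choice is expressed as control flow or as data flow.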

What happens after an L2 TLB miss?

April 7, 2024 by Tarik

(Some of this is x86 and Intel-specific. Most of the key points apply to any CPU that does hardware page walks. I also discuss ISAs like MIPS that handle TLB misses with software.) Modern x86 microarchitectures have dedicated page-walk hardware. They can even speculatively do page-walks to load TLB entries before a TLB miss actually … Read more
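As a rough illustration of what a page walk has to do, the sketch below splits a virtual address into the table indices used at each level. It assumes x86-64 4-level paging with 4 KiB pages (a 12-bit page offset and four 9-bit indices); the struct and function names are hypothetical, and the real walker also has to read each table entry from memory or from the caches.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Indices for one walk. level[3] selects the top-level (PML4) entry and
// level[0] selects the final page-table entry.
struct WalkIndices {
    std::array<unsigned, 4> level;
    unsigned offset;  // byte offset inside the 4 KiB page
};

WalkIndices split_virtual_address(std::uint64_t va) {
    WalkIndices w{};
    w.offset = static_cast<unsigned>(va & 0xFFF);
    for (int i = 0; i < 4; ++i) {
        // Each level consumes the next 9 bits above the 12-bit offset.
        w.level[i] = static_cast<unsigned>((va >> (12 + 9 * i)) & 0x1FF);
    }
    return w;
}
```

Each of the four indices corresponds to one dependent memory access in the walk, which is why a miss that goes all the way to a page walk is expensive.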

What exactly is the base pointer and stack pointer? To what do they point?

January 17, 2024 by Tarik

esp is as you say it is, the top of the stack. ebp is usually set to esp at the start of the function. Function parameters and local variables are accessed by adding and subtracting, respectively, a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself … Read more

Why does std::tuple break small-size struct calling convention optimization in C++?

January 9, 2024 by Tarik

It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads: If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference. And, further: A type is considered non-trivial for the purposes of calls if it has … Read more
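One way to probe this on a given toolchain is with standard type traits. The sketch below is only an approximation: the ABI's notion of "non-trivial for the purposes of calls" is not identical to these traits, and the helper and struct names are hypothetical. Whether std::tuple passes the check depends on the standard library implementation, so no result is claimed for it here.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

struct PlainPair {
    int a;
    int b;
};

struct WithDestructor {
    int a;
    ~WithDestructor() {}  // user-provided, so the type is no longer trivially destructible
};

// Approximates the ABI rule with standard traits (an assumption, not the ABI's exact wording).
template <typename T>
constexpr bool looks_trivial_for_calls() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_move_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

// Implementation-dependent: inspect this value on your own standard library.
constexpr bool tuple_result = looks_trivial_for_calls<std::tuple<int, int>>();
```

A plain aggregate like PlainPair passes the check, while a type with a user-provided destructor does not.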

Crash with icc: can the compiler invent writes where none existed in the abstract machine?

January 9, 2024 by Tarik

Your program is well-formed and free of undefined behaviour, as far as I can tell. The C++ abstract machine never actually assigns to a const object. A not-taken if() is sufficient to “hide”/“protect” things that would be UB if they executed. The only thing an if(false) can’t save you from is an ill-formed program, e.g. … Read more

Do function pointers force an instruction pipeline to clear?

January 8, 2024 by Tarik

On some processors an indirect branch will always clear at least part of the pipeline, because it will always mispredict. This is especially the case for in-order processors. For example, I ran some timings on the processor we develop for, comparing the overhead of an inline function call, versus a direct function call, versus an … Read more

Relative performance of swap vs compare-and-swap locks on x86

January 7, 2024 by Tarik

I assume atomic_swap(lockaddr, 1) gets translated to an xchg reg,mem instruction and atomic_compare_and_swap(lockaddr, 0, val) gets translated to a cmpxchg[8b|16b]. Some Linux kernel developers think cmpxchg is faster, because the lock prefix isn’t implied as with xchg. So if you are on a uniprocessor, multithread or can otherwise make sure the lock isn’t needed, you … Read more
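For comparison, here are the two lock-acquire styles written with std::atomic. This is a minimal sketch with hypothetical class names. On x86, compilers typically lower exchange to xchg and compare_exchange_strong to lock cmpxchg, but that is a property of the toolchain, not something the code guarantees.

```cpp
#include <atomic>
#include <cassert>

// Swap-based lock: unconditionally writes 1 and inspects the old value.
class SwapLock {
public:
    bool try_lock() {
        return flag_.exchange(1, std::memory_order_acquire) == 0;
    }
    void unlock() { flag_.store(0, std::memory_order_release); }

private:
    std::atomic<int> flag_{0};
};

// Compare-and-swap lock: writes 1 only if the current value is 0.
class CasLock {
public:
    bool try_lock() {
        int expected = 0;
        return flag_.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }
    void unlock() { flag_.store(0, std::memory_order_release); }

private:
    std::atomic<int> flag_{0};
};
```

Both classes have the same observable behaviour; any performance difference comes from how the underlying instructions behave under contention.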

Why are separate icache and dcache needed?

January 7, 2024 by Tarik

The main reason is performance. Another reason is power consumption. Separate dCache and iCache make it possible to fetch instructions and data in parallel. Instructions and data have different access patterns. Writes to iCache are rare. CPU designers optimize the iCache and the CPU architecture based on the assumption that code changes are rare. … Read more

Printing out a number in assembly language?

January 6, 2024 by Tarik

Have you tried int 21h service 2? DL is the character to print.

    mov dl,'A' ; print 'A'
    mov ah,2
    int 21h

To print the integer value, you’ll have to write a loop to decompose the integer to individual characters. If you’re okay with printing the value in hex, this is pretty trivial. If you … Read more
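The decomposition loop itself is easier to see in a high-level language first. The C++ sketch below uses a hypothetical helper, to_digits, and shows the repeated divide-and-remainder idea; an assembly version would do the same with div and then print each character through the DOS service.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts value to text in the given base (2 to 16) by peeling off the
// least significant digit with % and / until nothing is left.
std::string to_digits(std::uint32_t value, unsigned base) {
    assert(base >= 2 && base <= 16);
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    do {
        // Digits come out lowest first, so each one goes at the front.
        out.insert(out.begin(), kDigits[value % base]);
        value /= base;
    } while (value != 0);
    return out;
}
```

With base 16 the division becomes a 4-bit shift and mask, which is why the hex case is the simpler one to write in assembly.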