MOVing between two memory addresses
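Your suspicion is correct: MOV cannot take two memory operands, so you can’t move from memory to memory directly. Load the value into a register first, then store it to the destination; any general-purpose register will do. If you are not sure whether the register holds something you still need, PUSH it beforehand and POP it to restore it once done.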
“I’m referring to instructions like sizeof and cpblk – there’s no class or command that executes these instructions (sizeof in C# is computed at compile time, not at runtime AFAIK).” This is incorrect. sizeof(int) will be treated as the compile-time constant 4, of course, but there are plenty of situations (all in unsafe code) where …
The full name of leal, or lea, is “Load effective address”, and it does exactly that: an address calculation. In your example the address calculation is very simple, because it just adds an offset to ebx and stores the result in eax: eax = ebx + 0x10. lea can do a lot more. It can …
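As a rough illustration of the calculation described above, here is a minimal Python sketch that models a 32-bit lea. The function name `lea32` and its parameters are hypothetical, chosen only for this example. The optional index and scale terms follow the general x86 addressing form (base + index*scale + displacement), which is an addition here and not taken from the truncated excerpt. The key point is that nothing is read from memory: only the address arithmetic is performed.

```python
MASK32 = 0xFFFFFFFF  # results wrap around at 32 bits, like a 32-bit register


def lea32(base, disp=0, index=0, scale=1):
    """Model a 32-bit lea: compute base + index*scale + disp.

    Hypothetical helper for illustration; it performs no memory access,
    only the address arithmetic.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32


# The example from the text: eax = ebx + 0x10
ebx = 0x1000
eax = lea32(ebx, disp=0x10)
```

With `ebx = 0x1000`, `eax` ends up holding `0x1010`; the value stored at that address, if any, is never touched.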
Unless your 64-bit value can be encoded as a 32-bit sign-extended immediate, you have to move it to a register first and then store it. (Or do two separate 32-bit stores, or some other, worse workaround to get the bytes where you want them.) In NASM / Intel syntax, mov r64, 0x… picks a MOV encoding based on …
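To make the "32-bit sign-extended immediate" condition concrete, the following Python sketch checks whether a 64-bit bit pattern survives truncation to 32 bits followed by sign extension. The function name `fits_sign_extended_imm32` is hypothetical and only illustrates the rule; it is not how any particular assembler implements its check.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def fits_sign_extended_imm32(value):
    """Return True if the 64-bit pattern equals the sign extension of its low 32 bits.

    Hypothetical helper: models the encoding constraint, not an assembler API.
    """
    value &= MASK64                      # treat input as a 64-bit bit pattern
    low = value & 0xFFFFFFFF             # the 32 bits that would be encoded
    if low & 0x80000000:                 # bit 31 set: upper half becomes all ones
        extended = low | 0xFFFFFFFF00000000
    else:                                # bit 31 clear: upper half becomes zero
        extended = low
    return extended == value
```

For instance, `0x7FFFFFFF` and `0xFFFFFFFF80000000` pass the check, while `0x80000000` does not, because sign extension would turn it into `0xFFFFFFFF80000000`.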
I noticed in the comments that: The loop takes 5 cycles to execute. It’s “supposed” to take 4 cycles (since there are 4 adds and 4 multiplies). However, your assembly shows 5 SSE movssl instructions. According to Agner Fog’s tables all floating-point SSE move instructions are at least 1 inst/cycle reciprocal throughput for Nehalem. Since you …
PAUSE notifies the CPU that this is a spinlock wait loop so memory and cache accesses may be optimized. See also pause instruction in x86 for some more details about avoiding the memory-order mis-speculation when leaving the spin-loop. PAUSE may actually stop the CPU for some time to save power. Older CPUs decode it as REP …
Try using EMON profiling in VTune, or some equivalent tool like oprofile. Links: VTune for Linux (you can search for the Windows version), oprofile. EMON (Event Monitoring) profiling is like a time-based tool, but it can tell you what performance event is causing the problem. Although, you should start out with a time-based profile …
It stands for “End Branch 64 bit”, or more precisely, Terminate Indirect Branch in 64 bit. Here is the operation:
    IF EndbranchEnabled(CPL) & EFER.LMA = 1 & CS.L = 1
        IF CPL = 3
            THEN
                IA32_U_CET.TRACKER = IDLE
                IA32_U_CET.SUPPRESS = 0
            ELSE
                IA32_S_CET.TRACKER = IDLE
                IA32_S_CET.SUPPRESS = 0
        FI
    FI;
The instruction is otherwise …
There are many valid cases for code modification. Generating code at run time can be useful for: Some virtual machines use JIT compilation to improve performance. Generating specialized functions on the fly has long been common in computer graphics. See e.g. Rob Pike and Bart Locanthi and John Reiser Hardware Software Tradeoffs for Bitmap Graphics …
The meaning of test is to AND the arguments together, and check the result for zero. So this code tests if EAX is zero or not. je will jump if zero. BTW, this generates a smaller instruction than cmp eax, 0 which is the reason that compilers will generally do it this way.