



Advanced C++ Optimizing and Assembly


Preface:
This article aims to provide a complete but brief approach to mixing C++ and assembly for greater code optimization. I also added an extra section that shows how you can execute two instructions at once (in the advanced section).

Basic Stuff:



This section is for newcomers to assembly programming. It covers simple optimization methods, nothing really advanced. You should still be familiar with at least some assembly.

I decided to go with the basics first since they are very easy to deal with. I also decided to provide a small assembly tutorial for those who are not too familiar with assembly and its concepts.
The first thing we need to learn about is registers.

EAX
EBX
ECX
EDX

These are four of the general-purpose registers; think of registers as variables inside your CPU. Using them is very simple. You will mostly be accessing these registers with inline assembly.

To do any kind of assembly programming in C++, you must learn about the 'asm' statement. In Visual C++ this statement is written like this

__asm

This is called inline assembly.

So you must always put the __asm statement before your assembly code.
Now, to play around with registers, let's make a simple function that adds 5 to a DWORD using registers.

Please note that a DWORD is a 32-bit variable. If you don't include windows.h in your program you can't use DWORD, but in Visual C++ a DWORD is just an unsigned long, so we can use that type directly. So let's get on with the code

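unsigned long function(unsigned long i)
{
    __asm mov eax, i    // load the argument into eax
    __asm add eax, 5    // add 5 to it
    __asm mov i, eax    // store the result back into i
    return i;
}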

Now this is just some simple code with no real speed gain; it is only here to show you how easy it is to mix C++ and assembly code.

C++ Protocols
C/C++, being the high-level language that it is, has strict protocols (calling conventions) that you have to follow if you don't want your C++ program to crash. Learning these protocols provides great opportunities for optimizing code. The first thing we need to learn is the proper use of the eax register, since this is the register C++ functions return values in.

Example:
return 4;

This simple statement, return 4;, compiles into

mov eax,4

As you can see, the eax register has a special use. To apply what we learned, let's create a simple function that returns 4 using nothing but assembly. Please notice that I am going to use the naked specifier in my code; to learn about naked functions, read my article about them (Naked function article).

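__declspec(naked) int return4()
{
    __asm mov eax, 4    // the return value goes in eax
    __asm ret           // a naked function has no epilogue, so return explicitly
}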

Very simple. If you did something like this
if(return4() == 4)

the condition would be true and the body of the if (here, exiting the program) would run, since eax holds the return value of functions.
Now that we know a little bit more about C++ protocols, we can move on to some more advanced optimization.

OPTIMIZE about time:


Most of your optimization with assembly code will be short and brief, nothing too tricky or complex. The first thing I decided to optimize was a byte swap function that turns little-endian byte order into big-endian byte order, which is very important in socket programming. The traditional way to do it is htons(), so I decided to write an optimized version.

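__declspec(naked) int fast_ton (unsigned short v)
{
    // Body reconstructed from the description below; assumes 32-bit cdecl, so v is at [esp+4].
    __asm xor eax, eax       // clear eax
    __asm mov ax, [esp+4]    // load v into the low 16 bits of eax
    __asm bswap eax          // reverse the four bytes of eax
    __asm shr eax, 16        // shift the swapped bytes back down
    __asm ret
}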

This function uses the x86 instruction bswap, which reverses the byte order of a 32-bit register. The first thing we do is clear eax with xor eax, eax, which makes eax zero. You might be wondering why not just mov eax, 0.
Well, xor eax, eax is the more optimized way, since it has a shorter encoding. Second, we mov the argument into ax, which is the lower 16 bits of eax.
Then we swap the bytes with bswap eax.
After all that we have one problem: eax is a 32-bit register, so the value of eax is too high.

Imagine this: let's say we call fast_ton(1);. When we swap it, the 1 ends up in the top byte of the 32-bit register, giving the value 16777216 (0x01000000) instead of the 256 that htons will return. The simple solution is to shift the value down by 16 bits, which we can accomplish with the shr (Shift Right) instruction, which shifts all the bits right.
After that we simply return with the ret instruction.

Of course this function can be optimized more than this, but for the sake of simplicity I decided to code it like this.
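The swap-and-shift arithmetic can be checked without any inline assembly. The following is only a portable sketch: bswap32 and fast_ton_portable are hypothetical names, not part of the original code, and bswap32 merely imitates what the bswap instruction does to eax.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the four bytes of a 32-bit value,
// imitating the effect of the bswap instruction on eax.
constexpr std::uint32_t bswap32(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Hypothetical portable counterpart of fast_ton:
// zero-extend to 32 bits, swap all four bytes, shift down by 16.
constexpr std::uint16_t fast_ton_portable(std::uint16_t v)
{
    return static_cast<std::uint16_t>(bswap32(v) >> 16);
}

// Compile-time checks of the walk-through for fast_ton(1).
static_assert(bswap32(1) == 0x01000000u, "after the swap, the 1 sits in the top byte");
static_assert(fast_ton_portable(1) == 256, "after the shift, the result is 256");
static_assert(fast_ton_portable(0x1234) == 0x3412, "the two bytes trade places");
```

Because both functions are constexpr, the static_assert lines verify the numbers when the file is compiled; calling fast_ton_portable(1) at run time gives the same 256.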

Some Basic Macros:
Macros are a great way to optimize code and make it very simple to reuse code. Also they make your code more portable. Let's write the classic variable swap

int var1;
int var2;
int temp;

temp = var1;
var1 = var2;
var2 = temp;

Not only is this very commonly used, it is also poor, when the x86 provides simple mechanisms for this with its xchg instruction and the stack. I decided to use the stack for simplicity's sake.

To exchange two values with the stack is simple

int var1 = 3;
int var2 = 5;

__asm push var1
__asm push var2

__asm pop var1
__asm pop var2

This can be a little faster. It works because the stack is FILO (first in, last out), meaning the first value you push onto the stack will be the last one out. Since we pushed var2 last, the first pop gets var2's value and stores it in var1; the second pop then stores var1's old value in var2.
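The same push/push/pop/pop order can be modelled in plain C++ to see why the values end up exchanged. This is only an illustration of the stack ordering, using std::stack in place of the CPU stack; stack_swap is a hypothetical name and the sketch makes no claim about speed.

```cpp
#include <stack>

// Hypothetical helper: swaps two values by pushing both and popping them
// back in the same order, mirroring
//   push var1 / push var2 / pop var1 / pop var2
inline void stack_swap(int& var1, int& var2)
{
    std::stack<int> s;
    s.push(var1);
    s.push(var2);      // var2's value is now on top
    var1 = s.top();    // first pop yields var2's old value
    s.pop();
    var2 = s.top();    // second pop yields var1's old value
    s.pop();
}
```

Starting from var1 = 3 and var2 = 5, a call to stack_swap(var1, var2) leaves var1 holding 5 and var2 holding 3.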

 
I created a sample project with two macros that can be used to save values on the stack and get them back; since they are macros, this code can be ported.

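#include <stdio.h>
#define m_save(reg) __asm push (reg)
#define m_get(reg) __asm pop (reg)
unsigned long value = 3;

int main(int argc, char *argv[])
{   // body reconstructed from the output shown below
    m_save(value)                     // push the current value (3)
    value = 66;
    printf("Value = %lu\n", value);
    m_get(value)                      // pop the saved 3 back into value
    printf("Value = %lu\n", value);
    return 0;
}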

Output:

Value = 66
Value = 3
Press any key to continue

You can use the two macros m_save and m_get in your own code; just copy and paste them. Although I didn't show you how to swap variables with them, I showed you how you can temporarily save a variable for later use.

Final Optimizations

This is the last section for the basic optimizations. It is almost like a reference, since I will go through several of the C++ optimization directives.

The first one is __fastcall. When a function is declared with it, the first two parameters of that function are passed in the registers ecx and edx instead of on the stack. Example

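__declspec(naked) int __fastcall superfast_ton(int v1)
{
    // Body is an assumed reconstruction: same swap as fast_ton, reading v1 from ecx.
    __asm mov eax, ecx    // __fastcall passes v1 in ecx, no stack access needed
    __asm bswap eax       // reverse the four bytes
    __asm shr eax, 16     // shift the swapped bytes back down
    __asm ret
}

Very nice, eh?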

The second optimization uses #pragma directives. You can use one to turn off stack probes. Example:

#pragma check_stack(off)

Third, it's a good idea to turn off runtime checks (most of the time)

#pragma runtime_checks( "s", off )

Next we should turn the optimizations back on; with an empty string, this pragma restores every optimization selected with the /O compiler options

#pragma optimize( "", on )

Pipe optimization:

Up until now we have been programming assembly code for newbies; now it's time for something advanced:

executing two instructions at once.
How it works is simple. The Pentium has two pipelines, the U pipe and the V pipe. Under certain circumstances (not all) it is possible to pair up two instructions and execute them at the same time.
Not all instructions can take part in this, and those that can pair do so only when certain conditions are met. The first thing is to learn the instructions that cannot be paired with each other.

Unpairable instructions
1. Shift or rotate instructions with the shift count in CL
2. Long arithmetic instructions, for example MUL, DIV
3. Extended instructions, for example RET, ENTER, PUSHA, MOVS, REP STOS
4. Some floating point instructions, for example FSCALE, FLDCW, FST
5. Inter-segment instructions, for example PUSH sreg, CALL far

I got these lists from somewhere, but anyway, these instructions cannot be executed at the same time as another instruction. So which ones can? That's simple: some instructions can be paired only in the U pipe or only in the V pipe, and some can be paired in both.

U/V Pipe Instructions
Pairable instructions issued to U or V pipes (UV)
1. Most 8/32 bit ALU operations, for example ADD, INC, XOR
2. All 8/32 bit compare instructions, for example CMP, TEST
3. All 8/32 bit stack operations using registers: PUSH reg, POP reg

These instructions can execute in both pipes, the U pipe and the V pipe.

U Pipe Instructions
Pairable instructions issued to U pipe (PU)
These instructions must be executed in the U pipe and can be paired with a
suitable instruction in the V pipe.
1. Carry instructions, for example ADC, SBB
2. Prefixed instructions (see later on)
3. Shift with immediate
4. Some floating point instructions, for example FADD, FMUL, FLD

These instructions can be executed only in the U pipe.

V Pipe Instructions
Pairable instructions issued to V pipe (PV)
These instructions can be executed in the U pipe or in the V pipe but they
will only be paired when executed in the V pipe.
1. Simple control transfer instructions, for example CALL near, JMP near, Jcc.
This includes both the Jcc short and Jcc near (0F prefixed)
2. The floating point instruction FXCH

These instructions can be paired only when they execute in the V pipe.

Here is a table containing all pairable and non-pairable instructions


Special notice: not all instructions can be paired. No pairing can be done when any of the following
conditions occur:

1. The next two instructions cannot be paired. (At the end of the doc you'll
find a pairing table) In general most arithmetic instructions can be paired.
2. The next two instructions have some register contention. In other words
they update/use the same registers (implicit or explicit).
3. The two instructions are not both in the instruction cache. An exception to
this is when the first instruction is a one byte instruction.
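The pairing classes and the first two conditions can be put together in a small model. This is a simplified sketch, not a description of the real Pentium decoder: the Instr type, the PairClass names and the functions are hypothetical, any shared register is treated as contention, and the instruction cache condition is not modelled at all.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Pairing classes as named in the lists above; NP means not pairable.
enum class PairClass { UV, PU, PV, NP };

// Hypothetical, simplified description of one instruction.
struct Instr {
    PairClass cls;
    std::vector<std::string> regs;   // registers the instruction reads or writes
};

// True when the two instructions touch a common register (assumed to be
// contention in every case, which is stricter than real hardware).
inline bool shares_register(const Instr& a, const Instr& b)
{
    for (const std::string& r : a.regs)
        if (std::find(b.regs.begin(), b.regs.end(), r) != b.regs.end())
            return true;
    return false;
}

// The first instruction would issue to the U pipe, the second to the V pipe.
inline bool can_pair(const Instr& first, const Instr& second)
{
    const bool u_ok = first.cls == PairClass::UV || first.cls == PairClass::PU;
    const bool v_ok = second.cls == PairClass::UV || second.cls == PairClass::PV;
    return u_ok && v_ok && !shares_register(first, second);
}
```

Under this model, add eax followed by inc ebx (both UV, different registers) pairs, while add eax followed by inc eax does not, and neither does anything placed next to a MUL.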

This is a short article written by Opcode Void / [email protected]; questions and comments are welcome.


