Proposal for an MMX C-Interface
The objective of this proposal is to provide a high-level interface that gives programmers using lcc2 access to all of the new MMX instructions.
The MMX instruction set is accessible through intrinsic functions that are recognized and inlined by the compiler.
The data type used by all MMX intrinsics is an 8 byte union, described in 'mmx.h'. The interface is designed to work at maximum speed when vectors of this datatype are used. The internal loop necessary to apply the given operation to all elements of the data vectors is generated in-line. The dimensions of both arrays should be identical.
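As an illustration only, an 8 byte union of this kind could be declared as follows. The name `mmxdata_sketch` and its member names are hypothetical; the actual definition is the one in 'mmx.h' and may differ.

```c
#include <stdint.h>

/* Hypothetical layout, NOT the real definition from 'mmx.h'.
   Each member is a different view of the same 8 bytes. */
typedef union mmxdata_sketch {
    uint8_t  b[8];   /* eight packed bytes */
    uint16_t w[4];   /* four packed words */
    uint32_t d[2];   /* two packed doublewords */
    uint64_t q;      /* one quadword */
} mmxdata_sketch;
```

A vector of MMX data is then simply an array of such unions, and 'n' in the prototypes of this proposal counts array elements, not bytes.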
Scalar extension is provided, i.e. one of the inputs to an MMX intrinsic can be a scalar, which the compiler automatically extends so that the MMX operation is applied to all elements of the input vector.
Since the MMX instructions and the floating point instructions are incompatible, it is assumed that a function does not mix floating point and MMX code. An emms instruction will be issued in the function epilogue if the MMX instruction set is used.
The assembler interface is still available, and assembler instructions can be used directly. In this case, it is the programmer's responsibility to issue the 'emms' instruction.
Instructions vary by:
Data type: packed bytes, packed words, packed doublewords or quadwords
Signed - Unsigned numbers
Wraparound - Saturate arithmetic
Scalar/Vector data
A typical MMX instruction has this syntax:
Prefix:
'_' to indicate that this is a compiler reserved word.
'p' for Packed, as Intel suggests.[1]
Instruction operation: for example - ADD, CMP, or XOR
Suffix:
US for Unsigned Saturation
S for Signed saturation
B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword.
'i' for 'immediate' (scalar) data. If this suffix is not present, the function operates over two arrays.
The pack operation operates with words (packed to bytes) or with dwords (packed to words).
void _stdcall _packsswb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
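The loop can be modelled in portable C as in the sketch below. It assumes that the intrinsic maps directly to the PACKSSWB instruction, so that the four words of the array1 element become the low four bytes of the result and the four words of the array2 element become the high four bytes. The names `mmx_sketch`, `sat_s8` and `packsswb_model` are hypothetical and are not part of the proposed interface.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { int8_t b[8]; int16_t w[4]; } mmx_sketch;

/* Clamp a signed word to the signed byte range [-128, 127]. */
static int8_t sat_s8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Portable model of the loop described for _packsswb. */
static void packsswb_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    while (n-- > 0) {
        mmx_sketch r;
        for (int i = 0; i < 4; i++) {
            r.b[i]     = sat_s8(array1->w[i]);  /* low half: array1 words  */
            r.b[i + 4] = sat_s8(array2->w[i]);  /* high half: array2 words */
        }
        *array1 = r;   /* the result overwrites the array1 element */
        array1++;
        array2++;
    }
}
```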
void _stdcall _packsswbi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _packssdw(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _packssdwi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by 'n'.
Mode of operation:
while (n-- > 0)
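A portable sketch of the immediate variant follows. It assumes the same operand order as the PACKSSDW instruction (the doublewords of the array element go to the low two words, those of imm to the high two words) and that imm is reused unchanged for every element. `mmx_sketch`, `sat_s16` and `packssdwi_model` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { int16_t w[4]; int32_t d[2]; } mmx_sketch;

/* Clamp a signed doubleword to the signed word range [-32768, 32767]. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Portable model of the loop described for _packssdwi. */
static void packssdwi_model(mmx_sketch *array, const mmx_sketch *imm, int n)
{
    while (n-- > 0) {
        mmx_sketch r;
        r.w[0] = sat_s16(array->d[0]);  /* low half: array element */
        r.w[1] = sat_s16(array->d[1]);
        r.w[2] = sat_s16(imm->d[0]);    /* high half: the same imm each time */
        r.w[3] = sat_s16(imm->d[1]);
        *array++ = r;
    }
}
```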
void _stdcall _packuswb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be packed with the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _packuswbi(_mmxdata *array,_mmxdata *imm,int n);
Description
Each element of array will be packed with imm. The result is written to array. The number of elements of array is given by 'n'.
Mode of operation:
while (n-- > 0)
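The unsigned variants differ only in the saturation range: each signed word is clamped to the unsigned byte range 0..255. The sketch below models the two-array form, again assuming the operand order of the PACKUSWB instruction. `mmx_sketch`, `sat_u8` and `packuswb_model` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { uint8_t b[8]; int16_t w[4]; } mmx_sketch;

/* Clamp a signed word to the unsigned byte range [0, 255]. */
static uint8_t sat_u8(int16_t v)
{
    if (v > UINT8_MAX) return UINT8_MAX;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* Portable model of the loop described for _packuswb. */
static void packuswb_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    while (n-- > 0) {
        mmx_sketch r;
        for (int i = 0; i < 4; i++) {
            r.b[i]     = sat_u8(array1->w[i]);  /* low half: array1 words  */
            r.b[i + 4] = sat_u8(array2->w[i]);  /* high half: array2 words */
        }
        *array1++ = r;
        array2++;
    }
}
```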
Packed add byte
void _stdcall _paddb(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _paddbi(_mmxdata *array1,_mmxdata *imm,int n);
Description
imm will be added to each element of array1. The result is written to array1. The number of elements of array1 is given by 'n'.
Mode of operation:
while (n-- > 0)
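The following portable sketch models both forms of the wraparound byte add. Sums that do not fit in a byte wrap modulo 256, and the immediate form reuses the same 8 byte operand for every element, which is what scalar extension amounts to. `mmx_sketch`, `paddb_model` and `paddbi_model` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { uint8_t b[8]; } mmx_sketch;

/* Model of _paddb: element-wise, byte-wise add with wraparound. */
static void paddb_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < 8; i++)
            array1[k].b[i] = (uint8_t)(array1[k].b[i] + array2[k].b[i]);
}

/* Model of _paddbi: the same imm operand is added to every element. */
static void paddbi_model(mmx_sketch *array1, const mmx_sketch *imm, int n)
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < 8; i++)
            array1[k].b[i] = (uint8_t)(array1[k].b[i] + imm->b[i]);
}
```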
Packed add word
void _stdcall _paddw(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _paddwi(_mmxdata *array1,_mmxdata *imm,int n);
Description
imm will be added to each element of array1. The result is written to array1. The number of elements of array1 is given by 'n'.
Mode of operation:
while (n-- > 0)
Packed add double word
void _stdcall _paddd(_mmxdata *array1,_mmxdata *array2,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
Mode of operation:
while (n-- > 0)
void _stdcall _padddi(_mmxdata *array,_mmxdata *imm,int n);
Description
imm will be added to each element of array. The result is written to array. The number of elements of array is given by 'n'.
Mode of operation:
while (n-- > 0)
Packed add byte with saturation
a) Signed variants
void _stdcall _paddsb(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddsbi(_mmxdata *array1,_mmxdata *imm,int n);
b) Unsigned variant
void _stdcall _paddusb(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddusbi(_mmxdata *array1,_mmxdata *imm,int n);
Description
Each element of array1 will be added to the corresponding element of array2. The result is written to array1. The number of elements of both arrays is given by 'n'.
For the signed operation, the result of the add is saturated to 0x7F in case of overflow and to 0x80 in case of underflow.
For the unsigned operation, the saturation values are 0xFF and 0x00 in case of overflow/underflow.
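The difference between the two saturation modes can be sketched in portable C as follows. The sums are computed in a wider type and then clamped to the byte range before being stored. `mmx_sketch`, `paddsb_model` and `paddusb_model` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { int8_t s[8]; uint8_t u[8]; } mmx_sketch;

/* Model of _paddsb: signed add, clamped to [0x80, 0x7F] = [-128, 127]. */
static void paddsb_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < 8; i++) {
            int sum = array1[k].s[i] + array2[k].s[i];
            if (sum > INT8_MAX) sum = INT8_MAX;
            if (sum < INT8_MIN) sum = INT8_MIN;
            array1[k].s[i] = (int8_t)sum;
        }
    }
}

/* Model of _paddusb: unsigned add, clamped to 0xFF on overflow. */
static void paddusb_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < 8; i++) {
            unsigned sum = (unsigned)array1[k].u[i] + array2[k].u[i];
            if (sum > UINT8_MAX) sum = UINT8_MAX;
            array1[k].u[i] = (uint8_t)sum;
        }
    }
}
```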
Packed add word with saturation
void _stdcall _paddsw(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _paddswi(_mmxdata *array1,_mmxdata *imm,int n);
Description
Same operation as in paddsb above. The saturation values are 0x7FFF and 0x8000 for the signed operation, and 0xFFFF and 0x0000 for the unsigned operation.
void _stdcall _pand(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _pandi(_mmxdata *array1,_mmxdata *imm,int n);
The bitwise logical AND operation is performed between corresponding 64 bit elements of the two arrays. The result is written to array1.
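Since the AND works on whole 64 bit elements, a portable model only needs the quadword view of the data. The sketch below covers both the two-array form and the immediate form; `mmx_sketch`, `pand_model` and `pandi_model` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { uint64_t q; } mmx_sketch;

/* Model of _pand: 64 bit AND of corresponding elements. */
static void pand_model(mmx_sketch *array1, const mmx_sketch *array2, int n)
{
    for (int k = 0; k < n; k++)
        array1[k].q &= array2[k].q;
}

/* Model of _pandi: every element is ANDed with the same 64 bit mask. */
static void pandi_model(mmx_sketch *array1, const mmx_sketch *imm, int n)
{
    for (int k = 0; k < n; k++)
        array1[k].q &= imm->q;
}
```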
void _stdcall _pandn(_mmxdata *array1,_mmxdata *array2,int n);
void _stdcall _pandni(_mmxdata *array1,_mmxdata *imm,int n);
First, a bitwise logical NOT is performed on the 64 bits of each element, inverting each bit of the source operand (array2). Then the bitwise logical AND operation is performed between corresponding 64 bit elements of the two arrays. The result is written to array1.
void _stdcall _replicatebyte(_mmxdata *dst,unsigned char c);
void _stdcall _replicateword(_mmxdata *dst,unsigned short w);
void _stdcall _replicatedword(_mmxdata *dst,unsigned int i);
These instructions replicate either a byte, a word or a double word into the MMX data pointed to by the 'dst' argument. They are essentially meant for preparing comparison operands.
The first 64 bits of the first argument will be filled with the given integer, either as bytes, words, or double words.
Example:
_replicatebyte(&mmxdata,' ');
Then, mmxdata will contain 8 spaces, and can be later used as an argument for comparison functions.
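The effect of the three replicate functions can be modelled in portable C as below. This is a sketch of the described behaviour and not the code the compiler generates; `mmx_sketch` and the `*_model` function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { uint8_t b[8]; uint16_t w[4]; uint32_t d[2]; } mmx_sketch;

/* Fill all eight bytes with c. */
static void replicatebyte_model(mmx_sketch *dst, unsigned char c)
{
    memset(dst->b, c, sizeof dst->b);
}

/* Fill all four words with w. */
static void replicateword_model(mmx_sketch *dst, unsigned short w)
{
    for (int i = 0; i < 4; i++)
        dst->w[i] = (uint16_t)w;
}

/* Fill both doublewords with v. */
static void replicatedword_model(mmx_sketch *dst, unsigned int v)
{
    for (int i = 0; i < 2; i++)
        dst->d[i] = (uint32_t)v;
}
```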
int _stdcall _reduceBooleanb(_mmxdata *map,int n);
int _stdcall _reduceCmpeqb(_mmxdata *map,_mmxdata *imm,int n);
int _stdcall _reduceGtb(_mmxdata *map,_mmxdata *imm,int n);
int _stdcall _reduceLtb(_mmxdata *map,_mmxdata *imm,int n);
These instructions reduce a boolean vector by counting its non-zero members, and return a 32 bit integer with the result.
_reduceBooleanb sums all true bytes (11111111) in a logical vector that is the result of a previous comparison.
_reduceCmpeqb makes a comparison and then adds up the hits.
_reduceLtb and _reduceGtb test for less than or greater than, and add up the 'true' bytes.
'True' bytes are those set to all ones (11111111b, or 0xFF, or 255 decimal) by a previous MMX logical operation.
Example:
If the MMX data element 'space' contains a set of 8 space bytes (32), the following will count the number of spaces in the character vector 'data':
_reduceCmpeqb(data,&space,len/8);
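A portable model of this reduction is sketched below. It assumes that the returned value is the number of bytes in the vector that are equal to the corresponding byte of imm, which is how the description above reads. `mmx_sketch` and `reduce_cmpeqb_model` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8 byte union from 'mmx.h'. */
typedef union { uint8_t b[8]; } mmx_sketch;

/* Count the bytes of map[0..n-1] that equal the matching byte of imm. */
static int reduce_cmpeqb_model(const mmx_sketch *map, const mmx_sketch *imm, int n)
{
    int hits = 0;
    while (n-- > 0) {
        for (int i = 0; i < 8; i++)
            if (map->b[i] == imm->b[i])
                hits++;   /* one 'true' byte per match */
        map++;
    }
    return hits;
}
```

Note that 'n' is len/8 in the call above because each element holds 8 characters; a trailing partial element would have to be handled separately.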
Introduction
INSTRUCTION SYNTAX
Description of the interface
Pack with signed saturation
Pack with unsigned saturation
Packed Add
Packed Add with saturation
Packed And
Packed And Not