Windows Platform Design Notes
Designing Hardware for the Microsoft Windows Family of Operating Systems
Microsoft DirectX VA: Video Acceleration API/DDI
Abstract: This document describes an Application Programming Interface (API) and a corresponding Device Driver Interface (DDI) for hardware acceleration of digital video decoding processing, with support of alpha blending for such purposes as DVD subpicture support. It provides an interface definition focused on support of MPEG-2 "main profile" video (formally ITU-T H.262 | ISO/IEC 13818-2), but is also intended to support other key video codecs (e.g., ITU-T Recommendations H.263 and H.261, and MPEG-1 and MPEG-4).
DirectX® VA Version - Oct. 2Jan , 200
Gary Sullivan ([email protected])
Disclaimer: The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Microsoft Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. The furnishing of this document does not give you any license to the patents, trademarks, copyrights, or other intellectual property rights except as expressly provided in any written license agreement from Microsoft Corporation.
Microsoft does not make any representation or warranty regarding specifications in this document or any product or item developed based on these specifications. Microsoft disclaims all express and implied warranties, including but not limited to the implied warranties of merchantability, fitness for a particular purpose and freedom from infringement. Without limiting the generality of the foregoing, Microsoft does not make any warranty of any kind that any item developed based on these specifications, or any portion of a specification, will not infringe any copyright, patent, trade secret or other intellectual property right of any person or entity in any country. It is your responsibility to seek licenses for such intellectual property rights where appropriate. Microsoft shall not be liable for any damages arising out of or in connection with the use of these specifications, including liability for lost profit, business interruption, or any other damages whatsoever. Some states do not allow the exclusion or limitation of liability or consequential or incidental damages; the above limitation may not apply to you.
DirectX, Microsoft, Win32, Windows, and Windows NT are trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.
© 2000 - 2001 Microsoft Corporation. All rights reserved.
Contents
Overview
Summary
Purpose and Standards Addressed
Requirements, Performance Verification, and Schedule
Uncompressed Picture Formats
Accelerator Sequentiality Requirements
Software Decoder Sequentiality Requirements
Using Four Uncompressed Surfaces for Decoding
Using Five or More Uncompressed Surfaces for Decoding
Terminology
Abbreviations and Symbols
Bit Numbering Convention
Definitions
Stages
Frame Buffer Organization
Operation for Current Standards
ITU-T H.261
MPEG-1
MPEG-2 (a.k.a. H.262)
ITU-T H.263
MPEG-4
Prediction Principles
Stages of Prediction
Macroblock portions
Prediction Planes
DirectX VA Data Structures and Program Constructs
Encryption Support
Global Restricted Mode Information
Probing and Locking of Configurations
Buffer Description List
Compressed Picture Decoding (bDXVA_Func=1)
Connection Configuration
Minimal Interoperability Configuration Set
Additional Encouraged Configuration Set
Compressed Picture Parameters
Buffer Structure for Macroblocks of a Picture
Macroblock Control Commands
Inverse Quantization, Pre-IDCT Saturation, Mismatch Control, Intra DC Offset, IDCT, Picture Reconstruction, and Reconstruction Clipping
Off-Host IDCT
Single-Coefficient Data Format
Four-Grouped Coefficient Data Format
Host-Based IDCT
The 16-bit Method
The 8-8 Overflow Method
Deblocking Filter Control
Read-Back Command Buffers
Off-Host VLD Bitstream Decoding Operation
Inverse Quantization Matrix Buffers
Slice Control Buffers
Bitstream Data Buffer Contents
Alpha Blend Data Loading (bDXVA_Func=2)
Connection Configuration
Minimal Interoperability Configuration Set
Loading an AYUV Alpha Blending Surface
Loading a 16-entry AYUV palette
Loading an AYUV surface
Loading an IA44/AI44 Alpha Blending Surface
Loading a DPXD Alpha Blending Surface
Loading Highlight Data
Loading DCCMD Data
Alpha Blend Combination (bDXVA_Func=3)
Connection Configuration
Minimal Interoperability Configuration Set
Alpha Blend Combination Buffers
Example: MPEG-2 Pan-Scan Operation
Example: DVD 4:3 Pan-Scan Within 16:9 Pictures
Example: DVD 704-Wide Non-Pan-Scan Picture Operation
Example: DVD 352-Wide Picture Operation
Example: DVD 720-Wide Picture Operation
Example: DVD 16:9 Letterbox Height in 4:3 Picture Operation
Picture Resampling Control (bDXVA_Func=4)
Connection Configuration
Minimal Interoperability Configuration Set
Picture Resampling Control Buffers
Restricted Modes
Addition of New Restricted Profiles
Non-Restricted Operation
H261_A Restricted Profile
H261_B Restricted Profile
H263_A Restricted Profile
H263_B Restricted Profile
H263_C Restricted Profile
H263_D Restricted Profile
H263_E Restricted Profile
H263_F Restricted Profile
MPEG1_A Restricted Profile
MPEG2_A Restricted Profile
MPEG2_B Restricted Profile
MPEG2_C Restricted Profile
MPEG2_D Restricted Profile
IAMVideoAccelerator Operation
The IAMVideoAccelerator Interface Itself
Mapping DirectX VA to IAMVideoAccelerator
Restricted Mode Profile and Configuration Establishment
DirectX VA IAMVideoAccelerator Operational Specification
Operational Correspondence with Motion Compensation Device Driver
This document describes an Application Programming Interface (API) and a corresponding Device Driver Interface (DDI) for acceleration of digital video decoding processing, with support of alpha blending for such purposes as DVD subpicture support. It provides an interface definition focused on support of MPEG-2 "main profile" video (formally ITU-T H.262 | ISO/IEC 13818-2), but is also intended to support other key video codecs (e.g., ITU-T Recommendations H.263 and H.261, and MPEG-1 and MPEG-4). This design is limited to decoding support, but encoding features may be defined as the interface is further extended in the future. The interface is designed to extract the most basic computationally-intensive building-blocks of these various codec designs and support their acceleration in hardware.
Graphics hardware drivers should implement support for this interface to provide a generic form of access to the acceleration capabilities of their hardware implementations. Somewhat similar vendor-specific capabilities have been defined in the past by several companies for use with their graphics hardware. However, the intent of this specification is to establish a common interface to provide cross-vendor compatibility between software application programs and advanced graphics acceleration capabilities. The establishment of a common interface is expected to increase the capability of computing systems to support video, increase the demand for software applications that provide this capability, and increase the demand for high-performance graphics capabilities.
The interface defined in this specification is capable of supporting today's dominant standard motion-compensated video codecs, including the H.26x series from the ITU-T and the MPEG-x series from ISO/IEC JTC1. It is primarily designed to support the currently most commercially dominant video codec, MPEG-2, more formally known as ITU-T Recommendation H.262 or ISO/IEC 13818-2.
The syntax specified herein is designed to provide very direct support of that standard with very low "translation" requirements for converting its syntax to the format defined herein. Additional features have been added as necessary to support the other standards in a maximally-consistent manner, and encryption is supported for applications which may require it. By extracting the core common basic operations of these various standards and confining some standard-specific processing to the host CPU, this interface enables support of video codec hardware acceleration with a high degree of cross-standard flexibility while requiring minimal standard-specific customization in the acceleration hardware.
This specification defines a language of video decoder acceleration which permits one or more stages to be divided among one or more devices. Currently, this document describes a division between a Host CPU and hardware Accelerator which executes the motion-compensated prediction (MCP) and/or inverse discrete-cosine transform (IDCT) stages of ITU-T H.261, MPEG-1 (ISO/IEC 11172-2), MPEG-2 (ISO/IEC 13818-2 | ITU-T Rec. H.262), ITU-T H.263, and a subset of MPEG-4 (ISO/IEC 14496-2). The overall goal is to offload simple (yet frequently-executed) operations wasteful of CPU resources onto a low-cost accelerator designed for this specific purpose, while allowing implementations to leave the complex, less frequently-executed operations such as bitstream parsing and variable-length decoding (VLD) on the host CPU.
The specification as described herein is for the decoding of a single video stream. No support is provided within the Microsoft® DirectX® VA API to explicitly support multiple video streams. Support of multiple video streams would therefore require a separate DirectX® VA session operation for each such video stream (e.g., a separate pair of output and input pins for the video decoder and acceleration driver for use in filter graph operation).
A verification of compliance with the contents of this specification will be required of any accelerator driver exposing the interface described herein. This will include:
o Verification that the operations carried out by hardware using this API will yield the intended mathematical results, and
o Verification that the performance obtained by a software decoder using this API will be of adequate speed and reliability to justify use of this API rather than host-software implementation of its functionality.
Until such verification is conducted, Microsoft considers the implementation to be strictly for testing purposes and not suitable for deployment with a driver that exposes the interface specified herein.
Support of this interface and performance verification for that support is expected to be required for certification of implementations for decoders and graphics accelerators in the intermediate-term future. The precise test suites and performance metrics are yet to be defined. The intermediate-term future requirements as specified herein are expected to be defined as requirements applicable in mid 2001 on all Microsoft Windows® operating systems (i.e., Windows 2000, Windows Millennium Edition, and Windows 2000 "Whistler") for any graphics acceleration hardware claiming support for hardware video decoding acceleration and for any software decoders claiming to use such graphics acceleration hardware support. Any longer-term future requirements as specified herein are expected to be defined as requirements applicable approximately one year after the intermediate-term requirements. The deadline dates of these expected requirements may be adjusted as events progress.
Anticipated encryption, configuration, and restricted mode support requirements are discussed below in the sections that describe the encryption support, configurations and restricted modes. Pixel format support requirements are discussed below in this section.
For applications to use the uncompressed decoded pictures produced through DirectX VA, these pictures must be produced in some known format. The list of uncompressed pixel formats supported by any DirectX VA accelerator shall contain at least one member of the following list of pixel formats [Reference: YUV Formats]:
For decoding compressed 4:2:2 video:
a. "YUY2", in which the data can be treated as an array of unsigned char in which the first byte contains the first sample of Y, the second byte contains the first sample of Cb, the third byte contains the second sample of Y, the fourth byte contains the first sample of Cr; etc. (such that if addressed as an array of two little-endian WORD type variables, the first WORD contains Y0 in the LSBs and Cb in the MSBs and the second WORD contains Y1 in the LSBs and Cr in the MSBs). This is the preferred DirectX VA 4:2:2 pixel format, and is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:2 video.
b. "UYVY", which is the same as "YUY2", except for swapping of the byte order in each word (such that if addressed as an array of two little-endian WORD type variables, the first WORD contains Cb in the LSBs and Y0 in the MSBs and the second WORD contains Cr in the LSBs and Y1 in the MSBs).
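The byte orders of the two 4:2:2 formats above can be sketched as follows. This is only an illustration of the layout as described in the text; the function names and the sample values are hypothetical and are not part of the DirectX VA interface.

```python
import struct

def pack_yuy2(y0, cb, y1, cr):
    """Pack one pair of 4:2:2 pixels in YUY2 byte order: Y0, Cb, Y1, Cr."""
    return bytes([y0, cb, y1, cr])

def pack_uyvy(y0, cb, y1, cr):
    """Pack the same pixel pair in UYVY byte order: Cb, Y0, Cr, Y1."""
    return bytes([cb, y0, cr, y1])

def as_le_words(four_bytes):
    """View four bytes as two little-endian 16-bit WORD values."""
    return struct.unpack("<2H", four_bytes)

# Hypothetical sample values chosen only to make each byte recognizable.
Y0, CB, Y1, CR = 0x10, 0x80, 0x20, 0x90

yuy2_words = as_le_words(pack_yuy2(Y0, CB, Y1, CR))
uyvy_words = as_le_words(pack_uyvy(Y0, CB, Y1, CR))

# YUY2: luma in the LSBs of each WORD, chroma in the MSBs.
print([hex(w) for w in yuy2_words])
# UYVY: chroma in the LSBs of each WORD, luma in the MSBs.
print([hex(w) for w in uyvy_words])
```

Reading the packed bytes back as little-endian WORD values makes the LSB/MSB placement described in items a and b directly visible.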
For decoding compressed 4:2:0 video:
a. "YUY2" as above, except that two lines of output Cb and Cr samples are produced for each actual line of 4:2:0 Cb and Cr samples, the second of each pair of such two lines being either a duplicate of the first or an average of the first and third. The ability to "front-end" convert from whichever format is in use to this format is expected to be an intermediate-term requirement.
b. "UYVY" as above, except that two lines of output Cb and Cr samples are produced for each actual line of 4:2:0 Cb and Cr samples, the second of each pair of such two lines being either a duplicate of the first or an average of the first and third.
c. "YV12", in which all Y samples are found first in memory as an array of unsigned char (possibly with a larger stride for memory alignment), followed immediately by all Cr samples (with half the stride of the Y lines, and half the number of lines), then followed immediately by all Cb samples in a similar fashion.
d. "IYUV", which is the same as YV12, except for swapping the order of the Cb and Cr planes.
e. "NV12" (not found in the referenced web page), in which all Y samples are found first in memory as an array of unsigned char with an even number of lines (possibly with a larger stride for memory alignment), followed immediately by an array of unsigned char containing interleaved Cb and Cr samples (such that if addressed as a little-endian WORD type, Cb would be in the LSBs and Cr would be in the MSBs) with the same total stride as the Y samples. This is the preferred 4:2:0 pixel format, and is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video.
f. "NV21" (a neologism not found on the referenced web page), as in NV12 except that Cb and Cr samples are swapped so that the chroma array of unsigned char would have Cr followed by Cb for each sample (such that if addressed as a little-endian WORD type, Cr would be in the LSBs and Cb would be in the MSBs).
g. "IMC1" (a neologism not found on the referenced web page), as in YV12, except that the stride of the Cb and Cr planes is the same as the stride in the Y plane. The Cb and Cr planes are also restricted to fall on memory boundaries which are a multiple of 16 lines (a restriction which has no effect on usage for the standard formats, since the standards all use 16×16 macroblocks).
h. "IMC2" (a neologism not found on the referenced web page), as in IMC1, except that Cb and Cr lines are interleaved at half-stride boundaries. In other words, each full-stride line in the chrominance area starts with a line of Cr, followed by a line of Cb that starts at the next half-stride boundary. (This is a more address-space efficient format than IMC1, cutting the chrominance address space in half, and thus cutting the total address space by 25%.) This runs a close second in preference relative to NV12, but NV12 appears to be more popular.
i. "IMC3" (a neologism not found on the referenced web page), as in IMC1, except for swapping Cb and Cr.
j. "IMC4" (a neologism not found on the referenced web page), as in IMC2, except for swapping Cb and Cr.
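The plane arrangements of several of the 4:2:0 formats listed above can be compared with a short calculation. The sketch below is an illustration only: the function names are hypothetical, the stride and height are assumed to be even, and the IMC2 ordering (Cr at the start of each chrominance line, Cb at the half-stride boundary) is an assumption made by analogy with IMC1 and YV12.

```python
def align16(lines):
    """Round a line count up to the next multiple of 16 lines."""
    return (lines + 15) // 16 * 16

def layout_420(fmt, stride, height):
    """Return byte offsets of each plane and the total size for one surface."""
    y_size = stride * height
    c_lines = height // 2  # 4:2:0 chroma has half as many lines as luma
    if fmt == "YV12":      # Y, then Cr, then Cb; chroma planes at half stride
        c_size = (stride // 2) * c_lines
        return {"Y": 0, "Cr": y_size, "Cb": y_size + c_size,
                "total": y_size + 2 * c_size}
    if fmt == "NV12":      # Y, then one interleaved CbCr array at full stride
        return {"Y": 0, "CbCr": y_size, "total": y_size + stride * c_lines}
    if fmt == "IMC1":      # as YV12 but full-stride chroma on 16-line boundaries
        cr = stride * align16(height)
        cb = cr + stride * align16(c_lines)
        return {"Y": 0, "Cr": cr, "Cb": cb, "total": cb + stride * c_lines}
    if fmt == "IMC2":      # Cr and Cb share each full-stride chroma line
        cr = stride * align16(height)
        return {"Y": 0, "Cr": cr, "Cb": cr + stride // 2,
                "total": cr + stride * c_lines}
    raise ValueError("layout not covered by this sketch: " + fmt)

# Hypothetical 720x480 surface with stride equal to the picture width.
sizes = {f: layout_420(f, 720, 480)["total"]
         for f in ("YV12", "NV12", "IMC1", "IMC2")}
print(sizes)
print("IMC2 / IMC1 address space:", sizes["IMC2"] / sizes["IMC1"])
```

For this surface size the IMC2 footprint is three quarters of the IMC1 footprint, which matches the 25% saving stated in item h.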
Our primary goal in specifying these formats is to document the formats in use, to enable use of the decompressed video for subsequent processing which would manipulate the video as DirectDraw surface textures. Our secondary goal is to simplify software operation by cutting down the proliferation of different formats over time - eliminating the use of different formats that unnecessarily duplicate the same functionality.
The primary burden for ensuring that race conditions do not cause undesirable behavior in the operation of the interface is placed on the host software decoder rather than the hardware accelerator. The only requirement imposed herein on the accelerator in this regard is to be able to properly report when queried whether the display of an uncompressed surface is pending or in progress and to be able to properly report when queried whether requested operations have been completed.
Certain requirements are imposed herein to provide an assurance of proper operation of sequential operations in the decoding process. These concern the handling of uncompressed surfaces for decoding and display. The examples provided herein concern the handling of picture decoding for conventional I-, B-, and P-structured video frames (without using a deblocking filter). The same principles apply in an obvious way to other scenarios.
The guiding principle is simple: Don't write over what you need for referencing or display, and avoid race conditions. This principle can be decomposed into two rules:
o No picture can be overwritten that has been submitted for display unless it has already been shown on the display and also removed from the display (to avoid tearing artifacts on the display), and
o No picture can be overwritten that is needed as a reference for the creation of other pictures that have not yet been created.
This results in a requirement that the software decoder must query the status of the accelerator to avoid race conditions, and also results in a requirement that the decoder must use a sufficient number of uncompressed picture surfaces to ensure that space is available for all necessary operations.
This results in a need for at least four uncompressed picture surfaces for I, B, P picture processing (more are generally encouraged, and more are necessary for some operations such as the use of front-end alpha blending).
The following two subsections describe the use of four or more uncompressed surfaces for video decoding with B pictures, concluding that at least four uncompressed surfaces are generally needed and five or more are recommended for applications without critical delay-minimization requirements. Using extra surfaces can greatly reduce the need to wait for operational dependencies to be resolved.
Note that for compressed buffers as well as uncompressed surfaces, it is generally better to cycle through using all of the available buffers that have been allocated rather than to keep re-using the same one or the same subset of the allocated buffers, possibly causing bottlenecks which require delays to be added to wait on unnecessary dependencies. In some cases the allocation of multiple buffers by a driver may be meant to indicate that cycling through use of these buffers for double or triple buffering is the proper way to operate without artifacts. (This applies to alpha blend data loading in particular.)
We show in Figure 1 a hypothetical case of a video decoder that requires one frame time to decode each picture and is decoding a bitstream which contains a steadily-increasing number of B pictures (starting from zero B pictures after an initial I picture) between pairs of P pictures. In this example, a letter is used to show the type of each picture (I, B, or P), a subscript is used to show the temporal display order of each picture, and a superscript is used to show which buffer is used for holding the picture. Each B picture requires two prior pictures in bitstream order in order to be decoded. As a consequence, the decoder cannot begin displaying pictures with their proper timing until after the second picture has been decoded - i.e., until during the third time slice of decoding. Somewhere during this time slice, the display of pictures with their proper timing can begin.
The initiation of the display of a picture does not in general perfectly coincide with that picture appearing on the display. Instead, the display may continue to show a picture that is prior to the one that has been commanded for display (until the proper time arrives to switch to the new picture). Thus, for optimal performance, surface 0 (which holds the first I picture) should not be overwritten for use by the B picture which arrives three frame times later, even though the I picture is not needed by that B picture for referencing. Instead, a fourth surface (surface 3) should ideally be used to hold that B picture. This avoids the need to check whether the display period of the first I picture has been completed before decoding the B picture.
Application of the two rules given above implies that the first three decoded pictures must be placed in different surfaces, because none of them has been displayed until some time during the third period (period 2). Then the fourth decoded picture should ideally be placed in a fourth surface because the display of the first displayed picture may not yet be over until some time during the fourth period (period 3).
A significant bottleneck in the process illustrated in Figure 1 occurs upon encountering the 10th decoded picture (B¹₉) as a result of having more than two B pictures in a row. When the third or subsequent B picture in a contiguous series is encountered, the time lag tolerance between the display of one B picture and the use of a surface to hold the next decoded B picture is eliminated. In this situation, the host decoder must check the display status of the B picture displayed in the previous period (in this case, B¹₇) to ensure that it has been removed from the display (waiting for this to happen if necessary), and then must immediately use the same surface for the next B picture to be decoded (in this case, surface 1 used for B¹₉). It cannot decode the new B picture into either of the surfaces being used to hold its reference I or P pictures (in this case, surfaces 0 and 2 used for P⁰₆ and P²₁₀), and cannot decode the new B picture into the surface being displayed during the same interval of time (in this case, surface 3 used for B³₈). So it must use the surface that was displayed in the immediately preceding period (in this case, surface 1).
[Figure 1 timeline: rows labeled Decoding Process, Display Process, and Frames Decoded]
Figure 1 - Four-Frame Decoder Buffering
(superscript is buffer number, subscript is frame display index)
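The surface selection described in this subsection can be reproduced with a small simulation. The sketch below is not part of the interface; it assumes the hypothetical timing used in the text (one picture decoded per period, display of display index d occurring during period d + 2), and all names in it are illustrative.

```python
def plan_surfaces(decode_order, num_surfaces=4, display_delay=2):
    """Assign a surface to each picture; return (surface, must_wait) per period.

    decode_order holds (picture_type, display_index) pairs in bitstream order.
    """
    shown_in = [None] * num_surfaces  # period in which each surface is displayed
    anchors = []                      # surfaces holding I/P pictures, oldest first
    plan = []
    for period, (ptype, display_index) in enumerate(decode_order):
        # Second rule: never overwrite a surface still needed as a reference.
        refs = set(anchors[-2:] if ptype == "B" else anchors[-1:])
        ideal, after_wait = [], []
        for s in range(num_surfaces):
            if s in refs:
                continue
            if shown_in[s] is None or shown_in[s] <= period - 2:
                ideal.append(s)       # display ended at least a full period ago
            elif shown_in[s] == period - 1:
                after_wait.append(s)  # first rule: usable only once off the display
        if ideal:
            surface, must_wait = ideal[0], False
        elif after_wait:
            surface, must_wait = after_wait[0], True
        else:
            raise RuntimeError("no surface available in period %d" % period)
        shown_in[surface] = display_index + display_delay
        if ptype != "B":
            anchors.append(surface)
        plan.append((surface, must_wait))
    return plan

# Bitstream order for the example: zero, one, two, then three B pictures.
STREAM = [("I", 0), ("P", 1), ("P", 3), ("B", 2), ("P", 6), ("B", 4),
          ("B", 5), ("P", 10), ("B", 7), ("B", 8), ("B", 9)]

for (ptype, index), (surface, wait) in zip(STREAM, plan_surfaces(STREAM, 4)):
    note = " (wait for display status)" if wait else ""
    print("%s%d -> surface %d%s" % (ptype, index, surface, note))
```

Under these assumptions, four surfaces force a display-status wait only for the third consecutive B picture, which lands in surface 1. Calling plan_surfaces(STREAM, 5) removes that wait.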
More than four buffers can be used - allowing the time lag between the start of the display of a buffer and new writes to that buffer to increase from a minimum of one display period to two or more. This can provide more of an allowance for jitter in the timing of the decoding process. This can also enable output processing on the decoded pictures to perform a three-field deinterlace operation as part of the display process (since not only the current picture would be available for display, but the prior picture would also be available and could be used to provide context and/or allow a one-field delay in the actual display process). It can also provide some immunity against badly-designed hardware drivers that may not be written with a proper awareness of whether a picture is actually still being displayed or not (although one should properly be able to depend on hardware drivers not to exhibit such behavior). Although four buffers is the minimum for effective use of DirectX VA with B pictures, using five or more buffers is encouraged - particularly in scenarios in which very low delay is not a requirement. DirectX VA decoders for I, B, P-structured video decoding are therefore expected to set their minimum and maximum requested uncompressed surface allocation counts to at least 4 and 5, respectively, when allocating uncompressed surfaces. Using one or more extra uncompressed surfaces may often be a good way to achieve smooth, reliable, tear-free operation.
Binary value (left bit is MSB)
" " Decimal value
0x Hexadecimal value
LSB Least Significant Bit
MSB Most Significant Bit
Division with rounding away from zero
Division with truncation toward zero
Bit numbering in this specification is done per Microsoft prevailing convention, which is to number the LSB as bit 0 and the MSB as bit N-1 for an N bit quantity.
NOTE: This numbering convention differs from the one used in some of the video coding standards, in which the lowest-numbered bit would be the MSB.
However, when a sequence of fields of less than 8 bits is shown in a structure in this document, the sequencing follows the convention used in the video coding standards, which is to say that elements listed first are considered to lie closer to the MSB. For example, if we have an array of five single-bit elements to place in sequence into a structure description, they will be listed in decreasing index order in the structure description.
The schizophrenia inherent in this approach means that the ultimate authority for exactly where each bit goes in this interface specification should be considered to be the ".h" file associated with this interface, not this document.
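The listing convention described above can be illustrated with a small packing routine. This is a sketch of the convention only, under the assumption that the listed fields fill one byte; the field names and values are hypothetical, and the ".h" file remains the authority for actual bit positions.

```python
def pack_listed_fields(fields):
    """Pack (value, width) pairs so that the first listed lies closest to the MSB."""
    word = 0
    total_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in its field width")
        word = (word << width) | value
        total_bits += width
    return word, total_bits

# Five hypothetical single-bit flags listed in decreasing index order,
# followed by three reserved bits that are set to zero.
listed = [
    (1, 1),  # flag[4], listed first, lands in bit 7 (the MSB)
    (0, 1),  # flag[3]
    (1, 1),  # flag[2]
    (1, 1),  # flag[1]
    (0, 1),  # flag[0]
    (0, 3),  # reserved bits, bits 2..0 (bit 0 is the LSB)
]
byte_value, width = pack_listed_fields(listed)
print(hex(byte_value), width)
```

With the LSB numbered as bit 0, the element listed first occupies the highest-numbered bit, so listing the flags in decreasing index order keeps each flag index aligned with relative bit significance.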
reference block - block area extracted from reference frame buffer.
prediction block - block filtered from reference block.
prediction plane - array of samples formed prior to combining of macroblock prediction. Each plane represents a set of prediction blocks, usually collected from one frame location. Planes are combined to form a single macroblock prediction.
prediction macroblock - macroblock prediction including all color channels.
component - one of three color channels.
Host CPU - programmable processor which controls overall function of video decode (high level operations).
Accelerator - functional unit which executes simple but high rate operations such as IDCT, MCP, display format conversion.
motion vector arithmetic - operations which convert motion vectors to prediction block addresses.
prediction address - location of the prediction block within an implementation-specific design.
composite prediction block - prediction block whose attributes apply to both luminance and chrominance.
component prediction block - prediction block whose attributes apply to either luminance or chrominance.
intra - representation of picture content without prediction using any previously-decoded picture as a reference.
inter - representation of picture content by first encoding a prediction of an area of the picture using some previously-decoded picture and then optionally adding a signal representing the deviation from that prediction.
residual difference decoding - decoding of the waveform which represents the error signal which has been encoded to represent whatever signal remains after motion-compensated prediction as appropriate. This may entail simply an "intra" representation of a non-predicted waveform or an "inter" difference after prediction.
ReservedBits - Any field in this specification having ReservedBits as its name or as part of its name is not presently used in this specification and shall have the value zero.
should - used to describe encouraged (but not required) behavior.
shall - used to describe required behavior.
The stages depicted in the figure below comprise the MCP and IDCT Accelerator.
Figure 2 -- Decoder Stages
All picture buffers are assumed to have frame-organized buffers as per the MPEG-2 specification. Address Units shall be given in frame coordinates unless otherwise specified. It is possible to losslessly convert prediction blocks described in frame coordinates to field coordinates via an implementation-specific translation layer. For example, a single frame motion prediction can be broken into two separate, top and bottom macroblock-portion predictions.
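The lossless relationship between frame and field coordinates can be sketched as follows. This is an illustration, not the implementation-specific translation layer itself; it assumes that frame line 0 belongs to the top field, and the function names are hypothetical.

```python
def frame_line_to_field(y):
    """Map a frame line number to (field parity, line number within that field)."""
    return ("top" if y % 2 == 0 else "bottom"), y // 2

def split_fields(frame_lines):
    """Split frame-organized lines into top-field and bottom-field lines."""
    return frame_lines[0::2], frame_lines[1::2]

def merge_fields(top, bottom):
    """Re-interleave top and bottom field lines into frame order."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame

# A 16-line frame prediction, represented here only by its line numbers.
frame_block = list(range(16))
top_part, bottom_part = split_fields(frame_block)
print(top_part)     # eight lines taken from the top field
print(bottom_part)  # eight lines taken from the bottom field
print(merge_fields(top_part, bottom_part) == frame_block)
```

A 16-line frame prediction thus becomes two 8-line portions, one per field, and re-interleaving the portions restores the original lines exactly.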
The three video component channels are accessed by interfaces defined in this specification. Motion vectors for the two chrominance components are derived from those sent for luminance components. Conversion of these motion vectors to different coordinate systems is a responsibility of the Accelerator.
Figure 3 -- Host and Accelerator System
The basic operations of H.261, MPEG-1, MPEG-2 (H.262), H.263, and MPEG-4 are listed in chronological order below with a brief description of how they can be realized using this specification.
Formally titled "Video Codec for Audiovisual Services at px64 kbit/s," ITU-T Recommendation H.261 was the first digital video coding standard to deliver a practical design with relatively high compression (the first digital video coding standard, H.120, is largely forgotten today), and it contains the same basic design to be later used in the other video codec standards, using 8-bit samples with Y, Cb, and Cr components, 4:2:0 sampling, 16×16 "macroblock"-based motion compensation, 8×8 IDCT, zig-zag inverse scanning of coefficients, scalar quantization, and variable-length coding of coefficients based on a combination of zero-valued run-lengths and quantization index values. All H.261 prediction blocks use forward-only prediction from the previous picture. H.261 does not have half-sample accurate prediction filters, but instead uses a type of low-pass filter called the "loop filter" (Section 3.2.3 of the H.261 specification) which can be turned off or on during motion compensation prediction for each macroblock.
Annex D Graphics: H.261 later included an interesting trick mode called "Annex D Graphic Transfer." No special support is provided for this feature in this accelerator interface. This feature can be supported using this specification by reading four decoded pictures from the accelerator back onto the host and interleaving them there for display as a higher-resolution graphic picture.
MPEG-1 video, formally ISO/IEC 11172-2, was developed not long after H.261 and borrowed significantly from it. It has no loop filter, and instead has a simple half-sample filter which attempts to resolve sub-pixel movement between frames. Two additional prediction modes, bi-directional and backward prediction, were added, which require one additional reference frame to be buffered. The bi-directional prediction averages forward- and backward-predicted prediction blocks. The arithmetic for averaging forward and backward prediction blocks is similar to that for creating a half-sampled "interpolated" prediction block. The basic structure is otherwise the same as H.261.
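The similarity between the two averaging operations can be seen in a short sketch. It assumes integer averaging with half values rounded upward for non-negative samples, and it ignores picture-boundary handling; the function names and sample values are hypothetical, and the governing arithmetic is defined by the standards themselves.

```python
def avg2(a, b):
    """Average two non-negative samples, rounding half values upward."""
    return (a + b + 1) >> 1

def half_sample_predict(ref, x2, y2, width, height):
    """Form a prediction block at position (x2, y2) given in half-sample units."""
    x, y = x2 >> 1, y2 >> 1
    half_x, half_y = x2 & 1, y2 & 1
    block = []
    for j in range(height):
        row = []
        for i in range(width):
            a = ref[y + j][x + i]
            b = ref[y + j][x + i + half_x]
            c = ref[y + j + half_y][x + i]
            d = ref[y + j + half_y][x + i + half_x]
            if half_x and half_y:
                row.append((a + b + c + d + 2) >> 2)  # four-sample average
            elif half_x:
                row.append(avg2(a, b))                # horizontal half sample
            elif half_y:
                row.append(avg2(a, c))                # vertical half sample
            else:
                row.append(a)                         # full-sample position
        block.append(row)
    return block

def bidirectional(forward_block, backward_block):
    """Average forward and backward prediction blocks sample by sample."""
    return [[avg2(f, b) for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_block, backward_block)]

reference = [[0, 1, 2], [10, 11, 13], [20, 22, 24]]
print(half_sample_predict(reference, 1, 0, 2, 1))
print(half_sample_predict(reference, 1, 1, 1, 1))
print(bidirectional([[10, 20]], [[11, 23]]))
```

Both the horizontal or vertical half-sample case and the bi-directional case reduce to the same two-input rounded average, which is why one averaging unit can serve both purposes.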
MPEG-2, formally titled "Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ITU-T Recommendation H.262 | ISO/IEC 13818-2, added only a basic 16×8 shape to the existing tools of MPEG-1 (from a very low layer perspective). From a slightly higher layer perspective, MPEG-2 added many additional ways to combine predictions referenced from multiple fields in an attempt to deal with interlaced video characteristics.
MPEG-2 frame structured pictures:
1. Frame MC: Has a 16×16 prediction block shape, much the same as MPEG-1 predictions. This is the only progressive-style motion within MPEG-2 (as specified by the motion_type parameter in MPEG-2). There is either one prediction plane (forward-only or backward-only) or two (bi-directional), as determined by the macroblock_type MPEG-2 parameter. Reference blocks are formed from contiguous frame lines from the frame buffer. The frame buffer is selected by the semantics of the decoding process. Half-sample interpolation (MPEG-2 Section 7.6.4) and bi-directional interpolation (MPEG-2 Section 7.6.7.1) use the same averaging operations as in the MPEG-1 case.
2. Field (16×8) MC: Each plane (forward and backward directions) consists of a top 16×8 prediction block and a bottom 16×8 prediction block. The reference block corresponding to each prediction block may be extracted from the top field or bottom field of a reference frame, as determined by the MPEG-2 parameter motion_vertical_field_select[r][s]. There is either one set of two prediction blocks for one prediction plane (forward-only or backward-only, for which there would be a total of two prediction blocks per macroblock), or sets for two prediction planes (bi-directional, for which there would then be a total of four prediction blocks in a macroblock).
3. Dual Prime: Like the Field MC case above, each plane (parity) consists of a top and bottom 16×8 shape. The same and opposite parity planes are combined via an averaging operation identical to the bi-directional interpolation in (1, 2, 4, 5). Unlike the other motion types (1, 2, 4, 5), a Dual Prime macroblock always consists of two sets of prediction blocks (same and opposite parity), for a total of four prediction blocks per macroblock.
MPEG-2 field structured pictures:
4. Field (16×16) MC: Resembles Frame MC in that each prediction has a 16×16 shape; however, the reference block data is formed from sequential top or bottom lines only, not a mixture of alternating top and bottom lines (as in progressive motion). As with all motion types in field-structured pictures, the reconstructed macroblock is likewise stored in the current frame buffer as only sequential top or bottom lines. Top or bottom field destination is determined by the MPEG-2 variable picture_structure.
5. 16×8 MC: Although the basic prediction block shapes of this type of motion are the same as for the other 16×8 shapes (2, 3), the macroblock is not partitioned in the same manner. The two partitions correspond to upper and lower halves of a macroblock prediction plane, rather than top and bottom fields within a macroblock. (See Figure X). Particular attention should be paid to the fact that the anchor point for the lower 16×8 half is the upper left hand corner of the 16×8 lower portion, not the upper left hand corner of the whole macroblock as is the case with all other types of motion.
6. Dual Prime: At the lowest layer there is virtually no distinction between Dual Prime and the field structured field motion compensation with bi-directional prediction. The differences are manifested in the frame buffer selections from which reference blocks are formed. Dual Prime motion type in field structured pictures always consists of two 16×16 prediction blocks (same and opposite parity predictions).
Formally titled "Video Coding for Low Bit Rate Communication," ITU-T Recommendation H.263 is a more recent video coder with improved compression performance relative to H.261, MPEG-1, and MPEG-2. It contains a "baseline" mode of operation supporting only the most basic form of H.263. It also contains a large number of optional enhanced modes of operation for various purposes. (It originally had five optional modes in the first version approved in late 1995, but more were added in a second version technically completed in 1997, and a few more are being finalized in 2000.) Baseline H.263 prediction operates in this interface using a subset of the MPEG-1 features. The baseline mode contains no bi-directional prediction - only forward prediction.
Rounding Control: Several H.263 optional modes require rounding control. This is supported by the bRcontrol flag in this specification.
Motion vectors over picture boundaries: Several H.263 optional modes allow motion vectors that address locations outside the boundaries of a picture as defined in H.263 Annex D. This is supported in this specification by a bPicExtrapolation flag indicating whether the accelerator needs to support such motion. There are two basic ways of supporting such motion: 1) clipping the value of the address on each sample fetch to ensure that it stays within picture boundaries, and 2) padding the picture by duplicated samples to widen the actual memory area used by one macroblock width and height across each border of the picture. The accelerator can use either method, as the result is the same.
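The first method above (clipping the address on each sample fetch) can be sketched as follows. This is a minimal illustration, not part of the interface: the function names and the plane layout (8-bit samples, row-major with a stride) are assumptions made for the example.

```c
#include <stdint.h>

/* Clamp v into the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Fetch one sample from a picture plane. Addresses outside the picture
 * are clipped to the nearest border sample, which gives the same result
 * as padding the picture with duplicated edge samples.
 * (Hypothetical helper; plane layout is an assumption.) */
static uint8_t fetch_sample_clamped(const uint8_t *plane, int width, int height,
                                    int stride, int x, int y)
{
    int cx = clamp_int(x, 0, width - 1);
    int cy = clamp_int(y, 0, height - 1);
    return plane[cy * stride + cx];
}
```

Because every out-of-range address maps to a border sample, a motion vector may point arbitrarily far outside the picture without any extra memory being allocated, at the cost of a clamp on each fetch.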
Bi-directional motion prediction: The bi-directional motion prediction used in some optional H.263 prediction operations uses a different rounding operator than MPEG-1 - using division with truncation rather than rounding. This is supported by the bBidirectionalAveragingMode flag in this specification.
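The difference between the two averaging operators can be illustrated with a short sketch. The function names and the `truncate` parameter are hypothetical stand-ins for the behavior selected by bBidirectionalAveragingMode; the mapping of flag values to behaviors is not stated in this passage and is not assumed here.

```c
#include <stdint.h>

/* Average with rounding: a result of x.5 is rounded upward
 * (the MPEG-1 style operator). */
static uint8_t avg_round(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Average with truncation: a result of x.5 is rounded downward
 * (the operator used by some optional H.263 prediction operations). */
static uint8_t avg_trunc(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Combine forward and backward prediction blocks of n samples each.
 * 'truncate' is a hypothetical selector, nonzero for truncation. */
static void average_blocks(uint8_t *dst, const uint8_t *fwd,
                           const uint8_t *bwd, int n, int truncate)
{
    for (int i = 0; i < n; ++i)
        dst[i] = truncate ? avg_trunc(fwd[i], bwd[i])
                          : avg_round(fwd[i], bwd[i]);
}
```

The two operators differ only when the sum of the two samples is odd, in which case the results differ by exactly one.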
Four-MV motion compensation (4MV): Although each macroblock in H.263 is 16×16 in size, some optional modes (e.g., Annexes F and J) of H.263 allow four motion vectors to be sent for a single macroblock, with one motion vector for each 8×8 luminance block within the macroblock. The corresponding 8×8 chrominance area uses a single derived motion vector.
Overlapped Block Motion Compensation (OBMC): H.263 Annex F contains Overlapped Block Motion Compensation (OBMC) for luminance samples in addition to 4MV support. OBMC prediction is supported in this specification by allowing ten motion vectors to be sent for forward prediction of a macroblock.
OBMC prediction blocks can be realized in hardware with the tools given in this specification as a combination of predictions organized into three planes: a current plane ("0"), an upper/lower plane ("1"), and a left/right plane ("2"). The three planes can serve as temporary storage for the blocks q(x,y), r(x,y), and s(x,y) defined in H.263 Section F.3. After each of the three planes has been filled out for all four blocks, the planes would then be combined according to the formula in H.263 Section F.3 and weighted by their respective H matrices given in H.263 Figures F.2, F.3, and F.4.
As an example, an OBMC luminance macroblock prediction may be comprised of eight top/bottom prediction blocks of 8×4 shape, eight left/right blocks of 4x8 shape, and four current blocks of 8×8 shape. If all four of the plane 0 motion vectors have the same motion vector (i.e., when not in a 4MV macroblock), a single 16×16 macroblock prediction can be used to fill the entire 16×16 plane 0.
In terms of the representation of the OBMC process in this specification, ten motion vectors are sent for the macroblock. The first four are for the Y0, Y1, Y2, and Y3 blocks in the current macroblock. Remote vectors for the left and right halves of the top of the macroblock are sent next, then remote vectors for the top and bottom halves of the left side of the macroblock, and finally the remote vectors for the top and bottom halves of the right side of the macroblock. (H.263 does not use distinct remote vectors for the left and right halves of the bottom of the macroblock - it instead reuses the vectors for the current macroblock.)
Figure 4: H.263 Registration of One 8×8 Block in the OBMC Prediction Planes
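The ordering of the ten OBMC motion vectors described above can be written down as an index enumeration. This is only an illustrative sketch: the enumerator names, the vector type, and the container structure are hypothetical and are not defined by this specification.

```c
#include <stdint.h>

/* Index of each motion vector in the order the text describes. */
enum obmc_mv_index {
    OBMC_MV_Y0 = 0,            /* current macroblock, block Y0 */
    OBMC_MV_Y1,                /* current macroblock, block Y1 */
    OBMC_MV_Y2,                /* current macroblock, block Y2 */
    OBMC_MV_Y3,                /* current macroblock, block Y3 */
    OBMC_MV_TOP_LEFT_HALF,     /* remote: left half of the top */
    OBMC_MV_TOP_RIGHT_HALF,    /* remote: right half of the top */
    OBMC_MV_LEFT_TOP_HALF,     /* remote: top half of the left side */
    OBMC_MV_LEFT_BOTTOM_HALF,  /* remote: bottom half of the left side */
    OBMC_MV_RIGHT_TOP_HALF,    /* remote: top half of the right side */
    OBMC_MV_RIGHT_BOTTOM_HALF, /* remote: bottom half of the right side */
    OBMC_MV_COUNT              /* number of vectors sent per macroblock */
};

/* Hypothetical motion vector type for the sketch. */
struct mv_sketch {
    int16_t horizontal;
    int16_t vertical;
};

/* One macroblock's worth of OBMC vectors, indexed by obmc_mv_index. */
struct obmc_vectors_sketch {
    struct mv_sketch v[OBMC_MV_COUNT];
};
```

No entries exist for the bottom of the macroblock because, as noted above, H.263 reuses the current macroblock's vectors there.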
P-B frames (Annex G & M): In this mode, macroblocks for a P frame and a pseudo B-frame are multiplexed together into the unique "PB-frame" picture coding type. The B-frame portion of each macroblock borrows from information encoded for the P-frame portion of the macroblock: the B-portion forward and backward motion vectors are scaled from the P-portion vector, and the reconstructed P-portion macroblock serves as backward reference for the B-portion. The PB includes only a pseudo B-frame since the backward macroblock can only refer to the reconstructed P macroblock contained within the same PB macroblock. However, as with traditional B-frame semantics, the B macroblock within a PB-frame can refer to any location within the forward reference frame.
This limitation of the backward reference creates smaller size backward prediction block shapes as depicted in H.263 Figure G.2. This specification supports PB frames by labeling them as a distinct macroblock type with two motion vectors.
Deblocking Filter (Annex J): Special commands are defined in this specification to accelerate deblocking filters, whether used in the loop as with Annex J or used outside of the loop as would be the case for deblocking H.261 pictures or H.263 baseline pictures. The Host CPU shall create deblocking commands which observe GOB or slice segments if necessary.
Reference Picture Selection (Annexes N and U): Multiple reference frames are supported by the Accelerator using the picture index selection field of each prediction block.
Scalability (Annex O): Temporal, SNR, and Spatial scalability are supported by the tools of this specification. H.263 Spatial scalability B frames are very similar in this interface to MPEG-1 B-frames. Spatial scalability requires upsampling the lower-layer reference picture and then using the upsampled picture as the reference (in all other aspects it is essentially the same as SNR and Temporal scalability). The appropriate bi-directional averaging rounding control should be set to truncation for H.263 (MPEG-1 and MPEG-2 use upward-biased averaging, and H.263 uses downward truncation averaging).
Reference Picture Resampling (Annex P): The simple form of this annex is supported by reference buffer resampling. For fancier forms of Annex P resampling, the reconstructed frames which serve as reference frames must be resampled by external means and stored as reference frame buffers addressable by the Accelerator.
Reduced-Resolution Update mode (Annex Q): The H.263 Reduced-Resolution Update mode is not currently supported by this specification, as it has unusual residual upsampling requirements, a different form of deblocking filter, and a different form of Advanced Prediction OBMC. However, its operation with both the Deblocking Filter mode and the Advanced Prediction mode inactive could perhaps be supported in this interface with host-based IDCT processing.
Independent Segment Decoding (Annex R): There is no low-level or accelerator awareness of independent segment borders. Some forms of Annex R can be supported without any special handling (e.g., baseline plus Annex R is trivial). Forms of Annex R which require picture segment extrapolation can be supported by decoding each segment as a picture and then constructing the complete output picture from these smaller pictures.
Other H.263 Optional Features: Other H.263 optional features can be supported without any impact on the accelerator interface. For example, Annexes I, K, S, and T can be handled easily by altering the host side of the processing without any impact on the accelerator.
Specific IDCT: This interface allows specification of support of a specific IDCT which is expected to be added as a new H.263 optional feature in the near future.
MPEG-4 was based heavily on H.263 for progressive-scan coding and MPEG-2 for support of interlace and color sampling formats other than 4:2:0. The features in this specification which support H.263 and MPEG-2 can be used to support MPEG-4. A parameter BPP is provided to support more than eight bits per pixel. The features most unique to MPEG-4 such as shape coding, object orientation, face modeling, mesh objects, sprites, etc. are not supported in the current interface, although they may be added in some future variation on the theme.
Block motion-compensated prediction (MCP) is the basic tool which gives MPEG and the H.26x family of codecs a gain over pure still-frame coding methods such as JPEG. Generic instances of prediction exist at many stages of the codec, but MCP is the most processing intensive. Motion vectors, DCT coefficients, and other elements not directly part of the MCP process also employ prediction to make the transmitted form of those elements more compact. These instances of prediction are not covered by this specification, and are considered to be executed on the Host CPU processor or bitstream parser/variable-length-decoding unit.
In a generic prediction coding scheme, previously transmitted and decoded elements serve as the prediction for current elements, and differences between the prediction and the actual current element values are sent as prediction error. The transmitted difference information updates the prediction to the correct value, or in the case of MCP, to a value which is a close enough approximation to a desired value. Previously decoded frames are used to predict or "guess" what future frames should look like. The difference data (known as prediction error) then corrects the guess, attempting to bring the combined prediction + prediction error (reconstructed) image as close to the original (prior to compression) future frame as possible.
The reconstructed current elements are in turn stored to serve as prediction for future elements. This recursive loop is occasionally broken by various types of resets specific to the element being predicted. The resets are described by the semantics of the decoding process. For example, motion vectors and DC coefficient predictions are reset at slices, while the whole temporal frame prediction chain is reset by an intra refresh frame. The temporal basic prediction loop fits this description, but works with modified data taken from whole frames which serve as prediction. Data from the reference frames which serve as prediction are modified in order to account for movement or other types of change which occur over time.
Figure 5 -- Generic MCP Signal Flow
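The reconstruction step of the prediction loop described above (prediction plus prediction error) can be sketched as follows. The function names are hypothetical, and the saturation of the result to the 8-bit sample range is an assumption made for 8-bit video; it is not stated in the surrounding text.

```c
#include <stdint.h>

/* Saturate an integer to the 8-bit sample range (assumed 0..255). */
static uint8_t clip_u8(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Reconstruct n samples: each reconstructed sample is the prediction
 * corrected by the transmitted prediction error. */
static void reconstruct_block(uint8_t *recon, const uint8_t *pred,
                              const int16_t *err, int n)
{
    for (int i = 0; i < n; ++i)
        recon[i] = clip_u8(pred[i] + err[i]);
}
```

The reconstructed samples would then be stored so that they can serve as the prediction reference for later pictures, which closes the recursive loop.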
For purposes of this specification, the formation of a macroblock prediction (via MCP) shall be described as a series of discrete phases. This realization will help explain why this specification models the MCP process in the way that it does.
Figure 6 -- Signal Flow of MoComp3 Prediction Blocks
Stage 1. Form reference frame.
NOTE: This may include resampling the input picture as necessary.
Figure 7 -- Evolution of Prediction Data within MCP
Stage 2. Reference block
A reference block is not necessarily the same as a prediction block. It most likely consists of extra samples which are needed in the prediction filtering stages. The reference block is not defined in this specification since it is likely to have properties which reflect implementation-specific means of maintaining picture buffers. Unless half-sample filtering is executed in the memory unit, the reference block for a 16×16 half-sample filtered macroblock will have a 17x17 shape. The size of the reference block is both a function of the prediction block dimensions and filter attributes of the prediction block. In this specification, a reference block shall refer to a block of data extracted from a reference frame buffer for use in motion compensated prediction (MCP).
Stage 3. Prediction block
The outcome of a filtered reference block.
Stage 4. Combined macroblock prediction
The outcome of an averaging process between one or more prediction blocks.
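Stages 2 and 3 can be sketched as follows, assuming a two-tap (bilinear) half-sample filter with upward rounding of the kind used by MPEG-1 and MPEG-2. The function and type names are hypothetical. The sketch shows why a 16×16 half-sample filtered prediction block needs a 17×17 reference block.

```c
#include <stdint.h>

struct block_dims {
    int width;
    int height;
};

/* Size of the reference block needed for a prediction block, given
 * whether the vector has a half-sample part in each direction. */
static struct block_dims reference_block_dims(int pred_w, int pred_h,
                                              int half_x, int half_y)
{
    struct block_dims d;
    d.width = pred_w + (half_x ? 1 : 0);
    d.height = pred_h + (half_y ? 1 : 0);
    return d;
}

/* Stage 3: filter a reference block into a prediction block.
 * 'ref' points at the top-left sample of the reference block and
 * 'pred' is written row by row with a stride of pred_w. */
static void halfsample_filter(uint8_t *pred, int pred_w, int pred_h,
                              const uint8_t *ref, int ref_stride,
                              int half_x, int half_y)
{
    int dx = half_x ? 1 : 0;
    int dy = half_y ? 1 : 0;
    for (int y = 0; y < pred_h; ++y) {
        for (int x = 0; x < pred_w; ++x) {
            const uint8_t *p = ref + y * ref_stride + x;
            int a = p[0];
            int b = p[dx];
            int c = p[dy * ref_stride];
            int d = p[dy * ref_stride + dx];
            int v;
            if (dx && dy)
                v = (a + b + c + d + 2) >> 2; /* four-sample average */
            else if (dx)
                v = (a + b + 1) >> 1;         /* horizontal average */
            else if (dy)
                v = (a + c + 1) >> 1;         /* vertical average */
            else
                v = a;                        /* full-sample copy */
            pred[y * pred_w + x] = (uint8_t)v;
        }
    }
}
```

Stage 4 would then average the resulting prediction blocks sample by sample when more than one prediction contributes to the macroblock.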
Macroblocks are broken into regular segments in an attempt to compartmentalize areas with different characteristics.
Figure 8 -- Basic Macroblock Portioning Schemes
In the MPEG-2 case, the top and bottom portion of a macroblock represent lines from two different fields captured at different instances in time, as much as 1/50-th of a second apart. Thus the top and bottom portion could have totally non-correlated content if significant movement has taken place between the two fields for the frame area covered by the macroblock. An additional 16×8 scheme is added in field-structured pictures to provide finer vertical granularity of prediction.
Figure 9 - The Two MPEG-2 Macroblock 16×8 Portions
Partitioning of macroblocks also provides finer granularity of prediction, to better accommodate edges and smaller objects with divergent motion velocities. The prediction block itself is a crude approximation of shape, and represents a kind of average motion vector for all samples which belong to the portion of the macroblock which the prediction block represents. Ideally, each sample (or subsample) would have its own motion vector, but this would consume considerable bits and extra overhead in processing. The prediction block remains a reasonable approximation.
Prediction blocks contribute to only one portion of a macroblock. A whole prediction block covers the 16×16 area of a macroblock. This is the case with all H.261 and MPEG-1 predictions. MPEG-2 introduced the 16×8 prediction shape to address the dual field/frame nature of macroblocks. The 16×8 shape was also borrowed for use in MPEG-2 field-structured pictures to create a finer granularity of prediction. The 8×8 shape is deployed in H.263 (Advanced Prediction) and MPEG-4. The finest granularity to date is the 4×4 prediction block shape in the draft of the future "H.26L".
Chrominance prediction blocks usually have half the size in both horizontal and vertical directions as their corresponding luminance prediction blocks, and share the same vector used for luminance since the standards addressed by this specification always model motion as component planar co-located. Chrominance vectors are therefore scaled from luminance vectors by some means, to account for the difference in the respective luminance and chrominance sample dimensions. MPEG-2's 16×8 luminance prediction blocks have corresponding 8x4 chrominance shapes. Exceptions are often made when the luminance prediction block becomes too small. For example, in the H.263 Advanced Prediction mode, the chrominance prediction block remains 8x8 in shape, and the chrominance motion vector is derived from a scaled average of the four 8x8 luminance motion vectors. In most contexts of this document, we shall only refer to the luminance dimensions of a prediction block, plane, or macroblock - the chrominance dimensions shall be inferred unless explicitly stated.
Figure 10 MPEG-2 Macroblock Prediction Planes
The above figure illustrates the conceptual macroblock prediction planes which exist prior to forming the final macroblock prediction. MPEG-2 has two planes: forward & backward (bi-directional prediction), or same-parity & opposite-parity (Dual Prime). In the MPEG-1 and MPEG-2 case, planes are combined together via the simple averaging operation. More sophisticated prediction schemes such as H.263's Overlapped Block Motion Compensated Prediction have three planes. Future coding methods may employ up to eight or more prediction planes which can be combined with more sophisticated filters.
The security needs of some applications may require encryption of some of the data used in video decoding. To support such applications, this specification allows for encryption of DirectX VA data to be applied to three of the data structures defined for its operation:
o The macroblock control command data structures,
o The residual difference block data structures, and
o Bitstream buffer contents
However, encryption is an optional aspect of DirectX VA operation for the intermediate-term future. All accelerators shall be capable of operation without encryption in use. Some encryption protocol is expected to be made a requirement in the longer-term future.
Good obfuscation of host-based software and data for processing clear-text data and performing the encryption process is essential to protecting the content. Such protection is equally important for the accelerator-side decryption process (although the decryption may be a hardware-based operation).
In order for the host decoder software to be able to use encryption, it must determine what types of encryption are supported by the accelerator. The basic information regarding what types of encryption are supported by the accelerator is contained in a list of encryption-type GUIDs supplied to the host as video accelerator type GUIDs. (The "no encryption" GUID DXVA_NoEncrypt shall not be sent in this list, as support for it is required and therefore implicit.)
The host chooses the type of encryption protocol to apply and indicates this choice to the accelerator via GUIDs as described in Section 3.2. In a typical encryption scenario, two more steps need to take place before encrypted data can be successfully transferred:
The host decoder may require verification that the accelerator is authorized to receive the data. This may be done by having the accelerator pass a signed data structure to the host to prove that it holds an authorized public/private key pair.
The host decoder would then send an encrypted "content key" to the accelerator.
The precise number of steps for initializing the encryption protocol may depend on the particular type of encryption to be used. Each data set exchanged between the host and accelerator for the initialization of the encryption process shall be prefixed by the encryption protocol type GUID, to distinguish data for one type of encryption from data for another (in the event of the use of one type of encryption for one type of DirectX VA buffer, and some other type of encryption for another type of DirectX VA buffer).
The encryption data sets are exchanged with an indication that the operation being performed is an encryption protocol, as specified by:
DXVA_EncryptProtocolFunc
    EncryptProtocolFlag    // 24b
    bDXVA_Func             // 8b
EncryptProtocolFlag: This is a 24-bit indication that the operation being performed is an encryption protocol transaction. EncryptProtocolFlag shall be 0xFFFF00 when sent by the host software decoder and shall be 0xFFFF08 when sent by the accelerator.
bDXVA_Func: This is an indication of the bDXVA_Func DirectX VA function to which the encryption protocol applies. As currently defined herein, the only value of bDXVA_Func that requires encryption support is "1".
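A sketch of packing and unpacking this 32-bit value follows. The macro and function names are hypothetical, and the placement of the 24-bit flag in the upper bits with bDXVA_Func in the lowest eight bits is an assumption taken from the order in which the fields are listed above.

```c
#include <stdint.h>

/* Flag values given in the text. Macro names are hypothetical. */
#define ENCRYPT_FLAG_FROM_HOST  0xFFFF00u
#define ENCRYPT_FLAG_FROM_ACCEL 0xFFFF08u

/* Pack a 24-bit flag and an 8-bit function number into 32 bits.
 * Assumed layout: flag in bits 31..8, function in bits 7..0. */
static uint32_t pack_encrypt_protocol_func(uint32_t flag24, uint8_t func)
{
    return ((flag24 & 0xFFFFFFu) << 8) | func;
}

/* Extract the 24-bit flag from a packed value. */
static uint32_t encrypt_protocol_flag(uint32_t packed)
{
    return packed >> 8;
}

/* Extract the 8-bit function number from a packed value. */
static uint8_t encrypt_protocol_func(uint32_t packed)
{
    return (uint8_t)(packed & 0xFFu);
}
```

Under this assumed layout, a host-originated transaction for bDXVA_Func "1" would be carried as 0xFFFF0001.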
The actual data set that is passed between the host and accelerator shall contain the following header information:
DXVA_EncryptProtocolHeader
    dwFunction             // 32b
    (reserved bits)        // 3 * 32b alignment
    guidEncryptProtocol    // 128b
dwFunction: Contains a DXVA_EncryptProtocolFunc indicating that the contents pertain to the encryption protocol.
guidEncryptProtocol: Contains the GUID associated with the encryption protocol.
In order for a decoder to operate using this API, an understanding must be reached between the decoder and the video accelerator for two distinct aspects of operation:
What type of video data format is to be decoded. This is codified herein using the DXVA_ConnectMode data structure.
How the API will be configured to operate, as to what intermediate data formats are used and which aspects of the processing are to reside on the host and which on the accelerator. This is established by the negotiation of a connection configuration for each DXVA function to be used.
The DirectX VA global connection parameter is the connection restricted mode:
DXVA_ConnectMode
    guidMode               // 128b
    wRestrictedMode        // 16b
guidMode: A GUID associated with the restricted mode profile to be used.
wRestrictedMode: The numeric identifier of the connection restricted mode (see Section 4).
The process of establishing the configuration for each DirectX VA function (a specific value of bDXVA_Func) that needs configuration is performed by:
Optionally probing to determine whether a configuration is accepted by the accelerator, and
Locking in a specific configuration if it is supported.
Probing for support of a specific configuration is performed by sending a probing command to the accelerator for the particular bDXVA_Func to be probed with a configuration. Along with the probing command is sent a configuration data structure (specific to the bDXVA_Func value) which describes the configuration being probed to determine support. The accelerator then returns an S_OK or S_FALSE indication for whether the specified configuration is supported by the accelerator or not. The accelerator may also return a suggested alternative configuration.
Locking in a specific configuration is performed by sending a locking command to the accelerator for the particular bDXVA_Func to be locked into a specific configuration. Along with the locking command is sent a configuration data structure (specific to the bDXVA_Func value) which describes the configuration to be locked in if supported. The accelerator then returns an S_OK or S_FALSE indication for whether the specified configuration is supported by the accelerator. If the indication is S_OK, the specified configuration is locked in for use. If the indication is S_FALSE, a suggested alternative configuration is returned.
The decoder may send a locking command without first sending a probing command for the specified configuration. If the accelerator has returned an S_OK in a probing command for a specific configuration, it shall return an S_OK to a locking command for that same configuration, unless otherwise noted herein.
Once a locking command has been sent and acknowledged with a returned S_OK indication, the specified configuration is locked in, and no further probing or locking commands shall be sent by the decoder for the same value of bDXVA_Func.
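The probe-then-lock sequence can be sketched from the decoder's side as follows. Everything in this sketch is hypothetical: configurations are reduced to integer identifiers, the command transport is a callback, suggested alternative configurations are ignored, and a toy accelerator that supports exactly one configuration is included only so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { REPLY_S_OK, REPLY_S_FALSE } reply_t;

/* Hypothetical transport: sends a probing command (lock == false) or a
 * locking command (lock == true) for one configuration identifier. */
typedef reply_t (*config_cmd_fn)(int config_id, bool lock, void *ctx);

/* Try candidate configurations in the decoder's order of preference.
 * Returns the identifier that was locked in, or -1 if none was accepted.
 * No command is sent after a lock has been acknowledged with S_OK. */
static int negotiate_config(const int *candidates, size_t count,
                            config_cmd_fn send, void *ctx)
{
    for (size_t i = 0; i < count; ++i) {
        if (send(candidates[i], false, ctx) != REPLY_S_OK)
            continue;                 /* probe rejected: try the next one */
        if (send(candidates[i], true, ctx) == REPLY_S_OK)
            return candidates[i];     /* locked in: stop negotiating */
    }
    return -1;
}

/* Toy accelerator for illustration: ctx points at the single
 * configuration identifier it supports. */
static reply_t mock_accelerator(int config_id, bool lock, void *ctx)
{
    (void)lock; /* probes and locks are answered identically here */
    int supported = *(const int *)ctx;
    return config_id == supported ? REPLY_S_OK : REPLY_S_FALSE;
}
```

Since probing is optional, a decoder could equally send the locking command directly; the probe is kept here to show both steps.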
In order to ensure that all DirectX VA software decoders shall be capable of operation with all DirectX VA accelerators, a minimal interoperability configuration set is defined as a set of configurations that must all be supported by any software decoder that wishes to use a particular bDXVA_Func. At least one member of that minimal interoperability configuration set shall be supported by every accelerator that indicates support for the bDXVA_Func by exposing an associated video accelerator GUID. In some cases an additional encouraged configuration set may also be defined.
Probing and locking are controlled by DXVA_ConfigQueryOrReplyFunc, which is passed as the first member of the data structure sent in a probing or locking command.
DXVA_ConfigQueryOrReplyFunc
    QueryOrReplyFlag       // 24b
    bDXVA_Func             // 8b
QueryOrReplyFlag: The type of query or response indicated, defined as follows:
0xFFFFF1 : Sent by the software decoder as a probing command,
0xFFFFF5 : Sent by the software decoder as a locking command,
0xFFFFF8 : Sent by the accelerator with a S_OK response to a probing command with a copy of the probed configuration,
0xFFFFF9 : Sent by the accelerator with a S_OK response to a probing command, but also suggesting an alternative configuration,
0xFFFFFC : Sent by the accelerator with a S_OK response to a locking command with a copy of the locked configuration,
0xFFFFFB : Sent by the accelerator with a S_FALSE response to a probing command with a suggested alternative configuration, or
0xFFFFFF : Sent by the accelerator with a S_FALSE response to a locking command with a suggested alternative configuration.
The least significant four bits of QueryOrReplyFlag can be interpreted as follows:
Bit 3: Sent by host decoder = "0", by accelerator = "1",
Bit 2: Associated with a probe = "0", with a lock = "1",
Bit 1: Success = "0", failure = "1", and
Bit 0: Duplicate configuration structure = "0", new structure = "1".
bDXVA_Func: The bDXVA_Func function to which the accompanying configuration applies.
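The interpretation of the least significant four bits of QueryOrReplyFlag can be sketched as a small decoder. The structure and function names are hypothetical; the bit meanings are those listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded meaning of the low four bits of QueryOrReplyFlag. */
struct query_or_reply_bits {
    bool from_accelerator; /* bit 3: 0 = host decoder, 1 = accelerator */
    bool is_lock;          /* bit 2: 0 = probe, 1 = lock */
    bool failure;          /* bit 1: 0 = success, 1 = failure */
    bool new_structure;    /* bit 0: 0 = duplicate, 1 = new structure */
};

/* Decode the low four bits of a 24-bit QueryOrReplyFlag value. */
static struct query_or_reply_bits decode_query_or_reply(uint32_t flag24)
{
    struct query_or_reply_bits b;
    b.from_accelerator = (flag24 & 0x8u) != 0;
    b.is_lock          = (flag24 & 0x4u) != 0;
    b.failure          = (flag24 & 0x2u) != 0;
    b.new_structure    = (flag24 & 0x1u) != 0;
    return b;
}
```

For example, 0xFFFFFB decodes as an accelerator reply to a probe that failed and carries a new (alternative) configuration structure, which matches its definition in the list of values.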
DirectX VA primarily operates by passing buffers of data from the host to the accelerator. The DXVA function parameter (bDXVA_Func) determines what types of buffers may be passed and what specific DXVA task is to be performed. A variety of enumerated buffer types are defined herein, specifically the types are:
Picture decoding parameter buffers
Macroblock control command buffers (closely associated with and having a 1:1 correspondence with residual difference block data buffers)
Residual difference block data buffers
Deblocking Filter control command buffers (with or without a restriction on the effect of the filter)
Inverse quantization matrix buffers (only used with off-host VLD processing)
Slice control buffers (closely associated with and having a 1:1 correspondence with bitstream data buffers)
Bitstream data buffers
AYUV alpha blending sample buffers
IA44/AI44 alpha blending surface buffers
DPXD alpha blending surface buffers
Highlight data buffers
DCCMD data buffers
Alpha blend combination buffers
Picture resampling control buffers
Read-back command buffers containing commands to read macroblocks of the resulting picture back to the host.
When a set of buffers is passed from the host to the accelerator, some additional data about these buffers is also passed. This additional data is termed a buffer description list. A buffer description list consists of an ordered list, starting from the first buffer of the first type as enumerated above, proceeding to the next buffer of the same type, and so on, then proceeding to the first buffer of the next type, etc. Each member of the list consists of a data structure defined as follows:
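DXVA_BufferDescription
    dwTypeIndex            // 32b
    dwBufferIndex          // 32b
    dwDataOffset           // 32b
    dwDataSize             // 32b
    dwFirstMBaddress       // 32b
    dwNumMBsInBuffer       // 32b
    dwWidth                // 32b
    dwHeight               // 32b
    dwStride               // 32b
    dwReservedBits         // 32b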
dwTypeIndex: The type number of the relevant buffer from the list above in this section.
dwBufferIndex: The sequence number of the buffer within the buffers of the same type passed in the same buffer description list.
dwDataOffset: The offset of the relevant data from the beginning of the buffer (in bytes). Shall be zero.
dwDataSize: The amount of relevant data in the buffer (in bytes). Thus the location of the last byte of content in the buffer is given by dwDataOffset + dwDataSize - 1.
dwFirstMBaddress: The macroblock address of the first macroblock in the buffer in raster scan order (0 being the address of the top left macroblock, PicWidthInMBminus1 being the address of the top right macroblock, and PicHeightInMBminus1 * PicWidthInMB being the address of the bottom left macroblock, and PicHeightInMBminus1 * PicWidthInMB + PicWidthInMBminus1 being the address of the bottom right macroblock). Shall be zero if the data buffer is among the following types: picture decoding parameters, inverse quantization matrix, slice control, bitstream data, AYUV, IA44/AI44, DPXD, Highlight, or DCCMD. If the data buffer is a residual difference block data buffer, dwFirstMBaddress shall have the same value as for the corresponding macroblock control command buffer.
dwNumMBsInBuffer: The number of macroblocks of data in the buffer. This count includes skipped macroblocks. Shall be zero if the data buffer is among the following types: picture decoding parameters, inverse quantization matrix, AYUV, IA44/AI44, DPXD, Highlight, or DCCMD. If the data buffer is a macroblock control command buffer, dwNumMBsInBuffer shall be equal to the total of the values of MBskipsFollowing plus the total number of macroblock control commands in the macroblock control command buffer. If the data buffer is a residual difference block data buffer, dwNumMBsInBuffer shall have the same value as for the corresponding macroblock control command buffer. If the data buffer is a slice control command buffer, dwNumMBsInBuffer shall be equal to the total of the values of wNumberMBsInSlice in the slice control buffer. If the data buffer is a bitstream data buffer, dwNumMBsInBuffer shall have the same value as for the corresponding slice control command buffer.
dwWidth, dwHeight, dwStride: The width, height, and stride of the data in the buffer. Shall be zero unless the data buffer is among the following types: IA44/AI44, or DPXD. For the applicable buffer types, dwStride is determined from the buffer allocation setup performed by the accelerator.
NOTE: The decoder must take care to honor the stride specified for IA44/AI44 or DPXD.
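The arithmetic behind several of the fields above can be sketched with a few helper functions. All names are hypothetical; the structure holding MBskipsFollowing is a stand-in for a macroblock control command, whose actual layout is not given in this section.

```c
#include <stddef.h>
#include <stdint.h>

/* Raster-scan address of the macroblock at column mb_x, row mb_y,
 * as used for dwFirstMBaddress. */
static uint32_t mb_address(uint32_t mb_x, uint32_t mb_y,
                           uint32_t pic_width_in_mb)
{
    return mb_y * pic_width_in_mb + mb_x;
}

/* Stand-in for one macroblock control command; only the
 * MBskipsFollowing count matters for this sketch. */
struct mb_control_sketch {
    uint32_t mb_skips_following;
};

/* dwNumMBsInBuffer for a macroblock control command buffer: the number
 * of commands plus the total of their MBskipsFollowing values. */
static uint32_t num_mbs_in_buffer(const struct mb_control_sketch *cmds,
                                  size_t count)
{
    uint32_t total = (uint32_t)count;
    for (size_t i = 0; i < count; ++i)
        total += cmds[i].mb_skips_following;
    return total;
}

/* Offset of the last byte of content: dwDataOffset + dwDataSize - 1.
 * Assumes data_size is greater than zero. */
static uint32_t last_byte_offset(uint32_t data_offset, uint32_t data_size)
{
    return data_offset + data_size - 1;
}
```

For a 352×288 picture (22×18 macroblocks of 16×16 samples), the four corner addresses are 0, 21, 374, and 395, which agrees with the expressions given for dwFirstMBaddress.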
The first function type defined for DirectX VA operation is the decoding of some or all of a compressed picture.
The DirectX VA connection configuration data structure for compressed picture decoding shall be defined as:
DXVA_ConfigPictureDecode |
// 32b // 3 * 32b alignment // 128b // 128b // 128b // 8b // 8b // 8b // 8b // 8b // 8b // 8b // 8b // 8b // 8b // 8b // 8b |
dwFunction: Contains a DXVA_ConfigQueryOrReplyFunc describing the configuration data structure.
guidConfigBitstreamEncryption: Indicates a GUID associated with the encryption protocol type for bitstream data buffers. The value DXVA_NoEncrypt (a GUID name defined in the associated header file) indicates that encryption is not applied. Shall be DXVA_NoEncrypt if bConfigBitstreamRaw is "0".
guidConfigMBcontrolEncryption: Indicates a GUID associated with the encryption protocol type for macroblock control data buffers. The value DXVA_NoEncrypt (a GUID name defined in the associated header file) indicates that encryption is not applied. Shall be DXVA_NoEncrypt if bConfigBitstreamRaw is "1".
guidConfigResidDiffEncryption: Indicates a GUID associated with the encryption protocol type for residual difference decoding data buffers (buffers containing spatial-domain data or sets of transform-domain coefficients for accelerator-based IDCT). The value DXVA_NoEncrypt (a GUID name defined in the associated header file) indicates that encryption is not applied. Shall be DXVA_NoEncrypt if bConfigBitstreamRaw is "1".
bConfigBitstreamRaw: A value of "1" specifies that the data for the pictures will be sent in bitstream buffers as raw bitstream content, and a value of "0" specifies that picture data will be sent using macroblock control command buffers. Shall be "0" if bConfigResidDiffHost is "1" or if bConfigResidDiffAccelerator is "1". An intermediate-term requirement is to support "0". Additional support of "1" is desired.
bConfigMBcontrolRasterOrder: A value of "1" specifies that the macroblock control commands within each macroblock control command buffer shall be in raster-scan order, and a value of "0" indicates arbitrary order. For some types of bitstreams, forcing raster order will either greatly increase the number of required macroblock control buffers that must be processed or will require host reordering of the control information. Support of arbitrary order can thus be advantageous for the decoding process. For example, H.261 CIF-resolution decoding can require 36 macroblock control buffers per picture if raster-scan order is necessary within each buffer. (H.263 Annex K's arbitrary slice ordering and rectangular slice modes have similar repercussions.) An intermediate-term requirement is to support "0". Support of "1" is allowed in the near term, but is considered a less preferred lower level of capability.
bConfigResidDiffHost: A value of "1" specifies that some residual difference decoding data may be sent as blocks in the spatial domain from the host, and a value of "0" specifies that spatial domain data will not be sent. Shall be "0" if bConfigBitstreamRaw is "1". An intermediate-term requirement is to support "1", which is the preferred value.
bConfigSpatialResid8: Indicates the word size used for representing residual difference spatial-domain blocks for predicted (i.e., non-intra) pictures when using host-based residual difference decoding (i.e. when bConfigResidDiffHost is equal to "1").
If bConfigSpatialResid8 is "1" and bConfigResidDiffHost is "1", this indicates that the host will send residual difference spatial-domain blocks for non-intra macroblocks using 8 bit signed samples and for intra macroblocks in predicted (i.e. non-intra) pictures in a format depending on bConfigIntraResidUnsigned as follows:
If bConfigIntraResidUnsigned is "0", spatial-domain blocks for intra macroblocks are sent as 8 bit signed integer values relative to a constant reference value of 2^(BPP-1), and
If bConfigIntraResidUnsigned is "1", spatial-domain blocks for intra macroblocks are sent as 8 bit unsigned integer values relative to a constant reference value of 0.
If bConfigSpatialResid8 is "0" and bConfigResidDiffHost is "1", this indicates that the host will send residual difference spatial-domain blocks of data for non-intra macroblocks using 16 bit signed samples and for intra macroblocks in predicted (i.e. non-intra) pictures in a format depending on bConfigIntraResidUnsigned as follows:
If bConfigIntraResidUnsigned is "0", spatial domain blocks for intra macroblocks are sent as 16 bit signed integer values relative to a constant reference value of 2^(BPP-1), and
If bConfigIntraResidUnsigned is "1", spatial domain blocks for intra macroblocks are sent as 16 bit unsigned integer values relative to a constant reference value of 0.
bConfigSpatialResid8 shall be "0" if bConfigResidDiffHost is "0". This specification does not contain a preference for one particular value of bConfigSpatialResid8 when bConfigResidDiffHost is "1".
NOTE: For intra pictures, spatial-domain blocks shall always be sent using 8-bit samples if BPP is "8" and using 16-bit samples if BPP > 8. If bConfigIntraResidUnsigned is "0", these samples are sent as signed integer values relative to a constant reference value of 2^(BPP-1), and if bConfigIntraResidUnsigned is "1", these samples are sent as unsigned integer values relative to a constant reference value of 0.
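The two intra representations can be illustrated with a short C sketch. The function name is hypothetical; the sketch only restates the rule that the value sent is the sample value minus the constant reference value, which is 2^(BPP-1) for the signed form and 0 for the unsigned form.

```c
/* Hypothetical helper: converts an intra sample value in the range
 * 0 .. 2^BPP - 1 into the value sent in a spatial-domain block.
 * intra_resid_unsigned mirrors bConfigIntraResidUnsigned. */
static int intra_sample_to_sent_value(int sample, int bpp,
                                      int intra_resid_unsigned)
{
    int reference = intra_resid_unsigned ? 0 : (1 << (bpp - 1));
    return sample - reference;
}
```

With BPP equal to 8, the signed form therefore spans -128 to +127 and the unsigned form spans 0 to 255, so both fit in 8 bits.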
bConfigResid8Subtraction: A value of "1" when bConfigSpatialResid8 is "1" indicates that 8-bit difference overflow blocks are subtracted rather than added. Shall be "0" unless bConfigSpatialResid8 is "1". An intermediate-term requirement is to support "1" if bConfigSpatialResid8 is "1". This ability to subtract differences rather than add them allows 8-bit difference decoding to be fully compliant with the full ±255 range of values required in video decoder specifications, since +255 cannot be represented as the addition of two signed 8-bit numbers but any number in the range ±255 can be represented as the difference between two signed 8-bit numbers (+255 = +127 minus -128).
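The arithmetic behind this can be sketched as follows. The function name and the choice to clamp the first value and place the remainder in the overflow value are assumptions of this example; a host decoder is free to split a difference in any way that yields the same combined result.

```c
#include <stdint.h>

/* Hypothetical helper: splits a difference d into two signed 8-bit values.
 * If subtract is nonzero the pair satisfies d == first - overflow,
 * otherwise d == first + overflow. Returns 0 on success, or -1 if d cannot
 * be represented in the chosen mode. */
static int split_residual(int d, int subtract, int8_t *first, int8_t *overflow)
{
    int a = d > 127 ? 127 : (d < -128 ? -128 : d); /* clamp to 8-bit range */
    int rest = d - a;              /* amount the overflow value must supply */
    int b = subtract ? -rest : rest;
    if (b < -128 || b > 127)
        return -1;
    *first = (int8_t)a;
    *overflow = (int8_t)b;
    return 0;
}
```

In this sketch the value +255 fails in addition mode and succeeds in subtraction mode as +127 minus -128, while -255 succeeds in both modes.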
bConfigSpatialHost8or9Clipping: A value of "1" with bConfigSpatialResid8 equal to "0" and bConfigResidDiffHost equal to "1" indicates that spatial-domain blocks for intra macroblocks shall be clipped to an 8-bit range on the host and that spatial-domain blocks for non-intra macroblocks shall be clipped to a 9-bit range on the host, and a value of "0" indicates that no such clipping is necessary by the host. Shall be "0" unless bConfigSpatialResid8 is equal to "0" and bConfigResidDiffHost is equal to "1". An intermediate-term requirement is to support "0". Nearer-term support of "1" is allowed but less preferred, and is considered a lower level of accelerator capability.
bConfigSpatialResidInterleaved: A value of "1" when bConfigResidDiffHost is "1" and the YUV format is "NV12" or "NV21" indicates that any spatial-domain residual difference data shall be sent in a chrominance-interleaved form matching the YUV format chrominance interleaving pattern. Shall be "0" unless bConfigResidDiffHost is "1" and the YUV format is "NV12" or "NV21". An intermediate-term requirement is to support "0". Nearer-term support of "1" is allowed but less preferred, and is considered a lower level of accelerator capability.
bConfigIntraResidUnsigned: Indicates the method of representation of spatial-domain blocks of residual difference data for intra blocks when using host-based difference decoding (i.e. when bConfigResidDiffHost is equal to "1").
If bConfigIntraResidUnsigned is equal to "0" with bConfigResidDiffHost equal to "1", this indicates that spatial-domain residual difference data blocks for intra macroblocks shall be sent as follows:
If bConfigIntraResidUnsigned is equal to "1" with bConfigResidDiffHost equal to "1", this indicates that spatial-domain residual difference data blocks for intra macroblocks shall be sent as follows:
bConfigIntraResidUnsigned shall be "0" unless bConfigResidDiffHost is "1". Nearer-term support of bConfigIntraResidUnsigned equal to "1" is allowed but less preferred than bConfigIntraResidUnsigned equal to "0", and is considered a lower level of accelerator capability.
bConfigResidDiffAccelerator: A value of "1" indicates that transform-domain blocks of coefficient data may be sent from the host for accelerator-based IDCT, and a value of "0" specifies that accelerator-based IDCT will not be used. If both bConfigResidDiffHost and bConfigResidDiffAccelerator are "1", this indicates that some residual difference decoding will be done on the host and some on the accelerator, as indicated by macroblock-level control commands. Shall be "0" if bConfigBitstreamRaw is "1". Support for bConfigResidDiffAccelerator equal to "1" is desired, but there is not expected to be an intermediate-term requirement for this support. Support for bConfigResidDiffAccelerator equal to "1" with bConfigResidDiffHost also equal to "1" indicates that the residual difference decoding can be shared between the host and accelerator on a macroblock basis, and is considered an even higher level of accelerator capability than bConfigResidDiffAccelerator equal to "1" with bConfigResidDiffHost equal to "0".
bConfigHostInverseScan: A value of "1" indicates that the inverse scan for transform-domain block processing will be performed on the host, and absolute indices will be sent instead for any transform coefficients, and a value of "0" indicates that inverse scan will be performed on the accelerator. Shall be "0" if bConfigResidDiffAccelerator is "0". Shall be "1" if bConfig4GroupedCoefs is "0". An intermediate-term expected requirement is to support "1" if bConfigResidDiffAccelerator is "1". Nearer-term support of "0" with bConfig4GroupedCoefs equal to "1" is allowed but less preferred (due to its lack of flexibility to support arbitrary scan patterns and our perception that host inverse scan adds little burden to the host software decoder processing), and is considered a lower level of accelerator capability.
bConfigSpecificIDCT: A value of "1" indicates use of the IDCT specified in Annex W of ITU-T Recommendation H.263, and a value of "0" indicates that any compliant IDCT can be used for off-host IDCT. (NOTE: The referenced draft annex does not yet have final approval in the ITU-T. If a change takes place in the ITU, this bit will refer to the final version, not the earlier draft versions. It should also be noted that the referenced draft H.263 annex does not comply with the IDCT requirements of MPEG-2 corrigendum 2 and thus bConfigSpecificIDCT shall not be "1" for use with MPEG-2 video.) Shall be zero if bConfigResidDiffAccelerator is "0" (indicating purely host-based residual difference decoding). An intermediate-term expected requirement is to support "0" if bConfigResidDiffAccelerator is "1". Additional support of "1" is desired. Additional values may be defined in the future.
bConfig4GroupedCoefs: A value of "1" indicates that transform coefficients for off-host IDCT will be sent using the DXVA_TCoef4Group data structure rather than the DXVA_TCoefSingle data structure. Shall be "0" if bConfigResidDiffAccelerator is "0" or if bConfigHostInverseScan is "1". An intermediate-term expected requirement is to support "0" if bConfigResidDiffAccelerator is "1". Nearer-term support of "1" with bConfigHostInverseScan equal to "0" is allowed but less preferred, and is considered a lower level of accelerator capability.
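The "shall" constraints among the one-byte configuration flags described above can be collected into a single consistency check. The following C sketch is illustrative only. The type and function names are hypothetical, the encryption GUIDs and the YUV format condition are left out, and the rule that bConfigHostInverseScan shall be "1" when bConfig4GroupedCoefs is "0" is interpreted here as applying only when bConfigResidDiffAccelerator is "1" (an assumption made so that the rule does not conflict with the requirement that both be "0" for host-based decoding).

```c
#include <stdint.h>

/* Hypothetical container for the flags checked below. */
typedef struct {
    uint8_t bConfigBitstreamRaw;
    uint8_t bConfigResidDiffHost;
    uint8_t bConfigSpatialResid8;
    uint8_t bConfigResid8Subtraction;
    uint8_t bConfigSpatialHost8or9Clipping;
    uint8_t bConfigSpatialResidInterleaved;
    uint8_t bConfigIntraResidUnsigned;
    uint8_t bConfigResidDiffAccelerator;
    uint8_t bConfigHostInverseScan;
    uint8_t bConfigSpecificIDCT;
    uint8_t bConfig4GroupedCoefs;
} ConfigFlags;

/* Returns 1 if the flags satisfy the constraints modeled here, else 0. */
static int config_flags_consistent(const ConfigFlags *c)
{
    int host = c->bConfigResidDiffHost != 0;
    int accel = c->bConfigResidDiffAccelerator != 0;
    int resid8 = c->bConfigSpatialResid8 != 0;
    int host_scan = c->bConfigHostInverseScan != 0;
    int grouped = c->bConfig4GroupedCoefs != 0;

    /* Raw bitstream mode excludes host and accelerator residual modes. */
    if (c->bConfigBitstreamRaw && (host || accel)) return 0;
    /* Flags that are only meaningful with host-based residual decoding. */
    if (!host && (resid8 || c->bConfigSpatialResidInterleaved ||
                  c->bConfigIntraResidUnsigned)) return 0;
    if (!resid8 && c->bConfigResid8Subtraction) return 0;
    if (c->bConfigSpatialHost8or9Clipping && !(host && !resid8)) return 0;
    /* Flags that are only meaningful with accelerator-based IDCT. */
    if (!accel && (host_scan || grouped || c->bConfigSpecificIDCT)) return 0;
    /* With accelerator IDCT, exactly one of host inverse scan and
     * 4-grouped coefficients is selected (interpretation, see lead-in). */
    if (accel && host_scan == grouped) return 0;
    return 1;
}
```

A decoder could run such a check on a proposed configuration before probing the accelerator with it, although passing the check says nothing about whether a given accelerator supports that configuration.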
It is a design requirement for all DirectX VA decoders to interoperate with all DirectX VA accelerators. This requires that every DirectX VA decoder be capable of operation with any member of a set of connection configurations, and that every DirectX VA accelerator be capable of operation with at least one connection configuration member of that set. That minimal interoperability set is defined as follows:
All members of this most basic set have the following aspects in common:
a) guidConfigBitstreamEncryption = DXVA_NoEncrypt
b) guidConfigMBcontrolEncryption = DXVA_NoEncrypt
c) guidConfigResidDiffEncryption = DXVA_NoEncrypt
d) bConfigBitstreamRaw = 0
e) bConfigHostInverseScan = 0
f) bConfigSpecificIDCT = 0
The first two members of this set (among which accelerators are encouraged to support the variation with bConfigSpatialHost8or9Clipping = 0) are defined by:
a) bConfigMBcontrolRasterOrder = 1
b) bConfigResidDiffHost = 1
c) bConfigSpatialResid8 = 0
d) bConfigResid8Subtraction = 0
e) bConfigSpatialHost8or9Clipping = 0 or 1
f) bConfigSpatialResidInterleaved = 0
g) bConfigIntraResidUnsigned = 0
h) bConfigResidDiffAccelerator = 0
The third member of the set (not particularly encouraged for accelerator implementation) is defined by:
a) bConfigMBcontrolRasterOrder = 1
b) bConfigResidDiffHost = 1
c) bConfigSpatialResid8 = 1
d) bConfigResid8Subtraction = 0
e) bConfigSpatialHost8or9Clipping = 0
f) bConfigSpatialResidInterleaved = 1
g) bConfigIntraResidUnsigned = 0
h) bConfigResidDiffAccelerator = 0
The fourth member of this set (not particularly encouraged for accelerator implementation) is defined by:
a) bConfigMBcontrolRasterOrder = 1
b) bConfigResidDiffHost = 1
c) bConfigSpatialResid8 = 0
d) bConfigResid8Subtraction = 0
e) bConfigSpatialHost8or9Clipping = 1
f) bConfigSpatialResidInterleaved = 0
g) bConfigIntraResidUnsigned = 1
h) bConfigResidDiffAccelerator = 0
Two additional members of this set (among which accelerators are encouraged to support the variation with bConfigResid8Subtraction = 1) are defined by:
a) bConfigMBcontrolRasterOrder = 1
b) bConfigResidDiffHost = 1
c) bConfigSpatialResid8 = 1
d) bConfigResid8Subtraction = 0 or 1
e) bConfigSpatialHost8or9Clipping = 0
f) bConfigSpatialResidInterleaved = 0
g) bConfigIntraResidUnsigned = 0
h) bConfigResidDiffAccelerator = 0
One additional member of this set is defined only for the MPEG2_C and MPEG2_D restricted mode profiles as defined in Section 4 and indicated in DXVA_ConnectMode. No other profiles include this configuration in their minimal interoperability set. This additional member is defined by:
a) bConfigMBcontrolRasterOrder = 1
b) bConfigResidDiffHost = 0
c) bConfigResidDiffAccelerator = 1
d) bConfigHostInverseScan = 0
e) bConfig4GroupedCoefs = 1
Some additional configurations are encouraged to be supported in software decoders, as we believe these configurations will exist in hardware shipping in large quantities and as we believe that these configurations can provide a significant performance benefit relative to those in the minimal configuration set. These include:
All members of this set have the following aspects in common:
a) guidConfigBitstreamEncryption = DXVA_NoEncrypt
b) guidConfigMBcontrolEncryption = DXVA_NoEncrypt
c) guidConfigResidDiffEncryption = DXVA_NoEncrypt
d) bConfigMBcontrolRasterOrder = 0
e) bConfigResidDiffHost = 0
f) bConfigSpatialResid8 = 0
g) bConfigResid8Subtraction = 0
h) bConfigSpatialHost8or9Clipping = 0
i) bConfigSpatialResidInterleaved = 0
j) bConfigSpecificIDCT = 0
For good support of off-host bitstream processing acceleration, the first member of this set is defined by:
a) bConfigBitstreamRaw = 1
b) bConfigResidDiffAccelerator = 0
c) bConfigHostInverseScan = 0
d) bConfig4GroupedCoefs = 0
For good support of off-host IDCT acceleration, the second member of this set is defined by:
a) bConfigBitstreamRaw = 0
b) bConfigResidDiffAccelerator = 1
c) bConfigHostInverseScan = 1
d) bConfig4GroupedCoefs = 0
For support of off-host IDCT as expected to be found in some implementations, the third member of this set (not particularly encouraged for accelerators relative to the second set member) is defined by:
a) bConfigBitstreamRaw = 0
b) bConfigResidDiffAccelerator = 1
c) bConfigHostInverseScan = 0
d) bConfig4GroupedCoefs = 1
Accelerators supporting the first of these encouraged configurations are strongly encouraged to also support the second of them, in order to provide flexibility in the manner in which their acceleration capabilities can be used.
The following variables shall be sent once per picture:
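DXVA_PictureParameters
wDecodedPictureIndex // 16b
wDeblockedPictureIndex // 16b
wForwardRefPictureIndex // 16b
wBackwardRefPictureIndex // 16b
wPicWidthInMBminus1 // 16b
wPicHeightInMBminus1 // 16b
bMacroblockWidthMinus1 // 8b
bMacroblockHeightMinus1 // 8b
bBlockWidthMinus1 // 8b
bBlockHeightMinus1 // 8b
bBPPminus1 // 8b
bPicStructure // 8b
bSecondField // 8b
bPicIntra // 8b
bPicBackwardPrediction // 8b
bBidirectionalAveragingMode // 8b
bMVprecisionAndChromaRelation // 8b
bChromaFormat // 8b
bPicScanFixed // 8b
bPicScanMethod // 8b
bPicReadbackRequests // 8b
bRcontrol // 8b
bPicSpatialResid8 // 8b
bPicOverflowBlocks // 8b
bPicExtrapolation // 8b
bPicDeblocked // 8b
bPicDeblockConfined // 8b
bPic4MVallowed // 8b
bPicOBMC // 8b
bPicBinPB // 8b
bMV_RPS // 8b
(alignment padding) // 8b (alignment)
wBitstreamFcodes // 16b
wBitstreamPCEelements // 16b
bBitstreamConcealmentNeed // 8b
bBitstreamConcealmentMethod // 8b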
wDecodedPictureIndex: Specifies destination frame buffer for the decoded macroblocks.
wDeblockedPictureIndex: Specifies destination frame buffer for the deblocked output picture when bPicDeblocked is "1". Has no meaning and shall be "0" if bPicDeblocked is "0". May be the same as wDecodedPictureIndex.
wForwardRefPictureIndex: Specifies the frame buffer index of the picture to be used as a reference picture for "forward prediction" of the current picture. Shall not be the same as wDecodedPictureIndex. Shall be 0xFFFF if bPicIntra is "1".
wBackwardRefPictureIndex: Specifies the frame buffer index of the picture to be used as a reference picture for "backward prediction" of the current picture. Shall not be the same as wDecodedPictureIndex if backward reference motion prediction is used. Shall be 0xFFFF if bPicBackwardPrediction is "0".
wPicWidthInMBminus1: Specifies the width of the current picture in units of macroblocks, minus 1. A derived term called PicWidthInMB is formed by adding one to wPicWidthInMBminus1.
wPicHeightInMBminus1: Specifies the height of the current picture in units of macroblocks, minus 1. A derived term called PicHeightInMB is formed by adding one to wPicHeightInMBminus1.
bMacroblockWidthMinus1: Specifies the destination luminance sample width of a macroblock. This is equal to "15" for MPEG-1, MPEG-2, H.261, H.263, and MPEG-4. A derived term called MacroblockWidth is formed by adding one to bMacroblockWidthMinus1.
bMacroblockHeightMinus1: Specifies the destination luminance sample height of a macroblock. This is equal to "15" for MPEG-1, MPEG-2, H.261, H.263, and MPEG-4. A derived term called MacroblockHeight is formed by adding one to bMacroblockHeightMinus1.
bBlockWidthMinus1: Specifies the block width of a residual difference block. This is equal to "7" for MPEG-1, MPEG-2, H.261, H.263, and MPEG-4. Shall be "7" if bConfig4GroupedCoefs is "1". Residual difference blocks within a macroblock are sent in the order specified in MPEG-2 Figures 6-10, 6-11, and 6-12 (raster-scan order for Y, followed by all 4:2:0 blocks of Cb in raster-scan order, followed by 4:2:0 blocks of Cr, followed by 4:2:2 blocks of Cb, followed by 4:2:2 blocks of Cr, followed by 4:4:4 blocks of Cb, followed by 4:4:4 blocks of Cr). A derived term called WT is formed by adding one to bBlockWidthMinus1.
bBlockHeightMinus1: Specifies the block height of an IDCT block. This is equal to "7" for MPEG-1, MPEG-2, H.261, H.263, and MPEG-4. Shall be "7" if bConfig4GroupedCoefs is "1". A derived term called HT is formed by adding one to bBlockHeightMinus1.
bBPPminus1: Specifies the number of bits per pixel for the video sample values. This shall be at least "7" (indicating 8-bit pixels). It is equal to "7" for MPEG-1, MPEG-2, H.261, and H.263. A larger number of bits per pixel is supported in some operational modes of MPEG-4. A derived term called BPP is formed by adding one to bBPPminus1.
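The derived terms defined so far (PicWidthInMB, PicHeightInMB, MacroblockWidth, MacroblockHeight, and BPP) are all formed by adding one to a "minus1" field. A small C sketch of how they combine into luminance dimensions is given below. The struct and function names are hypothetical and hold only the fields needed for the example.

```c
#include <stdint.h>

/* Hypothetical subset of the per-picture parameters. */
typedef struct {
    uint16_t wPicWidthInMBminus1;
    uint16_t wPicHeightInMBminus1;
    uint8_t bMacroblockWidthMinus1;
    uint8_t bMacroblockHeightMinus1;
    uint8_t bBPPminus1;
} PictureSizeFields;

/* Luminance width covered by the macroblocks: PicWidthInMB * MacroblockWidth. */
static uint32_t luma_width_in_samples(const PictureSizeFields *p)
{
    return ((uint32_t)p->wPicWidthInMBminus1 + 1) *
           ((uint32_t)p->bMacroblockWidthMinus1 + 1);
}

/* Luminance height covered by the macroblocks: PicHeightInMB * MacroblockHeight. */
static uint32_t luma_height_in_samples(const PictureSizeFields *p)
{
    return ((uint32_t)p->wPicHeightInMBminus1 + 1) *
           ((uint32_t)p->bMacroblockHeightMinus1 + 1);
}

/* BPP: bits per pixel of the video sample values. */
static unsigned bits_per_pixel(const PictureSizeFields *p)
{
    return (unsigned)p->bBPPminus1 + 1;
}
```

For example, field values of 44, 29, 15, 15, and 7 describe a 45 by 30 macroblock picture of 720 by 480 luminance samples with 8-bit pixels.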
bPicStructure: This parameter has the same meaning as the picture_structure parameter defined in Section 6.3.10 and Table 6-14 of MPEG-2, and indicates whether the current picture is a top-field picture (value '01'), a bottom-field picture (value '10'), or a frame picture (value '11'). In progressive-scan frame-structured coding such as in H.261, bPicStructure shall be '11'. A derived parameter PicCurrentField is defined as "0" unless bPicStructure is '10' (bottom field), in which case it is "1".
bSecondField: Indicates whether, in the case of field-structured coding (when bPicStructure is '01' or '10'), the current field is the second field of a picture. This is used to determine whether the opposite-parity field used as a reference for the opposite-parity lines for motion compensation prediction is the opposite-parity field of the reference picture or the opposite-parity field of the current picture. If bSecondField is "1", the current field is the second field of a picture and the lines used as a reference for the opposite-parity lines for motion compensation are the opposite-parity lines of the current picture. (In both cases the lines used as a reference for the same-parity lines for motion compensation are the same-parity lines of the reference picture.) Otherwise, bSecondField shall be "0".
bPicIntra: Indicates whether motion compensated prediction is needed for this picture. If bPicIntra is "1", all macroblocks are sent with IntraMacroblock equal to "1" (i.e. no motion compensated prediction is performed for the picture). Otherwise, some macroblocks of the picture may have IntraMacroblock equal to "0".
bPicBackwardPrediction: Indicates whether any macroblocks of the current picture include backward prediction. If bPicIntra is "1", bPicBackwardPrediction shall be "0". If bPicBackwardPrediction is "0", MotionBackward shall be "0" in all macroblocks of the picture. If bPicBackwardPrediction is "1", some macroblocks of the picture may have MotionBackward equal to "1".
bBidirectionalAveragingMode: This flag indicates the rounding method for combining prediction planes in bi-directional motion compensation (used for B pictures and Dual-Prime motion). The value "0" indicates MPEG-1 and MPEG-2 rounded averaging (//2), and the value "1" indicates H.263 truncated averaging (/2). bBidirectionalAveragingMode shall be "0" if no bidirectional averaging is needed.
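For non-negative sample values, the two averaging methods differ only in whether one is added before halving. The following C sketch illustrates this. The function name is hypothetical, and the mode argument mirrors bBidirectionalAveragingMode.

```c
/* Hypothetical helper: combines one forward and one backward prediction
 * sample. mode 0 = rounded averaging (//2), mode 1 = truncated averaging
 * (/2). Both inputs are assumed to be non-negative sample values. */
static unsigned bidirectional_average(unsigned fwd, unsigned bwd, int mode)
{
    if (mode == 0)
        return (fwd + bwd + 1) >> 1; /* halves round up */
    return (fwd + bwd) >> 1;         /* halves round down */
}
```

The two modes give the same result whenever the sum of the two samples is even.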
bMVprecisionAndChromaRelation: This two-bit field indicates the precision of luminance motion vectors and how chrominance motion vectors shall be derived from luminance motion vectors:
'00' indicates that luminance motion vectors have half-sample precision and that chrominance motion vectors are derived from luminance motion vectors according to the rules in MPEG-2,
'01' indicates that luminance motion vectors have half-sample precision and that chrominance motion vectors are derived from luminance motion vectors according to the rules in H.263,
'10' indicates that luminance motion vectors have full-sample precision and that chrominance motion vectors are derived from luminance motion vectors according to the rules in H.261 Section 3.2.2 (dividing by two and truncating toward zero to full-sample values), and
'11' is reserved.
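For the '10' case, the derivation stated above (dividing by two and truncating toward zero) maps directly onto integer division in C99 and later, which truncates toward zero. The function name below is hypothetical, and the sketch covers only this case, not the MPEG-2 or H.263 derivation rules.

```c
/* Hypothetical helper for bMVprecisionAndChromaRelation == '10':
 * derives one full-sample chrominance motion vector component from a
 * full-sample luminance component. C99 integer division truncates
 * toward zero, as this case requires. */
static int chroma_mv_full_sample(int luma_mv)
{
    return luma_mv / 2;
}
```

An arithmetic right shift would not be equivalent for negative odd values, since it rounds toward negative infinity on typical implementations.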
bChromaFormat: Affects the number of prediction error blocks expected by the accelerator. This variable is defined in Section 6.3.5 and Table 6-5 of MPEG-2. For MPEG-1, MPEG-2 "Main Profile," H.261 and H.263 bitstreams, this value shall always be set to '01', indicating "4:2:0" format. The value '10' indicates "4:2:2", and '11' indicates "4:4:4" sampling. Shall be equal to '01' if bConfig4GroupedCoefs is "1" (since bConfig4GroupedCoefs does not include the EOB indication needed within coefficient data in 4:2:2 and 4:4:4 formats).
NOTE: Horizontal chroma siting differs slightly between H.261, H.263, and MPEG-1 versus MPEG-2 and MPEG-4. This difference is assumed to be small enough to ignore.
bPicScanFixed: When using accelerator-based IDCT processing of residual difference blocks, a value of "1" for this flag indicates that the inverse-scan method is the same for all macroblocks in the picture, and a value of "0" indicates that it is not. Shall be "1" if bConfigHostInverseScan is "1" or if bConfigResidDiffAccelerator is "0".
bPicScanMethod: When bPicScanFixed is "1", this field indicates the fixed inverse scan method for the picture. When bPicScanFixed is "0", this field has no meaning and shall be '00'. If bPicScanFixed is "1" this field shall have one of the following values:
If bConfigHostInverseScan is "0", bPicScanMethod shall be as follows:
'00' = Zig-zag scan (MPEG-2 Figure 7-2),
'01' = Alternate-vertical (MPEG-2 Figure 7-3),
'10' = Alternate-horizontal (H.263 Figure I.2 Part a),
If bConfigHostInverseScan is "1", bPicScanMethod shall be as follows:
'11' = Arbitrary scan with absolute coefficient address.
bPicReadbackRequests: Indicates whether read-back control requests are issued for the current picture to read back the values of macroblocks in the final decoded (and deblocked, if deblocking is applied with wDeblockedPictureIndex equal to wDecodedPictureIndex) picture. A value of "1" indicates that read-back requests are present, and "0" indicates that they are not.
bRcontrol: This flag is defined in H.263 Section 6.1.2. It defines the rounding method to be used for half-sample motion compensation. A value of "0" indicates the half-sample rounding method found in MPEG-1, MPEG-2, and the first version of H.263. A value of "1" indicates the rounding method which includes a downward averaging bias which can be selected in some optional modes of H.263 and MPEG-4. It is meaningless for H.261, since H.261 has no half-sample motion compensation. It shall be set to "0" for all MPEG-1 and MPEG-2 bitstreams in order to conform with the rounding operator defined by those standards.
bPicSpatialResid8: A value of "1" indicates that spatial-domain difference blocks for host-based residual difference decoding can be sent using 8-bit samples, and a value of "0" indicates that they cannot. Shall be "0" if bConfigResidDiffHost is "0" or if BPP > 8. Shall be "1" if BPP is "8" and bPicIntra is "1" and bConfigResidDiffHost is "1". Shall be "1" if bConfigSpatialResid8 is "1". If equal to "1", bPicSpatialResid8 indicates that spatial-domain intra macroblocks are sent as 8-bit values (which are either signed or unsigned as determined by bConfigIntraResidUnsigned) and that spatial-domain non-intra macroblock differences are sent as signed 8-bit difference values relative to some motion compensated prediction. bPicSpatialResid8 differs from bConfigSpatialResid8 in that it is an indication for a particular picture, not a global indication for the entire video sequence. In some cases such as in an intra picture with BPP equal to "8", bPicSpatialResid8 will be "1" even though bConfigSpatialResid8 may be "0".
bPicOverflowBlocks: A value of "1" indicates that spatial-domain difference blocks for host-based residual difference decoding of a picture may be sent using "overflow" blocks, and a value of "0" indicates that they will not be. Shall be "0" if bConfigResidDiffHost is "0" or if bConfigSpatialResid8 is "0" or if BPP > 8. bPicOverflowBlocks is an indication of whether any overflow blocks may be present for the particular picture. In an intra picture with BPP equal to "8", bPicOverflowBlocks shall be "0" as no overflow blocks are needed in this case.
bPicExtrapolation: This flag indicates whether motion vectors over picture boundaries are allowed as specified by H.263 Annex D and MPEG-4. This requires either allocation of picture planes which are two macroblocks wider (one extra macroblock at the left and another at the right) and two macroblocks taller (one extra macroblock at the top and another at the bottom) than the decoded picture size, or clipping of the address of each individual pixel access to within the picture boundaries. Macroblock addresses in this specification are for macroblocks in the interior of the picture, not including padding.
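Of the two approaches mentioned, the per-access address clipping can be sketched in C as follows. The function names are hypothetical, and the sketch assumes an 8-bit sample plane addressed with integer coordinates.

```c
#include <stdint.h>

/* Clips a coordinate to the range 0 .. size - 1. */
static int clip_coordinate(int v, int size)
{
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

/* Hypothetical helper: reads the sample at (x, y) from an 8-bit plane,
 * clipping the address to within the picture boundaries so that
 * references outside the picture return the nearest edge sample. */
static uint8_t fetch_sample_clipped(const uint8_t *plane, int stride,
                                    int width, int height, int x, int y)
{
    return plane[clip_coordinate(y, height) * stride +
                 clip_coordinate(x, width)];
}
```

The alternative of allocating padded picture planes trades additional memory for the removal of these per-sample comparisons.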
bPicDeblocked: Indicates whether deblocking commands are sent for this picture for creating a deblocked output picture in the picture buffer indicated in wDeblockedPictureIndex. If bPicDeblocked is "1", deblocking commands are sent and the deblocked frame shall be generated, and if bPicDeblocked is "0", no deblocking commands are sent and no deblocked picture shall be generated.
bPicDeblockConfined: Indicates whether deblocking filter command buffers contain commands which confine the effect of the deblocking filter operations to within the same set of macroblocks as are contained in the buffer.
bPic4MVallowed: Specifies whether four forward-reference motion vectors per macroblock are allowed as used in H.263 Annexes F and J.
bPicOBMC: Specifies whether motion compensation for the current picture operates using overlapped block motion compensation (OBMC) as specified in H.263 Annex F. Shall be zero if bPic4MVallowed is "0".
bPicBinPB: Specifies whether bi-directionally-predicted macroblocks in the picture use "B in PB" motion compensation, which restricts the bi-directionally predicted area for each macroblock to the region of the corresponding macroblock in the backward reference picture, as specified in Annexes G and M of H.263.
bMV_RPS: Specifies use of motion vector reference picture selection. If "1", this indicates that a reference picture index is sent for each motion vector rather than just forward and possibly backward motion picture indexes for the picture as a whole. If bMV_RPS is "1", the parameters wForwardRefPictureIndex and wBackwardRefPictureIndex have no meaning and shall be zero.
wBitstreamFcodes: When bConfigBitstreamRaw is "1", this parameter contains four motion vector f_code values as defined in MPEG-2. Each f_code value takes four bits. These values are packed into a sixteen bit word as in MPEG-2 as follows:
Bits 12-15 (the MSBs): f_code[0][0]: The forward horizontal f_code
Bits 8-11: f_code[0][1]: The forward vertical f_code
Bits 4-7: f_code[1][0]: The backward horizontal f_code
Bits 0-3 (the LSBs): f_code[1][1]: The backward vertical f_code
If any f_code value is unnecessary or irrelevant due to the structure of the bitstream data, due to bConfigBitstreamRaw being "0", or due to the f_code parameter not being needed in the relevant video coding bitstream syntax (such as in H.261 or H.263), then the f_code value shall be 0xF.
NOTE: MPEG-1 bitstreams provide this information in a different form. Therefore for MPEG-1 bitstreams, f_code[0][0] and f_code[0][1] shall be equal to MPEG-1's forward_f_code and f_code[1][0] and f_code[1][1] shall be equal to MPEG-1's backward_f_code.
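The packing of the four f_code values into wBitstreamFcodes can be sketched as shown below. The function name is hypothetical; each argument is masked to four bits, and a caller would pass 0xF for any f_code value that is unnecessary or irrelevant.

```c
#include <stdint.h>

/* Hypothetical helper: packs four 4-bit f_code values into a 16-bit word.
 * fwd_h = f_code[0][0] (bits 12-15), fwd_v = f_code[0][1] (bits 8-11),
 * bwd_h = f_code[1][0] (bits 4-7),   bwd_v = f_code[1][1] (bits 0-3). */
static uint16_t pack_bitstream_fcodes(unsigned fwd_h, unsigned fwd_v,
                                      unsigned bwd_h, unsigned bwd_v)
{
    return (uint16_t)(((fwd_h & 0xFu) << 12) |
                      ((fwd_v & 0xFu) << 8) |
                      ((bwd_h & 0xFu) << 4) |
                      (bwd_v & 0xFu));
}
```

When no f_code value is needed, such as when bConfigBitstreamRaw is "0", every field is 0xF and the packed word is 0xFFFF.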
wBitstreamPCEelements: When bConfigBitstreamRaw is "1", this parameter contains a set of flags necessary for the bitstream decoding process of MPEG-2 video. It is not used and shall be "0" when bConfigBitstreamRaw is "0" and for non-MPEG-2 video. The bits in this parameter are defined by their correspondence with bitstream elements of the MPEG-2 picture coding extension as follows:
Bits 14 and 15: IntraDCprecision = intra_dc_precision
Bits 12 and 13: AnotherPicStructure = picture_structure. Shall be equal to bPicStructure.
Bit 11: TopFieldFirst = top_field_first
Bit 10: FrameDCTprediction = frame_pred_frame_dct
Bit 9: ConcealmentMVs = concealment_motion_vectors
Bit 8: QuantScaleType = q_scale_type
Bit 7: IntraVLCformat = intra_vlc_format
Bit 6: AlternateScan = alternate_scan
Bit 5: RepeatFirstField = repeat_first_field (not needed by the accelerator)
Bit 4: Chroma420type = chroma_420_type (not needed by the accelerator and restricted by MPEG-2 to be equal to progressive_frame)
Bit 3: ProgressiveFrame = progressive_frame
Bits 0, 1, and 2 (the LSBs): ReservedBits
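The bit assignments above can be illustrated with a packing routine. The struct and function names in the following C sketch are hypothetical, and the sketch sets the reserved bits to zero, which is an assumption since the text above does not state a value for them.

```c
#include <stdint.h>

/* Hypothetical container for the MPEG-2 picture coding extension elements. */
typedef struct {
    unsigned intra_dc_precision;          /* 2 bits */
    unsigned picture_structure;           /* 2 bits, equal to bPicStructure */
    unsigned top_field_first;             /* 1 bit each from here on */
    unsigned frame_pred_frame_dct;
    unsigned concealment_motion_vectors;
    unsigned q_scale_type;
    unsigned intra_vlc_format;
    unsigned alternate_scan;
    unsigned repeat_first_field;
    unsigned chroma_420_type;
    unsigned progressive_frame;
} PictureCodingExtension;

/* Packs the elements into the wBitstreamPCEelements layout. Bits 0-2 are
 * left as zero (assumption of this sketch). */
static uint16_t pack_pce_elements(const PictureCodingExtension *e)
{
    return (uint16_t)(((e->intra_dc_precision & 3u) << 14) |
                      ((e->picture_structure & 3u) << 12) |
                      ((e->top_field_first & 1u) << 11) |
                      ((e->frame_pred_frame_dct & 1u) << 10) |
                      ((e->concealment_motion_vectors & 1u) << 9) |
                      ((e->q_scale_type & 1u) << 8) |
                      ((e->intra_vlc_format & 1u) << 7) |
                      ((e->alternate_scan & 1u) << 6) |
                      ((e->repeat_first_field & 1u) << 5) |
                      ((e->chroma_420_type & 1u) << 4) |
                      ((e->progressive_frame & 1u) << 3));
}
```

Masking each element to its field width keeps an out-of-range input from disturbing neighboring fields.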
bBitstreamConcealmentNeed: Specifies, when bConfigBitstreamRaw is "1", whether there is a significant likelihood of the bitstream data containing errors. Shall be "0" if bConfigBitstreamRaw is "0". Video accelerators must be designed not to crash or lock up, regardless of the content of the data given to them. However, it may be helpful for a video accelerator to have information about the host's assessment of the likelihood of syntactical errors in order to determine whether there is a need to be able to invoke some more complex error concealment algorithms which might slow down the bitstream decoding process. Allowed values for this parameter are as follows (all other values are reserved):
"0": indicates that the bitstream is unlikely to contain any significant amount of errors in its syntactical format.
"1": indicates that the bitstream may contain some errors, but these errors are likely to be infrequent (e.g., some error once or twice per hour).
"2": indicates that the bitstream is likely to contain some errors, and that these errors are likely to occur with a frequency that could have some impact on the user experience (e.g., some error every five to ten minutes), but that these errors are not likely to be a constant presence.
"3": indicates that the bitstream is likely to contain relatively significant, serious, and frequent syntactical format errors (e.g., one per minute or more frequent errors).
bBitstreamConcealmentMethod: Specifies, when bConfigBitstreamRaw is "1", a preferred default method for error concealment processing. Shall be "0" if bConfigBitstreamRaw is "0". Allowed values for this parameter are as follows (all other values are reserved):
0: Preferred concealment method unknown or unspecified.
1: Preferred concealment method is to use spatial intra-picture concealment within the picture.
2: Preferred concealment method is to use the forward-motion reference picture for inter-picture concealment (to be used most typically in a P picture or in a B picture that is temporally closer to its forward-motion reference picture than to its backward-motion reference picture).
3: Preferred concealment method is to use the backward-motion reference picture for inter-picture concealment (to be used most typically in a B picture that is temporally closer to its backward-motion reference picture than to its forward-motion reference picture).
A decoded picture shall contain one or more macroblock control command buffers if it does not contain bitstream data buffers. The decoding process for every macroblock shall be addressed (only once) in some buffer of each type that is used. For every macroblock control command buffer, there shall be a corresponding IDCT residual coding buffer containing the same set of macroblocks. If one or more deblocking filter control buffers are sent, the set of macroblocks in each deblocking filter control buffer shall be the same as the set of macroblocks in corresponding macroblock control and IDCT residual coding buffers.
Processing of the picture requires that the motion prediction for each macroblock precede the addition of the IDCT residual data. This can be accomplished either by processing the motion prediction commands first and then reading this data back in from the destination picture buffer while processing the IDCT residual coding commands, or by processing these two buffers in a coordinated fashion - adding the residual data to the prediction before writing the result to the destination picture buffer. The motion prediction command and IDCT residual coding command for each macroblock affect only the rectangular region within that macroblock.
A deblocking filter command for a macroblock may require access to read the reconstructed values of two rows and two columns of samples neighboring the current macroblock at the top and left as well as reconstructed values within the current macroblock. It can result in modification of one row and one column of samples neighboring the current macroblock at the top and left as well as up to three rows and three columns within the current macroblock. The filtering process for a given macroblock may therefore require the prior reconstruction of other macroblocks. Two different types of deblocking filter buffers are defined herein: 1) a buffer type which requires access and modification of the value of reconstructed samples for macroblocks outside the current buffer (when bPicDeblockConfined is "0"), and 2) a buffer type which does not (when bPicDeblockConfined is "1"). To process the first of these two types of deblocking command buffer, the accelerator must ensure that the reconstruction has been completed for all buffers which affect macroblocks to the left and top of the macroblocks in the current buffer before processing the deblocking commands in the current buffer. Processing the second of these two types requires only prior reconstruction values within the current buffer. The deblocking post-processing can be conducted either by processing the motion prediction and IDCT residual coding commands for the entire buffer or frame first, followed by reading back in the values of some of the samples and modifying them as a result of the deblocking filter operations, or by processing the deblocking command buffer in a coordinated fashion with the IDCT residual coding buffer - performing the deblocking before writing the final output values to the destination picture buffer.
Note also that the destination picture buffer for the deblocked picture may differ from that of the reconstructed picture prior to deblocking, in order to support "outside the loop" deblocking as a post-processing operation which does not affect the sample values used for prediction of the next picture.
if(bPicIntra) NumMV = 0; else if(PicOBMC) else DXVA_MB_Control else if(HostResidDiff) ReservedBits2 else if(bChromaFormat == '01') for(i=0; i<6; i++) bNumCoef[i] else if(bPicIntra != 0) if(bMV_RPS) for(i=0; i<NumMV; i++) bRefPicSelect[i] ReservedBits4 } |
// Alignment to 16 Bytes // 16b Which macroblock // 16b Control bits 8b // 24b Where in buffer // 16b Which blks coded // 16b Which blks oflow // 32b // 48b 8b * 6 blocks = 48b // 16 b // 32 b // 16b * NumMV vectors // 16b * NumMV vectors 8b * NumMV vectors 16 Byte Align |
wMBaddress: Specifies the macroblock address of the current macroblock in raster scan order (0 being the address of the top left macroblock, PicWidthInMBminus1 being the address of the top right macroblock, and PicHeightInMBminus1 * PicWidthInMB being the address of the bottom left macroblock, and PicHeightInMBminus1 * PicWidthInMB + PicWidthInMBminus1 being the address of the bottom right macroblock).
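The raster-scan addressing described above can be sketched as a one-line computation. This is only an illustration: the function name and the pic_width_in_mb parameter (taken here to mean PicWidthInMBminus1 + 1) are hypothetical and do not appear in the interface definition.

```c
#include <stdint.h>

/* Raster-scan macroblock address of the macroblock at (mb_row, mb_col).
 * pic_width_in_mb is a hypothetical name for PicWidthInMBminus1 + 1. */
static uint16_t mb_address(unsigned mb_row, unsigned mb_col,
                           unsigned pic_width_in_mb)
{
    return (uint16_t)(mb_row * pic_width_in_mb + mb_col);
}
```

For a picture 45 macroblocks wide and 30 macroblocks high, for example, the four corner addresses are 0, 44, 1305, and 1349.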
wMBtype: Specifies the type of macroblock being processed as described below:
bit 15: MvertFieldSel[3] (The MSB),
bit 14: MvertFieldSel[2],
bit 13: MvertFieldSel[1],
bit 12: MvertFieldSel[0]: Specifies vertical field selection for corresponding motion vectors sent later in the macroblock control command, as specified in further detail below. For frame-based motion with a frame picture structure (e.g., for H.261 and H.263), these bits shall all be zero. The use of these bits is the same as that specified for the corresponding bits in Section 6.3.17.2 of MPEG-2.
bit 11: ReservedBits.
bit 10: HostResidDiff: Specifies whether spatial-domain residual difference decoded blocks are sent or whether transform coefficients are sent for off-host IDCT for the current macroblock. Shall be "0" if bConfigResidDiffHost is "0". Shall be "1" if bConfigResidDiffAccelerator is "0".
bits 9 and 8: MotionType: Specifies the motion type in the picture, as specified in further detail below. For frame-based motion with a frame picture structure (e.g., for H.261), these bits shall be '10' (frame motion). The use of these bits is defined in Table 1 below and corresponds directly to the use of frame_motion_type or field_motion_type bits in Section 6.3.17.1 and Tables 6-17 and 6-18 of MPEG-2 when these bits are present in MPEG-2.
bits 7 and 6: MBscanMethod: Shall equal bPicScanMethod if bPicScanFixed is "1". If bConfigHostInverseScan is "0", MBscanMethod shall be as follows:
'00' = Zig-zag scan (MPEG-2 Figure 7-2),
'01' = Alternate-vertical (MPEG-2 Figure 7-3),
'10' = Alternate-horizontal (H.263 Figure I.2 Part a),
If bConfigHostInverseScan is "1", MBscanMethod shall be equal to:
'11' = Arbitrary scan with absolute coefficient address.
bit 5: FieldResidual: A flag indicating whether the IDCT blocks use a field IDCT structure as specified in MPEG-2.
bit 4: H261LoopFilter: A flag specifying whether the H.261 loop filter (Section 3.2.3 of H.261) is active for the current macroblock prediction. The H.261 loop filter is a separable ¼, ½, ¼ filter applied both horizontally and vertically to all six blocks in an H.261 macroblock except at block edges where one of the taps would fall outside the block. In such cases the filter is changed to have coefficients 0, 1, 0. Full arithmetic precision is retained with rounding to 8-bit integers at the output of the 2-D filter process (half-integer or higher values being rounded up).
bit 3: Motion4MV: A flag indicating that forward motion uses a distinct motion vector for each of the four luminance blocks in the macroblock, as used in H.263 Annexes F and J. Motion4MV shall be "0" if MotionForward is "0" or bPic4MVallowed is "0".
bit 2: MotionBackward: A flag used as specified for the corresponding macroblock_motion_backward parameter in MPEG-2. If bPicBackwardPrediction is "0", MotionBackward shall be "0". Further information on the use of this flag is given below.
bit 1: MotionForward: A flag used as specified for the corresponding macroblock_motion_forward in MPEG-2. Further information on the use of this flag is given below.
bit 0: IntraMacroblock: (The LSB) A flag indicating that the macroblock is coded as "intra", and no motion vectors are used for the current macroblock. Corresponds to macroblock_intra in MPEG-2. Further information on the use of this flag is given below.
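As an illustration of the bit layout listed above, the following sketch unpacks and repacks a wMBtype word. The structure and function names are hypothetical helpers, not part of the interface; only the bit positions are taken from the list above, and the reserved bit 11 is written as zero.

```c
#include <stdint.h>

/* Hypothetical container for the wMBtype fields described above. */
typedef struct {
    unsigned intra_macroblock;  /* bit 0 */
    unsigned motion_forward;    /* bit 1 */
    unsigned motion_backward;   /* bit 2 */
    unsigned motion_4mv;        /* bit 3 */
    unsigned h261_loop_filter;  /* bit 4 */
    unsigned field_residual;    /* bit 5 */
    unsigned mb_scan_method;    /* bits 7..6 */
    unsigned motion_type;       /* bits 9..8 */
    unsigned host_resid_diff;   /* bit 10 */
    unsigned mvert_field_sel;   /* bits 15..12 = MvertFieldSel[3..0] */
} MbTypeFields;

static MbTypeFields unpack_mb_type(uint16_t w)
{
    MbTypeFields f;
    f.intra_macroblock = w & 1u;
    f.motion_forward   = (w >> 1) & 1u;
    f.motion_backward  = (w >> 2) & 1u;
    f.motion_4mv       = (w >> 3) & 1u;
    f.h261_loop_filter = (w >> 4) & 1u;
    f.field_residual   = (w >> 5) & 1u;
    f.mb_scan_method   = (w >> 6) & 3u;
    f.motion_type      = (w >> 8) & 3u;
    f.host_resid_diff  = (w >> 10) & 1u;
    f.mvert_field_sel  = (w >> 12) & 0xFu;
    return f;
}

/* Rebuilds the 16-bit word; the reserved bit 11 is left at zero. */
static uint16_t pack_mb_type(const MbTypeFields *f)
{
    return (uint16_t)(((f->mvert_field_sel & 0xFu) << 12) |
                      ((f->host_resid_diff & 1u) << 10) |
                      ((f->motion_type & 3u) << 8) |
                      ((f->mb_scan_method & 3u) << 6) |
                      ((f->field_residual & 1u) << 5) |
                      ((f->h261_loop_filter & 1u) << 4) |
                      ((f->motion_4mv & 1u) << 3) |
                      ((f->motion_backward & 1u) << 2) |
                      ((f->motion_forward & 1u) << 1) |
                      (f->intra_macroblock & 1u));
}
```

A forward-predicted macroblock with frame motion (MotionType '10', MotionForward "1", all other fields zero) packs to 0x0202 under this layout.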
MBskipsFollowing: Specifies the number of "skipped macroblocks" to be generated following the current macroblock. Each skipped macroblock shall be generated in a manner mathematically equivalent to incrementing the value of wMBaddress and then repeating the same macroblock control command.
Any macroblock control command with a non-zero value for MBskipsFollowing contains the explicit specification of how motion-compensated prediction is to be performed for each macroblock to be skipped, and is equivalent (except for the value of MBskipsFollowing) to an explicit "non-skip" specification of the generation of the first of the series of skipped macroblocks. Thus, whenever MBskipsFollowing is not "0", the following parameters shall all be equal to "0":
Motion4MV,
IntraMacroblock,
wPatternCode,
wPC_Overflow (if present),
bNumCoef[i] (if present).
Furthermore, the generation of following skipped macroblocks shall be restricted by the decoder to not include a "wrap" to a new row of macroblocks in the picture. In other words, a separate macroblock control command must be sent to generate the first (but not necessarily the last) macroblock of each row of macroblocks.
NOTES:
The generation of a skipped macroblock in this context differs somewhat from that in MPEG-2 Section 7.6.6. In this specification, the manner in which the skipped macroblocks are generated is specified in a separate macroblock control command, rather than being inferred from the type of the preceding non-skipped macroblock and the type of picture (e.g., in MPEG-2 the method of generating skipped macroblocks depends on whether the picture is a P picture or B picture). To illustrate this point, we provide the following example. Suppose that in an MPEG-2 bitstream, macroblock 0 is coded with a residual difference, macroblock 1 is skipped, macroblock 2 is coded with a residual difference, macroblocks 3, 4, and 5 are skipped, and then macroblock 6 is coded with a residual difference. These seven macroblocks would require the generation of (at least) five DirectX VA macroblock control commands. The minimal five control commands would be characterized by the following:
a. the first would have wMBaddress = "0" and MBskipsFollowing = "0",
b. the second would have wMBaddress = "1" and MBskipsFollowing = "0",
c. the third would have wMBaddress = "2" and MBskipsFollowing = "0",
d. the fourth would have wMBaddress = "3" and MBskipsFollowing = "2", and
e. the fifth would have wMBaddress = "6" and MBskipsFollowing = "0".
The following conditions are required as specified above:
a. skipped macroblocks have no residual differences,
b. skipped macroblocks can be generated by repeating the operation of a macroblock control command with an incremented wMBaddress, and
c. macroblock skipping is restricted to not wrap to a new row of macroblocks.
Because of the three conditions listed in note 2, an accelerator may actually implement motion compensation (when Motion4MV is "0") as the application of the specified motion vectors to a rectangle of width equal to MacroblockWidth·(MBskipsFollowing+1) in the luminance component (and to a similarly-specified rectangle in the chrominance components), rather than as MBskipsFollowing+1 repetitions of the same macroblock control operation.
The generation of macroblocks indicated as skipped in H.263 with Advanced Prediction mode (Annex F) active requires coding some "skipped" macroblocks as non-skipped macroblocks using this specification - in order to specify the OBMC effect within these macroblocks.
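The skip-run rules in these notes can be sketched as follows. This is a minimal illustration, not a definitive implementation: the SkipAwareCommand structure and build_commands function are hypothetical, only two fields of the macroblock control command are modeled, and MBskipsFollowing is assumed to be an 8-bit field (so a run is capped at 256 macroblocks). Each coded macroblock gets its own command with MBskipsFollowing equal to zero, and each run of skipped macroblocks gets one command that is never extended across a row boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field subset of a macroblock control command. */
typedef struct {
    uint16_t wMBaddress;
    uint8_t  MBskipsFollowing;  /* assumed here to be 8 bits wide */
} SkipAwareCommand;

/* coded[i] is nonzero if macroblock i carries residual data, zero if it is
 * skipped. pic_width_in_mb must be nonzero. Writes one command per coded
 * macroblock and one per run of skipped macroblocks; returns the number of
 * commands written to out (which needs room for num_mb entries). */
static size_t build_commands(const uint8_t *coded, size_t num_mb,
                             size_t pic_width_in_mb, SkipAwareCommand *out)
{
    size_t n = 0;
    size_t i = 0;
    while (i < num_mb) {
        size_t run = 1;
        if (!coded[i]) {
            /* Extend the run while the next macroblock is also skipped,
             * lies in the same row, and the count still fits in 8 bits. */
            while (i + run < num_mb && !coded[i + run] &&
                   (i + run) % pic_width_in_mb != 0 && run < 256) {
                run++;
            }
        }
        out[n].wMBaddress = (uint16_t)i;
        out[n].MBskipsFollowing = (uint8_t)(run - 1);
        n++;
        i += run;
    }
    return n;
}
```

Applied to the seven-macroblock example in note 1 (coded, skipped, coded, skipped, skipped, skipped, coded), this produces the five commands listed there.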
MBdataLocation: An index into the IDCT residual coding block data buffer, indicating the location of the residual difference data for the blocks of the current macroblock, expressed as a multiple of 32 bits. Shall be "0" for the first macroblock in the macroblock control command buffer. May contain any value if wPatternCode is "0". When wPatternCode is "0", decoders are recommended but not required to set this parameter either to "0" or to the same value as in the next macroblock control command.
wPatternCode: When using host-based residual difference decoding, bit 11-i of wPatternCode (where bit 0 is the LSB) indicates whether a residual difference block is sent for block i, where i is the index of the block within the macroblock as specified in MPEG-2 Figures 6-10, 6-11, and 6-12 (raster-scan order for Y, followed by 4:2:0 blocks of Cb in raster-scan order, followed by 4:2:0 blocks of Cr, followed by 4:2:2 blocks of Cb, followed by 4:2:2 blocks of Cr, followed by 4:4:4 blocks of Cb, followed by 4:4:4 blocks of Cr). The data for the coded blocks (those blocks having bit 11-i equal to 1) is found in the residual coding buffer in the same indexing order (increasing i). For 4:2:0 MPEG-2 data, the value of wPatternCode corresponds to shifting the decoded value of CBP left by six bit positions (those lower bit positions being for the use of 4:2:2 and 4:4:4 chroma formats).
If bConfigSpatialResidInterleaved is "1", host-based residual differences are sent in a chroma-interleaved form matching that of the YUV pixel format in use. In this case each Cb and spatially-corresponding Cr pair of blocks is treated as a single residual difference data structure unit. This does not alter the value or meaning of wPatternCode, but it implies that both members of each pair of Cb and Cr data blocks are sent whenever either of these data blocks has the corresponding bit set in wPatternCode. If the bit in wPatternCode for a particular data block is zero, the corresponding residual difference data values shall be sent as zero whenever this pairing necessitates sending a residual difference data block for a block with a wPatternCode bit equal to zero.
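The relationship between wPatternCode and the MPEG-2 coded block pattern for 4:2:0 data can be sketched as below. The function names are hypothetical; the shift by six bit positions and the "bit 11-i" convention are taken from the description of wPatternCode above.

```c
#include <stdint.h>

/* 4:2:0 MPEG-2 case: the 6-bit CBP is shifted left by six bit positions. */
static uint16_t pattern_code_from_cbp420(uint8_t cbp)
{
    return (uint16_t)((cbp & 0x3Fu) << 6);
}

/* Returns 1 if a residual difference block is sent for block i (0..11),
 * i.e. if bit 11-i of wPatternCode is set; returns 0 otherwise. */
static int block_is_coded(uint16_t wPatternCode, unsigned i)
{
    if (i > 11) {
        return 0;
    }
    return (wPatternCode >> (11 - i)) & 1;
}

/* Number of residual difference blocks signalled by wPatternCode. */
static unsigned count_coded_blocks(uint16_t wPatternCode)
{
    unsigned count = 0;
    for (unsigned i = 0; i < 12; i++) {
        count += (unsigned)block_is_coded(wPatternCode, i);
    }
    return count;
}
```

A 4:2:0 macroblock with all six blocks coded (CBP = 0x3F) therefore has wPatternCode equal to 0x0FC0, with blocks 0 through 5 marked as coded and blocks 6 through 11 unused.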
wPC_Overflow: When using host-based residual difference decoding with bPicOverflowBlocks equal to "1" and IntraMacroblock equal to "0" (the 8-8 overflow method), wPC_Overflow contains the pattern code of the overflow blocks as specified in the same manner as for wPatternCode. The data for the coded overflow blocks (those blocks having bit 11-i equal to 1) is found in the residual coding buffer in the same indexing order (increasing i).
bNumCoef[i]: Indicates the number of coefficients in the residual difference data buffer for each block i of the macroblock, where i is the index of the block within the macroblock as specified in MPEG-2 Figures 6-10, 6-11, and 6-12 (raster-scan order for Y, followed by Cb, followed by Cr). Used only when HostResidDiff is "0" and bChromaFormat is '01' (4:2:0). (If used in 4:2:2 or 4:4:4 formats, it would increase the size of typical macroblock control commands past a critical memory alignment boundary, so only EOB is used for determining the number of coefficients in each block in non-4:2:0 cases.) The data for these coefficients is found in the residual difference buffer in the same order.
wTotalNumCoef: Indicates the total number of coefficients in the residual difference data buffer for the entire macroblock. Used only when HostResidDiff is "0" and bChromaFormat is not '01' (4:2:0).
MVector[i].horz, MVector[i].vert: Specifies the value of a motion vector in horizontal and vertical dimensions. The two-dimensional union of these two values is referred to as MVvalue[i]. Each dimension of each motion vector contains a signed integer motion offset in half-sample units. Both elements shall be even if bMVprecisionAndChromaRelation is '10' (H.261-style motion supporting only integer-sample offsets).
bRefPicSelect[i]: Specifies the reference picture buffer used in prediction for MVvalue[i] when motion vector reference picture selection is in use.
Valid combinations of IntraMacroblock, MotionForward, MotionBackward, MotionType, MvertFieldSel, and MVector[i] are shown in Table 1 below. The values specified in the table are for when H261LoopFilter, Motion4MV and bPicOBMC are all zero.
Table 1 - Use of Macroblock Parameters for Various Motion Types
IntraMacroblock, MotionForward, MotionBackward | MotionType (meaning depends on picture type) | MVector[0], MvertFieldSel[0] (1st, dir1) | MVector[1], MvertFieldSel[1] (1st, dir2) | MVector[2], MvertFieldSel[2] (2nd, dir1) | MVector[3], MvertFieldSel[3] (2nd, dir2)
Frame Structured Pictures (bPicStructure = '11')
1,0,0 (intra) | '00' (intra) | | | |
0,0,0 (no motion) | '10' (no motion) | | | |
0,1,0 | '10' (frame MC) | PMV[0][0] | | |
0,0,1 | '10' (frame MC) | | PMV[0][1] | |
0,1,1 | '10' (frame MC) | PMV[0][0] | PMV[0][1] | |
0,1,0 | '01' (field MC) | PMV[0][0] sel[0][0] | | PMV[1][0] sel[1][0] |
0,0,1 | '01' (field MC) | | PMV[0][1] sel[0][1] | | PMV[1][1] sel[1][1]
0,1,1 | '01' (field MC) | PMV[0][0] sel[0][0] | PMV[0][1] sel[0][1] | PMV[1][0] sel[1][0] | PMV[1][1] sel[1][1]
0,1,0 | '11' (dual-prime) | PMV[0][0] 0 (top) | vector'[2][0][0], vector'[2][0][1]<<1 1 (bot) | PMV[0][0] | vector'[3][0][0], vector'[3][0][1]<<1
Field Structured Pictures (bPicStructure == '01' or '10')
1,0,0 (intra) | '00' (intra) | | | |
0,0,0 (no motion) | '01' (no motion) | PicCurrentField | | |
0,1,0 | '01' (field MC) | PMV[0][0] sel[0][0] | | |
0,0,1 | '01' (field MC) | | PMV[0][1] sel[0][1] | |
0,1,1 | '01' (field MC) | PMV[0][0] sel[0][0] | PMV[0][1] sel[0][1] | |
0,1,0 | '10' (16x8 MC) | PMV[0][0] sel[0][0] | | PMV[1][0] sel[1][0] |
0,0,1 | '10' (16x8 MC) | | PMV[0][1] sel[0][1] | | PMV[1][1] sel[1][1]
0,1,1 | '10' (16x8 MC) | PMV[0][0] sel[0][0] | PMV[0][1] sel[0][1] | PMV[1][0] sel[1][0] | PMV[1][1] sel[1][1]
0,1,0 | '11' (dual-prime) | PMV[0][0] PicCurrentField | vector'[2][0] !PicCurrentField | |
NOTES:
This is an extremely important table. It must be inspected very closely and fully understood. Experience has shown that there will be bugs in any implementation that fails to properly account for every utilized macroblock type in the table (especially for field-structured pictures or dual-prime motion).
In a number of places in the above table the variable PMV is used rather than the raw motion vector value. This distinguishes PMV, which is in frame coordinates, from a motion vector which may be in field coordinates (i.e., at half-vertical resolution). In all such cases above, PMV refers to the value of PMV after updating by the current motion vector.
The definitions of vector'[2][0] and vector'[3][0] are found in MPEG-2 Section 7.6.3.6. The shift operation shown indicates that the vertical component is modified to frame coordinates.
In both "no motion" cases (0,0,0) above, the macroblock parameters emulate a forward prediction macroblock (0,1,0) with a zero-valued motion vector. (See also MPEG-2 Section 7.6.3.5.)
The remaining allowed cases are as follows:
H261LoopFilter == 1 && bPicOBMC == 0 && Motion4MV == 0: This indicates that one forward motion vector is sent in MVector[0] and the H.261 loop filter as specified in Section 3.2.3 of H.261 is active for the forward prediction in the macroblock. MotionForward shall be "1" in this case and IntraMacroblock and MotionBackward shall both be "0".
bPicOBMC == 0 && Motion4MV == 1: This indicates that four forward motion vectors are sent in MVector[0, 1, 2, 3]. MotionForward shall be "1" in this case and IntraMacroblock shall be "0". If MotionBackward is "1", a fifth motion vector is sent for backward prediction in MVector[4].
bPicOBMC == 1 && Motion4MV == 0: This indicates that ten forward motion vectors are sent in MVector[0, 1, 2, ..., 9] for specification of OBMC motion, and that the values of the first four such motion vectors are all equal. If MotionBackward is "1", an eleventh motion vector is sent for backward prediction in MVvalue[10].
bPicOBMC == 1 && Motion4MV == 1: This indicates that ten forward motion vectors are sent in MVvalue[0, 1, 2, ..., 9] for specification of OBMC motion, and that the values of the first four such motion vectors may differ from each other. If MotionBackward is "1", an eleventh motion vector is sent for backward prediction in MVvalue[10].
NOTE: No current configuration of the H.263 standard would exercise the case in which bPicOBMC = 1, Motion4MV = 1, and MotionBackward = 1.
NOTE: The average operator is mathematically identical ((s1+s2)//2) for MPEG-1, MPEG-2 half-sample prediction filtering, bi-directional averaging, and Dual Prime same-opposite parity combining. This operator is the same as the C-language expression: (s1+s2+1)>>1. The H.263 bi-directional averaging operator does not seed with an offset of "+1" prior to downshifting. The bBidirectionalAveragingMode parameter determines which of these methods is used.
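The two averaging operators contrasted in this note can be written out as follows. The function names are hypothetical; which value of bBidirectionalAveragingMode selects which operator is not stated in this section, so no mapping is assumed here.

```c
#include <stdint.h>

/* MPEG-1 / MPEG-2 style average: half-integer values are rounded up. */
static uint8_t avg_round_up(uint8_t s1, uint8_t s2)
{
    return (uint8_t)((s1 + s2 + 1) >> 1);
}

/* H.263 bi-directional style average: no "+1" offset before the
 * downshift, so half-integer values are truncated. */
static uint8_t avg_truncate(uint8_t s1, uint8_t s2)
{
    return (uint8_t)((s1 + s2) >> 1);
}
```

The two operators differ only when s1 + s2 is odd: for inputs 1 and 2 the first yields 2 and the second yields 1.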
The interface supports three low-level methods of handling IDCT. In all cases the basic inverse quantization process, pre-IDCT range saturation, MPEG-2 mismatch control (if necessary), and intra DC offset (if necessary) are performed on the host, and the final picture reconstruction and reconstruction clipping is done on the accelerator. The first method is to pass macroblocks of transform coefficients to the accelerator for external IDCT, picture reconstruction, and reconstruction clipping. The second and third involve performing an IDCT on the host and passing blocks of spatial-domain results for external picture reconstruction and reconstruction clipping.
The inverse quantization, pre-IDCT saturation, mismatch control, intra DC offset, IDCT, picture reconstruction, and reconstruction clipping processes are defined as the following:
Performing inverse quantization as necessary (including application of any inverse quantization weighting matrices) to create a set of IDCT coefficient values
Saturating each reconstructed coefficient value
Mismatch control (described as needed for MPEG-2 only, and performed by the host) is performed by adding up the saturated values of all coefficients in the macroblock (equivalent to XORing their LSBs). If the sum is even, then the saturated value of the last coefficient
NOTE: MPEG-1 has a different form of mismatch control that consists of altering the value by plus or minus 1 for each coefficient that would otherwise be even after inverse quantization. H.263 does not require mismatch control in the sense described herein. In any case, where applicable, mismatch control is the host's responsibility.
Adding an intra DC offset if necessary (performed by the host) so that all intra blocks correspond to adding a difference to a spatial reference prediction value of 2^(BPP-1). Such an offset is necessary for all of the referenced video coding standards (H.261&3 and MPEG-1&2&4) except when HostResidDiff is "1" and bConfigIntraResidUnsigned is "1", and has the value
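Performing the unitary separable transformation (either on the host or the accelerator):
$$ f(x,y) = \sum_{u=0}^{W_T-1} \sum_{v=0}^{H_T-1} C(u)\sqrt{\frac{2}{W_T}}\; C(v)\sqrt{\frac{2}{H_T}}\; F(u,v)\, \cos\!\left[\frac{(2x+1)\,u\,\pi}{2W_T}\right] \cos\!\left[\frac{(2y+1)\,v\,\pi}{2H_T}\right] $$
where:
f(x,y) is the spatial-domain value and F(u,v) is the transform coefficient value,
C(u) = 1/sqrt(2) for u = 0, otherwise 1,
C(v) = 1/sqrt(2) for v = 0, otherwise 1,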
x and y are the horizontal and vertical spatial coordinates in the pixel domain,
u and v are the transform-domain horizontal and vertical frequency coordinates, and
WT and HT are the width and height of the transform block (normally both are eight).
NOTE: The accuracy of this IDCT process shall conform to that required in the H.261&3 and MPEG-1&2&4 video coding standards (all have very similar requirements).
Adding the spatial-domain residual information to the motion-compensated prediction value for non-intra blocks or to the constant reference value for intra blocks (this constant being 2^(BPP-1) except when HostResidDiff is "1" and bConfigIntraResidUnsigned is "1", in which case it is "0") to perform picture reconstruction (on the accelerator), and finally
Clipping the picture reconstruction to a range of [0, 2^BPP - 1] to store as the final resulting picture sample values (on the accelerator).
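Some of the steps above can be sketched in C. This is an illustration under stated assumptions rather than a definitive implementation: all function names are hypothetical; the mismatch control routine follows the MPEG-2 rule (Section 7.4.4 of MPEG-2) of toggling the least significant bit of the last coefficient when the coefficient sum is even, and it is applied to one transform block at a time as in MPEG-2; the saturation bounds are passed in by the caller (for the usual 8x8, BPP = 8 case a signed 12-bit range of [-2048, 2047] is assumed).

```c
#include <stdint.h>

/* Clamp v to the inclusive range [lo, hi]. */
static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Host side: MPEG-2 style mismatch control on one block of n already
 * saturated coefficients. If their sum is even, the last coefficient has
 * its least significant bit toggled (odd: subtract 1, even: add 1). */
static void mismatch_control(int16_t *coef, unsigned n)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += coef[i];
    }
    if ((sum & 1) == 0) {
        if (coef[n - 1] & 1) {
            coef[n - 1] -= 1;
        } else {
            coef[n - 1] += 1;
        }
    }
}

/* Constant reference value for intra blocks: 2^(BPP-1), or 0 when unsigned
 * intra residuals are in use. */
static int32_t intra_reference(unsigned bpp, int intra_resid_unsigned)
{
    return intra_resid_unsigned ? 0 : (int32_t)(1 << (bpp - 1));
}

/* Accelerator side: add the residual to the prediction (or to the intra
 * reference value) and clip the result to [0, 2^BPP - 1]. */
static unsigned reconstruct_sample(int32_t reference, int32_t residual,
                                   unsigned bpp)
{
    return (unsigned)clamp_i32(reference + residual, 0, (1 << bpp) - 1);
}
```

With BPP = 8, an intra sample with residual +200 reconstructs to 128 + 200 = 328, which is clipped to 255.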
The transfer of macroblock IDCT coefficient data for off-host IDCT processing consists of a buffer of index and value information. Index information is sent as 16 bit words (although only 6 bit quantities are really needed for 8x8 transform blocks), and transform coefficient value information is sent as signed 16 bit words (although only 12 bits are needed for the usual case of 8x8 transform blocks and BPP=8).
Transform coefficients are sent in either DXVA_TCoefSingle or DXVA_TCoef4Group data structures. If bConfig4GroupedCoefs is "0", coefficients are sent singly using DXVA_TCoefSingle structures. If bConfig4GroupedCoefs is "1", coefficients are sent in groups of four using DXVA_TCoef4Group structures. These two alternative data structures contain two similar basic elements which are defined below.
TCoefIDX: specifies the index of the coefficient in the block, as determined from bConfigHostInverseScan. There are two basic ways that TCoefIDX can be used:
Run-length ordering: When bConfigHostInverseScan is "0", MBscanMethod indicates a zig-zag, alternate-vertical, or alternate-horizontal inverse scan. In this case, TCoefIDX contains the number of zero-valued coefficients which precede the current coefficient in the specified scan order, subsequent to the last transmitted coefficient for the block (or relative to the start of the block if none preceding).
Arbitrary ordering: When bConfigHostInverseScan is "1", MBscanMethod indicates arbitrary ordering. In this case, TCoefIDX simply contains the raster index of the coefficient within the block (i.e., TCoefIDX = u + v·WT).
TCoefIDX shall never be greater than or equal to WT·HT.
TCoefValue: The value of the coefficient in the block. TCoefValue shall be clipped to the appropriate range as specified in Section 3.4.2 above by the host prior to passing the coefficient value to the accelerator for inverse DCT operation. MPEG-2 mismatch control, if necessary, is also the responsibility of the host, not the accelerator (this may require the creation of extra "phantom" nonzero coefficients).
The DXVA_TCoefSingle data structure, used whenever bConfig4GroupedCoefs is "0", is defined as:
DXVA_TCoefSingle
TCoefEOB: Indicates whether the current coefficient is the last one associated with the current block of transform coefficients. TCoefEOB is the LSB of the first word of the DXVA_TCoefSingle data structure. A value of "1" indicates that the current coefficient is the last one for the block, and a value of "0" indicates that it is not.
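The run-length ordering used with single coefficients can be sketched as below. The layout is an assumption made for illustration: a 16-bit first word holding TCoefIDX in its upper 15 bits and TCoefEOB in its LSB (the LSB position is stated above; the rest is assumed), followed by a signed 16-bit value. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration only. */
typedef struct {
    uint16_t wIndexWithEOB;  /* (TCoefIDX << 1) | TCoefEOB */
    int16_t  TCoefValue;
} TCoefSingleSketch;

/* scanned[] holds the n coefficients of one block, already arranged in the
 * scan order selected by MBscanMethod. Emits one entry per nonzero
 * coefficient, with TCoefIDX set to the count of zero-valued coefficients
 * since the previously emitted one (or since the start of the block), and
 * sets TCoefEOB on the final entry. Returns the number of entries written. */
static size_t encode_block_runlength(const int16_t *scanned, size_t n,
                                     TCoefSingleSketch *out)
{
    size_t count = 0;
    uint16_t run = 0;
    for (size_t i = 0; i < n; i++) {
        if (scanned[i] == 0) {
            run++;
            continue;
        }
        out[count].wIndexWithEOB = (uint16_t)(run << 1);
        out[count].TCoefValue = scanned[i];
        count++;
        run = 0;
    }
    if (count > 0) {
        out[count - 1].wIndexWithEOB |= 1u;  /* mark end of block */
    }
    return count;
}
```

A block whose scanned coefficients begin 5, 0, 0, -3 and are zero thereafter produces two entries: run 0 with value 5, then run 2 with value -3 and TCoefEOB set.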
The DXVA_TCoef4Group data structure, used only when bConfig4GroupedCoefs is "1" and bConfigHostInverseScan is "0", is defined as:
DXVA_TCoef4Group
In the DXVA_TCoef4Group data structure, groups of four transform coefficients are sent together at a time along with run-length values. The ith element of each array in DXVA_TCoef4Group contains element 3-i of the actual coefficient or run-length list (so the first coefficient or index goes into element 3, the next in element 2, etc.). If only NC<4 nonzero coefficients need to be sent, then TCoefIDX[i] shall be 63 (hexadecimal 0x3F) and TCoefValue[i] shall be equal to TCoefValue[4-NC] for i = 0 to 3-NC.
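The reversed element order and the padding rule for a partially filled group can be sketched as follows. The structure layout (four 8-bit indices followed by four signed 16-bit values) and all names are assumptions made for illustration only.

```c
#include <stdint.h>

/* Assumed layout for illustration only. */
typedef struct {
    uint8_t TCoefIDX[4];
    int16_t TCoefValue[4];
} TCoef4GroupSketch;

/* Packs nc (1..4) run/value pairs into one group. Pair k of the list is
 * stored in element 3-k. Unused elements 0 .. 3-nc get TCoefIDX = 63 and
 * repeat the value stored in element 4-nc. Returns 0 on success, -1 if nc
 * is out of range. */
static int pack_4group(const uint8_t *runs, const int16_t *values,
                       unsigned nc, TCoef4GroupSketch *g)
{
    if (nc < 1 || nc > 4) {
        return -1;
    }
    for (unsigned k = 0; k < nc; k++) {
        g->TCoefIDX[3 - k] = runs[k];
        g->TCoefValue[3 - k] = values[k];
    }
    for (unsigned i = 0; i + nc < 4; i++) {
        g->TCoefIDX[i] = 63;
        g->TCoefValue[i] = g->TCoefValue[4 - nc];
    }
    return 0;
}
```

With two coefficients (run 0, value 5) and (run 2, value -3), elements 3 and 2 carry the real data, and elements 1 and 0 carry index 63 with the value -3 repeated.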
IDCT may be performed on the host, with the result passed through the API. There are two supported schemes for sending the results - the 16-bit method and the 8-8 overflow method. An indication of which is used is sent in bConfigSpatialResid8.
When sending data using the 16-bit method, blocks of 16-bit data are sent sequentially. Each block of spatial-domain data consists of WT·HT values of DXVA_Sample16.
A DXVA_Sample16 is a 16-bit signed integer. If BPP is greater than 8, only the 16-bit method is allowed. If bPicIntra is "1" and BPP is "8", the 16-bit method is not allowed. If IntraMacroblock is "0", the 16-bit samples are sent as signed quantities relative to the motion-compensated prediction values. If IntraMacroblock is "1", the 16-bit samples are sent as follows:
If bConfigIntraResidUnsigned is "1", the samples are sent as unsigned quantities relative to the constant reference value of "0",
If bConfigIntraResidUnsigned is "0", the samples are sent as signed quantities relative to the constant reference value of 2^(BPP-1).
Blocks of data are sent sequentially, in the order specified by scanning wPatternCode for 1-valued bits from MSB to LSB.
NOTES:
No clipping of these values can be assumed to have been performed on the host, unless bConfigSpatialHost8or9Clipping is "1". Although only a BPP+1 bit range is needed to adequately represent the spatial-domain difference data, the output of some IDCTs will produce numbers beyond this range unless they are clipped. The accelerator shall work properly with at least a 15-bit range of values.
Although video coding standards typically specify clipping of a difference value prior to adding it to a prediction value (e.g., 9 bit clipping in 8 bit-per-sample video), this clipping stage is actually unnecessary since it has no effect on the resulting decoded output picture. We therefore do not assume that this clipping occurs herein unless necessary for the accelerator hardware as indicated by bConfigSpatialHost8or9Clipping being set to "1".
If BPP is "8", the 8-bit difference method may be used. Its use is required if bPicIntra is "1" and BPP is "8". In this case, each spatial-domain difference value is represented using only eight bits. When sending data using the 8-bit method, blocks of 8-bit data are sent sequentially. Each block of 8-bit spatial-domain residual difference data consists of WT·HT values of DXVA_Sample8.
If IntraMacroblock is "0" the 8-bit samples are signed differences to be added or subtracted (as determined from bConfigResid8Subtraction and whether the sample is in a first pass block or overflow block) relative to a motion compensation prediction. If IntraMacroblock is "0" and the difference to be represented for some pixel in a block is too large to represent using only eight bits, a second "overflow" block of samples can be sent.
If IntraMacroblock is "1", the 8-bit samples are sent as follows:
If bConfigIntraResidUnsigned is "1", the 8-bit samples are sent as unsigned quantities relative to the constant reference value of "0",
If bConfigIntraResidUnsigned is "0", the 8-bit samples are sent as signed quantities relative to the constant reference value of 2^(BPP-1).
Overflow blocks shall not be sent if IntraMacroblock is "1".
Blocks of data are sent sequentially, in the order specified by scanning wPatternCode for 1-valued bits from MSB to LSB, and then all necessary 8-bit overflow blocks are sent as specified by wPC_Overflow. Such overflow blocks are subtracted rather than added if bConfigResid8Subtraction is "1".
The first pass of 8-bit differences for each non-intra macroblock is added. If bPicOverflowBlocks is "0" or IntraMacroblock is "1", there is no second pass. If bPicOverflowBlocks is "1", IntraMacroblock is "0", and bConfigResid8Subtraction is "1", the second pass of 8-bit differences for each non-intra macroblock is subtracted. If bPicOverflowBlocks is "1", IntraMacroblock is "0", and bConfigResid8Subtraction is "0", the second pass of 8-bit differences for each non-intra macroblock is added.
If any sample is non-zero in both an original 8-bit block and in a corresponding 8-bit overflow block, then:
If bConfigResid8Subtraction is "0", the sign of the sample shall be the same in both blocks.
If bConfigResid8Subtraction is "1", the sign of the sample in the original 8-bit block shall be the same as the sign of minus one times the value of the sample in the corresponding overflow block.
This allows the sample to be added to the prediction picture with 8-bit clipping of the result after each of the two passes.
NOTE: The 8+8 method (using 8-bit differences with overflow blocks with bConfigResid8Subtraction equal to "0") cannot represent a residual difference value of +255 if IntraMacroblock is "0", which makes it not strictly compliant with video coding standards. However, this format is supported herein as it is 1) used in some existing implementations, 2) more efficient than 16-bit sample use in terms of the amount of data needed to represent a picture, and 3) not considered to normally result in any meaningful degradation of video quality.
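The way a host might split a non-intra residual difference into a first-pass sample and an overflow sample can be sketched as follows. This is one possible split that satisfies the sign rules above, not the only one; the function name is hypothetical. The first-pass sample takes as much of the difference as fits in eight signed bits and the overflow sample carries the remainder, negated when overflow blocks are subtracted.

```c
#include <stdint.h>

/* Splits a residual difference d into a first-pass 8-bit sample and an
 * 8-bit overflow sample.
 *   resid8_subtraction == 0: d == first + overflow
 *   resid8_subtraction != 0: d == first - overflow
 * Returns 0 on success, or -1 if d cannot be represented in this mode. */
static int split_residual_8bit(int d, int resid8_subtraction,
                               int8_t *first, int8_t *overflow)
{
    int f = d;
    if (f > 127) f = 127;
    if (f < -128) f = -128;
    int o = resid8_subtraction ? (f - d) : (d - f);
    if (o < -128 || o > 127) {
        return -1;
    }
    *first = (int8_t)f;
    *overflow = (int8_t)o;
    return 0;
}
```

This reproduces the limitation described in the note: with overflow blocks added, +254 splits into 127 and 127 but +255 cannot be represented, whereas with overflow blocks subtracted, +255 splits into 127 and -128.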
Deblocking filter control commands, if present, are sent for each luminance block in a macroblock and are sent once for each pair of chrominance blocks. The commands are sent in raster scan order within the macroblock, with all blocks for luminance sent before any blocks for chrominance, then one chrominance 4:2:0 command, then one chrominance 4:2:2 command if needed, then two chrominance 4:4:4 commands if needed (the same filtering is applied to both chrominance components). The filtering for each block is specified by specification of the deblocking to occur across its top edge, followed by specification of the deblocking to occur across its left edge. Deblocking is specified for chrominance only once - and the same deblocking commands are used for both the Cb and Cr components. For example, deblocking of a 16x16 macroblock which contains 4:2:0 data using 8x8 blocks is specified by sending four sets of two (top and left) edge filtering commands for the luminance blocks, followed by one set of two edge filtering commands for the chrominance. The data structure for specification of each deblocking edge is as follows:
DXVA_DeblockingEdgeControl
EdgeFilterStrength: This parameter specifies the strength of the filtering to be performed as specified in H.263 Annex J.
EdgeFilterOn: This flag shall be "1" if the edge is to be filtered, and "0" if not.
The actual edge filtering for the edges with EdgeFilterOn equal to "1" shall be performed precisely as specified in H.263 Annex J with the specified value of EdgeFilterStrength and with clipping at the output to the range of [0, 2^BPP - 1]. As specified in H.263 Annex J, all top-edge filtering for all blocks shall be performed "prior" to any left-edge filtering for any blocks (in the sense that the values of the samples used for top-edge filtering shall be those reconstructed values prior to any deblocking filtering for left-edge filtering).
If the buffer type indicates that sample values of macroblocks outside of the current deblocking filter command buffer are not affected, the EdgeFilterOn parameter shall be zero for all edges at the left and top of the region covered by the macroblocks with deblocking filter commands in the buffer.
One read-back command buffer is present when bPicReadbackRequests is "1", and this commands the accelerator to return resulting final (after deblocking if applicable) picture macroblock data to the host. If an encryption protocol is in use, the accelerator may respond to read-back requests by returning an error indication, returning garbage data, or returning encrypted data (as may be specified by the encryption protocol).
The buffer passed to the accelerator shall contain read-back commands containing a single parameter per macroblock to be read:
wMBaddress: (16 bits) Specifies the macroblock address of the current macroblock in raster scan order (0 being the address of the top left macroblock, PicWidthInMBminus1 being the address of the top right macroblock, and PicHeightInMBminus1 * PicWidthInMB being the address of the bottom left macroblock, and PicHeightInMBminus1 * PicWidthInMB + PicWidthInMBminus1 being the address of the bottom right macroblock).
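The raster-scan addressing described for wMBaddress can be sketched as follows. This is a minimal illustration, not part of the interface: the helper name mb_address and its parameters are assumptions, with pic_width_in_mb standing for PicWidthInMBminus1 + 1.

```c
#include <stdint.h>

/* Raster-scan macroblock address of the macroblock at column mb_x and
   row mb_y, counting from the top left macroblock (address 0).
   mb_address is a hypothetical helper, not a DirectX VA function. */
static uint16_t mb_address(unsigned mb_x, unsigned mb_y, unsigned pic_width_in_mb)
{
    return (uint16_t)(mb_y * pic_width_in_mb + mb_x);
}
```

For a picture 45 macroblocks wide and 30 macroblocks high, the four corner macroblocks receive the addresses 0, 44, 1305, and 1349, matching the expressions given in the parameter description.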
If BPP is "8", the data shall be returned in the form of 8-bit unsigned values (thus Black is nominally Y=16, Cb=Cr=128 and White is nominally Y=235, Cb=Cr=128), and if BPP is greater than 8, the data shall be returned in the form of 16-bit unsigned values.
The data is returned from the accelerator to the host in the form of:
First, a copy of the read-back command buffer itself, followed by padding to the next 32 Byte alignment boundary, and then
The macroblock data values, returned in the order sent in the read-back command buffer, in the form of WT·HT samples per block for each block in each macroblock. Residual difference blocks within a macroblock shall be returned in the order specified in MPEG-2 Figures 6-10, 6-11, and 6-12 (raster-scan order for Y, followed by all 4:2:0 blocks of Cb in raster-scan order, followed by 4:2:0 blocks of Cr, followed by 4:2:2 blocks of Cb, followed by 4:2:2 blocks of Cr, followed by 4:4:4 blocks of Cb, followed by 4:4:4 blocks of Cr).
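The layout of the returned data can be illustrated with a short sketch of where the macroblock samples begin. It assumes that the read-back command buffer holds nothing but one 16-bit wMBaddress per requested macroblock; the helper names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 32 bytes. */
static size_t align_up_32(size_t n)
{
    return (n + 31u) & ~(size_t)31u;
}

/* Byte offset of the first returned sample: the echoed command buffer
   (assumed to be num_macroblocks 16-bit addresses) plus padding. */
static size_t readback_data_offset(size_t num_macroblocks)
{
    return align_up_32(num_macroblocks * sizeof(uint16_t));
}

/* Size of one returned sample: 8-bit values when BPP is 8,
   16-bit values when BPP is greater than 8. */
static size_t readback_bytes_per_sample(unsigned bpp)
{
    return bpp == 8 ? 1 : 2;
}
```

Under these assumptions, a request for 17 macroblocks echoes 34 bytes of commands, so the sample data would start at byte offset 64.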
When variable-length decoding of raw bitstream data is performed on the accelerator, the data sent by the host for the decoding of the picture is divided into three types of buffers:
Inverse quantization matrix buffers, which provide information about how to perform inverse quantization of the bitstream data,
Slice control buffers, each of which provides information about the location of start codes and data within a corresponding bitstream data buffer, and
Bitstream data buffers, which contain raw streams of data encoded according to a specific video coding specification.
An inverse quantization matrix buffer is sent to initialize inverse quantization matrices for off-host bitstream decoding. Inverse quantization matrix buffers provide information about how to decode all current and subsequent video in the bitstream, until a new inverse quantization matrix buffer is provided. (Thus, inverse quantization matrices are persistent.) No more than one inverse quantization matrix buffer shall be sent from the host to the accelerator at a time.
DXVA_QmatrixData
bNewQmatrix[i] // 8b
Qmatrix[i] // WT·HT·16b
bNewQmatrix[i]: Indicates whether a new inverse quantization matrix of type i is present in the buffer. If bNewQmatrix[i] is "1", a new inverse quantization matrix of type i follows in the inverse quantization matrix buffer. Type i=0 is for intra luminance quantization, i=1 is for inter luminance quantization, i=2 is for intra chrominance quantization, and i=3 is for inter chrominance quantization. No default values of inverse quantization matrices may be assumed by the accelerator in the absence of any prior value sent by the host. The value of bNewQmatrix[i] shall not be zero for both i=0 and i=1.
If the value of bNewQmatrix[i] is "0" for i=2 or i=3, then:
If bNewQmatrix[i-2] is "0", the previous inverse quantization matrix for i shall continue to be used by the accelerator, and
If bNewQmatrix[i-2] is "1", the inverse quantization matrix for i shall be set equal to the new inverse quantization matrix for i-2.
If the relevant video coding specification does not need inverse quantization matrices (e.g., H.261 and H.263), no inverse quantization matrix buffer shall be sent. If the relevant video coding specification does need inverse quantization matrices, some value must be provided for these inverse quantization matrices by the host prior to, or in conjunction with, the transferal of any bitstream data buffers at the start of the video decoding process.
NOTE: No default value is assumed for quantization matrices in the absence of any prior value sent from the host. The quantization matrix values must be sent explicitly, even if they contain values that are available by default in the relevant video coding specification.
Qmatrix[i]: An inverse quantization matrix. Present only when bNewQmatrix[i] is equal to "1". The matrix consists of WT·HT unsigned words (in which only the lower eight bits of each word are used for the dominant video coding standards). The order of the data values within the inverse quantization matrix shall be as specified in the relevant video coding specification.
NOTE: For MPEG-2 bitstreams, the data values within Qmatrix[i] are in zig-zag inverse scan order, as specified in subclause 7.3.1 and Figure 7-2 of MPEG-2.
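The persistence and inheritance rules for the four matrix types can be sketched as an update function on accelerator-side state. This is only an illustration under stated assumptions: the block size is taken to be 8x8 (64 coefficients), and QmatrixState and QmatrixUpdate are hypothetical parsed views, not the buffer layout itself (in the buffer, Qmatrix[i] is present only when bNewQmatrix[i] is "1").

```c
#include <stdint.h>
#include <string.h>

#define QM_COEFFS 64 /* WT*HT, assuming 8x8 blocks as in MPEG-2 */

/* Persistent state: one matrix per type i = 0..3
   (intra luma, inter luma, intra chroma, inter chroma). */
typedef struct {
    uint16_t matrix[4][QM_COEFFS];
} QmatrixState;

/* Hypothetical parsed form of one inverse quantization matrix buffer.
   Qmatrix[i] is meaningful only when bNewQmatrix[i] is nonzero. */
typedef struct {
    uint8_t  bNewQmatrix[4];
    uint16_t Qmatrix[4][QM_COEFFS];
} QmatrixUpdate;

static void apply_qmatrix_update(QmatrixState *state, const QmatrixUpdate *upd)
{
    for (int i = 0; i < 4; i++) {
        if (upd->bNewQmatrix[i]) {
            /* A new matrix of type i was sent. */
            memcpy(state->matrix[i], upd->Qmatrix[i], sizeof state->matrix[i]);
        } else if (i >= 2 && upd->bNewQmatrix[i - 2]) {
            /* No new chrominance matrix, but a new luminance matrix of
               the same intra/inter kind: chrominance inherits it. */
            memcpy(state->matrix[i], upd->Qmatrix[i - 2], sizeof state->matrix[i]);
        }
        /* Otherwise the previous matrix of type i stays in effect. */
    }
}
```

The state is never reset to default values, which mirrors the requirement that the host send all matrices explicitly before decoding starts.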
Slice control buffers shall be provided to guide the operation of off-host VLD bitstream processing. The host software decoder shall determine the location of "slice-level" resynchronization points in the bitstream. A slice is defined to be a multi-macroblock layer which includes a resynchronization point in the bitstream data. In H.261 bitstreams, a GOB shall be considered a slice. In H.263 bitstreams, a sequence of one or more GOBs starting with a GOB start code and containing no additional GOB start codes shall be considered a slice. A slice control buffer shall contain one or more DXVA_SliceInfo data structures, according to the contents of a corresponding bitstream data buffer.
DXVA_SliceInfo |
// 16b // 16b // 32b // 32b // 8b // 8b // 16b // 16b // 16b // 16b |
wHorizontalPosition, wVerticalPosition: The horizontal and vertical position of the first macroblock of the slice expressed in units of macroblocks with zero indicating the left-most or top-most macroblock of the picture, respectively.
dwSliceBitsInBuffer: The total number of bits in the corresponding bitstream data buffer which contain data for the current slice. This shall be a multiple of eight for MPEG-1, MPEG-2, MPEG-4, and in the slice structured mode of H.263, since slice start codes in these cases are byte aligned.
dwSliceDataLocation: The location of the first byte containing data for the slice (e.g., the location of a slice start code) in the bitstream data buffer. Shall be zero if the start of the slice is not within the corresponding bitstream data buffer.
bStartCodeBitOffset: The number of MSBs of the byte at dwSliceDataLocation which do not contain data for the slice (zero for MPEG-1, MPEG-2, and MPEG-4 since slice start codes in these specifications are byte aligned, but possibly nonzero for H.261 and H.263 which do not force byte alignment of GOB start codes). Shall be in the range of 0 to 7. Shall be zero if the start of the slice is not within the corresponding bitstream data buffer. MSBs marked irrelevant for the current slice by bStartCodeBitOffset may contain data for a previous slice in the case of start codes which are not byte aligned, as may occur for example in H.263 baseline mode.
wMBbitOffset: The number of bits of slice header data prior to the first bit of macroblock layer data in the bitstream buffer. For example, if wMBbitOffset is 83, the macroblock layer data for the slice starts after 83 bits of slice header data. Shall be zero if the start of the slice is not within the corresponding bitstream data buffer.
wNumberMBsInSlice: The number of macroblocks of data in the slice, including skipped macroblocks. May be zero if this number cannot be readily determined from the header of the picture and the headers and initial macroblock data of the current and next slice in the bitstream, such as in the rectangular slice and arbitrary slice ordering submodes of the slice structured mode of H.263. Shall not be zero for H.261, MPEG-1, MPEG-2, MPEG-4, or when not using the rectangular slice or arbitrary slice ordering submodes of the slice structured mode in H.263.
wQuantizerScaleCode: The quantizer scaling code from the slice level of the bitstream, as specified in the particular video coding specification (ranging from "1" to "31" for H.261, H.262/MPEG-2, H.263, MPEG-1, and MPEG-4).
wBadSliceChopping: A parameter containing a value defined as follows:
"0" : All bits for the slice are located within the corresponding bitstream data buffer.
"1" : The bits for the start of the slice are within the corresponding bitstream data buffer, and the bits for the end of the slice are not (because the bitstream data buffer is full).
"2" : The bits for the start of the slice are not within the corresponding bitstream data buffer (because the previous bitstream data buffer was full), but the bits for the end of the slice are within the corresponding bitstream data buffer.
"3" : The bits for the start of the slice are not within the corresponding bitstream data buffer (because the previous bitstream data buffer was full), and the bits for the end of the slice are also not within the corresponding bitstream data buffer (because the corresponding bitstream data buffer is also full).
Nonzero values of wBadSliceChopping should generally be avoided by the host software decoder.
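The slice control fields and the wBadSliceChopping encoding can be sketched as follows. The structure is an illustrative reading of the size list above, not an authoritative declaration: field widths follow the name prefixes (w = 16 bits, dw = 32 bits, b = 8 bits), and bReserved is an assumed name for the one 8-bit entry that the text does not describe. The helper functions are likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of one slice control entry (assumed field order). */
typedef struct {
    uint16_t wHorizontalPosition;
    uint16_t wVerticalPosition;
    uint32_t dwSliceBitsInBuffer;
    uint32_t dwSliceDataLocation;
    uint8_t  bStartCodeBitOffset;
    uint8_t  bReserved;            /* assumed name, not described in the text */
    uint16_t wMBbitOffset;
    uint16_t wNumberMBsInSlice;
    uint16_t wQuantizerScaleCode;
    uint16_t wBadSliceChopping;
} SliceInfoSketch;

/* Values 0..3 as listed above: the value 2 is contributed when the slice
   start is missing from the buffer, the value 1 when the end is missing. */
static uint16_t bad_slice_chopping(bool start_in_buffer, bool end_in_buffer)
{
    return (uint16_t)((start_in_buffer ? 0 : 2) | (end_in_buffer ? 0 : 1));
}

/* Fill the location fields, forcing the three start-related fields to
   zero when the slice start is not within this buffer. */
static void set_slice_location(SliceInfoSketch *s,
                               bool start_in_buffer, bool end_in_buffer,
                               uint32_t byte_location, uint8_t bit_offset,
                               uint16_t mb_bit_offset)
{
    s->wBadSliceChopping   = bad_slice_chopping(start_in_buffer, end_in_buffer);
    s->dwSliceDataLocation = start_in_buffer ? byte_location : 0;
    s->bStartCodeBitOffset = start_in_buffer ? (uint8_t)(bit_offset & 7) : 0;
    s->wMBbitOffset        = start_in_buffer ? mb_bit_offset : 0;
}
```

A host decoder that keeps every slice inside a single bitstream data buffer would always produce the value 0, which is the recommended case.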
If a bitstream buffer is used, the buffer simply contains raw bytes from a video bitstream to support off-host decoding including low-level bitstream parsing with variable-length decoding.
Certain restrictions are imposed on the contents of bitstream data buffers, in order to ensure that they are used in a manner consistent with the intended design:
Except for MPEG-1 and MPEG-2 use, the first bitstream data buffer for each picture shall start with all data if any (subsequent to the end of all data for any prior picture) that precedes the first slice for the current picture in the bitstream (sequence header, picture header, etc.).
For MPEG-1 and MPEG-2 use, the first bitstream data buffer for each picture shall start with the slice start code of the first slice of the picture (no sequence header, picture header, etc., since all relevant data is provided elsewhere in parameters of this specification).
If the start of a slice of bitstream data is located within a particular bitstream data buffer, the end of that slice shall also be located within that same data buffer unless the buffer which contains the start of the slice is completely full to its allocated size.
The decoder should manage the filling of the bitstream data buffers in a manner to generally avoid the straddling of slices across different buffers.
The second type of DirectX VA function is the loading of information for specifying an alpha blending surface to be blended with video data.
The DirectX VA connection configuration data structure for alpha blend data loading shall be defined as:
DXVA_ConfigAlphaLoad |
// 32b // 3 * 32b alignment // 8b |
dwFunction: Contains a DXVA_ConfigQueryOrReplyFunc describing the configuration data structure.
bConfigDataType: Specifies the type of alpha blend data to be used, as defined by:
"0": 16-entry AYUV palette with IA44 alpha blending surface,
"1": 16-entry AYUV palette with AI44 alpha blending surface,
"2": 16-entry AYUV palette with DPXD, Highlight, and DCCMD data,
"3": AYUV graphic surface.
Support for bConfigDataType being "0" or "2" is allowed for near-term accelerator implementations, but other formats are preferred and both formats "1" and "3" are expected to be required in the intermediate term.
It is a design requirement for all DirectX VA decoders to interoperate with all DirectX VA accelerators. This requires that every DirectX VA decoder be capable of operation with any member of a set of connection configurations, and that every DirectX VA accelerator be capable of operation with at least one connection configuration member of that set. The minimal interoperability configuration set for alpha blend data loading consists of all defined values of bConfigDataType unless otherwise indicated by the restricted mode profile.
An AYUV alpha blending surface is defined as an array of samples of 32 bits each. This surface can be used as the source for blending a graphic with decoded video pictures.
Each AYUV sample is structured as follows:
DXVA_AYUVsample
bSampleAlpha8 // 8b (the MSBs)
bY_Value // 8b
bCbValue // 8b
bCrValue // 8b (the LSBs)
bSampleAlpha8: The opacity of the sample. The value "0" indicates that the sample is transparent (so that the other entries have no effect on the resulting blended picture) and the value "255" indicates that the sample is opaque (so that the other entries completely determine the value of the resulting blended picture sample). For non-zero values of SampleAlpha8, the blend specified is to add (SampleAlpha8+1) times the graphic value to (255 - SampleAlpha8) times the picture value and to the constant 128, and right-shift the result by eight places. For a zero value of SampleAlpha8, the blend specified is to use the picture value without alteration.
bY_Value, bCbValue, bCrValue: A color in Y, Cb, Cr color space (per ITU-T Rec. 601) as an unsigned quantity. Thus with BPP=8, the color Black is nominally specified by Y=16, Cb=Cr=128, and the color White is nominally specified by Y=235, Cb=Cr=128.
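The 8-bit blend rule can be written out as a small function. This is a sketch for BPP = 8 that reads the second weight as 255 minus the alpha value, so that the two weights always sum to 256; the function name is illustrative.

```c
#include <stdint.h>

/* Blend one 8-bit graphic component with one 8-bit picture component
   using an 8-bit opacity, per the rule given for SampleAlpha8. */
static uint8_t blend_alpha8(uint8_t alpha, uint8_t graphic, uint8_t picture)
{
    if (alpha == 0) {
        return picture; /* transparent: picture value without alteration */
    }
    return (uint8_t)(((alpha + 1u) * graphic +
                      (255u - alpha) * picture + 128u) >> 8);
}
```

With an alpha of 255 the weights become 256 and 0, so the result equals the graphic value exactly; the constant 128 provides rounding for intermediate opacities.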
The width and height of the AYUV alpha blending surface are specified in the associated buffer description list.
A 16-entry YUV palette is defined as an array of 16 DXVA_AYUVsamples. Such a palette is used along with an IA44 or AI44 alpha blending surface. The palette array is sent to the accelerator in an AYUV alpha blending sample buffer (buffer type 8). In this case the SampleAlpha8 for each sample has no meaning and shall be "0".
The palette can be used to create the source for blending a graphic with decoded video pictures. This palette can be used to create the graphic source along with either:
an IA44/AI44 alpha blending surface or
a DPXD alpha blending surface, a highlight buffer, and DCCMD data.
Rather than loading just a 16-entry palette, an entire image graphic can simply be loaded directly as an AYUV image to specify the graphic content. In this case, the AYUV graphic is sent to the accelerator in an AYUV alpha blending sample buffer (buffer type 8).
An index-alpha 4-4 (IA44) alpha blending surface is defined as an array of 8-bit samples, each of which is structured as follows:
DXVA_IA44sample
SampleIndex4 // 4b (the MSBs)
SampleAlpha4 // 4b (the LSBs)
An alpha-index 4-4 (AI44) alpha blending surface is defined as an array of 8-bit samples, each of which is structured as follows:
DXVA_AI44sample
SampleAlpha4 // 4b (the MSBs)
SampleIndex4 // 4b (the LSBs)
SampleIndex4: The index into the 16-entry palette for the sample.
SampleAlpha4: The opacity of the sample. The value "0" indicates that the sample is transparent (so that the palette entry for SampleIndex4 has no effect on the resulting blended picture) and the value "15" indicates that the sample is opaque (so that the palette entry for SampleIndex4 completely determines the resulting blended picture). For non-zero values of SampleAlpha4, the blend specified is to add (SampleAlpha4+1) times the graphic value to (15 - SampleAlpha4) times the picture value and to the constant 8, and right-shift the result by four places. For a zero value of SampleAlpha4, the blend specified is to use the picture value without alteration.
The width and height of the IA44 alpha blending surface are specified in the associated buffer description list.
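Unpacking the two 4-4 sample formats and applying the 4-bit blend can be sketched as below. The nibble placement is inferred from the format names (index first for IA44, alpha first for AI44) and should be treated as an assumption, as should the second blend weight of 15 minus the alpha value. All type and function names are illustrative.

```c
#include <stdint.h>

/* Palette index and opacity carried by one 8-bit sample. */
typedef struct {
    uint8_t index; /* SampleIndex4, 0..15 */
    uint8_t alpha; /* SampleAlpha4, 0..15 */
} IndexAlpha;

/* IA44: index in the four MSBs, alpha in the four LSBs (assumed). */
static IndexAlpha unpack_ia44(uint8_t sample)
{
    IndexAlpha r = { (uint8_t)(sample >> 4), (uint8_t)(sample & 0x0F) };
    return r;
}

/* AI44: alpha in the four MSBs, index in the four LSBs (assumed). */
static IndexAlpha unpack_ai44(uint8_t sample)
{
    IndexAlpha r = { (uint8_t)(sample & 0x0F), (uint8_t)(sample >> 4) };
    return r;
}

/* Blend one 8-bit palette component with one 8-bit picture component
   using a 4-bit opacity, per the rule given for SampleAlpha4. */
static uint8_t blend_alpha4(uint8_t alpha4, uint8_t graphic, uint8_t picture)
{
    if (alpha4 == 0) {
        return picture; /* transparent: picture value without alteration */
    }
    return (uint8_t)(((alpha4 + 1u) * graphic +
                      (15u - alpha4) * picture + 8u) >> 4);
}
```

The graphic value passed to blend_alpha4 would be the Y, Cb, or Cr component of the palette entry selected by the unpacked index.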
A decoded PXD (DPXD) alpha blending surface is defined as a packed frame-structured array of two-bit samples, each of which is to be used as an index into a four-color table determined by highlight and DCCMD data. The result of the combination of DPXD, highlight, and DCCMD is equivalent to an IA44 surface, and is used with a 16-entry YUV palette for blending. If treated as an array of bytes, the index of the first two-bit sample shall be placed in the MSBs of the first byte of DPXD data, the next sample in the next two bits, the third sample in the next two bits, the fourth sample in the LSBs, the fifth sample in the MSBs of the next byte, and so on.
The DPXD alpha blending surface may be created from the PXD information on a DVD, which is in a run-length encoded format. The creation of DPXD for DirectX VA from PXD on a DVD thus requires the host to perform the run-length decoding of the raw data from the disk.
The stride of the surface shall be interpreted as the stride in bytes, not in two-bit sample units. However, the width and height shall be in two-bit sample units.
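Reading a two-bit sample from a DPXD surface can be sketched as follows. The sketch assumes that every row begins on a byte boundary (which the byte-based stride suggests) and that x and y are given in two-bit sample units; dpxd_sample is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the two-bit index (0..3) at column x, row y of a DPXD surface.
   The first sample of each byte occupies the two MSBs, the fourth
   sample the two LSBs. */
static unsigned dpxd_sample(const uint8_t *surface, size_t stride_bytes,
                            size_t x, size_t y)
{
    uint8_t  byte  = surface[y * stride_bytes + x / 4];
    unsigned shift = 6u - 2u * (unsigned)(x % 4);
    return (byte >> shift) & 3u;
}
```

The returned index is then mapped to a palette index and opacity through the highlight and DCCMD data.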
NOTE: The PXD on a DVD is in a field-structured interlaced format. The DPXD alpha blending surface defined herein is not. The host is therefore responsible for interleaving the data from the two fields if forming DPXD from DVD PXD data.
NOTE: For further clarification of DVD subpicture definition and data field interpretation, see DVD Specifications for Read-Only Disk: Part 3 - Video Specification (v. 1.11, May '99).
The Highlight data is formatted in a manner compatible with the DVD ROM specification, and is to be applied along with DCCMD data to a DPXD surface to create an alpha blending surface. The Highlight data buffer contents shall be as follows:
DXVA_Highlight
wHighlightActive // 16b
wHighlightIndices // 16b
wHighlightAlphas // 16b
HighlightRect // 128b
wHighlightActive: Indicates whether or not a rectangular highlight area is currently activated ("0" indicates inactive, and "1" indicates active). If inactive, the highlight data shall have no effect on the content of the blended picture.
wHighlightIndices: Contains the four-bit palette index of the highlighted rectangular area on the subpicture for each of the two-bit indices in the DPXD. The four MSBs shall be for index 3, the next four bits shall be for index 2, the next four bits shall be for index 1, and the four LSBs shall be for index 0.
wHighlightAlphas: Contains the four-bit opacity of the highlighted rectangular area on the subpicture for each of the two-bit indices used by the DPXD. The four MSBs shall be for index 3, the next four bits shall be for index 2, the next four bits shall be for index 1, and the four LSBs shall be for index 0. The interpretation of the four-bit opacity values shall be as described above for SampleAlpha4.
HighlightRect: Contains the definition of the area of the highlight rectangle as a RECT data type.
The following restrictions apply to the RECT parameters:
"left" and "top" shall be greater than or equal to zero,
"right" and "bottom" shall be greater than "left" and "top", respectively, and
"right" shall not exceed 720, and "bottom" shall not exceed 576.
NOTE: There appears to be a difference between the way the DVD specification defines a subpicture rectangular area and the convention typically used in Microsoft applications. We follow the Microsoft convention for this specification, which is to say that a rectangle of width 10 and height 10 in the upper left corner of the picture is defined by Top = 0, Left = 0, Right = 10, Bottom = 10. The convention in the DVD specification appears to instead use Right = 9 and Bottom = 9.
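The packed highlight fields and the rectangle restrictions can be sketched as follows. RectSketch is a stand-in for the RECT data type (four signed 32-bit values) so that the example is self-contained; the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the RECT data type. */
typedef struct {
    int32_t left, top, right, bottom;
} RectSketch;

/* Extract the four-bit entry for a two-bit DPXD index (0..3) from
   wHighlightIndices or wHighlightAlphas: index 0 is in the four LSBs,
   index 3 in the four MSBs. */
static unsigned highlight_nibble(uint16_t packed, unsigned dpxd_index)
{
    return ((unsigned)packed >> (4u * (dpxd_index & 3u))) & 0x0Fu;
}

/* Check the restrictions listed for HighlightRect. */
static bool highlight_rect_valid(const RectSketch *r)
{
    return r->left >= 0 && r->top >= 0 &&
           r->right > r->left && r->bottom > r->top &&
           r->right <= 720 && r->bottom <= 576;
}
```

Following the rectangle convention described in the note, a 10x10 highlight in the upper left corner is expressed as left = 0, top = 0, right = 10, bottom = 10.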
The DCCMD (display control command) data is formatted in a manner compatible with the DVD ROM specification, and is to be applied along with Highlight data to a DPXD surface to create an alpha blending surface. The DCCMD data buffer contents shall consist of data formatted as a list of DVD display control commands (DCCMDs).
NOTE: For further clarification of DVD subpicture definition and data field interpretation, see DVD Specifications for Read-Only Disk: Part 3 - Video Specification (v. 1.11, May '99).
Alpha blend combination takes the last loaded alpha blend source information and combines it with a reference picture to create a blended picture for display. It uses an alpha blend combination data buffer to control this process.
The DirectX VA connection configuration data structure for alpha blend combination shall be defined as:
DXVA_ConfigAlphaCombine |
// 32b // 3 * 32b alignment // 8b // 8b // 8b // 8b // 8b |
dwFunction: Contains a DXVA_ConfigQueryOrReplyFunc describing the configuration data structure.
bConfigBlendType: Specifies the type of alpha blend combinations to be performed, as defined by:
"0": Front-end buffer-to-buffer blend,
"1": Back-end hardware blend.
Support for bConfigBlendType equal to "1" is allowed for near-term accelerator implementations, but support of bConfigBlendType being "0" is expected to be required in the intermediate term.
bConfigPictureResizing: Specifies whether the PictureSourceRect16thPel for subpicture blending may differ in width and height from the PictureDestinationRect (adjusted for the 1/16th-sample scaling of PictureSourceRect16thPel) and whether the parameters in PictureSourceRect16thPel may not be multiples of 16, thus requiring the source picture to be resampled by the accelerator. "1" indicates that resampling (whether for resizing and/or for sub-pixel accuracy) is supported and "0" indicates that it is not.
bConfigOnlyUsePicDestRectArea: Specifies whether the decoder can command operations which specify values for areas of an output generated destination picture that fall outside of the PictureDestinationRect. "0" indicates that areas outside of the PictureDestinationRect can be specified by the blend combination commands. "1" indicates that the host decoder cannot rely on the values of and cannot display any region of the blended surface outside of the area specified by PictureDestinationRect.
bConfigGraphicResizing: Specifies whether the GraphicSourceRect for subpicture blending may differ in size from the GraphicDestinationRect, thus requiring the alpha blending graphic to be resampled by the accelerator. "1" indicates that resizing is supported and "0" indicates that it is not.
bConfigWholePlaneAlpha: Specifies whether a whole-plane alpha parameter can be applied to the graphic data. "1" indicates that a whole-plane alpha can be applied, and "0" indicates that it cannot.
It is a design requirement for all DirectX VA decoders to interoperate with all DirectX VA accelerators. This requires that every DirectX VA decoder be capable of operation with any member of a set of connection configurations, and that every DirectX VA accelerator be capable of operation with at least one connection configuration member of that set. The minimal interoperability configuration set for alpha blend combination consists of bConfigBlendType having a choice of values as determined by the DirectX VA restricted mode profile.
An alpha blend combination buffer governs the generation of a blended picture from a source picture and a graphic image with accompanying alpha blending information. In the event that the source and destination pictures are not in 4:4:4 format, every second sample of the graphic blending information (i.e., the first, third, fifth, etc.) shall be applied to the subsampled source chrominance information in the vertical or horizontal direction as applicable to produce the blended result.
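DXVA_BlendCombination
wPictureSourceIndex // 16b
wBlendedDestinationIndex // 16b
PictureSourceRect16thPel // 128b
PictureDestinationRect // 128b
GraphicSourceRect // 128b
GraphicDestinationRect // 128b
wBlendDelay // 16b
bBlendOn // 8b
bWholePlaneAlpha // 8b
OutsideYUVcolor // 32b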
wPictureSourceIndex: The index of the picture to be combined with the graphic. Shall be 0xFFFF if back-end hardware alpha blending is in use.
wBlendedDestinationIndex: The index of the combined picture to be created. Shall be 0xFFFF if back-end hardware alpha blending is in use. Shall not be equal to wPictureSourceIndex.
PictureSourceRect16thPel: The area of the source picture to be combined with the graphic, specified as a RECT data structure in units of 1/16th sample in the luminance component. In other words, the parameters in the RECT data structure are fixed-point representations which each have 28 bits before and 4 bits after the binary point. (This 1/16th sample accuracy allows PictureSourceRect16thPel to contain the same accuracy as the frame_centre_horizontal_offset and frame_centre_vertical_offset pan-scan parameters in MPEG-2.) If bConfigPictureResizing is "0", all parameters in PictureSourceRect16thPel shall be integer multiples of 16.
The following restrictions apply to the RECT parameters of PictureSourceRect16thPel:
"left" and "top" shall be greater than or equal to zero,
"right" and "bottom" shall be greater than or equal to "left" and "top", respectively,
if "right" is equal to "left" or "top" is equal to "bottom", a case only allowed if bConfigOnlyUsePicDestRectArea is "0", all of the RECT parameters shall have the value "0", indicating no use of the source picture, and
"right" and "bottom" shall not exceed 16 times the allocated width and height, respectively, of the uncompressed source picture surface.
For example, if PictureSourceRect16thPel is used to select an entire MPEG-2 decoded picture, the PictureSourceRect16thPel parameters can be computed as follows:
left = 0
top = 0
right = 16·horizontal_size
bottom = 16·vertical_size
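The fixed-point convention of PictureSourceRect16thPel can be sketched in code. RectSketch stands in for the RECT data type, and the function names are illustrative; the second helper checks the condition that applies when picture resizing is not supported.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the RECT data type. */
typedef struct {
    int32_t left, top, right, bottom;
} RectSketch;

/* Source rectangle selecting an entire decoded picture, in units of
   1/16th luminance sample (4 fractional bits). */
static RectSketch whole_picture_source_rect(int32_t horizontal_size,
                                            int32_t vertical_size)
{
    RectSketch r = { 0, 0, 16 * horizontal_size, 16 * vertical_size };
    return r;
}

/* True when every parameter is an integer multiple of 16, that is,
   when the rectangle lies on whole-sample positions. */
static bool source_rect_is_integer_pel(const RectSketch *r)
{
    return r->left % 16 == 0 && r->top % 16 == 0 &&
           r->right % 16 == 0 && r->bottom % 16 == 0;
}
```

For a 720x480 picture this yields right = 11520 and bottom = 7680.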
PictureDestinationRect: The area of the destination picture to hold the area from the source picture, specified as a RECT data structure. If bConfigPictureResizing is "0", shall have the same width and height as the area specified by PictureSourceRect16thPel. If PictureDestinationRect differs in size from PictureSourceRect16thPel, the resampling method to be applied is not specified, but should have a quality at least equivalent to that of bilinear resampling.
The following restrictions apply to the RECT parameters:
"left" and "top" shall be greater than or equal to zero,
"right" and "bottom" shall be greater than or equal to "left" and "top", respectively,
if "right" is equal to "left" or "top" is equal to "bottom", a case only allowed if bConfigOnlyUsePicDestRectArea is "0", all of the RECT parameters shall have the value "0" and PictureSourceRect16thPel shall also specify all parameters having the value "0",
if bConfigBlendType = 0, "right" and "bottom" shall not exceed the allocated width and height, respectively, of the uncompressed destination picture surface, and
if bConfigBlendType = 1, "right" and "bottom" shall not exceed the allocated width and height, respectively, of the source graphic surface.
GraphicSourceRect: The area of the graphic picture to be combined with the source picture, specified as a RECT data structure. If alpha blend data loading uses bConfigDataType equal to "2", the following restrictions on GraphicSourceRect apply:
The "top" and "left" parameters of GraphicSourceRect shall be zero.
The "right" parameter of GraphicSourceRect shall be equal to the "End X-coordinate" minus the "Start X-coordinate" of the last preceding DVD SET_DAREA DCCMD, plus 1 to adjust for the differing rectangle interpretations as described in the note below.
The "bottom" parameter of GraphicSourceRect shall be equal to the "End Y-coordinate" minus the "Start Y-coordinate" of the last preceding DVD SET_DAREA DCCMD, plus 1 to adjust for the differing rectangle interpretations as described in the note below.
The following restrictions apply to the RECT parameters:
"left" and "top" shall be greater than or equal to zero,
"right" and "bottom" shall be greater than or equal to "left" and "top", respectively,
if "right" is equal to "left" or "top" is equal to "bottom", all of the RECT parameters shall have the value "0", indicating no use of the graphic picture, and
"right" and "bottom" shall not exceed the allocated width and height, respectively, of the graphic source surface. The allocated width and height are defined as 720 and 576, respectively, when bConfigDataType = "2".
GraphicDestinationRect: The area of the destination picture to hold the area from the graphic picture, specified as a RECT data structure. If bConfigGraphicResizing is "0", shall have the same width and height as the area specified by GraphicSourceRect. If GraphicDestinationRect differs in size from GraphicSourceRect, the resampling method to be applied to the graphic is not specified, but should have a quality at least equivalent to that of bilinear resampling of an AYUV surface representing the blending information.
The following restrictions apply to the RECT parameters of GraphicDestinationRect:
"left" and "top" shall be greater than or equal to zero, unless this requirement conflicts with the need to offset the graphic by 8 samples as described in the next paragraph and in Sections 3.7.3.3 and 3.7.3.4,
"right" and "bottom" shall be greater than or equal to "left" and "top", respectively,
if "right" is equal to "left" or "top" is equal to "bottom", then all of the RECT parameters shall have the value "0" and GraphicSourceRect shall also specify all parameters having the value "0",
if bConfigBlendType = 0, "right" and "bottom" shall not exceed the allocated width and height, respectively, of the uncompressed destination picture surface,
if bConfigBlendType = 1, "right" and "bottom" shall not exceed the allocated width and height, respectively, of the source graphic surface, and
If alpha blend data loading uses bConfigDataType equal to "2" and if bConfigGraphicResizing is "0", the following additional restrictions on GraphicDestinationRect apply:
"top" shall be equal to the "Start Y-coordinate" of the last preceding DVD SET_DAREA DCCMD.
"left" shall be equal to either the "Start X-coordinate" of the last preceding DVD SET_DAREA DCCMD or to that value minus 8 (see Sections 3.7.3.3 and 3.7.3.4).
The "right" parameter of GraphicDestinationRect shall be equal to the value of "left" plus the "End X-coordinate" minus the "Start X-coordinate" of the last preceding DVD SET_DAREA DCCMD, plus 1 to adjust for the differing rectangle interpretations as described in the note below.
The "bottom" parameter of GraphicDestinationRect shall be equal to the value of "top" plus the "End Y-coordinate" minus the "Start Y-coordinate" of the last preceding DVD SET_DAREA DCCMD, plus 1 to adjust for the differing rectangle interpretations as described in the note below.
NOTE: There appears to be a difference between the way the DVD specification defines a subpicture rectangular area and the convention typically used in Microsoft applications. We follow the Microsoft convention for this specification, which is to say that a rectangle of width 10 and height 10 in the upper left corner of the picture is defined by "top" = 0, "left" = 0, "right" = 10, "bottom" = 10. The convention in the DVD specification appears to instead use an equivalent of "right" = 9 and "bottom" = 9.
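For the DVD case (bConfigDataType equal to "2" without graphic resizing), the derivation of the two graphic rectangles from the display area can be sketched as follows. DisplayArea is a hypothetical parsed form of the coordinates carried by the last SET_DAREA command, RectSketch stands in for the RECT data type, and the choice of when to apply the 8-sample shift is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the RECT data type. */
typedef struct {
    int32_t left, top, right, bottom;
} RectSketch;

/* Hypothetical parsed SET_DAREA coordinates (inclusive end values,
   following the DVD convention). */
typedef struct {
    int32_t start_x, end_x, start_y, end_y;
} DisplayArea;

/* GraphicSourceRect: anchored at the origin, with 1 added to convert
   the inclusive DVD end coordinates to exclusive right/bottom values. */
static RectSketch graphic_source_rect(const DisplayArea *a)
{
    RectSketch r = { 0, 0,
                     a->end_x - a->start_x + 1,
                     a->end_y - a->start_y + 1 };
    return r;
}

/* GraphicDestinationRect: positioned at the display area start,
   optionally moved left by 8 samples. */
static RectSketch graphic_destination_rect(const DisplayArea *a,
                                           bool shift_left_by_8)
{
    RectSketch r;
    r.left   = a->start_x - (shift_left_by_8 ? 8 : 0);
    r.top    = a->start_y;
    r.right  = r.left + a->end_x - a->start_x + 1;
    r.bottom = r.top + a->end_y - a->start_y + 1;
    return r;
}
```

Both rectangles end up with the same width and height, which is what the no-resizing case requires.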
wBlendDelay: If back-end hardware blending is in use, wBlendDelay contains a number of milliseconds of delay prior to the combination operation going into effect. Has no meaning and shall be "0" if front-end blending is in use.
bBlendOn: If back-end hardware blending is in use, the specified blending shall be applied from the time specified in a blending combination function with bBlendOn being "1" until the specified execution time of a new blending combination with bBlendOn being "1" or until the blending is disabled by a blending combination function with bBlendOn being "0". Has no meaning and shall be "0" if front-end blending is in use.
If back-end hardware blending is in use and bBlendOn is "0", the only other parameter in the alpha blend combination buffer that has meaning is wBlendDelay.
bWholePlaneAlpha: Contains an opacity multiplier for the alpha channel of the graphic. The value "0" indicates that the graphic is transparent (so that the graphic content has no effect on the resulting blended picture) and the value "255" indicates that the graphic content uses its full sample opacity. For non-zero values of bWholePlaneAlpha, the blend specified is to multiply the opacity of each location in the graphic content by (bWholePlaneAlpha+1)/256. For a zero value of bWholePlaneAlpha, the blend specified is to use the opacity specified in the graphic content without alteration. Shall be equal to "255" if bConfigWholePlaneAlpha is "0".
OutsideYUVcolor: Contains an indication of whether the areas outside of the PictureDestinationRect are to be generated as having a constant color for blending, and contains that constant color if so indicated. OutsideYUVcolor is formatted as a DXVA_AYUVsample and shall follow this convention:
The value of bSampleAlpha8 in OutsideYUVcolor shall be "255" if the areas outside of the PictureDestinationRect are to be generated as a constant color for blending.
The value of bSampleAlpha8 in OutsideYUVcolor shall be "0" if either of the following two cases applies:
if the areas outside of the PictureDestinationRect are to be unaffected by the blend or
if the areas outside of the PictureDestinationRect cannot be used (as indicated by bConfigStayInPicDestRectArea having the value "1").
All other values for bSampleAlpha8 in OutsideYUVcolor are reserved for future use.
The value of bSampleAlpha8 in OutsideYUVcolor shall be "0" if bConfigStayInPicDestRectArea is "1".
If bSampleAlpha8 in OutsideYUVcolor is "0", the only area of the destination surface that will be affected by the blend is the part within the PictureDestinationRect. If bSampleAlpha8 in OutsideYUVcolor is "255", any area of the destination surface that is outside of the PictureDestinationRect but within the GraphicDestinationRect shall be generated by blending the graphic with the color specified in the non-alpha parameters of OutsideYUVcolor and the entire allocated area of the destination surface that falls outside of both PictureDestinationRect and GraphicDestinationRect shall be set to the color specified in the non-alpha parameters of OutsideYUVcolor. If bConfigBlendType is "1", the OutsideYUVcolor parameters shall be set to indicate blending with black by specifying bSampleAlpha8 = 255, bY_Value = 16, bCbValue = bCrValue = 128.
NOTE: When bConfigBlendType is "1" (back-end hardware blend), the actual blending operations may differ somewhat in method from what is described herein. However, the visual result shall be equivalent. For example, some resizing specified to map a video picture from a source to a destination size may instead be applied in an inverse manner to map the graphic content to its proper location relative to the source video picture, with the blended result then displayed in a manner compensating for this difference in operation - but the ultimate visual effect of the result shall be as specified by the blend combination command.
For example, if PictureSourceRect16thPel is used to select an area specified by MPEG-2 video pan-scan parameters, the PictureSourceRect16thPel parameters can be computed as follows (provided that the values computed do not violate the restrictions above, as could be the case with some MPEG-2 pan-scan parameters and in particular can be the case with some MPEG-2 DVD content):
left = 8·(horizontal_size - display_horizontal_size) - frame_centre_horizontal_offset
top = 8·(vertical_size - display_vertical_size) - frame_centre_vertical_offset
right = left + 16·display_horizontal_size
bottom = top + 16·display_vertical_size
And the PictureDestinationRect would normally then be created as:
left = 0 or 8 (per Section 3.7.3.3 below)
top = 0
right = left + display_horizontal_size
bottom = top + display_vertical_size
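The two sets of formulas above can be sketched in C as follows. This is a minimal illustration only: the Rect type and both function names are hypothetical, the arguments are the MPEG-2 syntax elements named in the formulas, and the sketch does not check the result against the restrictions mentioned above.

```c
#include <assert.h>

/* Hypothetical rectangle type used only for this example. */
typedef struct { int left, top, right, bottom; } Rect;

/* PictureSourceRect16thPel, in 1/16th-sample units, from pan-scan parameters. */
Rect pan_scan_source_rect_16th(int horizontal_size, int vertical_size,
                               int display_horizontal_size,
                               int display_vertical_size,
                               int frame_centre_horizontal_offset,
                               int frame_centre_vertical_offset)
{
    Rect r;
    assert(display_horizontal_size > 0 && display_vertical_size > 0);
    r.left   = 8 * (horizontal_size - display_horizontal_size)
               - frame_centre_horizontal_offset;
    r.top    = 8 * (vertical_size - display_vertical_size)
               - frame_centre_vertical_offset;
    r.right  = r.left + 16 * display_horizontal_size;
    r.bottom = r.top  + 16 * display_vertical_size;
    return r;
}

/* PictureDestinationRect in whole samples; "left" is 0 or 8 as chosen
   by the caller. */
Rect pan_scan_dest_rect(int display_horizontal_size,
                        int display_vertical_size, int left)
{
    Rect r;
    assert(left == 0 || left == 8);
    r.left   = left;
    r.top    = 0;
    r.right  = left + display_horizontal_size;
    r.bottom = display_vertical_size;
    return r;
}
```

With horizontal_size = 720, display_horizontal_size = 540 and a zero horizontal offset, the source rectangle spans 1440 to 10080 in 1/16th-sample units, which is the centered 540-sample window of a 720-wide picture.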
For example, in DVD's use of MPEG-2 for 4:3 pan-scan within 16:9 pictures, the pan-scan parameters are restricted so that they do not violate the restrictions given above and, in this case, also satisfy the following:
horizontal_size = 720 or 704
vertical_size = 480 or 576
display_horizontal_size = 540
display_vertical_size = vertical_size
frame_centre_vertical_offset = 0
frame_centre_horizontal_offset has an absolute value that is less than or equal to 1440 for horizontal_size = 720 and less than or equal to 1312 for horizontal_size = 704
The formulation described in Section 3.7.3.1 can then be applied directly in this case.
As an alternative example, for MPEG-2 on DVD with 704-wide pictures, applying the formulation of Section 3.7.3.1 would specify a source rectangle that exceeds the boundaries of the decoded picture (the display_horizontal_size of 720 exceeds the decoded picture's horizontal_size of 704). In such a case, the host software decoder is responsible for cropping the source rectangle to keep it from reaching outside the allocated source area and for managing the destination rectangle to adjust for the cropping. There are two basic alternatives. Both involve specifying a picture source rectangle (in 1/16th sample resolution) such that:
left = 0
right = 16·(left + horizontal_size) = 11264
The two alternatives are:
Either the picture destination rectangle can be set as:
left = (display_horizontal_size - horizontal_size) / 2 = 8
right = left + horizontal_size = 712
with the graphic used in a straightforward fashion
Or the picture destination rectangle can be set as:
left = 0
right = left + horizontal_size = 704
with the graphic destination rectangle being displaced to the left by 8 samples to compensate for the shifted picture destination
The second of these two alternatives is presumably the preferred one.
Another example is DVD's use of 352-wide pictures, which can be stretched to a width of 704 by use of a picture source rectangle (in 1/16th sample resolution) having:
left = 0
right = 16·(left + horizontal_size) = 5632
and a picture destination rectangle having two alternative formulations:
Either the picture destination rectangle can be set as:
left = 8
right = left + 2·horizontal_size = 712
with the graphic used in a straightforward fashion
Or the picture destination rectangle can be set as:
left = 0
right = left + 2·horizontal_size = 704
with the graphic destination rectangle being displaced to the left by 8 to compensate for the shifted picture destination
The second of these two alternatives is presumably the preferred one.
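The 704-wide and 352-wide cases above follow the same pattern, which the C sketch below tries to capture. The HorizontalPlacement type, the function name and the display_width parameter (720 in both examples) are assumptions made for illustration and are not part of this specification.

```c
#include <assert.h>

/* Hypothetical result type for this example. */
typedef struct {
    int src_left_16th, src_right_16th; /* picture source, 1/16th-sample units */
    int dst_left, dst_right;           /* picture destination, whole samples */
    int graphic_shift;                 /* displacement of the graphic destination */
} HorizontalPlacement;

/* stretch is 1 for 704-wide pictures and 2 for 352-wide pictures.
   centre_picture selects the first alternative (picture offset by the
   margin); zero selects the second (picture at 0, graphic displaced left). */
HorizontalPlacement place_narrow_picture(int horizontal_size, int stretch,
                                         int display_width, int centre_picture)
{
    HorizontalPlacement p;
    int shown_width = stretch * horizontal_size;
    int margin = (display_width - shown_width) / 2;
    assert(stretch >= 1 && shown_width <= display_width);
    p.src_left_16th  = 0;
    p.src_right_16th = 16 * horizontal_size;
    p.dst_left       = centre_picture ? margin : 0;
    p.dst_right      = p.dst_left + shown_width;
    p.graphic_shift  = centre_picture ? 0 : -margin;
    return p;
}
```

For horizontal_size = 704 with stretch = 1, and for horizontal_size = 352 with stretch = 2, the margin is 8 samples in a 720-wide display area, reproducing the destination values 8 to 712 (first alternative) or 0 to 704 with the graphic displaced by 8 (second alternative).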
A rather trivial example is the use of MPEG-2 on DVD with 720-wide pictures, in which case the picture source rectangle (in 1/16th sample resolution) would have:
left = 0
right = left + 16·horizontal_size = 11520
and the destination rectangle would normally have:
left = 0
right = left + horizontal_size = 720
Another example is DVD's use of 16:9 video for 4:3 displays with letterbox framing. This can be supported by using a picture source rectangle having:
top = 0
bottom = top + 16·vertical_size = 7680 or 9216
and a picture destination rectangle having:
top = vertical_size / 8 = 60 or 72
bottom = 7 · vertical_size / 8 = 420 or 504
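The letterbox values above can be reproduced with the short C sketch below. A 16:9 picture shown at full width on a 4:3 display occupies three quarters of the display height, which leaves one eighth of the height above and below it. The VerticalSpan type and the function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical pair of vertical rectangle coordinates for this example. */
typedef struct { int top, bottom; } VerticalSpan;

/* source_16th receives the picture source span in 1/16th-sample units;
   dest receives the picture destination span in whole samples. */
void letterbox_spans(int vertical_size, VerticalSpan *source_16th,
                     VerticalSpan *dest)
{
    /* The examples use 480 and 576, both divisible by 8. */
    assert(vertical_size > 0 && vertical_size % 8 == 0);
    source_16th->top    = 0;
    source_16th->bottom = 16 * vertical_size;
    dest->top    = vertical_size / 8;
    dest->bottom = 7 * vertical_size / 8;
}
```

For vertical_size = 480 the destination span is 60 to 420, and for vertical_size = 576 it is 72 to 504.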
The picture resampling function is specified for such purposes as spatial scalable video coding, reference picture resampling, or resampling for use as an upsampled or display picture.
The resampling shall be performed as specified for H.263 Annex O Spatial Scalability or for H.263 Annex P with "clipping" at the picture edges, which is the same as in some forms of Spatial Scalability in MPEG-2 and MPEG-4. This uses simple two-tap separable filtering.
Picture resampling control does not require a connection configuration. Its operation requires only support of the appropriate restricted mode GUID.
As no connection configuration is needed for picture resampling control, no minimal interoperability set needs to be defined for its operation.
A single buffer type is defined for controlling the resampling process.
DXVA_PicResample structure member sizes, in declaration order: 16b, 16b, 16b, 8b, 8b, 32b, 32b, 32b, 32b, 32b, 32b
wPicResampleRcontrol: Specifies the averaging rounding mode of the resampling operation. In the case of H.263 Annex O Spatial Scalability, this parameter shall be "1". (This corresponds to the value of RCRPR in H.263 Annex P which is equivalent to the upsampling needed for H.263 Annex O spatial scalability.) In the case of H.263 Annex P Reference Picture Resampling, this parameter shall be equal to the H.263 parameter RCRPR.
bPicResampleExtrapWidth, bPicResampleExtrapHeight: If nonzero and the padding method of using motion vectors over picture boundaries is used on the accelerator, any resampling shall include padding of the resampled picture as well - and this padding shall cover at least the specified width and height around each edge of the resampled picture regardless of the resampling operation which is performed.
dwPicResampleSourcePicIndex: Specifies the reference buffer to be resampled.
dwPicResampleDestPicIndex: Specifies the buffer to be used for the output of the reference picture resampling operation.
dwPicResampleSourceWidth, dwPicResampleSourceHeight: The width and height of the area of the source picture to be resampled to the destination picture.
dwPicResampleDestWidth, dwPicResampleDestHeight: The width and height of the area of the destination picture to contain the resampled data from the source picture.
dwPicResampleFullDestWidth, dwPicResampleFullDestHeight: The full width and height of the area of the destination picture to contain the resampled data from the source picture. Clipping shall be used to generate any samples outside the source resampling area. (This parameter is necessary for H.263 Annex P support of custom source formats in which the luminance width or height is not divisible by 16.)
A number of restricted mode profiles are defined in this section. The definition of these restricted mode profiles allows a decoder to determine the capabilities of an accelerator by obtaining an indication of which restricted mode profiles it supports.
Some of the restricted mode profiles defined herein are defined by simple subsets of the capabilities of other restricted mode profiles also defined herein (for example, the MPEG2_A profile is a subset of the capabilities of the MPEG2_B profile). In order to enforce a logical structure to the accelerator capability determination, it is a requirement that all accelerators that support some particular restricted mode profile shall also expose support of all other restricted mode profiles that are defined as subsets of the capabilities of the particular profile in question. For example, since the MPEG2_A profile is defined as a subset of the capabilities of the MPEG2_B profile, all accelerators that support the MPEG2_B profile shall also expose support of the MPEG2_A profile.
The restricted-mode profiles below are defined in anticipation of the combinations of features which might be likely to find widespread support. Microsoft is open to suggestions as to the addition of new profiles containing different combinations of features as are identified as practical and necessary. The restricted mode profile is defined herein to establish the set of video coding "tools" necessary for decoding - essentially to determine whether a given video data format can be decoded in some fashion using the API. In addition to establishing the set of "tools", the decoder must also know how to drive those tools through the API. That issue is discussed in the following section on configuration information.
If this API/DDI is operated without strict conformance to a restricted-mode profile as specified below, the wRestrictedMode field shall be set to a specific value to indicate this lack of restriction.
Connection mode and functions:
a) wRestrictedMode = 0xFFFF
b) All defined values of bDXVA_Func allowed
The H261_A restricted profile contains the set of features required for minimal support of ITU-T Rec. H.261 without acceleration support for H.261 Annex D graphics. Support of this profile is not an intermediate-term requirement. This set of features is defined by the following set of restrictions:
Connection mode and functions:
a) wRestrictedMode = "1"
b) bDXVA_Func = 1
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
BPP = 8
bSecondField = 0
(MacroblockWidth, MacroblockHeight) = (16, 16)
(WT, HT) = (8, 8)
bChromaFormat = '01' (4:2:0)
bPicStructure = '11' (frame structured)
bMVprecisionAndChromaRelation = '10' (H.261 integer-sample motion)
bPicExtrapolation = 0
bPicDeblocked = 0
bPic4MVallowed = 0
bPicOBMC = 0
bMV_RPS = 0
bPicScanFixed = 1
b) Macroblock-level restrictions:
MotionType = '10' (frame motion) if MotionForward = 1
MBscanMethod = '00' (zig-zag) if bConfigHostInverseScan = 0
FieldResidual = 0 (frame residual)
MotionBackward = 0 (no backward prediction)
c) Bitstream buffer restrictions:
The contents of any bitstream buffers shall contain data in the H.261 video format.
The H261_B restricted profile contains the set of features required for support of ITU-T Rec. H.261 without acceleration support for H.261 Annex D graphics but with deblocking filter post-processing support. Support of this profile is not an intermediate-term requirement. This set of features is defined by the restrictions listed above for the H261_A restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "2"
bDXVA_Func = 1 (picture decoding) restrictions:
a) bPicDeblocked may be either 0 or 1
b) wDeblockedPictureIndex shall not be equal to wDecodedPictureIndex when bPicDeblocked is 1
The H263_A restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a small specific set of enhanced optional capabilities. Support of this profile is an intermediate-term requirement. This set of features is defined by the following set of restrictions:
Connection mode and functions:
a) wRestrictedMode = "3"
b) bDXVA_Func = 1
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
BPP = 8
bSecondField = 0
(MacroblockWidth, MacroblockHeight) = (16, 16)
(WT, HT) = (8, 8)
bChromaFormat = '01' (4:2:0)
bPicStructure = '11' (frame structured)
bRcontrol = 0
bMVprecisionAndChromaRelation = '01' (H.263 half-sample motion)
bPicExtrapolation = 0
bPicDeblocked = 0
bPic4MVallowed = 0
bPicOBMC = 0
bMV_RPS = 0
bPicScanFixed = 1
b) Macroblock-level restrictions:
MotionType = '10' (frame motion) if MotionForward = 1
MBscanMethod = '00' (zig-zag) if bConfigHostInverseScan = 0
FieldResidual = 0 (frame residual)
H261LoopFilter = 0 (no H.261 loop filter)
MotionBackward = 0 (no backward or bidirectional motion)
c) Bitstream buffer restrictions:
The contents of any bitstream buffers shall contain data in the H.263 video format in "baseline" mode (no options, no PLUSPTYPE) or with Annex L information (to be ignored).
The H263_B restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set of enhanced optional capabilities. Support of this profile is expected to be a longer-term requirement. This set of features is specified by the restrictions listed above for the H263_A restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "4"
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
bPicDeblocked may be either 0 or 1
wDeblockedPictureIndex may or may not be equal to wDecodedPictureIndex when bPicDeblocked = 1
bRcontrol may be either 0 or 1
bPicExtrapolation may be either 0 or 1
Pic4MV may be either 0 or 1
Motion4MV may be either 0 or 1
bPicScanFixed may be either 0 or 1
b) Macroblock-level restrictions:
MBscanMethod may be '00' (zig-zag), '01' (alternate vertical) or '10' (alternate horizontal) if bConfigHostInverseScan = 0
c) Bitstream buffer restrictions:
The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of CPCF, CPFMT and Annexes D, I, N (single forward reference picture per output picture), and T.
The H263_C restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set of enhanced optional capabilities. Support of this profile is expected to be a longer-term requirement. This set of features is specified by the restrictions listed above for the H263_B restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "5"
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
bPicDeblocked may be either 0 or 1
wDeblockedPictureIndex may or may not be equal to wDecodedPictureIndex when bPicDeblocked = 1
b) Bitstream buffer restrictions:
The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of CPCF, CPFMT and Annexes D, I, J, N (single forward reference picture per output picture), and T.
The H263_D restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set of enhanced optional capabilities. Support of this profile is expected to be a longer-term requirement. This set of features is specified by the restrictions listed above for the H263_C restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "6"
b) bDXVA_Func may be either 1 (picture decoding) or 4 (picture resampling)
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
bBidirectionalAveragingMode = 1 (H.263 bidirectional averaging) or 0 (MPEG-2 bidirectional averaging)
bMV_RPS may be either 0 or 1
b) Macroblock-level restrictions:
MotionBackward may be either 0 or 1
c) Bitstream buffer restrictions:
The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of Annexes K, O, P (factor-of-two resizing with clipping only in one or both dimensions), S, and U.
bDXVA_Func = 4 (picture resampling) restrictions:
a) PicResampleSourceWidth and PicResampleDestWidth shall be equal or related by a multiplicative factor of 2 (or ½).
b) PicResampleSourceHeight and PicResampleDestHeight shall be equal or related by a multiplicative factor of 2 (or ½).
c) If PicResampleSourceHeight and PicResampleDestHeight are equal, PicResampleSourceWidth and PicResampleDestWidth shall be related by a multiplicative factor of 2 (or ½). If PicResampleSourceHeight and PicResampleDestHeight indicate an upsampling operation, PicResampleSourceWidth and PicResampleDestWidth shall not indicate a downsampling operation, and vice versa.
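One way to read restrictions a) through c) together is sketched below in C. The function names and the encoding of the size relation are assumptions made for this example; the sketch only checks the size relationships and says nothing about how the resampling itself is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Relation between a source and a destination dimension:
   +1 = upsampling by 2, -1 = downsampling by 2, 0 = equal,
    2 = not a permitted relation. */
int factor_of_two_relation(uint32_t src, uint32_t dst)
{
    assert(src > 0 && dst > 0);
    if (src == dst) return 0;
    if ((uint64_t)dst == 2 * (uint64_t)src) return 1;
    if ((uint64_t)src == 2 * (uint64_t)dst) return -1;
    return 2;
}

/* Returns 1 if the four sizes satisfy restrictions a), b) and c),
   otherwise 0. */
int resample_sizes_allowed(uint32_t src_width, uint32_t src_height,
                           uint32_t dst_width, uint32_t dst_height)
{
    int w = factor_of_two_relation(src_width, dst_width);
    int h = factor_of_two_relation(src_height, dst_height);
    if (w == 2 || h == 2) return 0; /* a) and b) */
    if (w == 0 && h == 0) return 0; /* c) equal heights require a width change */
    if (w * h == -1) return 0;      /* c) no mixing of up- and downsampling */
    return 1;
}
```

Under this reading, resampling 176x144 to 352x288, to 352x144 or to 176x288 is allowed, while 176x144 to 176x144 (no change) and 176x144 to 352x72 (mixed directions) are not.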
NOTE: Although H.263 requires only support of bBidirectionalAveragingMode equal to "1" when MotionForward is "1" and MotionBackward is "1", the H263_D restricted profile also allows bBidirectionalAveragingMode being "0". This is intended to allow the H263_D restricted profile to support MPEG-4 video as well as H.263 video - and MPEG-4 uses the MPEG-1/MPEG-2 style of bidirectional averaging.
The H263_E restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set of enhanced optional capabilities. Support of this profile is not an intermediate-term requirement. This set of features is specified by the restrictions listed above for the H263_D restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "7"
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
bPicOBMC may be either 0 or 1
b) Macroblock-level restrictions:
if bPicOBMC is 1 and Motion4MV is 1, MotionBackward shall be 0
c) Bitstream buffer restrictions:
The contents of any bitstream buffers may also contain data in the H.263 video format with Annex F.
The H263_F restricted profile contains the set of features required for support of ITU-T Rec. H.263 and a specific set of enhanced optional capabilities. Support of this profile is not an intermediate-term requirement. This set of features is specified by the restrictions listed above for the H263_E restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = "8"
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
bPicBinPB may be either 0 or 1
b) Bitstream buffer restrictions:
The contents of any bitstream buffers may also contain data in the H.263 video format with any subset of Annexes G, M, V and W.
The MPEG1_A restricted profile contains a set of features required for support of MPEG-1 video. Support of this profile is an intermediate-term requirement. This set of features is defined by the following set of restrictions:
Connection mode and functions:
a) wRestrictedMode = "9"
b) bDXVA_Func = 1
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
BPP = 8
bSecondField = 0
(MacroblockWidth, MacroblockHeight) = (16, 16)
(WT, HT) = (8, 8)
bChromaFormat = '01' (4:2:0)
bPicStructure = '11' (frame structured)
bRcontrol = 0
bBidirectionalAveragingMode = 0 (MPEG-2 bidirectional averaging)
bMVprecisionAndChromaRelation = '00' (MPEG-2 half-sample motion)
bPicExtrapolation = 0
bPicDeblocked = 0
bPic4MVallowed = 0
bPicOBMC = 0
bMV_RPS = 0
SpecificIDCT = 0
bPicScanFixed = 1
b) Macroblock-level restrictions:
MotionType = '10' (frame motion)
MBscanMethod = '00' (zig-zag) if bConfigHostInverseScan = 0
FieldResidual = 0 (frame residual)
H261LoopFilter = 0 (no H.261 loop filter)
c) Bitstream buffer restrictions:
The contents of any bitstream buffers shall contain data in the MPEG-1 main profile video format.
The MPEG2_A restricted profile contains a set of features required for support of MPEG-2 video Main Profile. Support of this profile is an intermediate-term requirement. This set of features is defined by the following set of restrictions:
Connection mode and functions:
a) wRestrictedMode = 0xA
b) bDXVA_Func = 1
bDXVA_Func = 1 (picture decoding) restrictions:
a) Frame-level restrictions:
wRestrictedMode = 0xA
BPP = 8
(MacroblockWidth, MacroblockHeight) = (16, 16)
(WT, HT) = (8, 8)
bChromaFormat = '01' (4:2:0)
bRcontrol = 0
bBidirectionalAveragingMode = 0 (MPEG-2 bidirectional averaging)
bMVprecisionAndChromaRelation = '00' (MPEG-2 half-sample motion)
bPicExtrapolation = 0
bPicDeblocked = 0
bPic4MVallowed = 0
bPicOBMC = 0
bMV_RPS = 0
SpecificIDCT = 0
bPicScanFixed = 1
b) Macroblock-level restrictions
MBscanMethod may be '00' (zig-zag) or '01' (alternate vertical) if bConfigHostInverseScan = 0
H261LoopFilter = 0
c) Bitstream buffer restrictions:
The contents of any bitstream buffers shall contain data in the MPEG-2 main profile video format
bNewQmatrix[i] = 0, for i = 2 and 3
Since the MPEG2_A restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_B profile, all accelerators that support the MPEG2_B profile shall support the MPEG2_A profile.
The MPEG2_B restricted profile contains a set of features required for support of MPEG-2 video Main Profile and an associated DVD subpicture using front-end buffer-to-buffer sub-picture blending. Support of this profile is not an intermediate-term requirement. This set of features is defined by the restrictions listed above for the MPEG2_A restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = 0xB
b) bDXVA_Func = 1 (picture decoding) or 2 (alpha blend data loading) or 3 (alpha blend combination) (all of these values shall be supported)
c) bConfigBlendType = 0 (front-end buffer-to-buffer blending)
d) bConfigDataType = 0, 1, or 3 (at the accelerator's discretion)
e) Alpha blending source and destination surfaces are supported with width and height of at least 720 and 576, respectively.
The MPEG2_C restricted profile contains a set of features required for support of MPEG-2 video Main Profile. Support of this profile is an intermediate-term requirement. This set of features is defined by the restrictions listed above for the MPEG2_A restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = 0xC
b) one additional member having bConfigResidDiffHost=0 and bConfigResidDiffAccelerator=1 is added to the minimal interoperability set as described in Section 3.5.2.
Since the MPEG2_C restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_A profile (by allowing an accelerator to not support any of the members of the minimal interoperability set for MPEG2_A), all accelerators that support the MPEG2_A profile shall support the MPEG2_C profile. Similarly, all accelerators that support the MPEG2_D profile shall support the MPEG2_C profile.
The MPEG2_D restricted profile contains a set of features required for support of MPEG-2 video Main Profile and an associated DVD subpicture using back-end hardware sub-picture blending. Support of this profile is not an intermediate-term requirement. This set of features is defined by the restrictions listed above for the MPEG2_B restricted profile, except:
Connection mode and functions:
a) wRestrictedMode = 0xD
b) one additional member having bConfigResidDiffHost=0 and bConfigResidDiffAccelerator=1 is added to the minimal interoperability set as described in Section 3.5.2.
c) bConfigBlendType may be either 0 or 1 (at the accelerator's discretion)
d) bConfigDataType may support any value (at the accelerator's discretion)
Since the MPEG2_D restricted profile is defined by a relaxation of the accelerator requirements of the MPEG2_B profile (by allowing an accelerator to not support any of the members of the minimal interoperability set for MPEG2_B), all drivers that support the MPEG2_B profile shall support the MPEG2_D profile.
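The exposure requirements stated for the MPEG-2 profiles (MPEG2_B implies MPEG2_A and MPEG2_D; MPEG2_A and MPEG2_D each imply MPEG2_C) can be checked mechanically. The C sketch below represents a set of profiles as a bit mask indexed by wRestrictedMode; that representation and the function names are assumptions made for this example and are not how capabilities are actually reported through the interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* wRestrictedMode values of the MPEG-2 restricted profiles. */
enum { MPEG2_A = 0xA, MPEG2_B = 0xB, MPEG2_C = 0xC, MPEG2_D = 0xD };

/* One bit per wRestrictedMode value (hypothetical representation). */
uint32_t profile_bit(unsigned mode)
{
    assert(mode < 32u);
    return (uint32_t)1 << mode;
}

/* Adds every MPEG-2 profile that must also be exposed, repeating until
   no rule adds anything new. */
uint32_t close_mpeg2_profile_set(uint32_t supported)
{
    /* { profile supported, profile that shall then also be exposed } */
    static const unsigned rules[][2] = {
        { MPEG2_B, MPEG2_A },
        { MPEG2_A, MPEG2_C },
        { MPEG2_D, MPEG2_C },
        { MPEG2_B, MPEG2_D },
    };
    int changed = 1;
    while (changed) {
        size_t i;
        changed = 0;
        for (i = 0; i < sizeof rules / sizeof rules[0]; ++i) {
            uint32_t have = profile_bit(rules[i][0]);
            uint32_t need = profile_bit(rules[i][1]);
            if ((supported & have) != 0 && (supported & need) == 0) {
                supported |= need;
                changed = 1;
            }
        }
    }
    return supported;
}
```

An accelerator that supports MPEG2_B therefore ends up exposing all four MPEG-2 profiles, while one that supports only MPEG2_C exposes nothing further.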
The IAMVideoAccelerator interface provides the mechanism for DirectX VA operation within Windows 2000. This section describes how the DirectX VA data formats shall be sent through this interface.
The following is intended as an expansion upon (and potentially an improvement of) the overview description of the use of IAMVideoAccelerator in the current Microsoft DirectShow® documentation set.
NOTE: This interface is available in Microsoft Windows® 2000.
The overlay mixer's input pin supports the IAMVideoAccelerator interface, and the decoder's output pin supports the IAMVideoAcceleratorNotify interface. The sequence of events for connecting the filter pins is as follows:
The filter graph manager calls decoder's output pin's IPin::Connect. An AM_MEDIA_TYPE is an optional parameter.
An AM_MEDIA_TYPE is a data structure that describes a type of media. It contains a majortype GUID (which in our case should be MEDIATYPE_Video), a subtype GUID (which in our case should be a video accelerator GUID), and a variety of other things. One of those things is a format type GUID containing information about the media, including in our case the width and height of an uncompressed video picture - most likely in an MPEG1VIDEOINFO, VIDEOINFOHEADER, MPEG2VIDEOINFO, or VIDEOINFOHEADER2 structure.
The AM_MEDIA_TYPE, if present, tells the decoder to operate using the specified media type, which may be "fully specified" or "partially specified". If "fully specified," the decoder would normally simply attempt to operate with that media type. If "partially specified," it will attempt to find a "fully-specified" compatible mode of operation that it can use to connect in a manner consistent with the "partially-specified" media type.
The ordinary manner for attempting to find a "fully-specified" media type to use for a connection is to simply run through a list of every "fully-specified" media type that the output pin supports which is compatible with the "partially-specified" media type and attempt to connect with each of them until successful. The process would normally be similar if no AM_MEDIA_TYPE is contained in the IPin::Connect call, but with the output pin needing to check all of its media types.
If the decoder wants to check whether a specific AM_MEDIA_TYPE (including a video accelerator GUID) is supported by the downstream input pin, it can call that pin's IPin::QueryAccept (with the video accelerator GUID as the subtype of the AM_MEDIA_TYPE) or it can simply attempt to connect to that pin as described in item 5 below.
If the decoder does not know which video accelerator GUIDs the downstream input pin supports and does not wish to propose just some particular candidate video accelerator GUID by calling the downstream input pin's IPin::QueryAccept, the decoder can call IAMVideoAccelerator::GetVideoAcceleratorGUIDs to get a list of the video accelerator GUIDs the pin supports.
For some particular video accelerator GUID, the decoder can call the downstream input pin's IAMVideoAccelerator::GetUncompFormatsSupported to get a list of the DDPIXELFORMATs that can be used to render a specific video accelerator GUID. The list returned should be considered to be in decreasing preference order (i.e., with the most preferred format listed first).
The decoder calls the downstream input pin's IPin::ReceiveConnection, passing it an AM_MEDIA_TYPE with the proper video accelerator GUID as the subtype of the media type. This sets up the connection for operation, including the creation of the uncompressed output surfaces (which are allocated using the width and height found in AM_MEDIA_TYPE, and the number of surfaces to allocate found by a call described below, and whatever other information the video accelerator has available and wishes to use for that purpose - such as the video accelerator GUID itself). If the downstream input pin rejects the video accelerator GUID or some other aspect of the connection, this can cause the IPin::ReceiveConnection to fail. If the IPin::ReceiveConnection fails, this is indicated in a returned HRESULT, and the decoder can try to make the call again, e.g., with a new video accelerator GUID in the AM_MEDIA_TYPE.
NOTE: This is another way (and the most definitive way) for the decoder to determine what is supported by the downstream input pin - simply calling IPin::ReceiveConnection to try to connect, and then checking whether the connection attempt was successful.
During the IPin::ReceiveConnection, the overlay mixer calls the decoder's IAMVideoAcceleratorNotify::GetUncompSurfacesInfo, passing it the video accelerator GUID and an AMVAUncompBufferInfo structure, in order to figure out how many uncompressed surfaces to allocate. The decoder returns an AMVAUncompBufferInfo structure.
The AMVAUncompBufferInfo data structure contains the minimum and maximum number of surfaces to be allocated of the particular type, and a DDPIXELFORMAT structure describing the pixel format of the surfaces to be allocated.
MINOR NOTE: Nothing is actually passed in to the decoder in the call to IAMVideoAcceleratorNotify::GetUncompSurfacesInfo other than the video accelerator GUID.
The overlay mixer calls the decoder's IAMVideoAcceleratorNotify::SetUncompSurfacesInfo, passing to the decoder the actual number of uncompressed surfaces that were allocated.
The overlay mixer calls the decoder's IAMVideoAcceleratorNotify::GetCreateVideoAcceleratorData to get any data needed to initialize the video accelerator.
The decoder calls IAMVideoAccelerator::GetCompBufferInfo, passing it a video accelerator GUID, an AMVAUncompDataInfo structure, and the number of compressed buffer types, to get in return a set of AMVACompBufferInfo data structures, one corresponding to each type of compressed data buffer used by the video accelerator GUID.
The AMVAUncompDataInfo structure contains the width and height of the decoded uncompressed data (in pixels) and the DDPIXELFORMAT of the uncompressed picture.
The AMVACompBufferInfo data structures returned each contain:
The number of compressed buffers needed of the specific type,
The width and height of the surface to create (fields which may or may not have any actual meaning),
NOTE: The DirectDraw surface allocation operation for the compressed buffers does not currently provide for the width or height of these surfaces to be greater than or equal to 2^15, although the surface allocation call may not overtly fail if this limit is violated. Therefore, the driver should structure its requests for compressed buffer memory to avoid such extreme sizes. For example, rather than requesting a buffer with width="1" and height="65536", the driver should request a buffer of width="1024" and height="64".
The total number of bytes to be used by the surface,
A structure of type DDSCAPS2 defining a DirectDrawSurface object, describing the capabilities to create surfaces to store compressed data,
A DDPIXELFORMAT, describing the pixel format used to create surfaces to store compressed data (a field which may or may not have any actual meaning).
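The dimension limit in the note above can be illustrated with a small sketch. The helper name shape_comp_buffer, the CompBufferShape type, and the idea of letting the caller choose a row width are assumptions made for illustration only and are not part of DirectX VA; the only figure taken from the text is the 2^15 limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit from the note above: neither dimension should reach 2^15. */
#define COMP_BUFFER_DIM_LIMIT (1u << 15)

/* Hypothetical helper type, not a DirectX VA structure. */
typedef struct {
    uint32_t width;
    uint32_t height;
} CompBufferShape;

/*
 * Choose a width/height pair whose product covers total_bytes while
 * keeping both dimensions below the limit. Returns 0 on success and
 * -1 if the request cannot be shaped with the given row width.
 */
int shape_comp_buffer(uint32_t total_bytes, uint32_t row_width,
                      CompBufferShape *out)
{
    uint32_t rows;

    if (out == NULL || total_bytes == 0 || row_width == 0 ||
        row_width >= COMP_BUFFER_DIM_LIMIT) {
        return -1;
    }
    /* Round up so that width * height >= total_bytes. */
    rows = total_bytes / row_width + (total_bytes % row_width != 0 ? 1u : 0u);
    if (rows >= COMP_BUFFER_DIM_LIMIT) {
        return -1;
    }
    out->width = row_width;
    out->height = rows;
    return 0;
}
```

With a total of 65536 bytes, a row width of 1024 gives a 1024 by 64 request, whereas a row width of 1 is rejected because the height would be 65536.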
NOTE: The overlay mixer's calls to some of the decoder's IAMVideoAcceleratorNotify interface methods may (and normally would) occur inside of the decoder's call to the overlay mixer's IPin::ReceiveConnection. Specifically, this applies to the following:
IAMVideoAcceleratorNotify::GetUncompSurfacesInfo,
IAMVideoAcceleratorNotify::SetUncompSurfacesInfo, and
IAMVideoAcceleratorNotify::GetCreateVideoAcceleratorData.
NOTE: To support dynamic format changes, the decoder may also call IPin::ReceiveConnection and the other methods described above while the filters are connected and running. (These are not dynamic format changes in the H.263 sense, as all data sets are started up again from scratch and any reference picture information is therefore lost.)
The following is a description of IAMVideoAccelerator use during operation after initialization:
For each uncompressed surface, the decoder calls IAMVideoAccelerator::BeginFrame to begin the processing to create the output picture. When it does this, the decoder sends an AMVABeginFrameInfo structure.
The AMVABeginFrameInfo structure contains an index for a destination buffer, a pointer to some data to send downstream, and a pointer to a place where the accelerator can put some data for the decoder to read.
NOTE 1: The accelerator does not actually receive the destination buffer index, as it is translated by the overlay mixer before going downstream.
NOTE 2: IAMVideoAccelerator::BeginFrame can be called more than once between calls to IAMVideoAccelerator::EndFrame.
NOTE 3: There is no assumption within the interface operation that IAMVideoAccelerator::BeginFrame and IAMVideoAccelerator::EndFrame need to be called for the processing of every individual picture in the bitstream. The only real things that IAMVideoAccelerator::BeginFrame does, as far as the interface is concerned, are:
Create an association within the overlay mixer between an index and an uncompressed surface, and
Provide a means to call a specific function in a device driver (with support of a means of passing arbitrary data back and forth between the decoder and the device driver).
(However, in DirectX VA operation there is a requirement described below that IAMVideoAccelerator::BeginFrame and IAMVideoAccelerator::EndFrame do need to be called for the processing of every individual picture in the bitstream.)
For sending compressed data to the accelerator, the decoder calls:
IAMVideoAccelerator::GetBuffer to lock and obtain access to a specified buffer (if it has not previously called this to get that access). IAMVideoAccelerator::GetBuffer can also be used to get a copy of the contents of the last uncompressed output picture for which IAMVideoAccelerator::BeginFrame was called, providing IAMVideoAccelerator::EndFrame has not been called for that destination buffer index. If the DDI returns a render status of DDERR_WASSTILLDRAWING for the requested buffer, a sleep loop will be operated within IAMVideoAccelerator::GetBuffer until this condition is cleared. In order to call IAMVideoAccelerator::GetBuffer, the decoder will need some information from an AMVACompBufferInfo data structure which is obtained by calling IAMVideoAccelerator::GetCompBufferInfo.
IAMVideoAccelerator::Execute to indicate that the data in a set of compressed buffers, as indicated in an array of AMVABUFFERINFO data structures, should be processed. A function code dwFunction is passed to the driver in this call. Also passed are an lpPrivateInputData pointer to some data to send downstream and an lpPrivateOutputData pointer to a place where the downstream process can put some data for the decoder to read.
IAMVideoAccelerator::ReleaseBuffer to indicate that the decoder has completed use of a specified buffer for the moment and no longer needs locked access to the buffer. (If the decoder wishes to continue to use the buffer, it can simply not call IAMVideoAccelerator::ReleaseBuffer for the moment, thus avoiding the need to call IAMVideoAccelerator::GetBuffer until it really intends to not use the buffer anymore.)
IAMVideoAccelerator::QueryRenderStatus to check on whether a buffer is safe for reading from or writing to.
To complete output processing for a destination buffer, the decoder calls IAMVideoAccelerator::EndFrame. It can pass some arbitrary data downstream with this call, and that's essentially all that happens as a result of this call. It doesn't send a destination buffer index in this call, so it can't indicate to the accelerator precisely what destination buffer is completed unless this indication is contained in the arbitrary data that is passed.
To display a frame, the decoder calls IAMVideoAccelerator::DisplayFrame with the index of the frame to display and an IMediaSample structure containing start and stop time stamps.
Finally, the decoder should, upon completion of all processing, indicate completion of all remaining begun output frames by calling IAMVideoAccelerator::EndFrame and release all of its locked buffers by calling IAMVideoAccelerator::ReleaseBuffer for each unreleased buffer.
Due to the variety of types of data that can be decoded by DirectX VA, and the multiple decoding configurations supported within DirectX VA for each of these types of data (e.g., using bitstream buffers vs. host residual difference decoding vs. accelerator-based IDCT with and without encryption of each relevant type of buffer, etc.), we believe it would be somewhat ungainly to simply specify a unique GUID for every unique data type and decoding configuration. This would create a large number of GUIDs (e.g., hypothetically if there were 16 profiles of DirectX VA and 16 configurations possible for each, there would need to be 256 defined GUIDs - requiring 4k bytes of memory just to hold them all). This issue is the most difficult part of determining how to map DirectX VA into IAMVideoAccelerator, with the remainder of the operational definition mostly being quite straightforward. As a result, we specify a unique GUID only for each type of data (i.e., for each restricted mode profile) and allow an additional GUID to be associated with each type of encryption. The decoding configuration is then established between the decoder and accelerator by a lower-level subordinate negotiation using probing and locking operations to establish configurations for each type of DirectX VA function.
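The storage figure quoted above follows from the 16-byte (128-bit) size of a GUID. The following sketch works through the arithmetic; the function names are hypothetical, and the count of two encryption types used in the usage note is an arbitrary illustrative value, not a number taken from this specification.

```c
#include <stddef.h>

/* A GUID is 128 bits, i.e. 16 bytes. */
#define GUID_SIZE_BYTES 16u

/* Bytes needed if every (profile, configuration) pair had its own GUID. */
size_t guid_bytes_per_combination(size_t profiles, size_t configs)
{
    return profiles * configs * GUID_SIZE_BYTES;
}

/* Bytes needed with one GUID per profile plus one per encryption type. */
size_t guid_bytes_per_type(size_t profiles, size_t encryption_types)
{
    return (profiles + encryption_types) * GUID_SIZE_BYTES;
}
```

For 16 profiles with 16 configurations each, the first function gives 256 GUIDs occupying 4096 bytes. With 16 profiles and, say, two encryption types, the second gives 18 GUIDs occupying 288 bytes.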
The precise mechanism of operation is as follows:
Each restricted mode profile defined herein has an associated DirectX VA GUID which can be supported by a downstream input pin's IPin::QueryAccept and IPin::ReceiveConnection and listed in IAMVideoAccelerator::GetVideoAcceleratorGUIDs.
Similarly, each encryption protocol type for use with DirectX VA shall have an associated encryption protocol type GUID which can be supported by a downstream input pin's IPin::QueryAccept and IPin::ReceiveConnection and listed in IAMVideoAccelerator::GetVideoAcceleratorGUIDs. The "no encryption" GUID DXVA_NoEncrypt shall not be sent in this list, as support for it is required and therefore implicit.
After calling IPin::ReceiveConnection to attempt a connection to the downstream input pin, the decoder's IAMVideoAcceleratorNotify::GetCreateVideoAcceleratorData shall return a pointer to a DXVA_ConnectMode data structure containing the connection mode information for the connection.
IAMVideoAccelerator::GetCompBufferInfo shall be called with *pdwNumTypesCompBuffers = 16 and shall return compressed buffer information based on the convention that the type number of each buffer as defined in Section 3.4 can be used directly as the zero-based index into the array of AMVACompBufferInfo data structures that is returned. This requires that for any buffer types that will not be used (including buffer type 0, since there is no defined use of that buffer type), the accelerator driver will provide AMVACompBufferInfo data structures with some form of "dummy" parameter values (e.g., dwNumCompBuffers=0, dwWidthToCreate=0, dwHeightToCreate=0, and dwBytesToAllocate=0).
DXVA function indications and associated data buffers are sent using IAMVideoAccelerator::Execute. The DXVA function is indicated in the dwFunction parameter of the call. The only DXVA functions that are relevant for initialization are DXVA_ConfigQueryOrReplyFunc and DXVA_EncryptProtocolFunc.
If dwFunction contains a DXVA_ConfigQueryOrReplyFunc, the lpPrivateInputData pointer for passing data to the accelerator in this call shall point to a configuration data structure, the lpPrivateOutputData pointer for receiving information from the accelerator shall point to an area where an alternative or duplicate configuration data structure can be placed, the pamvaBufferInfo pointer for an array of AMVABUFFERINFO shall be NULL, and dwNumBuffers shall be zero. The returned HRESULT contains the S_OK or S_FALSE indication, or E_FAIL or E_INVALIDARG or some other error indication HRESULT in the event of a severe problem in protocol execution (such as an invalid configuration parameter). All calls to IAMVideoAccelerator::Execute for all uses of DXVA_ConfigQueryOrReplyFunc shall precede all other calls to IAMVideoAccelerator::Execute.
If dwFunction contains a DXVA_EncryptProtocolFunc, the lpPrivateInputData pointer for passing data to the accelerator in this call shall point to an encryption protocol data structure that begins with DXVA_EncryptProtocolHeader, the lpPrivateOutputData pointer for receiving information from the accelerator shall point to an area where the data to be returned (such as a certificate) by the encryption protocol (which will begin with DXVA_EncryptProtocolHeader) can be placed, the pamvaBufferInfo pointer for an array of AMVABUFFERINFO shall be NULL, and dwNumBuffers shall be zero. The returned HRESULT contains S_OK as long as the encryption protocol is functioning normally and contains E_FAIL or E_INVALIDARG or some other error indication HRESULT in the event of a severe problem in protocol execution.
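The indexing convention for IAMVideoAccelerator::GetCompBufferInfo described above (16 entries, indexed directly by buffer type number, with dummy values for unused types) might look as follows on the driver side. This is only a sketch: CompBufferInfoSketch is a simplified stand-in that carries just the four fields named in the text rather than the full AMVACompBufferInfo structure, and UsedBufferType and fill_comp_buffer_table are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of *pdwNumTypesCompBuffers required by the text. */
#define NUM_COMP_BUFFER_TYPES 16u

/* Simplified stand-in for AMVACompBufferInfo (fields named in the text only). */
typedef struct {
    uint32_t dwNumCompBuffers;
    uint32_t dwWidthToCreate;
    uint32_t dwHeightToCreate;
    uint32_t dwBytesToAllocate;
} CompBufferInfoSketch;

/* Hypothetical description of one buffer type that the driver really uses. */
typedef struct {
    uint32_t type;              /* buffer type number, used as the array index */
    CompBufferInfoSketch info;
} UsedBufferType;

/*
 * Fill a 16-entry table. Every entry starts as an all-zero dummy; only the
 * types listed in 'used' receive real values. Type 0 has no defined use, so
 * it is rejected here. Returns 0 on success, -1 on an invalid type number.
 */
int fill_comp_buffer_table(CompBufferInfoSketch table[NUM_COMP_BUFFER_TYPES],
                           const UsedBufferType *used, size_t used_count)
{
    size_t i;

    memset(table, 0, NUM_COMP_BUFFER_TYPES * sizeof table[0]);
    for (i = 0; i < used_count; ++i) {
        if (used[i].type == 0 || used[i].type >= NUM_COMP_BUFFER_TYPES) {
            return -1;
        }
        table[used[i].type] = used[i].info;
    }
    return 0;
}
```

A decoder reading such a table can then look up the information for a given buffer type with a plain array access, without searching.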
After initialization of operation in the above fashion, the actual operation of the decoder proceeds as follows:
IAMVideoAccelerator::BeginFrame shall be called prior to sending any bDXVA_Func with compressed buffer parameters which cause writes to an uncompressed destination surface. The purpose of IAMVideoAccelerator::BeginFrame in DirectX VA is to associate destination surfaces with index values and to notify the video accelerator driver of the intent to initiate writes to a surface so that the driver can respond with an indication of whether the surface is ready to be overwritten. The AMVABeginFrameInfo structure passed in IAMVideoAccelerator::BeginFrame shall contain a pInputData pointer to a single WORD wBeginPictureIndex parameter matching the frame index passed into IAMVideoAccelerator::BeginFrame (and dwSizeInputData shall be 2). This is the index to be used in a compressed buffer to command a write to the surface (i.e., to be used as wDecodedPictureIndex, wDeblockedPictureIndex, wBlendedDestinationIndex, or wPicResampleDestPicIndex). Each call to IAMVideoAccelerator::BeginFrame shall be paired with a corresponding call to IAMVideoAccelerator::EndFrame as described below. For example, if a compressed picture is to be decoded and then alpha blended using front-end buffer-to-buffer blending with a graphic image, there would be a call to IAMVideoAccelerator::BeginFrame prior to decoding the compressed picture into a surface specified in wDecodedPictureIndex, then a call to IAMVideoAccelerator::EndFrame after passing all compressed buffers used to decode the picture, then a second call to IAMVideoAccelerator::BeginFrame prior to commanding alpha blending combination of the graphic source with the decoded picture into a surface specified in wBlendedDestinationIndex, and then a second call to IAMVideoAccelerator::EndFrame after the alpha blend combination operation. The pointer pOutputData in AMVABeginFrameInfo shall be NULL (and dwSizeOutputData shall be "0"). The HRESULT that is returned by IAMVideoAccelerator::BeginFrame shall be:
S_OK if the uncompressed surface is available and ready for use.
E_PENDING if the uncompressed surface is not yet available for use but will become available soon (i.e., if the uncompressed surface is being read for display and the reading/display of the surface has not yet been completed).
E_FAIL or E_INVALIDARG or some other error indication only if a data format or protocol error is detected (such as an incorrect value of dwSizeInputData or a non-NULL pOutputData).
DXVA function indications and associated data buffers are sent using IAMVideoAccelerator::Execute. More than one bDXVA_Func value may be indicated in the same call to IAMVideoAccelerator::Execute. The bDXVA_Func values shall be packed into the dwFunction parameter of the call, with the first function command in the eight MSBs, the next command in the next eight bits, etc., and with any remaining bits padded out with zeros. The value 0xFF for bDXVA_Func indicates that the bDXVA_Func is extended to two or four bytes. If the second byte is also 0xFF, this indicates that bDXVA_Func is extended to four bytes. If the upper four bits of the third byte are 0xF or 0x0, this indicates that bDXVA_Func contains a DXVA_ConfigQueryOrReplyFunc or DXVA_EncryptProtocolFunc. Multi-byte commands shall not indicate continuation past the end of dwFunction. Care must be taken by the decoder to ensure that no sequential dependencies are present between different bDXVA_Func values specified in the same call to IAMVideoAccelerator::Execute and that all potential race conditions (such as between picture decoding and sub-picture blending, between sub-picture loading and sub-picture blending, etc.) are prevented by appropriate calls to IAMVideoAccelerator::BeginFrame and IAMVideoAccelerator::QueryRenderStatus before subsequent calls to IAMVideoAccelerator::Execute.
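The packing rule for single-byte bDXVA_Func values can be sketched as follows. This is a minimal sketch under stated assumptions: the name pack_dxva_funcs is hypothetical, multi-byte commands (those beginning with 0xFF) are deliberately not handled, and rejecting a value of 0 is an assumption based on zero being the padding value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pack up to four single-byte bDXVA_Func values into a 32-bit dwFunction
 * value: the first command goes in the eight most significant bits, the
 * next command in the next eight bits, and unused bits stay zero.
 * Returns 0 on success, -1 on invalid input.
 */
int pack_dxva_funcs(const uint8_t *funcs, size_t count, uint32_t *dwFunction)
{
    uint32_t packed = 0;
    size_t i;

    if (funcs == NULL || dwFunction == NULL || count == 0 || count > 4) {
        return -1;
    }
    for (i = 0; i < count; ++i) {
        /* 0xFF marks an extended command (not handled in this sketch);
           0 is treated as padding and therefore not a command (assumption). */
        if (funcs[i] == 0xFFu || funcs[i] == 0u) {
            return -1;
        }
        packed |= (uint32_t)funcs[i] << (24u - 8u * (unsigned)i);
    }
    *dwFunction = packed;
    return 0;
}
```

For example, packing the two values 1 and 3 (arbitrary example values) yields 0x01030000, and packing the single value 1 yields 0x01000000.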
If dwFunction contains a DXVA_ConfigQueryOrReplyFunc, the lpPrivateInputData pointer for passing data to the accelerator in this call shall point to a configuration data structure, the lpPrivateOutputData pointer for receiving information from the accelerator shall point to an area where an alternative or duplicate configuration data structure can be placed, the pamvaBufferInfo pointer for an array of AMVABUFFERINFO shall be NULL, and dwNumBuffers shall be zero. The returned HRESULT contains the S_OK or S_FALSE indication in response to the query, or E_FAIL or E_INVALIDARG or some other error indication HRESULT in the event of a severe problem in protocol execution (such as an invalid configuration parameter). All calls to IAMVideoAccelerator::Execute for all uses of DXVA_ConfigQueryOrReplyFunc shall precede all other calls to IAMVideoAccelerator::Execute.
If dwFunction contains a DXVA_EncryptProtocolFunc, the lpPrivateInputData pointer for passing data to the accelerator in this call shall point to an encryption protocol data structure that begins with DXVA_EncryptProtocolHeader, the lpPrivateOutputData pointer for receiving information from the accelerator shall point to an area where the data to be returned (such as a certificate) by the encryption protocol (which will begin with DXVA_EncryptProtocolHeader) can be placed, the pamvaBufferInfo pointer for an array of AMVABUFFERINFO shall be NULL, and dwNumBuffers shall be zero. The returned HRESULT contains S_OK as long as the encryption protocol is functioning normally and contains E_FAIL or E_INVALIDARG or some other error indication HRESULT in the event of a severe problem in protocol execution.
If dwFunction does not contain a DXVA_ConfigQueryOrReplyFunc or DXVA_EncryptProtocolFunc, the lpPrivateInputData pointer for passing data to the accelerator shall point to a buffer description list. The first four entries in the buffer description list structure for each buffer (dwTypeIndex, dwBufferIndex, dwDataOffset, and dwDataSize) shall be equal to those in the AMVABUFFERINFO data structure for the same buffer. If bDXVA_Func equal to "1" is specified within dwFunction and bPicReadbackRequests is "1", the lpPrivateOutputData pointer for receiving information from the accelerator shall point to an area of persistent memory (e.g., heap) to be filled in with read-back macroblock data from the accelerator (such data not guaranteed to be present until IAMVideoAccelerator::QueryRenderStatus for writing to the same picture parameters buffer indicates S_OK as described in item 10 below). Otherwise, the lpPrivateOutputData pointer for receiving information from the accelerator shall point to a single DWORD to be set to one of the following indication values (particularly useful for reporting bitstream errors in off-host VLD operation):
"0": Execution OK,
"1": Minor problem in data format encountered,
"2": Significant problem in data format encountered,
"3": Severe problem in data format encountered,
"4": Other severe problem encountered.
If either type of "severe" problem is indicated, the software decoder should cease to operate the function(s) unless corrective action can be taken. This data returned from the accelerator shall not be read by the host until after the buffer rendering for the picture has completed, as can be tested by IAMVideoAccelerator::QueryRenderStatus. The returned HRESULT contains S_OK as long as the interface operation is functioning normally and may return E_FAIL or E_INVALIDARG or some other error indication HRESULT in the event of a severe problem.
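The handling of the returned indication DWORD might be sketched as below. The enumerator and function names are hypothetical; only the numeric values 0 through 4 and their meanings come from the list above. Values outside that range are not defined by the text, and this sketch treats them as non-severe purely as an assumption.

```c
#include <stdint.h>

/* Indication values listed in the text; the names are hypothetical. */
enum ExecuteIndication {
    EXEC_OK = 0,
    EXEC_MINOR_FORMAT_PROBLEM = 1,
    EXEC_SIGNIFICANT_FORMAT_PROBLEM = 2,
    EXEC_SEVERE_FORMAT_PROBLEM = 3,
    EXEC_OTHER_SEVERE_PROBLEM = 4
};

/* Nonzero if the indication is one of the two "severe" values. */
int indication_is_severe(uint32_t indication)
{
    return indication == EXEC_SEVERE_FORMAT_PROBLEM ||
           indication == EXEC_OTHER_SEVERE_PROBLEM;
}

/*
 * Nonzero if the software decoder may keep operating the function(s):
 * always for non-severe indications, and for severe ones only when
 * corrective action can be taken.
 */
int decoder_may_continue(uint32_t indication, int corrective_action_available)
{
    if (!indication_is_severe(indication)) {
        return 1;
    }
    return corrective_action_available ? 1 : 0;
}
```

As the text requires, a decoder would call such a check only after IAMVideoAccelerator::QueryRenderStatus has shown that rendering for the picture has completed, since the DWORD is not valid before then.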
The picture decoding parameters buffer shall be among the first buffers sent for the decoding of each picture when using IAMVideoAccelerator::Execute with bDXVA_Func equal to "1", and all the buffers for decoding a picture in a bitstream shall be sent before any buffers for decoding subsequent pictures. If a macroblock control command buffer is sent, a corresponding residual difference data buffer shall be sent (containing data for the same macroblocks) with the same IAMVideoAccelerator::Execute call.
IAMVideoAccelerator::EndFrame shall be called after all compressed buffers have been sent that will cause the creation of the output content in a specified uncompressed surface (i.e., a result of operations specified for wDecodedPictureIndex, wDeblockedPictureIndex, wBlendedDestinationIndex, or wPicResampleDestPicIndex). The purpose of this call to IAMVideoAccelerator::EndFrame is to notify the video accelerator hardware that all data needed for the specified operation has been sent. The pointer to data to send downstream through IAMVideoAccelerator::EndFrame shall point to a single WORD wEndPictureIndex containing the index of the frame that is ending. This parameter shall match the wBeginPictureIndex value specified in the prior call to IAMVideoAccelerator::BeginFrame before the sending of the relevant compressed buffers. Subsequent to a call to IAMVideoAccelerator::EndFrame, the uncompressed surface with index wEndPictureIndex shall not be found in any picture's wDecodedPictureIndex, wDeblockedPictureIndex, wBlendedDestinationIndex, or wPicResampleDestPicIndex until after another call to IAMVideoAccelerator::BeginFrame is issued to announce that this will occur and an S_OK has been returned as a result. However, that destination surface index may occur in subsequent read access commands such as wForwardRefPictureIndex, wBackwardRefPictureIndex, wPicResampleSourcePicIndex, or bRefPicSelect[i]. The HRESULT returned by IAMVideoAccelerator::EndFrame shall be S_OK unless there is some kind of data format or protocol error, in which case it can be E_FAIL or E_INVALIDARG or some other error indication.
In the case of field based decoding (e.g. in MPEG-2 bitstreams) there will not be a one-to-one mapping of functional pictures in the bitstream to uncompressed surfaces in the accelerator interface. When decoding field pictures in an MPEG-2 bitstream, there will be two "pictures" decoded to produce one complete output uncompressed surface. In the DirectX VA interface definition, each frame corresponds to each use of wDecodedPictureIndex, wDeblockedPictureIndex, wBlendedDestinationIndex, or wPicResampleDestPicIndex. Thus two pairs of calls to IAMVideoAccelerator::BeginFrame and IAMVideoAccelerator::EndFrame are required for the decoding of field pictures into output uncompressed surfaces.
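The pairing rules for IAMVideoAccelerator::BeginFrame and IAMVideoAccelerator::EndFrame, including the two pairs needed for field pictures, can be checked on the decoder side with a small bookkeeping structure. The following is a hypothetical sketch: FramePairTracker, its functions, and the limit of 32 surfaces are illustrative assumptions, and rejecting a second BeginFrame on an index that is still open is a simplification chosen for this sketch rather than a rule stated in the text.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SURFACES 32u  /* arbitrary size chosen for this sketch */

typedef struct {
    uint8_t  open[MAX_SURFACES];   /* 1 while a BeginFrame is outstanding */
    uint32_t pairs[MAX_SURFACES];  /* completed BeginFrame/EndFrame pairs */
} FramePairTracker;

void tracker_init(FramePairTracker *t)
{
    memset(t, 0, sizeof *t);
}

/* Record a BeginFrame for wBeginPictureIndex. Returns 0 on success. */
int tracker_begin(FramePairTracker *t, uint16_t wBeginPictureIndex)
{
    if (wBeginPictureIndex >= MAX_SURFACES || t->open[wBeginPictureIndex]) {
        return -1;
    }
    t->open[wBeginPictureIndex] = 1;
    return 0;
}

/* Record an EndFrame; wEndPictureIndex must match an open BeginFrame. */
int tracker_end(FramePairTracker *t, uint16_t wEndPictureIndex)
{
    if (wEndPictureIndex >= MAX_SURFACES || !t->open[wEndPictureIndex]) {
        return -1;
    }
    t->open[wEndPictureIndex] = 0;
    t->pairs[wEndPictureIndex] += 1;
    return 0;
}
```

Decoding two field pictures into surface 5 would then appear as begin(5), end(5), begin(5), end(5), leaving a pair count of 2 for that surface, while an EndFrame whose index has no matching BeginFrame is flagged as an error.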
A call to IAMVideoAccelerator::QueryRenderStatus with dwFlags equal to zero which occurs sometime after a call to IAMVideoAccelerator::EndFrame with a particular wEndPictureIndex and checks the status of a buffer that was sent that contained the wEndPictureIndex in wDecodedPictureIndex, wDeblockedPictureIndex, wBlendedDestinationIndex, or wPicResampleDestPicIndex will return an S_OK indication if all of the operations to write the data to the uncompressed surface have completed and will return E_PENDING if the operation has not yet completed. E_FAIL or E_INVALIDARG or some other error indication may be returned in the event of a protocol error.
This section contains a description of the Motion Compensation device driver side of the DirectX VA interface [Reference: Windows 2000 DDK - Graphics Drivers - Design Guide - 3.0 DirectDraw DDI - 3.12 Motion Compensation]. The following items refer to entries accessed through the DD_MOTIONCOMPCALLBACKS structure:
At the start of the relevant processing, the device driver's DdMoCompCreate is used to notify the driver that the software decoder will start using a video acceleration object.
GUIDs received from IAMVideoAccelerator::GetVideoAcceleratorGUIDs originate from the device driver's DdMoCompGetGUIDs.
A call to the downstream input pin's IAMVideoAccelerator::GetUncompFormatsSupported returns data from the device driver's DdMoCompGetFormats.
The DXVA_ConnectMode data structure from the decoder's IAMVideoAcceleratorNotify::GetCreateVideoAcceleratorData is passed to the device driver's DdMoCompCreate.
Data returned from IAMVideoAccelerator::GetCompBufferInfo originates from the device driver's DdMoCompGetBuffInfo.
Buffers sent using IAMVideoAccelerator::Execute are received by the device driver's DdMoCompRender.
Use of IAMVideoAccelerator::QueryRenderStatus invokes the device driver's DdMoCompQueryStatus. A return code of DDERR_WASSTILLDRAWING from DdMoCompQueryStatus will be seen by the host decoder as a return code of E_PENDING from IAMVideoAccelerator::QueryRenderStatus.
Data sent to IAMVideoAccelerator::BeginFrame are received by the device driver's DdMoCompBeginFrame. A return code of E_PENDING is needed from DdMoCompBeginFrame in order for E_PENDING to be seen by the host decoder in response to IAMVideoAccelerator::BeginFrame.
Data sent to IAMVideoAccelerator::EndFrame are received by the device driver's DdMoCompEndFrame.
At the end of the relevant processing, the device driver's DdMoCompDestroy is used to notify the driver that the current video acceleration object will no longer be used, so that the driver can perform any necessary cleanup.
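The correspondence between the IAMVideoAccelerator methods and the device driver entries listed above can be summarized as a lookup table. The table contents are taken from the items above; the ApiToDdi type and the ddi_callback_for function are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* One row of the API-to-DDI correspondence described in the text. */
typedef struct {
    const char *api_method;
    const char *ddi_callback;
} ApiToDdi;

static const ApiToDdi kApiToDdi[] = {
    { "IAMVideoAccelerator::GetVideoAcceleratorGUIDs",  "DdMoCompGetGUIDs" },
    { "IAMVideoAccelerator::GetUncompFormatsSupported", "DdMoCompGetFormats" },
    { "IAMVideoAccelerator::GetCompBufferInfo",         "DdMoCompGetBuffInfo" },
    { "IAMVideoAccelerator::Execute",                   "DdMoCompRender" },
    { "IAMVideoAccelerator::QueryRenderStatus",         "DdMoCompQueryStatus" },
    { "IAMVideoAccelerator::BeginFrame",                "DdMoCompBeginFrame" },
    { "IAMVideoAccelerator::EndFrame",                  "DdMoCompEndFrame" },
};

/* Return the driver entry associated with an API method, or NULL if the
   text above lists no direct counterpart for it. */
const char *ddi_callback_for(const char *api_method)
{
    size_t i;

    for (i = 0; i < sizeof kApiToDdi / sizeof kApiToDdi[0]; ++i) {
        if (strcmp(kApiToDdi[i].api_method, api_method) == 0) {
            return kApiToDdi[i].ddi_callback;
        }
    }
    return NULL;
}
```

DdMoCompCreate and DdMoCompDestroy are omitted from the table because the text associates them with the start and end of processing rather than with a single IAMVideoAccelerator method.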
Some details about the interactions between H.263's OBMC, 4MV, B, EP, and B in PB may be helpful:
a) OBMC cannot be used in an H.263 B or EP picture (H.263 Section 5.1.4.5 item 2)
b) OBMC cannot be used in the B part of an H.263 PB picture (H.263 Section F.1 paragraph 1)
c) 4MV cannot be transmitted in an H.263 B or EP picture (Only one MVFW and MVBW in syntax diagrams, and no 4MV macroblock types in H.263 Tables O.1 and O.2)
d) If 4MV is used in the macroblock of an H.263 P picture which is used as the future reference macroblock for "direct" prediction in an H.263 B picture, the OBMC is not used in the direct prediction. The four motion vectors are used according to H.263 Annex M, which uses them in the same way as H.263 Annex G, which does not apply OBMC (H.263 Section O.4 paragraph 2 and item 2 above it).
Thus, H.263 would never require both OBMC and backward prediction at the same time, and never uses 4MV in a backward direction.