CPUINFO
Technical Specifications (Application Note)
Version 1.2
Intel Corporation
January 1997
Table of Contents
1. Introduction
1.1 Motivation
1.2 Abstract
1.3 Usage
2. System Requirements
2.1 CPUINF16.DLL
2.2 CPUINF32.DLL
3. Theory
3.1 Processor and Feature Identification
3.2 Processor Clock Frequency
4. CPUINFO Code Overview
4.1 CPUINF16.DLL
4.1.1 CPUID.C
4.1.2 TIMEPROC.ASM
4.1.3 SPEED.C
4.2 CPUINF32.DLL
4.2.1 CPUID.C
4.2.2 SPEED.C
4.3 Sample Programs: TSTDLL16 & TSTDLL32
4.3.1 TSTDLL32.C
4.3.2 TSTDLL16.C
5. Disclaimer
Multimedia applications tend to push the capabilities of any PC. Because you can do more interesting things on high-performance machines, you need to develop scalable applications to enhance your users' experience. If you know that your users have high-performance systems, your application can do more exciting things; for customers with slower systems, some features will have to be scaled down or removed. CPUINFO lets your code detect CPU type and performance at runtime, so you can develop high-performance scalable applications: applications that automatically adapt their features to each user's system.
Background and Organization
The CPUINFO package includes two DLLs (Dynamic Link Libraries) for Microsoft Windows* operating systems (Windows 3.1, Win32s*, Windows for Workgroups*, Windows 95*, and Windows NT*). These DLLs test for the presence of a Genuine Intel processor and determine its family, model, feature set, and clock frequency. A Windows example program is also included for each DLL to show a sample implementation of the DLL calls.
The DLLs are CPUINF16.DLL and CPUINF32.DLL, which provide five functions accessible by 16-bit and 32-bit Windows programs, respectively. These functions are CPU identification, CPU extended identification, CPU clock frequency, CPU features, and CPU Time Stamp read.
The information returned by cpuspeed() can be unreliable under some circumstances. On systems that do not support the Time Stamp register, for example, the speed measurement is simply a trivial benchmark. Therefore, although cpuspeed() usually returns accurate data, some processors are likely to return incorrect or skewed values. In addition, the DOS version of this code returns low values for Pentium® processors with clock speeds above 120 MHz running Windows NT. Because of these possible errors, it is suggested that you not report the processor speed to the user. Instead, use it at program start-up or installation to set defaults for features that consume large numbers of processor cycles and may need to be disabled on slower processors (for example, stereo sound or high graphic detail).
While this program works on all current Intel processors, the code is designed not to measure processor speed if the host processor has been verified to be an Intel imitator.
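As a minimal sketch of the suggested use, the C fragment below turns a normalized clock frequency in MHz into default feature switches chosen at start-up or installation. The FeatureDefaults structure, the choose_defaults() function, and the 75 MHz and 120 MHz thresholds are hypothetical and not part of the CPUINFO DLLs; treating 0 as "speed unknown" (for example, when speed was not measured) is also an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical start-up defaults; not part of the CPUINFO DLLs. */
typedef struct {
    bool stereo_sound;
    bool high_graphic_detail;
} FeatureDefaults;

/* Hypothetical thresholds in MHz; tune these for your application. */
#define STEREO_MIN_MHZ       75u
#define HIGH_DETAIL_MIN_MHZ 120u

/* Pick defaults from a normalized clock frequency in MHz.
 * 0 means the speed is unknown, so the conservative defaults are kept. */
FeatureDefaults choose_defaults(unsigned normalized_mhz)
{
    FeatureDefaults d = { false, false };
    if (normalized_mhz == 0)
        return d;                       /* unknown speed: scale down */
    d.stereo_sound        = normalized_mhz >= STEREO_MIN_MHZ;
    d.high_graphic_detail = normalized_mhz >= HIGH_DETAIL_MIN_MHZ;
    return d;
}
```

Because the speed value only selects defaults, the user can still turn features on or off later, and a skewed measurement never reaches the screen.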
CPUINF16.DLL requires a system with at least an Intel386(TM) processor running Windows 3.1. This 16-bit DLL also runs correctly under Win32s, Windows for Workgroups, Windows 95, and Windows NT.
CPUINF32.DLL requires a system with at least an Intel386(TM) processor running Windows 3.1. This 32-bit DLL also runs correctly under Win32s, Windows for Workgroups with 32-bit extensions, Windows 95, and Windows NT.
Three of the CPUINFO DLL functions (wincpuid(), wincpuidext(), and wincpufeatures()) use the Intel CPUID instruction (opcode 0F A2h) if it is supported. CPUID was created to let software determine information such as the CPU generation, the manufacturer, or the presence of an OverDrive(TM) processor. It also identifies features such as floating-point support, page size, and others to be defined for future processors. The CPUID instruction works on newer Intel486(TM) processors and on all Pentium(R) and later processors. On older processors, the type must be detected another way: by reading bits in the FLAGS and EFLAGS registers.
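To make the CPUID results concrete, here is a minimal sketch that decodes the raw register values. With EAX=0, CPUID returns the 12-byte vendor string in EBX, EDX, ECX (in that order, little-endian); with EAX=1, EAX holds the signature with stepping in bits 3:0, model in bits 7:4, family in bits 11:8, and processor type in bits 13:12. Executing CPUID itself needs assembly and is omitted; the function names are hypothetical and not taken from CPUID.C.

```c
#include <stdint.h>
#include <string.h>

/* Signature fields from EAX after CPUID with EAX=1. */
typedef struct {
    unsigned type;      /* bits 13:12 (1 = OverDrive processor) */
    unsigned family;    /* bits 11:8  */
    unsigned model;     /* bits 7:4   */
    unsigned stepping;  /* bits 3:0   */
} CpuSignature;

CpuSignature decode_signature(uint32_t eax)
{
    CpuSignature s;
    s.type     = (eax >> 12) & 0x3u;
    s.family   = (eax >> 8)  & 0xFu;
    s.model    = (eax >> 4)  & 0xFu;
    s.stepping =  eax        & 0xFu;
    return s;
}

/* Rebuild the vendor string from CPUID with EAX=0.
 * Bytes are stored little-endian in EBX, then EDX, then ECX. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}

int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char v[13];
    vendor_string(ebx, edx, ecx, v);
    return strcmp(v, "GenuineIntel") == 0;
}
```

For example, the register values 0x756E6547, 0x49656E69, and 0x6C65746E spell "Genu", "ineI", and "ntel".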
More details on the CPUID instruction can be found in Intel application note AP-485, available on the Intel Architecture Labs CD-ROM, on the Intel FTP site (ftp.intel.com), or in hardcopy from Intel Literature (order #241618).
The CPUINFO DLL function for processor clock frequency determination, cpuspeed(), can calculate a raw clock frequency and normalize that frequency to the nearest known value. The 16-bit and the 32-bit versions each have two algorithms for determining clock speed. Which algorithm is used depends on whether the host processor supports Time Stamp register reads (RDTSC, Read Time Stamp Counter, opcode 0F 31h).
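The selection can be sketched as follows. Support for RDTSC is reported by bit 4 (TSC) of the EDX feature flags returned by CPUID with EAX=1. The enum and function names are hypothetical, and the sketch does not model the privilege-level restrictions on RDTSC.

```c
#include <stdint.h>

/* CPUID (EAX=1) EDX bit 4: Time Stamp Counter / RDTSC supported. */
#define FEATURE_TSC (1u << 4)

typedef enum {
    SPEED_BY_BSF_LOOP,  /* time a known BSF instruction sequence */
    SPEED_BY_RDTSC      /* count clocks with RDTSC over an interval */
} SpeedAlgorithm;

/* has_cpuid is zero on processors without CPUID (Intel386 and early
 * Intel486); features_edx is ignored in that case. */
SpeedAlgorithm pick_speed_algorithm(int has_cpuid, uint32_t features_edx)
{
    if (has_cpuid && (features_edx & FEATURE_TSC))
        return SPEED_BY_RDTSC;
    return SPEED_BY_BSF_LOOP;
}
```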
Currently the Time Stamp read on a Pentium processor is a Ring 0 operation. In Windows, applications usually run in Ring 3; however, this operation can be emulated. On a Pentium Pro processor, the Time Stamp read opcode is a valid Ring 3 operation. Thus the Time Stamp Counter can be used from Windows on both the Pentium processor and the Pentium Pro processor. In DOS, many memory managers cause applications to run in Ring 3 with no method for emulating a Ring 0 opcode, so the DOS version of this code can use the Time Stamp Counter only on the Pentium Pro processor or above.
The first algorithm, used on systems that do not support the Time Stamp Counter, times a known test instruction sequence as it runs on the processor. In the 16-bit DLL and in the DOS code, the timing is performed with an external hardware timer built into the PC chipset. In the 32-bit code, timing is performed by calls to the Win32 API function QueryPerformanceCounter, which reads the high-resolution performance counter. The product of the timer frequency and the number of machine cycles executed, divided by the number of timer ticks, yields the processor frequency (see Equation 1). This method has a few drawbacks:
1. The external time source is slow compared to the processor clock. The PC chipset timer runs at approximately 1.19 MHz, so the clock of a 90 MHz Pentium(R) processor is roughly 75 times faster than the timer.
2. If the timing task is pre-empted by the operating system's scheduler, the timing results will be inaccurate, because the hardware timer keeps ticking while execution of the test instruction sequence is suspended.
For this reason, several samplings of the delay are taken, and only the shortest delay is used to determine the time needed to execute the test instruction sequence.
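A short worked example shows what the slow timer means in practice. The sketch below assumes the nominal 1,193,182 Hz input clock of the PC chipset timer and uses the loop described for TIMEPROC.ASM (4000 BSF instructions at about 43 cycles each); the function names are hypothetical.

```c
/* Nominal input clock of the PC chipset timer, in Hz (about 1.19 MHz). */
#define PC_TIMER_HZ 1193182.0

/* Timer ticks expected while `cycles` CPU clocks elapse at cpu_mhz. */
double expected_ticks(double cycles, double cpu_mhz)
{
    return cycles / (cpu_mhz * 1e6) * PC_TIMER_HZ;
}

/* One measurement: cycles executed, divided by ticks counted, times the
 * timer frequency. Returns the CPU frequency in MHz. */
double frequency_mhz(double cycles, double ticks)
{
    return cycles / ticks * PC_TIMER_HZ / 1e6;
}
```

With 4000 × 43 = 172,000 cycles at 90 MHz, expected_ticks() gives about 2280.3 ticks, so the measurement can only report 2280 or 2281 ticks: frequency_mhz() then returns about 90.01 MHz or 89.97 MHz. Each tick of error shifts the result by roughly 0.04 MHz at this loop length; a shorter loop would make that step proportionally larger.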
The second method is used on systems that support reading the Time Stamp register. This algorithm uses the RDTSC instruction to count the number of CPU clock cycles over a time period. The external time is measured by calls to timeGetTime in the 16-bit code, by directly reading the PC chipset counter in the DOS code, and by QueryPerformanceCounter in the 32-bit code. First, an initial external time is recorded. The code then takes further time readings until a predetermined number of ticks has passed since the first read. This ensures that the starting reading is taken close to a tick transition rather than somewhere between two ticks. Next, the Time Stamp register, which counts clock cycles since the system was powered on, is read. The code then reads the external timer (using timeGetTime or QueryPerformanceCounter, depending on which DLL is used, or the PC chipset timer in the DOS code) until a predefined number of ticks has passed. After the final tick of the test is read, the Time Stamp register is read one last time. From these reads, the code finds the number of clock cycles and the number of external ticks between the beginning and end of the test. The ratio of the difference between the two RDTSC samples to the difference between the two external timer samples, multiplied by the external timer frequency, gives the CPU clock frequency (Equation 1).
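$$
f_{\text{CPU}} = \frac{\text{cycles}}{\text{ticks}} \times f_{\text{timer}}
$$
where:
cycles = difference between the two Time Stamp reads, or the number of clock cycles consumed by the BSF instruction sequence
ticks = difference between the two external timer reads (timeGetTime, QueryPerformanceCounter, or the PC chipset timer)
f_timer = frequency of the external timer
Equation 1: Calculating CPU Frequency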
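For the Time Stamp method, Equation 1 can be sketched as below. The function name is hypothetical. timeGetTime counts milliseconds, so its timer frequency is 1000 Hz; for QueryPerformanceCounter the frequency is the value reported by QueryPerformanceFrequency.

```c
#include <stdint.h>

/* Equation 1 for the Time Stamp method.
 * tsc_start/tsc_end:     RDTSC readings at the start and end of the test
 * timer_start/timer_end: external timer readings taken at the same points
 * timer_hz:              external timer frequency (1000.0 for timeGetTime)
 * Returns the CPU clock frequency in MHz. */
double tsc_frequency_mhz(uint64_t tsc_start, uint64_t tsc_end,
                         uint64_t timer_start, uint64_t timer_end,
                         double timer_hz)
{
    uint64_t cycles = tsc_end - tsc_start;
    uint64_t ticks  = timer_end - timer_start;
    return (double)cycles / (double)ticks * timer_hz / 1e6;
}
```

For instance, 7,980,000 cycles counted over 60 timeGetTime ticks (60 ms) corresponds to 133 MHz.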
This method has the advantages that it is very accurate and that its critical sections are very short, so they are unlikely to be interrupted by the operating system's CPU scheduler. However, because rescheduling can still occur during these critical periods, the sampling is repeated several times and the results are compared. If the values are not close to one another, another sampling is made and the oldest value is discarded. This guards against returning information corrupted by rescheduling.
In addition to deriving accurate raw frequencies, code to normalize the raw processor speed to the nearest known value is also included. The DLL returns both the raw and the normalized processor speed as well as the internal clock cycles and the number of microseconds required to process those cycles. The DOS code simply prints both the raw and normalized clock frequency to the screen.
The behavior of this code can be tuned by changing the loop counts, tolerances, and test durations in the header files (SPEED.H for the Windows version and CPUSPEED.H for the DOS version). To show the precision of the values returned, the test programs provided can run many iterations at a time (1000 iterations in the 32-bit code, 100 in the 16-bit code) and summarize the raw values statistically. Depending on the system being tested, this can take a while, so be patient. The average time per test is reported after the 1000 (or 100) iterations have run. If any raw frequency differs from the normalized value by more than 2 MHz, the raw frequencies are printed to the screen at the end of the test.
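A summary of the kind the test programs report can be sketched as follows: minimum, maximum, mean, and the number of raw values farther than a tolerance (2 MHz above) from the normalized value. The SpeedStats structure and summarize() are hypothetical, not the test programs' code.

```c
#include <stddef.h>

/* Summary of repeated raw speed measurements, in MHz. */
typedef struct {
    double min, max, mean;
    size_t outliers;   /* raw values more than tol_mhz from normalized */
} SpeedStats;

/* Assumes n > 0. */
SpeedStats summarize(const double *raw_mhz, size_t n,
                     double normalized_mhz, double tol_mhz)
{
    SpeedStats s = { raw_mhz[0], raw_mhz[0], 0.0, 0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = raw_mhz[i];
        double dev = v > normalized_mhz ? v - normalized_mhz
                                        : normalized_mhz - v;
        if (v < s.min) s.min = v;
        if (v > s.max) s.max = v;
        if (dev > tol_mhz) s.outliers++;
        sum += v;
    }
    s.mean = sum / (double)n;
    return s;
}
```

A nonzero outliers count is the condition under which the raw frequencies would be listed for inspection.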
Table 1: Algorithm Matrix
The CPUINFO code is a mixture of C and 16-bit assembly code.
The DLL source, targeted at Microsoft Visual C++* 1.5 and 2.2, is included so that it can be embedded if desired.
For the 16-bit DLL, the main modules are:
CPUID.C
TIMEPROC.ASM
SPEED.C
CPUID.H
SPEED.H
This code was compiled in Microsoft Visual C++ 1.5 using the 'minimize size' optimization setting. If other settings or compilers are used, the BSF instruction sequence may take more or fewer cycles than the value coded into SPEED.C (the variable that defines the number of cycles per BSF is processor_cycles). If you recompile this DLL, it is suggested that you use the original settings; otherwise, the number of cycles may have to be adjusted if extreme accuracy is needed.
Figure 1: CPUINF16.DLL Code Structure
This module implements the wincpuid(), wincpuidext(), and wincpufeatures() functions using the CPUID instruction as explained earlier. If the host processor does not support the CPUID opcode, the processor type is determined by reading bits in the FLAGS and EFLAGS registers.
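The EFLAGS-based classification can be sketched using the standard technique from AP-485: the AC flag (bit 18) cannot be changed on an Intel386 processor, and the ID flag (bit 21) can be changed only on processors that support CPUID. Reading and writing EFLAGS requires assembly (PUSHFD/POPFD) and is not shown; the sketch takes the value read back before and after an attempt to invert both bits. The names are hypothetical and may not match CPUID.C.

```c
/* EFLAGS bits used to identify processors that lack CPUID. */
#define EFLAGS_AC (1ul << 18)  /* alignment check: not writable on Intel386 */
#define EFLAGS_ID (1ul << 21)  /* ID flag: writable only if CPUID exists */

typedef enum { CPU_386, CPU_486_NO_CPUID, CPU_USE_CPUID } CpuClass;

/* original: EFLAGS as first read.
 * flipped:  EFLAGS read back after trying to invert the AC and ID bits.
 * A bit that changed is writable on this processor. */
CpuClass classify_from_eflags(unsigned long original, unsigned long flipped)
{
    unsigned long changed = original ^ flipped;
    if (!(changed & EFLAGS_AC))
        return CPU_386;            /* AC stuck: Intel386 */
    if (!(changed & EFLAGS_ID))
        return CPU_486_NO_CPUID;   /* AC writable, ID stuck: Intel486 */
    return CPU_USE_CPUID;          /* ID writable: execute CPUID */
}
```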
The function wincpuidsupport() returns whether the host processor supports the CPUID opcode.
The function wincpuid() attempts to determine whether the host processor is a Genuine Intel processor or an Intel imitator. On systems that support CPUID, this is done by reading the vendor ID string, which contains the vendor name. On systems that do not support CPUID, the state of the carry flag after a divide is used instead, but this determination is not perfect. All Intel processors that do not support CPUID clear the carry flag after a division, whereas most Intel imitators do not update the carry flag during a divide. This test can therefore verify that an imitator processor is in use, but it cannot confirm that a Genuine Intel processor is in use, because some Intel imitators clear the carry flag in the same manner as Genuine Intel processors.
If either of the above methods verifies that the host processor is an Intel imitator, wincpuid() ORs the returned processor family with CLONE_MASK (initially set to 0x8000). With this initial setting, the uppermost bit of the processor family is set to one when the host processor is a verified Intel imitator. Unless the CPUID instruction was used, a zero in this bit does not necessarily mean that a Genuine Intel processor is the host processor because, as explained above, the carry-after-divide method cannot distinguish Genuine Intel processors from some Intel imitators (e.g., AMD).
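How a caller might test and strip the flag is sketched below, assuming the family is carried in a 16-bit value (consistent with CLONE_MASK being 0x8000). The helper functions are hypothetical, not DLL exports.

```c
/* CLONE_MASK as described above (initially 0x8000). */
#define CLONE_MASK 0x8000u

/* Flag a family value as a verified Intel imitator. */
unsigned short mark_clone(unsigned short family)
{
    return (unsigned short)(family | CLONE_MASK);
}

/* Nonzero only for a verified imitator. Zero does not prove a Genuine
 * Intel processor unless the CPUID instruction was used. */
int is_verified_clone(unsigned short family)
{
    return (family & CLONE_MASK) != 0;
}

/* Recover the plain family number (3, 4, 5, ...). */
unsigned short family_number(unsigned short family)
{
    return (unsigned short)(family & ~CLONE_MASK);
}
```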
The wincpuidext() and wincpufeatures() functions also behave differently if the CPUID opcode is not supported. wincpuidext() returns only the processor family; the type, model, and stepping are all zero. The feature flags returned by wincpufeatures() are all set to 0.
This module also contains the winrdtsc() function, which executes the RDTSC instruction and returns the result to the caller in a structure of two DWORD variables.
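RDTSC places the low 32 bits of the counter in EAX and the high 32 bits in EDX. A caller holding the two DWORDs can combine them into one 64-bit count, as in this sketch; the TimeStamp structure, its field order, and the helper names are assumptions, since the DLL's actual structure is not given here.

```c
#include <stdint.h>

/* Hypothetical two-DWORD layout: low and high halves of the counter. */
typedef struct {
    uint32_t low;   /* from EAX */
    uint32_t high;  /* from EDX */
} TimeStamp;

uint64_t timestamp_to_u64(TimeStamp t)
{
    return ((uint64_t)t.high << 32) | t.low;
}

/* Clock cycles elapsed between two readings. Combining the halves first
 * handles a carry from the low DWORD into the high DWORD. */
uint64_t cycles_between(TimeStamp start, TimeStamp end)
{
    return timestamp_to_u64(end) - timestamp_to_u64(start);
}
```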
This assembly language module is assembled with MASM 6.1, and the resulting object file is linked into the Microsoft Visual C++ 1.5 project. It is called by the SPEED.C module and contains the function Time_Processor_bsf(), which times a loop of 4000 BSF (Bit Scan Forward) instructions (the number of iterations, ITERATIONS, can be set in SPEED.H). BSF operating on the EAX register was chosen because it does not pair on the Pentium processor and has a consistently long cycle count (approximately 43 cycles per iteration) that is unaffected by memory accesses other than instruction fetch. The timing is done with the PC chipset timer, and the function returns the number of timer ticks that elapse during the sequence of BSF instructions.
This module performs the processor speed computation and normalization. The first task is to determine the features of the host processor. If the host processor does not support Time Stamp register reads, the code executes Time_Processor_bsf(). Based on the processor family (retrieved from wincpuid()), the code determines the number of machine cycles that the BSF instruction sequence should consume. This number is the product of the number of BSF instructions that Time_Processor_bsf() executes and the number of clock cycles per BSF instruction. The default number of clock cycles per BSF instruction is stored in the processor_cycles array and can be overridden by passing the cycle count as an integer to cpuspeed() (e.g., cpuspeed(43) on a Pentium processor system).
The next task is to call Time_Processor_bsf() ten times (the number of calls is set by the constant SAMPLINGS in SPEED.H). The minimum run time of these ten samplings is used as the time needed to execute the instruction sequence. Because the operating system can interrupt the sequence and use the processor cycles for other work, and because other interruptions can occur, some measured times can be much longer than the sequence alone requires. This is why multiple samplings are run and only the shortest sampled value is used.
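The BSF path described in the last two paragraphs can be sketched as follows: compute the expected cycle count, keep the shortest of SAMPLINGS tick counts, and apply Equation 1. The ticks would come from a routine such as Time_Processor_bsf(); bsf_speed_mhz() itself is hypothetical, and cycles_per_bsf stands in for the processor_cycles entry or the value passed to cpuspeed().

```c
#include <stddef.h>

#define ITERATIONS 4000ul  /* BSF instructions per timed loop (SPEED.H) */
#define SAMPLINGS  10      /* timed runs per measurement (SPEED.H) */

/* Pre-emption can only make a run look longer, never shorter, so the
 * shortest of the sampled tick counts is used. Returns MHz. */
double bsf_speed_mhz(const unsigned long ticks[SAMPLINGS],
                     unsigned long cycles_per_bsf, double timer_hz)
{
    unsigned long shortest = ticks[0];
    for (size_t i = 1; i < SAMPLINGS; i++)
        if (ticks[i] < shortest)
            shortest = ticks[i];

    /* Machine cycles the loop should consume. */
    double cycles = (double)ITERATIONS * (double)cycles_per_bsf;
    return cycles / (double)shortest * timer_hz / 1e6;
}
```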
If the host processor supports Time Stamp register reads, CPUINF16 first calls timeGetTime. The code then loops tightly, querying timeGetTime until 3 ticks have passed since that first query (the constant is INITIAL_DELAY in SPEED.H). This ensures that the Time Stamp is read at approximately the same point relative to a counter transition at both the start and the end of the test. After the third tick is read, the Time Stamp register is read. The last timeGetTime reading becomes the initial external time for the sampling, and the Time Stamp value becomes the initial clock cycle count. The code again loops, querying timeGetTime until 60 ticks have passed (the constant is SAMPLING_DELAY in SPEED.H), and then makes one last Time Stamp reading. The processor speed can now be computed from the number of ticks (which can be converted to microseconds of operation) and the number of clock cycles executed. This algorithm is less prone to error, since the critical stage in which the OS can interrupt (between a timeGetTime call and the corresponding Time Stamp read) is small. However, the code can still be interrupted at these points, and Windows NT emulates the timeGetTime call, which makes it somewhat less predictable. To compensate for these possible losses of accuracy, at least three samplings are taken and compared. If all three values are within 1 MHz of their average, the average processor speed is returned. If any of them deviates from the average by more than 1 MHz, a new sample is taken and the earliest sample is discarded. This ensures that at least three consecutive consistent values are measured before a value is returned. To prevent a possible infinite loop, the maximum number of samplings is set to 20 (the constant is MAX_TRIES in SPEED.H).
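The three-sample agreement rule can be sketched as a sliding window. In the DLL each sample is measured on demand; here the successive raw measurements are passed as an array so the logic can be checked in isolation. consistent_speed() is hypothetical, and returning 0.0 when MAX_TRIES samples never agree is an assumption, since the behavior in that case is not described above.

```c
#include <stddef.h>

#define WINDOW    3    /* consecutive samples that must agree */
#define TOL_MHZ   1.0  /* allowed deviation from the window average */
#define MAX_TRIES 20   /* upper bound on samplings, as in SPEED.H */

/* samples[]: successive raw measurements in MHz, oldest first. */
double consistent_speed(const double *samples, size_t n)
{
    double win[WINDOW] = { 0.0 };
    size_t limit = n < MAX_TRIES ? n : MAX_TRIES;

    for (size_t k = 0; k < limit; k++) {
        /* Discard the oldest sample and append the newest one. */
        for (int i = 0; i < WINDOW - 1; i++)
            win[i] = win[i + 1];
        win[WINDOW - 1] = samples[k];
        if (k + 1 < WINDOW)
            continue;                      /* window not yet full */

        double avg = 0.0;
        for (int i = 0; i < WINDOW; i++)
            avg += win[i];
        avg /= WINDOW;

        int consistent = 1;
        for (int i = 0; i < WINDOW; i++) {
            double dev = win[i] - avg;
            if (dev > TOL_MHZ || dev < -TOL_MHZ)
                consistent = 0;
        }
        if (consistent)
            return avg;
    }
    return 0.0;   /* no agreement within MAX_TRIES samples */
}
```

With the samples 90.4, 95.0, 89.9, 90.1, 90.3, the first two windows are rejected because of the 95.0 reading, and the third window returns about 90.1 MHz.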
The last task is to normalize the clock frequency to one of the known frequency values for the detected family of Intel processors. This is done by the NormFreq() function. For example, a computed frequency of 118 MHz on a Pentium processor is normalized to 120 MHz, since there is no 118 MHz Pentium processor. The routine scans the list of possible normalized speeds and checks whether the raw frequency is less than or equal to the current normalized value plus a tolerance constant; if not, it moves on to the next higher normalized value. The tolerance is currently 5 MHz on Pentium processors and above (TOLP5 in SPEED.H), 4 MHz on the Intel486 processor (TOL486 in SPEED.H), and 2 MHz on the Intel386 processor (TOL386 in SPEED.H). Different constants are used because the closest speed steps differ: for the Intel386 processor they are 16 MHz and 20 MHz, whereas for the Pentium processor (disregarding 66 MHz) they are 90 MHz and 100 MHz.
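The scan can be sketched in C as follows. The tolerance constants follow the values above; the list of Pentium processor speeds is illustrative and may differ from the table compiled into the DLL, and returning the raw value when it exceeds every entry is an assumption of the sketch.

```c
#include <stddef.h>

/* Tolerances in MHz, as described for SPEED.H. */
#define TOLP5  5   /* Pentium processor and above */
#define TOL486 4   /* Intel486 processor */
#define TOL386 2   /* Intel386 processor */

/* Illustrative Pentium processor speeds, in ascending order. */
static const int pentium_mhz[] = { 60, 66, 75, 90, 100, 120, 133, 150, 166, 200 };

/* Return the first listed speed s with raw_mhz <= s + tol. */
int norm_freq(int raw_mhz, const int *speeds, size_t count, int tol)
{
    for (size_t i = 0; i < count; i++)
        if (raw_mhz <= speeds[i] + tol)
            return speeds[i];
    return raw_mhz;   /* faster than every listed speed */
}
```

With TOLP5, a raw 118 MHz reading normalizes to 120 MHz, matching the example above. A raw reading near 64 MHz would normalize to 60 MHz, which illustrates why the 66 MHz step is treated separately.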
For the 32-bit DLL, the main modules are:
CPUID.C
SPEED.C
CPUID.H
SPEED.H
This code was compiled in Microsoft Visual C++ 2.2 using the 'minimize size' optimization setting. As before, if other settings or compilers are used, the BSF instruction sequence may take more or fewer cycles than the value coded into SPEED.C (the variable that defines the number of cycles per BSF is processor_cycles). If you recompile this DLL, it is suggested that you use the original settings; otherwise, the number of cycles may have to be adjusted if extreme accuracy is needed.
Figure 2: CPUINF32.DLL Code Structure
See the explanation of the CPUID.C module for the 16-bit DLL.
The algorithms in this code are identical to those in the 16-bit version, except that all external time reads are done with the QueryPerformanceCounter call instead of directly reading the PC chipset timer or calling timeGetTime(). This provides higher precision. In addition, the Time_Processor_bsf() code is now inline assembly instead of a separate module assembled by MASM and linked in.
The TSTDLL32.EXE code has the following modules:
TSTDLL32.C
TSTDLL32.H
This module is a simple example and test program for the CPUINFO DLL. It provides a simple Windows menu interface (Figure 4) for each of the CPUINFO DLL functions. For each function called through the user interface, the test program displays a dialogue box with the returned information (Figure 5). As noted above, the CPU Speed function can also run multiple queries of cpuspeed() (1000 in the 32-bit version and 100 in the 16-bit version) and report the results statistically; a sample screen shot of this statistical information is shown in Figure 6. TSTDLL32 can also detect multiple processors with the GetProcessorCount() function.
Figure 4: TSTDLL32 Menu (Screen Shot)
Figure 5: Sample dialogue box returned by the test programs (Screen Shot)
Figure 6: Sample Statistical Information returned by cpuspeed() routine (Screen Shot)
Figure 7: Sample CPU CMOS information returned by cmosspeed() routine (Screen Shot)
The TSTDLL16.EXE code has the following modules:
TSTDLL16.C
TSTDLL16.H
This module is the same as the 32-bit version, TSTDLL32.C.
Information in this document is provided in connection with Intel products. Intel assumes no liability whatsoever, including infringement of any patent or copyright, for sale and use of Intel products except as provided in Intel's Terms and Conditions of Sale for such products.
Intel retains the right to make changes to these specifications at any time, without notice.
* Other brands and names are the property of their respective owners.
Copyright (c) 1995, Intel Corporation. All rights reserved.