Experiments with On-chip Monitoring in Pentium Processors
- J. Sosnowski, J. Nowicki
- Institute of Computer Science, Warsaw University of Technology,
- ul. Nowowiejska 15/19, Warsaw 00-665, Poland, Email: jss@ii.pw.edu.pl
Various processor monitoring techniques have been developed, mostly for
the purpose of on-line testing. The main idea of these techniques is to
add a small amount of hardware and software to continuously monitor the
operation of the processor. The needed hardware ranges from sophisticated
watchdog processors to simple signature monitors. During the compile time
the application program has to be modified (adapted to the monitor) or a
special watchdog program is generated. The redundant software is used by
the monitoring circuitry during program execution. Various schemes of this
approach have been described in the literature (e.g. [2-4] and the references
cited there). Unfortunately, these schemes are not found in commercially
available microprocessors.
Some recent processors provide hardware facilities for performance
monitoring, which can be helpful in testing. For example, the Pentium processor
has a processor clock counter and two counters that can be programmed
to count specific events [5]. Depending upon the selected event, the monitor
counts either the occurrences or the duration (in processor clock cycles) of
that event. The number of programmable events is in the range 40-80 (depending
upon the processor model). Monitoring events related to the caches, TLB (translation
lookaside buffer) and BTB (branch target buffer) significantly increases
test observability (which is limited in other processors). The monitored
events are useful as an additional signature (result) of executed test programs.
The available on-chip monitoring circuitry in Pentium is quite powerful,
so we have performed many experiments to check its usefulness in dependable
computing for two purposes: increasing system testability (observability)
and on-line testing of application program execution.
Below we present some results related to the following events
(the decimal encoding of each programmed event [5] is given in brackets):
(00) data read, (01) data write, (02) data TLB miss, (03) data read
miss, (04) data write miss, (06) data cache lines written back, (09) memory
access in both pipes, (10) bank conflicts, (12) code read, (13) code TLB
miss, (14) code cache miss, (19) BTB hits, (21) pipeline flushes, (22)
instructions executed, (23) instructions executed in the V pipe, (50) taken
branches, (53) pipeline flushes due to wrong branch prediction.
For all of these events the on-chip monitor can count the number of
occurrences. However, in one experiment we can monitor only two events (counters
CTR0 and CTR1) and count clock cycles (counter TSC).
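The two-events-per-experiment limit can be illustrated with a small sketch that packs two of the event codes listed above into a single control word, one half for CTR0 and one for CTR1. The bit layout used here (a 6-bit event select field followed by a 3-bit counter control field, with the CTR1 fields in the upper half-word) is an assumption modelled on the Pentium event select register and should be checked against [5]; the names `EVENTS`, `pack_control_word` and the default control value are hypothetical and are not taken from the authors' driver.

```python
# Decimal event codes as listed in the text (labels abbreviated).
EVENTS = {
    0: "data read", 1: "data write", 2: "data TLB miss",
    3: "data read miss", 4: "data write miss",
    6: "data cache lines written back", 9: "memory access in both pipes",
    10: "bank conflicts", 12: "code read", 13: "code TLB miss",
    14: "code cache miss", 19: "BTB hits", 21: "pipeline flushes",
    22: "instructions executed", 23: "instructions executed in the V pipe",
    50: "taken branches", 53: "pipeline flushes due to wrong branch prediction",
}

# Assumed field layout of the control word (not taken from the paper).
CC_SHIFT = 6              # counter control field follows the 6-bit event select
CTR1_SHIFT = 16           # CTR1 fields sit in the upper half-word
CC_COUNT_EVENTS = 0b011   # assumed: count occurrences at every privilege level


def pack_control_word(event0, event1, cc=CC_COUNT_EVENTS):
    """Return one control word selecting event0 for CTR0 and event1 for CTR1."""
    for event in (event0, event1):
        if event not in EVENTS:
            raise ValueError(f"unknown event code: {event}")
    half0 = event0 | (cc << CC_SHIFT)
    half1 = event1 | (cc << CC_SHIFT)
    return half0 | (half1 << CTR1_SHIFT)


# Example: instructions executed (22) on CTR0, V-pipe instructions (23) on CTR1.
word = pack_control_word(22, 23)
print(f"0x{word:08X}")
```

On real hardware the resulting word would have to be written by privileged code (which is why a CPL=0 driver is needed); the sketch only shows how the two event selections share one register.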
To supervise the monitoring experiments we have developed a special program
module. It operates in the Windows 95 and NT environments. Access to the monitoring
facilities is provided by a specially designed I/O driver (MD) that cooperates
with the operating system in protected mode at CPL=0. The driver cooperates
with the experiment supervising program (ESP). For each experiment we have
to specify the monitored events. The monitored program should include counter
reset and counter read instructions at its beginning and end. To collect
results for many events, the ESP has to start the monitored program
many times. Moreover, we allow repetitions of the same experiment to check
whether the monitoring results are stable. For each of these repetitions the ESP
ensures the same initial state for most system resources.
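The supervising loop described above can be sketched as follows. This is only an illustration of the scheduling idea, not the authors' ESP: `plan_runs`, `collect` and the `run_program` callback are hypothetical names, and `run_program` stands in for whatever mechanism resets the counters, executes the monitored program and reads back the clock cycle count and the two event counters.

```python
def plan_runs(events, counters_per_run=2):
    """Split the event list into groups that fit the available counters."""
    events = list(events)
    return [tuple(events[i:i + counters_per_run])
            for i in range(0, len(events), counters_per_run)]


def collect(events, run_program, repetitions=1):
    """Run the monitored program once per event group and repetition.

    run_program(group) is a hypothetical callback that returns
    (clock_cycles, counts), with one count per event in the group.
    """
    results = {event: [] for event in events}
    cycles = []
    for group in plan_runs(events):
        for _ in range(repetitions):
            tsc, counts = run_program(group)
            cycles.append(tsc)
            for event, value in zip(group, counts):
                results[event].append(value)
    return results, cycles


# The 17 events listed in the text need 9 runs of the monitored program.
EVENT_CODES = [0, 1, 2, 3, 4, 6, 9, 10, 12, 13, 14, 19, 21, 22, 23, 50, 53]
print(len(plan_runs(EVENT_CODES)))
```

With 20 repetitions per group, as used in the experiments reported later, the same plan amounts to 180 executions of the monitored program.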
During the experiments we used three classes of programs: test programs,
benchmark programs and some typical application programs. The first two
classes of programs have either no data or fixed constant data (parameters).
So each execution of such programs performs the same sequence of instructions.
In typical application programs the executed sequence of instructions depends
upon input data. For each tested program we performed many monitoring experiments
and collected the statistics of monitored events. By analyzing these statistics
we can evaluate the usefulness of monitoring various events. The most interesting
results relate to experiments with test procedures and performance benchmarks.
A sample of the results (for three test programs and one benchmark)
is given in Table 1. The upper part of the table shows monitoring results
for three programs: P1 - test procedure for the adder in pipes U and V of
the processor, P2 - test procedure for the multiplier and divider, P3 - on-chip
cache test. All the monitored programs were executed 20 times to check the
stability of the monitored parameters. For the monitored programs with fixed
control flow (test procedures, benchmarks) high stability of most monitored
events was observed. Practically only the number of counted clock cycles
fluctuated, especially for programs with many memory references (due to
refresh cycles and asynchronous cooperation of CPU with the system bus).
For programs P1, P2 and P3 this fluctuation was within 2.5%, 0.0005%
and 0.0003%, respectively. For application programs the monitored parameters
depended upon the set of arguments.
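A stability check over repeated runs can be sketched as below. The text does not state how the fluctuation figures were computed, so the definition used here (spread between the largest and smallest sample, relative to the mean, in percent) is an assumption, and the function names are hypothetical.

```python
def fluctuation_percent(samples):
    """Spread of the samples relative to their mean, in percent (assumed metric)."""
    if not samples:
        raise ValueError("at least one sample is required")
    mean = sum(samples) / len(samples)
    if mean == 0:
        return 0.0  # all counts are zero, so there is no spread
    return 100.0 * (max(samples) - min(samples)) / mean


def is_stable(samples):
    """True when every repetition produced exactly the same count."""
    return max(samples) == min(samples)


# Made-up clock cycle counts from three repetitions, for illustration only.
print(fluctuation_percent([98, 100, 102]))
print(is_stable([14863, 14863, 14863]))
```

Event counters of a program with fixed control flow would be expected to pass `is_stable`, while the clock cycle counter would typically show a small non-zero fluctuation.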
The bottom part of Table 1 shows some results for the Whetstone benchmark.
The first row gives the benchmark result (in Whetstones) for 4 system configurations:
C1) all performance-enhancing functional modules enabled (pipe V
of the ALU, BTB, caches L1 and L2), C2) caches switched off, C3) pipe V
switched off, C4) BTB switched off. The second row gives the values of some selected
monitored parameters for configuration C1. The results show the high sensitivity
of benchmarks to various processor blocks. Hence they can be used as supplementary
test procedures (the test signature comprises benchmark results and some monitored
parameters). Similar results have been obtained for other benchmarks. Benchmarks
are very attractive because they comprise typical program structures,
so they test the system with realistic instruction sequences (not covered
by specialized deterministic and pseudorandom tests).
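The sensitivity of the benchmark to individual processor blocks can be read directly from the Whetstone figures in Table 1. The short calculation below expresses each configuration as a percentage drop relative to C1; the helper name `relative_drop` is hypothetical.

```python
# Whetstone results from Table 1 for the four system configurations.
WHETSTONE = {"C1": 130.6, "C2": 9.1, "C3": 119.5, "C4": 112.7}


def relative_drop(results, baseline="C1"):
    """Percentage drop of each configuration relative to the baseline."""
    base = results[baseline]
    return {cfg: 100.0 * (base - value) / base
            for cfg, value in results.items() if cfg != baseline}


for cfg, drop in relative_drop(WHETSTONE).items():
    print(f"{cfg}: {drop:.1f}% below C1")
```

Switching off the caches (C2) costs about 93% of the benchmark result, switching off pipe V (C3) about 8.5%, and switching off the BTB (C4) about 13.7%, so a fault that disables any of these blocks would show up clearly in the benchmark result.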
The monitored parameters can be considered as some additional signature
of the executed program. This is especially important for testing procedures.
In this case the additional signature increases test result reliability.
For some functional blocks, events directly related to the operation of these
blocks increase their observability. For the cache memory, BTB and TLB blocks
the monitoring capabilities allowed us to simplify test procedures significantly
and to increase error coverage (as compared with tests without monitoring).
In Pentium processors we can monitor simultaneously only processor cycles
and two other selected events. Hence an important issue is to select the
most characteristic parameters, to limit the number of performed experiments.
In the case of application programs, monitoring results which differ significantly
from average values may be treated as an indication of a fault. To make
the results more representative we can embed into the application programs
some messages which either justify the results (e.g. clock cycles [4]) or
specify their subclasses (depending on the executed program control flow).
Moreover, we found high stability in the correlation between some monitored events
(e.g. data reads and data read misses). The error detection capabilities of
on-chip monitoring are being checked with fault injection experiments. The
results obtained so far show a significant improvement as compared with the mechanisms
provided by the operating system.
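The idea of treating a large deviation from the average as a fault indication can be sketched as follows. The threshold of three standard deviations, the function names and the sample history are assumptions made for illustration; the text does not specify the decision rule that was used. The `miss_ratio` helper shows one way to express the correlation between data reads and data read misses as a single number.

```python
from statistics import mean, pstdev


def miss_ratio(reads, read_misses):
    """Ratio of data read misses to data reads (0.0 when nothing was read)."""
    return read_misses / reads if reads else 0.0


def is_suspicious(value, history, k=3.0):
    """Flag a value lying more than k standard deviations from the history mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu  # perfectly stable history: any change is suspicious
    return abs(value - mu) > k * sigma


# Made-up event counts from earlier fault-free runs of one program.
history = [100, 102, 98, 101, 99]
print(is_suspicious(101, history))
print(is_suspicious(120, history))

# Data read miss ratio of test program P1, using the counts from Table 1.
print(round(miss_ratio(29732, 1862), 4))
```

For application programs whose control flow depends on input data, a separate history per subclass of runs would be needed, in line with the embedded messages mentioned above.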
Analyzing the available monitored parameters, we noted the lack of a classical
signature monitor [3], which could be added to the CPU chip (e.g. monitoring
data transmitted to the register file). Such a monitor significantly simplifies
the checking of test results. Hence test programs could become more
competitive as compared with BIST. We have performed some experiments with a
commercially available monitor [1] and a model of the CPU; they showed a significant
improvement in the effectiveness (lower complexity, higher error coverage)
of the system test procedures.
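To show what such a signature monitor computes, the sketch below compacts a stream of data words into a 16-bit signature using a shift register with linear feedback. It is a generic software illustration, not a model of the monitor in [1] or of the scheme in [3]; the feedback polynomial, register width and function names are assumptions.

```python
def update_signature(signature, word, poly=0x1021, width=16):
    """Shift the signature register once and fold in one data word."""
    mask = (1 << width) - 1
    feedback = (signature >> (width - 1)) & 1
    signature = (signature << 1) & mask
    if feedback:
        signature ^= poly  # apply the (assumed) feedback polynomial
    return signature ^ (word & mask)


def signature_of(words, seed=0):
    """Compact a sequence of data words into a single signature."""
    signature = seed
    for word in words:
        signature = update_signature(signature, word)
    return signature


# Made-up data words standing in for values written to the register file.
reference = signature_of([0x1234, 0xABCD, 0x0F0F])
faulty = signature_of([0x1234, 0xABCD ^ 0x0004, 0x0F0F])  # one flipped bit
print(reference != faulty)
```

A test program then only has to compare one final signature with its reference value instead of checking every intermediate result.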
References
[1] P. Forstner, Digital bus monitor - SN74ACT8994, Application report
EB210E, TI, 1993.
[2] K. D. Wilken, T. Kong, Concurrent detection of software and hardware
data access faults, IEEE Trans. on Comp. vol.48, No.4, April 1997, pp.412-424.
[3] J. Sosnowski, Detection of control flow errors using signature and
checking instructions, Proc. of IEEE Test Conf., 1989, pp.81-88.
[4] J. Sosnowski, Concurrent checking of program flow using single chip
microcomputers, Microprocessing and Microprogramming, 29.1988, pp.783-789.
[5] Pentium processor developer's manual, Intel 1997.
Table 1. Sample of experiment results.
| Ev. | P1 | P2 | P3 |
|-----|----|----|----|
| 00 | 29,732 | 3,445,632 | 212,049 |
| 01 | 29,732 | 1,642,690 | 212,042 |
| 02 | 29 | 65 | 1 |
| 03 | 1,862 | 512 | 212,048 |
| 04 | 29,718 | 67,072 | 212,040 |
| 06 | 2 | 0 | 1 |
| 09 | 14,866 | 1,345,496 | 70,708 |
| 10 | 0 | 279,562 | 0 |
| 12 | 772,941 | 2,764,830 | 13,251,238 |
| 13 | 0 | 0 | 1 |
| 14 | 53 | 1,036 | 13,251,256 |
| 19 | 14,863 | 723,210 | 10,326,482 |
| 21 | 2 | 1,270 | 6,265 |
| 22 | 6,064,524 | 9,286,360 | 37,947,216 |
| 23 | 3,032,257 | 2,854,315 | 10,999,348 |
| 50 | 14,864 | 624,552 | 11,297,246 |
| 53 | 2 | 1,271 | 6,266 |

Whetstone benchmark results (in Whetstones): C1: 130.6; C2: 9.1; C3: 119.5; C4: 112.7

Monitored events for configuration C1 (event-count): 00-68,559,667; 01-68,857,047; 02-14; 03-192; 04-1,481; 06-78; 09-29,657,844; 10-30,000,016; 12-436,978,461; 13-57; 14-813; 19-14,343,714; 21-37,271; 22-177,398,057; 23-53,242,404; 53-37,224