Experiments with On-chip Monitoring in Pentium Processors

 

J. Sosnowski, J. Nowicki
Institute of Computer Science, Warsaw University of Technology,
ul. Nowowiejska 15/19, Warsaw 00-665, Poland, Email: jss@ii.pw.edu.pl

 

 

Various processor monitoring techniques have been developed, mostly for the purpose of on-line testing. The main idea of these techniques is to add a small amount of hardware and software that continuously monitors the operation of the processor. The required hardware ranges from sophisticated watchdog processors to simple signature monitors. At compile time the application program has to be modified (adapted to the monitor) or a special watchdog program has to be generated. This redundant software is used by the monitoring circuitry during program execution. Various schemes of this approach have been described in the literature (e.g. [2-4] and the references cited there). Unfortunately, these schemes are not found in commercially available microprocessors.

Some recent processors provide hardware facilities for performance monitoring, which can also be helpful in testing. For example, the Pentium processor has a processor clock counter and two counters that can be programmed to count specific events [5]. Depending on the selected event, the monitor counts either its occurrences or its duration (in processor clock cycles). The number of programmable events ranges from 40 to 80 (depending on the processor model). Monitoring events related to the caches, the TLB (translation lookaside buffer) and the BTB (branch target buffer) significantly increases test observability (which is limited in other processors). The monitored events are useful as an additional signature (result) of executed test programs.

The on-chip monitoring circuitry available in the Pentium is quite powerful, so we have performed many experiments to check its usefulness in dependable computing for two purposes: increasing system testability (observability) and on-line testing of application program execution.

Below we present some results related to the following events (the decimal code of each programmed event [5] is given in brackets):

(00) data read, (01) data write, (02) data TLB miss, (03) data read miss, (04) data write miss, (06) data cache lines written back, (09) memory access in both pipes, (10) bank conflicts, (12) code read, (13) code TLB miss, (14) code cache miss, (19) BTB hits, (21) pipeline flushes, (22) instructions executed, (23) instructions executed in the V pipe, (50) taken branches, (53) pipeline flushes due to wrong branch prediction.

For each of these events the on-chip monitor can count the occurrences. However, in a single experiment we can monitor only two events (counters CTR0 and CTR1) and count clock cycles (counter TSC).
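
As an illustration of how two events are selected for one experiment, the sketch below packs two event codes into a single control word. The field layout (a 6-bit event select, a 3-bit counter control and one pin-control bit per counter, with the fields of the second counter starting at bit 16) is an assumption based on the control and event select register described in [5] and should be verified against the manual for the processor model at hand. The function and constant names are hypothetical.

```python
# Sketch: pack two event selections into one control word.
# Assumed layout (verify against [5]): per counter, bits 0-5 select the event,
# bits 6-8 control counting, bit 9 controls the external pin; the fields of
# the second counter start at bit 16.

COUNT_CPL_0_2 = 0b001   # assumed: count while CPL = 0, 1 or 2
COUNT_CPL_3 = 0b010     # assumed: count while CPL = 3
COUNT_CLOCKS = 0b100    # assumed: count duration in clocks, not occurrences


def counter_field(event, control):
    """Return the field for one counter (pin-control bit left at 0)."""
    if not 0 <= event < 64:
        raise ValueError("event code must fit in 6 bits")
    if not 0 <= control < 8:
        raise ValueError("counter control must fit in 3 bits")
    return event | (control << 6)


def control_word(event0, event1, control=COUNT_CPL_0_2 | COUNT_CPL_3):
    """Combine the fields for CTR0 (low half) and CTR1 (high half)."""
    return counter_field(event0, control) | (counter_field(event1, control) << 16)


# Events (00) data read and (03) data read miss, counted at every privilege level.
word = control_word(0, 3)
print(hex(word))
```

In the described setup such a word would be written by the driver running at CPL=0; the sketch only computes the value and does not access any hardware.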

To supervise the monitoring experiments we have developed a special program module. It operates in the Windows 95 and NT environments. Access to the monitoring facilities is provided by a specially designed I/O driver (MD) that cooperates with the operating system in protected mode at CPL=0. The driver cooperates with the experiment supervising program (ESP). For each experiment we have to specify the monitored events. The monitored program should include counter reset instructions at its beginning and counter read instructions at its end. To collect results for many events, the ESP has to launch the monitored program many times. Moreover, we allow repetitions of the same experiment to check whether the monitoring results are stable. For each of these repetitions the ESP ensures the same initial state of most system resources.
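
Because only two events fit into one run, covering the 17 events listed above requires several runs of the monitored program. The sketch below shows one simple way to plan them. The helper names are hypothetical, and pairing the events in list order is an arbitrary choice, not a description of how the ESP actually schedules experiments.

```python
# Sketch: split the event codes into runs of at most two events,
# since each run can use only CTR0 and CTR1 (TSC counts clocks in every run).

EVENTS = [0, 1, 2, 3, 4, 6, 9, 10, 12, 13, 14, 19, 21, 22, 23, 50, 53]


def plan_runs(events, counters_per_run=2):
    """Return a list of tuples, one tuple of event codes per run."""
    return [tuple(events[i:i + counters_per_run])
            for i in range(0, len(events), counters_per_run)]


def total_executions(events, repetitions=1, counters_per_run=2):
    """Number of times the monitored program has to be started."""
    return len(plan_runs(events, counters_per_run)) * repetitions


runs = plan_runs(EVENTS)
print(len(runs), "runs per pass:", runs)
print("executions if every run is repeated 20 times:",
      total_executions(EVENTS, repetitions=20))
```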

During the experiments we used three classes of programs: test programs, benchmark programs and some typical application programs. Programs of the first two classes have either no data or fixed constant data (parameters), so each execution of such a program performs the same sequence of instructions. In typical application programs the executed sequence of instructions depends on the input data. For each tested program we performed many monitoring experiments and collected statistics of the monitored events. By analyzing these statistics we can evaluate the usefulness of monitoring various events. The most interesting results relate to experiments with test procedures and performance benchmarks.

A sample of the results (for three test programs and one benchmark) is given in Table 1. The upper part of the table shows monitoring results for three programs: P1 - test procedure for the adder in pipes U and V of the processor, P2 - test procedure for the multiplier and divider, P3 - on-chip cache test. All the monitored programs were executed 20 times to check the stability of the monitored parameters. For the monitored programs with fixed control flow (test procedures, benchmarks) high stability of most monitored events was observed. In practice only the number of counted clock cycles fluctuated, especially for programs with many memory references (due to refresh cycles and the asynchronous cooperation of the CPU with the system bus). For programs P1, P2 and P3 this fluctuation was within 2.5%, 0.0005% and 0.0003%, respectively. For application programs the monitored parameters depended on the set of arguments.
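
The stability check over repeated runs can be sketched as follows. The text does not state how the fluctuation was computed; the sketch assumes the spread (max - min) relative to the mean, and the readings used in it are made up for illustration, not measured data.

```python
from statistics import mean


def relative_fluctuation(samples):
    """Spread of the samples, (max - min) / mean, in percent (assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    m = mean(samples)
    if m == 0:
        return 0.0 if max(samples) == min(samples) else float("inf")
    return 100.0 * (max(samples) - min(samples)) / m


def is_stable(samples, tolerance_percent=0.0):
    """True if the samples fluctuate by no more than the given tolerance."""
    return relative_fluctuation(samples) <= tolerance_percent


# Made-up readings for illustration only.
event_counts = [29732] * 20                          # identical in all 20 runs
clock_cycles = [1000000, 1010000, 990000, 1000000]   # varies from run to run

print("event count stable:", is_stable(event_counts))
print("clock cycle fluctuation [%]:", relative_fluctuation(clock_cycles))
```

With a zero tolerance an event counter passes only if all repetitions agree exactly, which matches the behaviour expected from programs with fixed control flow; for the clock counter a non-zero tolerance has to be chosen.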

The bottom part of Table 1 shows some results for the Whetstone benchmark. The first row gives the benchmark result (in Whetstones) for four system configurations: C1) all performance-enhancing functional modules enabled (pipe V of the ALU, BTB, L1 and L2 caches), C2) caches switched off, C3) pipe V switched off, C4) BTB switched off. The second row gives the values of selected monitored parameters for configuration C1. The results show that benchmarks are highly sensitive to the state of various processor blocks. Hence they can be used as supplementary test procedures (the test signature then comprises the benchmark result and some monitored parameters). Similar results have been obtained for other benchmarks. Benchmarks are very attractive because they comprise typical program structures, so they test the system with realistic instruction sequences (not covered by specialized deterministic and pseudorandom tests).
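
The sensitivity can be quantified directly from the table values. The sketch below computes how far each degraded configuration falls below C1 and, as one example of a derived parameter, the share of instructions executed in the V pipe (event 23 relative to event 22 for C1). The helper names are hypothetical; the numbers are those reported in the table.

```python
# Whetstone results per configuration, taken from the table.
WHETSTONE = {"C1": 130.6, "C2": 9.1, "C3": 119.5, "C4": 112.7}

# Selected event counts for configuration C1, taken from the table.
INSTRUCTIONS = 177398057         # event 22
INSTRUCTIONS_V_PIPE = 53242404   # event 23


def relative_drop(results, baseline="C1"):
    """Percentage by which each configuration falls below the baseline."""
    base = results[baseline]
    return {cfg: 100.0 * (base - value) / base
            for cfg, value in results.items() if cfg != baseline}


drops = relative_drop(WHETSTONE)
for cfg, drop in drops.items():
    print(f"{cfg}: {drop:.1f}% below C1")

v_pipe_share = INSTRUCTIONS_V_PIPE / INSTRUCTIONS
print(f"share of instructions executed in pipe V: {v_pipe_share:.3f}")
```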

The monitored parameters can be considered an additional signature of the executed program. This is especially important for test procedures, where the additional signature increases the reliability of the test result. For some functional blocks, events directly related to the operation of these blocks increase their observability. For the cache memory, BTB and TLB blocks the monitoring capabilities allowed us to simplify the test procedures significantly and to increase error coverage (as compared with tests without monitoring).
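
A minimal sketch of using the monitored parameters as an additional signature is given below. It assumes that event counts of a test procedure must match the reference exactly, while the clock count is accepted within a relative tolerance because it fluctuates between runs. The event counts are the P1 values for events 00 and 03 from the table; the clock count, the tolerance and all names are hypothetical.

```python
def signature_matches(reference, observed, cycle_tolerance=0.03):
    """Compare event counts exactly and the clock count within a tolerance."""
    for key, expected in reference.items():
        if key not in observed:
            return False
        actual = observed[key]
        if key == "TSC":
            # Clock cycles fluctuate, so allow a relative deviation.
            if abs(actual - expected) > cycle_tolerance * expected:
                return False
        elif actual != expected:
            return False
    return True


# Reference signature: P1 counts for events 00 and 03, plus a made-up clock count.
reference = {"00": 29732, "03": 1862, "TSC": 7000000}

good_run = {"00": 29732, "03": 1862, "TSC": 7100000}
bad_run = {"00": 29732, "03": 1863, "TSC": 7000000}

print("good run accepted:", signature_matches(reference, good_run))
print("bad run accepted:", signature_matches(reference, bad_run))
```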

In Pentium processors we can simultaneously monitor only processor cycles and two other selected events. Hence an important issue is to select the most characteristic parameters, in order to limit the number of experiments. In the case of application programs, monitoring results that differ significantly from the average values may be treated as a fault indication. To make the results more representative we can embed in the application programs some messages that either justify the results (e.g. clock cycles [4]) or specify their subclasses (depending on the control flow actually executed). Moreover, we found a highly stable correlation between some monitored events (e.g. data reads and data read misses). The error detection capabilities of the on-chip monitoring are being checked with fault injection experiments. The results obtained so far show a significant improvement over the mechanisms provided by the operating system.
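
One possible way to turn "differs significantly from the average" into a check is sketched below: a profile (mean and standard deviation) is built from fault-free runs, and a later count is flagged when it lies more than k standard deviations from the mean. The threshold k, the helper names and the sample counts are assumptions made for illustration; the text does not specify the criterion actually used.

```python
from statistics import mean, pstdev


def build_profile(fault_free_counts):
    """Return (mean, standard deviation) of counts from fault-free runs."""
    return mean(fault_free_counts), pstdev(fault_free_counts)


def is_suspicious(count, profile, k=3.0):
    """Flag a count lying more than k standard deviations from the mean."""
    avg, dev = profile
    return abs(count - avg) > k * dev


# Made-up counts of one event collected in fault-free runs.
profile = build_profile([1000, 1010, 990, 1005, 995])

for count in (1015, 1100):
    print(count, "suspicious:", is_suspicious(count, profile))
```

If the program has several control-flow subclasses, a separate profile per subclass would keep the thresholds tight.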

Analyzing the possible monitored parameters, we noted the lack of a classical signature monitor [3], which could be added to the CPU chip (e.g. to monitor data transmitted to the register file). Such a monitor would significantly simplify the checking of test results. Hence test programs could become more competitive with BIST. We have performed some experiments with a commercially available monitor [1] and a model of the CPU; they showed a significant improvement in the effectiveness (lower complexity, higher error coverage) of the system test procedures.

 

References

[1] P. Forstner, Digital bus monitor - SN74ACT8994, Application report EB210E, TI, 1993.

[2] K. D. Wilken, T. Kong, Concurrent detection of software and hardware data access faults, IEEE Trans. on Comp. vol.48, No.4, April 1997, pp.412-424.

[3] J. Sosnowski, Detection of control flow errors using signature and checking instructions, Proc. of IEEE Test Conf., 1989, pp.81-88.

[4] J. Sosnowski, Concurrent checking of program flow using single chip microcomputers, Microprocessing and Microprogramming, 29.1988, pp.783-789.

[5] Pentium processor developer's manual, Intel 1997.

 

Table 1. Sample of experiment results.

Ev.           P1           P2           P3
00        29,732    3,445,632      212,049
01        29,732    1,642,690      212,042
02            29           65            1
03         1,862          512      212,048
04        29,718       67,072      212,040
06             2            0            1
09        14,866    1,345,496       70,708
10             0      279,562            0
12       772,941    2,764,830   13,251,238
13             0            0            1
14            53        1,036   13,251,256
19        14,863      723,210   10,326,482
21             2        1,270        6,265
22     6,064,524    9,286,360   37,947,216
23     3,032,257    2,854,315   10,999,348
50        14,864      624,552   11,297,246
53             2        1,271        6,266

Whetstone benchmark result (in Whetstones) for configurations C1-C4:
C1: 130.6; C2: 9.1; C3: 119.5; C4: 112.7

Monitored events for configuration C1 (event code-count):
00-68,559,667; 01-68,857,047; 02-14; 03-192; 04-1,481; 06-78; 09-29,657,844; 10-30,000,016; 12-436,978,461; 13-57; 14-813; 19-14,343,714; 21-37,271; 22-177,398,057; 23-53,242,404; 53-37,224