This paper describes a microprogrammed architecture for an embedded coprocessor that is able to control IEEE 1149.1 to IEEE 1149.7 test infrastructures, and explains how to expand the supported test command set. The coprocessor uses a fast simplex link (FSL) channel to interface a 32-bit MicroBlaze CPU, but it can work with any microprocessor core that accepts this simple FIFO-based interface method. The implementation cost (logic resource usage for a Xilinx Spartan-6 FPGA) and the performance data (operating frequency) are presented for a test command set comprising two parts: 1) the full IEEE 1149.1 structural test operations; 2) a subset of IEEE 1149.7 operations selected to illustrate the implementation of advanced scan formats.
After nearly 25 years of industry acceptance, starting immediately after its approval as a standard in 1990 [
This paper presents a microprogrammed architecture for an embedded coprocessor dedicated to IEEE 1149.1/1149.7 test operations. Its current instruction set supports all IEEE 1149.1 test operations (state transition, shift operations, shift and compare), and an additional subset of IEEE 1149.7 test operations designed as proof of concept for the proposed architecture (optimized scan format OScan1, zero-bit DR scan and escape sequences). The following section briefly describes the evolution from IEEE 1149.1 to 1149.7. Section 3 describes the microprogrammed architecture of the proposed embedded test coprocessor, and Section 4 explains the test command design process. Section 5 presents implementation cost and performance data. The main conclusions of this work are presented in the last section.
Several IEEE 1149.x standards and proposed standards have been developed by the test community since the mid-1980s, when the Joint Test Action Group (JTAG) initiated the development of the test technology that became known as the IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture. The approval of IEEE 1149.1 in 1990 marked the beginning of a series of IEEE 1149.x test standards that include successful and unsuccessful attempts to provide industry-accepted production test technologies [
The IEEE 1149.7 standard, approved in 2009, offers a framework that ensures interoperability between 1149.1 and 1149.7 components, while enabling enhanced test and debug features, and a reduced 2-pin TAP (where mode select, data in and data out information are multiplexed in a single TMSC pin). Besides supporting several types of advanced scan formats (MScan, OScan, SScan), which enable a variety of tradeoffs between capability and performance, the IEEE 1149.7 test architecture also supports a power-down mode to reduce consumption when the test and debug logic is not in use. The original BS architecture, illustrated in
The first step in implementing a controller architecture that supports the set of commands referred to in the previous section consists of building a formal representation of their functionality, making the corresponding control and data flow operations explicit. The data flow operations will determine the blocks required in the coprocessor data path, which will be implemented with regular sequential circuits. There is a higher degree of freedom regarding the implementation of the control path, where a hardwired or a microprogrammed architecture can be used to implement the corresponding control flow operations. Notice also that the formal representation of each test command will be influenced by the decision of implementing it in Moore or Mealy form. A Moore machine will only update its outputs (the control signals to the data path) upon the rising edge of the system clock, while Mealy machines can update their outputs at any moment. To illustrate the influence of this choice,
Original IEEE 1149.1 boundary-scan test infrastructure for structural fault detection
IEEE 1149.1/1149.7 test command set supported by the proposed embedded test coprocessor
Test command | Description |
---|---|
RESET | Takes the boundary-scan logic to Test-Logic-Reset (equivalent to SVF “STATE RESET”) |
TMS0, TMS1 | Set TMS to 0/1 and generate one TCK clock pulse (enabling the implementation of any SVF “STATE” command) |
MTCK N | Sets TMS to 0 and generates N TCK pulses (enables the multiple TCK tests carried out by SVF “RUNTEST”) |
SHF1 N X | Shifts an N-bit bitstream (X) into the instruction register or the selected data register(s) (data bits shifted out are ignored) |
SHFCP1 N X, Y, Z | Shifts an N-bit bitstream (X) into the instruction register or the selected data register(s), and compares the output bitstream with its expected response (Y), in all positions indicated by the mask bitstream (Z) (enables the scan & compare “SDR” and “SIR” SVF commands) |
ZBS N | Issues N zero bit scan sequences by counting N TMS edge counts (used to generate IEEE 1149.7 commands and to set the control level) |
ESC N | Issues N escape sequences (used to select, deselect, reset, and set up other IEEE 1149.7 specific operations) |
SHF7O1 N X | Equivalent to SHF1 above, but using the 1149.7 advanced scan format OScan1 |
SHFCP7O1 N X, Y, Z | Equivalent to SHFCP1 above, but using the 1149.7 advanced scan format OScan1 |
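To make the behaviour of the simplest commands in the table concrete, the following Python sketch models TMS0/TMS1, RESET and the masked comparison performed by SHFCP1. It is only a software illustration, not the coprocessor implementation: the function names (`tms`, `reset`, `masked_compare`) are hypothetical, the transition table is the standard IEEE 1149.1 TAP controller state diagram, and RESET is assumed to be realised as five TCK pulses with TMS held at 1, which is the usual way to reach Test-Logic-Reset from any state.

```python
# Standard IEEE 1149.1 TAP controller transitions:
# state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tms(state, bit):
    """Model of TMS0 / TMS1: drive TMS to `bit` and apply one TCK pulse."""
    return TAP[state][bit]


def reset(state):
    """Model of RESET (assumed here to be five TCK pulses with TMS=1)."""
    for _ in range(5):
        state = tms(state, 1)
    return state


def masked_compare(captured, expected, mask):
    """Model of the SHFCP1 check: compare only the bit positions set in mask."""
    return ((captured ^ expected) & mask) == 0


# Walk from Test-Logic-Reset to Shift-DR using only TMS0/TMS1 steps.
state = "Test-Logic-Reset"
for bit in (0, 1, 0, 0):
    state = tms(state, bit)
print(state)
```

Any SVF "STATE" path can be composed from `tms` calls in the same way, which is the reason the command set only needs the two single-pulse primitives.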
States 2 and 3 in
The conversion into a Moore machine increased the number of states by a factor of 2.25 (from 4 to 9). However, this higher number of states does not imply a proportional degradation of cost or performance, be it in the number of logic gates or microprogram memory positions (spatial resources), or in the number of clock cycles (speed). A simple benchmarking of the execution speed can take place by considering the time required to move from state 2 to state 4, assuming that: 1) conditions B and C hold true, and 2) condition D holds false for five times.
The corresponding state transition for the Mealy representation of
Partial representation of the ASMD chart for SHF1. (a) Mealy representation. (b) Moore representation
5-9-5-6-7, and requires a total of 15 clock cycles (that is to say, 1.25 times as many as the corresponding Mealy representation).
In order to compare the corresponding implementations in terms of logic resources/FPGA floorspace, a choice has to be made between a hardwired and a microprogrammed control path architecture. The following reasons explain our choice of a microprogrammed coprocessor architecture:
● A hardwired controller needs to be completely redesigned when a new test command is required, and the designer will have to code each new ASMD chart in the chosen hardware description language (e.g. VHDL, Verilog).
● Even if major differences are not to be expected, a hardwired architecture will correspond to a new sum-of-products (or a similar canonical form) structure for each set of test commands, meaning that there will be variations in the critical path and maximum propagation delay (by contrast, there will be no variation in the control elements of a microprogrammed architecture, since any additional commands correspond solely to further positions added to the microprogram memory).
● Assuming that the “micro-operation set” is able to cover the primitive constructs required to implement new test commands, the designer does not have to write VHDL/Verilog code whenever a new command is added, but simply translates the respective ASMD chart into the corresponding microprogram memory bank positions.
The basic control path architecture for a microprogrammed implementation is shown in
The architecture illustrated in
Basic microprogrammed architecture for a Moore control path
BRANCH IF microinstructions, directing the state transition from beginning to end. However, it can only implement Moore machines, since each ASMD state selects a single microprogram memory position. Since the ASMD blocks frequently contain conditional output boxes, or more than one decision box, this means that the simplicity of the architecture shown in
The general rule for preprocessing the ASMD charts consists of eliminating all conditional output boxes (which will become unconditional outputs specified in a state of their own), and splitting the state when more than one decision box is present. In addition, and since the most significant bits come directly from the test command opcode, the number of microprogram memory positions used to implement each command is fixed. This represents a waste of FPGA floorspace, since the most complex command, with the longest ASMD chart representation, will dictate the number of positions that will be used for all other commands. The control path architecture illustrated in
If we want to enable a Mealy implementation, the least significant address bits will have to be driven from data path conditions. This solution enables any number of decision boxes per state, provided that the corresponding conditions are used to drive the least significant address bits of the microprogram memory. The main drawback is that the number of microprogram memory positions will be equal to $S \cdot 2^{D}$, where $S$ is the number of states and $D$ is the maximum number of decision boxes existing in a single state (16 memory positions for the example presented earlier). For a test command set with $O$ opcodes, the total microprogram memory storage requirement is given by $S \cdot 2^{D} \cdot O$. While this modification means that we will now have two (or a power of two) microprogram memory positions for each ASMD chart state, the end result is not necessarily an explosion of the microprogram memory space, since state decomposition is restricted to the need of preventing multiple decision boxes per ASMD state (a situation that is rather rare).
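The storage estimate above (the number of states times two to the power of the maximum number of decision boxes per state, times the number of opcodes) can be checked with a few lines of Python. This is a minimal sketch; the function name `mealy_positions` is hypothetical, and the opcode count used in the second call is only an illustrative figure, since in practice the values of S and D differ from command to command.

```python
def mealy_positions(states, max_decisions, opcodes=1):
    """Microprogram memory positions for the non-limited Mealy scheme: S * 2**D * O."""
    return states * 2 ** max_decisions * opcodes


# SHF1 example from the text: 4 Mealy states, at most 2 decision boxes per state.
per_command = mealy_positions(4, 2)
print(per_command)

# Illustrative only: the same figures replicated over a 10-opcode command set.
whole_set = mealy_positions(4, 2, opcodes=10)
print(whole_set)

# The Moore version of the same chart needs one position per state (9 states).
moore_per_command = 9
print(per_command - moore_per_command)
```

The comparison shows why the Moore form is cheaper in storage even though it has more states: each state occupies exactly one position, whereas the Mealy form reserves a block of positions per state regardless of how many are used.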
The non-limited simple Mealy architecture enables the fastest implementation. On the other hand, it is the most expensive in terms of microprogram memory storage. The intermediate simple Mealy solution limited to one decision box per state is likewise an intermediate solution in terms of speed vs. microprogram memory storage, while the Moore representation is cheapest in terms of microprogram memory storage. It is also the slowest, although the number of clock cycles is not proportional to the number of states (instead it is dictated by the path through the ASMD chart).
The specific nature of scan test infrastructures dictates that the number of required primitive constructs (micro-operations) is very small. The data path architecture will therefore have a small number of elements, consisting only of counters, latches and serializers. The conditions associated with the operation of these data path elements are easy to typify, and are limited to detecting if the latches and counters reached one or zero. The proposed test coprocessor architecture may be represented as shown in
Pros and cons of Moore, Mealy-1, and Mealy-2 microprogrammed control path implementations
Topology | Pros | Cons | Comments |
---|---|---|---|
Moore | Maximum simplicity | Lowest speed | Preprocessing required |
Mealy 1 | Good speed upgrade | Higher number of memory positions | Minimum preprocessing |
Mealy 2 | Marginal speed upgrade over Mealy 1 | Highest number of memory positions, most of which will not be used | No preprocessing required (if no. of decision boxes per state is ≤2) |
Microprogrammed architecture adopted for the proposed embedded test coprocessor
The proposed test coprocessor architecture and interface method
This section presents the micro-operation set that is supported by the microprogrammed control path, and explains the sequence of steps that will enable any designer to add further commands, in order to expand the application domain of the proposed test coprocessor (e.g. further IEEE 1149.7 operations). The microprogrammed control path architecture illustrated in
● The leftmost bits represent the new address, to be used whenever the next state encoding is different from the current state encoding plus 1 (in which case the control path jumps to a new address, instead of incrementing the current address). The number of bits in this field is dictated by the maximum number of states in the ASMD chart of a single test command.
● The middle field contains the micro-operation that represents the required control flow. The implementation of any ASMD chart can be carried out using three basic types of micro-operations: 1) CONTINUE (i.e. increment the current microprogram memory address); 2) JUMP TO ADDRESS (i.e. load the “new address” that is represented in the leftmost bits of the current microprogram memory position); and 3) BRANCH IF CONDITION TO ADDRESS (jumps to the indicated address if the condition is true, continues to the next address otherwise). This third type actually generates a variety of different micro-operations, i.e. BRANCH IF CONDITION_A TO ADDRESS is formally different from BRANCH IF /CONDITION_A TO ADDRESS (where the slash denotes negation, i.e. branch if the condition is false). Each new test command to be supported will most likely require additional BRANCH IF micro-operations, to cope with the specific conditions associated with its execution. The number of bits in this field is determined by the total number of micro-operations, that is, 2 (CONTINUE, JUMP TO ADDRESS) plus the number of BRANCH IF micro-operations required.
● The rightmost field comprises all the control bits that determine the data flow operations indicated in the ASMD chart. In a horizontal microprogrammed architecture, such as the one that was adopted in this work, the number of bits in this field is equal to the total number of control bits required by the data path elements, plus all additional bits that are directly connected to test access port pins (e.g. board TMS and board TCK).
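The next-address selection implied by the three micro-operation types can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than the actual control path logic: the function name `next_address`, the tuple representation of BRANCH IF micro-operations and the dictionary of condition flags are all hypothetical, and the new address is treated as an absolute address.

```python
def next_address(addr, micro_op, new_addr, conditions):
    """Select the next microprogram address for one microinstruction.

    micro_op is "CONT", "JUMP", or a tuple ("BIF", name, expected), read as
    "branch to new_addr if conditions[name] equals expected".
    A negated branch (BRANCH IF /A) is written ("BIF", "A", False).
    """
    if micro_op == "CONT":
        return addr + 1                      # CONTINUE: increment the address
    if micro_op == "JUMP":
        return new_addr                      # JUMP TO ADDRESS: load new address
    if isinstance(micro_op, tuple) and len(micro_op) == 3 and micro_op[0] == "BIF":
        _, name, expected = micro_op
        # BRANCH IF: jump when the condition matches, otherwise fall through.
        return new_addr if conditions[name] == expected else addr + 1
    raise ValueError(f"unknown micro-operation: {micro_op!r}")


flags = {"A": False}
print(next_address(5, "CONT", 0, flags))
print(next_address(5, "JUMP", 9, flags))
print(next_address(1, ("BIF", "A", False), 3, flags))
```

Adding a new BRANCH IF variant in this model only means passing a different condition name or polarity, which mirrors the observation that new test commands mainly add BRANCH IF micro-operations.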
In order to expand the test command set supported by the coprocessor, the designer shall proceed as follows:
1) Draw the ASMD chart specifying the operation of the required test command.
2) If the ASMD chart contains conditional output boxes (Mealy machine), split those states so as to convert all conditional output boxes into state boxes (convert from Mealy to Moore).
3) Once the Moore ASMD chart is ready, fill in the microprogram memory template with the micro-operations and control bit patterns corresponding to each ASMD state.
4) Update the content of the microprogram memory.
As an illustrative example,
As this example shows, expanding the test command set simply consists of updating the content of the microprogram memory, releasing the designer from the need to understand and modify the VHDL code that describes the control path architecture.
To evaluate the overall performance of the proposed embedded test coprocessor,
● The columns showing data for each test command (the four rightmost columns) correspond to the implementation of a single test command, without the FSL interface.
● Since the FSL interface is predesigned and independent of the proposed microprogrammed architecture, all the tables include two columns showing the implementation data for all test commands when no FSL interface is present, and when a single 1-word 32-bit FIFO interface is added from the MicroBlaze to the test coprocessor.
.1 test commands with a 4-pin TAP test infrastructure. TMS1 and TMS0 belong to the same group of results, and the same happens with MTCK and RESET, so the second command of each pair was omitted in all the tables to improve readability. It is important to notice that the usage of logic resources imposed by the most complex test command (SHFCP1) dictates the cost of the full implementation: implementing SHFCP1 alone requires practically the same resources as the full implementation of all the IEEE 1149.1 test commands that are presented in Table 1. The number of
Content of the microprogram memory for the MTCK test command
ASMD state | New_Addr | µoperation | iL | iD | rL | cL | cD | cM | sL | sS | bS | bK |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | X | CONT | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | Offset to pos. 3 | BIF/A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | Offset to END pos. | JUMP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
3 | Offset to pos. 1 | JUMP | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
(iL: iwordL; iD: iwordD; rL: bitsL; cL: cbitsL; cD: cbitsD; cM: cbitsM; sL: serL; sS: serS; bS: b_TMS; bK: b_TCK).
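The MTCK microprogram in the table can be exercised with a small Python model of a Moore sequencer. The sketch below is a hedged reading of the table, not the authors' implementation: it assumes that condition A means "the pulse counter has reached one", that cbitsL loads the counter with N and cbitsD decrements it on the clock edge that ends the state, and that the offsets in the New_Addr column can be replaced by absolute positions. `END`, `ROM`, `run_mtck` and `rising_edges` are hypothetical names.

```python
END = 4  # hypothetical address standing in for the "END" position

# One entry per ASMD state: (new address, micro-operation, active control bits).
ROM = {
    0: (None, "CONT",  {"cL"}),        # load the pulse counter
    1: (3,    "BIF/A", set()),         # TCK low; branch to 3 while A is false
    2: (END,  "JUMP",  {"bK"}),        # last TCK pulse, then finish
    3: (1,    "JUMP",  {"cD", "bK"}),  # TCK high, decrement counter, loop
}


def run_mtck(n):
    """Return the b_TCK level driven in each visited state, for n >= 1 pulses."""
    if n < 1:
        raise ValueError("this sketch assumes at least one TCK pulse")
    addr, counter, tck_trace = 0, 0, []
    while addr != END:
        new_addr, op, ctrl = ROM[addr]
        tck_trace.append(1 if "bK" in ctrl else 0)
        cond_a = counter == 1            # ASSUMPTION: A = "counter reached one"
        if op == "CONT":
            nxt = addr + 1
        elif op == "JUMP":
            nxt = new_addr
        else:                            # "BIF/A": branch when A is false
            nxt = addr + 1 if cond_a else new_addr
        # Data path registers update on the clock edge that ends the state.
        if "cL" in ctrl:
            counter = n
        if "cD" in ctrl:
            counter -= 1
        addr = nxt
    return tck_trace


def rising_edges(trace):
    """Count 0 -> 1 transitions, with TCK assumed low before the first state."""
    return sum(1 for prev, cur in zip([0] + trace, trace) if (prev, cur) == (0, 1))


print(run_mtck(3))
print(rising_edges(run_mtck(3)))
```

Under these assumptions the loop through states 1 and 3 produces N-1 pulses and state 2 produces the final one, so the command generates exactly N TCK pulses while b_TMS stays at 0 throughout.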
Logic resources usage by the IEEE 1149.1 test commands
Parameter | All (no FSL) | All (one FSL) | Only TMS1 | Only MTCK | Only SHF1 | Only SHFCP1 |
---|---|---|---|---|---|---|
No. of slice registers | 388 (1%) | 374 (2%) | 88 - | 120 - | 274 - | 338 - |
No. of slice LUTs | 383 (4%) | 424 (4%) | 125 - | 207 - | 307 - | 380 - |
No. of memory positions | 52 | 52 | 4 | 7 | 16 | 23 |
Logic resources usage by the IEEE 1149.7 test commands
Parameter | All (no FSL) | All (one FSL) | Only ESC | Only ZBS | Only SHF7O1 | Only SHFCP7O1 |
---|---|---|---|---|---|---|
No. of slice registers | 160 (1%) | 377 (2%) | 75 - | 75 - | 144 - | 160 - |
No. of slice LUTs | 232 (4%) | 461 (5%) | 179 - | 146 - | 178 - | 224 - |
No. of memory positions | 98 | 98 | 9 | 23 | 32 | 40 |
microprogram memory positions when the full set of commands is implemented, can be calculated by adding the equivalent value for each command individually, but taking into account that the first two positions are common to all the test commands.
The same comparison has been done for the IEEE 1149.7 test commands, and the summary of the collected data is shown in
With respect to timing performance, we again notice that the most complex command is the main factor determining the minimum period/maximum frequency of the embedded test coprocessor operation.
Several IEEE 1149.1 test controller solutions have been developed over the years [
1) An embedded test coprocessor that has the capability to test any IEEE 1149.x-compatible system. This coprocessor supports a wide range of testing scenarios for embedded systems based on single or multi-core 32-bit MicroBlaze CPUs, from built-in self-test to online fault detection and diagnosis.
2) A microprogrammed control path architecture for the test coprocessor, enabling a straightforward expansion of the test command set to cope with additional application domains, e.g. those made possible by IEEE 1149.7. The simplicity of the selected architecture enables a relatively fast execution of the test commands, reaching above 70% of the Nexys 3™ board system clock (100 MHz).
Timing performance for the IEEE 1149.1 test commands
Parameter | All (no FSL) | All (one FSL) | Only TMS1 | Only MTCK | Only SHF1 | Only SHFCP1 |
---|---|---|---|---|---|---|
Min. period - Max. freq. | 12.276 ns - 81.457 MHz | 12.315 ns - 81.199 MHz | 6.053 ns - 165.211 MHz | 6.249 ns - 160.037 MHz | 7.722 ns - 128.662 MHz | 11.246 ns - 88.917 MHz |
Max. comb. path delay | 5.157 ns | 8.102 ns | 5.157 ns | 5.385 ns | 5.385 ns | 5.519 ns |
Timing performance for the IEEE 1149.7 test commands
Parameter | All (no FSL) | All (one FSL) | Only ESC | Only ZBS | Only SHF7O1 | Only SHFCP7O1 |
---|---|---|---|---|---|---|
Min. period - Max. freq. | 12.657 ns - 79.005 MHz | 13.655 ns - 73.235 MHz | 7.756 ns - 128.929 MHz | 8.302 ns - 120.449 MHz | 11.469 ns - 87.192 MHz | 12.249 ns - 81.639 MHz |
Max. comb. path delay | - | 5.94 ns | - | - | - | - |
An FSL interface is used to interact with the MicroBlaze for exchanging the command opcode, the arguments of the various functions, and the test result. The FSL is a very popular coprocessor interface method that uses a simple hardware protocol. It is supported by a flexible, yet dedicated instruction set to write to the output and read from the input port. The 32-bit wide FSL bus interface used offers a dedicated point-to-point data streaming interface. Its point-to-point nature and its minimum hardware requirement are the greatest advantages of this interface method. Because the FSL channels are dedicated, no arbitration or bus mastering is required, ensuring an extremely fast interface with very low data latency.
The proposed architecture has the capability to cope with emerging requirements of test and debug standards based on scan test infrastructures. The microprogrammed control path architecture made the embedded test coprocessor scalable, and the FSL interface enabled an optimized solution for IEEE 1149.x test infrastructures, where the most time-consuming functions were executed in hardware, and the remaining functions were implemented by software at a higher abstraction level.