The rapid development of digital circuit design has broadened applications in the very large scale integration (VLSI) era. Encoders are among the digital circuits found in all communication systems. Polar coding is valued mainly for its capacity-achieving property, and it finds applications in communications, sensing and information theory. This coding scheme, proposed by Erdal Arikan, is significant because it exhibits no error floor and has a simple architecture for hardware implementation. In this paper, a folded polar encoder is designed, starting from the fully parallel architecture and proceeding through its data flow graph, delay requirement calculation, lifetime analysis and register allocation, which results in a VLSI architecture with minimum hardware utilization. The results are simulated for 4- and 8-parallel folded 32-bit polar encoders using Xilinx 14.6 ISIM and implemented on a Virtex 5 field programmable gate array (FPGA). The fully parallel and various folded architectures are compared in terms of resource utilization.
The polar code belongs to the class of linear block codes, and its encoding process is characterized by a generator matrix. The generator matrix $G_N$ for code length $N = 2^n$ is obtained as the nth Kronecker power of the 2 × 2 kernel matrix $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, i.e., $G_N = F^{\otimes n}$ [
Given the generator matrix, the code word x is computed as $x = u \cdot G_N$, where u denotes the information vector. The information vector u is arranged in natural order, whereas the code vector x is in bit-reversed order. A straightforward fully parallel encoder for a polar code of length N has a complexity of order $O(N \log N)$ and takes n stages. For $N = 2^n = 32$ (n = 5), the encoder is implemented with $(N/2)\log_2 N = 80$ XOR gates and processed in five stages, as shown in
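The 80-gate figure follows from $\log_2 N$ stages of $N/2$ two-input XORs each. The Python sketch below (the helper names `kernel_power`, `encode_matrix` and `encode_butterfly` are hypothetical) builds $G_N = F^{\otimes n}$ with the kernel F = [[1, 0], [1, 1]], encodes by direct matrix multiplication over GF(2), and checks the result against a fully parallel XOR network while counting its gates. It assumes both u and x are kept in natural order, so the bit-reversed output ordering of the hardware is not modeled.

```python
import random


def kernel_power(n):
    """Build G_N = F^{(x)n} (N = 2**n) with kernel F = [[1, 0], [1, 1]]."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        nxt = [[0] * (2 * size) for _ in range(2 * size)]
        for i in range(size):
            for j in range(size):
                # F (x) G = [[G, 0], [G, G]]; the upper-right block stays zero
                nxt[i][j] = g[i][j]
                nxt[size + i][j] = g[i][j]
                nxt[size + i][size + j] = g[i][j]
        g = nxt
    return g


def encode_matrix(u, g):
    """Reference encoder: x = u . G_N over GF(2)."""
    n_bits = len(u)
    return [sum(u[i] & g[i][j] for i in range(n_bits)) % 2 for j in range(n_bits)]


def encode_butterfly(u):
    """Fully parallel XOR network: log2(N) stages of N/2 two-input XORs.

    Returns the code word (natural order) and the number of XOR gates used.
    """
    x = list(u)
    xor_gates = 0
    half = 1
    while half < len(x):                      # one pass per stage
        for start in range(0, len(x), 2 * half):
            for j in range(start, start + half):
                x[j] ^= x[j + half]           # upper output a XOR b, lower passes b
                xor_gates += 1
        half *= 2
    return x, xor_gates


if __name__ == "__main__":
    n = 5
    g32 = kernel_power(n)
    rng = random.Random(0)
    for _ in range(100):
        u = [rng.randint(0, 1) for _ in range(2 ** n)]
        x, gates = encode_butterfly(u)
        assert x == encode_matrix(u, g32)
    print("XOR gates for N = 32:", gates)
```

The matrix form is only a reference model; the butterfly form is what maps onto hardware, one XOR per kernel operation.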
In VLSI design, area reduction focuses on minimizing the size of the components, and many techniques can be used for this, such as the Karnaugh-map (K-map) based Boolean expression method and the block optimization method. In general, pipelining can be used in architecture design [
either to increase the clock or sample speed or to reduce the power consumption at the same speed. It does this by inserting pipeline latches along the data path, which reduces the effective critical path. Pipelined encoders can be broadly classified into feedforward and feedback architectures. The feedforward pipelined encoder consists of a 2D commutator followed by XOR and pass gates to achieve high throughput [
The feedback pipelined polar encoder favors high hardware efficiency over high throughput: the number of XOR gates equals the number of processing stages, and the number of delay elements is reduced. In parallel processing, multiple outputs are computed in parallel within one clock period, so the effective sampling speed increases by a factor equal to the level of parallelism [
The polar code architecture is discussed [
In this paper, parallelism and pipelining are combined to obtain an efficient encoder structure with a minimum number of registers. The design starts from the conventional fully parallel 32-bit architecture and transforms it step by step through a data flow graph (DFG), a delay requirement table, a linear lifetime chart and register allocation [
The polar encoder relies on the principle of channel polarization, a recursive method used to define polar codes, which are linear codes that provably achieve the capacity of several classes of channels. Channel polarization consists of channel combining and channel splitting. A channel $W_N$ can be characterized by two parameters: the mutual information, which defines its information capacity, and the Bhattacharyya parameter, which measures its reliability.
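As an illustration of polarization, consider the binary erasure channel (BEC) with erasure probability $z_0$, whose Bhattacharyya parameter is $Z(W) = z_0$. For the BEC, one combining/splitting step is exactly $Z(W^-) = 2Z - Z^2$ and $Z(W^+) = Z^2$. The sketch below (the function name `polarize_bec` is hypothetical) applies this recursion n times. The BEC is a textbook special case chosen only for illustration; the text does not tie the encoder to a particular channel model.

```python
def polarize_bec(z0, n):
    """Return the Bhattacharyya parameters of the 2**n synthetic channels
    obtained from a binary erasure channel with erasure probability z0.

    For a BEC the single-step recursion is exact:
      Z(W-) = 2Z - Z**2   (degraded channel)
      Z(W+) = Z**2        (upgraded channel)
    """
    z = [z0]
    for _ in range(n):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)   # "minus" channel after splitting
            nxt.append(zi * zi)            # "plus" channel after splitting
        z = nxt
    return z


if __name__ == "__main__":
    z = polarize_bec(0.5, 5)               # N = 32 synthetic channels
    good = sum(1 for zi in z if zi < 0.1)
    bad = sum(1 for zi in z if zi > 0.9)
    print(f"{good} nearly noiseless, {bad} nearly useless, "
          f"{len(z) - good - bad} in between")
```

Because $(2Z - Z^2) + Z^2 = 2Z$, the average Bhattacharyya parameter over the synthetic channels stays equal to $z_0$: polarization redistributes reliability among the channels rather than creating it.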
In synthesizing DSP architectures, it is important to minimize the silicon area of the integrated circuits, which is achieved by reducing the number of functional units, multiplexers and interconnection wires. This, however, may lead to an architecture that uses a large number of registers, so various techniques can be used to minimize the register count. The folding transformation reduces hardware utilization by time-multiplexing several operations onto a single functional unit [
The DFG of the 32-bit polar encoder is similar to that of the Fast Fourier Transform (FFT), with the polar kernel operation taking the place of the FFT butterfly. The 4-parallel folded architecture can be realized by placing two functional units in each stage, since each functional unit computes two bits at a time. Consider the four parallel input sequences in natural order. The initial folding sets for stage 1 are {P0, P2, P4, P6, P8, P10, P12, P14} and {P1, P3, P5, P7, P9, P11, P13, P15}. That is, the two functional units of stage 1 execute P0 and P1 simultaneously in the first cycle, P2 and P3 in the next cycle, and so on. Every stage whose index s satisfies $s \le \log_2 P$, where P is the level of parallelism, has the same folding sets as the previous stage. Stage 2 therefore has the same order as stage 1, since it operates within the same group of four inputs. At later stages, the folding sets are computed by exploiting the property that each functional unit processes a pair of inputs whose indices differ by $2^{s-1}$ [
thus, a cyclic right shift of the four bits by one position can be realized by inserting a delay of one time unit. The folding sets of stage 3 are therefore {R14, R0, R2, R4, R6, R8, R10, R12} and {R15, R1, R3, R5, R7, R9, R11, R13}. To keep the functional units fully utilized across adjacent iterations, the folding sets of stage 4 are obtained by cyclically shifting those of stage 3 right by two more positions, and those of stage 5 by shifting those of stage 4 right by four more positions. The folding sets of stages 4 and 5 are {S10, S12, S14, S0, S2, S4, S6, S8}, {S11, S13, S15, S1, S3, S5, S7, S9} and {T2, T4, T6, T8, T10, T12, T14, T0}, {T3, T5, T7, T9, T11, T13, T15, T1}, respectively.
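The stage-by-stage pattern can be condensed into one rule. The Python sketch below (the function `folding_sets` and its parameters are hypothetical) assumes P/2 functional units per stage, each executing N/P kernel operations, with unit k taking operations k, k + P/2, k + P, and so on; stages beyond $\log_2 P$ are rotated right by $2^{s - \log_2 P} - 1$ positions in total. This rule is inferred from the folding sets listed for P = 4 and P = 8, not stated as such in the text.

```python
def folding_sets(n_bits, parallel, stage, prefix):
    """Folding sets of one stage of a P-parallel folded N-bit polar encoder.

    Assumed rule, inferred from the sets listed for P = 4 and P = 8:
      * a stage has N/2 kernel operations shared by P/2 functional units,
        so each unit executes N/P operations (one per clock cycle);
      * unit k executes operations k, k + P/2, k + P, ... in that order;
      * stages s <= log2(P) keep this order; a stage s > log2(P) is
        cyclically shifted right by 2**(s - log2(P)) - 1 positions in total.
    """
    units = parallel // 2
    ops_per_unit = n_bits // parallel
    log_p = parallel.bit_length() - 1          # log2(P) for P a power of two
    shift = 0 if stage <= log_p else (2 ** (stage - log_p) - 1) % ops_per_unit
    sets = []
    for k in range(units):
        base = [k + units * m for m in range(ops_per_unit)]
        cut = ops_per_unit - shift               # right rotation by `shift`
        rotated = base[cut:] + base[:cut]
        sets.append([f"{prefix}{i}" for i in rotated])
    return sets


if __name__ == "__main__":
    for parallel in (4, 8):
        print(f"{parallel}-parallel:")
        for stage, prefix in enumerate("PQRST", start=1):
            print(f"  stage {stage}:", folding_sets(32, parallel, stage, prefix))
```

For P = 8 the same rule gives four sets of four operations per stage, with stages 1 to 3 unrotated, which is the layout used for the eight-parallel design.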
The number of delay elements required in the folded architecture [
where $W_{ij}$ is an edge with delay d from functional unit S to functional unit T, and s and t denote the positions of S and T in their respective folding sets. The delay requirements of the four-parallel folded 32-bit polar encoder are shown in
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D(W1j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D(W2j) | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | −6 | −6 | 0 | 0 | −7 | −7 |
D(W3j) | 2 | 2 | 2 | 2 | 4 | 4 | 4 | 4 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | −6 | −6 | −4 | −4 | −4 | −4 | 0 | 0 | 0 | 0 | −6 | −6 | 2 | 2 |
D(W4j) | 4 | 4 | −4 | −4 | −4 | −4 | −4 | −4 | −0 | −0 | −0 | −0 | −0 | −0 | −0 | −0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
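Comparing the D rows with the corresponding D' rows of the delay tables, every D' entry equals D when D is non-negative, and D + N/P (8 for the four-parallel design, 4 for the eight-parallel design) when D is printed with a minus sign, including the "−0" entries. This is consistent with adding one iteration of delay to those edges before folding. The sketch below (the function `normalize_delays` is hypothetical) reproduces that observed relation; the rule is read off the tables rather than stated in the text.

```python
import math


def normalize_delays(delays, folding_factor):
    """Turn the folded delays D of one table row into non-negative delays D'.

    Assumed rule, read off the D and D' tables: an entry printed with a minus
    sign (including the '-0' entries) gets one folding period added; other
    entries are unchanged. Pass -0.0 for a printed '-0' so its sign is kept.
    """
    result = []
    for d in delays:
        if math.copysign(1.0, d) < 0:   # catches -0.0 as well as d < 0
            result.append(int(d + folding_factor))
        else:
            result.append(int(d))
    return result


if __name__ == "__main__":
    # Row D(W4j) of the four-parallel table; folding factor N/P = 32/4 = 8.
    d_w4 = [4, 4] + [-4] * 6 + [-0.0] * 8 + [0] * 8 + [4] * 8
    print(normalize_delays(d_w4, 8))
```

The result matches the D'(W4j) row of the four-parallel design. Using a float -0.0 is only a device to keep the sign of the printed "−0" entries, which behave like negative delays.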
In
The four-parallel folded structure of the 32-bit polar encoder can thus be implemented with 10 functional units (two in each of the five stages) and 28 delay elements.
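These counts follow a simple pattern that can be checked with a few lines of arithmetic. The sketch below (the function `folded_resources` is hypothetical) assumes $\log_2 N$ stages with P/2 XOR functional units each, and N − P delay elements as quoted for the partially parallel encoder in the comparison section. It does not model the register row of the comparison table.

```python
def folded_resources(n_bits, parallel):
    """Resource counts for a P-parallel folded N-bit polar encoder.

    Assumed model (N and P powers of two, 2 <= P <= N):
      * log2(N) stages, each with P/2 two-input XOR functional units;
      * N - P delay elements (0 for the fully parallel case P = N);
      * a throughput of P bits per clock cycle.
    """
    stages = n_bits.bit_length() - 1      # log2(N) for N a power of two
    return {
        "xor_units": (parallel // 2) * stages,
        "delay_elements": n_bits - parallel,
        "bits_per_cycle": parallel,
    }


if __name__ == "__main__":
    for p in (32, 8, 4):
        print(p, folded_resources(32, p))
```

For N = 32 this gives 80, 20 and 10 XOR units and 0, 24 and 28 delay elements for P = 32, 8 and 4, the same figures as the fully parallel, eight-parallel and four-parallel designs.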
The number of delay elements can be reduced by performing lifetime analysis on the folded architecture [
Once the minimum number of registers has been computed, each variable is allocated to a register using a register allocation table [
The four-parallel folded pipelined structure of the 32-bit polar encoder is shown in
The eight-parallel design considers eight inputs at a time, so each stage is split into four folding sets, one for each of its four functional units. The same procedure is applied to the eight-parallel folded architecture, giving the following folding sets:
Stage 1: {P0, P4, P8, P12} {P1, P5, P9, P13} {P2, P6, P10, P14} {P3, P7, P11, P15}
Stage 2: {Q0, Q4, Q8, Q12} {Q1, Q5, Q9, Q13} {Q2, Q6, Q10, Q14} {Q3, Q7, Q11, Q15}
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D'(W1j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D'(W2j) | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 1 |
D'(W3j) | 2 | 2 | 2 | 2 | 4 | 4 | 4 | 4 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 4 | 4 | 4 | 4 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
D'(W4j) | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Cycle | Stage 2 | R1 | R2 | R3 | R4 | Stage 3 | R5 | R6 | R7 | R8 | R9 | R10 | R11 | R12 | Stage 4 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 | R21 | R22 | R23 | R24 | R25 | R26 | R27 | R28 | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | W2,0 | W2,2 | W2,1 | W2,3 | ||||||||||||||||||||||||||||||||||||
1 | W2,4 | W2,6 | W2,5 | W2,7 | W2,2 | W2,0 | W2,3 | W2,1 | W3,0 | W3,4 | W3,1 | W3,5 | ||||||||||||||||||||||||||||
2 | W2,8 | W2,10 | W2,9 | W2,11 | W2,6 | W2,2 | W2,7 | W2,3 | W3,2 | W3,6 | W3,3 | W3,7 | W3,4 | W3,0 | W3,5 | W3,1 | W4,0 | W4,8 | W4,1 | W4,9 | ||||||||||||||||||||
3 | W2,12 | W2,14 | W2,13 | W2,15 | W2,10 | W2,8 | W2,11 | W2,9 | W3,8 | W3,12 | W3,9 | W3,13 | W3,6 | W3,4 | W3,2 | W3,0 | W3,7 | W3,5 | W3,3 | W3,1 | W4,2 | W4,10 | W4,3 | W4,11 | W4,8 | W4,0 | W4,9 | W4,1 | ||||||||||||
4 | W2,16 | W2,18 | W2,17 | W2,19 | W2,14 | W2,10 | W2,15 | W2,11 | W3,10 | W3,14 | W3,11 | W3,15 | W3,12 | W3,6 | W3,4 | W3,2 | W3,13 | W3,7 | W3,5 | W3,3 | W4,4 | W4,12 | W4,5 | W4,13 | W4,10 | W4,8 | W4,2 | W4,0 | W4,11 | W4,9 | W4,3 | W4,1 | ||||||||
5 | W2,20 | W2,22 | W2,21 | W2,23 | W2,18 | W2,16 | W2,19 | W2,17 | W3,16 | W3,20 | W3,17 | W3,21 | W3,14 | W3,12 | W3,6 | W3,4 | W3,15 | W3,13 | W3,7 | W3,5 | W4,6 | W4,14 | W4,7 | W4,15 | W4,12 | W4,10 | W4,8 | W4,4 | W4,2 | W4,0 | W4,13 | W4,11 | W4,9 | W4,5 | W4,3 | W4,1 | ||||
6 | W2,24 | W2,26 | W2,25 | W2,27 | W2,22 | W2,18 | W2,23 | W2,19 | W3,18 | W3,22 | W3,19 | W3,23 | W3,20 | W3,14 | W3,16 | W3,6 | W3,21 | W3,15 | W3,17 | W3,7 | W4,16 | W4,24 | W4,17 | W4,25 | W4,14 | W4,12 | W4,10 | W4,8 | W4,6 | W4,4 | W4,2 | W4,0 | W4,15 | W4,13 | W4,11 | W4,9 | W4,7 | W4,5 | W4,3 | W4,1 |
7 | W2,28 | W2,30 | W2,29 | W2,31 | W2,26 | W2,24 | W2,27 | W2,25 | W3,24 | W3,28 | W3,25 | W3,29 | W3,22 | W3,20 | W3,18 | W3,16 | W3,23 | W3,21 | W3,19 | W3,17 | W4,18 | W4,26 | W4,19 | W4,27 | W4,24 | W4,14 | W4,12 | W4,10 | W4,8 | W4,6 | W4,4 | W4,2 | W4,25 | W4,15 | W4,13 | W4,11 | W4,9 | W4,7 | W4,5 | W4,3 |
8 | W2,30 | W2,26 | W2,31 | W2,27 | W3,26 | W3,30 | W3,27 | W3,31 | W3,28 | W3,22 | W3,20 | W3,18 | W3,29 | W3,23 | W3,21 | W3,19 | W4,20 | W4,28 | W4,21 | W4,29 | W4,26 | W4,24 | W4,14 | W4,12 | W4,10 | W4,8 | W4,6 | W4,4 | W4,27 | W4,25 | W4,15 | W4,13 | W4,11 | W4,9 | W4,7 | W4,5 | ||||
9 | W3,30 | W3,28 | W3,22 | W3,20 | W3,31 | W3,29 | W3,23 | W3,21 | W4,22 | W4,30 | W4,23 | W4,31 | W4,28 | W4,26 | W4,24 | W4,14 | W4,12 | W4,10 | W4,8 | W4,6 | W4,29 | W4,27 | W4,25 | W4,15 | W4,13 | W4,11 | W4,9 | W4,7 | ||||||||||||
10 | W3,30 | W3,22 | W3,31 | W3,23 | W4,30 | W4,28 | W4,26 | W4,24 | W4,14 | W4,12 | W4,10 | W4,8 | W4,31 | W4,29 | W4,27 | W4,25 | W4,15 | W4,13 | W4,11 | W4,9 | ||||||||||||||||||||
11 | W4,30 | W4,28 | W4,26 | W4,14 | W4,12 | W4,10 | W4,31 | W4,29 | W4,27 | W4,15 | W4,13 | W4,11 | ||||||||||||||||||||||||||||
12 | W4,30 | W4,28 | W4,14 | W4,12 | W4,31 | W4,29 | W4,15 | W4,13 | ||||||||||||||||||||||||||||||||
13 | W4,30 | W4,14 | W4,31 | W4,15 |
Stage 3: {R0, R4, R8, R12} {R1, R5, R9, R13} {R2, R6, R10, R14} {R3, R7, R11, R15}
Stage 4: {S12, S0, S4, S8} {S13, S1, S5, S9} {S14, S2, S6, S10} {S15, S3, S7, S11}
Stage 5: {T4, T8, T12, T0} {T5, T9, T13, T1} {T6, T10, T14, T2} {T7, T11, T15, T3}
The corresponding cut-set is shown in
The linear lifetime chart is drawn only for W3j and W4j, since the W2j stage has no crossover with the inputs of other stages. This chart reduces the register count to 24, as shown in
This register count is then used to perform register allocation, as illustrated in
In the eight-parallel folded architecture, stages 1 and 2 have zero delay, so no registers are needed there. Stage 3 requires eight registers, as shown in the linear lifetime chart, and stage 4 requires sixteen registers to produce the encoded output.
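The register count obtained from a lifetime chart is the largest number of variables that are live in the same cycle. The sketch below (the function `min_registers` and the toy lifetimes are hypothetical, not the encoder's actual variables) shows that counting step under an assumed convention: a variable produced in cycle t_in and consumed in cycle t_out occupies a register in cycles t_in + 1 to t_out, so a zero-delay variable needs no register. It ignores the wrap-around of lifetimes into the next iteration that a periodic schedule would add.

```python
def min_registers(lifetimes):
    """Minimum register count for a set of variable lifetimes.

    `lifetimes` maps a variable name to (t_in, t_out): produced in cycle t_in,
    last consumed in cycle t_out. Assumed convention: the variable needs a
    register in cycles t_in + 1 .. t_out, so a zero-delay variable
    (t_in == t_out) needs none. The answer is the peak number of live variables.
    """
    if not lifetimes:
        return 0
    horizon = max(t_out for _, t_out in lifetimes.values())
    live = [0] * (horizon + 1)
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            live[t] += 1
    return max(live)


if __name__ == "__main__":
    # Hypothetical toy lifetimes; "c" has zero delay and needs no register.
    toy = {"a": (0, 2), "b": (1, 3), "c": (1, 1), "d": (2, 4)}
    print(min_registers(toy))
```

Register allocation then assigns each live variable to one of these registers cycle by cycle, which is what the register allocation tables record.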
The fully parallel, eight-parallel folded and four-parallel folded designs of the 32-bit polar encoder are simulated using Xilinx 14.6 ISE and implemented on a Virtex 5 FPGA, and the corresponding outputs are obtained.
Simulating the 32-bit fully parallel polar encoder in Xilinx 14.6 ISIM with the input 32'hFFFFFFFF gives the output 32'h80008000, as depicted in
The four-parallel folded architecture, simulated with the same input stream as the fully parallel design, produces the same result, as shown in
The eight-parallel folded architecture, simulated with the same input stream as the fully parallel design, also produces the same output, verifying the polar encoding functionality, as shown in
The partially parallel implementation in [
In addition, it requires N − P delay elements and achieves a throughput of P bits/cycle. The four- and eight-parallel folded designs of the 32-bit polar encoder match these criteria exactly, with 32 − 4 = 28 and 32 − 8 = 24 delay elements, respectively. The comparison of resource utilization for the fully parallel, four-parallel folded and eight-parallel folded 32-bit polar encoders is depicted in
This paper focuses on minimizing the hardware resources of the 32-bit polar encoder. Several optimization techniques are applied step by step to arrive at the proposed architecture for various folding levels. The simulation results show that the folded structures preserve the polar encoder functionality. The implementation in Virtex 5
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D(W1j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D(W2j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D(W3j) | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | −2 | −2 | −2 | −2 | 0 | 0 | 0 | 0 | −3 | −3 | −3 | −3 |
D(W4j) | 2 | 2 | 2 | 2 | −2 | −2 | −2 | −2 | −0 | −0 | −0 | −0 | −0 | −0 | −0 | −0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | −2 | −2 | −2 | −2 | 2 | 2 | 2 | 2 |
D'(W1j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D'(W2j) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
D'(W3j) | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
D'(W4j) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Cycle | Stage 3 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | Stage 4 | R9 | R10 | R11 | R12 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 | R21 | R22 | R23 | R24 | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | W3,0 | W3,2 | W3,4 | W3,6 | W3,1 | W3,3 | W3,5 | W3,7 | ||||||||||||||||||||||||||||||||||||
1 | W3,8 | W3,10 | W3,12 | W3,14 | W3,9 | W3,11 | W3,13 | W3,15 | W3,4 | W3,0 | W3,6 | W3,2 | W3,5 | W3,1 | W3,7 | W3,3 | W4,0 | W4,2 | W4,8 | W4,10 | W4,1 | W4,3 | W4,9 | W4,11 | ||||||||||||||||||||
2 | W3,16 | W3,18 | W3,20 | W3,22 | W3,17 | W3,19 | W3,21 | W3,23 | W3,12 | W3,4 | W3,14 | W3,6 | W3,13 | W3,5 | W3,15 | W3,7 | W4,4 | W4,6 | W4,12 | W4,14 | W4,5 | W4,7 | W4,13 | W4,15 | W4,8 | W4,0 | W4,10 | W4,2 | W4,9 | W4,1 | W4,11 | W4,3 | ||||||||||||
3 | W3,24 | W3,26 | W3,28 | W3,30 | W3,25 | W3,27 | W3,29 | W3,31 | W3,20 | W3,16 | W3,22 | W3,18 | W3,21 | W3,17 | W3,23 | W3,19 | W4,16 | W4,18 | W4,24 | W4,26 | W4,17 | W4,19 | W4,25 | W4,27 | W4,12 | W4,8 | W4,4 | W4,0 | W4,14 | W4,10 | W4,6 | W4,2 | W4,13 | W4,9 | W4,5 | W4,1 | W4,15 | W4,11 | W4,7 | W4,3 | ||||
4 | W3,28 | W3,20 | W3,30 | W3,22 | W3,29 | W3,21 | W3,31 | W3,23 | W4,20 | W4,22 | W4,28 | W4,30 | W4,21 | W4,23 | W4,29 | W4,31 | W4,24 | W4,12 | W4,8 | W4,4 | W4,26 | W4,14 | W4,10 | W4,6 | W4,25 | W4,13 | W4,9 | W4,5 | W4,27 | W4,15 | W4,11 | W4,7 | ||||||||||||
5 | W4,28 | W4,24 | W4,12 | W4,8 | W4,30 | W4,26 | W4,14 | W4,10 | W4,29 | W4,25 | W4,13 | W4,9 | W4,31 | W4,27 | W4,15 | W4,11 | ||||||||||||||||||||||||||||
6 | W4,28 | W4,12 | W4,30 | W4,14 | W4,29 | W4,13 | W4,31 | W4,15 | ||||||||||||||||||||||||||||||||||||
Design/features | Fully parallel | Eight parallel folding | Four parallel folding |
---|---|---|---|
No. of XOR gates | 80 | 20 | 10 |
No. of delay elements | 0 | 24 | 28 |
No. of registers | 0 | 96 | 224 |
Timing report (CPU to XST) | 4.09 s | 5.73 s | 8.77 s |
Memory usage | 164,328 KB | 168,424 KB | 169,448 KB |
FPGA shows that folding reduces the number of functional blocks (XOR gates), but requires a trade-off in the number of delay elements, registers and speed of execution.
G. Indumathi, V. P. M. B. Aarthi Alias Ananthakirupa and M. Ramesh (2016). Architectural Design of 32 Bit Polar Encoder. Circuits and Systems, 7, 551-561. doi: 10.4236/cs.2016.75047