Efficient Soft Cancelation Decoder Architecturesfor Polar Codes
The flooding belief propagation (FO-BP) and the soft-cancelation (SCAN) algorithms are the two most popular soft-output BP algorithms for the decoding of capacity-achieving polar codes. The FO-BP algorithm has high throughput at the cost of performance degradation in high signal-to-noise ratio (SNR) region or with large block length. The SCAN algorithm has much better decoding performance while suffering from long decoding latency and low throughput. In this paper, an improved BP algorithm, named reduced complexity softcancelation (RCSC) algorithm, is proposed. Compared with the SCAN algorithm, the number of memory entries required by the RCSC algorithm is reduced by more than 50% in general, while achieving comparable or even better (e.g., when block size N=215) decoding performance. When block size is large(e.g., N ≥215), the proposed RCSC algorithm reduces the required memory entries by more than 23% compared with the state-of-the-art FO-BP algorithm. The numerical results show that the error performance improvement of the RCSC algorithm is more significant when the SNR increases. For a different tradeoff, a reduced latency soft-cancelation (RLSC) algorithm is proposed to reduce the decoding latency and increase the throughput of the RCSC algorithm while slightly sacrificing decoding performance. Finally, the optimized VLSI architectures are presented for the RCSC and RLSC algorithms, respectively. The synthesis results demonstrate the efficiency of the proposed algorithms and architectures. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.
The FO-BP algorithm normally has a very high level ofparallelism. Thus, it has high computational complexity, andrequires 2N (log2N+1) memory entries. The ExpressJourney Belief Propagation algorithm simplifies themessage passing when certain nodes in the factor graph areencountered. Hence, the numbers of required comparisonsand additions within iteration are reduced. In, a lowcomplexity BP algorithm for polar codes was presented toreduce computational complexity as well as latency basedon the concept of a frozen connected sub-factor graph. Thedecoding algorithms and take the flooding schedule.The soft-cancelation (SCAN) decoder is another kindof BP decoder, based on a serial message update schedule,which mimics the SC decoding process. Compared with theFO-BP decoders, the SCAN decoder has muchlower computational complexity and requires less memory tostore the soft messages. Due to more efficient dissemination ofinformation, the SCAN algorithm converges much faster thanthe FO-BP algorithms.
While the BP algorithms for polar codes have been wellstudied, the corresponding architectures of the BP decodershave not been sufficiently investigated in the literature; particularly, the hardware architectures of the SCAN algorithmshave not been studied yet to the best of our knowledge.Due to the flooding schedule, the BP decoder architectures achievemulti-gigabit per second throughput. Undera 65-nm CMOS technology, for a (1024, 512) polar code, the BP polar decoder chip, consuming 1.48 mm2siliconarea, achieves a coded throughput of 4.68 Gb/s when thesignal-to-noise ratio (SNR) equals 4 dB. The (1024, 512) polar BP decoder in with embedded early stoppingfunction achieves a net information throughput of 4.5 Gb/susing 1.96 million gates under a 45-nm CMOS technology.The (1024, 512) polar BP decoder in achieves a throughput of 1.6 Gb/s with 0.747 mm2 silicon area under a 45-nmCMOS technology.
- Area coverage and power consumption is high
The following are the main contributions of thispaper.
1) A simplified left message update (SLMU) method isfirst proposed to simplify the calculation of certainlogarithmic likelihood ratio (LLRs). Compared with theSCAN and FO-BP algorithms, the proposed methodleads to a reduction in (N/2)(log2N−1) additionsduring each iteration.
2) Based on the SLMU method, the RCSC algorithm is proposed based on the binary tree representation of a polarcode. The RCSC algorithm needs only 5N−3memoryentries to store LLRs, as opposed toN((log2N/2)−1)and 4N−2+(Nlog2N/2) LLRs needed by theFO-BP  and SCAN decoders, respectively. Whenthe block length N=215, the proposed RCSC algorithm reduces the number of memory entries by around23% and 57%, respectively. In addition, more significant savings of the memory entries will be achievedwhenNfurther increases. In terms of the decodingperformance, for a (1024, 512) polar code, the RCSCalgorithm outperforms the FO-BP algorithm by around0.7 dB when the FER is around 10−5, while showingnegligible decoding performance degradation comparedwith the SCAN algorithm. For a (32 768, 29 504) polarcode, the RCSC algorithm outperforms the FO-BP andSCAN algorithms by around 0.2 and 0.05 dB when theFER is around 10−3. The performance gap is expectedto be much larger when the target FER is below 10−10,which is typical for networking applications.
3) It is proved that the non-recursive expressions of thereturned LLR vector is a result of the RCSC messageupdate schedule when a single parity check (SPC)or repetition (REP) node is activated. Hence,the adoption of the non-recursive expressions in theproposed RCSC algorithm does not lead to any decodingperformance degradation.
4) To further reduce latency and improve throughput of thecorresponding hardware decoders, the RLSC algorithmis proposed at the cost of error performance degradation. To be specific, a modified node activation (MNA)schedule is proposed to accelerate the calculation ofreturned LLR vectors when an arbitrary rate node (notan SPC or REP node) is activated. Based on the MNAschedule, two non-recursive algorithms are proposed tocompute the returned LLR vectors from arbitrary ratenodes with 8 and 16 leaf nodes, respectively.
5) Optimized decoder architectures for the RCSC andRLSC algorithms are proposed, respectively. Bothdecoders are synthesized for two polar codes withN =210and 215under the TSMC 90-nm CMOStechnology.WhenN=210, compared with theFO-BP decoder in , though our RCSC decoderhas 18% smaller area efficiency (AE), it achieves muchbetter decoding performance when the FER is lower.On the other hand, our RLSC decoder achieves 52%and 124% better AE compared with the decoders in and , respectively. Besides, the RLSC decoder witha maximum of two iterations still outperforms theFO-BP decoders when the FER is below 10−5.Theimplementation results for N=215demonstrate thatboth the proposed decoder architectures are suitable forpolar codes with large block length.
Proposed RCSC Decoding Algorithm:
Based on the SLMU method, the RCSC algorithm is proposed based on a binary tree representation of a polar code.WhenN=23, the corresponding binary tree representationis shown in Fig. 3, where black and white leaf nodes areassociated with information and frozen bits, respectively. During each iteration, the updating of both the left and the rightLLRs is accomplished by recursively activating all the nodesin the tree.
Proposed RLSC Decoding Algorithm:
Based on the RCSC algorithm, an RLSC algorithm is proposed to reduce the decoding latency of our RCSC algorithmat the cost of error performance degradation. Compared withthe RCSC algorithm, the proposed RLSC algorithm acceleratesthe computing of the returned LLR vector when an arbitraryrate node is activated.
1) Simplified LLR Computation for Arbitrary Rate Nodes:In this paper, for a node v in the tree representation, let pvdenote a frozen bit pattern vector, which indicates the locationsof leaf nodes associated with frozen bits among all the leafnodes, where pv consists of Nv letters. Each letter of pvcould beForD, indicating that the corresponding leaf node isassociated with a frozen or information bit, respectively.
2) Proposed Tree Pruning Method:Instead of working ona complete tree representation, the proposed RLSC algorithmis performed on a pruned tree. The pruning process is accomplished by the following three serial steps.
- Remove all the child nodes of each rate-0 node and rate-1 node.
- Remove all the child nodes of each SPC or REP nodewithNv ≤NT,whereNT is a predefined parameter.For an SPC or REP nodev, NT denotes the numberof leaf nodes belonging to nodev. The larger the NTvalue is, the more complex the corresponding hardwareimplementation is.
- Remove all the child nodes of each arbitrary rate nodewithNv =16.
- Area coverage and power consumption is low
- Xilinx ISE