High-Speed Parallel LFSR Architectures Based on Improved State-Space Transformations
Linear feedback shift registers (LFSRs) have been widely applied in BCH and CRC encoding. To increase system throughput, parallelization of the LFSR is usually needed. Previously, a technique named state-space transformation was presented to reduce the complexity of parallel LFSR architectures, in which exhaustive searches are performed to find good transformation matrix candidates. This brief proposes a new technique for constructing the transformation matrix together with a more efficient searching algorithm. The implementation results indicate that the proposed architecture outperforms the prior arts, improving hardware efficiency by around 35%, and that the corresponding searching algorithm finds the desirable transformation matrix much faster. The proposed architecture is analyzed for logic size, area, and power consumption using Xilinx 14.2.
Linear feedback shift registers (LFSRs) are extensively applied in BCH and CRC encoders to compute the remainder polynomial. A conventional encoding scheme for a BCH(n, k) code is shown in Fig. 1. Although such a serial LFSR architecture can operate at very high frequency, it suffers from the inherent serial-in and serial-out limitation. When the throughput of this serial architecture cannot catch up with the system data rate, parallel processing must be considered to meet the general requirements of high-speed communications and to realize throughput rates beyond gigabits per second, such as in optical transmission. Several parallel LFSR architectures have already been presented in the literature for BCH and CRC encoders. A parallel CRC implementation has been designed through mathematical deduction, where tree-structured computation and subexpression sharing are adopted to optimize the cascaded logic parts. Parhi, and Zhang and Parhi, proposed improved high-speed BCH encoders designed to eliminate the fanout bottleneck. This parallel LFSR architecture is efficient in terms of speeding up the computation, but its hardware cost is high. A kind of state-space transformation was developed to reduce the complexity of conventional parallel CRC circuits: by adopting a linear matrix transformation, a full speedup factor can be achieved at the cost of additional circuitry outside the feedback loop. Ayinala and Parhi and Jung et al. proposed another kind of parallel LFSR based on an IIR filter topology; its advantage is that pipelining can be applied to improve the hardware efficiency of LFSR encoders.
Among these parallel architectures, Ayinala and Parhi and Jung et al. achieve comparatively efficient hardware implementations with the loop update equations. In state-space transformation, the encoder is composed of several matrix multiplications, and most of the effort is devoted to searching for transformation matrices that may yield minimal-area circuits. In this brief, a new transformation matrix construction and an approximate searching method are proposed. Using this approximate algorithm, the desirable transformation matrix can be found in a much shorter time than with exhaustive searches. A low-complexity and high-speed parallel LFSR architecture can be derived by applying the state-space transformation with the desirable solution. As a result, the hardware complexity can be effectively reduced without sacrificing performance or speed.
Figure 1: Conventional serial LFSR architecture.
A conventional serial LFSR architecture for a BCH(n, k) code is illustrated in Fig. 1. The symbol ⊕ indicates an XOR operation, and the symbol ⊗ indicates a finite-field multiplication. In binary codes, since each coefficient gi (0 ≤ i ≤ n−k) of the generator polynomial equals either 0 or 1, the corresponding multiplication can be simplified to a connection or disconnection. Registers denoted as r0, r1, …, r(n−k−1) hold the remainder polynomial. Two kinds of typical parallel LFSR architectures are introduced next.
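The serial behavior described above can be sketched in software. The following Python function is a minimal model, not the paper's circuit: it shifts one message bit per "clock cycle" through registers r0 … r(n−k−1), with each tap gi acting as a connection (1) or disconnection (0); the generator polynomial in the test is a small hypothetical example, g(x) = x^3 + x + 1.

```python
def serial_lfsr_remainder(message_bits, g):
    """Model of the serial LFSR in Fig. 1: computes the remainder of
    message(x) * x^(n-k) divided by g(x) over GF(2), one bit per cycle.

    g: coefficient list [g0, g1, ..., g_{n-k}] with g_{n-k} == 1.
    Returns the final register contents [r0, ..., r_{n-k-1}].
    """
    deg = len(g) - 1                  # n - k
    r = [0] * deg                     # registers r0 .. r_{n-k-1}
    for bit in message_bits:          # MSB first
        feedback = bit ^ r[-1]        # input XORed with the highest register
        # Shift left through the register chain; each tap gi is either
        # a connection (XOR with feedback) or a disconnection.
        r = [feedback & g[0]] + [r[i - 1] ^ (feedback & g[i])
                                 for i in range(1, deg)]
    return r

# With g(x) = x^3 + x + 1 and the message equal to g itself,
# the remainder is zero, as expected for a codeword:
print(serial_lfsr_remainder([1, 0, 1, 1], [1, 1, 0, 1]))  # [0, 0, 0]
```

Feeding a single 1 gives the remainder of x^3 mod g(x) = x + 1, i.e., registers [1, 1, 0].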
The transformation matrix T is chosen such that the matrix ApT is a companion matrix, which simplifies the feedback loop of the parallel architecture. Unfortunately, when Ap is transformed into the companion matrix ApT, the matrices T and BpT may become complicated and dense with many 1s. Even though the feedback loop is fast and of low complexity as expected, the other parts of the encoder may have a longer critical path with high complexity. Even after applying pipelining and retiming techniques to reduce the critical path, the data-path timing is still not satisfactory and brings extra hardware cost as well.
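To make the target form of ApT concrete, the sketch below builds a companion matrix over GF(2) from a polynomial's coefficients. This is a generic illustration, not the brief's construction; the exact convention (row vs. column companion form) varies between papers.

```python
import numpy as np

def companion_matrix(g):
    """Companion matrix of g(x) = g0 + g1*x + ... + x^m over GF(2).

    A companion matrix is sparse: a subdiagonal of 1s plus one column
    of polynomial coefficients, which is what makes the feedback loop
    of the transformed architecture simple.
    """
    m = len(g) - 1
    C = np.zeros((m, m), dtype=np.uint8)
    C[1:, :-1] = np.eye(m - 1, dtype=np.uint8)   # subdiagonal of 1s
    C[:, -1] = g[:m]                             # last column: g0 .. g_{m-1}
    return C

# Hypothetical small example, g(x) = x^3 + x + 1:
print(companion_matrix([1, 1, 0, 1]))
```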
On the other hand, the (n−k)×(n−k) matrix T has 2^((n−k)×(n−k)) possibilities, whereas the vector b1 has only 2^(n−k) possibilities. This difference indicates that only a minority of the possibilities for T are considered when b1 is used to generate T. The transformation matrix derived from the optimal b1 may therefore not be the optimal one among all possibilities for T. Since other types of matrices may transform the circuits into more efficient designs, effort can be devoted to improving the method for constructing T and further optimizing the hardware implementations with state-space transformations. In order to find a better transformation matrix, some constraints need to be defined here.
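The gap between the two search spaces grows very quickly. As a quick sanity check under a hypothetical small case with n−k = 4:

```python
# Hypothetical small case, n - k = 4, used only to make the gap concrete.
m = 4                        # n - k
all_T = 2 ** (m * m)         # all binary (n-k) x (n-k) matrices T: 2^((n-k)^2)
from_b1 = 2 ** m             # matrices reachable from one vector b1: 2^(n-k)
print(all_T, from_b1)        # 65536 16
```

Even at this toy size, generating T from a single vector covers only 16 of 65536 candidate matrices, which is why richer constructions of T are worth exploring.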
Basic Searching Algorithm:
Six CRC or BCH codes are referenced here as examples; their generator polynomials are listed in Table I. In order to find the desirable transformation matrix T, an exhaustive search is performed in the vector space of b2. The parallel level p, or the input size, is chosen as the degree of the generator polynomial for a clear comparison with previous architectures; the proposed method can be applied at any parallel level.
Approximate Searching Algorithm:
For each b2, its corresponding T, ApT, and BpT can be uniquely determined and the TN can be counted. Using n−k = 4 as an example, all the possible (n−k)-dimensional b2 vectors can be divided into n−k layers in a tree structure, as shown in Fig. 2. The root node (the only node in layer 1) is set as [100…0], and each node in layer i is obtained by adding one "1" to a node of layer i−1. We denote such a pair of nodes in adjacent layers as the "father node" and "child node," and use a line to connect them. Constructed in this way, layer i actually contains all the b2 vectors with i 1s.
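The layered enumeration can be sketched as follows. This is a simplification of the tree in Fig. 2: rather than tracking father/child edges, it simply groups the m-bit b2 vectors by Hamming weight, so layer i holds all vectors with exactly i ones.

```python
from itertools import combinations

def b2_layers(m):
    """Group all nonzero m-bit b2 vectors into m layers,
    where layer i (1-indexed) contains the vectors with i ones,
    mirroring the layered tree structure of Fig. 2."""
    layers = []
    for i in range(1, m + 1):
        layer = [[1 if j in ones else 0 for j in range(m)]
                 for ones in combinations(range(m), i)]
        layers.append(layer)
    return layers

# n - k = 4: layer sizes follow binomial coefficients C(4, i).
print([len(layer) for layer in b2_layers(4)])  # [4, 6, 4, 1]
```

Searching layer by layer, starting from the sparse vectors near the root, is what lets the approximate algorithm stop early instead of visiting all 2^(n−k) − 1 nonzero candidates.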
Figure 2: All the possible values of a 4-D vector b2