# DESIGN OF LOW COMPLEXITY HIGH-SPEED PULSE-SHAPING IIR FILTERS FOR MOBILE COMMUNICATION RECEIVERS

A.P.Vinod and E.M-K.Lai

School of Computer Engineering, Nanyang Technological University Nanyang Avenue, Singapore 639798 Email: {asvinod, asmklai}@ntu.edu.sg

# ABSTRACT

Low-complexity and high-speed design of infinite impulse response (IIR) filters focuses on minimizing the number of logic operators and the logic depth in the implementation of the coefficient multipliers. Although the Bull-Horrocks Modified (BHM) and the *n*-dimensional Reduced Adder Graph (RAGn) algorithms result in considerable reduction of logic operators (LO), it uses the highest logic depth (LD) and hence the resulting IIR filters are not optimized in view of filtering speed. The Hartley algorithm proposed for finite impulse response (FIR) filters offers a reduction in LD, but it requires additional LO respect to BHM technique. A method to minimize the number of LO and the LD in the multipliers of pulse-shaping IIR filters by efficiently combining Hartley's horizontal common subexpressions and vertical common subexpressions that occur across the filter coefficients is proposed here. Design examples of pulse-shaping filters employed in a dual-mode GSM/W-CDMA receiver show that our method offers reduction of complexity as well as delay when compared with earlier methods

## **1. INTRODUCTION**

IIR filters offer a hardware efficient solution for a given filter characteristic than their FIR counterparts since they can be realized using lower order. Efficient linear phase IIR filter designs based on allpass subfilters have been presented for pulse shaping filters in digital transmission systems [1, 2]. It has been shown that matched transmit-receive filter pairs with very low intersymbol interference (ISI) can be designed using IIR filters [3]. IIR filters employed in mobile communication systems present a hardware design challenge due to the constraints of speed and power consumption.

The number of adders or subtractors (called LO) in the implementation of the coefficient multipliers determines the complexity of implementation of digital filters. The multiplier block (MB) approaches in [4]-[7] minimize LO by extracting the common parts (common subexpressions) among the coefficients, which are represented in canonic signed digit (CSD) form. The Bull-Horrocks algorithm (BHA) in [4] used a graph representation of MB for reducing the number of LO in IIR filters. Two methods that further reduce the number of LO have been presented in [5], called the Bull-Horrocks Modified (BHM) algorithm and the *n*-dimensional Reduced Adder Graph (RAG*n*)

algorithm. As the partial sums generated in multiplication are added in a serial manner, the algorithms in [5] have a limitation in reducing the LD. (The number of adder-steps (AS) in a maximal path of decomposed multiplications is called LD). Therefore, filters designed using the algorithm in [5] are not suitable for high-speed applications. The Hartley algorithm [6] proposed for designing FIR filters eliminates redundant computations in MB by employing the most common horizontal subexpressions, [1 0 1] and [1 0 -1], among the CSD coefficients. Though the Hartley algorithm reduces the LD, it requires more adders than the BHM method. The method in [5] is modified in [7], taking into account both the delay and the number of adders by employing the Step-Limiting BHM (SLBHM) and the Step-Limiting RAGn (SLRAGn) algorithms. However, these algorithms are based on exhaustive search methods, which are time consuming and more difficult to translate into hardware. In this paper, an efficient MB reduction algorithm that combines Hartley's horizontal common subexpressions (HCS) with vertical common subexpressions that exist across the coefficients to reduce the number of adders as well as the delay in pulse-shaping IIR filters is presented.

The paper is organized as follows. The common subexpression sharing method is discussed in section 2. The MB reduction algorithm is presented in section 3. In section 4, we illustrate the implementation of pulse-shaping IIR filters with design examples and provide comparison of the hardware reduction achieved using our method with earlier methods. Section 5 provides our conclusions.

# 2. COMMON SUBEXPRESSION ELIMINATION

The goal of Hartley's horizontal common subexpression elimination (HCSE) [6] is to identify multiple occurrences of identical bit patterns that are present *within* each coefficient and eliminate these redundant computations. Consider the CSD coefficient  $h_k$ , 0.101001010101. Using direct method (i.e., implementation without using HCSE), five adders (one less than the number of nonzero bits in  $h_k$ ) are required to compute the output  $y_k$ . The pattern [1 0 1] in the above-mentioned example is present thrice, which can be expressed as a HCS,  $x_2 = x_1 + x_1 >> 2$ , where  $x_1$  is the input signal and '>>' represents shift right operation. Using the HCS, the output of the filter can be expressed as

$$y_k = x_2 >> 1 + x_2 >> 6 + x_2 >> 10 \tag{1}$$

Hence the filter structure optimized using the HCSE requires two adders less than the structure obtained using direct method. Thus using HCSE, the number of adders required to implement the MB of the filter structure is minimized.

#### 2.1 Proposed Subexpression Sharing Method

We observe that many common subexpressions exist *across* the CSD representation of the coefficients of an IIR filter. Further reduction of adders can be achieved by efficiently combining these vertical common subexpressions (VCS) and Hartley's most commonly occurring HCS, [1 0 1] and [1 0 –1]. The transposed direct form-I IIR filter structure shown in Fig. 1 is considered to illustrate our method. Note that a single MB can be formed by exploiting the redundancy in multiplication of the signal  $w_1(n)$ , with the coefficients  $a_i$ 's and  $b_i$ 's.



Fig. 1. Transposed direct form-I IIR filter structure.

The test filter 4 (a 5-tap elliptic IIR filter) in [7] is used to illustrate our method. The cutoff frequency  $\omega_n$  is 0.1 and the wordlength is 10 bits. The nonzero bits of the coefficients,  $a_i$ 's and  $b_i$ 's, in 10-bit CSD form are shown in Fig. 2.

|                       | -1 | -2 | -3 | -4                | -5 | -6            | -7         | -8 | -9 | -10 |
|-----------------------|----|----|----|-------------------|----|---------------|------------|----|----|-----|
| $b_0$                 |    | 1  | 0  | 1                 |    | 1             |            | -1 | 0  | -1  |
| <i>b</i> <sub>1</sub> |    |    | -1 | $\Leftrightarrow$ | 1  |               | 1          |    |    | -1  |
| <i>b</i> <sub>2</sub> |    |    | 0  | 1                 | 0  | 1             | 0          | -1 | 0  | -1  |
| <i>b</i> <sub>3</sub> |    |    | -1 |                   | 1  |               | 1          |    |    | -1  |
| <i>b</i> <sub>4</sub> |    |    | -1 | 0                 | -1 | <             | <i>d</i> > |    | 0  | 1   |
| <i>a</i> <sub>1</sub> | _  |    | 1  |                   |    |               |            |    | 1  |     |
| <i>a</i> <sub>2</sub> | 1  | 0  | 1  | <                 |    | -1            | 0          | -1 |    | 1   |
| <i>a</i> <sub>3</sub> | 1  |    |    |                   | -1 | $\bigvee_{s}$ | 1          |    |    | 1   |
| <i>a</i> <sub>4</sub> | 0  |    | 1  | 0                 | 1  |               | 0          | [1 | 0  |     |
| <i>a</i> <sub>5</sub> | 1  |    |    |                   | -1 |               | 1          |    |    | 1   |

Fig. 2. Combined horizontal and vertical common subexpressions in 5-tap elliptic IIR filter coefficients.

Let  $N_{ba}$  and  $N_{bb}$  are the numbers of nonzero bits in  $a_i$ 's and  $b_i$ 's respectively. The number of adders,  $T_a$  and  $T_b$ , required to implement the multiplications,  $a_i.w_1$  and  $b_i.w_1$  respectively, without employing any MB reduction technique are

$$T_a = N_{ba} - 1 - (N - 1)$$
 and  $T_b = N_{bb} - 1 - (N - 1)$  (2)

In this case,  $N_{ba}$  and  $N_{bb}$  are 19 and 21, N is 5, and hence  $T_a$ and  $T_b$  are 14 and 16 respectively. Therefore, 30 adders (i.e.,  $T_a + T_b$ ) are required without MB reduction. If  $w_1$  is the signal (same as  $w_1(n)$  in Fig. 1), the HCS [1 0 1] indicated using bold rectangles in Fig. 2 is  $w_2 = w_1 + w_1 >> 2$ . The HCS are formed with a look-ahead approach such that the algorithm will extract the maximum number of subexpressions (HCS and VCS) from the CSD form. The VCS [1 0 1] indicated using dotted rectangles in Fig. 2 is  $w_3 = w_1 + w_1[-2]$ , where [-k] represents delay by k units. By combining HCS ( $w_2$ ) and VCS ( $w_3$ ), the filter output corresponding to the coefficients  $b_i$  's is

$$y_b = w_2 >> 2 + w_1 >> 6 - w_2 >> 8 - (w_3[-1] >> 3 - w_3[-1] >> 5) + w_3[-1] >> 7 - w_3[-1] >> 10 + w_2[-2] >> 4 - w_2[-2] >> 8 - (w_2[-4] >> 3 - w_2[-4] >> 8)$$
(3)

The output corresponding to the coefficients  $a_i$ 's is

$$y_a = w_1 >> 3 + w_1 >> 9 + (w_2[-1] >> 1 - (w_2[-1] >> 6) + w_1[-1] >> 10 + w_3[-2] >> 1 - (w_3[-2] >> 5 - w_3[-2] >> 7) + w_3[-2] >> 10 + w_2[-3] >> 3 + w_3[-3] >> 8$$
(4)

Thus, it requires only 22 adders to implement the filter, one each for HCS and VCS, and twenty for filter outputs (3) and (4). Further reduction of adders can be achieved by exploiting multiple occurrences of identical shifts between two adjacent VCS or HCS. From the coefficient set in Fig. 2, two such expressions occur:

(i) The VCS,  $[-1 \ 0 \ -1]$  and  $[1 \ 0 \ 1]$  with a shift between them (indicated by solid connecting lines, 's') that occur across the coefficients,  $[b_1, b_3]$  given by  $-(w_3[-1] >> 3 - w_3[-1] >> 5)$  in (3), and  $[a_3, a_5]$  given by  $-(w_3[-2] >> 5 - w_3[-2] >> 7)$  in (4). The latter expression can be obtained from the former using simple delay [-k] and right shift (>>) operations as

$$-(w_{3}[-1] >> 3 - w_{3}[-1] >> 5) \xrightarrow{[-1]}{\rightarrow} \xrightarrow{>>2} -(w_{3}[-2] >> 5 - w_{3}[-2] >> 7)$$
(5)

(ii) The HCS,  $[-1\ 0\ -1]$  and  $[1\ 0\ 1]$  with two shifts between them (indicated by dotted connecting lines, 'd') that occur within the coefficient  $b_4$  (i.e.,  $-(w_2[-4] >> 3 - w_2[-4] >> 8)$  in (3)) and its negated version in the coefficient  $a_2$  (i.e.,  $(w_2[-1] >> 1 - w_2[-1] >> 6)$  in (4). In this case, it is possible to obtain the former expression from the latter using delay, right shift and negation operations as

$$(w_{2}[-1] >> 1 - w_{2}[-1] >> 6) \xrightarrow{[-3]}{\rightarrow} \xrightarrow{>2} (-) - (w_{2}[-4] >> 3 - w_{2}[-4] >> 8)$$
(6)

Note that the cost of shifts is assumed to be negligible since they can be hardwired. Thus, using these common subexpressions (CS) with identical shifts between them, the number of adders is reduced by two. The high-speed tree-structured multiplier that we proposed in [8] is used to implement the MB. Since the tree structured multiplier performs parallel addition, our structure computes the sum of the partial products using least number of adder-stages. Therefore, our method offers improved speed.

Note that only fourteen MBA are required to realize the filter using our method as shown in Fig. 3. A comparison of the number of MBA required for the filter using various methods are shown in Table I. In this table, 'Direct' means the implementation without employing any MB reduction technique. The first column is the number of adder-steps (NAS) for the MB optimized by the algorithms listed in the first row. The BHM algorithm results in 18 adders and 6 adder-steps whereas using the RAGn algorithm, only 16 adders and 5 adder- steps are required. However, both these methods require a larger number of adder-steps, which is undesirable for high-speed applications. The modified BHM and RAGn algorithms (SLBHM and SLRAGn) offer an improvement in speed by reducing the number of adder-steps to 3 at the cost of a slight increase in the number of adders (20 and 19 respectively). Our method offers the best reduction (14 adders and 3 adder-steps) in terms of the number of adders as well as the number of adder-steps. When compared with the method in [7], the reduction of adders achieved in proposed method is 26% for the same number of adder-steps.

#### 3. MB REDUCTION ALGORITHM

Definition 1 (HCS index): The HCS index (HI) of a coefficient is the number of HCS in its CSD representation. Statistically, the two most common HCS in the CSD representation of digital filters were  $[1 \ 0 \ 1]$  and  $[1 \ 0 \ -1]$  [6], and these two HCS are considered to obtain the HI.

Definition 2 (VCS index): The maximum number of VCS that a coefficient can form with its neighborhood coefficients is called the VCS index (VI). The four most common VCS,  $[1 \ 1]$ ,  $[1 \ -1]$ ,  $[1 \ 0 \ 1]$ , and  $[1 \ 0 \ -1]$  are considered to obtain the VI.

The core of our method relies on extracting the maximum number of CS from a given coefficient set represented in CSD. The algorithm consists of two main tasks. Firstly, the HI's and the VI's of all the coefficients are computed and a *CS index matrix* (CSIM) is formed with the HI's and VI's as its elements. In the second step, the best CS are obtained from the CSIM based on their index values. The steps are summarized below.

Step 1: Design the IIR filter, h(n), according to the desired specification and obtain the CSD representation of the coefficients,  $a_i$ 's and  $b_i$ 's, for a specific wordlength.

Step 2: Determine the HI's and VI's of the coefficient set and form the CSIM (7). Note that  $HI_i$  is the HI of the 'i'th coefficient and  $VI_{i,j}$  is the VI of the VCS across the coefficients

h(i) and h(j) and L is the number of filter taps.

*Step 3:* The CSIM is scanned row wise to find the largest index. While selecting the largest index, a look-ahead is required for the

VI's as the selection at one level must consider how it influences the possible VCS selections at the next level.

This is done as follows. Set  $i_0 = 1$  initially.

(i) Compute the largest VCS index,  $VI_{\max_{(i_0-1),j_m}}$  for  $j_m = (i_0, i_{(0+1)})$  of the ' $i_0$ 'th row from  $C[h_{i,j}]$ , i.e., from  $(VI_{(i_0-1),i_0} VI_{(i_0-1),(i_0+1)})$ . If  $VI_{\max_{(i_0-1),j_m}}$  is less than  $HI_{(i_{0-1})}$ , choose  $HI_{(i_{0-1})}$  as the largest index of ' $i_0$ 'th row and set  $VI_{\max_{(i_0-1),j_m}}$  to zero.

(ii) If  $VI_{\max_{(i_0-1),j_m}}$  is greater than  $HI_{(i_{0-1})}$ , choose  $VI_{\max_{(i_0-1),j_m}}$ as the largest index of  $'i_0'th$  row if it is greater than  $VI_{\max_{i_0,(i_0+1)}}$ . Otherwise, choose  $HI_{(i_{0-1})}$  as the largest index of  $'i_0'th$  row. In this case, the original largest index  $VI_{\max_{(i_0-1),j_m}}$ 

in  $'i_0'th$  row is omitted because of the occurrence of an index greater than the former in subsequent row, which will form a better choice of CS for that row.

Step 4: If  $i_0 \le L$ , set  $i_0 = i_0 + 1$ ; go to step 3 and find all the best CS in the remaining rows of  $C[h_{i, i}]$ . Otherwise go to step 5.

*Step 5:* The CS are examined for multiple occurrences of identical shifts between them to achieve further reduction of adders using simple shift and delay operations.

### 4. EXPERIMENTAL RESULTS

The dual-mode GSM/W-CDMA receiver architecture in [9] is considered to illustrate our design. The IF block shown in Fig. 4 performs frequency conversion for W-CDMA and channel extraction from a wide-band received signal for GSM. The input bandwidth of an IF signal covers one channel of 5 MHz in W-CDMA. Our method is applied to realize the elliptic IIR filters employed as pulse-shaping filters,  $H_1(z)$ , in W-CDMA mode (to achieve an attenuation of -40 dB at 5 MHz) and  $H_2(z)$ , in GSM mode (shaping to attenuate the block signal at 200 kHz). For the GSM mode filter  $H_2(z)$ , the number of taps is 7,  $\omega_n$  is 0.25 whereas these parameters are 9 and 0.125 respectively for the W-CDMA mode filter  $H_1(z)$ . The coefficient wordlength considered is 16 bits. The results are shown in Tables II and III. Our method results in considerable reduction of adders and adder-steps over the BHA, BHM, and RAGn algorithms. Both our method and the SLBHM and SLRAGn methods in [7] result in an identical number of adder-steps. However, our method offers an average adder reduction of 25% over the method in [7].





Fig. 4. IF architecture for dual-mode GSM/W-CDMA receiver.

Fig. 3. Implementation of test filter 4 using the proposed MB reduction method.

TABLE I Number of adders for test filter 4 in [7]

| Number of adders for test filter 4 in [/]. |        |     |     |     |              |       |  |  |  |
|--------------------------------------------|--------|-----|-----|-----|--------------|-------|--|--|--|
| NAS                                        | Direct | BHM | RAG | SL  | SL           | Prop- |  |  |  |
|                                            |        |     | n   | BHM | RAG <i>n</i> | osed  |  |  |  |
| 3                                          | 30     |     |     | 20  | 19           | 14    |  |  |  |
| 5                                          |        |     | 16  |     |              |       |  |  |  |
| 6                                          |        | 18  |     |     |              |       |  |  |  |

TABLE III Number of adders for W-CDMA mode pulse shaping filter.

| NAS | Direct | BHM | RAG | SL  | SL   | Prop- |
|-----|--------|-----|-----|-----|------|-------|
|     |        |     | п   | BHM | RAGn | osed  |
| 3   | 42     |     |     | 27  | 25   | 19    |
| 4   |        |     | 23  |     |      |       |
| 7   |        | 23  |     |     |      |       |

## 5. CONCLUSIONS

We have proposed an efficient multiplier block reduction method to implement pulse-shaping IIR filters that take into account the delay and the number of adders. Experimental results show that our algorithm can reduce the number of adders while achieving the lower bound value for adder-steps when compared with previous methods. When compared with the exhaustive search method in [7], our algorithm is fast and leads to a straightforward layout, thus simplifying hardware synthesis. This implies that the IIR filters designed using our algorithm are superior in terms of area and speed over the methods proposed earlier.

#### 6. **REFERENCES**

[1] M. Renfors and T. Saramaki, "A class of approximately linear phase digital filters composed of allpass subfilters", in *Proc. IEEE Int. Symp. Circuits Syst.*, pp. 678-681, May 1986.

TABLE II Number of adders for GSM mode pulse shaping filter

|     |        |     |     | Frank P |      |       |  |  |
|-----|--------|-----|-----|---------|------|-------|--|--|
| NAS | Direct | BHM | RAG | SL      | SL   | Prop- |  |  |
|     |        |     | п   | BHM     | RAGn | osed  |  |  |
| 3   | 27     |     | 20  | 23      | 20   | 15    |  |  |
| 4   |        | 20  |     |         |      |       |  |  |

[2] M. Renfors and T. Saramaki, "Pulse-shaping filters for digital transmission systems", in *Proc. IEEE Globecom*, vol. 1, pp. 467-471, Dec. 1992.

[3] M. Renfors and K. Vaisanen, "Efficient IIR filters for pulseshaping and jitter free frequency error detection and timing recovery," in *Proc. IEEE Globecom*, vol. 2, pp. 1440-1444, Dec. 1995.

[4] D. R. Bull and D. H. Horrocks, "Realization techniques for primitive operator infinite impulse response digital filters," in *Proc. Int. Symp. Circuits Syst.*, vol. 1, pp. 607-610, 1993.

[5] A. G. Dempster and M. D. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," *IEEE Trans. Circuits Syst. II*, vol. 2, no. 9, pp. 569-577, Sept. 1995.

[6] R. I. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," *IEEE Trans. Circuits Syst. II*, vol. 43, pp. 677-688, Oct. 1996.

[7] Hyeong-Ju Kang and In-Cheol Park, "Multiplier-less IIR filter synthesis algorithms to tradeoff the delay and the number of adders," in *Proc. Int. Symp. Circuits Syst.*, vol. 2, pp. 693-696, 2001.

[8] A. P. Vinod, E. M-K. Lai, A. B. Premkumar and C. T. Lau, "Low-complexity filter bank channelizer for wideband receivers using minimum adder multiplier blocks," *Proceedings of the IEEE International Conference on Communications*, pp. 2699-2672, Paris, France, June 2004.

[9] M. Jian, W. H. Yung, and B. Songrong, "An efficient IR architecture for dual-mode GSM/W-CDMA receiver of a software radio," in *Proc. IEEE International Workshop on Mobile Multimedia Communications*, 99.21-24, Nov. 1999.