# Transactions Briefs

# A Comparative Analysis of Low-Power Low-Voltage Dual-Edge-Triggered Flip-Flops

Wai Chung, Timothy Lo, and Manoj Sachdev

*Abstract*—This paper compares four previously published static dual-edge-triggered flip-flops (DETFFs) with a proposed design for their performance, power dissipation, and low-voltage low-power applications. For each DETFF, the optimal delay, power consumption, and power-delay product are determined as the primary figures of merit. The proposed design is shown to have the least energy at low voltages.

Index Terms—Digital CMOS, flip-flop, low power, low voltage, VLSI.

## I. INTRODUCTION

CMOS has been the dominant technology for VLSI implementations. As VLSI circuits continue to grow and technologies evolve, the level of integration is increased and higher clock speeds are achieved. Higher clock speeds, increased levels of integration, and technology scaling are causing unabated increases in power consumption. As a result, low power consumption is becoming a critical issue for modern VLSI circuits. Furthermore, power dissipation, both dynamic and static, has become a limiting factor for transistor performance, long-term device reliability, and increasing integration [1]. Moreover, as we aggressively scale devices toward deep-submicron technologies, scaling paths for high performance and low power applications diverge [2]. For battery operated systems, low power dissipation requirements are well understood and followed. For high performance ICs, in contrast, reducing the delay has been the main objective, and power containment has been secondary. However, recent research shows that power containment for high performance applications is becoming critical for reliability, transistor performance, and cooling considerations [1].

One of the significant components of the dynamic power consumption is the clock related power. The total clock related power dissipation in synchronous VLSI circuits is further divided into three major components [3]: i) power dissipation in the clock network; ii) power dissipation in the clock buffers; and iii) power dissipation in the flip-flops. The total power dissipation of the clock network depends on both the clock frequency and the data rate, and can be computed as follows:

$$P_{\rm clk} = V_{dd}^2 [f_{\rm clk} (C_{\rm clk} + C_{ff,\rm clk}) + f_{\rm data} C_{ff,\rm data}] \tag{1}$$

where

| Symbol | Definition |
|--------|------------|
| $f_{\rm clk}$ | clock frequency; |
| $f_{\rm data}$ | average data rate; |
| $C_{\rm clk}$ | total capacitance seen by the clock network; |
| $C_{ff,\rm clk}$ | capacitance of the clock path seen by the flip-flop; |
| $C_{ff,\rm data}$ | capacitance of the data path seen by the flip-flop. |

From (1), it is obvious that the clock power can be reduced if any of the parameters on the right-hand side is reduced. The reduction of $V_{dd}$ is already the trend of contemporary design, and it has the strongest impact on the $P_{\rm clk}$ expression. By reducing the overall capacitance of the clock network $C_{\rm clk}$, the power dissipation may be reduced. For instance, the capacitance can be reduced by proper design of clock drivers and buffers. Similarly, by reducing the capacitance inside a flip-flop, $C_{ff,\rm clk}$ and $C_{ff,\rm data}$, power may also be reduced. Furthermore, the clock power dissipation is linearly dependent on the clock frequency. Although the clock frequency is determined by the system specifications, the usage of dual-edge-triggered flip-flops (DETFFs) can reduce the clock frequency to half of its original value for the same data throughput. As a result, power consumption is reduced, making DETFFs desirable for low power applications. Even for high-performance applications, the usage of DETFFs offers certain benefits. Since the clock speed is reduced by a factor of two, one does not need to propagate a relatively high speed clock signal.

Manuscript received September 19, 2001; revised April 26, 2002. This work was supported by an NSERC Strategic Grant 224135-99. The authors are with the Digital Design and Test Group, Electrical and Computer Engineering Department, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada (e-mail: wmchung@vlsi.uwaterloo.ca; tlo@vlsi.uwaterloo.ca; msachdev@ece.uwaterloo.ca).

Digital Object Identifier 10.1109/TVLSI.2002.808429
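As an illustration of (1), the following minimal sketch evaluates the clock power for a single-edge-triggered design and for a dual-edge-triggered design running at half the clock frequency with the same data rate. All numerical values are hypothetical and chosen only for illustration. The sketch also assumes identical capacitances in both cases, which is optimistic, since a DETFF typically presents larger capacitances than a conventional flip-flop.

```python
def clock_power(vdd, f_clk, f_data, c_clk, c_ff_clk, c_ff_data):
    """Evaluate Eq. (1): Vdd^2 * [f_clk*(C_clk + C_ff,clk) + f_data*C_ff,data]."""
    return vdd ** 2 * (f_clk * (c_clk + c_ff_clk) + f_data * c_ff_data)


# Hypothetical values for illustration only (not taken from the paper).
VDD = 1.8            # supply voltage in volts
C_CLK = 1.0e-12      # clock network capacitance in farads
C_FF_CLK = 20e-15    # flip-flop clock-path capacitance in farads
C_FF_DATA = 10e-15   # flip-flop data-path capacitance in farads
F_DATA = 250e6       # average data rate in hertz

# Single-edge-triggered design clocked at 1 GHz.
p_set = clock_power(VDD, 1.0e9, F_DATA, C_CLK, C_FF_CLK, C_FF_DATA)
# Dual-edge-triggered design: same throughput at half the clock frequency.
p_det = clock_power(VDD, 0.5e9, F_DATA, C_CLK, C_FF_CLK, C_FF_DATA)

print(f"P_clk (SET, 1 GHz)   = {p_set * 1e3:.3f} mW")
print(f"P_clk (DET, 500 MHz) = {p_det * 1e3:.3f} mW")
```

Only the term proportional to $f_{\rm clk}$ is halved; the data-dependent term is unchanged because the data throughput is the same.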

Although many DETFFs have been proposed, their use is still uncommon. There are several reasons why DETFFs are not popular in VLSI circuits. In DETFFs, latches are connected in parallel, which increases the input capacitance. Therefore, the setup and hold times of DETFFs are typically larger than those of conventional flip-flops [4]. Thus, DETFFs become less attractive for high-performance applications. DETFFs also pay a penalty in the design area [4], [5]. The larger number of transistors and increased interconnects make the footprint of a DETFF much larger than that of a conventional flip-flop. This increases the parasitic capacitances, which decreases the performance of the DETFF. In addition, a DETFF captures data on both clock edges; therefore, a duty cycle of 50% is required. Deviation from a 50% duty cycle may lead to timing failures in the critical paths. As such, the specification on jitter tolerance is more stringent, which increases the design complexity of the system phase-locked loop.

To date, a systematic comparison of DETFFs, targeting both performance and power dissipation, has not been reported. This article is focused on the applicability of DETFFs in low-power and low-voltage applications. Section II states the analysis methodology used in this paper. Section III describes all the DETFFs investigated in this paper, including a newly proposed DETFF. Section IV outlines the simulation testbench and parameters. In addition, the DETFF optimization procedure is also explained in this section. Simulation results are reported in Section V. Finally, the discussion and conclusions are drawn in Section VI.

## II. ANALYSIS

Several metrics are available for comparative analysis of digital circuits. For example, power consumption, delay and latency, power delay product (PDP), energy delay product (EDP), and energy delay squared product  $(ED^2P)$  have been reported by several researchers [6], [7]. In general, a PDP-based metric is appropriate for low power portable systems in which the battery life is the primary index of energy efficiency. This is in contrast with EDP or  $ED^2P$ , where delay is weighted more heavily for high performance systems [6]. In this paper, we are primarily interested in DETFF usage for low-power low-voltage applications. Therefore, we selected PDP as the figure of merit. In particular, our analysis is similar to the comparative technique described by Stojanovic *et al.* [8]. Their study establishes a set of guidelines for objective comparisons of single-edge-triggered (SET) latches and flip-flops. The details of power and delay parameters employed in this study are defined in Section II-A and B.
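For reference, the metrics named above can be written in their commonly used forms. The paper does not spell these out, so the expressions below are the standard definitions rather than the authors' own, and the symbols $P$ (average power) and $t_d$ (delay) are introduced here only for illustration:

$$\mathrm{PDP} = P\,t_d, \qquad \mathrm{EDP} = \mathrm{PDP}\cdot t_d = P\,t_d^{2}, \qquad ED^2P = \mathrm{PDP}\cdot t_d^{2} = P\,t_d^{3}$$

Each additional factor of $t_d$ weights delay more heavily, which is why EDP and $ED^2P$ are associated with high performance systems while PDP is associated with energy-constrained ones.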

### A. Power

There are three main components of power dissipation of a flip-flop.

- a) Internal power dissipation of the flip-flop represents the power consumed inside the flip-flop including the power dissipated driving the output load.
- b) Local clock power dissipation represents the portion of the power dissipated in the clock buffer that is driving the clock input of the flip-flop.
- c) Local data power dissipation represents the portion of the power dissipated in the logic gate that is driving the data input of the flip-flop.

The sum of these three components is referred to as the total power $(P_{\text{TOT}})$. All three components of power require independent estimation in any comparative analysis because, inherently, a tradeoff exists between the three. If a comparison is made without taking all three components into account, the results may be misleading.

### B. Delay

There are two delay parameters of interest in this study. The first delay is the time measured between the clock edge and the output edge, or $t_{CQ}$. The second delay is the time measured between the input data edge and the output edge, or $t_{DQ}$. The latter parameter is often referred to as the latency of a flip-flop. For a DETFF, latency is computed indirectly as the maximum $t_{DQ}$ over rising and falling data transitions for both rising and falling clock edges. Thus, the delay is taken as the maximum value from the measurements of all combinations of data and clock transitions, i.e., rising clock-rising data, rising clock-falling data, falling clock-rising data, and falling clock-falling data. Latency can also be computed as the sum of the setup time and the $t_{CQ}$. For this study, $t_{CQ}$ and $t_{DQ}$ are both used as delay parameters. Latency is significant because, in a synchronous system, the system's cycle time depends on the longest delay of the network [9]. However, $t_{CQ}$ is equally important for this comparison since the setup time is also often a function of the independent variable of the simulations. This is true in the optimization process, where changes in the transistor width affect the setup time, as well as in the case where the independent variable is the supply voltage.
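The worst-case selection over the four clock/data edge combinations can be sketched as follows. The function name, the dictionary layout, and the measurement values are hypothetical and serve only to illustrate the procedure.

```python
from itertools import product


def worst_case_delay(measurements):
    """Return the maximum delay over all four clock/data edge combinations.

    `measurements` maps (clock_edge, data_edge) to a delay, where each edge
    is "rise" or "fall". All four combinations must be present.
    """
    combos = list(product(("rise", "fall"), repeat=2))
    missing = [c for c in combos if c not in measurements]
    if missing:
        raise ValueError(f"missing edge combinations: {missing}")
    return max(measurements[c] for c in combos)


# Hypothetical t_DQ measurements in picoseconds (illustrative values only).
t_dq_ps = {
    ("rise", "rise"): 240.0,
    ("rise", "fall"): 262.0,
    ("fall", "rise"): 251.0,
    ("fall", "fall"): 258.0,
}

latency_ps = worst_case_delay(t_dq_ps)
print(f"latency (worst-case t_DQ) = {latency_ps} ps")
```

The same helper applies unchanged to $t_{CQ}$ measurements.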

For completeness, the setup and hold times, the maximum data rate, and total transistor width are included as additional flip-flop performance metrics. Total transistor width is used as a measure of the flip-flop area, since the physical layout is not available at this point. However, these parameters are not the focus of this paper.

## III. DETFF IMPLEMENTATIONS

We have analyzed four previously reported static DETFFs. The  $P_{\text{TOT}}$ , delay, PDP with respect to  $t_{CQ}$  ( $PDP_{CQ}$ ), and PDP with respect to  $t_{DQ}$  ( $PDP_{DQ}$ ) of these flip-flops are compared with a newly proposed DETFF.

### A. DETFF Implementations

The flip-flop $DET_{gago}$ proposed in [10] is illustrated in Fig. 1. Nodes N2, N3, N4, and N5 represent parallel connections between input buffers and latches. The appropriate phase of clock and its complement connects and disconnects the input buffers and storage elements from the power supply and ground. As a result, it has potential for low power applications. Although the complete isolation of the active and inactive parts of the circuit helps in power saving, it leads to a larger delay. Fig. 2 shows the circuit implementation of $DET_{llopis}$ proposed in [3], which is a modified version of the DETFF proposed earlier in [5]. Complementary logic gates are employed here to balance the output rise and fall times of the original DETFF.



Fig. 1. DET<sub>gago</sub> proposed in [10].



Fig. 2.  $DET_{llopis}$  proposed in [3].



Fig. 3. DET<sub>pedram</sub> proposed in [11].

Furthermore, it improves the PDP at the expense of increased total transistor width.

Pedram *et al.* proposed a DETFF that is shown in Fig. 3 [11]. In $DET_{pedram}$, the role of the clock enable signal and the input data signal is reversed in the feedback transmission gate loops. Another DETFF, DET<sub>strollo</sub>, illustrated in Fig. 4, is proposed by Strollo *et al.* in [12]. This DETFF is a single-latch DETFF. Its operation is based on pulse triggering that is created by its internal clock buffers. The pulsewidth is crucial in this design. Hence, the proper operation of this DETFF is highly dependent on the internal clock buffer sizing and the propagation delay of the internal clock buffers.

The proposed DETFF,  $DET_{proposed}$ , is illustrated in Fig. 5. It consists of two storage elements. A true and complement combination of input data and clock signals controls the latching of the data value in the storage elements. The main advantage of this configuration is the ability to avoid stacking PMOS transistors. As a consequence, low voltage and low power operation becomes feasible.



Fig. 4. DET<sub>strollo</sub> proposed in [12].



Fig. 5. The configuration of  $DET_{proposed}$ .

## IV. SIMULATION

A tradeoff between speed and power consumption is often possible, and it is normally determined by the application. Hence, a given flip-flop can either be optimized for high performance or low power. However, when both power dissipation and performance are critical, one desires to determine a design that operates at the optimum. At this point, the power-delay product is minimum, i.e., optimal energy utilization for a given clock frequency. However, since the optimal delay and power parameters cannot be obtained in a single step, the PDP optimization procedure is often iterative [8].

### A. Testbench

Table I depicts some of the simulation parameters. For this study, 0.18  $\mu$ m CMOS technology is used. Apart from the supply voltage analysis, all simulations are carried out at nominal conditions:  $V_{DD} = 1.8$  V and at room temperature (25°C). The clock frequency is kept at 500 MHz. This clock frequency for the DETFFs is equivalent to 1 GHz for a single edge triggered flip-flop.

The testbench for this study is illustrated in Fig. 6. Input buffers are used to provide realistic clock and data signals. A fanout of five inverters is used as the nominal load for each DETFF. This load is estimated to be approximately 32 fF. These inverters, in turn, drive a capacitive load $C_L$ of 25 fF each, to simulate the loading from the previous logic stages, as well as the following stages. All the measurements are taken over a 16-cycle data sequence of alternating 1's and 0's. As mentioned earlier, the total power dissipation is composed of three components. They are represented and calculated in the testbench as follows:

- a) Local data power represents the portion of power dissipated in the grey inverter driving the data input of the flip-flop.
- b) Local clock power represents the portion of power dissipated in the black inverter which drives the clock input of the flip-flop.
- c) Internal power consumption is the intrinsic power dissipated on switching the internal nodes of the flip-flop.

TABLE I CMOS Simulation Parameters

| Parameter          | Value                        |
|--------------------|------------------------------|
| Technology         | 0.18 μm CMOS                 |
| MOSFET model       | BSIM3 Level 49               |
| Nominal conditions | $V_{DD}$ = 1.8 V, T = 25°C   |

| Signal | Frequency | Risetime | Falltime | Duty Cycle | Sequence Length |
|--------|-----------|----------|----------|------------|-----------------|
| Clock  | 500 MHz   | 100 ps   | 100 ps   | 50%        | n/a             |
| Data   | n/a       | 100 ps   | 100 ps   | n/a        | 16 clock cycles |

Fig. 6. The simulation testbench for flip-flops.

In order to compute the local data power and the local clock power, the flip-flop under test is initially disconnected, and the power dissipated by the grey inverter and by the black inverter is recorded. The flip-flop is then connected to the testbench for performance analysis, and the power consumed by the grey and black inverters is recorded again. Hence, the local data power can be calculated as the difference of the two power dissipations of the grey inverter. Likewise, the local clock power is computed as the difference of the two power consumption values of the black inverter.
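The difference-based procedure above can be summarized in a short sketch. The function names and all power readings are hypothetical and are not simulation results from this study.

```python
def local_power_uW(p_driver_connected_uW, p_driver_alone_uW):
    """Power attributable to driving the flip-flop input: the driver's power
    with the flip-flop connected minus its power with it disconnected."""
    return p_driver_connected_uW - p_driver_alone_uW


def total_power_uW(p_internal_uW, p_local_clock_uW, p_local_data_uW):
    """P_TOT is the sum of the internal, local clock, and local data power."""
    return p_internal_uW + p_local_clock_uW + p_local_data_uW


# Hypothetical readings in microwatts (illustrative values only).
grey_alone, grey_connected = 20.0, 32.0     # data driver (grey inverter)
black_alone, black_connected = 25.0, 45.0   # clock driver (black inverter)
internal = 150.0                            # flip-flop internal power

p_data = local_power_uW(grey_connected, grey_alone)
p_clock = local_power_uW(black_connected, black_alone)
p_tot = total_power_uW(internal, p_clock, p_data)

print(f"local data power  = {p_data:.1f} uW")
print(f"local clock power = {p_clock:.1f} uW")
print(f"total power       = {p_tot:.1f} uW")
```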

### B. Optimization

Since the transistors' sizes are interrelated, the preliminary stage of the optimization is simplified as follows. For each circuit, the critical path is first identified. The width of the NMOS transistor,  $w_n$ , is then selected as the parameter of interest. The sizing of the PMOS transistors that are located on the critical path is kept at a certain ratio with respect to  $w_n$ . This ratio is determined by balancing the rising and falling edges of the output waveform of a test inverter. Note that this ratio changes with NMOS sizing. Moreover, transmission gates and transistors that are not located on the critical path are implemented with relatively small sizes.

Delay and power are measured as functions of  $w_n$ . The measured power is the sum of all three components discussed earlier, whereas the delay is expressed by  $t_{CQ}$ . Once the power and delay measurements are obtained, the PDP<sub>CQ</sub> is calculated as the product of the power and delay. Subsequently, PDP<sub>CQ</sub> is plotted as a function of  $t_{CQ}$ . The initial PDP<sub>CQ</sub> point is taken as a minimum point of the PDP<sub>CQ</sub> versus  $t_{CQ}$  curve. If the minimum point does not exist, the operating point with the minimum  $t_{CQ}$  for a given energy is selected as the initial PDP<sub>CQ</sub> point to begin the optimization process. Once the initial PDP<sub>CQ</sub> point is determined for each flip-flop, these flip-flops are further optimized using an iterative method, until the best PDP<sub>CQ</sub> and PDP<sub>DQ</sub> are found.
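The selection of the initial $PDP_{CQ}$ point can be sketched as follows. This is a simplified illustration, not the authors' tool flow: the sweep data are hypothetical, and when no interior minimum exists the sketch simply falls back to the point with the smallest $t_{CQ}$, which is one possible reading of the rule stated above.

```python
def initial_pdp_point(sweep):
    """Pick the initial optimization point from a width sweep.

    `sweep` is a list of (w_n_um, power_uW, t_cq_ps) tuples. The points are
    ordered by t_CQ, mimicking a PDP_CQ versus t_CQ plot. If the minimum PDP
    lies strictly inside the curve it is returned; otherwise the point with
    the smallest t_CQ is returned (an assumed fallback).
    """
    points = sorted(
        (
            # 1 uW * 1 ps = 1e-18 J = 1e-3 fJ
            {"w_n_um": w, "power_uW": p, "t_cq_ps": t, "pdp_fJ": p * t * 1e-3}
            for w, p, t in sweep
        ),
        key=lambda pt: pt["t_cq_ps"],
    )
    best = min(range(len(points)), key=lambda i: points[i]["pdp_fJ"])
    if 0 < best < len(points) - 1:
        return points[best]
    return points[0]


# Hypothetical sweep: (NMOS width in um, total power in uW, t_CQ in ps).
sweep = [
    (0.5, 120.0, 320.0),
    (1.0, 150.0, 230.0),
    (2.0, 200.0, 190.0),
    (4.0, 300.0, 175.0),
    (8.0, 480.0, 180.0),
]

start = initial_pdp_point(sweep)
print(f"initial point: w_n = {start['w_n_um']} um, PDP_CQ = {start['pdp_fJ']:.1f} fJ")
```

The subsequent iterative refinement of the remaining transistor sizes is not modeled here.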

### C. Data Activity $\alpha$

Once the DETFFs are optimized, they are simulated at different data activity rates: 0 (all zeros and all ones), 0.5, and 1. This is to determine the efficiency and performance of each DETFF for a wide range of data activities. As mentioned earlier, the total power consumption of a DETFF consists of three separate components. Owing to the diverse design styles, these components can vary from flip-flop to flip-flop.



Fig. 7.  $PDP_{CQ}$  versus  $t_{CQ}$ , used to determine the initial optimization point.

As a result, the total power consumption of a flip-flop may change depending on the data activity. Therefore, it is desirable to simulate various DETFFs with different data activities. Results can then determine which DETFF is appropriate for an application weighted toward a particular data activity.
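To make the three activity rates concrete, the sketch below generates one data pattern for each rate and measures its activity as the fraction of consecutive samples that differ, treating the sequence as periodic. The paper does not specify the exact pattern used for $\alpha = 0.5$; toggling every second sample is an assumption made here, and the function names are hypothetical.

```python
def data_activity(bits):
    """Fraction of sample-to-sample transitions, treating the sequence as periodic."""
    n = len(bits)
    toggles = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return toggles / n


def pattern(alpha, length=16, idle_level=0):
    """Return a data pattern with the requested activity (0, 0.5, or 1).

    alpha = 0   : constant data (all zeros or all ones, set by idle_level)
    alpha = 0.5 : data toggles every second sample (assumed pattern)
    alpha = 1   : alternating 1's and 0's
    """
    if alpha == 0:
        return [idle_level] * length
    if alpha == 0.5:
        return [(i // 2) % 2 for i in range(length)]
    if alpha == 1:
        return [i % 2 for i in range(length)]
    raise ValueError("only activity rates 0, 0.5, and 1 are supported")


for alpha in (0, 0.5, 1):
    seq = pattern(alpha)
    print(alpha, "".join(map(str, seq)), data_activity(seq))
```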

### D. Supply Voltage

The nominal power supply voltage for 0.18  $\mu$ m technology is 1.8 V. However, for battery operated systems, the power supply voltage is reduced drastically to lower the power consumption. Also, an efficient low voltage flip-flop should demonstrate a lower rate of incremental delay as the power supply voltage is reduced. Therefore delay, power, and PDP of all the DETFFs are computed as a function of power supply voltage. Again since the setup time increases with reduced supply voltage, the simulations require relaxed setup time conditions to provide results over a wide range. Hence in this analysis,  $t_{CQ}$  and PDP<sub>CQ</sub> are determined for precise results.

## V. RESULTS

All five DETFFs studied have been optimized as described in Section IV. It is found that delay decreases as the width increases until the minimum point is reached, if such a point exists. At this point, any further increase in width does not result in any further appreciable decrease in the delay. On the contrary, owing to the increased parasitics associated with the increased width, the delay may increase. On the other hand, for all the DETFFs, $P_{\text{TOT}}$ increases monotonically as the width increases. PDP<sub>CQ</sub> is then determined by multiplying $P_{\text{TOT}}$ by $t_{CQ}$ for the corresponding width. Furthermore, by combining the $t_{CQ}$ and the PDP<sub>CQ</sub> curves, we can plot PDP<sub>CQ</sub> versus $t_{CQ}$, which is illustrated in Fig. 7. These curves represent the first step of the optimization process.

The slopes of the PDP<sub>CQ</sub> curves in Fig. 7 indicate the sensitivity of the flip-flops to delay as the width varies. When the $t_{CQ}$ is small, the PDP<sub>CQ</sub> is large since the total power dominates the product at larger widths. As the width decreases, the power consumption decreases, while the delay, which is inversely related to the width, increases. This remains true until the local minimum is reached. At this point, both the power and delay increase because of the weakened driver strength. Fig. 7 also depicts the spread of DETFF performance in terms of PDP<sub>CQ</sub> and delay. As shown, the performance of the DETFFs studied is comparable. PDP<sub>CQ</sub> ranges from 30 to 75 fJ and delay ranges from 200 ps to 300 ps.

The initial optimization points are then extracted from Fig. 7 and an iterative process is used to complete the optimization process. The goal of the optimization is to minimize the energy consumption $PDP_{DQ}$. The different DETFFs are compared in terms of power, delay, and energy. The final optimal parameters are summarized in Table II. The first column of Table II lists the DETFFs and the second column displays the three components of power dissipation and the total power consumption. The third and fourth columns report the delay and energy consumption with respect to CQ and DQ, respectively. Table III lists the other performance characteristics, such as setup and hold times, maximum data rate, and total transistor width. As shown in the tables, $DET_{pedram}$ consumes the most power, due to exceptionally large internal and data power dissipation. This also leads to the highest energy consumption. However, it has the smallest total transistor width. $DET_{llopis}$ has the largest delay, yet the smallest consumption of clock and data power. $DET_{gago}$ consumes the least internal and total power, and hence the least energy. $DET_{strollo}$ consumes the most clock power, yet this does not affect its overall performance compared to the other DETFFs studied. $DET_{proposed}$ has the smallest delay, but it requires the largest total width.
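The energy figures in Table II follow directly from the reported total power and delays. The short sketch below recomputes $PDP_{CQ}$ and $PDP_{DQ}$ from the Table II entries; the dictionary layout and helper names are illustrative choices made here, not part of the original study.

```python
# Table II entries: cell -> (total power in uW, t_CQ in ps, t_DQ in ps).
TABLE_II = {
    "DET_pedram":   (324.9, 233.1, 245.3),
    "DET_llopis":   (175.0, 237.5, 312.3),
    "DET_gago":     (166.2, 202.2, 262.2),
    "DET_strollo":  (237.8, 214.4, 235.3),
    "DET_proposed": (218.4, 161.3, 230.5),
}


def pdp_fJ(power_uW, delay_ps):
    """Power-delay product in femtojoules (1 uW * 1 ps = 1e-3 fJ)."""
    return power_uW * delay_ps * 1e-3


pdp_cq = {cell: pdp_fJ(p, t_cq) for cell, (p, t_cq, _) in TABLE_II.items()}
pdp_dq = {cell: pdp_fJ(p, t_dq) for cell, (p, _, t_dq) in TABLE_II.items()}

for cell in TABLE_II:
    print(f"{cell:13s} PDP_CQ = {pdp_cq[cell]:5.1f} fJ, PDP_DQ = {pdp_dq[cell]:5.1f} fJ")

lowest_cq = min(pdp_cq, key=pdp_cq.get)
lowest_dq = min(pdp_dq, key=pdp_dq.get)
print("lowest PDP_CQ:", lowest_cq, "| lowest PDP_DQ:", lowest_dq)
```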

After the DETFFs are optimized, they are simulated at different data activity rates. The results are shown in Fig. 8. In general, applications with $\alpha = 1$ exhibit the largest total power consumption. Clock power dissipation is rather constant over all data activity rates. Data and internal power consumption increase as the data activity increases. One exception is $DET_{pedram}$. When the data sequence consists of all zeros, its internal power is remarkably large. For the case of all ones, on the other hand, the internal power is especially small, whereas the data power is notably larger. However, the data power at $\alpha = 0.5$ and $\alpha = 1$ is almost the same. Furthermore, $DET_{pedram}$ demonstrates the worst power consumption at all data rates, except when $\alpha = 1$. $DET_{gago}$ is the best in terms of power dissipation at all data rates. The total power consumption of $DET_{llopis}$ is very close to that of $DET_{gago}$ at all data activity rates. $DET_{proposed}$ has similar power consumption to $DET_{gago}$, except in the case of $\alpha = 1$, in which it exhibits a substantially larger internal power dissipation.

The performance of DETFFs under reduced voltage conditions is depicted in Figs. 9-11. Fig. 9 plots total power consumption of DETFFs as a function of supply voltage. $DET_{gago}$ exhibits the lowest power consumption. $DET_{proposed}$ shows the second lowest power consumption at low supply voltage. $DET_{llopis}$ has the second best power dissipation near nominal supply voltage; however, by the time the supply voltage drops to 1.4 V, its power consumption starts to exceed that of $DET_{proposed}$. The worst power consumption is exhibited by $DET_{pedram}$. The power consumption curve of $DET_{strollo}$ is somewhat misleading, since it fails to function below 1.3 V. Fig. 10 depicts the $t_{CQ}$ of DETFFs as a function of supply voltage. $DET_{proposed}$ exhibits the lowest delay. On the other hand, $DET_{strollo}$ demonstrates the worst delay and quickly fails to latch below 1.3 V. All the other DETFFs have similar delay at all supply voltages tested. Fig. 11 plots the $PDP_{CQ}$ as a function of supply voltage. The best energy consumption versus supply voltage is seen from the proposed DETFF, but $DET_{gago}$ is comparable. $DET_{pedram}$ and $DET_{strollo}$ have similar energy dissipation at half of the nominal supply voltage. The results are further summarized in Table IV.
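The low-voltage ranking can be checked against the Table IV entries with a short sketch. The data layout and names are illustrative choices made here; `None` marks the operating point at which $DET_{strollo}$ failed to function.

```python
# Table IV entries: cell -> {Vdd in V: (t_CQ in ps, P_TOT in uW) or None if failed}.
TABLE_IV = {
    "DET_pedram":   {0.9: (734.1, 77.3), 1.3: (329.5, 172.2), 1.6: (244.2, 257.9)},
    "DET_llopis":   {0.9: (762.8, 75.4), 1.3: (350.7, 117.2), 1.6: (264.8, 152.4)},
    "DET_gago":     {0.9: (721.2, 37.0), 1.3: (335.3, 89.1),  1.6: (253.3, 143.1)},
    "DET_strollo":  {0.9: None,          1.3: (932.4, 118.4), 1.6: (262.2, 183.2)},
    "DET_proposed": {0.9: (445.6, 51.2), 1.3: (233.7, 111.7), 1.6: (180.0, 174.8)},
}


def pdp_cq_fJ(entry):
    """PDP_CQ in femtojoules, or None when the flip-flop failed to operate."""
    if entry is None:
        return None
    t_cq_ps, p_tot_uW = entry
    return t_cq_ps * p_tot_uW * 1e-3  # 1 ps * 1 uW = 1e-3 fJ


def best_cell(vdd):
    """Cell with the lowest PDP_CQ at the given supply voltage."""
    energies = {cell: pdp_cq_fJ(rows[vdd]) for cell, rows in TABLE_IV.items()}
    working = {cell: e for cell, e in energies.items() if e is not None}
    return min(working, key=working.get)


for vdd in (0.9, 1.3, 1.6):
    winner = best_cell(vdd)
    energy = pdp_cq_fJ(TABLE_IV[winner][vdd])
    print(f"Vdd = {vdd} V: lowest PDP_CQ is {winner} at {energy:.1f} fJ")
```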

## VI. DISCUSSION AND CONCLUSION

$DET_{pedram}$ consumes the most data power in this study. It is found that the high data and internal power dissipation is a result of the positive feedback of the transmission gate loop at the input end of the flip-flop. In the feedback path of the latches, the input data controls the passing of the clock signals. For instance, from Fig. 3, when D = 0 and clk = 1, M1 turns on. Hence, Node A discharges to 0 and Node B switches to 1. Node B then switches M2 on. As a result, M1 and M2 attempt to write 0 and $(V_{DD} - V_{tn})$ voltages simultaneously onto Node A. This voltage conflict is present until the clock changes state. Such a conflict results in a degraded noise margin. This has two implications. First, this structure allows large current to flow through the transmission gates at the input. Second, the degraded voltage level at Node A also causes a direct path current in the subsequent inverters. Hence, large data and internal power dissipation results. In addition, both data power and internal power depend on the data level rather than the data activity. When D = 0, NMOS pass gates are active through the input loop, while the PMOS is active in the inverter that follows the loop. The opposite is true for D = 1. In either case, PMOS transistors draw more current. The all 0's and all 1's cases are extreme examples of this effect. Despite the large data power consumption, its clock power dissipation is small because of the local clock buffers. The absence of local data buffers brings into question the robustness of the flip-flop. The transparent nature of the pass gates fails to secure unidirectional data flow. Furthermore, its energy consumption at low supply voltage is approximately twice as high as that of the proposed DETFF. Hence, the usage of DET<sub>pedram</sub> in low voltage and low power applications is not recommended.

TABLE II Optimal Parameters for DETFFs Studied

| Cell         | Clock Power<br>(µW) | Data Power<br>(µW) | Internal Power<br>(µW) | Total Power<br>(µW) | t<sub>CQ</sub><br>(ps) | PDP<sub>CQ</sub><br>(fJ) | t<sub>DQ</sub><br>(ps) | PDP<sub>DQ</sub><br>(fJ) |
|--------------|---------------------|--------------------|------------------------|---------------------|------------------------|--------------------------|------------------------|--------------------------|
| sDETpedram   | 17.6                | 65.6               | 241.7                  | 324.9               | 233.1                  | 75.7                     | 245.3                  | 79.7                     |
| sDETllopis   | 17.0                | 4.6                | 153.4                  | 175.0               | 237.5                  | 41.6                     | 312.3                  | 54.7                     |
| sDETgago     | 23.2                | 11.6               | 131.4                  | 166.2               | 202.2                  | 33.6                     | 262.2                  | 43.6                     |
| sDETstrollo  | 30.0                | 13.4               | 194.5                  | 237.8               | 214.4                  | 51.0                     | 235.3                  | 56.0                     |
| sDETproposed | 18.1                | 10.9               | 189.4                  | 218.4               | 161.3                  | 35.2                     | 230.5                  | 50.3                     |

TABLE III PERFORMANCE CHARACTERISTICS FOR DETFFS STUDIED

| Cell         | Setup (ps) | Hold (ps) | Max. Data Rate (GHz) | Total Width (µm) |
|--------------|------------|-----------|----------------------|------------------|
| sDETpedram   | 17.9       | 34.0      | 1.75                 | 23.0             |
| sDETllopis   | 80.3       | -15.7     | 2.22                 | 37.7             |
| sDETgago     | 49.5       | -5.7      | 2.63                 | 44.6             |
| sDETstrollo  | -41.4      | 85.9      | 2.22                 | 40.5             |
| sDETproposed | 76.9       | -5.1      | 1.56                 | 56.1             |

Fig. 8. Power consumption dependence on data activity rates for DETFFs.

Fig. 9. Power consumption dependence on supply voltage.

Fig. 10. $t_{CQ}$ as a function of supply voltage.

Fig. 11. PDP dependency as a function of supply voltage.

$DET_{llopis}$ has the best clock and data power dissipation. Its clock power consumption is low because of the small clock capacitance, whereas its data power dissipation is low due to the use of an inverting input buffer. Despite the fact that it has one of the smallest power consumptions at all data activity rates, it has the longest delay at nominal voltage since the data must propagate through the most logic stages compared to the other DETFF configurations. This leads to a comparatively large energy consumption at nominal conditions. As a function of supply voltage, its total power consumption drops at a much lower rate and its delay rises at a slightly higher rate, compared to the other DETFFs studied. Hence, it results in a higher energy consumption at low voltage. Therefore, its application for low voltage conditions is limited and its best energy consumption is seen around 1.5 V.

TABLE IV SUMMARY OF DETFF PERFORMANCE AS $V_{DD}$ REDUCES

CQ-delay and PDP as a function of supply voltage with relaxed setup time.

| Cell        | $V_{DD}$ = 0.9 V<br>$t_{CQ}$ (s) | $V_{DD}$ = 0.9 V<br>$P_{\text{TOT}}$ (W) | $V_{DD}$ = 0.9 V<br>$PDP_{CQ}$ (J) | $V_{DD}$ = 1.3 V<br>$t_{CQ}$ (s) | $V_{DD}$ = 1.3 V<br>$P_{\text{TOT}}$ (W) | $V_{DD}$ = 1.3 V<br>$PDP_{CQ}$ (J) | $V_{DD}$ = 1.6 V<br>$t_{CQ}$ (s) | $V_{DD}$ = 1.6 V<br>$P_{\text{TOT}}$ (W) | $V_{DD}$ = 1.6 V<br>$PDP_{CQ}$ (J) |
|-------------|-----------|---------|----------|-----------|----------|-----------|-----------|----------|----------|
| DETpedram   | 734.1E-12 | 77.3E-6 | 56.7E-15 | 329.5E-12 | 172.2E-6 | 56.7E-15  | 244.2E-12 | 257.9E-6 | 63.0E-15 |
| DETllopis   | 762.8E-12 | 75.4E-6 | 57.5E-15 | 350.7E-12 | 117.2E-6 | 41.1E-15  | 264.8E-12 | 152.4E-6 | 40.4E-15 |
| DETgago     | 721.2E-12 | 37.0E-6 | 26.7E-15 | 335.3E-12 | 89.1E-6  | 29.9E-15  | 253.3E-12 | 143.1E-6 | 36.2E-15 |
| DETstrollo  | failed    | failed  | failed   | 932.4E-12 | 118.4E-6 | 110.4E-15 | 262.2E-12 | 183.2E-6 | 48.0E-15 |
| DETproposed | 445.6E-12 | 51.2E-6 | 22.8E-15 | 233.7E-12 | 111.7E-6 | 26.1E-15  | 180.0E-12 | 174.8E-6 | 31.5E-15 |

$DET_{gago}$ is found to be the most energy-efficient DETFF under nominal conditions in this study. Its superior low-power performance is mainly due to the complete isolation of its elements when they are not in use, which demonstrates its suitability for low-power applications. Under low supply voltage, although it has the lowest power consumption, its delay is higher than that of the proposed DETFF. This results in a slightly higher energy consumption than $DET_{proposed}$ at low supply voltage.

$\mathrm{DET}_{\mathrm{strollo}}$ consumes the largest clock power because of its chain of internal clock buffers. The delay through these clock buffers defines the activation pulse for the flip-flop, and the width of this pulse is crucial to its operation. As the supply voltage is reduced, the activation pulsewidth varies, which causes the delay to increase at a much higher rate. The delay rapidly approaches the clock pulsewidth, at which point the flip-flop can no longer latch the input data. Therefore, it is not suitable for use in low-voltage environments.

$\mathrm{DET}_{\mathrm{proposed}}$ has superior delay because of the use of NMOS transistors and the avoidance of PMOS transistor stacking in its design. However, its inferior slew rate leads to an especially prominent power consumption at high data rates. As a result, its overall energy consumption under nominal conditions is close to that of $\mathrm{DET}_{\mathrm{gago}}$, which has the lowest energy dissipation. Under reduced supply voltage, $\mathrm{DET}_{\mathrm{proposed}}$ has the second-best power consumption and the best delay, and therefore the best energy consumption. Hence, it is promising for low-energy and low-voltage applications.
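The tabulated figures can be cross-checked with a short script. The following is a minimal sketch, not part of the original study: it assumes that $PDP_{CQ}$ is simply the product $t_{CQ} \times P_{TOT}$, and it refers to the three column groups of the table by position (0, 1, 2) rather than by operating condition. The names `TABLE`, `max_relative_error`, and `rank_by_pdp` are hypothetical. The ranking uses $PDP_{CQ}$ alone, which is only one of the figures of merit discussed and need not coincide with rankings based on other delay definitions.

```python
# Each entry: (t_CQ in s, P_TOT in W, tabulated PDP_CQ in J), one per column group.
# None marks the "failed" entry. Values are copied from the table above.
TABLE = {
    "DETpedram":   [(734.1e-12, 77.3e-6, 56.7e-15),
                    (329.5e-12, 172.2e-6, 56.7e-15),
                    (244.2e-12, 257.9e-6, 63.0e-15)],
    "DETllopis":   [(762.8e-12, 75.4e-6, 57.5e-15),
                    (350.7e-12, 117.2e-6, 41.1e-15),
                    (264.8e-12, 152.4e-6, 40.4e-15)],
    "DETgago":     [(721.2e-12, 37.0e-6, 26.7e-15),
                    (335.3e-12, 89.1e-6, 29.9e-15),
                    (253.3e-12, 143.1e-6, 36.2e-15)],
    "DETstrollo":  [None,
                    (932.4e-12, 118.4e-6, 110.4e-15),
                    (262.2e-12, 183.2e-6, 48.0e-15)],
    "DETproposed": [(445.6e-12, 51.2e-6, 22.8e-15),
                    (233.7e-12, 111.7e-6, 26.1e-15),
                    (180.0e-12, 174.8e-6, 31.5e-15)],
}


def max_relative_error():
    """Largest relative gap between t_CQ * P_TOT and the tabulated PDP_CQ."""
    worst = 0.0
    for entries in TABLE.values():
        for entry in entries:
            if entry is None:  # skip the failed operating point
                continue
            t_cq, p_tot, pdp_tab = entry
            worst = max(worst, abs(t_cq * p_tot - pdp_tab) / pdp_tab)
    return worst


def rank_by_pdp(column):
    """Return (name, computed PDP_CQ) pairs for one column group, lowest first."""
    results = [
        (name, entries[column][0] * entries[column][1])
        for name, entries in TABLE.items()
        if entries[column] is not None
    ]
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    print(f"max relative error: {max_relative_error():.3%}")
    for column in range(3):
        ranking = ", ".join(f"{n} ({e * 1e15:.1f} fJ)" for n, e in rank_by_pdp(column))
        print(f"column group {column}: {ranking}")
```

Under the stated assumption, the recomputed products agree with the tabulated $PDP_{CQ}$ values to within rounding, and the failed entry is excluded from the ranking of its column group.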

The proposed design is an attempt at a low-voltage DETFF. Although $DET_{proposed}$ achieves good performance, the complete isolation of deactivated elements, as in $DET_{gago}$, is found to be a key to low power dissipation. Nevertheless, $DET_{proposed}$ has been shown to operate most efficiently at low supply voltage. Hence, the proposed DETFF is recommended for further research in low-power, low-voltage subsystems.

# REFERENCES

- [1] V. De and S. Borkar, "Technology and design challenges for low power and high performance," in Proc. 1999 Int. Symp. Low Power Electronics and Design, 1999, pp. 163-168.
- [2] B. Davari, R. Dennard, and G. G. Shahidi, "CMOS scaling for high performance and low power-The next ten years," Proc. IEEE, vol. 83, pp. 595-606, Apr. 1995.
- [3] R. P. Llopis and M. Sachdev, "Low power, testable dual edge triggered flip-flops," in 1996 Int. Symp. Low Power Electronics and Design, 1996, pp. 341-345.
- [4] S. L. Lu and M. Ercegovac, "A novel CMOS implementation of double-edge-triggered flip-flops," IEEE J. Solid-State Circuits, vol. 25, pp. 1008-1010, Aug. 1990.
- [5] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Trans. VLSI Syst., vol. 2, pp. 261-265, June 1994.
- [6] D. M. Brooks, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J. Wellman, V. Zyuban, M. Gupta, and P. W. Cook, "Power-aware microarchitecture: Design and modeling challenges for next generation microprocessors," IEEE Micro, vol. 20, no. 6, pp. 26-44, Nov.-Dec. 2000.
- [7] S. J. Abou-Samra and A. Guyot, "Performance/complexity space exploration: Bulk vs. SOI," in PATMOS '98, Int. Workshop-Power and Timing Modeling, Optimization and Simulation, Oct. 7-9, 1998.
- [8] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, pp. 536-548, Apr. 1999.
- [9] A. Chandrakasan, W. Bowhill, and F. Fox, Eds., Design of High-Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2000.
- [10] A. Gago, R. Escano, and J. A. Hidalgo, "Reduced implementation of D-type DET flip-flops," IEEE J. Solid-State Circuits, vol. 28, pp. 400-402, Mar. 1993.
- [11] M. Pedram, Q. Wu, and X. Wu, "A new design of double edge triggered flip-flops," in Proc. ASP-DAC '98 Asian and South Pacific Design Automation Conf. 1998, 1998, pp. 417-421.
- [12] A. G. M. Strollo, E. Napoli, and C. Cimino, "Low power double edge-triggered flip-flop using one latch," Electron. Lett., vol. 35, no. 3, pp. 187-188, 1999.
- [13] M. Afghahi and J. Yuan, "Double edge-triggered D-flip-flops for high-speed CMOS circuits," IEEE J. Solid-State Circuits, vol. 26, pp. 1168-1170, Aug. 1991.
- [14] S. H. Unger, "Double-edge-triggered flip-flops," IEEE Trans. Computers, vol. C-30, pp. 447-451, June 1981.

# ACKNOWLEDGMENT

The authors would like to acknowledge graduate students M. El-Gebaly, M. Nummer, and B. Chatterjee for their valuable input.