A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging

Abstract:

A high-resolution time-to-digital converter (TDC) implemented on a field-programmable gate array (FPGA) and based on delay wrapping and averaging is presented. The fundamental idea is to pass a single clock through a series of delay elements to generate multiple reference clocks with different phases for input time quantization. Due to periodicity, those phases are equivalently wrapped within one reference clock period to achieve the required fine resolution. In practice, a hybrid delay matrix is created to significantly reduce the required number of delay cells. Multiple TDC cores are constructed for parallel measurements, and careful routing control and averaging are then applied to smooth out the large quantization errors caused by the inhomogeneity of the TDC delay lines, improving both linearity and single-shot precision. To reduce temperature sensitivity, a cancellation circuit is created to substantially reduce the offset and confine the output difference to within 2 LSB for the same input interval over the full operating temperature range of the FPGA. With a bin size of 2.5 ps, the integral nonlinearity is measured to be only −2.98 to 3.23 LSB, and the corresponding rms resolution is 4.99–6.72 ps. The proposed TDC is tested to be fully functional over the 0 °C–50 °C ambient temperature range with extremely low resolution variation. Its performance is even superior to that of many full-custom-designed TDCs. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx ISE 14.2.

Existing System:

Conventionally, a TDC with subnanosecond resolution can be realized with emitter-coupled logic (ECL), which consumes considerable power and area and is unsuitable for portable systems or integrated chips. Many different techniques have been developed to achieve a high resolution and a wide measurement range, such as time-to-amplitude conversion, the Vernier principle, time stretching, and time interpolation. In theory, the simplest implementation of a TDC is a high-frequency counter that increments every clock cycle. A 3-ps incremental resolution is achieved with the help of a time-consuming statistical method: the lookup table (LUT). Meanwhile, multistage interpolation can be applied straightforwardly to obtain a wide measurement range while keeping a high resolution at the same time. Fig. 1 shows the conceptual timing diagram of the two-stage time interpolation technique based on the classic Nutt method. The input interval Tin is segmented into T12, T1, and T2. T12 is synchronous with the reference clock CLK and can be readily digitized by a coarse counter, while T1 and T2, each with a duration of less than one clock period TCLK, are processed by fine TDCs or interpolators with resolutions much smaller than TCLK. Tin can be measured as

Tin = T12 + T1 − T2.                                        (1)
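The combination in (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the picosecond units, and all numeric values are assumptions chosen for the example and are not taken from the paper.

```python
def nutt_interval(coarse_count, t_clk_ps, t1_ps, t2_ps):
    """Combine coarse and fine measurements as in Eq. (1): Tin = T12 + T1 - T2.

    coarse_count: number of whole reference clock periods counted (gives T12)
    t_clk_ps:     reference clock period TCLK in picoseconds
    t1_ps, t2_ps: fine interpolator results, each shorter than one TCLK
    """
    for name, value in (("T1", t1_ps), ("T2", t2_ps)):
        if not 0 <= value < t_clk_ps:
            raise ValueError(f"{name} must lie within one clock period")
    t12_ps = coarse_count * t_clk_ps  # T12 is synchronous with CLK
    return t12_ps + t1_ps - t2_ps


# Hypothetical values: a 200-MHz clock (TCLK = 5000 ps) and 7 counted periods.
tin_ps = nutt_interval(coarse_count=7, t_clk_ps=5000.0, t1_ps=1250.0, t2_ps=3400.0)
print(tin_ps)  # 32850.0
```

The coarse counter sets the measurement range, while the accuracy of T1 and T2 sets the effective resolution.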

Since the interpolator dominates the effective resolution of the TDC, many structures have been created to enhance its accuracy. The most commonly used are the tapped delay line, the pulse stretcher (dual-slope conversion), pulse shrinking, and the Vernier delay line (differential delay line) to achieve subgate delay resolution. After decades of evolution, it is still a challenge even for experienced designers to accomplish an effective resolution better than 10 ps for TDCs. More subtle techniques are required. Time amplification is adopted to implement a TDC with 9 b, a 1.25-ps bin size, and an output standard deviation of <1 LSB. The measured differential nonlinearity (DNL) and integral nonlinearity (INL) are 0.8 LSB and 3 LSB, respectively, with a limited dynamic range. Cyclic time-domain successive approximation is created to get a 1.2-ps resolution and a 327-μs dynamic range. The rms single-shot precision of 3.2 ps is achieved using an external INL-LUT for the interpolators. The Vernier ring is invented to generate an 8-ps LSB width, also with an output standard deviation of <1 LSB. The performance is further improved by a gated Vernier ring structure to realize an equivalent resolution of 3.2 ps with an oversampling ratio of 16. An 8-b cyclic TDC is proposed to achieve a 1.25-ps LSB width, a ±0.7 LSB DNL, and a −3 to +1 LSB INL. To enhance dynamic accuracy for applications with periodic TDC input, time-domain delta-sigma modulation for noise shaping is adopted to get an effective resolution of around 6 ps.

Disadvantages:

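  • ECL-based TDCs consume considerable power and area and are unsuitable for portable systems or integrated chips
  • An effective resolution better than 10 ps remains difficult to achieve
  • Several high-resolution designs rely on time-consuming LUT calibration or offer only a limited dynamic range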

Proposed System:

Assuming that n wrapped phases are uniformly distributed in one reference clock period, the bin size of the TDC can be calculated as

LSB = TCLK/n = 1/(n × f)                                    (2)

where f = 1/TCLK is the reference clock frequency.
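As a rough illustration of delay wrapping and of (2), the Python sketch below wraps the tap delays of an idealized, perfectly uniform delay line into one clock period and computes the resulting bin size. The function names and all numeric values (a 2000-ps clock period, a 750-ps tap delay, 8 taps) are assumptions chosen so that the arithmetic is easy to follow; they are not the parameters of the proposed TDC.

```python
def wrapped_phases(tap_delay_ps, n_taps, t_clk_ps):
    """Phase of each delayed clock, wrapped into one reference clock period."""
    return sorted((k * tap_delay_ps) % t_clk_ps for k in range(n_taps))


def bin_size_ps(t_clk_ps, n_phases):
    """Eq. (2): LSB = TCLK / n, valid when the wrapped phases are uniform."""
    return t_clk_ps / n_phases


# Hypothetical numbers: each tap is much coarser (750 ps) than the final bin.
phases = wrapped_phases(tap_delay_ps=750, n_taps=8, t_clk_ps=2000)
print(phases)                # [0, 250, 500, 750, 1000, 1250, 1500, 1750]
print(bin_size_ps(2000, 8))  # 250.0
```

In this idealized case, taps whose delay is three times the bin size still yield evenly spaced wrapped phases. Real delay cells are not uniform, so the wrapped phases are uneven in practice, which is why the design applies averaging across multiple TDC cores.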

During circuit implementation, the pulse-shrinking/stretching mechanism caused by the aspect ratio mismatch among adjacent devices will limit the realizable length of the clock delay line [36]. To accomplish picosecond-level resolution, at least hundreds of delay cells are required. After being fed into such a long delay line, the duty cycle of a high-frequency reference clock will be either shrunk or stretched to 0% or 100% before reaching the end of the delay line. No delayed clock signal will be generated for the remaining delay stages after the duty cycle reaches 0% or 100%, which ruins the TDC accuracy. In theory, the clock frequency can be lowered to get a larger pulse width and ensure that the reference clock can propagate to the end of the delay line. However, the delay line must be lengthened correspondingly to maintain the same resolution, as revealed by (2). The impact of the pulse-shrinking/stretching mechanism is proportionally increased, which spoils the effectiveness of lowering the clock frequency. Instead, the input signal can be made with a larger pulse width than the reference clock and fed into the delay line to solve the dilemma. The conceptual timing diagram is shown in Fig. 1. Since all the wrapped clocks quantize the same input signal, Tin can be duplicated in theory so that each clock can be paired up with one specific input signal (e.g., Ci with Tin,i), as depicted in Fig. 1(a). Then, we can align all clocks while shifting the input signals accordingly to keep exactly the same timing relation between each pair of signals Ci and Tin,i in Fig. 1(b). Equivalently, Tin is fed into the same delay line, and all delayed input signals are then quantized by the same reference clock to get the same output for the proposed TDC. The expense is a long dead time, since the TDC can get the final conversion output only when Tin has propagated to the last delay stage.

Figure 1: Timing diagram with (a) delayed clocks and (b) delayed inputs
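The argument for delaying the input instead of the clock can be illustrated with a deliberately simplified model. The sketch below assumes that every delay stage shrinks the pulse by the same fixed amount. That is a hypothetical linear model used only for illustration, and the function name and the numbers are likewise assumptions, not measured data.

```python
def stages_survived(pulse_width_ps, shrink_per_stage_ps):
    """Count the delay stages a pulse passes before its width collapses to zero.

    Assumes (hypothetically) a constant shrink per stage; real devices vary.
    """
    if shrink_per_stage_ps <= 0:
        raise ValueError("shrink_per_stage_ps must be positive")
    stages = 0
    width = pulse_width_ps
    while width - shrink_per_stage_ps > 0:
        width -= shrink_per_stage_ps
        stages += 1
    return stages


# Hypothetical comparison: clock high time of 1000 ps vs. a 5000-ps input pulse,
# both shrinking by 4 ps per stage.
print(stages_survived(1000, 4))  # 249
print(stages_survived(5000, 4))  # 1249
```

Under this model, a wider pulse survives proportionally more stages. This is why an input pulse wider than the reference clock pulse can traverse a delay line that the clock itself could not.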

Another problem is raised by the above modification to delay Tin instead of CLK. For much finer resolution, the input delay line is expected to be very long, with a significant pulse-shrinking/stretching impact that limits the smallest measurable width of the input signal Tin. Consequently, a large TDC offset can be expected. To reduce the offset and logic utilization, a delay matrix with multiple short delay lines can be used for a single input Tin to generate a sufficient number of delayed signals, as revealed in Fig. 2(a). In theory, different delay cells or strict timing constraints need to be adopted for the vertical and horizontal delay lines to ensure that maximum uniformity is realized among the wrapped phases. Since both Tin and CLK can be delayed to generate the required phase shifts, a hybrid delay matrix, or the so-called 2-D Vernier, is thus constructed to substantially reduce the number of delay cells from approximately H × V to H + V, as shown in Fig. 2(b).

Figure 2: (a) Delay matrix. (b) Hybrid delay matrix
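The saving from H × V to H + V cells can be sketched as follows. The model assumes ideal, uniform cells and treats a delay on CLK as an equivalent advance of the input, so the relative shift for tap pair (i, j) is i·d_in − j·d_clk. The function name, this sign convention, and the delay values are illustrative assumptions, not the paper's implementation.

```python
def hybrid_matrix_shifts(h, v, d_in_ps, d_clk_ps):
    """Relative input-to-clock shifts obtainable from H input taps and V clock taps.

    Tap i of the input line delays Tin by i * d_in_ps; tap j of the clock line
    delays CLK by j * d_clk_ps. Pairing every input tap with every clock tap
    gives up to H * V distinct relative shifts from only H + V delay cells.
    """
    return sorted({i * d_in_ps - j * d_clk_ps for i in range(h) for j in range(v)})


# Hypothetical 4 x 4 example: coarse 40-ps input cells, fine 10-ps clock cells.
shifts = hybrid_matrix_shifts(h=4, v=4, d_in_ps=40, d_clk_ps=10)
print(len(shifts))       # 16 distinct shifts
print(4 * 4, "vs", 4 + 4)  # 16 vs 8 delay cells
```

With these example values the 16 shifts are evenly spaced 10 ps apart (from −30 ps to 120 ps), although only 8 delay cells are used.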

One feasible way to evenly distribute the phases among the reference clocks is to use the FPGA's embedded multi-output phase-locked loop (PLL) for phase division, as depicted in Fig. 3. Only one H-stage delay line is used.

Figure 3: Hybrid delay matrix with PLL for clock phase division.

Advantages:

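  • Fine bin size of 2.5 ps with an rms resolution of 4.99–6.72 ps
  • INL confined to −2.98 to 3.23 LSB through routing control and averaging
  • Output difference kept within 2 LSB over temperature by the cancellation circuit
  • Fewer delay cells required thanks to the hybrid delay matrix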

Software implementation:

  • ModelSim
  • Xilinx ISE