WO2001013562A2

WO2001013562A2 - A high speed burst-mode digital demodulator architecture

Info

Publication number: WO2001013562A2
Application number: PCT/US2000/017873
Authority: WO
Inventors: Soheil I. Sayegh; James R. Thomas
Original assignee: Comsat Corporation
Priority date: 1999-08-13
Filing date: 2000-08-01
Publication date: 2001-02-22
Also published as: WO2001013562A3

Abstract

A unique demodulator device structure is disclosed which includes methods for enhancing the reliability of processing digital data at high speeds, while in some cases reducing the amount of hardware required for such structures. Parallel data paths within several common functions of the overall demodulator design allow for reducing the rate at which data is processed within each such data path (e.g., Fig. 3, paths 0, 1, 2 and 3) to a rate lower than that at which the data was received. Subsequently, the data may be passed on at the original received data rate or processing may be completed at a lower rate. The reduction in hardware typically results in reductions in mass, size, and power for the final demodulator while simultaneously allowing the demodulator to work at speeds not otherwise possible.

Description

A HIGH SPEED BURST-MODE DIGITAL DEMODULATOR ARCHITECTURE

BACKGROUND OF THE INVENTION

The present invention relates generally to a high speed digital communication device and a method used therein for processing large amounts of digital data at high speeds. More specifically, the invention relates to a high speed demodulator utilizing unique parallel data structures to enhance the speed and reliability of the digital circuits used to effect data transfer and at the same time, in some cases, reducing the amount of hardware that would otherwise be required to perform certain parallel demodulator functions.

Historically, when transmission rates were much slower, e.g., much slower than one hundred megahertz, digital demodulators were commercially available that allowed direct transmission, reception and processing of the digital data. As transmission rates increased, however, the reliability of the digital circuits used to effect data transfer within a demodulator began to decrease. In digital communications, each bit of digital data is "clocked" independently at all data processing stages. As the demand for information increases, larger amounts of digital data, transmitted in ever decreasing amounts of time, are required. Consequently, the maximum rate at which each device must operate becomes a limiting factor in the digital system. If any one of the devices in a particular system is unable to successfully process each bit of data within a single clock cycle as the data progresses through the system, the data becomes corrupted. Hence, the transmission or reception system fails.

To address this problem, it has been considered to separate incoming data into a number of individual parallel paths, each path operating at a significantly slower rate than the rate at which the original data was received. This technique is appropriately referred to as 'parallelism'. As a stream of data samples is received at the front end of the system, each sample is alternately directed to one of the individual but similar paths. Each path processes its respective data and the resultant data is then recombined at the back end of the system before it is passed on for further processing by the system, possibly at the original, higher, rate. Straight parallelism, as this technique is called, however, is typically subject to several problems. For instance, simply splitting the data into a fixed number of separate paths requires additional hardware to accommodate each path. Oftentimes, the additional hardware required is either financially prohibitive or the usable space and/or allowable weight budget for the particular application does not provide for the additional devices.

For example, in the case of a demodulator using a Finite Impulse Response (FIR) filter, one approach previously considered was to split the incoming data into four paths and process each resultant data stream in a separate conventional FIR filter structure, each with coefficients shifted by one sample and spaced four samples apart. This approach is well suited for use with commercially available filters and conventional filter structures. A serial data stream can be "clocked-in" at the rate at which it was received from an Analog to Digital (A/D) converter, called the sample rate, and the data can then be diverted to, for example, four separate paths, each path representing, for example, a different partial component of the filter response.

Within each of the four paths, a bank of commercially available FIR filter devices, operating at a slower rate than the sample rate, is cascaded together, each device performing a filter function by convolving a collection of filter coefficients with the data samples in sequence as they arrive. As the individual filter operations are completed, the results of each of the four filter paths are summed, again at the slower rate. Ultimately, the outputs of the summing devices are applied to a commutating device whereby the data from the four paths are arranged sequentially and output at the original sample rate as 'full-rate' filtered data. While this approach has the benefit of minimizing circuitry operating at full speed, the number of logic gates used to carry out the desired function is increased significantly. The number of gates used is a parameter associated with size, power and weight of the needed hardware and is similarly minimized in most applications.

Another conventional approach to achieving demodulation of transmitted signals is to use analog devices. This approach pre-dates the method discussed above and is undesirable for different reasons. Traditionally, analog demodulators have been used to demodulate signals where the data rate is too high for a practical digital demodulator to be implemented. Analog demodulators, however, have several disadvantages, such as non-repeatable performance, nonlinear phase characteristics, and high cost, driven primarily by the requirement for experienced technicians to individually tune each device.

SUMMARY OF THE INVENTION

In view of the aforementioned problems with the conventional approach to high speed signal demodulation, the present invention is directed to a method and circuit for performing high speed demodulation of modulated carriers used primarily in the area of digital communications. In particular, the invention includes improved methods of paralleling hardware to reduce the operating speed of the required circuits where operation at full speed is not practical, and in some cases, to reduce the amount of hardware required as well.

In one embodiment of the present invention a novel derivation of the process described above with respect to the FIR filter function of a demodulator is provided wherein digital data is processed at rates lower than the original sample rate, but with the additional advantage of reduced logic gate counts and, therefore, reduced size/weight and power dissipation.

Further, in applications where the coefficients used in the filtering function are fixed, this approach lends itself to further reductions in the gate counts of the associated multipliers.

In another embodiment of the invention, a method and circuit for paralleling another demodulator function/device known as a correlator is presented. A correlator is often used in burst mode demodulation where a unique word (UW) is used to acquire clock and carrier synchronization as well as to identify the start of frame marker. As in the case of the parallel FIR filter structure mentioned above, the correlator design in accordance with the invention is suitable for operational speeds greater than those that were previously practical when a straight implementation of a correlator was utilized.

In another embodiment of the invention, the parallel data structure is implemented in a demodulator clock phase tracking loop in order to achieve higher operating speeds.

In yet another embodiment of the invention, the parallel data structure is implemented in a demodulator carrier phase tracking loop in order to achieve higher operating speeds.

Lastly, in a final embodiment of the invention, the parallel data technique is implemented as a demodulator carrier loop phase interpolator, which is used to produce the final phase values for application to a conventional parallel implementation of a complex phase rotator function.

The embodiments mentioned above, and described in more detail hereafter, are typically used in combination with certain other required elements of a complete demodulator structure, such as loop filters, data detectors, etc. The structure is disclosed herein in a manner which efficiently uses and maintains the parallelism method disclosed, thereby maintaining the ability of the overall demodulator to operate at reduced clock rates, maintain circuit reliability, and use less hardware than other parallel implementations.

The present invention is important because, due to the high data transmission rates anticipated for digital communication, a straightforward implementation of a digital demodulator architecture, such as the one shown in Figure 1, is not possible. In low to moderate speed demodulators, the received signal is typically sampled at two to four times the symbol rate (Rsym) in order to meet the well known Nyquist criterion. [Note: The symbol rate is the frequency at which symbols are transmitted. Symbols are the smallest units in which data is sent across a communications system. BPSK systems send 1 bit per symbol, QPSK systems 2 bits per symbol, and so on.] The received data is then processed serially at the sampling rate. For high speed demodulators, however, such as the one disclosed herein, the received data signal is sampled at, for example, twice the symbol rate and then the samples are processed in several parallel paths such that the clock speed required by devices subsequent to the initial sampling generally does not exceed one half the symbol rate. Therefore, the parallel architecture described herein allows for digital demodulation of data transmitted at rates that would otherwise not be feasible.
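The rate relationship described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `required_paths` and the 400 Msymbol/s figure are assumptions chosen for the example and do not appear in the disclosure.

```python
import math

def required_paths(symbol_rate_hz, samples_per_symbol=2, max_clock_fraction=0.5):
    """Number of parallel paths needed so that no path is clocked faster
    than max_clock_fraction * symbol_rate_hz."""
    sample_rate_hz = samples_per_symbol * symbol_rate_hz
    max_path_clock_hz = max_clock_fraction * symbol_rate_hz
    return math.ceil(sample_rate_hz / max_path_clock_hz)

# Hypothetical link: 400 Msymbol/s sampled at twice the symbol rate.
paths = required_paths(400e6)
```

Under these assumptions, sampling at 2*Rsym while limiting each path to Rsym/2 calls for four paths, which matches the four-path examples used in the detailed description.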

BRIEF DESCRIPTION OF THE DRAWINGS

The object and features of the present invention will become more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 is a block diagram illustrating the basic data flow and architecture of a demodulator circuit.

FIG. 2 is a block diagram illustrating the basic data flow and architecture of a demodulator circuit according to the conventional parallelism structure as well as the present invention.

FIG. 3 is a block diagram illustrating an 8-tap filter operation in accordance with the present invention.

FIG. 4 is a block diagram illustrating the operation of an 8-tap filter operation in accordance with the present invention when the filter coefficients are fixed.

FIG. 5 is a block diagram illustrating the operation of one possible conventional parallel FIR filter.

FIG. 6 is a block diagram illustrating the operation of a transposed form correlator operation in accordance with the present invention.

FIG. 7 is a block diagram illustrating the operation of a transposed form correlator operation in accordance with the present invention as the clock speed is reduced from full speed to half speed.

FIG. 8 is a block diagram illustrating the operation of a direct form correlator operation in accordance with the present invention.

FIG. 9 is a block diagram illustrating the operation of a direct form correlator operation in accordance with the present invention as the clock speed is reduced from full speed to half speed.

FIG. 10 is a block diagram illustrating the operation of a clock phase tracking loop operation in accordance with the present invention.

FIG. 11 is a block diagram illustrating the operation of a carrier phase tracking loop operation in accordance with the present invention.

FIG. 12 is a block diagram illustrating the operation of a conventional parallel carrier phase rotator operation.

FIG. 13 is a block diagram illustrating the operation of a carrier phase interpolator operation in accordance with the present invention.

FIGS. 14a-14c are block diagrams conceptually illustrating the differences between the conventional method of constructing a non-decimating FIR filter, a conventional decimating FIR filter and a decimating FIR filter in accordance with the present invention, respectively.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 2 illustrates the general architecture of a digital demodulator wherein a parallel data structure is utilized. It should be noted that the parallel data structure shown in Fig. 2 is in accordance with both a conventional parallel data structure, as discussed previously, as well as a parallel structure in accordance with the present invention. The invention is realized within the details of several blocks referenced in Fig. 2, each of which will be described in detail herein. The parallel structure illustrated is typically implemented in an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA); however, implementation within such devices is not necessarily required. For example, discrete components might also be used. The figure illustrates the basic method for maintaining parallelism between the various building blocks of the demodulator.

In the example illustrated, digital data is received into the demodulator from an A/D converter (not shown) already divided into parallel paths. The data received is In-phase (I) and Quadrature (Q) data and has been previously commutated to a rate equal to, for example, one half the symbol rate.

Initially, the data may be filtered using a FIR filter ('Matched Filter' in Fig. 2) configured in accordance with the invention. Fig. 3 illustrates the operative details of a FIR filter in accordance with the invention with respect to the filtering process of I data. A similar configuration can be used for Q data, but is not shown. In Fig. 3, the I data has been received at a receiving rate and then divided into four separate paths, each path accommodating every fourth data sample, e.g. I0, I4, I8 etc. for one path and I1, I5, I9 etc. for a second path and so on. This process of steering samples into alternating paths is generally called commutation. The example filter configuration shown illustrates an 8-tap filter with 8 different coefficients (A0 through A7) corresponding to each respective tap. However, more or fewer taps could also be used. As data samples are received into the four paths, or processing paths, the samples progress sequentially into individual data registers represented in Fig. 3 by boxes with respective numbers within each box. Further, because the data is now in parallel paths, the rate at which data progresses within each path can be significantly lower than the receiving rate at which the data was originally clocked into the device.
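The commutation step can be pictured with a short sketch. This is a behavioral illustration only, not the circuit of Fig. 3; the function names `commutate` and `decommutate` are hypothetical, and the sample labels are placeholders standing in for real I data.

```python
def commutate(samples, num_paths=4):
    """Steer sample k into path k % num_paths (every fourth sample per path)."""
    return [samples[p::num_paths] for p in range(num_paths)]

def decommutate(paths):
    """Interleave equal-length paths back into a single full-rate stream."""
    stream = []
    for group in zip(*paths):
        stream.extend(group)
    return stream

# Placeholder labels for twelve consecutive I samples.
samples = [f"I{k}" for k in range(12)]
paths = commutate(samples)
restored = decommutate(paths)
```

Each path receives one sample for every four that arrive, so each path can be clocked at one quarter of the receiving rate.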

The numbers within each rectangular box along each path in Fig. 3 represent the respective data sample located within that particular register during a single representative clock cycle. Data is shifted during subsequent clock cycles and, as time progresses, each coefficient (A_n) is multiplied, using a respective multiplier, by each data sample in sequence as the data progresses through the series of shift registers. The process continues until all the results required for the given filter function are determined. The results for each respective path, marked phase 0, 1, 2 and 3 in the drawing, are then summed accordingly, to obtain four respective values.

Each of the four respective paths illustrated in Fig. 3 constitutes only a small section of the respective data path existing in the overall filter structure. For instance, each path shows only 2 or 3 representative shift registers. However, one of ordinary skill in the art would be aware that a typical filter design might contain many additional shift registers and, hence, many additional multipliers attached to each register in order to accommodate the intended filter function. The limited length of each path shown is intended to simplify the representation of the invention.

As shown in Fig. 3, in accordance with the desired representative filter function, multiple coefficients can be multiplied by a single data sample simultaneously as opposed to the prior art method wherein each data sample, corresponding to each register, is multiplied by a single coefficient during any single clock cycle. This prior art method is shown, for example, in Fig. 5. In accordance with the method of Fig. 3, as a result of the parallel configuration of the data paths, and arranging the multipliers accordingly, it is possible to multiply data sample I4 by coefficients A1, A2, A3 and A4 simultaneously. By combining the number of coefficients applied to each data sample in this manner, the total number of registers needed to accomplish the intended filter function is reduced. For instance, in the aggregate, the number of registers needed to perform the filter function represented in Fig. 3 is reduced four-fold over what would have been required in a similar prior art structure as illustrated in Fig. 5. This reduction in hardware is owed to the unique configuration of the multipliers in addition to the separation of the data into four independent paths, i.e. parallelism. Note that other degrees of parallelism are also possible. For example, if eight paths are used, there would be eight multipliers at each register and so forth. Such variations are to be considered within the scope of the invention.
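The idea that one stored sample feeds several multipliers in the same slow clock cycle can be modeled in software. The sketch below is a behavioral model under stated assumptions, not a reproduction of the Fig. 3 wiring: the names `direct_fir` and `parallel_fir` are hypothetical, the input values are arbitrary, and which coefficients meet which sample in a given cycle depends on register alignment that the model does not attempt to match.

```python
def direct_fir(x, coeffs):
    """Reference serial FIR: y[n] = sum_k coeffs[k] * x[n - k], zero history."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, a in enumerate(coeffs):
            if n - k >= 0:
                acc += a * x[n - k]
        y.append(acc)
    return y

def parallel_fir(x, coeffs, num_paths=4):
    """Block FIR: each loop iteration models one slow clock cycle that
    produces num_paths outputs (phases 0 .. num_paths - 1)."""
    assert len(x) % num_paths == 0
    taps = len(coeffs)
    y = [0] * len(x)
    for m in range(len(x) // num_paths):
        base = m * num_paths
        # Every stored sample that can influence this block of outputs.
        for j in range(max(0, base - (taps - 1)), base + num_paths):
            # One sample meets up to num_paths coefficients in this cycle.
            for p in range(num_paths):
                k = base + p - j
                if 0 <= k < taps:
                    y[base + p] += coeffs[k] * x[j]
    return y

# Arbitrary test data and an 8-tap coefficient set (illustrative values).
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
y_parallel = parallel_fir(x, coeffs)
y_serial = direct_fir(x, coeffs)
```

The two functions produce identical outputs, which illustrates that grouping the multiplications by sample rather than by tap changes only when the work is done, not the filter response.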

The reduction in required registers becomes even more apparent when the invention is applied to a particular well known variation of a FIR filter known as a decimating FIR filter. The differences between a non-decimating FIR filter and a decimating FIR filter are illustrated in Figs. 14a-14c. In this situation, the number of registers, or perhaps some other delay element such as a FIFO, located between each tap of the filter is typically much higher in the decimating configuration than in a non-decimating configuration. Typically, in the case of a decimating filter, there will be 2ⁿ shift registers between each tap, where n is the decimation factor. For example, if n=6, there will be 64 shift registers between each tap. Therefore, for a filter design in accordance with the prior art configuration, there would be 64 shift registers between each multiplier. If, on the other hand, the filter were configured in accordance with the invention, wherein multiple multipliers correspond to each tap, the number of registers required for the overall filter design would be even more significantly reduced than for the previous, non-decimating, example. For instance, if a particular filter design contained 16 taps, and the decimation factor were 64, the number of registers saved by implementing the invention would be 768; [(16 * 64) - (16 * 64)/4].
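The register count in the example above can be reproduced directly. This sketch simply restates the bracketed arithmetic; the function name `registers_saved` is hypothetical, and the figure of 64 registers between taps is taken from the text as given.

```python
def registers_saved(taps, registers_between_taps, num_paths=4):
    """Registers saved when each register feeds num_paths multipliers."""
    conventional = taps * registers_between_taps
    parallel = conventional // num_paths
    return conventional - parallel

# 16 taps with 64 registers between taps, four parallel paths.
saved = registers_saved(16, 64)
```

With these inputs the conventional structure needs 1024 registers and the four-path structure needs 256, a difference of 768.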

Figure 4 shows another embodiment of the invention wherein a reduction in the number of full multipliers required to implement a given FIR filter may be realized if certain conditions are met. Specifically, when the coefficients required for the filter are not expected to vary over time, i.e., are fixed, the multipliers may be combined as shown with an attendant savings in gate count. This savings will depend on the exact values of the coefficients and will vary, accordingly, from a small savings to, in some cases, significant savings.

These savings are obtained by computing the required partial products for all four multipliers only once and then summing them individually to produce each product. This method only works here, in conjunction with the invention, because of the characteristic of the invention in which four multipliers can be attached to a single tap. This method requires that the multipliers be custom designed for each particular coefficient set and, hence, is best suited for application in an ASIC; however, an FPGA could also be used.
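One way to picture shared partial products is binary shift-and-add multiplication, in which the shifted copies of a sample are formed once and each fixed coefficient selects the copies it needs. The disclosure does not specify the partial product scheme, so the following is an assumption made for illustration; the function name and the coefficient values are hypothetical.

```python
def shared_partial_product_multiply(sample, coeffs, width=8):
    """Multiply one sample by several fixed non-negative integer
    coefficients, forming the shifted partial products only once."""
    partials = [sample << b for b in range(width)]  # computed once, shared
    products = []
    for c in coeffs:
        assert 0 <= c < (1 << width)
        # Each product sums only the partials selected by the set bits of c.
        products.append(sum(partials[b] for b in range(width) if (c >> b) & 1))
    return products

# Illustrative sample and four fixed coefficients attached to one tap.
products = shared_partial_product_multiply(13, [3, 5, 10, 12])
```

Because the coefficients are fixed, the set of partial products each product needs is known at design time, which is why the achievable savings depend on the exact coefficient values.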

In accordance with another embodiment of the present invention, a correlator is built using a similar parallel data structure as discussed above.

A correlator, as shown generally in Fig. 2, is a well known device which compares input data streams with expected patterns. Figure 6 shows a correlator in accordance with the invention wherein the input data stream has been commutated into 4 input paths for I and Q data, respectively. The outputs of these parallel paths are summed and detection of the expected pattern is determined by comparing the results to threshold values. When the threshold value is exceeded, a match is declared. The summation and comparison must operate at the sample rate; however, the processing in the parallel paths, which is by far the bulk of the required computation, proceeds at the lower rate (Rsym/2 in this example), thereby significantly easing circuit requirements.
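The split between per-path partial sums and a final summation with threshold comparison can be sketched as follows. This is a behavioral model only, not the transposed or direct form circuits; the function name, the 8-symbol unique word, the surrounding samples, and the threshold are all hypothetical values chosen for the example.

```python
def parallel_correlate(received, unique_word, threshold, num_paths=4):
    """Return the indices at which the correlation score exceeds threshold.
    Products are accumulated per path, then the path sums are combined."""
    length = len(unique_word)
    matches = []
    for end in range(length - 1, len(received)):
        start = end - length + 1
        partial_sums = [0] * num_paths
        for k in range(length):
            index = start + k
            # Sample 'index' lives in path index % num_paths.
            partial_sums[index % num_paths] += unique_word[k] * received[index]
        score = sum(partial_sums)  # final summation across the paths
        if score > threshold:      # comparison against the threshold
            matches.append(end)
    return matches

# Hypothetical unique word embedded in a short stream of +/-1 samples.
unique_word = [1, -1, 1, 1, -1, -1, 1, -1]
received = [1, 1, -1, 1] + unique_word + [-1, 1, 1, -1]
matches = parallel_correlate(received, unique_word, threshold=7)
```

In this example the only window that exceeds the threshold is the one ending at the last sample of the embedded unique word.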

The process of determining the precise time at which the unique word has been received is important because by establishing this time, various other signal parameters can be determined, such as the clock timing and the carrier phase and level. Determination of these parameters is known generally as "acquisition" of the data signal and is performed within the acquisition processor. This function is otherwise outside the scope of the invention and will not be discussed further.

The parallel implementation of the correlator may be applied in either the "Transposed" form as shown in Figure 6 or in the "Direct" form as shown in Figure 8. Figures 7 and 9 simply show the evolution from the conventional non-parallel implementation to the parallel version shown in Figures 6 and 8.

Another embodiment of the present invention is shown in Fig. 10 wherein an input circuit to the clock phase tracking loop shown generally in Fig. 2 is illustrated. In Fig. 2 it is shown that data obtained from the data and feedback mapping portion of the demodulator in accordance with the present invention is received in order to close the clock loop. According to the detailed representation of Fig. 10, by implementing the parallel data structure of the present invention, the error detection function can be run at rates less than the sample rate. After the error detection is complete, the results are summed prior to further processing, which may also proceed at a lower rate.

Similarly to the input stage of the clock phase tracking loop, the present invention can be used to perform error detection within the carrier phase tracking loop. As shown in Fig. 11, alternating samples are input to two individual error detectors and the results of each are then summed prior to further processing. Due to the parallel data structure, it is possible to process the data at one half the symbol rate, thus, achieving the general benefit of the invention.
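The arrangement of two error detectors fed by alternating samples, with their outputs summed, can be sketched as below. The disclosure does not specify the error detector itself, so a decision-directed QPSK phase error detector is assumed here purely for illustration; all function names and the 0.1 radian test offset are hypothetical.

```python
import cmath
import math

def qpsk_decision(z):
    """Nearest ideal QPSK point (assumed constellation: +/-1 +/-1j)."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def phase_error(z):
    """Assumed decision-directed detector: Im(z * conj(decision))."""
    return (z * qpsk_decision(z).conjugate()).imag

def parallel_error_sum(symbols):
    """Two detectors take alternating samples; their outputs are summed,
    giving one combined error value per pair of input samples."""
    detector_0 = [phase_error(z) for z in symbols[0::2]]
    detector_1 = [phase_error(z) for z in symbols[1::2]]
    return [e0 + e1 for e0, e1 in zip(detector_0, detector_1)]

# Eight ideal symbols rotated by a hypothetical 0.1 rad carrier phase error.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j] * 2
rotated = [s * cmath.exp(1j * 0.1) for s in ideal]
errors = parallel_error_sum(rotated)
```

Each detector sees only every other sample, so it can run at half the rate of a single serial detector while the summed output still reflects every sample.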

Finally, Figure 12 illustrates the phase rotator operation consisting of 4 complex multipliers operating at half the symbol rate each to provide the total throughput of twice the symbol rate, as required. Note that the table lookup function is updated at half the symbol rate. This is straightforward parallelism; however, under certain circumstances a novel variation may be applied as follows.

When the frequency offset of the carrier is large and the change in phase is significant from sample to sample, the update rate has to be effectively as high as 2*Rsym, i.e., the full sample rate. In this event, the lookup tables may have to be paralleled as well in order to achieve the required speed of operation with practical memory devices.

In order to produce the phase estimates for these lookup tables without running the carrier loop circuitry at the higher rate, or using parallel hardware throughout the carrier processing loop, another method of generating the phase estimates is envisioned. In this method, shown in Figure 13, a linear interpolator is added to generate the four required lookup table addresses from the phase estimates produced by the carrier loop at Rsym/2. This method produces the required estimates for input to the lookup tables with a minimum of additional circuitry while adding only a small amount of degradation due to the linear interpolation. With this method, four individual lookup tables are required. In many newly proposed systems, the carrier frequency offset is controlled tightly enough to eliminate the requirement for this feature. Nevertheless, it is considered within the scope of the invention.
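The linear interpolation step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sine and cosine values are computed directly rather than read from memory devices, and the choice to start each group of four phases at the earlier estimate is an assumption about endpoint handling that the text does not settle.

```python
import math

def interpolate_phases(prev_phase, next_phase, num_paths=4):
    """Evenly spaced phases from prev_phase toward next_phase, one per path.
    The difference is wrapped to [-pi, pi) so the short way round is taken."""
    delta = (next_phase - prev_phase + math.pi) % (2 * math.pi) - math.pi
    step = delta / num_paths
    return [prev_phase + i * step for i in range(num_paths)]

def lookup(phase):
    """Stand-in for one sine/cosine lookup table."""
    return math.cos(phase), math.sin(phase)

# Two consecutive loop estimates (hypothetical values) produced at Rsym/2.
phases = interpolate_phases(0.0, 0.4)
rotations = [lookup(p) for p in phases]  # one lookup per parallel table
```

The carrier loop still produces one estimate per slow clock cycle; only the interpolator and the four lookups need to supply a distinct phase for every sample.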

Claims

WHAT IS CLAIMED IS:

1. A high speed digital communication device comprising:

a receiver receiving digital data at a receiving rate and separating said digital data into a plurality of sets of data;

a plurality of processing paths, each corresponding to one of said sets of data and comprising;

a plurality of shift registers containing said data; and

a plurality of multipliers associated with each of said shift registers, wherein said multipliers determine a plurality of products of said data with a plurality of coefficients associated with said path.

2. A high speed digital communication device according to claim 1 further comprising;

a plurality of summers connected to said multipliers, said summers determining a sum of a plurality of said products.

3. A high speed digital communication device according to claim 1 wherein said processing paths operate at a rate less than said receiving rate.

4. A high speed digital communication device according to claim 1 wherein said coefficients represent coefficients of a Finite Impulse Response filter.

5. A high speed digital communication device according to claim 1 wherein each of said plurality of multipliers associated with a respective shift register determines a respective product simultaneously.

6. A high speed digital communication device according to claim 4 wherein said Finite Impulse Response filter is a decimating filter,

wherein at least two shift registers are disposed in series between each of said multipliers.

7. A high speed digital communication device comprising:

a plurality of shift registers containing said data;

a plurality of multipliers, wherein each multiplier is associated with one of said shift registers and each multiplier feeds at least one of a plurality of summers.

8. A high speed digital communication device according to claim 7 wherein each of said multipliers corresponds to a set of coefficients wherein the value of each coefficient in each of said sets of coefficients does not vary over time.

9. A high speed digital correlator device comprising:

a receiver receiving digital data at a receiving rate and separating said digital data into a plurality of sets of data; and

a plurality of processing paths, each path corresponding to one of said sets of data and comprising;

a plurality of shift registers containing said data; and

a plurality of summers, wherein each of said summers determines a sum of products of said data with a set of comparison coefficients, and

wherein each of said processing paths operates at a rate less than said receiving rate.

10. A high speed digital correlator device according to claim 9,

wherein a match is declared if a sum of the results of said comparisons exceeds a predetermined threshold.

11. A high speed digital correlator device according to claim 9,

wherein each of said summers receives data from one of said shift registers.

12. A high speed digital correlator device according to claim 9,

wherein each of said summers receives data from a plurality of said shift registers.

13. A high speed digital demodulator comprising:

a receiver receiving digital data at a receiving rate and separating said digital data into a plurality of sets of data;

a plurality of processing paths, each path corresponding to one of said sets of data;

a phase tracking loop device receiving processed digital data from an intermediate processing device, said processed digital data corresponding to each of said processing paths, said phase tracking loop device comprising;

at least two separate error detectors each error detector receiving a different set of samples of said processed digital data and outputting at least one respective error signal; and

a summer determining a sum of said respective error signals,

wherein said phase tracking loop device operates at a rate lower than said receiving rate.

14. A high speed digital demodulator comprising:

a plurality of processing paths, each path independently processing one of said sets of data; and,

a carrier phase tracking loop receiving alternating samples of processed digital data from said processing paths, said carrier phase tracking loop comprising at least two error detectors detecting phase errors in said processed digital data and said carrier phase tracking loop operating at a rate lower than said receiving rate.

15. A high speed digital demodulator according to claim 14 further comprising;

a phase rotator receiving a phase estimate from said carrier phase tracking loop, said phase rotator comprising;

a plurality of complex multipliers each receiving processed data from a respective one of said processing paths; and

a look up table comprising at least one memory device storing sine and cosine values of said phase estimate,

wherein each of said complex multipliers multiplies its respective processed data by said sine and cosine values of said phase estimate at a rate less than the receiving rate.

16. A high speed digital demodulator according to claim 14 further comprising;

a phase rotator receiving a phase estimate from said carrier phase tracking loop, said phase rotator comprising a plurality of complex multipliers each receiving processed data from a respective one of said processing paths; and

a phase interpolator linearly interpolating said phase estimates and applying said linearly interpolated phase estimates to a plurality of look up tables at a rate less than the receiving rate.

17. A high speed digital demodulator comprising:

a matched filter receiving a stream of digital data separated into a plurality of sets of data, wherein each of said sets of data represents a collection of data samples, wherein each of said collections of samples represents an equal portion of said stream of digital data.

18. A high speed digital demodulator according to claim 17 wherein each of said equal portions of said stream of digital data is processed in a separate data path, each data path comprising:

a plurality of registers each register storing one of said samples for a period of time equal to one clock cycle of a clocking device,

a plurality of multipliers separated into a plurality of subsets of multipliers each subset comprising at least two of said multipliers and each subset of multipliers receiving the contents of only one of said registers; and

a plurality of summers each determining a sum of a plurality of said multipliers.

19. A high speed digital demodulator according to claim 18 wherein each of said plurality of subsets of multipliers is combined into a single shared partial product multiplier.

20. A high speed digital demodulator according to claim 18 wherein each of said subsets of multipliers provides data to at least two of said summers.

21. A high speed digital demodulator according to claim 19 wherein each of said shared partial product multipliers provides data to at least two of said summers.

22. A high speed digital demodulator according to claim 18 wherein each one of said plurality of registers is comprised of at least two secondary registers.

23. A high speed digital demodulator according to claim 19 wherein each one of said plurality of registers is comprised of at least two secondary registers.

24. A high speed digital demodulator comprising;

a phase interpolator receiving phase estimate values at a rate less than the receiving rate and generating interpolated phase values evenly spaced in time between each of said phase estimate values.