INPUT PIPELINE REGISTERS FOR A NODE
IN AN ADAPTIVE COMPUTING ENGINE
CLAIM OF PRIORITY
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/422,063, filed Oct. 28, 2002, entitled "RECONFIGURATION NODE RXN," which is hereby incorporated by reference as if set forth in full in this application.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to the following co-pending U.S. patent applications that are each incorporated by reference as if set forth in full in this application:
"ADAPTABLE DATAPATH FOR A DIGITAL PROCESSING SYSTEM," Ser. No. 10/626,833, filed Jul. 23, 2003 (Our Ref. No. 021202-003710US);
"CACHE FOR INSTRUCTION SET ARCHITECTURE USING INDEXES TO ACHIEVE COMPRESSION," Ser. No. 10/628,083, filed Jul. 24, 2003 (Our Ref. No. 021202003730US);
"ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS," Ser. No. 09/815,122, filed on Mar. 22, 2001.
BACKGROUND OF THE INVENTION
This invention is related in general to digital processing architectures and more specifically to the use of pipeline registers to facilitate improved processing performance.
A basic design for digital signal processor (DSP) 10 architecture is shown in the prior art diagram of FIG. 1A. DSP calculations require many iterations of fast multiply-accumulate and other repetitive operations. Typically, "functional units" such as multipliers, adders, accumulators, shifters, etc. are used to perform the operations. Such functional units are shown as 12, 14 and 16. The functional units obtain instructions and data, such as values, opcodes, operands, etc. (collectively referred to as "data") from main memory 20 that is typically a random access memory (RAM). The DSP system can be included within a chip that resides in a device such as a consumer electronic device, computer, etc. Note that many variations on the design of FIG. 1A are possible. For example, a single functional unit, such as a general-purpose central processing unit (CPU), can be used. Typically, more than one memory storage unit is used, such as separate storage for instructions and data.
In the basic design, the functional units are constantly transferring data to and from memory, other functional units, and other devices, sources and destinations (collectively referred to as "components"). The speed at which data can be transferred among various components in the architecture design is a primary factor in determining the speed and efficiency of the overall design.
Since accesses to main memory (or external cache or other storage) are relatively slow and require using bus interface logic, one approach to improve performance is the use of bus register file interface 40 and bus register file 42. Bus register file 42 allows data to be stored proximately to, and in association with, the localized bus 32 so that accessing the main memory is not necessary for frequently-needed values. However, this approach still places limitations on access times and system performance, as explained below in connection with FIG. 1B.
FIG. 1B illustrates a basic pipelined instruction cycle having fetch, decode and execute stages. In FIG. 1B, pipelines 50, 60 and 70 allow concurrent execution of each stage in a manner that is known in the art. Thus, each of the fetch, decode and execute stages for different instructions or operations can be executed in the same clock cycle. This allows, e.g., the decode stage of instruction 52 to be executed at the same time as the fetch stage of instruction 62.
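The overlap of stages described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which one instruction issues per clock cycle and each stage takes exactly one cycle. The function name `pipeline_schedule` and the labels `instr52`, `instr62` and `instr72` are hypothetical and chosen only for illustration; `instr72` stands for an unnumbered instruction in pipeline 70.

```python
# Illustrative model only: one instruction issues per cycle, one cycle per stage.
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(instructions):
    """Return {cycle: {instruction: stage}} for an ideal three-stage pipeline."""
    schedule = {}
    for issue_cycle, name in enumerate(instructions):
        for offset, stage in enumerate(STAGES):
            # An instruction issued at cycle N is in stage k at cycle N + k.
            schedule.setdefault(issue_cycle + offset, {})[name] = stage
    return schedule


# Hypothetical labels: one instruction per pipeline (50, 60, 70).
schedule = pipeline_schedule(["instr52", "instr62", "instr72"])
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
```

In this sketch, cycle 1 shows `instr52` in its decode stage while `instr62` is in its fetch stage, which corresponds to the concurrency described for instructions 52 and 62.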
One drawback of this approach is that a result of an instruction is not available until the end of the execute cycle when the data is computed and stored back into a bus register. Since data is available to an instruction after the fetch stage, an instruction executing in a different pipeline may have to wait for one or more cycles before the data result of a different instruction is available. For example, FIG. 1B shows instruction 52 completing its execute stage at a time designated by line 54. However, at this time instruction 62 is past its fetch stage and so instruction 62 is delayed within the pipeline and its stages are repeated as instruction 64 so that a fetch stage is executed to obtain the data. Alternatively, the fetch stage of instruction 62 can be flushed, rescheduled, suspended, or affected in other ways, until after execution of instruction 52's execute stage. In any case, the inability of instruction 62 to have needed data at the time of its execute stage causes delays and inefficiencies in processing. The use of additional pipelines, such as pipeline 70, can compound and further complicate data accesses.
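The delay described above can be estimated with a simple model. The following is a sketch under stated assumptions, not a description of any particular processor: each stage takes one cycle, a result written at the end of the execute cycle is first usable in the following cycle, and the dependent instruction needs the value at the start of the stage in which it obtains its data. The function name `stall_cycles` and its parameters are hypothetical.

```python
# Stage offsets, in cycles, from the cycle in which an instruction is issued.
FETCH, DECODE, EXECUTE = 0, 1, 2


def stall_cycles(producer_issue, consumer_issue, consume_stage=FETCH):
    """Cycles a dependent instruction must wait for a prior result.

    Assumption: the producer's result is stored at the end of its execute
    cycle, so it is first usable in the cycle after that.
    """
    ready_cycle = producer_issue + EXECUTE + 1
    needed_cycle = consumer_issue + consume_stage
    return max(0, ready_cycle - needed_cycle)


# Consumer issued one cycle after the producer, data obtained at fetch.
print(stall_cycles(0, 1, FETCH))    # 2
# Same pair, but the value is only needed at the consumer's execute stage.
print(stall_cycles(0, 1, EXECUTE))  # 0
```

Under these assumptions, a dependent instruction that obtains its data at the fetch stage waits two cycles, whereas one that can accept the value as late as its execute stage does not wait at all.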
Thus, it is desirable to provide a design that improves data accesses in a digital processing architecture.
SUMMARY OF THE INVENTION
The present invention includes input pipeline registers at inputs to different functional units. Pipeline registers are used to hold last-accessed values at various inputs and onto various buses and data lines. A preferred embodiment also allows pipeline registers to immediately place commonly needed constant values, such as zero or one, onto inputs and data lines. This approach can reduce the time to obtain data values and conserve power by avoiding slower and more complex memory or storage accesses such as via an arbitrated bus.
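The behavior of an input pipeline register can be sketched in software as follows. This is an illustrative model only; the class name `InputPipelineRegister`, the method `clock`, and the control encodings "load", "hold", "zero" and "one" are assumptions made for the example and are not taken from the specification.

```python
class InputPipelineRegister:
    """Software model of a register placed at a functional unit input."""

    # Hypothetical control encodings that force a constant onto the input.
    CONSTANTS = {"zero": 0, "one": 1}

    def __init__(self):
        self.value = 0  # value currently presented to the functional unit

    def clock(self, bus_value, control="load"):
        """Advance one processor cycle and return the value at the input."""
        if control == "load":
            # Latch whatever is on the bus during this cycle.
            self.value = bus_value
        elif control in self.CONSTANTS:
            # Present a commonly needed constant without a bus access.
            self.value = self.CONSTANTS[control]
        elif control != "hold":
            raise ValueError(f"unknown control: {control!r}")
        # For "hold", the last-accessed value is kept and the bus is ignored.
        return self.value


reg = InputPipelineRegister()
outputs = [
    reg.clock(7, "load"),  # latch 7 from the bus
    reg.clock(9, "hold"),  # bus now carries 9, input still sees 7
    reg.clock(9, "zero"),  # constant zero placed on the input
    reg.clock(3, "one"),   # constant one placed on the input
    reg.clock(5, "load"),  # latch 5 from the bus
]
print(outputs)  # [7, 7, 0, 1, 5]
```

In this model, the "hold" and constant controls supply a value to the functional unit without any new transfer over the bus, which is the saving in access time that the approach is intended to provide.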
Another embodiment of the invention allows data values to be obtained earlier during pipelined execution of instructions. For example, in a three-stage fetch-decode-execute type of reduced instruction set computer (RISC), a data value can be ready from a prior instruction at the decode or execute stage of a subsequent instruction.
A specific embodiment of the invention provides a digital processor including a clock signal for determining a processor cycle, the digital processor comprising one or more functional units coupled by a bus, wherein the one or more functional units include functional unit inputs; at least one input register coupled between the bus and at least one functional unit input; and a control signal for selectively causing the at least one input register to hold a data value from the bus for one or more processor cycles.
Another embodiment of the invention provides a method for providing data in a digital processor, the method comprising including input registers at inputs to functional units, wherein the input registers are coupled to a bus for obtaining data from the bus; and including a control signal for selectively causing the input registers to hold a data value from the bus for one or more processor cycles.