US7568070B2 - Instruction cache having fixed number of variable length instructions - Google Patents


Info

Publication number
US7568070B2
Authority
US
United States
Prior art keywords
instruction
instructions
cache
cache line
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/193,547
Other versions
US20070028050A1 (en
Inventor
Jeffrey Todd Bridges
James Norris Dieffenderfer
Rodney Wayne Smith
Thomas Andrew Sartorius
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Priority to US11/193,547 (US7568070B2)
Assigned to QUALCOMM INCORPORATED, A DELAWARE CORPORATION. Assignment of assignors interest (see document for details). Assignors: BRIDGES, JEFFREY TODD, DIEFFENDERFER, JAMES NORRIS, SARTORIUS, THOMAS ANDREW, SMITH, RODNEY WAYNE
Priority to EP06788854A (EP1910919A2)
Priority to JP2008524216A (JP4927840B2)
Priority to CNA2006800343645A (CN101268440A)
Priority to CN201510049939.1A (CN104657110B)
Priority to KR1020087004751A (KR101005633B1)
Priority to PCT/US2006/029523 (WO2007016393A2)
Publication of US20070028050A1
Publication of US7568070B2
Application granted
Priority to JP2011237313A (JP5341163B2)
Active
Adjusted expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30149Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3816Instruction alignment, e.g. cache line crossing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3818Decoding for concurrent execution
    • G06F9/382Pipelined decoding, e.g. using predecoding


Abstract

A fixed number of variable-length instructions are stored in each line of an instruction cache. The variable-length instructions are aligned along predetermined boundaries. Since the length of each instruction in the line, and hence the span of memory the instructions occupy, is not known, the address of the next following instruction is calculated and stored with the cache line. Ascertaining the instruction boundaries, aligning the instructions, and calculating the next fetch address are performed in a predecoder prior to placing the instructions in the cache.

Description

BACKGROUND
The present invention relates generally to the field of processors and in particular to a processor having an instruction cache storing a fixed number of variable length instructions.
Microprocessors perform computational tasks in a wide variety of applications, including portable electronic devices. In many cases, maximizing processor performance is a major design goal, to permit additional functions and features to be implemented in portable electronic devices and other applications. Additionally, power consumption is of particular concern in portable electronic devices, which have limited battery capacity. Hence, processor designs that increase performance and reduce power consumption are desirable.
Most modern processors employ one or more instruction execution pipelines, wherein the execution of many multi-step sequential instructions is overlapped to improve overall processor performance. Capitalizing on the spatial and temporal locality properties of most programs, recently executed instructions are stored in a cache—a high-speed, usually on-chip memory—for ready access by the execution pipeline.
Many processor Instruction Set Architectures (ISA) include variable length instructions. That is, the instruction op codes read from memory do not all occupy the same amount of space. This may result from the inclusion of operands with arithmetic or logical instructions, the amalgamation of multiple operations into a Very Long Instruction Word (VLIW), or other architectural features. One disadvantage to variable length instructions is that, upon fetching instructions from an instruction cache, the processor must ascertain the boundaries of each instruction, a computational task that consumes power and reduces performance.
One approach known in the art to improving instruction cache access in the presence of variable length instructions is to “pre-decode” the instructions prior to storing them in the cache, and additionally store some instruction boundary information in the cache line along with the instructions. This reduces, but does not eliminate, the additional computational burden of ascertaining instruction boundaries that is placed on the decode task.
Also, by packing instructions into the cache in the same compact form that they are read from memory, instructions are occasionally misaligned, with part of an instruction being stored at the end of one cache line and the remainder stored at the beginning of a successive cache line. Fetching this instruction requires two cache accesses, further reducing performance and increasing power consumption, particularly as the two accesses are required each time the instruction executes.
FIG. 1 depicts a representative diagram of two lines 100, 140 of a prior art instruction cache storing variable length instructions (I1-I9). In this representative example, each cache line comprises sixteen bytes, and a 32-bit word size is assumed. Most instructions are a word width, or four bytes. Some instructions are of half-word width, comprising two bytes. A first cache line 100 and associated tag field 120 contain instructions I1 through I4, and half of instruction I5. A second cache line 140, with associated tag field 160, contains the second half of instruction I5, and instructions I6 through I9. The instruction lengths and their addresses are summarized in the following table:
TABLE 1
Variable Length Instructions in Prior Art Cache

Instruction  Size      Address  Alignment
I1           word      0x1A0    aligned on word boundary
I2           word      0x1A4    aligned on word boundary
I3           halfword  0x1A8    aligned on word boundary
I4           word      0x1AA    misaligned across word boundaries
I5           word      0x1AE    misaligned across cache lines
I6           word      0x1B2    misaligned across word boundaries
I7           word      0x1B6    misaligned across word boundaries
I8           halfword  0x1BA    not aligned on word boundary
I9           word      0x1BC    aligned on word boundary

To read these instructions from the cache lines 100, 140, the processor must expend additional computational effort—at the cost of power consumption and delay—to determine the instruction boundaries. While this task may be assisted by pre-decoding the instructions and storing boundary information in or associated with the cache lines 100, 140, the additional computation is not obviated. Additionally, a fetch of instruction I5 will require two cache accesses. This dual access to fetch a misaligned instruction from the cache causes additional power consumption and processor delay.
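By way of illustration only, the alignment column of Table 1 can be reproduced from the sizes and addresses with simple arithmetic. The following Python sketch is not part of the disclosure; the function name `classify` and the constant names are assumptions, while the 16-byte line and 4-byte word sizes are taken from the example above.

```python
WORD = 4    # bytes per 32-bit word (from the example)
LINE = 16   # bytes per cache line (from the example)

# (name, size in bytes, address) as listed in Table 1
TABLE_1 = [
    ("I1", 4, 0x1A0), ("I2", 4, 0x1A4), ("I3", 2, 0x1A8),
    ("I4", 4, 0x1AA), ("I5", 4, 0x1AE), ("I6", 4, 0x1B2),
    ("I7", 4, 0x1B6), ("I8", 2, 0x1BA), ("I9", 4, 0x1BC),
]

def classify(size, addr):
    """Describe how an instruction sits relative to word and line boundaries."""
    last = addr + size - 1                 # address of the final byte
    if addr // LINE != last // LINE:
        return "misaligned across cache lines"
    if addr // WORD != last // WORD:
        return "misaligned across word boundaries"
    if addr % WORD:
        return "not aligned on word boundary"
    return "aligned on word boundary"

for name, size, addr in TABLE_1:
    print(f"{name} {addr:#x} {classify(size, addr)}")

# Instructions whose fetch needs two cache accesses in the prior art layout.
two_access = [name for name, size, addr in TABLE_1
              if classify(size, addr) == "misaligned across cache lines"]
```

In this layout only I5 ends up in `two_access`, which corresponds to the dual cache access described above.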
SUMMARY
A fixed number of variable-length instructions are stored in each line of an instruction cache. The variable-length instructions are aligned along predetermined boundaries. Since the length of each instruction in the line, and hence the span of memory the instructions occupy, is not known, the address of the next following instruction is calculated and stored with the cache line. Ascertaining the instruction boundaries, aligning the instructions, and calculating the next fetch address are performed in a predecoder prior to placing the instructions in the cache.
In one embodiment, a method of cache management in a processor having variable instruction length comprises storing a fixed number of instructions per cache line.
In another embodiment, a processor includes an instruction execution pipeline operative to execute instructions of variable length and an instruction cache operative to store a fixed number of the variable length instructions per cache line. The processor additionally includes a predecoder operative to align the variable length instructions along predetermined boundaries prior to writing the instructions into a cache line.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a diagram of a prior art instruction cache storing variable length instructions.
FIG. 2 is a functional block diagram of a processor.
FIG. 3 is a diagram of an instruction cache storing a fixed number of variable length instructions, aligned along predetermined boundaries.
DETAILED DESCRIPTION
FIG. 2 depicts a functional block diagram of a representative processor 10, employing both a pipelined architecture and a hierarchical memory structure. The processor 10 executes instructions in an instruction execution pipeline 12 according to control logic 14. The pipeline includes various registers or latches 16, organized in pipe stages, and one or more Arithmetic Logic Units (ALU) 18. A General Purpose Register (GPR) file 20 provides registers comprising the top of the memory hierarchy.
The pipeline fetches instructions from an Instruction Cache (I-cache) 22, with memory addressing and permissions managed by an Instruction-side Translation Lookaside Buffer (ITLB) 24. A pre-decoder 21 inspects instructions fetched from memory prior to storing them in the I-cache 22. As discussed below, the pre-decoder 21 ascertains instruction boundaries, aligns the instructions, and calculates a next fetch address, which is stored in the I-cache 22 with the instructions.
Data is accessed from a Data Cache (D-cache) 26, with memory addressing and permissions managed by a main Translation Lookaside Buffer (TLB) 28. In various embodiments, the ITLB 24 may comprise a copy of part of the TLB 28. Alternatively, the ITLB 24 and TLB 28 may be integrated. Similarly, in various embodiments of the processor 10, the I-cache 22 and D-cache 26 may be integrated, or unified. Misses in the I-cache 22 and/or the D-cache 26 cause an access to main (off-chip) memory 32, under the control of a memory interface 30.
The processor 10 may include an Input/Output (I/O) interface 34, controlling access to various peripheral devices 36. Those of skill in the art will recognize that numerous variations of the processor 10 are possible. For example, the processor 10 may include a second-level (L2) cache for either or both the I and D caches 22, 26. In addition, one or more of the functional blocks depicted in the processor 10 may be omitted from a particular embodiment.
According to one or more embodiments disclosed herein, the processor 10 stores a fixed number of variable length instructions in each cache line. The instructions are preferably aligned along predetermined boundaries, such as, for example, word boundaries. This relieves the decode pipe stage of the necessity of calculating instruction boundaries, allowing higher speed operation and thus improving processor performance. Storing instructions this way in the I-cache 22 also reduces power consumption by performing the instruction length detection and alignment operations only once, when the instructions are placed in the cache. As I-cache 22 hit rates are commonly in the high 90% range, considerable power savings may be realized by eliminating the need to ascertain instruction boundaries every time an instruction is executed from the I-cache 22.
The pre-decoder 21 comprises logic interposed in the path between main memory 32 and the I-cache 22. The pre-decoder 21 logic inspects the data retrieved from memory, and ascertains the number and length of instructions. The pre-decoder aligns the instructions along predetermined, e.g., word, boundaries, prior to passing the aligned instructions to the cache to be stored in a cache line.
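The alignment step performed by the pre-decoder 21 may be sketched in software as follows. This is a minimal illustration under stated assumptions, not the disclosed hardware: the names `predecode` and `SLOTS_PER_LINE` are hypothetical, and the instruction lengths are passed in as an argument because the description does not specify how a particular ISA encodes instruction length.

```python
WORD = 4             # slot size in bytes: instructions are word-aligned
SLOTS_PER_LINE = 4   # fixed number of instructions per line (assumed, as in FIG. 3)

def predecode(memory, start, lengths):
    """Pack variable length instructions into lines of SLOTS_PER_LINE word slots.

    memory  -- bytes-like object holding the compact instruction stream
    start   -- byte offset of the first instruction in `memory`
    lengths -- length in bytes (2 or 4) of each instruction, in order; how the
               lengths are determined is ISA-specific and assumed given here
    Returns a list of cache lines, each a list of (instruction_bytes, padding).
    """
    lines, current, pos = [], [], start
    for n in lengths:
        if n > WORD:
            raise ValueError("instruction longer than one slot")
        insn = bytes(memory[pos:pos + n])
        current.append((insn, WORD - n))   # padding bytes remain unused
        pos += n
        if len(current) == SLOTS_PER_LINE:
            lines.append(current)
            current = []
    if current:                            # partially filled final line
        lines.append(current)
    return lines

# I1..I8 of Table 1: word, word, halfword, word, word, word, word, halfword.
stream = bytes(range(28))                  # placeholder instruction bytes
lines = predecode(stream, 0, [4, 4, 2, 4, 4, 4, 4, 2])
```

With these inputs the sketch yields two lines of four slots each, with two bytes of padding after each of the two halfword instructions.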
FIG. 3 depicts two representative lines 200, 260 of the I-cache 22, each containing a fixed number of the variable length instructions from FIG. 1 (in this example, four instructions are stored in each cache line 200, 260). The cache lines 200, 260 are 16 bytes. Word boundaries are indicated by dashed lines; halfword boundaries are indicated by dotted lines. The instructions are aligned along word boundaries (i.e., each instruction starts at a word address). When an instruction is fetched from the I-cache 22 by the pipeline 12, the decode pipe stage may simply multiplex the relevant word from the cache line 200, 260 and immediately begin decoding the op code. In the case of half-word instructions (e.g., I3 and I8), one half-word of space in the cache line 200, 260, respectively, is unused, as indicated in FIG. 3 by shading.
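Because every instruction starts at a word address, selecting an instruction from a line reduces to indexing a slot, with no scan for boundaries. A minimal Python sketch of that selection follows; the function name `select_slot` and the placeholder byte values are assumptions made for illustration only.

```python
WORD = 4   # bytes per slot

def select_slot(line, slot):
    """Return the word-aligned slot `slot` of a cache line given as bytes."""
    if not 0 <= slot < len(line) // WORD:
        raise IndexError("slot out of range")
    return line[slot * WORD:(slot + 1) * WORD]

# Line 200 of FIG. 3 modeled with placeholder bytes; 0x00 marks the unused
# halfword that follows the halfword instruction I3.
line_200 = bytes([0x11] * 4 + [0x22] * 4 + [0x33] * 2 + [0x00] * 2 + [0x44] * 4)
i3_slot = select_slot(line_200, 2)   # I3 followed by its unused halfword
i4_slot = select_slot(line_200, 3)   # I4 occupies the full slot
```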
Note that, as compared to the prior art cache depicted in FIG. 1, the cache 22 of FIG. 3 stores only eight instructions in two cache lines, rather than nine. The word space corresponding to the length of I9 (the halfwords at offsets 0x0A and 0x1E) is not utilized. This decrease in the efficiency of storing instructions in the cache 22 is the price of the simplicity, improved processor performance, and lower power consumption of the cache utilization depicted in FIG. 3.
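The storage cost noted above can be checked with a short calculation. The Python sketch below uses the instruction sizes of Table 1; the variable names are assumptions introduced for illustration.

```python
LINE = 16   # bytes per cache line
WORD = 4    # bytes per slot

SIZES = {"I1": 4, "I2": 4, "I3": 2, "I4": 4, "I5": 4,
         "I6": 4, "I7": 4, "I8": 2, "I9": 4}

prior_art_bytes = sum(SIZES.values())          # FIG. 1: nine instructions, packed
stored = ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8"]
aligned_bytes = sum(SIZES[n] for n in stored)  # FIG. 3: eight instructions
unused_bytes = 2 * LINE - aligned_bytes

# Offset, within the two-line block, of each unused halfword: the slot base
# plus the length of the short instruction occupying that slot.
unused_offsets = [slot * WORD + SIZES[n]
                  for slot, n in enumerate(stored) if SIZES[n] < WORD]

utilization = aligned_bytes / (2 * LINE)
```

Here `unused_bytes` equals one word, the length of I9, and `unused_offsets` evaluates to the two halfword offsets 0x0A and 0x1E named in the text.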
Additionally, by allocating a fixed number of variable length instructions to a cache line 200, 260, and aligning the instructions along predetermined boundaries, no instruction is stored misaligned across cache lines, such as I5 in FIG. 1. Thus, the performance penalty and excess power consumption caused by two cache 22 accesses to retrieve a single instruction are completely obviated.
Because a fixed number of variable length instructions is stored, rather than a variable number of instructions having a known total length (the length of the cache line), the address of the next sequential instruction cannot be ascertained by simply incrementing the tag 220 of one cache line 200 by the memory size of the cache line 200. Accordingly, in one embodiment, a next fetch address is calculated by the pre-decoder 21 when the instructions are aligned (prior to storing them in the I-cache 22), and the next fetch address is stored in a field 240 along with the cache line 200.
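The next fetch address calculation may be illustrated with the addresses of Table 1. The Python sketch below assumes that the tag of a line corresponds to the memory address of the first instruction stored in it; that assumption, and the function name `next_fetch_address`, are introduced here for illustration and are not stated in the description.

```python
LINE = 16   # bytes per cache line

def next_fetch_address(first_addr, lengths):
    """Memory address of the instruction following those stored in a line.

    first_addr -- memory address of the first instruction in the line
    lengths    -- length in bytes of each instruction stored in the line
    """
    return first_addr + sum(lengths)

# Line 200 holds I1..I4 (word, word, halfword, word), starting at 0x1A0.
line_200_next = next_fetch_address(0x1A0, [4, 4, 2, 4])
# Line 260 holds I5..I8 (word, word, word, halfword), starting where 200 ends.
line_260_next = next_fetch_address(line_200_next, [4, 4, 4, 2])

# Simply adding the line size to the first address would overshoot I5.
naive_next = 0x1A0 + LINE
```

With the Table 1 values, `line_200_next` is 0x1AE (the address of I5) and `line_260_next` is 0x1BC (the address of I9), whereas `naive_next` is 0x1B0, which falls inside I5.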
As an alternative to calculating and storing a next fetch address, according to one embodiment an offset from the tag 220 may be calculated and stored along with the cache line 200, such as in an offset field 240. The next fetch address may then be easily calculated by adding the offset to the tag address. This embodiment incurs the processing delay and power consumption of performing this addition each time a successive address fetch crosses a cache line. In other embodiments, other information may be stored to assist in the calculation of the next fetch address. For example, a set of bits equal in number to the fixed number of instructions in a cache line 200 may be stored, with, e.g., a one indicating a fullword length instruction and a zero indicating a halfword length instruction stored in the corresponding instruction “slot.” The addresses of the instructions in memory, and hence the address of the next sequential instruction, may then be calculated from this information. Those of skill in the art will readily recognize that additional next address calculation aids may be devised and stored to calculate the next instruction fetch address.
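The two alternatives described above, a stored offset and a per-slot length bit, may be sketched as follows. This Python illustration is hypothetical: the function names are assumptions, and the tag is assumed to correspond to the memory address of the first instruction in the line.

```python
WORD, HALFWORD = 4, 2

def offset_for(lengths):
    """Offset stored with the line: total memory span of its instructions."""
    return sum(lengths)

def next_from_offset(tag_addr, offset):
    """Next fetch address recovered by one addition at fetch time."""
    return tag_addr + offset

def addresses_from_bits(tag_addr, bits):
    """Recover instruction addresses from one length bit per slot.

    bits -- 1 for a fullword instruction, 0 for a halfword instruction
    Returns (list of memory addresses per slot, next fetch address).
    """
    addrs, addr = [], tag_addr
    for bit in bits:
        addrs.append(addr)
        addr += WORD if bit else HALFWORD
    return addrs, addr

# Line 200 of FIG. 3: I1 (word), I2 (word), I3 (halfword), I4 (word).
offset_200 = offset_for([4, 4, 2, 4])
via_offset = next_from_offset(0x1A0, offset_200)
slot_addrs, via_bits = addresses_from_bits(0x1A0, [1, 1, 0, 1])
```

Both approaches arrive at the same next fetch address for line 200; the bit vector additionally recovers the original memory address of each instruction in the line, at the cost of more computation at fetch time.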
While various embodiments have been explicated herein with respect to a representative ISA including word and halfword instruction lengths, the present invention is not limited to these embodiments. In general, any variable length instructions may be advantageously stored in an instruction cache 22 in a fixed number, aligned along predetermined boundaries. Additionally, a different size cache line 200, 260 than that depicted herein may be utilized in the practice of various embodiments.
Although embodiments of the present invention have been described herein with respect to particular features, aspects and embodiments thereof, it will be apparent that numerous variations, modifications, and other embodiments are possible within the broad scope of the present invention, and accordingly, all variations, modifications and embodiments are to be regarded as being within the scope of the invention. The present embodiments are therefore to be construed in all aspects as illustrative and not restrictive and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

Claims (20)

1. A method of cache management in a processor having variable instruction length, the method comprising:
storing a first plurality of instructions comprising a fixed number of instructions in a first cache line of a cache, the first plurality of instructions including a first instruction having a first instruction length and a second instruction having a second instruction length that differs from the first instruction length;
storing a second plurality of instructions comprising the fixed number of instructions in a second cache line of the cache; and
storing in the cache and adjacent to the first cache line one of:
a next fetch address; and
an offset yielding the next fetch address when added to a cache line tag associated with the first cache line.
2. The method of claim 1, further comprising inspecting each of the first plurality of instructions to determine their corresponding lengths and aligning each of the first plurality of instructions along predetermined boundaries.
3. The method of claim 1, further comprising, prior to placing the first plurality of instructions in the first cache line, determining at least one of:
the next fetch address; and
the offset.
4. The method of claim 1, further comprising ascertaining a corresponding predetermined instruction boundary associated with each of the first plurality of instructions.
5. The method of claim 1, further comprising aligning each of the first plurality of instructions with a corresponding predetermined boundary of the first cache line.
6. The method of claim 5, wherein each predetermined boundary is a word boundary having an associated word address.
7. The method of claim 1, wherein after storing the fixed number of instructions, the first cache line includes an unoccupied portion.
8. The method of claim 1, wherein the fixed number is four.
9. A processor comprising:
an instruction cache comprising:
a first cache line to store a predetermined number of instructions including a first instruction and a second instruction, wherein the first instruction has a first instruction length and the second instruction has a second instruction length that differs from the first instruction length; and
a second cache line to store the predetermined number of instructions;
wherein the first cache line is further to store one of:
a next fetch address; and
an offset yielding the next fetch address when added to a cache line tag associated with the first cache line.
10. The processor of claim 9, further comprising a predecoder operative to align each of the first instruction and the second instruction with a corresponding predetermined boundary of the first cache line.
11. The processor of claim 10, wherein the predecoder is operative to calculate the next fetch address of a next instruction following a final instruction written to the first cache line, and to store in the first cache line one of the next fetch address of the next instruction and the offset.
12. The processor of claim 9, further comprising a predecoder operative to:
ascertain a corresponding length of each instruction;
align each instruction along a corresponding word boundary of the cache; and
pass the aligned instructions to the cache.
13. The processor of claim 9, further comprising an instruction execution pipeline operative to execute instructions having varying lengths.
14. A method of cache management in a processor having variable instruction length, the method comprising:
determining a fixed number of instructions storable per cache line of a cache comprising a plurality of cache lines, wherein the fixed number is greater than one; and
storing a plurality of instructions including the fixed number of instructions in a first cache line, the plurality of instructions including a first instruction having a first instruction length and a second instruction having a second instruction length that differs from the first instruction length.
15. The method of claim 14, further comprising inspecting each of the plurality of instructions to determine a corresponding length of each instruction, and aligning each of the plurality of instructions along a corresponding predetermined boundary prior to placement in the cache.
16. The method of claim 14, wherein each of the plurality of instructions has a corresponding instruction length that does not exceed a word length associated with the first cache line.
17. The method of claim 15, wherein at least one predetermined boundary is a word boundary.
18. The method of claim 14, further comprising storing a next fetch address in the cache and adjacent to the first cache line.
19. The method of claim 18, further comprising determining the next fetch address prior to storing the plurality of instructions in the first cache line.
20. The method of claim 14, further comprising storing an offset in the cache and adjacent to the first cache line, the offset yielding a next fetch address when added to a cache line tag associated with the first cache line.
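The storage scheme recited in the claims above can be illustrated with a short Python sketch. It is a minimal model under stated assumptions, not the patented implementation: the names `CacheLine`, `fill_line`, and `toy_length` are hypothetical; the word size is assumed to be four bytes; the cache line tag is assumed to be the memory address of the first instruction in the line; and the length encoding (high bit of the first byte selects a four-byte instruction, otherwise two bytes) is invented purely for the example. Each line receives the same fixed number of instructions (four, as in claim 8), each instruction is padded to a word boundary, and the line records the offset that yields the next fetch address when added to the tag.

```python
from dataclasses import dataclass
from typing import List

WORD_BYTES = 4       # assumed word size; one word-aligned slot per instruction
INSNS_PER_LINE = 4   # the fixed number of instructions per line (four, as in claim 8)


@dataclass
class CacheLine:
    tag: int                # assumption: address of the first instruction in the line
    slots: List[bytes]      # each instruction padded out to a word boundary
    lengths: List[int]      # original instruction lengths in bytes
    next_fetch_offset: int  # added to the tag, this yields the next fetch address

    @property
    def next_fetch_address(self) -> int:
        return self.tag + self.next_fetch_offset

    @property
    def unoccupied_bytes(self) -> int:
        # Padding left over because short instructions do not fill their slot.
        return sum(WORD_BYTES - n for n in self.lengths)


def toy_length(memory: bytes, addr: int) -> int:
    """Hypothetical encoding: high bit of the first byte set means 4 bytes, else 2."""
    return 4 if memory[addr] & 0x80 else 2


def fill_line(memory: bytes, start: int, length_of=toy_length) -> CacheLine:
    """Predecode exactly INSNS_PER_LINE instructions starting at `start`."""
    slots, lengths = [], []
    addr = start
    for _ in range(INSNS_PER_LINE):
        if addr >= len(memory):
            raise ValueError("ran out of instructions before the line was full")
        n = length_of(memory, addr)
        if n > WORD_BYTES or addr + n > len(memory):
            raise ValueError("instruction does not fit in a slot or in memory")
        # Align the instruction to its word slot by padding with zero bytes.
        slots.append(memory[addr:addr + n].ljust(WORD_BYTES, b"\x00"))
        lengths.append(n)
        addr += n
    return CacheLine(tag=start, slots=slots, lengths=lengths,
                     next_fetch_offset=addr - start)


if __name__ == "__main__":
    memory = bytes([
        0x01, 0x02,              # 2-byte instruction
        0x80, 0x03, 0x04, 0x05,  # 4-byte instruction
        0x06, 0x07,              # 2-byte instruction
        0x08, 0x09,              # 2-byte instruction
        0x81, 0x0A, 0x0B, 0x0C,  # 4-byte instruction
        0x82, 0x0D, 0x0E, 0x0F,  # 4-byte instruction
        0x10, 0x11,              # 2-byte instruction
        0x83, 0x12, 0x13, 0x14,  # 4-byte instruction
    ])
    first = fill_line(memory, 0)
    second = fill_line(memory, first.next_fetch_address)
    print(first.lengths, first.next_fetch_address, first.unoccupied_bytes)
    print(second.lengths, second.next_fetch_address, second.unoccupied_bytes)
```

In this sketch the first line holds instructions of lengths 2, 4, 2 and 2 bytes, so its next fetch address is 10 and six bytes of the line remain unoccupied. The second line is filled starting from that stored address, without re-scanning the instruction stream to find where the first line ended.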
US11/193,547 2005-07-29 2005-07-29 Instruction cache having fixed number of variable length instructions Active 2026-09-27 US7568070B2 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US11/193,547 US7568070B2 (en) 2005-07-29 2005-07-29 Instruction cache having fixed number of variable length instructions
CN201510049939.1A CN104657110B (en) 2005-07-29 2006-07-26 Instruction cache with fixed number of variable length instructions
JP2008524216A JP4927840B2 (en) 2005-07-29 2006-07-26 Instruction cache with a fixed number of variable-length instructions
CNA2006800343645A CN101268440A (en) 2005-07-29 2006-07-26 Instruction cache having fixed number of variable length instructions
EP06788854A EP1910919A2 (en) 2005-07-29 2006-07-26 Instruction cache having fixed number of variable length instructions
KR1020087004751A KR101005633B1 (en) 2005-07-29 2006-07-26 Instruction cache having fixed number of variable length instructions
PCT/US2006/029523 WO2007016393A2 (en) 2005-07-29 2006-07-26 Instruction cache having fixed number of variable length instructions
JP2011237313A JP5341163B2 (en) 2005-07-29 2011-10-28 Instruction cache with a fixed number of variable-length instructions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/193,547 US7568070B2 (en) 2005-07-29 2005-07-29 Instruction cache having fixed number of variable length instructions

Publications (2)

Publication Number Publication Date
US20070028050A1 (en) 2007-02-01
US7568070B2 (en) 2009-07-28

Family

ID=37451109

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/193,547 Active 2026-09-27 US7568070B2 (en) 2005-07-29 2005-07-29 Instruction cache having fixed number of variable length instructions

Country Status (6)

Country Link
US (1) US7568070B2 (en)
EP (1) EP1910919A2 (en)
JP (2) JP4927840B2 (en)
KR (1) KR101005633B1 (en)
CN (2) CN101268440A (en)
WO (1) WO2007016393A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916251B2 (en) 2014-12-01 2018-03-13 Samsung Electronics Co., Ltd. Display driving apparatus and cache managing method thereof

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7337272B2 (en) * 2006-05-01 2008-02-26 Qualcomm Incorporated Method and apparatus for caching variable length instructions
US8898437B2 (en) * 2007-11-02 2014-11-25 Qualcomm Incorporated Predecode repair cache for instructions that cross an instruction cache line
US8627017B2 (en) * 2008-12-30 2014-01-07 Intel Corporation Read and write monitoring attributes in transactional memory (TM) systems
US9753858B2 (en) * 2011-11-30 2017-09-05 Advanced Micro Devices, Inc. DRAM cache with tags and data jointly stored in physical rows
JP5968693B2 (en) * 2012-06-25 2016-08-10 ルネサスエレクトロニクス株式会社 Semiconductor device
US10001993B2 (en) 2013-08-08 2018-06-19 Linear Algebra Technologies Limited Variable-length instruction buffer management
US11768689B2 (en) 2013-08-08 2023-09-26 Movidius Limited Apparatus, systems, and methods for low power computational imaging
US10853074B2 (en) * 2014-05-01 2020-12-01 Netronome Systems, Inc. Table fetch processor instruction using table number to base address translation
CN106796504B (en) * 2014-07-30 2019-08-13 线性代数技术有限公司 Method and apparatus for managing variable length instruction
CN106528450B (en) * 2016-10-27 2019-09-17 上海兆芯集成电路有限公司 Extracting data in advance and the device for using the method
CN108415729A (en) * 2017-12-29 2018-08-17 北京智芯微电子科技有限公司 A kind of processing method and processing device of cpu instruction exception
CN110750303B (en) * 2019-09-25 2020-10-20 支付宝(杭州)信息技术有限公司 Pipelined instruction reading method and device based on FPGA

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5179680A (en) * 1987-04-20 1993-01-12 Digital Equipment Corporation Instruction storage and cache miss recovery in a high speed multiprocessing parallel processing apparatus
US5488710A (en) * 1991-02-08 1996-01-30 Fujitsu Limited Cache memory and data processor including instruction length decoding circuitry for simultaneously decoding a plurality of variable length instructions
US6035387A (en) 1997-03-18 2000-03-07 Industrial Technology Research Institute System for packing variable length instructions into fixed length blocks with indications of instruction beginning, ending, and offset within block
WO2001027749A1 (en) 1999-10-14 2001-04-19 Advanced Micro Devices, Inc. Apparatus and method for caching alignment information
US6253287B1 (en) 1998-09-09 2001-06-26 Advanced Micro Devices, Inc. Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions
US6253309B1 (en) 1998-09-21 2001-06-26 Advanced Micro Devices, Inc. Forcing regularity into a CISC instruction set by padding instructions
US6530013B1 (en) * 1998-12-17 2003-03-04 Fujitsu Limited Instruction control apparatus for loading plurality of instructions into execution stage
US6779100B1 (en) * 1999-12-17 2004-08-17 Hewlett-Packard Development Company, L.P. Method and device for address translation for compressed instructions

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996029645A1 (en) * 1995-03-23 1996-09-26 International Business Machines Corporation Object-code compatible representation of very long instruction word programs
JP3750821B2 (en) * 1996-05-15 2006-03-01 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴイ VLIW processor for processing compressed instruction formats
US6112299A (en) * 1997-12-31 2000-08-29 International Business Machines Corporation Method and apparatus to select the next instruction in a superscalar or a very long instruction word computer having N-way branching
JP2003131945A (en) * 2001-10-25 2003-05-09 Hitachi Ltd Cache memory device
US7133969B2 (en) * 2003-10-01 2006-11-07 Advanced Micro Devices, Inc. System and method for handling exceptional instructions in a trace cache based processor

Also Published As

Publication number Publication date
US20070028050A1 (en) 2007-02-01
KR101005633B1 (en) 2011-01-05
JP2012074046A (en) 2012-04-12
CN104657110A (en) 2015-05-27
EP1910919A2 (en) 2008-04-16
WO2007016393A2 (en) 2007-02-08
JP4927840B2 (en) 2012-05-09
CN101268440A (en) 2008-09-17
WO2007016393A3 (en) 2007-06-28
KR20080031981A (en) 2008-04-11
CN104657110B (en) 2020-08-18
JP5341163B2 (en) 2013-11-13
JP2009503700A (en) 2009-01-29

Similar Documents

Publication Publication Date Title
US7568070B2 (en) Instruction cache having fixed number of variable length instructions
US7962725B2 (en) Pre-decoding variable length instructions
US7818542B2 (en) Method and apparatus for length decoding variable length instructions
US7818543B2 (en) Method and apparatus for length decoding and identifying boundaries of variable length instructions
US7711927B2 (en) System, method and software to preload instructions from an instruction set other than one currently executing
US6502185B1 (en) Pipeline elements which verify predecode information
US7676659B2 (en) System, method and software to preload instructions from a variable-length instruction set with proper pre-decoding
US5774710A (en) Cache line branch prediction scheme that shares among sets of a set associative cache
EP3550437B1 (en) Adaptive spatial access prefetcher apparatus and method
US20070266228A1 (en) Block-based branch target address cache
US9317285B2 (en) Instruction set architecture mode dependent sub-size access of register with associated status indication
TWI438681B (en) Immediate and displacement extraction and decode mechanism
US7519799B2 (en) Apparatus having a micro-instruction queue, a micro-instruction pointer programmable logic array and a micro-operation read only memory and method for use thereof
CN116339832A (en) Data processing device, method and processor
EP0698884A1 (en) Memory array for microprocessor cache
US11809873B2 (en) Selective use of branch prediction hints
CN113568663A (en) Code prefetch instruction
CN115858022A (en) Scalable switch point control circuitry for clustered decoding pipeline

Legal Events

Date Code Title Description
AS Assignment Owner name: QUALCOMM INCORPORATED, A DELAWARE CORPORATION, CAL; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BRIDGES, JEFFREY TODD;DIEFFENDERFER, JAMES NORRIS;SMITH, RODNEY WAYNE;AND OTHERS;REEL/FRAME:016852/0630; Effective date: 20050727
STCF Information on status: patent grant Free format text: PATENTED CASE
FPAY Fee payment Year of fee payment: 4
FPAY Fee payment Year of fee payment: 8
MAFP Maintenance fee payment Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 12