US20070192626A1 - Exponent windowing - Google Patents
- Publication number
- US20070192626A1 US11/647,892
- Authority
- US
- United States
- Prior art keywords
- bits
- window
- register
- logic
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Abstract
The disclosure includes a description of a processor component that includes a set of register bits to perform a shift register operation. The component's window detection logic can detect a window of bits in the set of register bits and, in response to detecting the window, output the window of bits.
Description
- This application claims priority from co-pending U.S. patent application Ser. No. 11/323,329, attorney docket 42390.P23349, filed on Dec. 30, 2005, and entitled “CRYPTOGRAPHIC SYSTEM COMPONENT”.
- Cryptography can protect data from unwanted access. Cryptography typically involves mathematical operations on data (encryption) that make the original data (plaintext) unintelligible (ciphertext). Reverse mathematical operations (decryption) restore the original data from the ciphertext. Typically, decryption relies on additional data such as a cryptographic key. A cryptographic key is data that controls how a cryptography algorithm processes the plaintext. In other words, different keys generally cause the same algorithm to output different ciphertext for the same plaintext. Absent a needed decryption key, restoring the original data is, at best, an extremely time-consuming mathematical challenge.
- Cryptography is used in a variety of situations. For example, a document on a computer may be encrypted so that only authorized users of the document can decrypt and access the document's contents. Similarly, cryptography is often used to encrypt the contents of packets traveling across a public network. While malicious users may intercept these packets, these malicious users access only the ciphertext rather than the plaintext being protected.
- Cryptography covers a wide variety of applications beyond encrypting and decrypting data. For example, cryptography is often used in authentication (i.e., reliably determining the identity of a communicating agent), the generation of digital signatures, and so forth.
- Current cryptographic techniques rely heavily on intensive mathematical operations. For example, many schemes involve the multiplication of very large numbers. For instance, many schemes use a type of modular arithmetic known as modular exponentiation, which involves raising a large number to some power and reducing it with respect to a modulus (i.e., taking the remainder when divided by a given modulus). The mathematical operations required by cryptographic schemes can consume considerable processor resources. For example, a processor of a networked computer participating in a secure connection may devote a significant portion of its computation power to encryption and decryption tasks, leaving fewer processor resources for other operations.
-
FIG. 1 is a diagram of a cryptographic component. -
FIG. 2 is a flow diagram illustrating operation of a cryptographic component. -
FIG. 3 is a diagram of a processor including a cryptographic component. -
FIG. 4 is a diagram illustrating processing unit architecture. -
FIG. 5 is a diagram of logic interconnecting shared memory and the processing units. -
FIG. 6 is a diagram of a set of processing units coupled to a multiplier. -
FIG. 7 is a diagram of a programmable processing unit. -
FIG. 8 is a diagram illustrating operation of an instruction to cause transfer of data from an input buffer into a data bank. -
FIGS. 9-11 are diagrams illustrating operation of instructions to cause an arithmetic logic unit operation. -
FIG. 12 is a diagram illustrating concurrent operation of datapath instructions. -
FIG. 13 is a diagram illustrating different sets of variables corresponding to different hierarchical scopes of program execution. -
FIG. 14 is a diagram illustrating windowing of an exponent. -
FIG. 15 is a diagram of windowing logic. -
FIG. 16 is a diagram illustrating operation of a hardware multiplier. -
FIG. 17 is a diagram of a hardware multiplier. -
FIGS. 18-20 are diagrams of different types of processing units. -
FIG. 21 is a diagram of a processor having multiple processor cores. -
FIG. 22 is a diagram of a processor core. -
FIG. 23 is a diagram of a network forwarding device. -
FIG. 1 depicts a sample implementation of a system component 100 to perform cryptographic operations. The component 100 can be integrated into a variety of systems. For example, the component 100 can be integrated within the die of a processor or found within a processor chipset. The system component 100 can off-load a variety of cryptographic operations from other system processor(s). The component 100 provides high performance at relatively modest clock speeds and is area efficient. - As shown, the
sample component 100 may be integrated on a single die that includes multiple processing units 106-112 coupled to shared memory logic 104. The shared memory logic 104 includes memory that can act as a staging area for data and control structures being operated on by the different processing units 106-112. For example, data may be stored in memory and then sent to different processing units 106-112 in turn, with each processing unit performing some task involved in cryptographic operations and returning the potentially transformed data back to the shared memory logic 104. - The processing units 106-112 are constructed to perform different operations involved in cryptography such as encryption, decryption, authentication, and key generation. For example,
processing unit 106 may perform hashing algorithms (e.g., MD5 (Message Digest 5) and/or SHA (Secure Hash Algorithm)) while processing unit 110 performs cipher operations (e.g., DES (Data Encryption Standard), 3DES (Triple DES), AES (Advanced Encryption Standard), RC4 (ARCFOUR), and/or Kasumi). - As shown, the shared
memory logic 104 is also coupled to a RAM (random access memory) 114. In operation, data can be transferred from the RAM 114 for processing by the processing units 106-112. Potentially transformed data (e.g., encrypted or decrypted data) is returned to the RAM 114. Thus, the RAM 114 may represent a nexus between the component 100 and other system components (e.g., processor cores requesting cryptographic operations on data in RAM 114). The RAM 114 may be external to the die hosting the component 100. - The sample implementation shown includes a
programmable processor core 102 that controls operation of the component 100. As shown, the core 102 receives commands to perform cryptographic operations on data. Such commands can identify the requesting agent (e.g., core), a specific set of operations to perform (e.g., cryptographic protocol), the data to operate on (e.g., the location of a packet payload), and additional cryptographic context data such as a cryptographic key, initial vector, and/or residue from a previous cryptographic operation. In response to a command, the core 102 can execute program instructions that transfer data between RAM 114, shared memory, and the processing units 106-112. - A program executed by the
core 102 can perform a requested cryptographic operation in a single pass through program code. As an example, FIG. 2 illustrates processing of a command to encrypt packet “A” stored in RAM 114 by a program executed by core 102. For instance, another processor core (not shown) may send the command to component 100 to prepare transmission of packet “A” across a public network. As shown, the sample program: (1) reads the packet and any associated cryptography context (e.g., keys, initial vectors, or residue) into shared memory from RAM 114; (2) sends the data to an aligning processing unit 106 that writes the data back into shared memory 104 aligned on a specified byte boundary; (3) sends the data to a cipher processing unit 108 that performs a transformative cipher operation on the data before sending the transformed data to memory 104; and (4) transfers the transformed data to RAM 114. The core 102 may then generate a signal or message notifying the processor core that issued the command that encryption is complete. - The
processor core 102 may be a multi-threaded processor core including storage for multiple program counters and contexts associated with multiple, respective, threads of program execution. That is, in FIG. 2, thread 130 may be one of multiple threads. The core 102 may switch between thread contexts to mask latency associated with processing unit 106-112 operation. For example, thread 130 may include an instruction (not shown) explicitly relinquishing thread 130 execution after an instruction sending data to the cipher processing unit 108 until receiving an indication that the transformed data has been written into shared memory 104. Alternately, the core 102 may use pre-emptive context switching that automatically switches contexts after certain events (e.g., requesting operation of a processing unit 106-112 or after a certain amount of execution time). Thread switching enables a different thread to perform other operations such as processing of a different packet in what would otherwise be wasted core 102 cycles. Throughput can potentially be increased by adding additional contexts to the core 102. In a multi-threaded implementation, threads can be assigned to commands in a variety of ways, for example, by a dispatcher thread that assigns threads to commands or by threads dequeuing commands when the threads are available. -
FIG. 3 illustrates a sample implementation of a processor 124 including a cryptographic system component 100. As shown, the component 100 receives commands from processor core(s) 118-122. In this sample implementation, core 102 is integrated into the system component 100 and services commands from the other cores 118-122. In an alternate implementation, processing core 102 may not be integrated within the component. Instead, cores 118-122 may have direct control over component 100 operation. Alternately, one of cores 118-122 may be designated for controlling the cryptographic component 100 and servicing requests received from the other cores 118-122. This latter approach can lessen the expense and die footprint of the component 100. - As shown in
FIG. 4, the different processing units 106-112 may feature the same uniform interface architecture to the shared memory logic 104. This uniformity eases the task of programming by making interaction with each processing unit very similar. The interface architecture also enables the set of processing units 106-112 included within the component 100 to be easily configured. For example, to increase throughput, a component 100 can be configured to include multiple copies of the same processing unit. For instance, if the component 100 is likely to be included in a system that will perform a large volume of authentication operations, the component 100 may be equipped with multiple hash processing units. Additionally, the architecture enables new processing units to be easily integrated into the component 100. For example, when a new cryptography algorithm emerges, a processing unit to implement the algorithm can be made available. - In the specific implementation shown in
FIG. 4, each processing unit includes an input buffer 142 that receives data from shared memory logic 104 and an output buffer 140 that stores data to transfer to shared memory logic 104. The processing unit 106 also includes processing logic 144 such as programmable or dedicated hardware (e.g., an Application Specific Integrated Circuit (ASIC)) to operate on data received by input buffer 142 and write operation results to buffer 140. In the example shown, buffers 140, 142 may include memory and logic (not shown) that queue data in the buffers based on the order in which data is received. For example, the logic may feature head and tail pointers into the memory and may append newly received data to the tail. - In the sample implementation shown, the
input buffer 142 is coupled to the shared memory logic 104 by a different bus 146 than the bus 148 coupling the output buffer 140 to the shared memory logic 104. These buses [...] component 100, shielding internal operation of the component 100. Potentially, the input buffers 142 of multiple processing units may share the same bus 146; likewise, the output buffers 140 may share bus 148. Of course, a variety of other communication schemes may be implemented such as a single shared bus instead of dual-buses or dedicated connections between the shared memory logic 104 and the processing units 106-112. - Generally, each processing unit is affected by at least two commands received by the shared memory logic 104: (1) a processing unit READ command that transfers data from the shared
memory logic 104 to the processing unit input buffer 142; and (2) a processing unit WRITE command that transfers data from the output buffer 140 of the processing unit to the shared memory logic 104. Both commands can identify the target processing unit and the data being transferred. The uniformity of these instructions across different processing units can ease component 100 programming. In the specific implementation shown, a processing unit READ instruction causes a data push from shared memory to a respective target processing unit's 106-112 input buffer 142 via bus 146, while a processing unit WRITE instruction causes a data pull from a target processing unit's 106-112 output buffer 140 into shared memory via bus 148. Thus, to process data, a core 102 program may issue a command to first push data to the processing unit and later issue a command to pull the results written into the processing unit's output buffer 140. Of course, a wide variety of other inter-component 100 communication schemes may be used. -
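The READ (push) and WRITE (pull) commands described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names, method names, and the placeholder transform are hypothetical and do not come from the specification, which describes hardware buffers and buses rather than Python objects.

```python
from collections import deque


class ProcessingUnit:
    """Toy processing unit with FIFO input and output buffers (illustrative names)."""

    def __init__(self, transform):
        self.input_buffer = deque()
        self.output_buffer = deque()
        self.transform = transform  # stands in for the unit's processing logic

    def run(self):
        # Consume queued input in arrival order and queue the results for output.
        while self.input_buffer:
            self.output_buffer.append(self.transform(self.input_buffer.popleft()))


class SharedMemory:
    """Toy shared memory logic that services READ and WRITE commands."""

    def __init__(self):
        self.banks = {}

    def pu_read(self, unit, address):
        # Processing unit READ: push data from shared memory to the unit's input buffer.
        unit.input_buffer.append(self.banks[address])

    def pu_write(self, unit, address):
        # Processing unit WRITE: pull data from the unit's output buffer into shared memory.
        self.banks[address] = unit.output_buffer.popleft()


memory = SharedMemory()
memory.banks[0] = b"packet A"
unit = ProcessingUnit(transform=bytes.upper)  # placeholder, not a real cipher

memory.pu_read(unit, address=0)   # push the operand to the unit
unit.run()                        # the unit processes its queued input
memory.pu_write(unit, address=1)  # pull the result back into shared memory
result = memory.banks[1]
```

Because every unit exposes the same two buffers, the controlling program can use the same push-then-pull sequence regardless of what the unit computes.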
FIG. 5 depicts shared memory logic 104 of the sample implementation. As shown, the logic 104 includes a READ queue and a WRITE queue for each processing unit (labeled “PU”). Commands to transfer data to/from the banks of shared memory (banks a-n) are received at an inlet queue 180 and sorted into the queues 170-171 based on the target processing unit and the type of command (e.g., READ or WRITE). In addition to commands targeting processing units, the logic 104 also permits cores external to the component (e.g., cores 118-122) to READ (e.g., pull) or WRITE (e.g., push) data from/to the memory banks and features an additional pair of queues (labeled “cores”) for these commands. Arbiters 182-184 dequeue commands from the queues 170-171. For example, each arbiter 182-184 may use a round robin or other servicing scheme. The arbiters 182-184 forward the commands to another queue 172-178 based on the type of command. For example, commands pushing data to an external core are enqueued in queue 176 while commands pulling data from an external core are enqueued in queue 172. Similarly, commands pushing data to a processing unit are enqueued in queue 178 while commands pulling data from a processing unit are enqueued in queue 174. When a command reaches the head of a queue, the logic 104 initiates a transfer of data to/from the memory banks to the processing unit using buses [...] component 100 to the cores 118-122. The logic 104 also includes circuitry to permit transfer (pushes and pulls) of data between the memory banks and the external RAM 114. - The
logic 104 shown in FIG. 5 is merely an example, and a wide variety of other architectures may be used. For example, an implementation need not sort the commands into per processing unit queues, although this queuing can ensure fairness among requests. Additionally, the architecture reflected in FIG. 5 could be turned on its head. That is, instead of the logic 104 receiving commands that deliver and retrieve data to/from the memory banks, commands may be routed to the processing units which in turn issue requests to access the shared memory banks. - Many cryptographic protocols, such as public-key exchange protocols, require modular multiplication (e.g., [A×B] mod m) and/or modular exponentiation (e.g., A^exponent mod m) of very large numbers. While computationally expensive, these operations are critical to many secure protocols such as a Diffie-Hellman exchange, DSA signatures, RSA signatures, and RSA encryption/decryption.
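To make the two operations concrete, the following sketch runs a toy Diffie-Hellman exchange using modular exponentiation. The helper names and the tiny parameters (p = 23, g = 5, and the two secrets) are illustrative assumptions chosen so the arithmetic can be checked by hand; real exchanges use operands of a thousand bits or more, which is what makes these operations expensive.

```python
def mod_mul(a, b, m):
    """Modular multiplication: [A x B] mod m."""
    return (a * b) % m


def mod_exp(a, exponent, m):
    """Modular exponentiation: A^exponent mod m, via Python's built-in pow."""
    return pow(a, exponent, m)


# Toy public parameters and private exponents (far too small for real use).
p, g = 23, 5
alice_secret, bob_secret = 6, 15

# Each party publishes g^secret mod p.
alice_public = mod_exp(g, alice_secret, p)
bob_public = mod_exp(g, bob_secret, p)

# Each party raises the other's public value to its own secret exponent.
alice_shared = mod_exp(bob_public, alice_secret, p)
bob_shared = mod_exp(alice_public, bob_secret, p)
```

Both parties arrive at the same shared value because (g^a)^b and (g^b)^a are congruent modulo p.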
FIG. 6 depicts a dedicated hardware multiplier 156 coupled to multiple processing units 150-154. The processing units 150-154 can send data (e.g., a pair of variable length multi-word vector operands) to the multiplier 156 and can consume the results. To multiply very large numbers, the processing units 150-154 can decompose a multiplication into a set of smaller partial products that can be more efficiently performed by the multiplier 156. For example, multiplication of two 1024-bit operands can be computed as four sets of 512-bit×512-bit multiplications or sixteen sets of 256-bit×256-bit multiplications. - The most efficient use of the
multiplier 156 may vary depending on the problem at hand (e.g., the size of the operands). To provide flexibility in how the processing units 150-154 use the multiplier 156, the processing units 150-154 shown in FIG. 6 may be programmable. The programs may be dynamically downloaded to the processing units 150-154, along with data to operate on, from the shared memory logic 104 via interface 158. The program selected for download to a given processing unit 150-154 can change in accordance with the problem assigned to the processing unit 150-154 (e.g., a particular protocol and/or operand size). The programmability of the units 150-154 permits component 100 operation to change as new security protocols, algorithms, and implementations are introduced. In addition, a programmer can carefully tailor processing unit 150-154 operation based on the specific algorithm and operand size required by a protocol. Since the processing units 150-154 can be dynamically reprogrammed on the fly (during operation of the component 100), the same processing units 150-154 can be used to perform operations for different protocols/protocol options by simply downloading the appropriate software instructions. - As described above, each processing unit 150-154 may feature an input buffer and an output buffer (see
FIG. 4) to communicate with shared memory logic 104. The multiplier 156 and processing units 150-154 may communicate using these buffers. For example, a processing unit 150-154 may store operands to multiply in a pair of output queues in the output buffer for consumption by the multiplier 156. The multiplier 156 results may then be transferred to the processing unit 150-154 upon completion. The same processing unit 150-154 input and output buffers may also be used to communicate with shared memory logic 104. For example, the input buffer of a processing unit 150-154 may receive program instructions and operands from shared memory logic 104. The processing unit 150-154 may similarly store the results of program execution in an output buffer for transfer to the shared memory logic 104 upon completion of program execution. - To coordinate these different uses of a processing unit's input/output buffers, the processing units 150-154 provide multiple modes of operation that can be selected by program instructions executed by the processing units. For example, in “I/O” mode, the buffers of processing unit 150-154 exclusively exchange data with shared
memory logic unit 104 via interface 158. In “run” mode, the buffers of the unit 150-154 exclusively exchange data with multiplier 156 instead. Additional processing unit logic (not shown) may interact with the interface 158 and the multiplier 156 to indicate the processing unit's current mode. - As an example, in operation, a core may issue a command to shared
memory logic 104 specifying a program to download to a target processing unit and data to be processed. The shared memory logic 104, in turn, sends a signal, via interface 158, awakening a given processing unit from a “sleep” mode into I/O mode. The input buffer of the processing unit then receives a command from the shared memory logic 104 identifying, for example, the size of a program being downloaded, initial conditions, the starting address of the program instructions in shared memory, and program variable values. To avoid unnecessary loading of program code, if the program size is specified as zero, the previously loaded program will be executed. This optimizes initialization of a processing unit when requested to perform the same operation in succession. - After loading the program instructions and setting the variables and initial conditions to the specified values, an instruction in the downloaded program changes the mode of the processing unit from I/O mode to run mode. The processing unit can then write operands to multiply to its output buffers and receive delivery of the
multiplier 156 results in its input buffer. Eventually, the program instructions write the final result into the output buffer of the processing unit and change the mode of the processing unit back to I/O mode. The final results are then transferred from the unit's output buffer to the shared memory logic 104 and the unit returns to sleep mode. -
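The sleep, I/O, and run mode sequence described above can be sketched as a small state machine. The class, method, and mode-string names below are hypothetical, and a Python callable stands in for the downloaded program. In the description the program itself contains the mode-changing instructions; this sketch folds those transitions into the command handler for brevity.

```python
class ProcessingUnitModes:
    """Toy model of the sleep -> I/O -> run -> I/O -> sleep sequence."""

    def __init__(self):
        self.mode = "sleep"
        self.program = None      # most recently downloaded program
        self.trace = ["sleep"]   # record of every mode visited

    def _set_mode(self, mode):
        self.mode = mode
        self.trace.append(mode)

    def handle_command(self, program, program_size, operands):
        # Wake into I/O mode to receive the command, program, and operands.
        self._set_mode("io")
        if program_size != 0:
            self.program = program
        # A program size of zero means: reuse the previously loaded program.
        self._set_mode("run")
        result = self.program(operands)
        # Return to I/O mode so the result can be transferred out, then sleep.
        self._set_mode("io")
        self._set_mode("sleep")
        return result


unit = ProcessingUnitModes()
first = unit.handle_command(lambda ops: ops[0] * ops[1], program_size=1, operands=(6, 7))
second = unit.handle_command(None, program_size=0, operands=(3, 5))  # reuses the program
```

The second command skips the download entirely, which is the optimization the description gives for repeated requests of the same operation.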
FIG. 7 depicts a sample implementation of a programmable processing unit 150. As shown, the processing unit 150 includes an arithmetic logic unit 216 that performs operations such as addition, subtraction, and logical operations such as boolean AND-ing and OR-ing of vectors. The arithmetic logic unit 216 is coupled to, and can operate on, operands stored in different memory resources [...] processing unit 150. For example, as shown, the arithmetic logic unit 216 can operate on operands provided by a memory divided into a pair of data banks [...] data bank [...] arithmetic logic unit 216. As described above, the arithmetic logic unit 216 is also coupled to and can operate on operands stored in input queue 220 (e.g., data transferred to the processing unit 150, for example, from the multiplier or shared memory logic 104). The size of operands used by the arithmetic logic unit 216 to perform a given operation can vary and can be specified by program instructions. - As shown, the
arithmetic logic unit 216 may be coupled to a shifter 218 that can programmatically shift the arithmetic logic unit 216 output. The resulting output of the arithmetic logic unit 216/shifter 218 can be “re-circulated” back into a data bank [...] arithmetic logic unit 216/shifter 218 can be written to an output buffer 222 divided into two parallel queues. Again, the output queues 222 can store respective sets of multiplication operands to be sent to the multiplier 156 or can store the final results of program execution to be transferred to shared memory. - The components described above form a cyclic datapath. That is, operands flow from the
input buffer 220, data banks [...] arithmetic logic unit 216 and either back into the data banks [...] control store 204 and executed by control logic 206. The control logic 206 has a store of global variables 208 and a set of variable references 202 (e.g., pointers) into data stored in data banks [...] - A sample instruction set that can be implemented by
control logic 206 is described in the attached Appendix A. Other implementations may vary in instruction operation and syntax. - Generally, the
control logic 206 includes instructions (“setup” instructions) to assign variable values, instructions (“exec” and “fexec” instructions) to perform mathematical and logical operations, and control flow instructions such as procedure calls and conditional branching instructions. The conditional branching instructions can operate on a variety of condition codes generated by the arithmetic logic unit 216/shifter 218 such as carry, msb (if the most significant bit=1), lsb (if the least significant bit=1), negative, zero (if the last quadword=0), and zero_vector (if the entire operand=0). Additionally, the processing unit 150 provides a set of user accessible bits that can be used as conditions for conditional instructions. - The
control logic 206 includes instructions that cause data to move along the processing unit 150 datapath. For example, FIG. 8 depicts the sample operation of a “FIFO” instruction that, when the processing unit is in “run” mode, pops data from the input queue 220 for storage in a specified data bank [...] input queue 220 to the control store 204. -
FIG. 9 depicts sample operation of an “EXEC” instruction that supplies operands to the arithmetic logic unit 216. In the example shown, the source operands are supplied by data banks [...] output queue 222. As shown in FIG. 10, an EXEC instruction can alternately store results back into one of the data banks 212, 214 (in the case shown, bank B 214). -
FIG. 11 depicts sample operation of an “FEXEC” (FIFO EXEC) instruction that combines aspects of the FIFO and EXEC instructions. Like an EXEC instruction, an FEXEC instruction supplies operands to the arithmetic logic unit 216. However, instead of operands being supplied exclusively by the data banks [...] input queue 222. - Potentially, different ones of the datapath instructions can be concurrently operating on the datapath. For example, as shown in
FIG. 12, an EXEC instruction may follow a FIFO instruction during the execution of a program. While these instructions may take multiple cycles to complete, assuming the instructions do not access overlapping portions of the data banks [...] control logic 206 may issue the EXEC instruction before the FIFO instruction completes. To ensure that the concurrent operation does not deviate from the results of in-order operation, the control logic 206 may determine whether concurrent operation would destroy data coherency. For example, if the preceding FIFO instruction writes data to a portion of data bank A that sources an operand in the subsequent EXEC instruction, the control logic 206 awaits writing of the data by the FIFO instruction into the overlapping data bank portion before starting operation of the EXEC instruction on the datapath. - In addition to concurrent operation of multiple datapath instructions, the
control logic 206 may execute other instructions concurrently with operations caused by datapath instructions. For example, the control logic 206 may execute control flow logic instructions (e.g., a conditional branch) and variable assignment instructions before previously initiated datapath operations complete. More specifically, in the implementation shown, FIFO instructions may issue concurrently with any branch instruction or any setup instruction except a mode instruction. FIFO instructions may issue concurrently with any execute instruction provided the destination banks for both are mutually exclusive. FEXEC and EXEC instructions may issue concurrently with any mode instructions and instructions that do not rely on the existence of particular condition states. EXEC instructions, however, may not issue concurrently with FEXEC instructions. - The
processing unit 150 provides a number of features that can ease the task of programming cryptographic operations. For example, programs implementing many algorithms can benefit from recursion or other nested execution of subroutines or functions. As shown in FIG. 13, the processing unit may maintain different scopes 250-256 of variables and conditions that correspond to different depths of nested subroutine/function execution. The control logic uses one of the scopes 250-256 as the current scope. For example, the current scope in FIG. 13 is scope 252. While a program executes, the variable and condition values specified by this scope are used by the control logic 206. For example, a reference to variable “A0” by an instruction would be associated with A0 of the current scope 252. The control logic 206 can automatically increment or decrement the scope index in response to procedure calls (e.g., subroutine calls, function calls, or method invocations) and procedure exits (e.g., returns), respectively. For example, upon a procedure call, the current scope may advance to scope 254 before returning to scope 252 after a procedure return. - As shown, each scope 250-256 features a set of pointers into data banks A and
B [...] - In addition to providing access to data in the current scope, the processing unit instruction set also provides instructions (e.g., “set scope <target scope>”) that provide explicit access to scope variables in a target scope other than the current scope. For example, a program may initially set up, in advance, the diminishing scales associated with an ensuing set of recursive/nested subroutine calls. In general, the instruction set includes an instruction to set each of the scope fields. In addition, the instruction set includes an instruction (e.g., “copy_scope”) to copy an entire set of scope values from the current scope to a target scope. Additionally, the instruction set includes instructions to permit scope values to be computed based on the values included in a different scope (e.g., “set variable relative”).
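The scope mechanism described above can be modeled in software as an indexed array of variable sets. In this sketch, the class and method names, the choice of four scopes, and the additive behavior of the relative-set operation are all assumptions made for illustration; the actual scope fields and instruction semantics are defined by the instruction set in Appendix A, which is not reproduced here.

```python
class ScopeFile:
    """Toy model of per-depth variable scopes selected by a scope index."""

    def __init__(self, depth=4):
        # One variable set per nesting depth (depth of 4 is an assumption).
        self.scopes = [dict() for _ in range(depth)]
        self.current = 0

    def call(self):
        # A procedure call advances the scope index.
        if self.current + 1 >= len(self.scopes):
            raise RuntimeError("scope depth exceeded")
        self.current += 1

    def ret(self):
        # A procedure return restores the previous scope index.
        if self.current == 0:
            raise RuntimeError("return without matching call")
        self.current -= 1

    def get(self, name):
        return self.scopes[self.current][name]

    def set(self, name, value, target=None):
        # With a target, this mimics explicitly setting a variable in another scope.
        index = self.current if target is None else target
        self.scopes[index][name] = value

    def copy_scope(self, target):
        # Copy every value of the current scope into the target scope.
        self.scopes[target] = dict(self.scopes[self.current])

    def set_relative(self, name, source, offset, target=None):
        # Compute a value from another scope's value (additive offset is an assumption).
        self.set(name, self.scopes[source][name] + offset, target)


scopes = ScopeFile()
scopes.set("A0", 100)                                     # variable in the outer scope
scopes.copy_scope(1)                                      # prepare the callee's scope in advance
scopes.set_relative("A0", source=0, offset=8, target=1)   # callee sees a shifted A0
scopes.call()
inner = scopes.get("A0")
scopes.ret()
outer = scopes.get("A0")
```

Preparing the callee's scope before the call mirrors the description's example of setting up the values for an ensuing set of nested subroutine calls in advance.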
- In addition to the scope support described above, the processing unit 150 also can include logic to reduce the burden of exponentiation. As described above, many cryptographic operations require exponentiation of large numbers. For example, FIG. 14 depicts an exponent 254 raising some number, g, to the 6,015,455,113-th power. To raise a number to this large exponent 254, many algorithms reduce the operation to a series of simpler mathematical operations. For example, an algorithm can process the exponent 254 as a bit string, proceeding bit-by-bit from left to right (most-significant-bit to least-significant-bit). Starting with an initial value of “1”, the algorithm can square the value for each “0” encountered in the bit string. For each “1” encountered in the bit string, the algorithm can square the value and multiply by g. For example, to determine the value of 2^9, the algorithm would operate on the binary exponent of 1001b as follows:

                        value
  initialization        1
  exponent bit 1 - 1    1^2 * 2 = 2
  bit 2 - 0             2^2 = 4
  bit 3 - 0             4^2 = 16
  bit 4 - 1             16^2 * 2 = 512

- To reduce the computational demands of this algorithm, an exponent can be searched for windows of bits that correspond to pre-computed values. For example, in the trivially small example of 2^9, a bit pattern of “10” corresponds to g^2 (4). Thus, identifying the “10” window value in exponent “1001” enables the algorithm to simply square the value for each bit within the window and multiply by the pre-computed value. Thus, an algorithm using windows could proceed:

                        value
  initialization        1
  exponent bit 1 - 1    1^2 = 1
  bit 2 - 0             1^2 = 1
  window “10” value     1 * 4 = 4
  bit 3 - 0             4^2 = 16
  bit 4 - 1             16^2 * 2 = 512

- Generally, this technique reduces the number of multiplications needed to perform an exponentiation (though not in this trivially small example). Additionally, the same window may appear many times within an exponent 254 bit string, so the same pre-computed value can be reused.
- Potentially, an exponent 254 may be processed in regularly positioned window segments of N-bits. For example, a first window may be the four most significant bits of exponent 254 (e.g., “0001”), a second window may be the next four most significant bits (e.g., “0110”) and so forth. Instead of regularly occurring windows, however, FIG. 14 depicts a scheme that uses sliding windows. That is, a window of some arbitrary size of N-bits can be found at any point within the exponent rather than aligned on an N-bit boundary. For example, FIG. 14 shows a bit string 256 identifying the location of 4-bit windows found within exponent 254. For example, an exponent window of “1011” is found at location 256a and an exponent window of “1101” is found at location 256b. Upon finding a window, the window bits are zeroed. For example, as shown, a window of “0011” is found at location 256c. Zeroing the exponent bits enables a window of “0001” to be found at location 256d.
-
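The bit-by-bit procedure and the windowing idea described above can be sketched in software as follows. This is a minimal illustration, not the hardware scheme of the figures: the function names are hypothetical, and the sliding-window routine is the common textbook variant in which every window starts and ends with a “1” bit, so only odd powers of g are pre-computed. The description's own windows can contain leading zeros (e.g., “0011”) and are selected by dedicated logic.

```python
def square_and_multiply(g, exponent, modulus):
    """Left-to-right binary exponentiation: square per bit, multiply on each 1 bit."""
    value = 1
    for bit in bin(exponent)[2:]:
        value = (value * value) % modulus
        if bit == "1":
            value = (value * g) % modulus
    return value


def sliding_window_exp(g, exponent, modulus, window_size=4):
    """Sliding-window exponentiation using a table of pre-computed odd powers of g."""
    # Pre-compute g^1, g^3, g^5, ..., g^(2^window_size - 1).
    g_squared = (g * g) % modulus
    table = {1: g % modulus}
    for k in range(3, 1 << window_size, 2):
        table[k] = (table[k - 2] * g_squared) % modulus

    bits = bin(exponent)[2:]
    value = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            # A zero outside any window costs one squaring.
            value = (value * value) % modulus
            i += 1
            continue
        # Take up to window_size bits starting at this 1 bit, then shrink the
        # window until it also ends in a 1 bit so its value is odd.
        j = min(i + window_size, len(bits))
        while bits[j - 1] == "0":
            j -= 1
        window = int(bits[i:j], 2)
        # Square once per bit in the window, then multiply by the table entry.
        for _ in range(j - i):
            value = (value * value) % modulus
        value = (value * table[window]) % modulus
        i = j
    return value


small = square_and_multiply(2, 9, 10**9)                 # the 2^9 example from the text
large = sliding_window_exp(7, 6015455113, 1000003)       # exponent value from FIG. 14
```

Both routines perform the same number of squarings; the windowed version saves work by replacing several per-bit multiplications with a single table lookup per window.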
- FIG. 15 shows logic 210 used to implement a sliding window scheme. As shown, the logic 210 includes a set of M register bits (labeled C4 to C−4) that perform a left shift operation that enables windowing logic 250 to access M bits of an exponent string at a time as the exponent bits stream through the logic 210. Based on the register bits and an identification of a window size 252, the windowing logic 250 can identify the location of a window-size pattern of non-zero bits within the exponent. By searching within a set of bits larger than the window size, the logic 250 can identify windows irrespective of location within the exponent bit string. Additionally, the greater swath of bits included in the search permits the logic 250 to select from different potential windows found within the M bits (e.g., windows with the greatest number of “1” bits). For example, in FIG. 14, the exponent 254 begins with bits of “0001”; however, this potential window is not selected in favor of the window “1011” using “look-ahead” bits (C−1 to C−4).
- Upon finding a window of non-zero bits, the
logic 210 can output a “window found” signal identifying the index of the window within the exponent string. The logic 210 can also output the pattern of non-zero bits found. This pattern can be used as a lookup key into a table of pre-computed window values. Finally, the logic 210 zeroes the bits within the window and continues to search for window-sized bit patterns.
- The
logic 210 shown can be included in a processing unit. For example, FIG. 7 depicts the logic 210 as receiving the output of shifter 218, which rotates bits of an exponent through the logic 210. The logic 210 is also coupled to control logic 206. The control logic 206 can feature instructions that control operation of the windowing logic (e.g., to set the window size and/or select fixed or sliding window operation) and respond to logic 210 output. For example, the control logic 206 can include a conditional branching instruction that operates on “window found” output of the control logic. For example, a program can branch on a window found condition and use the output index to look up a pre-computed value for the window.
- As described above, the processing units may have access to a
dedicated hardware multiplier 156. Before turning to a sample implementation (FIG. 17), FIG. 16 illustrates sample operation of a multiplier implementation. In FIG. 16 the multiplier 156 operates on two operands, A 256 and B 258, over a series of clock cycles. As shown, the operands are handled by the multiplier as sets of segments, though the number of segments and/or the segment size for each operand differs. For instance, in the example shown, the N bits of operand A are divided into 8 segments (0-7) while operand B is divided into 2 segments (0-1).
- As shown, the multiplier operates by successively multiplying a segment of operand A with a segment of operand B until all combinations of partial products of the segments are generated. For example, in
cycle 2, the multiplier multiplies segment 0 of operand B (B0) with segment 0 of operand A (A0) 262a, while in cycle 17 2621 the multiplier multiplies segment 1 of operand B (B1) with segment 7 of operand A (A7). The partial products are shown in FIG. 16 as boxed sets of bits. As shown, based on the respective position of the segments within the operands, the sets of bits are shifted with respect to one another. For example, multiplication of the least significant segments of A and B (B0×A0) 262a results in the least significant set of resulting bits, while multiplication of the most significant segments of A and B (B1×A7) 2621 results in the most significant set of resulting bits. The addition of the results of the series of partial products represents the multiplication of operands A 256 and B 258.
- Sequencing computation of the series of partial products can incrementally yield bits of the final multiplication result well before the final cycle. For example,
FIG. 16 identifies when bits of a given significance can be retired as arrowed lines spanning the bits. For example, after completing B0×A0 in cycle 2, the least significant bits of the final result are known since subsequent partial product results do not affect these bits. Similarly, after completing B0×A1 in cycle 3, bits can be retired since only the partial products completed so far affect them. Not every cycle retires bits, however. For example, the partial product computed in cycle 6 and B1×A0 in cycle 7 exactly overlap. Thus, no bits are retired in cycle 6.
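The sequencing and early retirement described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name, the 8-bit segments of A and 32-bit segments of B (so that one segment of B spans four segments of A), the ordering of partial products by significance, and the use of a plain integer accumulator are illustrative choices, not details taken from FIG. 16 or FIG. 17.

```python
def segmented_multiply(a, b, a_segs=8, b_segs=2, seg_bits_a=8, seg_bits_b=32):
    """Multiply a*b from segment partial products, retiring low bits early.

    Returns (product, log), where log holds (step, bits_retired) per step.
    """
    mask_a = (1 << seg_bits_a) - 1
    mask_b = (1 << seg_bits_b) - 1
    # Each job is (shift, j, i): the bit weight of the partial product B_j x A_i.
    jobs = sorted((i * seg_bits_a + j * seg_bits_b, j, i)
                  for j in range(b_segs) for i in range(a_segs))
    acc = 0             # bits not yet retired, stored shifted down
    retired = 0         # bits already emitted, least significant first
    retired_bits = 0    # number of low bits emitted so far
    log = []
    for step, (shift, j, i) in enumerate(jobs):
        a_i = (a >> (i * seg_bits_a)) & mask_a
        b_j = (b >> (j * seg_bits_b)) & mask_b
        acc += (a_i * b_j) << (shift - retired_bits)
        if step + 1 < len(jobs):
            # Bits below the next partial product's weight can no longer change.
            n = jobs[step + 1][0] - retired_bits
        else:
            n = acc.bit_length()            # final step: flush what remains
        if n > 0:
            retired |= (acc & ((1 << n) - 1)) << retired_bits
            acc >>= n
            retired_bits += n
        log.append((step, n))
    return retired, log
```

With these assumed widths, the fifth partial product (B0×A4) and the sixth (B1×A0) carry the same weight, so the log records zero bits retired at that step, which mirrors the cycle without retirement noted above.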
- FIG. 17 shows a sample implementation of a multiplier 156 in greater detail. The multiplier 156 can process operands as depicted in FIG. 16. As shown, the multiplier 156 features a set of multipliers 306-312 configured in parallel. While the multipliers may be N-bit×N-bit multipliers, N need not be a power of two. For example, for a 512-bit×512-bit multiplier 156, each multiplier may be a 67-bit×67-bit multiplier. Additionally, the multiplier 156 itself is not restricted to operands that are a power of two.
- The
multipliers 156 are supplied segments of the operands in turn, for example, as shown in FIG. 16. For instance, in a first cycle, segment 0 of operand A is supplied to each multiplier 306-312 while sub-segments d-a of segment 0 of operand B are respectively supplied to each multiplier 306-312. That is, multiplier 312 may receive segment 0 of operand A and segment 0, sub-segment a of operand B while multiplier 310 receives segment 0 of operand A and segment 0, sub-segment b of operand B in a given cycle.
- The outputs of the multipliers 306-312 are shifted 314-318 based on the significance of the respective segments within the operands. For example,
shifter 318 shifts the results of Bnb×An 314 with respect to the results of Bna×An 312 to reflect the significance of sub-segment b relative to sub-segment a.
- The shifted results are sent to an
accumulator 320. In the example shown, the multiplier 156 uses a carry/save architecture where operations produce a vector that represents the results absent any carries to more significant bit positions and a vector that stores the carries. Addition of the two vectors can be postponed until the final results are needed. While FIG. 17 depicts a multiplier 156 that features a carry/save architecture, other implementations may use other schemes (e.g., a carry/propagate adder), though a carry/save architecture may be many times more area and power efficient.
- As shown, in
FIG. 16, sequencing of the segment multiplications can result in the output of bits by the multipliers 306-312 that are not affected by subsequent output by the multipliers 306-312. For example, in FIG. 16, the least significant bits output by the multipliers 306-312 can be sent to the accumulator 320 in cycle 2. The accumulator 320 can retire such bits as they are produced. For example, the accumulator 320 can output retired bits to a pair of FIFOs. The multiplier 156 includes logic that shifts the vectors remaining in the accumulator 320 as bits are retired. For example, if the accumulator 320 sends the least significant 64 bits to the FIFOs, the accumulator 320 vectors can be right shifted by 64 bits. As shown, the logic can shift the accumulator 320 vectors by a variable amount.
- As described above, the
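FIFOs store bits retired by the accumulator 320. The FIFOs, in turn, feed an adder 330 that sums the retired portions of carry/save vectors. The FIFOs can smooth the feeding of bits to the adder 330 such that the adder 330 is continuously fed retired portions in each successive cycle until the final multiplier result is output. In other words, as shown in FIG. 16, not all cycles (e.g., cycle 6) result in retiring bits. Without FIFOs, the adder 330 would stall when these cycles-without-retirement filter down through the multiplier 156. Instead, by filling the FIFOs with retired bits and deferring FIFO dequeuing until enough bits have accumulated, the FIFOs can ensure continuous operation of the adder 330. The FIFOs need not be as large as the number of bits in the final multiplier 156 result. Instead, the FIFOs may only be large enough to hold enough retired bits that cycles without retirement do not stall the adder 330 and large enough to accommodate the burst of retired bits in the final cycles.
- The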
multiplier 156 acts as a pipeline that propagates data through the multiplier stages in a series of cycles. As shown, the multiplier features two queues that hold operands awaiting multiplication. After the results for one pair of operands are output to the FIFOs, logic can reset the accumulator 320 vectors to start multiplication of two new dequeued operands. Additionally, due to the pipeline architecture, the multiplication of two operands may begin before the multiplier receives the entire set of segments in the operands. For example, the multiplier may begin A×B as soon as segments A0 and B0 are received. In such operation, the FIFOs can not only smooth output of the adder 330 for a given pair of operands but can also smooth output of the adder 330 across different sets of operands. For example, after an initial delay as the pipeline fills, the multiplier 156 may output portions of the final multiplication results for multiple multiplication problems with each successive cycle. That is, after the cycle outputting the most significant bits of A×B, the least significant bits of C×D are output.
- The
multiplier 156 can obtain operands, for example, by receiving data from the processing unit output buffers. To determine which processing unit to service, the multiplier may feature an arbiter (not shown). For example, the arbiter may poll each processing unit in turn to determine whether a given processing unit has a multiplication to perform. To ensure multiplier 156 cycles are not wasted, the arbiter may determine whether a given processing unit has enqueued a sufficient amount of the operands and whether the processing unit has sufficient space in its input buffer to hold the results before selecting the processing unit for service.
- The
multiplier 156 is controlled by a state machine (not shown) that performs selection of the segments to supply to the multipliers, controls shifting, initiates FIFO dequeuing, and so forth.
- Potentially, a given processing unit may decompose a given algorithm into a series of multiplications. To enable a processing unit to quickly complete a series of operations without interruption from other processing units competing for use of the
multiplier 156, the arbiter may detect a signal provided by the processing unit that signals the arbiter to continue servicing additional sets of operands provided by the processing unit currently being serviced by the multiplier. In the absence of such a signal, the arbiter resumes servicing of the other processing units, for example, by resuming round-robin polling of the processing units.
- Though the description above described a variety of processing units, a wide variety of processing units may be included in the
component 100. For example, FIG. 18 depicts an example of a “bulk” processing unit. As shown, the unit includes an endian swapper to change data between big-endian and little-endian representations. The bulk processing unit also includes logic to perform CRC (Cyclic Redundancy Check) operations on data as specified by a programmable generator polynomial.
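The two operations attributed to the bulk processing unit can be illustrated in software. The sketch below is hypothetical: the function names, the 4-byte default word size, and the CRC conventions (most-significant-bit first, no bit reflection, no final XOR, register width of at least 8 bits) are assumptions, since the description states only that the generator polynomial is programmable.

```python
def swap_endian(data: bytes, word_bytes: int = 4) -> bytes:
    """Reverse the byte order within each word of word_bytes bytes."""
    if len(data) % word_bytes:
        raise ValueError("data length must be a multiple of the word size")
    return b"".join(data[i:i + word_bytes][::-1]
                    for i in range(0, len(data), word_bytes))


def crc(data: bytes, poly: int, width: int, init: int = 0) -> int:
    """Bitwise MSB-first CRC; poly omits the implicit top bit (width >= 8)."""
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = init
    for byte in data:
        reg ^= byte << (width - 8)          # bring the next byte into the top
        for _ in range(8):
            # Shift out one bit; subtract the generator when a 1 falls out.
            reg = ((reg << 1) ^ poly) if reg & top else (reg << 1)
            reg &= mask
    return reg
```

For instance, `crc(b"123456789", 0x1021, 16)` returns `0x31C3`, the published check value for the CRC-16/XMODEM parameter set, and passing `init=0xFFFF` with the same polynomial selects the CRC-16/CCITT-FALSE variant instead.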
- FIG. 19 depicts an example of an authentication/hash processing unit. As shown, the unit stores data (“common authentication data structures”) used for message authentication that are shared among the different authentication algorithms (e.g., configuration and state registers). The unit also includes dedicated hardware logic responsible for the data processing for each algorithm supported (e.g., MD5 logic, SHA logic, AES logic, and Kasumi logic). The overall operation of the unit is controlled by control logic and a finite state machine (FSM). The FSM controls the loading and unloading of data in the authentication data buffer, tracks the amount of data in the data buffer, sends a start signal to the appropriate authentication core, controls the source of data that gets loaded into the data buffer, and sends information to padding logic to help determine padding data.
- FIG. 20 depicts an example of a cipher processing unit. The unit can perform encryption and decryption, among other tasks, for a variety of different cryptographic algorithms. As shown, the unit includes registers to store state information including a configuration register (labeled “config”), counter register (labeled “ctr”), key register, parameter register, RC4 state register, and IV (Initial Vector) register. The unit also includes multiplexors and XOR gates to support CBC (Cipher Block Chaining), F8, and CTR (Counter) modes. The unit also includes dedicated hardware logic for multiple ciphers that include the logic responsible for the algorithms supported (e.g., AES logic, 3DES logic, Kasumi logic, and RC4 logic). The unit also includes control logic and a state machine. The logic block is responsible for controlling the overall behavior of the cipher unit including enabling the appropriate datapath depending on the mode the cipher unit is in (e.g., in encryption CBC mode, the appropriate IV is chosen to generate the encrypt IV while the decrypt IV is set to 0), selecting the appropriate inputs into the cipher cores throughout the duration of cipher processing (e.g., the IV, the counter, and the key to be used), and generating control signals that determine what data to send to the output datapath based on the command issued by the core 102. This block also initiates and generates the necessary control signals for RC4 key expansion and AES key conversion.
- The processing units shown in
FIGS. 18-20 are merely examples of different types of processing units and the component may feature many different types of units other than those shown. For example, the component may include a unit to perform pseudo random number generation, a unit to perform Reed-Solomon coding, and so forth.
- The techniques described above can be implemented in a variety of ways and in different environments. For example, the techniques may be integrated within a network processor. As an example,
FIG. 21 depicts an example of a network processor 400 that can be programmed to process packets. The network processor 400 shown is an Intel® Internet eXchange network Processor (IXP). Other processors feature different designs.
- The
network processor 400 shown features a collection of programmable processing cores 402 on a single integrated semiconductor die 400. Each core 402 may be a Reduced Instruction Set Computer (RISC) processor tailored for packet processing. For example, the cores 402 may not provide floating point or integer division instructions commonly provided by the instruction sets of general purpose processors. Individual cores 402 may provide multiple threads of execution. For example, a core 402 may store multiple program counters and other context data for different threads.
- As shown, the
network processor 400 also features an interface 420 that can carry packets between the processor 400 and other network components. For example, the processor 400 can feature a switch fabric interface 420 (e.g., a Common Switch Interface (CSIX)) that enables the processor 400 to transmit a packet to other processor(s) or circuitry connected to a switch fabric. The processor 400 can also feature an interface 420 (e.g., a System Packet Interface (SPI) interface) that enables the processor 400 to communicate with physical layer (PHY) and/or link layer devices (e.g., MAC or framer devices). The processor 400 may also include an interface 404 (e.g., a Peripheral Component Interconnect (PCI) bus interface) for communicating, for example, with a host or other network processors.
- As shown, the
processor 400 includes other resources shared by the cores 402 such as the cryptography component 100, internal scratchpad memory, and memory controllers. The network processor 400 also includes a general purpose processor 406 (e.g., a StrongARM® XScale® or Intel Architecture core) that is often programmed to perform “control plane” or “slow path” tasks involved in network operations while the cores 402 are often programmed to perform “data plane” or “fast path” tasks.
- The
cores 402 may communicate with other cores 402 via the shared resources (e.g., by writing data to external memory or the scratchpad 408). The cores 402 may also intercommunicate via neighbor registers directly wired to adjacent core(s) 402. The cores 402 may also communicate via a CAP (CSR (Control Status Register) Access Proxy) 410 unit that routes data between cores 402.
- FIG. 22 depicts a sample core 402 in greater detail. The core 402 architecture shown in FIG. 22 may also be used in implementing the core 102 shown in FIG. 1. As shown, the core 402 includes an instruction store 512 to store program instructions. The core 402 may include an ALU (Arithmetic Logic Unit), Content Addressable Memory (CAM), shifter, and/or other hardware to perform other operations. The core 402 includes a variety of memory resources such as local memory 502 and general purpose registers 504. The core 402 shown also includes read and write transfer registers. The core 402 also includes next neighbor registers 506, 516 that store information being directly sent to/received from other cores 402. The data stored in the different memory resources may be used as operands in the instructions. As shown, the core 402 also includes a commands queue 524 that buffers commands (e.g., memory access commands) being sent to targets external to the core.
- To interact with the
cryptography component 100, threads executing on the core 402 may send commands via the commands queue 524. These commands may identify transfer registers within the core 402 as the destination for command results (e.g., a completion message and/or the location of encrypted data in memory). In addition, the core 402 may feature an instruction set to reduce idle core cycles while waiting, for example, for completion of a request by the cryptography component 100. For example, the core 402 may provide a ctx_arb (context arbitration) instruction that enables a thread to swap out of execution until receiving a signal associated with component 100 completion of an operation.
- FIG. 23 depicts a network device that can process packets using a cryptography component. As shown, the device features a collection of blades 608-620 holding integrated circuitry interconnected by a switch fabric 610 (e.g., a crossbar or shared memory switch fabric). As shown, the device features a variety of blades performing different operations such as I/O blades 608a-608n, data plane switch blades 618a-618b, trunk blades 612a-612b, control plane blades 614a-614n, and service blades. The switch fabric, for example, may conform to CSIX or other fabric technologies such as HyperTransport, Infiniband, PCI, Packet-Over-SONET, RapidIO, and/or UTOPIA (Universal Test and Operations PHY Interface for ATM).
- Individual blades (e.g., 608a) may include one or more physical layer (PHY) devices (not shown) (e.g., optic, wire, and wireless PHYs) that handle communication over network connections. The PHYs translate between the physical signals carried by different network mediums and the bits (e.g., “0”-s and “1”-s) used by digital systems. The line cards 608-620 may also include framer devices (e.g., Ethernet, Synchronous Optic Network (SONET), High-Level Data Link (HDLC) framers or other “
layer 2” devices) 602 that can perform operations on frames such as error detection and/or correction. The blades 608a shown may also include one or more network processors that perform packet processing operations on received packets and direct the packets, via the switch fabric 610, to a blade providing an egress interface to forward the packet. Potentially, the network processor(s) 606 may perform “layer 2” duties instead of the framer devices 602. The network processors may feature techniques described above.
- While
FIGS. 21-23 described specific examples of a network processor and a device incorporating network processors, the techniques may be implemented in a variety of architectures including general purpose processors, network processors and network devices having designs other than those shown. Additionally, the techniques may be used in a wide variety of network devices (e.g., a router, switch, bridge, hub, traffic generator, and so forth). Further, many of the techniques described above may be found in components other than components to perform cryptographic operations.
- The term circuitry as used herein includes hardwired circuitry, digital circuitry, analog circuitry, programmable circuitry, and so forth. The programmable circuitry may operate on computer programs disposed on a computer readable medium.
- Other embodiments are within the scope of the following claims.
Claims (17)
1. A processor component, comprising:
a set of register bits to perform a shift register operation;
window detection logic to detect a window of bits in the set of register bits, the window of bits having at least one non-zero bit and having a number of bits fewer than the number of register bits in the set of register bits; and
in response to detecting the window, outputting, at least, the detected window of bits.
2. The processor component of claim 1,
further comprising, in response to detecting the window, zeroing the non-zero bits in the set of register bits corresponding to the detected window.
3. The processor component of claim 1,
further comprising, outputting a signal indicating a window was found in the set of register bits.
4. The processor component of claim 3,
wherein the processor comprises a processor having a branch instruction that operates on the signal.
5. The processor component of claim 1,
further comprising logic to receive a window size identifying a bit-width of the window.
6. A method, comprising:
repeatedly shifting bits of an exponent into a set of register bits;
detecting windows of non-zero bits in the set of register bits, the windows having a number of bits fewer than the number of register bits in the set of register bits; and
performing lookups of the detected windows of non-zero bits in a set of pre-computed values.
7. The method of claim 6,
further comprising zeroing the non-zero bits in the set of register bits corresponding to the windows after detecting the windows.
8. The method of claim 6,
further comprising executing a branch instruction that conditionally branches based on a signal indicating a window has been detected.
9. The method of claim 6,
further comprising specifying a window size.
10. The method of claim 6,
further comprising shifting out bits from the set of register bits; and
for non-zero bit values shifted out, squaring a value used in an exponentiation operation associated with the exponent.
11. The method of claim 6,
further comprising multiplying a value used in an exponentiation operation associated with the exponent using a pre-computed value in the set of pre-computed values identified by a detected window.
12. A computer program product, disposed on a computer readable medium, comprising instructions for causing a processor to:
repeatedly shift bits of an exponent into a set of register bits;
detect windows of non-zero bits in the set of register bits, the windows having a number of bits fewer than the number of register bits in the set of register bits; and
perform lookups of the detected windows of non-zero bits in a set of pre-computed values.
13. The computer program of claim 12,
further comprising causing the processor to zero the non-zero bits in the set of register bits corresponding to the windows after detecting the windows.
14. The computer program of claim 12,
wherein the instructions comprise a branch instruction that conditionally branches based on a signal indicating a window has been detected.
15. The computer program of claim 12,
further comprising instructions to specify a window size.
16. The computer program of claim 12,
wherein the instructions cause the processor to:
shift out bits from set of register bits; and
for non-zero bit values shifted out, square a value used in an exponentiation operation associated with the exponent.
17. The computer program of claim 12,
wherein the instructions comprise instructions to multiply a value used in exponentiation operation associated with the exponent using a pre-computed value in the set of pre-computed values identified by a detected window.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/647,892 US20070192626A1 (en) | 2005-12-30 | 2006-12-28 | Exponent windowing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/323,329 US20070157030A1 (en) | 2005-12-30 | 2005-12-30 | Cryptographic system component |
US11/647,892 US20070192626A1 (en) | 2005-12-30 | 2006-12-28 | Exponent windowing |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/323,329 Continuation US20070157030A1 (en) | 2005-12-30 | 2005-12-30 | Cryptographic system component |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070192626A1 (en) | 2007-08-16 |
Family
ID=38226059
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/323,329 Abandoned US20070157030A1 (en) | 2005-12-30 | 2005-12-30 | Cryptographic system component |
US11/354,670 Expired - Fee Related US7475229B2 (en) | 2005-12-30 | 2006-02-14 | Executing instruction for processing by ALU accessing different scope of variables using scope index automatically changed upon procedure call and exit |
US11/647,892 Abandoned US20070192626A1 (en) | 2005-12-30 | 2006-12-28 | Exponent windowing |
Family Applications Before (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/323,329 Abandoned US20070157030A1 (en) | 2005-12-30 | 2005-12-30 | Cryptographic system component |
US11/354,670 Expired - Fee Related US7475229B2 (en) | 2005-12-30 | 2006-02-14 | Executing instruction for processing by ALU accessing different scope of variables using scope index automatically changed upon procedure call and exit |
Country Status (1)
Country | Link |
---|---|
US (3) | US20070157030A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100169570A1 (en) * | 2008-12-31 | 2010-07-01 | Michael Mesnier | Providing differentiated I/O services within a hardware storage controller |
US20120096281A1 (en) * | 2008-12-31 | 2012-04-19 | Eszenyi Mathew S | Selective storage encryption |
US8667254B1 (en) * | 2008-05-15 | 2014-03-04 | Xilinx, Inc. | Method and apparatus for processing data in an embedded system |
US10503654B2 (en) | 2016-09-01 | 2019-12-10 | Intel Corporation | Selective caching of erasure coded fragments in a distributed storage system |
CN111586076A (en) * | 2020-05-26 | 2020-08-25 | 清华大学 | Remote control and telemetry information tamper-proof encryption and decryption method and system based on mixed password |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070157030A1 (en) * | 2005-12-30 | 2007-07-05 | Feghali Wajdi K | Cryptographic system component |
US7725512B1 (en) * | 2006-04-26 | 2010-05-25 | Altera Corporation | Apparatus and method for performing multiple exclusive or operations using multiplication circuitry |
US8229109B2 (en) * | 2006-06-27 | 2012-07-24 | Intel Corporation | Modular reduction using folding |
US8046775B2 (en) * | 2006-08-14 | 2011-10-25 | Marvell World Trade Ltd. | Event-based bandwidth allocation mode switching method and apparatus |
US7941643B2 (en) * | 2006-08-14 | 2011-05-10 | Marvell World Trade Ltd. | Multi-thread processor with multiple program counters |
US8464069B2 (en) * | 2007-02-05 | 2013-06-11 | Freescale Semiconductors, Inc. | Secure data access methods and apparatus |
US8261049B1 (en) | 2007-04-10 | 2012-09-04 | Marvell International Ltd. | Determinative branch prediction indexing |
US8689078B2 (en) | 2007-07-13 | 2014-04-01 | Intel Corporation | Determining a message residue |
US8042025B2 (en) * | 2007-12-18 | 2011-10-18 | Intel Corporation | Determining a message residue |
US7886214B2 (en) * | 2007-12-18 | 2011-02-08 | Intel Corporation | Determining a message residue |
US8189792B2 (en) * | 2007-12-28 | 2012-05-29 | Intel Corporation | Method and apparatus for performing cryptographic operations |
US9191211B2 (en) * | 2009-02-27 | 2015-11-17 | Atmel Corporation | Data security system |
US9990201B2 (en) * | 2009-12-22 | 2018-06-05 | Intel Corporation | Multiplication instruction for which execution completes without writing a carry flag |
US9544133B2 (en) * | 2009-12-26 | 2017-01-10 | Intel Corporation | On-the-fly key generation for encryption and decryption |
CN103502935B (en) * | 2011-04-01 | 2016-10-12 | 英特尔公司 | The friendly instruction format of vector and execution thereof |
US10157061B2 (en) | 2011-12-22 | 2018-12-18 | Intel Corporation | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
US9329936B2 (en) | 2012-12-31 | 2016-05-03 | Intel Corporation | Redundant execution for reliability in a super FMA ALU |
FR3035241B1 (en) * | 2015-04-16 | 2017-12-22 | Inside Secure | METHOD OF SHARING A MEMORY BETWEEN AT LEAST TWO FUNCTIONAL ENTITIES |
US10963295B2 (en) * | 2017-09-08 | 2021-03-30 | Oracle International Corporation | Hardware accelerated data processing operations for storage data |
US11943367B1 (en) * | 2020-05-19 | 2024-03-26 | Marvell Asia Pte, Ltd. | Generic cryptography wrapper |
CN113419869B (en) | 2021-08-25 | 2021-12-03 | 苏州浪潮智能科技有限公司 | Method, device and equipment for generating out-of-order data and storage medium |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6092158A (en) * | 1997-06-13 | 2000-07-18 | Intel Corporation | Method and apparatus for arbitrating between command streams |
US6209087B1 (en) * | 1998-06-15 | 2001-03-27 | Cisco Technology, Inc. | Data processor with multiple compare extension instruction |
US6209098B1 (en) * | 1996-10-25 | 2001-03-27 | Intel Corporation | Circuit and method for ensuring interconnect security with a multi-chip integrated circuit package |
US6282290B1 (en) * | 1997-03-28 | 2001-08-28 | Mykotronx, Inc. | High speed modular exponentiator |
US6298411B1 (en) * | 1999-01-05 | 2001-10-02 | Compaq Computer Corporation | Method and apparatus to share instruction images in a virtual cache |
US20020027988A1 (en) * | 1998-08-26 | 2002-03-07 | Roy Callum | Cryptographic accelerator |
US20020108048A1 (en) * | 2000-12-13 | 2002-08-08 | Broadcom Corporation | Methods and apparatus for implementing a cryptography engine |
US20030084309A1 (en) * | 2001-10-22 | 2003-05-01 | Sun Microsystems, Inc. | Stream processor with cryptographic co-processor |
US6567832B1 (en) * | 1999-03-15 | 2003-05-20 | Matsushita Electric Industrial Co., Ltd. | Device, method, and storage medium for exponentiation and elliptic curve exponentiation |
US20030123120A1 (en) * | 2001-12-31 | 2003-07-03 | Hewlett Gregory J. | Pulse width modulation sequence generation |
US20030174699A1 (en) * | 2002-03-12 | 2003-09-18 | Van Asten Kizito Gysbertus Antonius | High-speed packet memory |
US20040039928A1 (en) * | 2000-12-13 | 2004-02-26 | Astrid Elbe | Cryptographic processor |
US20040083354A1 (en) * | 2002-10-24 | 2004-04-29 | Kunze Aaron R. | Processor programming |
US6745220B1 (en) * | 2000-11-21 | 2004-06-01 | Matsushita Electric Industrial Co., Ltd. | Efficient exponentiation method and apparatus |
US6748412B2 (en) * | 2001-09-26 | 2004-06-08 | Intel Corporation | Square-and-multiply exponent processor |
US20040133788A1 (en) * | 2003-01-07 | 2004-07-08 | Perkins Gregory M. | Multi-precision exponentiation method and apparatus |
US20040225885A1 (en) * | 2003-05-05 | 2004-11-11 | Sun Microsystems, Inc | Methods and systems for efficiently integrating a cryptographic co-processor |
US20040230813A1 (en) * | 2003-05-12 | 2004-11-18 | International Business Machines Corporation | Cryptographic coprocessor on a general purpose microprocessor |
US6850999B1 (en) * | 2002-11-27 | 2005-02-01 | Cisco Technology, Inc. | Coherency coverage of data across multiple packets varying in sizes |
US6854044B1 (en) * | 2002-12-10 | 2005-02-08 | Altera Corporation | Byte alignment circuitry |
US20050141715A1 (en) * | 2003-12-29 | 2005-06-30 | Sydir Jaroslaw J. | Method and apparatus for scheduling the processing of commands for execution by cryptographic algorithm cores in a programmable network processor |
US6968354B2 (en) * | 2001-03-05 | 2005-11-22 | Hitachi, Ltd. | Tamper-resistant modular multiplication method |
US20050278502A1 (en) * | 2003-03-28 | 2005-12-15 | Hundley Douglas E | Method and apparatus for chaining multiple independent hardware acceleration operations |
US20070130445A1 (en) * | 2005-12-05 | 2007-06-07 | Intel Corporation | Heterogeneous multi-core processor having dedicated connections between processor cores |
US20070157030A1 (en) * | 2005-12-30 | 2007-07-05 | Feghali Wajdi K | Cryptographic system component |
US20090091826A1 (en) * | 2005-07-19 | 2009-04-09 | Nitto Denko Corporation | Polarizing plate and image display apparatus |
Family Cites Families (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5075840A (en) * | 1989-01-13 | 1991-12-24 | International Business Machines Corporation | Tightly coupled multiprocessor instruction synchronization |
US5291611A (en) * | 1991-04-23 | 1994-03-01 | The United States Of America As Represented By The Secretary Of The Navy | Modular signal processing unit |
US5983004A (en) * | 1991-09-20 | 1999-11-09 | Shaw; Venson M. | Computer, memory, telephone, communications, and transportation system and methods |
JPH08185320A (en) * | 1994-12-28 | 1996-07-16 | Mitsubishi Electric Corp | Semiconductor integrated circuit |
US6356636B1 (en) * | 1998-07-22 | 2002-03-12 | Motorola, Inc. | Circuit and method for fast modular multiplication |
US6442715B1 (en) * | 1998-11-05 | 2002-08-27 | STMicroelectronics N.V. | Look-ahead reallocation disk drive defect management |
US6442751B1 (en) * | 1998-12-14 | 2002-08-27 | International Business Machines Corporation | Determination of local variable type and precision in the presence of subroutines |
US6397241B1 (en) * | 1998-12-18 | 2002-05-28 | Motorola, Inc. | Multiplier cell and method of computing |
US6668317B1 (en) * | 1999-08-31 | 2003-12-23 | Intel Corporation | Microengine for parallel processor architecture |
US6983350B1 (en) * | 1999-08-31 | 2006-01-03 | Intel Corporation | SDRAM controller for parallel processor architecture |
US6427196B1 (en) * | 1999-08-31 | 2002-07-30 | Intel Corporation | SRAM controller for parallel processor architecture including address and command queue and arbiter |
US6606704B1 (en) * | 1999-08-31 | 2003-08-12 | Intel Corporation | Parallel multithreaded processor with plural microengines executing multiple threads each microengine having loadable microcode |
US6532509B1 (en) * | 1999-12-22 | 2003-03-11 | Intel Corporation | Arbitrating command requests in a parallel multi-threaded processing system |
US6307789B1 (en) * | 1999-12-28 | 2001-10-23 | Intel Corporation | Scratchpad memory |
US6625654B1 (en) * | 1999-12-28 | 2003-09-23 | Intel Corporation | Thread signaling in multi-threaded network processor |
US6463072B1 (en) * | 1999-12-28 | 2002-10-08 | Intel Corporation | Method and apparatus for sharing access to a bus |
US6584522B1 (en) * | 1999-12-30 | 2003-06-24 | Intel Corporation | Communication between processors |
US6631462B1 (en) * | 2000-01-05 | 2003-10-07 | Intel Corporation | Memory shared between processing threads |
US7240204B1 (en) | 2000-03-31 | 2007-07-03 | State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of Oregon State University | Scalable and unified multiplication methods and apparatus |
US7681018B2 (en) * | 2000-08-31 | 2010-03-16 | Intel Corporation | Method and apparatus for providing large register address space while maximizing cycletime performance for a multi-threaded register file set |
US20020091826A1 (en) * | 2000-10-13 | 2002-07-11 | Guillaume Comeau | Method and apparatus for interprocessor communication and peripheral sharing |
US7225281B2 (en) * | 2001-08-27 | 2007-05-29 | Intel Corporation | Multiprocessor infrastructure for providing flexible bandwidth allocation via multiple instantiations of separate data buses, control buses and support mechanisms |
US7487505B2 (en) * | 2001-08-27 | 2009-02-03 | Intel Corporation | Multithreaded microprocessor with register allocation based on number of active threads |
US6868476B2 (en) * | 2001-08-27 | 2005-03-15 | Intel Corporation | Software controlled content addressable memory in a general purpose execution datapath |
US6738831B2 (en) * | 2001-12-12 | 2004-05-18 | Intel Corporation | Command ordering |
US7437724B2 (en) * | 2002-04-03 | 2008-10-14 | Intel Corporation | Registers for data transfers |
US6941438B2 (en) * | 2003-01-10 | 2005-09-06 | Intel Corporation | Memory interleaving |
US20050010761A1 (en) * | 2003-07-11 | 2005-01-13 | Alwyn Dos Remedios | High performance security policy database cache for network processing |
US7373514B2 (en) * | 2003-07-23 | 2008-05-13 | Intel Corporation | High-performance hashing system |
US7747020B2 (en) * | 2003-12-04 | 2010-06-29 | Intel Corporation | Technique for implementing a security algorithm |
US20050138366A1 (en) * | 2003-12-19 | 2005-06-23 | Pan-Loong Loh | IPSec acceleration using multiple micro engines |
US7543142B2 (en) * | 2003-12-19 | 2009-06-02 | Intel Corporation | Method and apparatus for performing an authentication after cipher operation in a network processor |
US20050135604A1 (en) * | 2003-12-22 | 2005-06-23 | Feghali Wajdi K. | Technique for generating output states in a security algorithm |
US20050149744A1 (en) * | 2003-12-29 | 2005-07-07 | Intel Corporation | Network processor having cryptographic processing including an authentication buffer |
US7529924B2 (en) * | 2003-12-30 | 2009-05-05 | Intel Corporation | Method and apparatus for aligning ciphered data |
US7171604B2 (en) * | 2003-12-30 | 2007-01-30 | Intel Corporation | Method and apparatus for calculating cyclic redundancy check (CRC) on data using a programmable CRC engine |
US7433469B2 (en) * | 2004-04-27 | 2008-10-07 | Intel Corporation | Apparatus and method for implementing the KASUMI ciphering process |
US7653196B2 (en) * | 2004-04-27 | 2010-01-26 | Intel Corporation | Apparatus and method for performing RC4 ciphering |
US7627764B2 (en) * | 2004-06-25 | 2009-12-01 | Intel Corporation | Apparatus and method for performing MD5 digesting |
US7539718B2 (en) * | 2004-09-16 | 2009-05-26 | Intel Corporation | Method and apparatus for performing Montgomery multiplications |
US20060059219A1 (en) * | 2004-09-16 | 2006-03-16 | Koshy Kamal J | Method and apparatus for performing modular exponentiations |
US7555630B2 (en) * | 2004-12-21 | 2009-06-30 | Intel Corporation | Method and apparatus to provide efficient communication between multi-threaded processing elements in a processor unit |
US7418543B2 (en) * | 2004-12-21 | 2008-08-26 | Intel Corporation | Processor having content addressable memory with command ordering |
JP4450737B2 (en) * | 2005-01-11 | 2010-04-14 | 富士通株式会社 | Semiconductor integrated circuit |
US7900022B2 (en) | 2005-12-30 | 2011-03-01 | Intel Corporation | Programmable processing unit with an input buffer and output buffer configured to exclusively exchange data with either a shared memory logic or a multiplier based upon a mode instruction |
US7725624B2 (en) | 2005-12-30 | 2010-05-25 | Intel Corporation | System and method for cryptography processing units and multiplier |
- 2005-12-30: US application 11/323,329, publication US20070157030A1, status: Abandoned
- 2006-02-14: US application 11/354,670, publication US7475229B2, status: Expired - Fee Related
- 2006-12-28: US application 11/647,892, publication US20070192626A1, status: Abandoned
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8667254B1 (en) * | 2008-05-15 | 2014-03-04 | Xilinx, Inc. | Method and apparatus for processing data in an embedded system |
US20100169570A1 (en) * | 2008-12-31 | 2010-07-01 | Michael Mesnier | Providing differentiated I/O services within a hardware storage controller |
US20120096281A1 (en) * | 2008-12-31 | 2012-04-19 | Eszenyi Mathew S | Selective storage encryption |
US10503654B2 (en) | 2016-09-01 | 2019-12-10 | Intel Corporation | Selective caching of erasure coded fragments in a distributed storage system |
CN111586076A (en) * | 2020-05-26 | 2020-08-25 | 清华大学 | Remote control and telemetry information tamper-proof encryption and decryption method and system based on mixed password |
Also Published As
Publication number | Publication date |
---|---|
US20070174372A1 (en) | 2007-07-26 |
US20070157030A1 (en) | 2007-07-05 |
US7475229B2 (en) | 2009-01-06 |
Similar Documents
Publication | Title |
---|---|
US7725624B2 (en) | System and method for cryptography processing units and multiplier |
US7475229B2 (en) | Executing instruction for processing by ALU accessing different scope of variables using scope index automatically changed upon procedure call and exit |
US7900022B2 (en) | Programmable processing unit with an input buffer and output buffer configured to exclusively exchange data with either a shared memory logic or a multiplier based upon a mode instruction |
US8073892B2 (en) | Cryptographic system, method and multiplier |
US7620821B1 (en) | Processor including general-purpose and cryptographic functionality in which cryptographic operations are visible to user-specified software |
US8417961B2 (en) | Apparatus and method for implementing instruction support for performing a cyclic redundancy check (CRC) |
US7953221B2 (en) | Method for processing multiple operations |
US20170288855A1 (en) | Power side-channel attack resistant advanced encryption standard accelerator processor |
US20100250965A1 (en) | Apparatus and method for implementing instruction support for the advanced encryption standard (AES) algorithm |
US20100246814A1 (en) | Apparatus and method for implementing instruction support for the data encryption standard (DES) algorithm |
US20070192571A1 (en) | Programmable processing unit providing concurrent datapath operation of multiple instructions |
US8020142B2 (en) | Hardware accelerator |
US20060059221A1 (en) | Multiply instructions for modular exponentiation |
US7684563B1 (en) | Apparatus and method for implementing a unified hash algorithm pipeline |
US9317286B2 (en) | Apparatus and method for implementing instruction support for the Camellia cipher algorithm |
US7570760B1 (en) | Apparatus and method for implementing a block cipher algorithm |
US20080148011A1 (en) | Carry/Borrow Handling |
US20100246815A1 (en) | Apparatus and method for implementing instruction support for the KASUMI cipher algorithm |
CN110659505A (en) | Accelerator for encrypting or decrypting confidential data and additional authentication data |
US7720219B1 (en) | Apparatus and method for implementing a hash algorithm word buffer |
US20080263115A1 (en) | Very long arithmetic logic unit for security processor |
Le et al. | Efficient and High-Speed CGRA Accelerator for Cryptographic Applications |
Lee et al. | Validating word-oriented processors for bit and multi-word operations |
Greathouse | Processors with On-Die Cryptography Accelerators |
Ahmed et al. | RE-CONFIGURABLE PROGRAMMABLE SECURITY PROCESSOR FOR NETWORK APPLICATIONS |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: FEGHALI, WAJDI; HASENPLAUGH, WILLIAM; WOLRICH, GILBERT M.; AND OTHERS; REEL/FRAME: 019759/0408; SIGNING DATES FROM 20070202 TO 20070226 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |