WO2007098024A2

WO2007098024A2 - Allocation of resources among an array of computers

Info

Publication number: WO2007098024A2
Application number: PCT/US2007/004081
Authority: WO
Inventors: Charles H. Moore
Original assignee: Vns Portfolio Llc
Priority date: 2006-02-16
Filing date: 2007-02-16
Publication date: 2007-08-30
Also published as: KR20090003217A; JP2009527814A; EP1984836A4; WO2007098024A3; EP1984836A2

Abstract

A computer array (10) has a plurality of computers (12). The computers (12) communicate directly with neighbor computers and indirectly with other computers in the array. The computers pass data words that include data and/or instructions. As many as four instructions can be included in one 18-bit data word. Since four instructions are communicated at a time, it is possible to communicate an entire micro-loop made up of as many as four instructions. The computers of the present invention can execute instructions directly from their input registers.

Description

ALLOCATION OF RESOURCES AMONG AN ARRAY OF COMPUTERS

Inventor: Charles H. Moore

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to the field of computers and computer processors, and more particularly to a method and means for a unique type of interaction between computers. The predominant current usage of the present inventive computer array is in the combination of multiple computers on a single microchip.

Description of the Background Art

It is known in the prior art to use multiple computer processors, working together, to accomplish a task. Multi-threading and several other schemes have been used to allow processors to cooperate. However, it is generally recognized that there is much room for improvement in this area. Furthermore, there is now a trend to combine several processors on a single chip, thereby exacerbating the problem and increasing the urgency to find a solution for causing computers to work together in an efficient manner. It is now thought that, for a number of reasons, the best arrangement of multiple processors for many applications might be an array consisting of many computers, each having processing capabilities and at least some dedicated memory. In such an example, none of the computers will be particularly powerful in its own right; rather, the computing power will be achieved through close cooperation of the computers.

Copending applications in the name of this same inventor have described and claimed a number of inventive aspects of such computer arrays, including some specifics as to how such computers may be arranged, and how communications channels between them might occur. However, implementation of the relatively new concept of computer arrays will require yet more innovations in order to operate with the greatest efficiency. Clearly there are many questions to be answered regarding how best to arrange, communicate between, divide tasks among, and otherwise use computer arrays. Some of these questions may have been answered, but there may well be room for improvement even over the existing solutions. In other cases, solutions may require addressing questions of first impression in order to solve new problems that did not exist in the prior art.

SUMMARY

Accordingly, it is an object of the present invention to provide a method and apparatus for efficiently using the computing power available in an array of computers.

It is still another object of the present invention to provide an apparatus and method for providing substantial computing power inexpensively.

It is yet another object of the present invention to provide an apparatus and method for increasing the operational speed of a multi-computer array.

It is still another object of the present invention to provide an apparatus and method for accomplishing computationally intensive tasks.

It is yet another object of the present invention to increase the speed and efficiency by which one of a group of computers can communicate with and/or utilize the resources of one or more of the other computers.

Briefly, a known embodiment of the present invention is an array of computers, each computer having its own memory and being capable of independent computational functions. In order to accomplish tasks cooperatively, the computers must pass data and/or instructions from one to another. According to one embodiment of the invention, the computers have connecting data paths between orthogonally adjacent computers such that each computer can communicate directly with as many as four "neighbors". If it is desired for a computer to communicate with another that is not an immediate neighbor, then communications will be channeled through other computers to the desired destination.

Since, according to the described embodiment of the present invention, data words containing as many as four instructions can be passed in parallel, both between computers and to and from the internal memories of each computer, each data word can constitute a mini-program, referred to herein as a micro-loop. It should be remembered that in a large array of processors, large tasks are ideally divided into a plurality of smaller tasks, each of which can readily be accomplished by a processor with somewhat limited capabilities. Therefore, it is thought that four-instruction loops will be quite useful. This is made even more noticeable by the associated fact that, since the computers do have limited facilities, it will be expedient for them, from time to time, to "borrow" facilities from a neighbor. This presents an ideal opportunity for the use of micro-loops. While a computer might need to borrow processing power, or the like, from a neighbor, another likely possibility is that it may need to borrow some memory from a neighbor, using it in a manner somewhat similar to its own internal memory. By passing a micro-loop to a neighbor instructing it to read or write a series of data, such memory borrowing can be readily accomplished. Such a micro-loop might contain, for example, an instruction to write from a particular internal memory location, increment that location, and then repeat for a given number of iterations.
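To make the memory-borrowing idea concrete, here is a minimal Python sketch of a neighbor running such a micro-loop. The four slot mnemonics (`fetch_a_plus`, `store_port`, `nop`, `next`), the function `run_micro_loop`, and the use of a Python list as the shared port are hypothetical illustrations, not the instruction set of the described computers.

```python
# Hypothetical mnemonics for the four slots of one micro-loop word.
MICRO_LOOP = ("fetch_a_plus", "store_port", "nop", "next")

def run_micro_loop(word, memory, a, count, port):
    """Run one packed micro-loop word `count` times, as a neighbor lending its
    memory might: read the word at address A, post-increment A, send it out."""
    t = None                      # stands in for the top-of-stack register
    remaining = count
    while remaining > 0:
        for op in word:           # the same word is re-executed; no new fetch
            if op == "fetch_a_plus":
                t = memory[a]     # read the location addressed by A ...
                a += 1            # ... then increment that location
            elif op == "store_port":
                port.append(t)    # hand the value to the requesting neighbor
            elif op == "nop":
                pass              # unused slot
            elif op == "next":
                remaining -= 1    # count down; repeat the word while > 0
            else:
                raise ValueError(f"unknown op {op!r}")
    return a

# Usage: the requester borrows four words starting at the neighbor's address 8.
neighbor_ram = list(range(100, 164))
port = []
next_address = run_micro_loop(MICRO_LOOP, neighbor_ram, 8, 4, port)
```

Here `port` ends up holding the four borrowed words and `next_address` points just past them, ready for a following request.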

The above example of passing a micro-loop to a neighbor is an example of yet another aspect of the invention, which is presently referred to as "Forthlets" because they are presently implemented in the Forth computer language, although the application of the invention is not limited strictly to use with Forth. A Forthlet is a mini-program that can be transmitted directly to a computer for execution. In prior art computers, an instruction must be read and stored before execution but, as will be seen in light of the detailed description herein, that is not necessary according to the present invention. Indeed, it is anticipated that an important aspect of the invention will be that a computer can generate a Forthlet and pass it off to another computer for execution. Forthlets can be "pre-written" by a programmer and stored for use, and they can be accumulated into a "library" for use as needed. However, it is also within the scope of the invention that Forthlets can be generated within a computer according to pre-programmed criteria. By way of example, in an embodiment of the invention, I/O registers are treated as memory addresses, which means that the same (or similar) instructions that read and write memory can also perform I/O operations. In the case of multi-core chips, this choice of I/O structure has a powerful ramification. Not only can the core processor read and execute instructions from its local ROM and RAM, it can also read and execute instructions presented to it on I/O ports or registers. Now the concept of tight loops transferring data becomes incredibly powerful. It allows instruction streams to be presented to the cores at I/O ports and executed directly from them. Therefore, one core can send a code object to an adjoining core processor, which can execute it directly. Code objects can now be passed among the cores, which execute them at the registers. The code objects arrive at very high speed, since each core is essentially working entirely within its own local address space with no apparent time spent transferring code instructions.
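The following Python sketch illustrates executing a code object straight from a port. The `Core` class, its three toy operations (`lit`, `add`, `store`) and the use of a `deque` as the port are hypothetical; the only point shown is that the receiving core consumes instructions from the port without first copying them into RAM.

```python
from collections import deque

class Core:
    """Toy core that takes instruction words from a port rather than from its
    own memory. Ops (all hypothetical): ("lit", n), ("add",), ("store", addr)."""

    def __init__(self):
        self.ram = {}      # local memory, written only by "store"
        self.stack = []    # data stack

    def execute(self, instr):
        op = instr[0]
        if op == "lit":
            self.stack.append(instr[1])
        elif op == "add":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a + b)
        elif op == "store":
            self.ram[instr[1]] = self.stack.pop()
        else:
            raise ValueError(f"unknown op {op!r}")

    def execute_from_port(self, port):
        # Each word is consumed straight off the port and executed; it is
        # never copied into self.ram first.
        while port:
            self.execute(port.popleft())

# A neighbor sends a small code object: compute 2 + 3 and store it at address 7.
port = deque([("lit", 2), ("lit", 3), ("add",), ("store", 7)])
receiver = Core()
receiver.execute_from_port(port)
```

After the run, the receiver's RAM holds only the result, not the code that produced it.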

As discussed above, each instruction fetch brings a plurality (four in the presently described embodiment) of instructions into the core processor. Although this sort of built-in "cache" is certainly small, it is extremely effective when the instructions themselves take advantage of it. For instance, micro for...next loops can be constructed that are contained entirely within the bounds of a single 18-bit instruction word. These constructs are ideal when combined with the automatic status signaling built into the I/O registers, because large blocks of data can then be transferred with only a single instruction fetch. With this sort of instruction packing, the concept of executing instructions presented on a shared I/O register by a neighboring processor core takes on new power, because each word appearing in that register now represents not one, but four instructions. These software/hardware structures and their staggering impact on performance in multi-core chips are simply not available to traditional languages; they are only possible in an instruction set where multiple instructions are packed within a single word and complete loops can be executed from within that word.

These and other objects and advantages of the present invention will become clear to those skilled in the art in view of the description of modes of carrying out the invention, and the industrial applicability thereof, as described herein and as illustrated in the several figures of the drawing. The objects and advantages listed are not an exhaustive list of all possible advantages of the invention. Moreover, it will be possible to practice the invention even where one or more of the intended objects and/or advantages might be absent or not required in the application. Further, those skilled in the art will recognize that various embodiments of the present invention may achieve one or more, but not necessarily all, of the described objects and/or advantages. Accordingly, the objects and/or advantages described herein are not essential elements of the present invention, and should not be construed as limitations.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a diagrammatic view of a computer array, according to the present invention;

Fig. 2 is a detailed diagram showing a subset of the computers of Fig. 1 and a more detailed view of the interconnecting data buses of Fig. 1;

Fig. 3 is a block diagram depicting a general layout of one of the computers of Figs. 1 and 2;

Fig. 4 is a diagrammatic representation of an instruction word according to the present inventive application; and

Fig. 5 is a schematic representation of the slot sequencer 42 of Fig. 3.

DETAILED DESCRIPTION OF THE INVENTION

This invention is described in the following description with reference to the Figures, in which like numbers represent the same or similar elements. While this invention is described in terms of modes for achieving this invention's objectives, it will be appreciated by those skilled in the art that variations may be accomplished in view of these teachings without deviating from the spirit or scope of the present invention.

The embodiments and variations of the invention described herein, and/or shown in the drawings, are presented by way of example only and are not limiting as to the scope of the invention. Unless otherwise specifically stated, individual aspects and components of the invention may be omitted or modified, or may have substituted therefor known equivalents, or as yet unknown substitutes such as may be developed in the future or such as may be found to be acceptable substitutes in the future. The invention may also be modified for a variety of applications while remaining within the spirit and scope of the claimed invention, since the range of potential applications is great, and since it is intended that the present invention be adaptable to many such variations.

While the following embodiment is described using an example of a computer array having both asynchronous communications between computers and individually asynchronously operating computers, the applications of the present invention are, by no means, limited to that context.

A known mode for carrying out the invention is an array of individual computers. The inventive computer array is depicted in a diagrammatic view in Fig. 1 and is designated therein by the general reference character 10. The computer array 10 has a plurality (twenty-four in the example shown) of computers 12 (sometimes also referred to as "cores" or "nodes" in the example of an array). In the example shown, all of the computers 12 are located on a single die 14. According to the present invention, each of the computers 12 is a generally independently functioning computer, as will be discussed in more detail hereinafter. The computers 12 are interconnected by a plurality (the quantities of which will be discussed in more detail hereinafter) of interconnecting data buses 16. In this example, the data buses 16 are bidirectional asynchronous high-speed parallel data buses, although it is within the scope of the invention that other interconnecting means might be employed for the purpose. In the present embodiment of the array 10, not only is data communication between the computers 12 asynchronous, the individual computers 12 also operate in an internally asynchronous mode. This has been found by the inventor to provide important advantages. For example, since a clock signal does not have to be distributed throughout the computer array 10, a great deal of power is saved. Furthermore, not having to distribute a clock signal eliminates many timing problems that could limit the size of the array 10 or cause other known difficulties.

One skilled in the art will recognize that there will be additional components on the die 14 that are omitted from the view of Fig. 1 for the sake of clarity. Such additional components include power buses, external connection pads, and other such common aspects of a microprocessor chip.

Computer 12e is an example of one of the computers 12 that is not on the periphery of the array 10. That is, computer 12e has four orthogonally adjacent computers 12a, 12b, 12c and 12d. This grouping of computers 12a through 12e will be used hereinafter in relation to a more detailed discussion of the communications between the computers 12 of the array 10. As can be seen in the view of Fig. 1, interior computers such as computer 12e will have four other computers 12 with which they can directly communicate via the buses 16. In the following discussion, the principles discussed will apply to all of the computers 12, except that the computers 12 on the periphery of the array 10 will be in direct communication with only three or, in the case of the corner computers 12, only two other computers 12.
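The neighbor counts above follow from the mesh geometry. This Python sketch assumes a 4 x 6 layout for the twenty-four computers; the text does not fix the row and column counts, so that shape (and the helper name `neighbors`) is an assumption.

```python
def neighbors(row, col, rows, cols):
    """Orthogonally adjacent positions of node (row, col) in a rows x cols mesh."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

ROWS, COLS = 4, 6   # assumed layout giving the twenty-four nodes of the example

# Tally how many nodes have 2, 3 or 4 direct neighbors.
counts = {}
for r in range(ROWS):
    for c in range(COLS):
        k = len(neighbors(r, c, ROWS, COLS))
        counts[k] = counts.get(k, 0) + 1

print(counts)   # {2: 4, 3: 12, 4: 8}: corners, other edge nodes, interior nodes
```

Any rectangular mesh gives the same pattern: four corners with two neighbors, the remaining edge nodes with three, and every interior node with four.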

Fig. 2 is a more detailed view of a portion of Fig. 1 showing only some of the computers 12 and, in particular, computers 12a through 12e, inclusive. The view of Fig. 2 also reveals that the data buses 16 each have a read line 18, a write line 20 and a plurality (eighteen, in this example) of data lines 22. The data lines 22 are capable of transferring all the bits of one eighteen-bit instruction word generally simultaneously in parallel. It should be noted that, in one embodiment of the invention, some of the computers 12 are mirror images of adjacent computers. However, whether the computers 12 are all oriented identically or as mirror images of adjacent computers is not an aspect of this presently described invention. Therefore, in order to better describe this invention, this potential complication will not be discussed further herein. According to the present inventive method, a computer 12, such as the computer 12e, can set one, two, three or all four of its read lines 18 high such that it is prepared to receive data from the respective one, two, three or all four adjacent computers 12. Similarly, it is also possible for a computer 12 to set one, two, three or all four of its write lines 20 high. Although the inventor does not believe that there is presently any practical value to setting more than one of the write lines 20 of a computer 12 high at one time, doing so is not beyond the scope of this invention, as it is conceivable that a use for such an operation may occur.

When one of the adjacent computers 12a, 12b, 12c or 12d sets a write line 20 between itself and the computer 12e high, if the computer 12e has already set the corresponding read line 18 high, then a word is transferred from that computer 12a, 12b, 12c or 12d to the computer 12e on the associated data lines 22. Then the sending computer 12 will release the write line 20 and the receiving computer (12e in this example) pulls both the write line 20 and the read line 18 low. The latter action will acknowledge to the sending computer 12 that the data has been received. Note that the above description is not intended necessarily to denote the sequence of events in order. In actual practice, in this example the receiving computer may try to set the write line 20 low slightly before the sending computer 12 releases (stops pulling high) its write line 20. In such an instance, as soon as the sending computer 12 releases its write line 20 the write line 20 will be pulled low by the receiving computer 12e.
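The handshake can be modeled in software as a bus object whose transfer completes only when both lines are high. This Python sketch (the `Bus` class and its method names are hypothetical) collapses the electrical timing into single steps, so it shows the ordering rule rather than the circuit behavior.

```python
class Bus:
    """Toy model of one data bus 16: a read line, a write line and eighteen data
    lines. A transfer completes only when both lines are high, regardless of
    which computer raised its line first."""

    def __init__(self):
        self.read_line = False
        self.write_line = False
        self.data = None
        self.received = []

    def _try_complete(self):
        if not (self.read_line and self.write_line):
            return False                   # the first party to arrive waits
        self.received.append(self.data)    # word crosses the data lines
        # Sender releases the write line; receiver pulls both lines low.
        # Both lines low is the acknowledge condition.
        self.write_line = False
        self.read_line = False
        self.data = None
        return True

    def write(self, word):
        if not 0 <= word < 2 ** 18:
            raise ValueError("the bus carries one 18-bit word")
        self.data = word
        self.write_line = True
        return self._try_complete()

    def read(self):
        self.read_line = True
        return self._try_complete()

# Usage: reader first, then writer; then writer first, then reader.
bus = Bus()
bus.read()          # False: receiver waits with its read line high
bus.write(0x2ABCD)  # True: transfer and acknowledge
bus.write(5)        # False: sender waits with its write line high
bus.read()          # True
```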

In the present example, only a programming error would cause both computers 12 on the opposite ends of one of the buses 16 to try to set high the read line 18 there-between. Also, it would be an error for both computers 12 on the opposite ends of one of the buses 16 to try to set high the write line 20 there-between at the same time. Similarly, as discussed above, it is not currently anticipated that it would be desirable to have a single computer 12 set more than one of its four write lines 20 high. However, it is presently anticipated that there will be occasions wherein it is desirable to set different combinations of the read lines 18 high such that one of the computers 12 can be in a wait state awaiting data from the first one of the chosen computers 12 to set its corresponding write line 20 high.
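Setting several read lines at once amounts to waiting on whichever selected neighbor writes first. A minimal Python sketch, in which `first_writer`, the port names and the event-trace format are hypothetical:

```python
def first_writer(read_ports, write_events):
    """Return (port, word) for the first write that arrives on a port whose
    read line is set, or None if the reader is still waiting.

    `read_ports` is the set of ports with read lines high; `write_events` is a
    time-ordered list of (port, word) writes by neighbors (hypothetical trace).
    A write on an unselected port simply leaves that writer waiting."""
    for port, word in write_events:
        if port in read_ports:
            return port, word
    return None

# Usage: reading from up and left; the left neighbor happens to write first.
result = first_writer({"up", "left"}, [("down", 1), ("left", 2), ("up", 3)])
```

In this trace `result` is `("left", 2)`; the writer on the down port keeps waiting because its read line was never set.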

In the example discussed above, computer 12e was described as setting one or more of its read lines 18 high before an adjacent computer (selected from one or more of the computers 12a, 12b, 12c or 12d) has set its write line 20 high. However, this process can certainly occur in the opposite order. For example, if the computer 12e were attempting to write to the computer 12a, then computer 12e would set the write line 20 between computer 12e and computer 12a high. If the read line 18 between computer 12e and computer 12a has not already been set high by computer 12a, then computer 12e will simply wait until computer 12a does set that read line 18 high. Then, as discussed above, when both of a corresponding pair of write line 20 and read line 18 are high, the data awaiting transfer on the data lines 22 is transferred. Thereafter, the receiving computer 12 (computer 12a, in this example) sets both the read line 18 and the write line 20 between the two computers (12e and 12a in this example) low as soon as the sending computer 12e releases the write line 20.

Whenever a computer 12 such as the computer 12e has set one of its write lines 20 high in anticipation of writing it will simply wait, using essentially no power, until the data is "requested", as described above, from the appropriate adjacent computer 12, unless the computer 12 to which the data is to be sent has already set its read line 18 high, in which case the data is transmitted immediately. Similarly, whenever a computer 12 has set one or more of its read lines 18 to high in anticipation of reading it will simply wait, using essentially no power, until the write line 20 connected to a selected computer 12 goes high to transfer an instruction word between the two computers 12.

There may be several potential means and/or methods to cause the computers 12 to function as described above. However, in this present example, the computers 12 so behave simply because they are operating generally asynchronously internally (in addition to transferring data there-between in the asynchronous manner described). That is, instructions are completed sequentially. When either a write or read instruction occurs, there can be no further action until that instruction is completed (or, perhaps alternatively, until it is aborted, as by a "reset" or the like). There is no regular clock pulse, in the prior art sense. Rather, a pulse is generated to accomplish a next instruction only when the instruction being executed is not a read or write type instruction (given that a read or write type instruction would require completion by another entity) or else when the read or write type operation is, in fact, completed.

Fig. 3 is a block diagram depicting the general layout of an example of one of the computers 12 of Figs. 1 and 2. As can be seen in the view of Fig. 3, each of the computers 12 is a generally self-contained computer having its own RAM 24 and ROM 26. As mentioned previously, the computers 12 are also sometimes referred to as individual "cores", given that they are, in the present example, combined on a single chip.

Other basic components of the computer 12 are a return stack 28, an instruction area 30, an arithmetic logic unit ("ALU") 32, a data stack 34 and a decode logic section 36 for decoding instructions. One skilled in the art will be generally familiar with the operation of stack-based computers such as the computers 12 of this present example. The computers 12 are dual stack computers having the data stack 34 and a separate return stack 28.

In this embodiment of the invention, the computer 12 has four communication ports 38 for communicating with adjacent computers 12. The communication ports 38 are tri-state drivers, having an off status, a receive status (for driving signals into the computer 12) and a send status (for driving signals out of the computer 12). Of course, if the particular computer 12 is not on the interior of the array (Fig. 1), unlike computer 12e, then one or more of the communication ports will not be used in that particular computer, at least for the purposes described herein. The instruction area 30 includes a number of registers 40 including, in this example, an A register 40a, a B register 40b and a P register 40c. In this example, the A register 40a is a full eighteen-bit register, while the B register 40b and the P register 40c are nine-bit registers. Although the invention is not limited by this example, the present computer 12 is implemented to execute native Forth language instructions. As one familiar with the Forth computer language will appreciate, complicated Forth instructions, known as Forth "words", are constructed from the native processor instructions designed into the computer. The collection of Forth words is known as a "dictionary". In other languages, this might be known as a "library". As will be described in greater detail hereinafter, the computer 12 reads eighteen bits at a time from RAM 24, ROM 26 or directly from one of the data buses 16 (Fig. 2). However, since in Forth most instructions (known as operand-less instructions) obtain their operands directly from the stacks 28 and 34, they are generally only five bits in length, such that up to four instructions can be included in a single eighteen-bit instruction word, with the condition that the last instruction in the group is selected from a limited set of instructions that require only three bits. Also depicted in block diagrammatic form in the view of Fig. 3 is a slot sequencer 42.
In this embodiment of the invention, the top two registers in the data stack 34 are a T register 44 and an S register 46. Fig. 4 is a diagrammatic representation of an instruction word 48. (It should be noted that the instruction word 48 can actually contain instructions, data, or some combination thereof.) The instruction word 48 consists of eighteen bits 50. This being a binary computer, each of the bits 50 will be a '1' or a '0'. As previously discussed herein, the eighteen-bit wide instruction word 48 can contain up to four instructions 52 in four slots 54 called slot zero 54a, slot one 54b, slot two 54c and slot three 54d. In the present embodiment of the invention, the eighteen-bit instruction words 48 are always read as a whole. Therefore, since there is always a potential of having up to four instructions in the instruction word 48, a no-op (no operation) instruction is included in the instruction set of the computer 12 to provide for instances when using all of the available slots 54 might be unnecessary or even undesirable. It should be noted that, according to one particular embodiment of the invention, the polarity (active high as compared to active low) of bits 50 in alternate slots (specifically, slots one 54b and three 54d) is reversed. However, this is not a necessary aspect of the presently described invention and, therefore, in order to better explain this invention, this potential complication is avoided in the following discussion.
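The slot layout can be sketched as bit packing. This Python example assumes that slot zero occupies the most significant bits, that slots zero through two hold 5-bit values and slot three a 3-bit value (5 + 5 + 5 + 3 = 18), and that the no-op is encoded as 0; all three are placeholders rather than the documented encoding. Like the text, it ignores the alternate-slot polarity inversion.

```python
SLOT_WIDTHS = (5, 5, 5, 3)   # slots zero to two: 5 bits; slot three: 3 bits
NOP = 0                      # placeholder encoding for the no-op

def pack(opcodes):
    """Pack up to four opcodes into one 18-bit word, slot zero in the high
    bits (assumed layout). Unused slots are filled with the no-op."""
    if len(opcodes) > 4:
        raise ValueError("at most four instructions per word")
    ops = list(opcodes) + [NOP] * (4 - len(opcodes))
    word = 0
    for op, width in zip(ops, SLOT_WIDTHS):
        if not 0 <= op < (1 << width):
            raise ValueError(f"opcode {op} does not fit in {width} bits")
        word = (word << width) | op
    return word

def unpack(word):
    """Split an 18-bit word back into its four slot values."""
    if not 0 <= word < (1 << 18):
        raise ValueError("not an 18-bit word")
    ops, shift = [], 18
    for width in SLOT_WIDTHS:
        shift -= width
        ops.append((word >> shift) & ((1 << width) - 1))
    return ops

word = pack([21, 3, 31, 5])
print(word, unpack(word))    # 173053 [21, 3, 31, 5]
print(unpack(pack([7])))     # [7, 0, 0, 0]
```

The range check in `pack` is what enforces the restriction that only a 3-bit instruction can occupy slot three.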

Fig. 5 is a schematic representation of the slot sequencer 42 of Fig. 3. As can be seen in the view of Fig. 5, the slot sequencer 42 has a plurality (fourteen in this example) of inverters 56 and one NAND gate 58 arranged in a ring, such that a signal is inverted an odd number of times (fifteen) as it travels through the fourteen inverters 56 and the NAND gate 58. A signal is initiated in the slot sequencer 42 when either of the two inputs to an OR gate 60 goes high. A first OR gate input 62 is derived from a bit i4 66 (Fig. 4) of the instruction 52 being executed. If that particular instruction 52 is an ALU instruction, then the i4 bit 66 is '1' (high). When the i4 bit 66 is '1', the first OR gate input 62 is high, and the slot sequencer 42 is triggered to initiate a pulse that will cause the execution of the next instruction 52.

When the slot sequencer 42 is triggered, either by the first OR gate input 62 going high or by the second OR gate input 64 going high (as will be discussed hereinafter), then a signal will travel around the slot sequencer 42 twice, producing an output at a slot sequencer output 68 each time. The first time the signal passes the slot sequencer output 68 it will be low, and the second time the output at the slot sequencer output 68 will be high. The relatively wide output from the slot sequencer output 68 is provided to a pulse generator 70 (shown in block diagrammatic form) that produces a narrow timing pulse as an output. One skilled in the art will recognize that the narrow timing pulse is desirable to accurately initiate the operations of the computer 12.
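The two-lap behavior can be reproduced with a unit-delay model of the ring. In this Python sketch, each of the fifteen inverting stages (fourteen inverters plus the NAND gate) takes one time step, the output tap is assumed to be the last inverter, and the OR gate 60 output is modeled as an enable held high for a chosen number of steps. These modeling choices are assumptions, not circuit details taken from the figure.

```python
def run_ring(inverters=14, enable_steps=20, total_steps=40):
    """Unit-delay model of the ring: node[0] is the NAND output, node[1..]
    are inverter outputs, and node[-1] feeds back into the NAND and is used
    as the output tap. Returns (step, level) for each change at the tap."""
    n = inverters + 1
    # Quiescent state with the enable low: NAND output high, stages alternate.
    node = [1 if i % 2 == 0 else 0 for i in range(n)]
    tap_events = []
    for step in range(1, total_steps + 1):
        enable = 1 if step <= enable_steps else 0      # OR gate 60 output
        new = [0] * n
        new[0] = 0 if (enable and node[-1]) else 1     # NAND gate
        for i in range(1, n):
            new[i] = 1 - node[i - 1]                   # each inverter
        if new[-1] != node[-1]:
            tap_events.append((step, new[-1]))
        node = new
    return tap_events

# Enable dropped during the second lap: one low pass, one high pass, then rest.
print(run_ring())                                  # [(15, 0), (30, 1)]
# Enable held high: the odd number of inversions keeps the ring oscillating.
print(run_ring(enable_steps=60, total_steps=60))   # [(15, 0), (30, 1), (45, 0), (60, 1)]
```

The second call shows why the enable must fall during the second lap: if it stays high, the ring keeps circulating instead of producing a single low-then-high output.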

When the particular instruction 52 being executed is a read or a write instruction, or any other instruction wherein it is not desired that the instruction 52 being executed triggers immediate execution of the next instruction 52 in sequence, then the i4 bit 66 is '0' (low) and the first OR gate input 62 is, therefore, also low.

One skilled in the art will recognize that the timing of events in a device such as the computers 12 is generally quite critical, and this is no exception. Upon examination of the slot sequencer 42 one skilled in the art will recognize that the output from the OR gate 60 must remain high until after the signal has circulated past the NAND gate 58 in order to initiate the second "lap" of the ring. Thereafter, the output from the OR gate 60 will go low during that second "lap" in order to prevent unwanted continued oscillation of the circuit.

As can be appreciated in light of the above discussion, when the i4 bit 66 is '0', then the slot sequencer 42 will not be triggered, assuming that the second OR gate input 64, which will be discussed hereinafter, is not high. As discussed above, the i4 bit 66 of each instruction 52 is set according to whether or not that instruction is a read or write type of instruction. The remaining bits 50 in the instruction 52 provide the remainder of the particular opcode for that instruction. In the case of a read or write type instruction, one or more of the bits may be used to indicate where data is to be read from or written to in that particular computer 12. In the present example of the invention, data to be written always comes from the T register 44 (the top of the data stack 34); however, data can be selectively read into either the T register 44 or else the instruction area 30, from where it can be executed. That is because, in this particular embodiment of the invention, either data or instructions can be communicated in the manner described herein, and instructions can, therefore, be executed directly from the data bus 16, although this is not a necessary aspect of this present invention. Furthermore, one or more of the bits 50 will be used to indicate which of the ports 38, if any, is to be set to read or write. This latter operation is optionally accomplished by using one or more bits to designate a register 40, such as the A register 40a, the B register 40b, or the like. In such an example, the designated register 40 will be preloaded with data having a bit corresponding to each of the ports 38 (and, also, any other potential entity with which the computer 12 may be attempting to communicate, such as memory, an external communications port, or the like). For example, each of four bits in the particular register 40 can correspond to one of the up port 38a, the right port 38b, the left port 38c or the down port 38d.
In such case, where there is a '1' at any of those bit locations, communication will be set to proceed through the corresponding port 38. As previously discussed herein, in the present embodiment of the invention it is anticipated that a read opcode might set more than one port 38 for communication in a single instruction while, although it is possible, it is not anticipated that a write opcode will set more than one port 38 for communication in a single instruction.
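Decoding such a preloaded register into a set of ports is a simple bit test. In this Python sketch, the mapping of bit positions to ports and the helper names are assumptions; the text states only that each of four bits corresponds to one port.

```python
# Assumed assignment of register bits to ports; the text only says that each
# of four bits corresponds to one port.
PORT_BITS = {"up": 0, "right": 1, "left": 2, "down": 3}

def ports_from_register(value):
    """List the ports selected by a preloaded register value ('1' = selected)."""
    return [name for name, bit in PORT_BITS.items() if (value >> bit) & 1]

def is_expected_write_select(value):
    """Reads may select several ports; a write is expected to select at most one."""
    return len(ports_from_register(value)) <= 1

# Usage: a read that listens on the up and left ports at once.
selected = ports_from_register(0b0101)
```

With this assumed mapping, `selected` is `["up", "left"]`, while a value such as `0b1001` would be flagged as an unexpected multi-port write.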

The immediately following example will assume a communication wherein computer 12e is attempting to write to computer 12c, although the example is applicable to communication between any adjacent computers 12. When a write instruction is executed in a writing computer 12e, the selected write line 20 (in this example, the write line 20 between computers 12e and 12c) is set high. If the corresponding read line 18 is already high, then data is immediately sent from the selected location through the selected communications port 38. Alternatively, if the corresponding read line 18 is not already high, then computer 12e will simply stop operation until the corresponding read line 18 does go high. The mechanism for stopping (or, more accurately, not enabling further operations of) the computer 12e when there is a read or write type instruction has been discussed previously herein. In short, the opcode of the instruction 52 will have a '0' at bit position i4 66, so the first OR gate input 62 of the OR gate 60 is low, and the slot sequencer 42 is not triggered to generate an enabling pulse.

As for how the operation of the computer 12e is resumed when a read or write type instruction is completed, the mechanism for that is as follows: When both the read line 18 and the corresponding write line 20 between computers 12e and 12c are high, then both lines 18 and 20 will be released by each of the respective computers 12 that is holding it high. (In this example, the sending computer 12e will be holding the write line 20 high while the receiving computer 12c will be holding the read line 18 high.) Then the receiving computer 12c will pull both lines 18 and 20 low. In actual practice, the receiving computer 12c may attempt to pull the lines 18 and 20 low before the sending computer 12e has released the write line 20. However, since the lines 18 and 20 are pulled high and only weakly held (latched) low, any attempt to pull a line 18 or 20 low will not actually succeed until that line 18 or 20 is released by the computer 12 that is latching it high. When both lines 18 and 20 in a data bus 16 are pulled low, this is an "acknowledge" condition.

Each of the computers 12e and 12c will, upon the acknowledge condition, set its own internal acknowledge line 72 high. As can be seen in the view of Fig. 5, the acknowledge line 72 provides the second OR gate input 64. Since a high level on either of the OR gate 60 inputs 62 or 64 will cause the output of the OR gate 60 to go high, this will initiate operation of the slot sequencer 42 in the manner previously described herein, such that the instruction 52 in the next slot 54 of the instruction word 48 will be executed. The acknowledge line 72 stays high until the next instruction 52 is decoded, in order to prevent spurious addresses from reaching the address bus.

In any case, when the instruction 52 being executed is in the slot three position of the instruction word 48, the computer 12 will fetch the next awaiting eighteen-bit instruction word 48 unless, of course, bit i4 66 is a '0'. In actual practice, the present inventive mechanism includes a method and apparatus for "prefetching" instructions such that the fetch can begin before the end of the execution of all instructions 52 in the instruction word 48. However, this also is not a necessary aspect of the present inventive method and apparatus for asynchronous data communications.
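
The slot-three rule can be stated as a small decision function. This Python sketch is a hypothetical model rather than the hardware: `next_step` and its return labels are invented for illustration, the i4/acknowledge test mirrors the OR gate 60 described above, and prefetching is deliberately left out.

```python
def next_step(slot: int, i4_bit: int, acknowledge: bool) -> tuple:
    """Decide what follows the instruction in `slot` (0..3) of a word.

    Returns (action, next_slot), where action is one of:
      "wait"      - read/write type opcode (i4 == 0), no acknowledge yet;
      "next_slot" - the slot sequencer moves on within the same word;
      "fetch"     - slot 3 is done, so the next instruction word is fetched.
    """
    if not 0 <= slot <= 3:
        raise ValueError("an instruction word has slots 0..3")
    if not (i4_bit or acknowledge):
        return ("wait", slot)          # sequencer not triggered
    if slot == 3:
        return ("fetch", 0)            # load the next 18-bit word, restart at slot 0
    return ("next_slot", slot + 1)


# A write sitting in slot 3 first waits, then fetches once acknowledged.
print(next_step(3, 0, False), next_step(3, 0, True))
```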

The above example, wherein computer 12e is writing to computer 12c, has been described in detail. As can be appreciated in light of the above discussion, the operations are essentially the same whether computer 12e attempts to write to computer 12c first, or whether computer 12c first attempts to read from computer 12e. The operation cannot be completed until both computers 12e and 12c are ready and, whichever of computers 12e or 12c is ready first, that first computer 12 simply "goes to sleep" until the other computer 12e or 12c completes the transfer. Another way of looking at the above described process is that both the writing computer 12e and the receiving computer 12c go to sleep when they execute the write and read instructions, respectively, but the last one to enter into the transaction reawakens nearly instantaneously when both the read line 18 and the write line 20 are high, whereas the first computer 12 to initiate the transaction can stay asleep nearly indefinitely until the second computer 12 is ready to complete the process.
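
The symmetric "whoever arrives first sleeps" behaviour is the same as an unbuffered rendezvous in software. The following Python sketch models one port with `threading.Condition`; the class `Port`, the helper `transfer` and the 50 ms head start are hypothetical, and threads stand in for the computers 12e and 12c, so this illustrates the protocol rather than the circuit.

```python
import threading
import time


class Port:
    """Hypothetical software model of one port between two computers.

    Whichever side arrives first blocks ("goes to sleep") until the other
    arrives, and the writer does not continue until the reader has taken the
    word (the acknowledge). Assumes one writer and one reader per port.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._word = None
        self._full = False     # writer is waiting with a word (write line high)
        self._acked = False    # reader has taken the word (acknowledge)

    def write(self, word) -> None:
        with self._cond:
            self._word = word
            self._full = True
            self._cond.notify_all()                   # wake a sleeping reader
            self._cond.wait_for(lambda: self._acked)  # sleep until acknowledged
            self._acked = False

    def read(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)   # sleep until a writer arrives
            word = self._word
            self._full = False
            self._acked = True
            self._cond.notify_all()                   # wake the sleeping writer
            return word


def transfer(words, reader_first: bool) -> list:
    """Send `words` across a Port, starting one side 50 ms before the other."""
    port = Port()
    received = []

    def reader() -> None:
        for _ in words:
            received.append(port.read())

    def writer() -> None:
        for w in words:
            port.write(w)

    first, second = (reader, writer) if reader_first else (writer, reader)
    threads = [threading.Thread(target=first), threading.Thread(target=second)]
    threads[0].start()
    time.sleep(0.05)   # the first side is now asleep inside read() or write()
    threads[1].start()
    for t in threads:
        t.join()
    return received
```

Running `transfer([1, 2, 3], reader_first=True)` and `transfer([1, 2, 3], reader_first=False)` delivers the same words in the same order; only which side spends time asleep differs.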

The inventor believes that a key feature for enabling efficient asynchronous communications between devices is some sort of acknowledge signal or condition. In the prior art, most communication between devices has been clocked, and there is no direct way for a sending device to know that the receiving device has properly received the data. Methods such as checksum operations may have been used to attempt to ensure that data is correctly received, but the sending device has no direct indication that the operation is completed. The present inventive method, as described herein, provides the necessary acknowledge condition that allows, or at least makes practical, asynchronous communications between the devices. Furthermore, the acknowledge condition also makes it possible for one or more of the devices to "go to sleep" until the acknowledge condition occurs. Of course, an acknowledge condition could be communicated between the computers 12 by a separate signal sent between them (either over the interconnecting data bus 16 or over a separate signal line), and such an acknowledge signal would be within the scope of this aspect of the present invention. However, in the embodiment of the invention described herein, the method is even more economical: acknowledgement requires no additional signal, clock cycle, timing pulse, or any other resource beyond those already used to effect the communication.
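
To illustrate that the acknowledge condition needs nothing beyond the read and write lines themselves, here is a small Python model of one data bus 16 replaying the example of computer 12e writing to computer 12c. The classes `Line` and `Bus` and the function `handshake_trace` are hypothetical; the "pull low only succeeds once nobody latches the line high" rule is taken from the description above, and acknowledge detection is polled after each step rather than being a real circuit.

```python
class Line:
    """One wire (read line 18 or write line 20) of a data bus 16.

    Assumed behaviour: a computer latches the line high, and an attempt to
    pull it low only succeeds once no computer is latching it high any more.
    """

    def __init__(self) -> None:
        self.latched_by = set()
        self.high = False

    def raise_high(self, who: str) -> None:
        self.latched_by.add(who)
        self.high = True

    def release(self, who: str) -> None:
        self.latched_by.discard(who)   # line stays high until pulled low

    def pull_low(self) -> bool:
        if self.latched_by:
            return False               # still latched high by someone
        self.high = False
        return True


class Bus:
    """Read and write lines between two neighbours, plus acknowledge detection."""

    def __init__(self) -> None:
        self.read = Line()     # raised by the receiving computer
        self.write = Line()    # raised by the sending computer
        self._both_were_high = False

    def acknowledged(self) -> bool:
        """True once both lines have been high together and are now both low."""
        if self.read.high and self.write.high:
            self._both_were_high = True
        if self._both_were_high and not self.read.high and not self.write.high:
            self._both_were_high = False   # consume the condition
            return True
        return False


def handshake_trace():
    """Return (acknowledge after each step, did the early pull-low succeed?)."""
    bus = Bus()
    trace = []
    bus.write.raise_high("12e")        # 12e executes a write
    trace.append(bus.acknowledged())
    bus.read.raise_high("12c")         # 12c executes a read: both lines high
    trace.append(bus.acknowledged())
    bus.read.release("12c")            # 12c releases the read line...
    bus.read.pull_low()                # ...and pulls it low
    early = bus.write.pull_low()       # too early: 12e still latches the write line
    trace.append(bus.acknowledged())
    bus.write.release("12e")           # 12e releases the write line
    bus.write.pull_low()               # the retried pull-low now succeeds
    trace.append(bus.acknowledged())
    return trace, early
```

The trace becomes true only at the last step, and the early pull-low fails, which matches the sequence in which the receiving computer cannot complete the acknowledge until the sender has released its line.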

Various modifications may be made to the invention without altering its value or scope. For example, while this invention has been described herein in terms of read instructions and write instructions, in actual practice there may be more than one read type instruction and/or more than one write type instruction. As just one example, in one embodiment of the invention there is a write instruction that increments the register and other write instructions that do not. Similarly, write instructions can vary according to which register 40 is used to select communications ports 38, or the like, as discussed previously herein. There can also be a number of different read instructions, depending only upon which variations the designer of the computers 12 deems to be a useful choice of alternative read behaviors. Similarly, while the present invention has been described herein in relation to communications between computers 12 in an array 10 on a single die 14, the same principles and method can be used, or modified for use, to accomplish other inter-device communications, such as communications between a computer 12 and its dedicated memory or between a computer 12 in an array 10 and an external device (through an input/output port, or the like). Indeed, it is anticipated that some applications may require arrays of arrays, with the presently described inter-device communication method being potentially applied to communication among the arrays of arrays. While specific examples of the inventive computer array 10 and computer 12 have been discussed herein, it is expected that there will be a great many applications for these which have not yet been envisioned. Indeed, it is one of the advantages of the present invention that the inventive method and apparatus may be adapted to a great variety of uses.
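
The idea of several write-type variants can be sketched in Python. This is a hypothetical model: the `Core` class, the choice of registers `A` and `B` as address registers, and the reading of "increments the register" as a post-increment of the address register are all assumptions for illustration, and a plain dict stands in for the address space, so no port handshake is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of two write-instruction variants.

    A and B stand in for registers 40 that can hold an address (which, on the
    real device, may select one or more communications ports 38).
    """
    A: int = 0
    B: int = 0
    mem: dict = field(default_factory=dict)

    def write(self, value: int, via: str = "A", increment: bool = False) -> None:
        """Write `value` to the address held in register `via`.

        With increment=True the address register is advanced afterwards
        (the post-incrementing variant); otherwise it is left unchanged.
        """
        if via not in ("A", "B"):
            raise ValueError(f"unknown address register {via!r}")
        addr = getattr(self, via)
        self.mem[addr] = value
        if increment:
            setattr(self, via, addr + 1)


core = Core(A=10, B=5)
core.write(7, increment=True)   # writes to 10, A becomes 11
core.write(8, increment=True)   # writes to 11, A becomes 12
core.write(9, via="B")          # writes to 5, B stays 5
```

Keeping the variants as flags on one method mirrors the point of the paragraph above: the handshake is unchanged, and only the address bookkeeping around it differs.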

All of the above are only some of the examples of available embodiments of the present invention. Those skilled in the art will readily observe that numerous other modifications and alterations may be made without departing from the spirit and scope of the invention. Accordingly, the disclosure herein is not intended as limiting and the appended claims are to be interpreted as encompassing the entire scope of the invention.

INDUSTRIAL APPLICABILITY

The inventive computer array 10, computers 12 and associated method 74 are intended to be widely used in a great variety of computer applications. It is expected that they will be particularly useful in applications where significant computing power is required, and yet power consumption and heat production are important considerations.

As discussed previously herein, the applicability of the present invention is such that many types of inter-device computer communications can be improved thereby. It is anticipated that the inventive method, wherein some computers can be allowed to "go to sleep" when not in use, will be used to reduce power consumption, reduce heat production, and improve the efficiency of communication between computers and computerized devices in a great variety of applications and implementations. Since the computer array 10, computer 12 and method 74 of the present invention may be readily produced and integrated with existing tasks, input/output devices, and the like, and since the advantages as described herein are provided, it is expected that they will be readily accepted in the industry. For these and other reasons, it is expected that the utility and industrial applicability of the invention will be both significant in scope and long-lasting in duration.

Claims

I CLAIM:

1. A computer array, comprising: a plurality of computers; and a plurality of data paths connecting the computers; wherein: each computer has a direct communication path with at least some of its closest neighbors.

2. The computer array of claim 1, wherein: each computer has a direct communication path with all of its closest neighbors.

3. The computer array of claim 1, wherein: the definition of neighbor is restricted to those computers that are directly adjacent to the subject computer.

4. The computer array of claim 1, wherein: the definition of neighbor includes those computers that are either directly adjacent to the subject computer horizontally or directly adjacent to the subject computer vertically.

5. The computer array of claim 1, wherein: the subject computer can communicate with other computers in the array that are not directly adjacent thereto by passing messages through at least some of the other computers.

6. A computer, wherein: instructions are optionally executed directly from an input register.

7. In a computer having multiple word instruction sets, an improvement comprising: including a loop instruction in the multiple word instruction set to loop back to the first instruction in the multiple word instruction set.

8. The improvement of claim 7, wherein: the loop instruction is the last instruction in the multiple word instruction set.

9. The improvement of claim 7, and further including: a no-op instruction for inclusion within the multiple word instruction set where fewer than four instructions are needed to comprise an entire loop.

10. In an array of computers, an improvement comprising: causing a first one of said computers to communicate an instruction group to a second one of said computers; and causing said second one of said computers to execute said instruction group directly from an input register.

11. The improvement of claim 10, wherein: said first one of said computers and said second one of said computers have a direct communication path therebetween.

12. The improvement of claim 10, wherein: said instruction group is routed through at least a third computer between said first computer and said second computer.

13. The improvement of claim 10, wherein: said instruction group has associated therewith instructions for routing said instruction group from said first computer to said second computer.

NOTICE: This correspondence chart is provided for informational purposes only and is not a part of the official Patent Application.

CORRESPONDENCE CHART

computer array
computers
die
data bus
read line
write line
data lines
RAM
ROM
return stack
instruction area
ALU
data stack
decode section
internal communications port
  a. up port
  b. right port
  c. left port
  d. down port
registers
  a. A register
  b. B register
  c. P register
slot sequencer
T register
S register
instruction word
bits
instructions
slots
  a. slot 0
  b. slot 1
  c. slot 2
  d. slot 3
inverters
NAND gate
OR gate
first OR gate input
second OR gate input
i4 bit
slot sequencer output
pulse generator
acknowledge line