WO2009069072A1 - Multiple input-queuing system - Google Patents

Multiple input-queuing system

Info

Publication number
WO2009069072A1
WO2009069072A1 (application PCT/IB2008/054936)
Authority
WO
WIPO (PCT)
Prior art keywords
input
memory
data
stream
streams
Prior art date
Application number
PCT/IB2008/054936
Other languages
French (fr)
Inventor
Huzaifa Najmi
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V. filed Critical Nxp B.V.
Publication of WO2009069072A1 publication Critical patent/WO2009069072A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00: Packet switching elements
    • H04L 49/25: Routing or path finding in a switch fabric
    • H04L 49/253: Routing or path finding in a switch fabric using establishment or release of connections between ports
    • H04L 49/254: Centralised controller, i.e. arbitration or scheduling
    • H04L 49/30: Peripheral units, e.g. input or output ports
    • H04L 49/3036: Shared queuing
    • H04L 49/3045: Virtual queuing
    • H04L 49/90: Buffering arrangements
    • H04L 49/901: Buffering arrangements using storage descriptor, e.g. read or write pointers

Definitions

  • The invention relates to the field of computer and communication systems, and in particular to a system that receives multiple input-streams that are routed to a common output port.
  • Multiple-input, common-output systems are common in the art.
  • Multiple hosts may communicate data to a common server; multiple processors may access a common memory device; multiple data streams may be routed to a common transmission medium; and so on.
  • The input to the multiple-input system is generally characterized by bursts of activity from one or more input-streams. During these bursts, the arrival rate of input data generally exceeds the allowable departure rate of the data to a subsequent receiving system, and buffering must be provided to prevent a loss of data.
  • Conventionally, one of two types of systems is employed to manage the routing of multiple input-streams to a common output, depending upon whether the design priority is maximum memory-utilization efficiency or maximum performance.
  • In the memory-efficient type, a common buffer is provided for queuing the data from the input-streams, and each process that provides an input-stream controls access to this common buffer in accordance with a given control protocol. Data is unloaded from this common buffer to provide the common output. Because a common buffer receives the flow from the various input-streams, the size of the buffer can be optimized for a given aggregate arrival rate. That is, because it is extremely unlikely that all input-streams will be active contemporaneously, the common buffer is sized substantially smaller than would be required to accommodate maximum flow from all streams simultaneously. The performance of such an embodiment, however, depends upon the poorest-performing process that provides an input-stream, because a poor process can tie up the common buffer while all of the other processes await access to it.
  • A multiple-input queuing system of this type is disclosed in US 5,233,603.
  • The system contains a single buffer memory connected to multiple input and output lines.
  • A multiplexer and a de-multiplexer connect the input and output lines to the memory.
  • Different memory areas are provided for different inputs and outputs.
  • In another embodiment, this patent discloses the use of different memory elements, each connected to a different, predetermined output line. The input lines access these memory elements via a shared bus.
  • In yet another embodiment, a separate buffer memory is used for each combination of an input and an output line.
  • GB-A-2349296 discloses a network switch with a multiple-input, single-output buffering system. For each input a predetermined buffer memory is provided.
  • Each buffer 110' provides a queue for receiving data from its corresponding input-stream 101'.
  • A receiving system (not shown in Fig. 1) asserts an "Unload(n)" command to select the next-available data-item from the nth queue, and this selected data-item Qn is subsequently communicated to the receiving system.
  • The selection of the particular input data stream n is typically effected based on a prioritization scheme.
  • The system 100' typically includes a means for notifying the receiving system that data from an input-stream is available, and the receiving system selects from among the available streams based on a priority that is associated with the stream.
  • Alternative protocols for controlling the flow of data from a plurality of input-streams are commonly employed, including, for example, transmission control in the system 100' and a combination of transmission and reception control by the system 100' and the receiving system, respectively.
  • The selection of the particular input-stream may follow any of a variety of schemes, including first-in-first-out (FIFO) selection, round-robin selection, and so on, in addition to, or in lieu of, the aforementioned priority scheme.
  • The design choices for a multiple-input system include the size D of the input queues. Based on the estimated input and output flow rates, a queue size D can be determined to minimize the likelihood of an overflow of the queue.
  • The queues associated with each input-stream 101' of system 100' are illustrated as being similarly sized. If it is known that a particular input-stream has a flow rate that substantially differs from the other input-streams, it may be allocated a smaller or larger queue size.
  • The system 100' is configured to allow a maximum burst of D data-items from any of the input-streams, based on the expected processing speed of the subsequent receiving system. Queuing-theory techniques are common in the art for determining an optimal value of D, given an expected distribution of arrivals of data-items at any input-stream and an expected distribution of removals of the data-items by the subsequent receiving system.
  • Because the queue size D is based on estimated arrival rates, each queue is sized to accommodate a worst-case estimate of arrivals.
  • In EP 1 481 317 B1 a state-of-the-art multiple-input queuing system is disclosed, which helps to reduce the area consumed by memory devices.
  • The system maintains a mapping of the memory locations of the buffer that are allocated to the data-items of each input-stream.
  • The memory locations that are allocated to each input-stream are maintained in a sequential, first-in, first-out queue.
  • The multiple-input queuing system according to EP 1 481 317 B1 will be described in more detail below.
  • The multiple-input queuing system of EP 1 481 317 B1 is disadvantageous in that the usage of memory is not optimal. It is thus an object of the present invention to further improve the efficiency of the memory usage.
  • The system according to the invention provides an efficient, high-performance multiple-input, single- or multiple-output system that minimizes the memory requirement.
  • The main advantage of the system according to the invention is thus a significant gain in efficiency of memory usage, due to the application of circular buffers to implement the queues required by the mapper.
  • Circular buffers as such are known from the prior art. A conventional circular buffer has a single read pointer and a single write pointer, each pointing to the next memory location in the buffer for read and write operations, respectively. In a buffer of size D, each pointer is incremented by one, modulo D, each time a read or write is performed.
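The modulo-D pointer arithmetic of such a conventional circular buffer can be sketched as follows (an illustrative model; the occupancy counter is our addition, used only to distinguish a full buffer from an empty one):

```python
class CircularBuffer:
    """Fixed-size FIFO with modulo-D read/write pointers, as described above."""

    def __init__(self, d):
        self.d = d                # D: size of the buffer
        self.data = [None] * d
        self.read = 0             # single read pointer
        self.write = 0            # single write pointer
        self.count = 0            # occupancy (full vs. empty disambiguation)

    def put(self, item):
        if self.count == self.d:
            raise OverflowError("buffer full")
        self.data[self.write] = item
        self.write = (self.write + 1) % self.d   # modulo-D increment
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        item = self.data[self.read]
        self.read = (self.read + 1) % self.d     # modulo-D increment
        self.count -= 1
        return item
```

With d = 3, writing four items and reading between them exercises the pointer wrap-around.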
  • A quantitative estimate of the memory saved by the system according to the invention, compared with the multiple-input queuing system known from EP 1 481 317 B1, will be made below in connection with the drawings.
  • Another aspect of the present application is a method of buffering data-items from a plurality of input-streams, including: receiving, in an allocator, an allocation request from one or more input-streams of the plurality of input-streams; allocating a selected memory-element of a plurality of memory-elements of a plurality of buffers to a selected input-stream of said input-streams; storing a received data-item from the selected input-stream to the selected memory-element in a buffer; storing an address of the selected memory-element corresponding to the selected input-stream in a queue of a mapper, wherein the queue is designed as a circular buffer; receiving an unload request that identifies the selected input-stream; and providing the received data-item from the selected memory-element, based on an identification of the selected memory-element corresponding to the selected input-stream.
  • A further aspect of the present application is a computer-readable medium having a computer program stored thereon, the computer program comprising instructions operable to cause a processor to carry out the method described above.
  • The present application can be useful in the field of computer and communication systems and can in particular be applied to a system that receives multiple input-streams that are routed to a common output port. It may also be useful in implementing multimedia or other algorithms that use circular queues with multiple entities storing a finite set of data, which is a generally similar problem.
  • Multiple hosts may communicate data to a common server; multiple processors may access a common memory device; multiple data streams may be routed to a common transmission medium; and so on.
  • Another example of an application is lookup tables (LUTs) storing address pointers to different but similarly sized buffers.
  • Fig. 1 an example block diagram of a prior-art multiple-input queuing system;
  • Fig. 2 an example block diagram of an advanced prior-art multiple-input queuing system;
  • Fig. 3 an example block diagram of a multiple-input queuing system in accordance with this invention;
  • Fig. 4 a detailed block diagram of the mapper in the system according to Fig. 3;
  • Fig. 5 an example allocation scheme for the allocator in the system according to Fig. 3;
  • Figs. 6-9 tables showing the calculated memory saving by the system according to the invention for different parameters D, P and I; and
  • Figs. 10-29 diagrams showing the calculated memory saving by the system according to the invention for different parameters D, P and I.
  • The main advantage of the invention is a significant gain in efficiency of memory usage, due to the application of circular buffers to implement the queues required by the mapper.
  • Fig. 2 illustrates an example block diagram of a multiple-input queuing system 300' as described in EP 1 481 317 B1.
  • The system 300' includes a dual-port memory 220', wherein writes to the memory 220' are controlled by an allocator 240' and reads from the memory 220' are controlled by a mapper 250'.
  • The write and read processes to and from the memory 220' are symbolically represented by switch 210' and switch 260', respectively.
  • The memory 220' includes P addressable memory-elements, and each memory-element is of sufficient width W to contain a data-item from any of the input-streams 101'.
  • The parameter P in system 300' is at least as large as the parameter D in system 100' of Fig. 1.
  • Whereas the system 100' includes a total of N·D memory-elements of width W, the memory 220' includes a total of P memory-elements of width W.
  • The allocator 240' is configured to provide the location of a currently unused memory-element within the memory 220', to which the next data-item from the input-streams 101' is directed. As indicated by the dashed lines between the input-streams 101' and the allocator 240', the allocator 240' is configured to receive a notification whenever an input-stream 101' has a new data-item to be transmitted. The allocator 240' is further configured to note the removal of data-items from the individual memory-elements. As each data-item is removed, the memory-element that had contained it becomes available again, as a currently-unused memory-element, for receiving new data-items.
  • An overflow of the memory 220' occurs only if all P memory-elements are filled with data-items that have not yet been removed.
  • The mapper 250' is configured to ensure that data-items are unloaded/removed from the memory 220' in an appropriate order.
  • A receiving system (not shown) calls for data-items in a sequence that may differ from the sequence in which the data-items are received at the multiple-input queuing system 300'.
  • The system 300' may be configured to allow the receiving system to specify the input-stream M(n) from which the next data-item is to be sent.
  • A process at an input-stream M(n) may initiate a request to send m data-items to the receiving system, and the receiving system subsequently sends m "Unload(n)" commands to the queuing system 300' to receive these m data-items, independent of the arrival of other data-items at system 300' from the other input-streams 101'. That is, relative to each input-stream, the data-items are provided to the receiving system in sequence, but the receiving system may call for the data-items from selected input-streams independent of the order of arrival of data-items from other input-streams.
  • The allocator 240' communicates the allocation of each memory-element location p to each input-stream n, as a stream-element pair (n, p), to the mapper 250'.
  • The mapper 250' thereby maintains a list of the memory-element location indicators pn that are sequentially assigned to the arriving data-items from each input-stream n.
  • When the receiving system requests the "next" data-item from a particular input-stream n, the mapper 250' extracts the next location indicator pn from the list associated with the input-stream n and uses it to provide the contents of the memory-element p as the output Qn, via the switch 260'. This location indicator pn is then removed from the list associated with the input-stream n, and the allocator 240' thereafter regards the memory-element p as a currently-unused memory location.
  • The mapper 250' includes multiple first-in-first-out (FIFO) queues 355', each queue 355' being associated with one corresponding input-stream 101' of the multiple-input queuing system 300'.
  • When the allocator 240' allocates a memory-element p to an input-stream M(n), the address of this memory-element p is stored in the queue 355' corresponding to input-stream M(n), the index n being used to select that queue.
  • The address p at which the data-item is stored is thus recorded, in sequential order, in the queue corresponding to the input-stream.
  • Each queue 355' in the example mapper 250' of Fig. 2 is illustrated as having a queue-length of D, consistent with the prior-art queue lengths illustrated in Fig. 1. Note, however, that the width of the queues 110' of Fig. 1 is W, so that the total size of each queue 110' is D·W. Because each queue 355' of Fig. 2 is configured to store an address of one of the P memory-elements, the total size of each queue 355' is only D·log2(P). The width of the address, log2(P), is generally substantially less than the width W of a data-item.
  • With a data-item width of W = 32 bits and P = 1024 memory-elements, for example, an address is log2(1024) = 10 bits wide, and the queues 355' of Fig. 2 will be less than a third (10/32) of the size of the buffers 110' of Fig. 1.
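The size comparison above can be checked arithmetically. The values W = 32 bits and P = 1024 (hence a 10-bit address) are the ones implied by the quoted 10/32 ratio; the queue depth D is arbitrary and cancels out of the ratio:

```python
import math

D = 16            # queue depth (illustrative; cancels out of the ratio)
W = 32            # data-item width in bits (implied by the 10/32 ratio above)
P = 1024          # memory-elements, so an address needs log2(P) = 10 bits

data_queue_bits = D * W                        # one queue 110' of Fig. 1
addr_queue_bits = D * math.ceil(math.log2(P))  # one queue 355' of Fig. 2

ratio = addr_queue_bits / data_queue_bits
print(data_queue_bits, addr_queue_bits, ratio)  # 512 160 0.3125 (= 10/32)
```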
  • Upon an "Unload(n)" request, a multiplexer 350' selects the queue corresponding to the selected input-stream M(n), and the next available index pn is removed from the selected queue 355'.
  • The index pn is used to select the corresponding memory-element p, via a multiplexer 260', to provide the output Qn corresponding to the "Unload(n)" request from the receiving system.
  • The allocator 240' then marks the memory-element p as a currently unused memory-element, thereby allowing it to be allocated to newly arriving data-items, as required.
  • Also illustrated in Fig. 2 is an example embodiment of a multiple-input, multiple-output switch 210' that is configured to route a data-item from an input-stream 101' to a selected memory-element p in the memory 220'.
  • The example switch 210' includes a multiplexer 310' for each memory-element of the memory 220', enabled via a select(n, p) command from the allocator 240'.
  • Each multiplexer 310' associated with a memory-element is configured to receive a select(n, p) command, wherein n identifies the input-stream that has been allocated to that memory-element.
  • Upon receipt of the select(n, p) command, the data-item from the nth input-stream is routed to the pth memory-element. This allows for the storage of data-items from multiple contemporaneous input-streams 101'.
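The Fig. 2 write and unload paths described above can be sketched behaviorally as follows. This is an illustrative Python model only; the class and method names are ours, not the patent's, and the hardware multiplexers are abstracted away:

```python
from collections import deque

class PriorArtQueuingSystem:
    """Behavioral model of Fig. 2: one shared memory of P elements, a free
    list kept by the allocator 240', and one FIFO of addresses per
    input-stream kept by the mapper 250' (queues 355')."""

    def __init__(self, n_streams, p_elements):
        self.memory = [None] * p_elements
        self.free = deque(range(p_elements))       # currently-unused elements
        self.queues = [deque() for _ in range(n_streams)]

    def store(self, n, data_item):
        """Route a data-item from input-stream n into an unused memory-element."""
        if not self.free:
            raise OverflowError("all P memory-elements are in use")
        p = self.free.popleft()      # allocator provides unused element p ...
        self.memory[p] = data_item   # ... and the switch writes the data-item
        self.queues[n].append(p)     # mapper records the pair (n, p)
        return p

    def unload(self, n):
        """Serve an Unload(n) request: oldest data-item of stream n,
        independent of the arrival order of other streams' data-items."""
        p = self.queues[n].popleft() # next location indicator pn
        item = self.memory[p]
        self.free.append(p)          # element p becomes currently-unused again
        return item
```

For example, after storing 'a' and 'c' on stream 0 and 'b' on stream 1, Unload(1) returns 'b' even though 'c' arrived later, while stream 0 still unloads 'a' before 'c'.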
  • The total memory required by the system illustrated in Fig. 1 is calculated as M1 = N·D·W, i.e. N queues of D data-items, each of width W.
  • The amount of memory saved is also proportional to the system design parameters D and N.
  • The memory saved by this system is in the range of 45% up to 85% compared to the system illustrated in Fig. 2.
  • Fig. 3 and Fig. 4 show an example block diagram of a multiple-input queuing system in accordance with the invention.
  • The system according to the invention specifically modifies the mapper, the memory and the allocator of the prior system according to Fig. 2. Comparable components in both systems, such as the mapper, the allocator and the buffer, carry the same reference signs.
  • The system comprises a memory unit 220 that includes at least one buffer B(b).
  • I is a design parameter of the system and can be determined based on the expected input-output flow of the system. Determining an adequate value for I is common in the art. This means that the system according to the invention requires a smaller number of queues (N/I) for N input-streams M(n), compared to the prior-art system of Fig. 2 (N queues for N input-streams). Further, the system according to the invention uses I separate buffers B(b), each of size P/I, instead of a single buffer of size P as in the prior-art system of Fig. 2.
  • Each memory-element is of sufficient width W bytes to store any data-item of one of the input-streams M(n).
  • The design according to the present invention has the disadvantage that any given buffer B(b) of size P/I in the memory unit 220 may be utilized by N/I input-streams M(n) only.
  • The design mentioned in patent EP 1 481 317 B1, by contrast, allows all N streams to utilize the entire buffer memory P.
  • The practical choice of I will be influenced by all the parameters, i.e. N, D, P and, most importantly, the input-output data flow rate.
  • The system according to the invention further comprises an allocator 240, which controls writes to the memory unit 220.
  • The allocator 240 is configured to allocate a memory-element of one of the buffers B(0)-B(I-1), having an address A, for storing a data-item from a selected input-stream M(n).
  • The allocator 240 is further configured to receive notifications whenever an input-stream M(n) has a data-item to be transmitted, as indicated by the dashed arrows in Fig. 3.
  • The arbitration logic of the allocator 240 may be designed such that the reception of further data-items of the selected input-stream M(n) is prevented until the data-item of the selected input-stream M(n) stored in one of the plurality of buffers B(0)-B(I-1) is output by the system.
  • Various priority schemes may be implemented, including dynamic prioritization based on the content of each data-item, or based on a prior history of transmissions from one or more of the input-streams M(n), among others.
  • Alternatively, a simple round-robin input selection scheme may be used, wherein the allocator 240 sequentially samples each input-stream M(n) for new data.
  • The allocator 240 is further configured to provide the address A of a currently unused memory-element in a selected buffer B(b) in the memory unit 220, to which the next data-item from the input-stream M(n) is directed.
  • The system further comprises a multiple-input switch 210 configured to route the data-item from the selected input-stream M(n) to the selected memory-element in the selected buffer B(b).
  • The multiple-input switch 210 comprises a plurality of multiplexers 310 coupled to the allocator 240, each multiplexer 310 being coupled to one specific memory-element of the P memory-elements in the buffers B(0)-B(I-1).
  • Every buffer B(b) is designed to hold data corresponding to N/I input-streams.
  • The allocator 240 may use different schemes to determine the buffer B(b) to which the data-item from a particular input-stream M(n) will be written. A few of these schemes are illustrated below; they may or may not be dynamically changed.
  • In a first scheme, the allocator may store the data-elements of input-streams M(0) to M((N/I)-1) in buffer B(0).
  • In general, B(b) then contains streams M(b·(N/I)) to M([(b+1)·(N/I)]-1).
  • In another scheme, the allocator ensures that the sum of the probabilities of occurrence of data-elements of all input-streams assigned to any buffer B(b) is approximately the same. This scheme ensures an optimized usage of all buffers B(b), by allocating streams based on the probability of occurrence of data-items.
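The first, block-wise scheme can be expressed as a one-line mapping. This is an illustrative sketch assuming N is a multiple of I; the function name is ours:

```python
def buffer_for_stream(n, N, I):
    """First allocation scheme above: buffer B(b) serves input-streams
    M(b*(N/I)) .. M((b+1)*(N/I) - 1), so b = n // (N/I)."""
    assert N % I == 0, "sketch assumes N is a multiple of I"
    return n // (N // I)

# With N = 8 streams and I = 4 buffers, each buffer serves N/I = 2 streams:
print([buffer_for_stream(n, 8, 4) for n in range(8)])  # [0, 0, 1, 1, 2, 2, 3, 3]
```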
  • The queues Q(q) are designed as circular buffers.
  • Conventionally, each circular buffer may comprise a single read pointer R(q, b) and a single write pointer W(q, b), with b and q as defined above.
  • In the system according to the invention, the circular buffers are instead designed as I-dimensional circular buffers, each having I read and I write pointers.
  • The dimension I indicates that up to I entities may simultaneously store data in a single location of the circular buffer.
  • The circular buffers of the system of the present invention each comprise D buffer-elements, i.e. D is the size of each buffer.
  • The read and write pointers of the circular buffers are incremented by one, modulo D, each time a read/write is performed.
  • Data(f) is the actual data stored in the buffer-element, and Owner(f) stores meta-data indicating to which entities the data in the corresponding Data(f) belongs.
  • Owner(f) is I bits wide, with one bit for every entity that can contribute a value to Data(f).
  • A "1" in the bth bit of Owner(f) indicates that the corresponding Data(f) is owned by the bth entity. It is possible for none as well as for all I entities to own the data in the corresponding Data(f), as indicated by the number of "1" bits in Owner(f).
  • The size of each Data(f) is in general unrestricted and is based on the specific requirement. The greater the size of Data(f), the more memory will be saved using I-dimensional circular buffers. It should also be noted that this scheme is particularly useful in case the Data(f) values are repeated by different entities. This is typically the case when Data(f) stores addresses of similarly sized buffers. Presently, Data(f) holds the address A of a memory location of one of the buffers B(b) in the memory unit 220. Data(f) is thus log2(P/I) bits wide. Every bit of Owner(f) indicates one of the buffers B(b) that may have contributed the value in the corresponding Data(f).
  • The corresponding width of the Owner(f) part is I bits.
  • The overall width of a buffer-element is thus log2(P/I) + I bits, i.e. (log2(P/I) + I)/8 bytes.
  • Multiple buffers may write the same address An to Data(f) of Q(q), simultaneously updating their respective bits in Owner(f) of Q(q). Since the size of Data(f) is generally much greater than the size of Owner(f), a considerable amount of memory is saved.
  • The mapper 250 selects the queue Q(q) in accordance with the above-mentioned scheme used by the allocator 240.
  • Each queue-element indicates all the buffers B(b) that have contributed the value A to Data(f), using one unique bit per buffer B(b) in Owner(f). This can be ensured by writing a "1" in the bth LSB ("least significant bit", as commonly referred to in the art) of Owner(f) for every buffer B(b) that has contributed to Data(f).
  • Each buffer B(b) holds data-items received from N/I input-streams.
  • Receiving a data-item from input-stream M(n) in the system according to the invention includes the following steps:
  • Step 1: The next valid write location Data[W(b)] in Q(q) is determined with the help of the write pointer W(b). W(b) always points to the next location in Q(q) whose bth bit in Owner[W(b)] is not set.
  • Step 2: If the location is free, or if it already holds the allocated address (A = Data[W(b)]), the address A is stored there, the bth bit of Owner[W(b)] is set, and W(b) is incremented by one, modulo D.
  • Otherwise, W(b) is incremented by one, modulo D, and step 1 is carried out again.
  • Outputting a data-item from input-stream M(n) in the system according to the invention includes the following steps:
  • Step 1: The receiver asserts an "Unload(n)" command to the mapper to indicate that it would like to read the next data-item corresponding to input-stream M(n).
  • Step 2: The mapper selects the queue Q(q) and inspects the location indicated by the read pointer R(b). If the bth bit of Owner[R(b)] is set, the address Data[R(b)] is used to output the data-item, the bth bit of Owner[R(b)] is cleared, and R(b) is incremented by one, modulo D.
  • Otherwise, R(b) is incremented by one, modulo D, and step 1 is carried out again.
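The receive and output steps above can be combined into a behavioral sketch of one I-dimensional circular queue Q(q). Because the step descriptions in the source are partly garbled, the exact tie-breaking rules (when an entry may be shared between owners, and how a reader skips entries it does not own) are our assumptions, and all names are illustrative:

```python
class IDimCircularQueue:
    """I-dimensional circular buffer Q(q): D elements, each holding Data(f)
    (an address) plus an I-bit Owner(f) mask, with one write pointer W(b)
    and one read pointer R(b) per contributing buffer b. A sketch, not the
    patent's actual implementation."""

    def __init__(self, d, i):
        self.d, self.i = d, i
        self.data = [None] * d     # Data(f): log2(P/I)-bit address field
        self.owner = [0] * d       # Owner(f): I-bit ownership mask
        self.w = [0] * i           # write pointers W(b)
        self.r = [0] * i           # read pointers R(b)

    def push(self, b, addr):
        """Record address `addr` on behalf of buffer b (receive path)."""
        for _ in range(self.d):
            f = self.w[b]
            if not (self.owner[f] >> b) & 1:         # b does not own slot f
                if self.owner[f] == 0 or self.data[f] == addr:
                    # Free slot, or another buffer already stored the same
                    # address: share the entry and set b's bit (assumed rule).
                    self.data[f] = addr
                    self.owner[f] |= 1 << b
                    self.w[b] = (self.w[b] + 1) % self.d
                    return
            self.w[b] = (self.w[b] + 1) % self.d     # advance, retry step 1
        raise OverflowError(f"no free slot for buffer {b}")

    def pop(self, b):
        """Return the oldest address owned by buffer b (unload path)."""
        for _ in range(self.d):
            f = self.r[b]
            self.r[b] = (self.r[b] + 1) % self.d
            if (self.owner[f] >> b) & 1:
                self.owner[f] &= ~(1 << b)           # clear b's ownership bit
                return self.data[f]
        raise IndexError(f"buffer {b} owns no entry")
```

Note how two buffers pushing the same address occupy a single Data(f) slot with two Owner(f) bits set, which is where the memory saving comes from.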
  • The main advantage of the system according to the invention is that it saves a significant amount of memory compared to the prior-art system of Fig. 2. This will be elucidated by the following memory calculations:
  • M3 = (memory required by memory unit 220) + (memory required by mapper 250) = P·W + (N/I)·D·(log2(P/I) + I)/8 bytes, with W expressed in bytes.
  • Figs. 6 to 9 and 10 to 29 illustrate, by way of tables and charts, how much memory is saved for different values of the design parameters N, D, P and I.
  • The memory saved ranges from a minimum of 45% to as high as 85% for the parameters used.
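Using the queue widths stated above (log2(P) bits per entry in the Fig. 2 mapper, log2(P/I) + I bits per entry in the invention's mapper, with the main memory of P·W unchanged), the mapper-memory saving can be estimated as follows. The parameter values are illustrative and are not taken from the patent's tables:

```python
import math

def mapper_bits_prior(N, D, P):
    """Fig. 2 mapper: N queues of D addresses, each log2(P) bits wide."""
    return N * D * math.ceil(math.log2(P))

def mapper_bits_invention(N, D, P, I):
    """Invention: N/I queues of D elements, each log2(P/I) + I bits wide."""
    return (N // I) * D * (math.ceil(math.log2(P // I)) + I)

# Illustrative parameters (not the patent's):
N, D, P, I = 64, 32, 1024, 8
saving = 1 - mapper_bits_invention(N, D, P, I) / mapper_bits_prior(N, D, P)
print(f"mapper memory saving: {saving:.1%}")
```

For these values the saving is about 81%, which falls inside the 45-85% range quoted above; the exact figure depends on N, D, P and I (and on whether the unchanged main memory P·W is included in the comparison).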

Abstract

The system according to the invention provides an efficient, high-performance multiple-input, single- or multiple-output system that minimizes the memory requirement. The main advantage of the system according to the invention is thus a significant gain in efficiency of memory usage, due to the application of circular buffers to implement the queues required by the mapper.

Description

Multiple input-queuing system
FIELD OF THE INVENTION
The invention relates to the field of computer and communication systems, and in particular to a system that receives multiple input-streams that are routed to a common output port.
BACKGROUND OF THE INVENTION
Multiple-input, common-output systems are common in the art. Multiple hosts, for example, may communicate data to a common server; multiple processors may access a common memory device; multiple data streams may be routed to a common transmission medium; and so on. Generally, the input to the multiple-input system is characterized by bursts of activity from one or more input-streams. During these bursts, the arrival rate of input data generally exceeds the allowable departure rate of the data to a subsequent receiving system, and buffering must be provided to prevent a loss of data.
Conventionally, one of two types of systems is employed to manage the routing of multiple input-streams to a common output, depending upon whether the design priority is maximum memory-utilization efficiency or maximum performance.
In a memory-efficient embodiment, a common buffer is provided for queuing the data from the input streams, and each process that is providing an input-stream controls access to this common buffer, in accordance with a given control protocol. Data is unloaded from this common buffer to provide the common output. Because a common buffer is used to receive the flow from the various input-streams, the size of the buffer can be optimized for a given aggregate arrival rate. That is, because it is extremely unlikely that all input-streams will be active contemporaneously, the common buffer is sized substantially smaller than the size required to accommodate maximum flow from all streams simultaneously. The performance of such an embodiment, however, is dependent upon the poorest performing process that is providing an input-stream, because a poor process can tie up the common buffer while all of the other processes await access to the common buffer. A multiple-input queuing system of this type is disclosed in US 5,233,603.
The system contains a single buffer memory connected to multiple input and output lines. A multiplexer and a de-multiplexer connect the input and output lines to the memory. Different memory areas are provided for different inputs and outputs. In another embodiment this patent discloses the use of different memory elements, each connected to a different, predetermined output line. The input lines access these memory elements via a shared bus. In yet another embodiment a separate buffer memory is used for each combination of an input and an output line.
GB-A-2349296 discloses a network switch with a multiple-input, single- output buffering system. For each input a predetermined buffer memory is provided.
To maintain independence among the processes that provide the multiple inputs, conventional high-performance multiple-input systems typically employ multiple input buffers, as illustrated by system 100' of Fig. 1. Each buffer 110' provides a queue for receiving data from its corresponding input-stream 101'. In the example of Fig. 1, a receiving system (not shown in Fig. 1) asserts an "Unload(n)" command to select the next-available data-item from the nth queue, and this selected data-item Qn is subsequently communicated to the receiving system. The selection of the particular input data stream n is typically effected based on a prioritization scheme. Although not illustrated, the system 100' typically includes a means for notifying the receiving system that data from an input-stream is available, and the receiving system selects from among the available streams based on a priority that is associated with the stream. Alternative protocols for controlling the flow of data from a plurality of input-streams are commonly employed, including, for example, transmission control in the system 100' and a combination of transmission and reception control by the system 100' and the receiving system, respectively. In like manner, the selection of the particular input-stream may follow any of a variety of schemes, including first-in-first-out (FIFO) selection, round-robin selection, and so on, in addition to, or in lieu of, the aforementioned priority scheme.
The design choices for a multiple-input system include the size D of the input queues. Based on the estimated input and output flow rates, a queue size D can be determined to minimize the likelihood of an overflow of the queue. For ease of understanding, the queues associated with each input-stream 101' of system 100' are illustrated as being similarly sized. If it is known that a particular input-stream has a flow rate that substantially differs from the other input-streams, it may be allocated a smaller or larger queue size. As illustrated, the system 100' is configured to allow a maximum burst of D data-items from any of the input-streams, based on the expected processing speed of the subsequent receiving system. Queuing-theory techniques are common in the art for determining an optimal value of D, given an expected distribution of arrivals of data-items at any input-stream and an expected distribution of removals of the data-items by the subsequent receiving system.
Because the queue size D is based on estimated arrival rates of data-items from each input-stream, each queue is sized to accommodate a worst-case estimate of arrivals.
Although a particular input-stream may frequently come near to filling its queue, the likelihood of all of the input-streams simultaneously coming near to filling all of their queues is generally extremely low. Viewed another way, the number of unused memory locations among all of the queues at any given time is generally extremely high, and thus the memory-utilization efficiency of the conventional multiple-queue multiple-input system 100' is extremely low.
EP 1 481 317 B1 discloses a state-of-the-art multiple-input queuing system which helps reduce the area consumed by memory devices. To allow for an independently controlled unloading of the individual data-items from the multiple-input common buffer, the system maintains a mapping of the memory locations of the buffer that are allocated to each data-item in each input-stream. To minimize the memory and overhead associated with maintaining a mapping of each data-item, the memory locations that are allocated to each input-stream are maintained in a sequential, first-in, first-out queue. When a subsequent receiving device acknowledges that it is ready to receive a data-item from a particular input-stream, the identification of the allocated memory location is removed from the input-stream's queue, and the data-item that is at the allocated memory location in the common buffer is provided to the receiving device. The multiple-input queuing system according to EP 1 481 317 B1 will be described in more detail below. The multiple-input queuing system of EP 1 481 317 B1 is disadvantageous in that its usage of memory is not optimal. It is thus an object of the present invention to further improve the efficiency of the memory usage.
SUMMARY OF THE INVENTION
These and other objects are solved by a multiple-input queuing system according to claim 1. The dependent claims relate to advantageous embodiments of the invention.
The system according to the invention provides an efficient, high-performance multiple-input, single- or multiple-output system which minimizes the memory requirement. The main advantage of the system according to the invention is thus a significant gain in efficiency of memory usage due to the application of circular buffers to implement the necessary queues required by the mapper. Circular buffers as such are known from the prior art. A circular buffer has a single read pointer and a single write pointer, each pointing to the next memory location in the buffer for read and write procedures, respectively. The pointers are modulo-D incremented by one each time a read/write is performed in the D-unit-sized buffer. A quantitative estimation of the memory saved by the system according to the invention, when compared to the multiple-input queuing system known from EP 1 481 317 B1, will be made below in connection with the drawings.
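The single-pointer circular buffer described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class and method names are my own, and an occupancy counter is added to distinguish a full buffer from an empty one, which is one common design choice among several.

```python
# Minimal circular-buffer sketch (illustrative names, not from the patent).
class CircularBuffer:
    """Fixed-size FIFO with modulo-D read/write pointers."""

    def __init__(self, d):
        self.d = d                 # buffer size D
        self.data = [None] * d
        self.r = 0                 # read pointer
        self.w = 0                 # write pointer
        self.count = 0             # occupancy, to tell "full" from "empty"

    def write(self, item):
        if self.count == self.d:
            raise OverflowError("buffer full")
        self.data[self.w] = item
        self.w = (self.w + 1) % self.d   # modulo-D increment
        self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        item = self.data[self.r]
        self.r = (self.r + 1) % self.d   # modulo-D increment
        self.count -= 1
        return item
```

Because the pointers wrap modulo D, the same D memory locations are reused indefinitely as items are written and read in FIFO order.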
Another aspect of the present application is a method of buffering data-items from a plurality of input-streams, including: receiving an allocation request from one or more input-streams of the plurality of input-streams in an allocator; allocating a selected memory-element of a plurality of memory-elements of a plurality of buffers to a selected input-stream of said input-streams; storing a received data-item from the selected input-stream to the selected memory-element in a buffer; storing an address of the selected memory-element corresponding to the selected input-stream in a queue of a mapper, wherein the queue is designed as a circular buffer; receiving an unload request that identifies the selected input-stream; and providing the received data-item from the selected memory-element, based on an identification of the selected memory-element corresponding to the selected input-stream. A further aspect of the present application is a computer-readable medium having a computer program stored thereon. The computer program comprises instructions operable to cause a processor to perform at least the above-mentioned method.
The present application can be useful in the field of computer and communications systems and can in particular be applied to a system that receives multiple input-streams that are routed to a common output port. This may also be useful in implementing multimedia or other algorithms that use circular queues with multiple entities storing a finite set of generally similar data. Multiple hosts, for example, may communicate data to a common server; multiple processors may access a common memory device; multiple data streams may be routed to a common transmission medium; and so on.
Another example of application may be lookup tables (LUT) storing address pointers to different but similar sized buffers.
It should be noted that only elements relevant to the principle of the present application are specified above. Individual components which are not relevant to the principle the invention is based on are partly omitted. However, a person skilled in the art is able to implement such components if they are needed.
These and other aspects of the present patent application become apparent from and will be elucidated with reference to the following figures. The features of the present application and of its exemplary embodiments as presented above are understood to be disclosed also in all possible combinations with each other.
BRIEF DESCRIPTION OF THE DRAWINGS
The figures show:
Fig. 1 an example block diagram of a prior art multiple-input queuing system;
Fig. 2 an example block diagram of an advanced prior art multiple-input queuing system;
Fig. 3 an example block diagram of a multiple-input queuing system in accordance with this invention;
Fig. 4 a detailed block diagram of the mapper in the system according to Fig. 3;
Fig. 5 an example allocation scheme for the allocator in the system according to Fig. 3;
Fig. 6 - 9 tables showing the calculated memory saving by the system according to the invention for different parameters D, P and I; and
Fig. 10 - 29 diagrams showing the calculated memory saving by the system according to the invention for different parameters D, P and I.
DETAILED DESCRIPTION OF THE DRAWINGS
In the following detailed description of the drawings, exemplary embodiments of the prior art on the one hand and of the present invention on the other hand will be described, pointing out the architecture of the system according to the invention. The main advantage of the invention is a significant gain in efficiency of memory usage due to the application of circular buffers to implement the necessary queues required by the mapper.
Fig. 2 illustrates an example block diagram of a multiple-input queuing system 300' as described in EP 1 481 317 B1. The system 300' includes a dual-port memory 220', wherein writes to the memory 220' are controlled by an allocator 240' and reads from the memory 220' are controlled by a mapper 250'. The write and read processes to and from the memory 220' are symbolically represented by switch 210' and switch 260', respectively. As illustrated in Fig. 2, the memory 220' includes P addressable memory-elements, and each memory-element is of sufficient width W to contain a data-item from any of the input-streams 101'. Preferably, the parameter P in system 300' is at least as large as parameter D in system 100' of Fig. 1. However, the system 100' includes a total of N·D memory-elements of width W, whereas the memory 220' includes a total of P memory-elements of width W.
The allocator 240' is configured to provide the location of a currently unused memory-element within the memory 220', to which the next data-item from the input-streams 101' is directed. As indicated by the dashed lines between the input-streams 101' and the allocator 240', the allocator 240' is configured to receive a notification whenever an input-stream 101' has a new data-item to be transmitted. The allocator 240' is further configured to note the removal of data-items from the individual memory-elements. As each data-item is removed, the memory-element that had contained this data-item becomes available for receiving new data-items, as a currently-unused memory-element. An overflow of the memory 220' only occurs if all P memory-elements are filled with data-items that have not yet been removed. The mapper 250' is configured to assure that data-items are unloaded/removed from the memory 220' in an appropriate order. In a typical embodiment a receiving system (not shown) calls for data-items in a sequence that may differ from the sequence in which the data-items are received at the multiple-input queuing system 300'. The system 300' may be configured to allow the receiving system to specify the input-stream M(n) from which the next data-item is to be sent. In this manner, for example, a process at an input-stream M(n) may initiate a request to send m data-items to the receiving system, and the receiving system subsequently sends m "Unload(n)" commands to the queuing system 300' to receive these m data-items, independent of the arrival of other data-items at system 300' from the other input-streams 101'. That is, relative to each input-stream, the data-items are provided to the receiving system in sequence, but the receiving system may call for the data-items from selected input-streams independent of the order of arrival of data-items from other input-streams.
To allow the receiving system to request a sequence of data-items from a selected input-stream 101', the allocator 240' communicates the allocation of each memory-element location p to each input-stream n, as a stream-element pair (n, p), to the mapper 250'. The mapper 250' thereby maintains a list of each memory-element location indicator pn that is sequentially assigned to each arriving data-item from each input-stream n. When the receiving system requests the "next" data-item from a particular input-stream n, the mapper 250' extracts the next location indicator pn from the list associated with the input-stream n and uses that location indicator pn to provide the contents of the memory-element p as the output Qn, via the switch 260'. This location indicator pn is removed from the list associated with the input-stream n, and the allocator 240' thereafter includes the memory-element p as a currently-unused memory location.
In the prior art embodiment of Fig. 2, the mapper 250' includes multiple first-in, first-out (FIFO) queues 355', each queue 355' being associated with one corresponding input-stream 101' of the multiple-input queuing system 300'. When the allocator 240' allocates a memory-element p to an input-stream M(n), the address of this memory-element p is stored in the queue corresponding to input-stream M(n), the index n being used to select the queue 355' corresponding to input-stream M(n). As each new data-item is received from an input-stream, the address p at which the data-item is stored is appended to the queue corresponding to that input-stream in sequential order.
Each queue 355' in the example mapper 250' of Fig. 2 is illustrated as having a queue-length of D, consistent with the prior art queue lengths illustrated in Fig. 1. Note, however, that the width of the queues 110' of Fig. 1 is W, so that the total size of each queue 110' is D·W. Because each queue 355' of Fig. 2 is configured to store an address to the P memory-elements, the total size of each queue 355' is D·log2P. The width of the address, log2P, is generally substantially less than the width of a data-item. For example, if the data-items are 32 bits wide and the buffer 220' is configured to hold 1024 data-items (log2(1024) = 10), the queues 355' of Fig. 2 will be less than a third (10/32) of the size of the buffers 110' of Fig. 1. When the receiving system requests the next data-item from a selected input-stream, via an "Unload(n)" command, a multiplexer 350' selects the queue corresponding to the selected input-stream M(n) and the next available index pn is removed from the selected queue 355'. The index pn is used to select the corresponding memory-element p, via a multiplexer 260', to provide the output Qn corresponding to the "Unload(n)" request from the receiving system.
After the data-item in the memory-element p is selected for output, the allocator 240' marks the memory-element p as a currently unused memory-element, thereby allowing it to be allocated to newly arriving data-items, as required. Also illustrated in Fig. 2 is an example embodiment of a multiple-input, multiple-output switch 210' that is configured to route a data-item from an input-stream 101' to a selected memory-element p in a memory 220'. The example switch 210' includes a multiplexer 310' corresponding to each memory-element of the memory 220' that is enabled via a select(np) command from the allocator 240'. In this prior art embodiment, each multiplexer 310' associated with each memory-element is configured to receive a select(np) command, wherein np identifies the selected input-stream that has been allocated to the memory-element. In this manner, the data-item from the nth input-stream is routed to the pth memory-element. This allows for the storage of data-items from multiple contemporaneous input-streams 101'. The total memory required by the system illustrated in Fig. 1 is calculated as follows:
M1 = N·D·W

The total memory required by the system illustrated in Fig. 2 is calculated as follows:

M2 = P·W + N·D·log2P,

wherein P·W is the memory required by memory 220' and N·D·log2P is the memory required by the mapper 250'. Hence the memory saved is calculated as:

MS1->2 = M1 - M2 = [N·D·W] - [P·W + N·D·log2P] = N·D·(W - log2P) - P·W
Assuming that W ≫ log2P, a substantial amount of memory is saved when the multiple-input system of Fig. 1 is replaced by the multiple-input queuing system according to Fig. 2.
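The saving MS1->2 can be checked numerically. The sketch below uses purely illustrative parameter values (8 streams, queue depth 64, 32-bit data-items, a 256-element shared buffer), not values from the patent:

```python
import math

def m1(n, d, w):
    """Fig. 1 system: N separate queues of D data-items, each W bits wide."""
    return n * d * w

def m2(n, d, w, p):
    """Fig. 2 system: a shared P-element buffer plus N address queues of depth D."""
    return p * w + n * d * math.log2(p)

# Illustrative values: N = 8 streams, D = 64, W = 32 bits, P = 256 elements.
n, d, w, p = 8, 64, 32, 256
saved = m1(n, d, w) - m2(n, d, w, p)
# MS1->2 should equal N*D*(W - log2P) - P*W, term by term.
assert abs(saved - (n * d * (w - math.log2(p)) - p * w)) < 1e-9
```

With these values the shared-buffer design of Fig. 2 uses 12288 bit-units against 16384 for Fig. 1, i.e. a quarter of the memory is saved.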
As will be explained below, the memory saving achieved with the multiple-input queuing system according to the invention, when compared to the prior art system of Fig. 2, is calculated as:
MS2->3 = M2 - M3 = N·D·(log2(P^(1-1/I)·I^(1/I)) - 1)

Thus, memory is saved when P^(1-1/I)·I^(1/I) > 2. Typically, I ≥ 2 and P ≫ 2.
Hence, memory is saved in most cases. Moreover, as seen in the above equation, the amount of memory saved is also proportional to the system design parameters D and N. As seen in the charts of Fig. 10 through 29, the memory saved by this system is in the range of 45% to 85% compared to the system illustrated in Fig. 2.
Figs. 3 and 4 show an example block diagram of a multiple-input queuing system in accordance with the invention. The system according to the invention specifically modifies the mapper, the memory and the allocator of the prior system according to Fig. 2. Comparable components in both systems, like the mapper, allocator, buffer and so on, bear the same reference signs.
The system comprises a memory unit 220 that includes at least one buffer B(b). Presently, the system includes a plurality of I independent buffers B(b), b = 0, 1, …, (I-1). Each buffer B(b) includes a plurality of addressable memory elements configured to store data-items of a plurality of input streams M(n), n = 0, 1, …, (N-1), wherein 1 ≤ I ≤ N. I is a design parameter of the system and can be determined based on the expected input-output flow of the system. Determining an adequate value for I is common in the art. This means that the system according to the invention requires a smaller number of queues (N/I) for N input-streams M(n) compared to the prior art system of Fig. 2 (N queues for N input-streams). Further, the system according to the invention uses I separate buffers B(b), each of size P/I, instead of a single buffer of size P as in the case of the prior art system of Fig. 2.
Each memory element is of sufficient width W bytes to store any data-item of one of the input-streams M(n). Each independent buffer B(b) is of length P/I, such that the total number of memory elements over all buffers B(b) is I·(P/I) = P. This means that the combined size of these buffers B(b) is the same as that required by the prior art system of Fig. 2. However, the design according to the present invention has the disadvantage that any given buffer B(b) of size P/I in the memory unit 220 may be utilized by N/I input-streams M(n) only. In contrast, the design mentioned in patent EP 1 481 317 B1 allows all N streams to utilize the entire buffer memory P. To minimize the impact of the above restriction, the parameter I is preferably kept to a small value. A typical value of I may be I = 2. The practical choice of I, however, will be influenced by all the parameters, i.e. N, D, P and, most importantly, the input-output data flow rate.
The system according to the invention further comprises an allocator 240, which controls writes to the memory unit 220. According to the invention the allocator 240 is configured to allocate a memory element of one of the buffers B(0)-B(I-1), having an address A, for storing a data-item from a selected input-stream M(n). The allocator 240 is further configured to receive notifications whenever an input-stream M(n) has a data-item to be transmitted, as indicated by the dashed arrows in Fig. 3. Moreover, it is presently configured to receive allocation requests from other incoming input-streams M(n) and to determine a relative priority of the allocation requests from the other input-streams and the request from the selected input-stream M(n) when multiple input-streams contemporaneously have data to transmit. This may be achieved by using arbitration logic, thereby identifying the selected input-streams M(n) based on the relative priority. The arbitration logic of the allocator 240 may be designed such that the reception of further data-items of the selected input-stream M(n) is prevented until the data-item of the selected input-stream M(n) stored in one of the plurality of buffers B(0)-B(I-1) is output by the system. Other priority schemes may be implemented, including dynamic prioritization based on the content of each data-item, or based on a prior history of transmissions from one or more of the input-streams M(n), and others. Alternatively, a simple round-robin input selection scheme may be used, wherein the allocator 240 sequentially samples each input-stream M(n) for new data.
The allocator 240 is further configured to provide the address A of a currently unused memory element in a selected buffer B(b) in memory unit 220, to which the next data-item from the input-stream M(n) is directed. To achieve this, the system presently comprises a multiple-input switch 210 configured to route the data-item from the selected input-stream M(n) to the selected memory-element in the selected buffer B(b). In turn, the multiple-input switch 210 comprises a plurality of multiplexers 310 coupled to the allocator 240, each multiplexer 310 being coupled to one specific memory element of the P memory elements in the buffers B(0)-B(I-1). As mentioned above, every buffer B(b) is designed to hold data corresponding to N/I input-streams. The allocator 240 may use different schemes to determine the buffer B(b) to which the data-item from a particular input-stream M(n) will be written. A few of these schemes are illustrated below and may or may not be dynamically changed.
Scheme 1 :
The allocator may store the data elements of input streams M(0) - … M((N/I)-1) in buffer B(0). In general, B(b) contains streams from M(b·(N/I)) to M([(b+1)·(N/I)]-1).
Scheme 2:
The allocator ensures that the sum of probability of occurrence of data elements of all input streams for any buffer B (b) is approximately the same. This scheme ensures an optimized usage of all buffers B (b), by allocating streams based on the probability of occurrence of data items.
According to the invention the queues Q(q) are designed as circular buffers. The circular buffers may each comprise a single read pointer R(q, b) and a single write pointer W(q, b), with b and q as defined above. Presently, the circular buffers are designed as I-dimensional circular buffers, each having I read and I write pointers. The dimension I indicates that up to I entities may simultaneously store data in a single location of the circular buffer. In the case of I = 2 the circular buffers are 2-dimensional circular buffers. Presently, the circular buffers of the system of the present invention each comprise D buffer elements, i.e. D is the size of each buffer. Correspondingly, the pointers of the circular buffers are modulo-D incremented by one each time a read/write is performed. As shown in Fig. 4, each buffer element is divided into two parts, namely Data(f) and Owner(f), f = 0, 1, …, (D-1). Data(f) is the actual data stored in the buffer and Owner(f) stores a kind of meta-data, indicating to which entities the data in the corresponding Data(f) belongs. Thus, Owner(f) is I bits wide, with 1 bit for every entity which can contribute a value in Data(f). A "1" for the bth bit in Owner(f) indicates that the corresponding Data(f) is owned by the bth entity. Anywhere from zero to all I entities may own the data in the corresponding Data(f), as indicated by the number of "1" bits in Owner(f). The width of each Data(f) is in general unrestricted and is based on the specific requirement. The greater the size of Data(f), the more memory will be saved using I-dimensional circular buffers. It should also be noted that this scheme is particularly useful in case the Data(f) values are repeated by different entities. This is typically the case when Data(f) stores addresses of similar-sized buffers. Presently, Data(f) holds the address A of a memory location of one or more buffers B(b) in the memory unit 220. Data(f) is thus log2(P/I) bits wide.
Every bit of Owner(f) indicates one of the buffers B(b) which may have contributed to the value in the corresponding Data(f). The corresponding width of the Owner(f) part is I bits; the overall width W of a buffer element is thus log2(P/I) + I bits, i.e. (log2(P/I) + I)/8 bytes. Multiple buffers may write the same address A in Data(f) of Q(q), simultaneously updating the respective bit in Owner(f) of Q(q). Since the size of Data(f) is generally much greater than the size of Owner(f), a considerable amount of memory is saved.
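As a worked example of the queue-element width, under the illustrative assumption P = 1024 and I = 2 (values not taken from the patent):

```python
import math

# Width in bits of one queue element in the I-dimensional circular buffer:
# Data(f) addresses one of the P/I memory elements of a buffer B(b),
# and Owner(f) holds one ownership bit per buffer.
def element_width_bits(p, i):
    return math.ceil(math.log2(p // i)) + i

# With P = 1024 and I = 2: Data(f) is log2(512) = 9 bits, Owner(f) is 2 bits,
# so each queue element is 11 bits wide.
width = element_width_bits(1024, 2)
```

Compared with the Fig. 2 mapper, whose queue elements would be log2(1024) = 10 bits wide but of which twice as many queues are needed, the per-stream bookkeeping is reduced.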
The mapper 250 selects the queue Q(q) in accordance with the above-mentioned scheme used by the allocator 240. Each queue element indicates all the buffers B(b) which have contributed the value A to Data(f), using one unique bit per buffer B(b) in Owner(f). This can be ensured by writing a '1' in the bth LSB ("least significant bit", as commonly referred to in the art) of Owner(f) for every buffer B(b) that has contributed to Data(f).
In the following a preferred allocation scheme according to the invention is presented:
The scheme has the following effects:
Each buffer B(b) holds data-items received from N/I input streams. The data-item from input stream M(n), n = 0, 1, …, N-1, is stored in buffer B(b), where b = n div (N/I).
The address of data-items from only one stream per buffer is stored in the same queue Q(q). If A is the address of a memory location selected by the allocator 240 to store the data-item from M(n) in B(b), A needs to be stored appropriately in queue Q(q), where q = n mod (N/I).
Every address A corresponding to a data-item received from any input stream M(n) is stored in one and only one queue Q(q).
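The two index relations above, b = n div (N/I) and q = n mod (N/I), together with the inverse relation n = b·(N/I) + q, can be sketched as follows; the function names and the parameter values in the round-trip check are illustrative only:

```python
# Mapping between a stream index n and its (buffer b, queue q) pair.
def buffer_and_queue(n, num_streams, num_buffers):
    group = num_streams // num_buffers        # N/I streams per buffer
    return n // group, n % group              # b = n div (N/I), q = n mod (N/I)

def stream_index(b, q, num_streams, num_buffers):
    return b * (num_streams // num_buffers) + q   # n = b*(N/I) + q

# Round trip for N = 8 streams and I = 2 buffers (illustrative values):
# every n maps to a unique (b, q) and back.
for n in range(8):
    b, q = buffer_and_queue(n, 8, 2)
    assert stream_index(b, q, 8, 2) == n
```

Because the mapping is a bijection, the pair (b, q) uniquely identifies the input stream, which is what allows several buffers to share one queue safely.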
Receiving a data item from input M(n) in the system according to the invention includes the following steps:
The next valid write location Data[W(b)] in Q(q) is determined with the help of the write pointer W(b). W(b) always points to the next location in Q(q) whose bth bit in Owner[W(b)] is not set.
If W(b) = R(b) and the bth bit of Owner[R(b)] = 1, an overflow occurs. Note that all Owner(f) were initialized to zero. Otherwise, continue to step 3.
The following cases now have to be considered:

Case 1: Owner[W(b)] = 0:
A is stored in Data[W(b)], the bth bit of Owner[W(b)] is set, and W(b) is modulo-D incremented by 1.

Case 2: Owner[W(b)] ≠ 0, the bth bit of Owner[W(b)] = 0, and A = Data[W(b)]:
The bth bit of Owner[W(b)] is set and W(b) is modulo-D incremented by 1.

Case 3: Owner[W(b)] ≠ 0, the bth bit of Owner[W(b)] = 0, and A ≠ Data[W(b)]:
W(b) is modulo-D incremented by 1 and step 1 is carried out again.

Case 4: Owner[W(b)] ≠ 0 and the bth bit of Owner[W(b)] ≠ 0:
W(b) is modulo-D incremented by 1 and step 1 is carried out again.
Indicate to the allocator that the write was successful, enabling the allocator to actually write the data-item received from M(n) at location A of B(b). Notify the allocator if an overflow has occurred. The arbitration logic of the allocator may then ensure that M(n) is not selected for further data reception until data-items corresponding to M(n) have been output by the system, as already described above.
It may be noticed that, for a particular input stream M(n), data is output by the system in the same order as it was received. This is ensured by the use of R(b) and W(b).
It may also be noted that b and q uniquely identify the corresponding input stream M(n) via the relation n = b·(N/I) + q.
As seen above, it is possible for multiple buffers B(b) to store the address A in the same queue Q(q).
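The write procedure, with its four cases, can be sketched as follows. This is a simplified, hypothetical model of a single queue Q shared by I buffers, with names of my own choosing; the overflow check is applied at each probe, which is one possible reading of the steps above, not the patent's definitive implementation:

```python
# Sketch of the write procedure (cases 1-4); illustrative, not authoritative.
class Queue:
    def __init__(self, d, i):
        self.d, self.i = d, i
        self.data = [None] * d          # Data(f)
        self.owner = [0] * d            # Owner(f): I-bit ownership mask
        self.w = [0] * i                # per-buffer write pointers W(b)
        self.r = [0] * i                # per-buffer read pointers R(b)

    def push(self, b, addr):
        """Store address addr on behalf of buffer b; False signals overflow."""
        bit = 1 << b
        for _ in range(self.d):                          # at most D probes
            f = self.w[b]
            if self.w[b] == self.r[b] and self.owner[f] & bit:
                return False                             # overflow for buffer b
            if self.owner[f] == 0:                       # case 1: element free
                self.data[f] = addr
                self.owner[f] |= bit
                self.w[b] = (f + 1) % self.d
                return True
            if not (self.owner[f] & bit) and self.data[f] == addr:
                self.owner[f] |= bit                     # case 2: share the entry
                self.w[b] = (f + 1) % self.d
                return True
            self.w[b] = (f + 1) % self.d                 # cases 3 and 4: skip on
        return False
```

Case 2 is where the memory saving comes from: when a second buffer stores the same address A, only one ownership bit is set instead of a whole new queue element.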
Outputting a data item from input-stream M (n) in the system according to the invention includes the following steps:
The receiver asserts an Unload(n) command to the mapper to indicate that it would like to read the next data-item corresponding to input stream M(n).
The mapper selects Q(q), where q = n mod (N/I), and determines the next valid read location in the following manner:
If R(b) = W(b) and the bth bit of Owner[R(b)] = 0, where b = n div (N/I), then there is no data available to be read corresponding to M(n). This should be indicated to the allocator, which may take appropriate action: it may block the receiver until an input data-item from M(n) is available, return after a timeout, or simply return immediately. Otherwise, continue to step 4.
Here the following cases have to be considered:

Case 1: the bth bit of Owner[R(b)] = 1:
- Read A from Data[R(b)].
- Output the data-item stored at location A in B(b).
- Indicate to the allocator that the location A in B(b) is free for future use.
- Reset the bth bit of Owner[R(b)] to 0.
- R(b) is modulo-D incremented by 1. Step 1 is carried out again.

Case 2: the bth bit of Owner[R(b)] = 0:
- R(b) is modulo-D incremented by 1. Step 1 is carried out again.
As already mentioned, the main advantage of the system according to the invention is that it saves a significant amount of memory compared to the prior art system of Fig. 2. This will be elucidated by the following memory calculations:
The total memory required by the system illustrated in Fig. 2 is calculated as follows:
M2 = P·W + N·D·log2P,

wherein P·W is the memory required by memory 220' and N·D·log2P is the memory required by the mapper 250'.
The total memory required by the system according to the invention, as illustrated in Fig. 3 and Fig. 4, is calculated as follows:

M3 = (memory required by memory 220) + (memory required by mapper 250)
= (P/I)·I·W + (log2(P/I) + I)·D·(N/I)
= P·W + (N/I)·D·(log2(P/I) + I)

Thus, we obtain the memory saved by this invention compared to the system demonstrated in Fig. 2 as:

MS2->3 = M2 - M3
= {P·W + N·D·log2P} - {P·W + (N/I)·D·(log2(P/I) + I)}
= N·D·log2P - (N/I)·D·(log2(P/I) + I)
= N·D·(log2P - (1/I)·(log2P - log2I + I))
= N·D·((1 - 1/I)·log2P + (1/I)·log2I - 1)
= N·D·(log2(P^(1-1/I)·I^(1/I)) - 1)
Thus, memory is saved, i.e. MS2->3 > 0, when

log2(P^(1-1/I)·I^(1/I)) > 1, i.e. P^(1-1/I)·I^(1/I) > 2.

The worst-case value of I is 2, since I ≥ 2. Thus, the minimum value of P for which MS2->3 > 0 holds can be derived as follows:

For I = 2, MS2->3 > 0 requires P^(1/2)·2^(1/2) > 2, i.e. P^(1/2) > 2/2^(1/2) = 2^(1/2),

i.e. P > 2.
Hence, unlike the prior art system illustrated in Fig. 2, the system illustrated in Fig. 3 and Fig. 4 of this invention saves memory for any P > 2, even in the worst case I = 2. Typically, P is much greater than 2, and hence the memory saved is much larger. Fig. 6 to 9 and Fig. 10 to 29 illustrate by way of tables and charts how much memory is saved for different values of the design parameters N, D, P and I. The memory saved ranges from a minimum of 45% to as high as 85% for the parameters used.
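The closed-form saving MS2->3 can be evaluated numerically; the parameter values below are illustrative only, chosen to exercise the boundary case P = 2 and a typical larger P:

```python
import math

def ms2_to_3(n, d, p, i):
    """Memory saved (in bits) by the Fig. 3/4 system over the Fig. 2 system:
    N*D*(log2(P^(1-1/I) * I^(1/I)) - 1)."""
    return n * d * (math.log2(p ** (1 - 1 / i) * i ** (1 / i)) - 1)

# Boundary case from the derivation: with I = 2 and P = 2 the saving is zero.
boundary = ms2_to_3(8, 64, 2, 2)
# A larger buffer, e.g. P = 1024, gives a substantial positive saving.
typical = ms2_to_3(8, 64, 1024, 2)
```

The saving scales linearly with N and D, consistent with the derivation above, so larger systems benefit proportionally more.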


CLAIMS:
1. A multiple-input queuing system comprising: a memory unit (220) comprising at least one buffer (B(b)) including a plurality of memory elements configured to store data-items of a plurality of input streams (M(0)-M(N-1)); - an allocator (240) configured to allocate a memory element of said plurality of memory elements, each having an address (A), for storing a data-item from a selected input-stream (M(n)) of said plurality of input streams (M(0)-M(N-1)); - a mapper (250) having a plurality of queues (Q(0)-Q((N/I)-1)), the mapper (250) being configured to store the address (A) of said memory element in a queue (Q(q)) of said plurality of queues (Q(0)-Q((N/I)-1)), to receive a request (Unload(n)) for an output corresponding to the selected input-stream (M(n)), to determine the address (A) associated with said memory element containing said data-item of said selected input-stream (M(n)), and to provide said data-item from said memory element as the requested output (Qn) through at least one output unit, wherein the plurality of queues (Q(0)-Q((N/I)-1)) of the mapper are designed as circular buffers.
2. The system according to claim 1, wherein the circular buffers each comprise a single read pointer R(q, b) and a single write pointer W(q, b).
3. The system according to claim 2, wherein the circular buffers comprise a plurality (D) of buffer elements and wherein the pointers (R(q, b), W(q, b)) of each circular buffer are modulo-D incremented by one each time a read/write is carried out.
4. The system according to claim 1, wherein the circular buffers are multi-dimensional (I) circular buffers.
5. The system according to claim 1, wherein the buffer elements are each divided into two parts (Data(f), Owner(f)).
6. The system according to claim 1, wherein the allocator (240) is configured to receive an allocation request from the selected input-stream (M(n)) when the selected input- stream (M(n)) has a data-item to be transmitted.
7. The system according to claim 6, wherein the allocator is configured to receive allocation requests from other incoming input-streams of said plurality of input-streams (M(O)-M(N-I)) and to determine a relative priority of the allocation requests from the other input-streams and the request from the selected input stream (M(n)) using arbitration logic thereby identifying the selected input-stream based on the relative priority.
8. The system according to claim 1, wherein the arbitration logic of the allocator (240) is configured to prevent the reception of further data-items of the selected input-stream (M(n)) until the data-item of the selected input-stream (M(n)) stored in one of the plurality of buffers (B(0)-B(I-1)) is output by the system.
9. The system according to claim 1, wherein the mapper (250) is configured to provide an output of data-items in the same order as they were received from the selected input-stream (M (n)).
10. The system according to claim 1, wherein the mapper (250) comprises a multiplexer/selector (350) configured to select a queue (Q (q)) corresponding to the selected input-stream (M (n)) upon reception of said request (Unload (n)).
11. The system according to claim 1, wherein the system further comprises a multiple-input switch (210) configured to route a data-item from the selected input-stream (M(n)) to a selected memory-element in a selected buffer (B(b)) of said plurality of buffers (B(0)-B(I-1)).
12. The system according to claim 11, wherein the multiple-input switch (210) comprises a plurality of multiplexers (310) coupled to the allocator (240), each multiplexer (310) being coupled to one memory element of said plurality of memory elements in said plurality of buffers (B(0)-B(I-1)).
13. The system according to claim 1, wherein the system further comprises an output-multiplexer (260) coupled to the memory and the mapper and configured to couple a memory element of said plurality of memory elements in said plurality of buffers (B(0)-B(I-1)) to an output of the system.
14. A method of buffering data-items from a plurality of input-streams (M(0)-M(N-1)), including:
- receiving an allocation request from one or more input-streams of the plurality of input-streams (M(0)-M(N-1)) in an allocator,
- allocating a selected memory-element of a plurality of memory-elements of a plurality of buffers (B(0)-B(I-1)) to a selected input-stream (M(n)) of said input-streams (M(0)-M(N-1)),
- storing a received data-item from the selected input-stream (M(n)) to the selected memory-element in a buffer (B(b)),
- storing an address (A) of the selected memory-element corresponding to the selected input-stream (M(n)) in a queue (Q(n)) of a mapper, wherein the queue (Q(n)) is designed as a circular buffer,
- receiving an unload request (Unload(n)) that identifies the selected input-stream (M(n)), and
- providing the received data-item from the selected memory-element, based on an identification (A) of the selected memory-element corresponding to the selected input-stream (M(n)).
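The method of claim 14 can be sketched as follows: an allocator hands out free memory-elements from a shared pool, a per-stream queue Q(n) records the element addresses A, and an unload request Unload(n) returns data-items in arrival order. All class and method names are illustrative assumptions; the claim does not prescribe this particular data structure.

```python
from collections import deque


class MultiInputQueue:
    """Illustrative sketch of the buffering method of claim 14."""

    def __init__(self, num_streams, num_elements):
        self.memory = [None] * num_elements       # shared pool of memory-elements
        self.free = deque(range(num_elements))    # allocator's free-element list
        # Q(n): per-stream queue of addresses (a deque standing in for the
        # claimed circular buffer)
        self.queues = [deque() for _ in range(num_streams)]

    def store(self, n, data_item):
        """Allocate a memory-element to stream M(n) and record its address in Q(n)."""
        if not self.free:
            raise OverflowError("no free memory-element")
        a = self.free.popleft()       # address A of the selected memory-element
        self.memory[a] = data_item
        self.queues[n].append(a)

    def unload(self, n):
        """Unload(n): return the oldest stored data-item of stream M(n)."""
        a = self.queues[n].popleft()  # identification A of the memory-element
        item = self.memory[a]
        self.memory[a] = None
        self.free.append(a)           # element returns to the shared pool
        return item
```

Because each Q(n) preserves insertion order, data-items of a given stream are delivered in the order received, as claims 9 and 16 require, even though the streams share one memory pool.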
15. The method according to claim 14, wherein the input-stream (M(n)) of said input-streams (M(0)-M(N-1)) is not selected for further data reception until a data-item corresponding to the input-stream (M(n)) is output.
16. The method according to claim 14, wherein data-items from a selected input-stream (M(n)) are output in the same order as they were received from the selected input-stream (M(n)).
17. A computer readable medium having a computer program stored thereon, the computer program comprising: instructions operable to cause a processor to perform a method according to claim 14.
PCT/IB2008/054936 2007-11-29 2008-11-25 Multiple input-queuing system WO2009069072A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07121871.3 2007-11-29
EP07121871 2007-11-29

Publications (1)

Publication Number Publication Date
WO2009069072A1 (en) 2009-06-04

Family

ID=40445489

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/054936 WO2009069072A1 (en) 2007-11-29 2008-11-25 Multiple input-queuing system

Country Status (1)

Country Link
WO (1) WO2009069072A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050213595A1 (en) * 2004-03-23 2005-09-29 Takeshi Shimizu Limited cyclical redundancy checksum (CRC) modification to support cut-through routing
US6975637B1 (en) * 1999-01-27 2005-12-13 Broadcom Corporation Apparatus for ethernet PHY/MAC communication
US7215637B1 (en) * 2000-04-17 2007-05-08 Juniper Networks, Inc. Systems and methods for processing packets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08853632; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 08853632; Country of ref document: EP; Kind code of ref document: A1)