CN101854341B - Pattern matching method and device for data streams - Google Patents


Info

Publication number
CN101854341B
Authority
CN
China
Prior art keywords
pattern
fragment
patterns
mode subset
pattern matching
Legal status
Expired - Fee Related
Application number
CN200910132546.1A
Other languages
Chinese (zh)
Other versions
CN101854341A (en)
Inventor
郑凯 (Zheng Kai)
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to CN200910132546.1A
Publication of CN101854341A
Application granted
Publication of CN101854341B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a pattern matching method and device for data streams. In the method, a pattern set comprising a plurality of patterns is divided into a plurality of pattern subsets that are mutually exclusive under a given detection window length, and pattern matching checks against these subsets are performed in a plurality of pattern matching engines respectively. This greatly reduces the number of lookups the pattern matching engines must perform and correspondingly improves the working efficiency of the system.

Description

Pattern matching method and device for data streams
Technical field
The present invention relates to a method and apparatus for pattern matching on data streams (FPM).
Background art
As a newer firewall technology, deep packet inspection (DPI) is widely used in intrusion detection/prevention systems (IDS/IPS), anti-spam and anti-virus filtering, data leak prevention, information filtering, and similar fields. With DPI, the firewall inspects every packet in depth, including its payload, and the DPI engine decides how to handle each packet according to a rule set built on techniques such as fingerprint matching, heuristics, anomaly detection, and statistical analysis. To determine whether a packet in a data stream carries, for example, an attack signature, DPI engines generally use pattern matching (signature search), comparing each suspicious byte of the stream against the patterns. Existing pattern matching algorithms, however, require a very large amount of computation and data transfer. A DPI application typically has to check traffic against a pattern set containing a large number of patterns, and the processing power required is proportional to the line rate of the monitored network interface (because DPI inspects payloads as well as packet headers). This makes it difficult for DPI to handle gigabit or even 10-gigabit line rates together with very large pattern sets.
Pattern matching places heavy demands on processor speed, while advances in microelectronics are approaching their limits, and the "memory wall" problem, in which memory speed rather than processing speed becomes the bottleneck, may soon appear. Parallel algorithms may therefore be the only way to build scalable pattern matching engines (PM engines) for high-performance network intrusion detection systems (NIDS). Several parallel pattern matching algorithms have already been proposed. TCAM (ternary content-addressable memory) chips are particularly well suited as parallel pattern matching engines, providing hardware acceleration for stream pattern matching. TCAMs search quickly, consume little power, and are widely supported by equipment vendors such as Cisco and 3Com. There is therefore a need for a highly efficient solution that enables efficient parallel pipelined operation.
Summary of the invention
The present invention aims to provide a more efficient stream pattern matching method and device, in which mutually exclusive pattern subsets are used to perform pipelined parallel pattern matching.
To this end, one aspect of the present invention provides a pattern matching method for a data stream, comprising the following steps: dividing an input data stream into a plurality of fragments and distributing the fragments among a plurality of pattern matching engines, wherein each pattern matching engine stores one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and having any one of the pattern matching engines compare a fragment distributed to it against the pattern subset stored in it and, if the fragment hits that pattern subset, output the fragment.
To support pipelining, optionally, when a pattern matching engine finds that a fragment distributed to it does not match its pattern subset, the fragment is passed to another pattern matching engine for further comparison; when a fragment has been checked by all pattern matching engines without any match being found, the fragment is reported as having no match hit and checking of that fragment ends.
To ensure that every data fragment is checked without error, the input data stream is divided into fragments no shorter than the detection window length; preferably, the fragment length equals the detection window length. Optionally, the data stream is divided so that no pattern to be compared spans the cut point between two fragments. To this end, cut points in the data stream can be determined from "negative patterns" of the target patterns, and the stream is cut at those points, where a negative pattern is one that cannot form a target pattern no matter what suffix and/or prefix is added to it.
According to the present invention, the plurality of mutually exclusive pattern subsets can be obtained by dividing a pattern set containing a plurality of patterns. Optionally, when dividing a pattern set into mutually exclusive subsets, the pattern set is first divided into as many mutually exclusive subsets as possible, and the smaller subsets are then merged to balance the subset sizes.
Another aspect of the present invention provides a stream pattern matching device for deep packet inspection, comprising: a plurality of pattern matching engines, each storing one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; and a stream splitting and distribution unit for dividing an input data stream into a plurality of fragments and distributing the fragments among the pattern matching engines, wherein any one of the pattern matching engines compares a fragment distributed to it against the pattern subset stored in it and outputs the fragment if it hits that subset. The stream pattern matching device performs the steps of the method of the present invention described above.
When a large pattern set is divided into several mutually exclusive subsets, a given fragment does not necessarily need to be checked against every subset. The present invention exploits this mutual exclusivity of the pattern subsets to reduce the number of lookups performed by the PM engines, which makes pipelined processing possible and greatly improves the efficiency of parallel PM engines. Because this parallel pattern matching is based on mutually exclusive pattern subsets, no redundant memory is needed to extend the storage space.
Brief description of the drawings
Fig. 1 is a schematic diagram of a device for parallel stream pattern matching according to the present invention;
Fig. 2 shows an exemplary algorithm for dividing a pattern set according to the mutual exclusion principle; and
Fig. 3 shows an example of pattern matching on a data stream according to the method of the present invention.
Embodiments
The inventor found through research that, for a given detection window length, a large pattern set containing many patterns can always be divided into several subsets that are mutually exclusive. Multiple parallel PM engines can then each process data stream fragments against one of these subsets, enabling efficient pipelining and greatly increasing matching speed.
Here, a "mutually exclusive" relation between two patterns means that the same data fragment cannot match both patterns. Two pattern subsets SA and SB are mutually exclusive when every pattern PA in SA is mutually exclusive with every pattern PB in SB.
The inventor noticed that once the detection window length w is fixed, certain patterns are mutually exclusive with certain other patterns under all circumstances, regardless of the input data stream and of how the patterns are stored. For example, suppose the detection window length is w = 7 bytes, pattern P1 is "ABCD", and pattern P2 is "wxyz". Whatever the input string, and however the two patterns are stored in the TCAM (for example, with wildcard bytes "?" placed before or after "ABCD"), the two patterns are always mutually exclusive: any string that matches both P1 and P2 must be at least 4 + 4 = 8 bytes long, which always exceeds the 7-byte detection window.
For two patterns P1 and P2, their minimum combined length is written MCL(P1, P2); it equals the length of the shortest string that contains both P1 and P2 as substrings. In the example above, P1 and P2 are four bytes each and cannot overlap at all, so their minimum combined length is 4 + 4 = 8 bytes. For P3 = "ABC" and P4 = "CAB", by contrast, MCL(P3, P4) = 4 (the string "CABC" contains both). With a detection window length of 7, P3 and P4 are therefore not mutually exclusive; with a detection window length of 3, they are. In other words, two patterns PA and PB are always mutually exclusive under detection window length w if and only if MCL(PA, PB) is greater than w.
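The MCL test can be checked mechanically. The Python sketch below is an illustration only, assuming patterns are plain strings without TCAM wildcards; the helper names `overlap`, `mcl`, and `always_exclusive` are hypothetical and do not come from the patent. For two strings, the shortest string containing both is either the longer one (when one contains the other) or the two joined at their largest suffix/prefix overlap.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def mcl(p: str, q: str) -> int:
    """Minimum combined length: length of the shortest string containing p and q."""
    if p in q:
        return len(q)
    if q in p:
        return len(p)
    # Join the two patterns in whichever order overlaps more.
    return len(p) + len(q) - max(overlap(p, q), overlap(q, p))


def always_exclusive(p: str, q: str, w: int) -> bool:
    """True if no w-byte window can ever match both p and q."""
    return mcl(p, q) > w


print(mcl("ABCD", "wxyz"), mcl("ABC", "CAB"))  # 8 4
print(always_exclusive("ABCD", "wxyz", 7))     # True
print(always_exclusive("ABC", "CAB", 7))       # False
print(always_exclusive("ABC", "CAB", 3))       # True
```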
Fig. 1 is a schematic diagram of a device for parallel stream pattern matching according to the present invention. Before network data (for example, packets) entering the stream pattern matching device 1 are checked, the arriving data are reassembled into a continuous data stream in a stream buffer, as is usual in intrusion detection systems. To match the assembled stream in parallel in a pipelined fashion, the stream splitting and distribution unit 101 divides it into several small fragments, which are then compared in pattern matching engines PM1, PM2, ..., PMk respectively.
The stream is divided into small fragments so that it can be processed in parallel at fine granularity. To ensure that every data fragment is checked without error, each fragment obtained by splitting the stream should be no shorter than the detection window length of the PM engines. For optimal system efficiency and full use of parallelism, the fragment length preferably equals the detection window length w, which also balances the workload across the PM engines. One splitting scheme takes fragments of the required length from the stream in turn, advancing one byte at a time. For example, if the data stream is abcdefg and the required fragment length is 5, the fragments are abcde, bcdef, and cdefg.
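A minimal sketch of this first splitting scheme, assuming (as the abcdefg example suggests) that the window advances one byte at a time; `sliding_fragments` is a hypothetical name. Because consecutive fragments overlap by `length - 1` bytes, every occurrence of a pattern no longer than `length` lies entirely inside at least one fragment.

```python
from typing import List


def sliding_fragments(stream: str, length: int) -> List[str]:
    """Return every `length`-byte window of `stream`, advancing one byte at a time."""
    if len(stream) <= length:
        return [stream]  # a stream shorter than the window is a single fragment
    return [stream[i:i + length] for i in range(len(stream) - length + 1)]


print(sliding_fragments("abcdefg", 5))  # ['abcde', 'bcdef', 'cdefg']
```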
Alternatively, another splitting scheme can be used, in which the stream is split on the principle that no target pattern to be compared spans the cut point between two fragments. To this end, cut points in the data stream are determined from "negative patterns" of the target patterns, and the stream is cut at those points; a negative pattern is one that cannot form any of the target patterns no matter what suffix and/or prefix is added to it. This preferred splitting scheme is equally applicable to the present invention; it achieves better load balancing and helps reduce the number of pattern matching operations performed.
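The following is a minimal sketch of one possible concrete reading of the negative-pattern idea, not the patent's own procedure. It assumes a negative pattern is a two-byte pair that occurs inside no target pattern: a cut between two bytes forming such a pair is safe, because any pattern occurrence spanning the cut would have to contain that pair. It also assumes a hypothetical greedy rule (take the farthest safe cut within w bytes, otherwise extend to the next safe cut). Under this strict reading "21" is not a safe cut for the Fig. 3 pattern set (it occurs in "4321"), so on the first 35 bytes of the Fig. 3 stream the sketch produces different, variable-length fragments than the figure. All function names are hypothetical.

```python
def pattern_bigrams(patterns):
    """Every adjacent byte pair that occurs inside some target pattern."""
    return {p[i:i + 2] for p in patterns for i in range(len(p) - 1)}


def split_at_negative_pairs(stream, patterns, w):
    """Split `stream` into non-overlapping fragments that no target pattern straddles.

    Assumed reading: a cut between stream[c-1] and stream[c] is safe when that
    two-byte pair appears in no target pattern (a "negative" pair).
    """
    forbidden = pattern_bigrams(patterns)

    def safe(c):
        return stream[c - 1:c + 1] not in forbidden

    fragments, start = [], 0
    while start < len(stream):
        end = min(start + w, len(stream))
        if end < len(stream):
            # Farthest safe cut within w bytes of the fragment start...
            cut = next((c for c in range(end, start, -1) if safe(c)), None)
            if cut is None:
                # ...or, failing that, the next safe cut beyond w (or the stream end).
                cut = next((c for c in range(end + 1, len(stream)) if safe(c)),
                           len(stream))
            end = cut
        fragments.append(stream[start:end])
        start = end
    return fragments


patterns = ["ABCD", "DEFG", "XYYZ", "XYZZ", "1234", "4321"]
stream = "ABCDEFGABCDE1212C1234ABCDE13ABCDE33"
print(split_at_negative_pairs(stream, patterns, w=7))
# ['ABCDEFG', 'ABCDE', '1212C', '1234', 'ABCDE13', 'ABCDE33']
```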
Before matching, the mutually exclusive subset division unit 102 in Fig. 1 divides a large pattern set containing many patterns into a plurality of mutually exclusive pattern subsets SS1, SS2, ..., SSk, based on the given detection window length w and the minimum combined lengths of the patterns, so that every pattern in any one subset is mutually exclusive with every pattern in any other subset.
Such a division by mutual exclusivity can always be achieved in practice, and many different division schemes exist (for example, exhaustive search). Any scheme works as long as the MCL between every pattern in one subset and every pattern in any other subset exceeds the detection window length w; the schemes differ only in computational cost. The resulting mutually exclusive subsets SS1, SS2, ..., SSk are then assigned to corresponding pattern matching engines PM1, PM2, ..., PMk; in practice, each subset can be stored in its own TCAM chip, using as many chips as there are subsets. The stream splitting and distribution unit 101 then distributes the fragments among the PM engines, balancing the load as far as possible, and each PM engine compares its fragments against the pattern subset stored in it.
If some patterns in the original large pattern set are longer than the detection window length supported by the TCAM, each such pattern can first be divided, according to the width supported by the TCAM, into several shorter patterns no longer than the detection window length, and the resulting shorter patterns are then divided into subsets according to the mutual exclusion principle. For example, to load the long pattern "ABCDEF" into a TCAM that supports 5 bytes (a detection window length of 5 bytes), the pattern can be divided into the two 5-byte patterns "ABCDE" and "BCDEF".
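A sketch of the long-pattern split, assuming every w-byte window of the long pattern is kept: the "ABCDEF" example yields exactly the windows "ABCDE" and "BCDEF", but the text does not state the general rule. `split_long_pattern` is a hypothetical helper; it also records each piece's offset inside the long pattern so that a later hit on a piece can be checked against the full pattern.

```python
from typing import List, Tuple


def split_long_pattern(pattern: str, w: int) -> List[Tuple[int, str]]:
    """Cover a pattern longer than w with its w-byte windows, as (offset, piece) pairs."""
    if len(pattern) <= w:
        return [(0, pattern)]  # already fits in the detection window
    return [(i, pattern[i:i + w]) for i in range(len(pattern) - w + 1)]


print(split_long_pattern("ABCDEF", 5))  # [(0, 'ABCDE'), (1, 'BCDEF')]
```

Any data containing the long pattern necessarily contains every piece, so no true match is lost; a hit on a piece is only a candidate, however.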
In a simulation experiment using a typical pattern set of 1993 patterns, four PM engines were used, so four pattern subsets were to be obtained. With w = 10 bytes, the 1993 patterns were first divided into 4369 shorter patterns fitting the detection window, and these were then divided according to the mutual exclusion principle into four mutually exclusive subsets of 1088 to 1105 patterns each; assigned to the four PM engines, these gave roughly balanced workloads. With w = 12 bytes, the 1993 patterns were divided into 3730 shorter patterns, and the four resulting mutually exclusive subsets contained 930 to 939 patterns each, again giving roughly balanced workloads. With w = 16 bytes, however, the 1993 patterns were divided into 3099 shorter patterns, and the four resulting subsets contained 636 to 1207 patterns each, so the PM engine workloads were no longer balanced. In practice, therefore, the detection window length supported by the TCAM should be chosen with care to achieve good load balancing, and it can be tuned to the actual rule set.
Note that the mutually exclusive subset division unit 102 in Fig. 1 is optional for pattern matching, because the division can be done offline and the mutually exclusive subsets preset in the pattern matching engines. As described above, the detection window length can strongly affect how evenly the workload is balanced across the PM engines, and its upper limit is the number of bytes the TCAM supports. The number of pattern subsets to be obtained also strongly affects the division, and it depends on the number of TCAMs used. The number of TCAMs and the number of bytes they support depend in turn on hardware design convenience and cost. Preferably, therefore, for a given pattern set, repeated offline experiments are run over different subset counts and detection window lengths to find a mutually exclusive division that both meets hardware design and cost constraints and balances the PM engine workloads as evenly as possible.
Because the pattern subsets stored in the PM engines are mutually exclusive, once a PM engine finds that a fragment distributed to it matches the subset stored in it, the fragment does not need to be checked by any other PM engine. The engine can immediately report a match hit for the fragment and output it for subsequent processing, and the next fragment to be checked is assigned to that PM engine, making full use of memory resources. The subsequent processing, for example comparing the fragment against a known rule set such as a virus signature database according to the match hit, is the same as in ordinary IDS/IPS applications and is not described further here.
If a longer pattern was split into several shorter patterns when the pattern set was divided, a match on a shorter pattern does not necessarily mean a match on the original longer pattern, so subsequent processing can further check whether the data actually matches the longer pattern. In the example above, the long pattern "ABCDEF" was divided into the two shorter patterns "ABCDE" and "BCDEF"; when a data fragment matches the shorter pattern "ABCDE", subsequent processing further checks whether the data at that position match the longer pattern "ABCDEF".
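A minimal sketch of this follow-up check, assuming the position of the hit in the reassembled stream and the piece's offset inside the long pattern are both known (the text does not say how they are tracked). `confirms_long_pattern` is a hypothetical name.

```python
def confirms_long_pattern(stream: str, hit_pos: int, piece_offset: int,
                          long_pattern: str) -> bool:
    """Return True if the long pattern really occurs where one of its pieces hit.

    hit_pos:      index in `stream` where the piece matched
    piece_offset: index inside `long_pattern` where that piece starts
    """
    start = hit_pos - piece_offset  # where the long pattern would have to begin
    if start < 0:
        return False
    return stream.startswith(long_pattern, start)


# "ABCDE" (offset 0 in "ABCDEF") hit at position 2 of each stream:
print(confirms_long_pattern("xxABCDEFyy", 2, 0, "ABCDEF"))  # True
print(confirms_long_pattern("xxABCDEyyy", 2, 0, "ABCDEF"))  # False
# "BCDEF" (offset 1) hit at position 3:
print(confirms_long_pattern("xxABCDEFyy", 3, 1, "ABCDEF"))  # True
```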
Otherwise, if the fragment produces no match hit in this PM engine and some PM engines have not yet checked it, the stream splitting and distribution unit 101 reassigns the fragment to the next PM engine for comparison. This process repeats until either some PM engine finds a match hit or all PM engines have checked the fragment without a match hit.
Optionally, to support this pipelined processing, the stream splitting and distribution unit 101 can attach an indication vector to each fragment; after a pattern matching engine checks a fragment, it updates the fragment's indication vector to record which engines have checked it. For example, when the data stream is divided into shorter fragments, a k-bit indication vector (k being the number of PM engines) can be attached to each fragment, with all bits initially set to "0". After a PM engine checks the fragment, the corresponding bit of its indication vector is set to "1".
If a fragment has been checked by all PM engines (that is, all bits of its indication vector are "1") without any match hit, the fragment is reported as "clean", containing no suspicious content, and is not checked further. Alternatively, fragments can be treated as clean by default, in which case no explicit report is needed when no match hit is found.
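The per-fragment routing described in the last three paragraphs can be sketched with the indication vector held as an integer bitmask. This is an illustration under stated assumptions: engines are modeled as Python sets of patterns matched by substring search, the fragment is assumed to be no longer than the detection window (so that, by mutual exclusivity, at most one engine can hit), and engines are tried in round-robin order from wherever the fragment was first dispatched. `check_fragment` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple


def check_fragment(fragment: str, engines: List[Set[str]],
                   first_engine: int = 0) -> Tuple[Optional[int], int]:
    """Route one fragment through the engines until a hit or until all have checked it.

    Returns (index of the engine that hit, or None if clean; final indication vector).
    """
    k = len(engines)
    all_checked = (1 << k) - 1
    vector = 0  # k-bit indication vector: bit e == 1 means engine e has checked it
    e = first_engine
    while vector != all_checked:
        vector |= 1 << e
        if any(p in fragment for p in engines[e]):
            # Subsets are mutually exclusive, so no other engine needs to look.
            return e, vector
        e = (e + 1) % k  # miss: hand the fragment to the next engine
    return None, vector  # every bit set and no hit: the fragment is clean


engines = [{"ABCD", "DEFG"}, {"XYYZ", "XYZZ"}, {"1234", "4321"}]
print(check_fragment("ABCDE12", engines, first_engine=1))  # (0, 7)
print(check_fragment("QQQQQQQ", engines))                  # (None, 7)
```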
Those skilled in the art will understand that the description given with reference to the device in Fig. 1 corresponds to the following flow: dividing the input data stream into a plurality of fragments and distributing them among a plurality of pattern matching engines, each of which stores one of a plurality of pattern subsets that are mutually exclusive under a given detection window length; having the pattern matching engines compare the distributed fragments against their stored subsets; and, on a match hit, outputting the fragment for subsequent processing.
Since a pattern set can always be divided into mutually exclusive subsets under a given window, many different division schemes exist. Fig. 2 shows an exemplary algorithm for dividing a pattern set according to the mutual exclusion principle, which aims to achieve the division with reasonable computational cost. Its complexity is O(N²), where N is the size of the pattern set; that is, the computation grows quadratically with the size of the pattern set. The main idea of the algorithm is to first divide the pattern set into as many mutually exclusive subsets as possible, without regard to their balance, and then merge smaller subsets to balance the subset sizes. For example, for a pattern set of N patterns P1, P2, ..., PN (assuming every pattern is no longer than the detection window length w; otherwise, longer patterns are first split into shorter ones fitting within w, as described above), N empty pattern subsets SS1, SS2, ..., SSN, one per pattern, can be set up initially, and the patterns are then compared and merged one by one following the flow shown in Fig. 2.
As shown in the flowchart of Fig. 2, in step S201 a pattern not yet assigned to any existing subset is taken as the current pattern, say Pj. In step S202, the MCL between the current pattern Pj and each existing subset SS1, SS2, ..., SSN is computed for the given detection window length w; here the MCL between a pattern and a subset means the smallest MCL between that pattern and any pattern in the subset. In step S203, it is determined whether the MCL between Pj and every subset is greater than w. If so, the MCL between Pj and every pattern already assigned to a subset exceeds w, so Pj is mutually exclusive with all of those patterns, and the flow proceeds to step S204, where Pj by itself becomes a new mutually exclusive subset SSj. Otherwise, if step S203 finds that the MCL between Pj and at least one subset (for example, SSm and SSn) does not exceed w, the flow proceeds to step S205, where Pj and all subsets whose MCL with Pj does not exceed w are combined into one new subset. Specifically, if several such subsets SSm, SSn exist, they are merged into a new subset SSj and Pj is placed in it; if only one existing subset SSm qualifies, Pj can simply be added to SSm. In step S206, it is determined whether Pj is the last pattern. If not, the flow returns to step S201, takes the next unassigned pattern as the current pattern, and repeats the above steps.
If step S206 finds that the current pattern is the last one, all patterns have been assigned and several mutually exclusive subsets have been obtained. Their sizes, however, may be very unbalanced. In step S207, therefore, a known greedy algorithm (for example, one used for the "knapsack" problem) merges some of the smaller subsets so that the subset sizes become more balanced. For example, if subsets SS1, SS2, and SS3 are mutually exclusive, the subset obtained by merging SS2 and SS3 is certainly still mutually exclusive with SS1, so the correctness of the division is preserved. The flow in Fig. 2 is only an example; other algorithms can also be used to divide a pattern set into mutually exclusive subsets.
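The Fig. 2 flow can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: patterns are plain strings no longer than w, "not mutually exclusive" is taken as MCL ≤ w (matching the definition given earlier), and for step S207 it substitutes a simple largest-first greedy rule (each subset, biggest first, goes into the currently smallest of k bins) in place of the unspecified knapsack-style algorithm. All function names are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def mcl(p: str, q: str) -> int:
    """Minimum combined length of two plain (wildcard-free) patterns."""
    if p in q:
        return len(q)
    if q in p:
        return len(p)
    return len(p) + len(q) - max(overlap(p, q), overlap(q, p))


def exclusive_partition(patterns, w, k):
    """Sketch of Fig. 2: build mutually exclusive subsets, then balance them into k bins."""
    subsets = []  # each entry is a list of patterns
    for p in patterns:  # S201: take the next unassigned pattern
        # S202/S203: indices of subsets NOT exclusive with p (subset MCL <= w)
        clash = [i for i, s in enumerate(subsets)
                 if min(mcl(p, q) for q in s) <= w]
        if not clash:
            subsets.append([p])  # S204: p starts a new exclusive subset
        else:
            # S205: merge p with every clashing subset (one subset: p just joins it)
            merged = [p] + [q for i in clash for q in subsets[i]]
            subsets = [s for i, s in enumerate(subsets) if i not in clash] + [merged]
    # S207 (substituted greedy rule): largest subset first, into the smallest bin.
    # Merging exclusive subsets keeps them exclusive from all the others.
    bins = [[] for _ in range(k)]
    for s in sorted(subsets, key=len, reverse=True):
        min(bins, key=len).extend(s)
    return bins


patterns = ["ABCD", "DEFG", "XYYZ", "XYZZ", "1234", "4321"]
print(exclusive_partition(patterns, w=7, k=3))
# [['DEFG', 'ABCD'], ['4321', '1234'], ['XYYZ', 'XYZZ']]
```

On the Fig. 3 pattern set this recovers the three subsets given in the text. Note that XYYZ and XYZZ are themselves exclusive at w = 7 (MCL 8), so they only end up together in the balancing step. Each new pattern is compared with every pattern already placed, which gives the O(N²) behaviour stated above. If the first phase yields fewer than k subsets, some bins stay empty.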
Fig. 3 shows a concrete example of pattern matching on a data stream according to the method of the present invention. Suppose the pattern set to be searched is {ABCD, DEFG, XYYZ, XYZZ, 1234, 4321}. When the data fragments to be checked are at most 7 bytes long, the three subsets {ABCD, DEFG}, {XYYZ, XYZZ}, and {1234, 4321} are clearly mutually exclusive. As Fig. 3 shows, a single pattern matching engine that would have stored all 6 patterns is replaced by three smaller pattern matching engines (TCAMs) PM1, PM2, and PM3, each storing 2 patterns, and at any time these three PM engines process three data fragments simultaneously.
As Fig. 3 also shows, suppose the input data stream to be checked is the long string "ABCDEFGABCDE1212C1234ABCDE13ABCDE33...123456...". Here "GA", "21", "4A", "3A", and so on can be regarded as "negative patterns" of this target pattern set: adding any prefix and/or suffix to them cannot form a target pattern. Using these negative patterns as cut points, the input data stream is divided into fragments such as S1 = "ABCDEFG", S2 = "ABCDE12", S3 = "12C1234", S4 = "ABCDE13", S5 = "ABCDE33", ..., Sn = "123456".
As shown in Fig. 3, in the first stage, fragments S1, S2, and S3 are distributed to pattern matching engines PM1, PM2, and PM3 respectively. Matching finds that S1 and S3 hit the subsets stored in their engines, so S1 and S3 have completed their matching tasks in the system and need no further comparison, while S2 must still be checked by the other pattern matching engines. In the second stage, S2 is assigned to the first pattern matching engine PM1, and the next two fragments S4 and S5 are distributed to the second and third engines PM2 and PM3 respectively. The process continues in this way over the whole data stream.
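The two stages just described can be reproduced with a small stage-by-stage simulation. This is a sketch under assumptions the patent does not spell out: in each stage every engine takes at most one fragment, preferring a fragment that missed elsewhere and has not yet visited that engine, and otherwise taking the next new fragment; matching is modeled as substring search. `Job` and `simulate_pipeline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison: two fragments may carry identical bytes
class Job:
    fragment: str
    vector: int = 0  # k-bit indication vector; bit e set => engine e has checked it


def simulate_pipeline(fragments, engines):
    """Return one list per stage of (engine index, fragment, hit) tuples."""
    k = len(engines)
    full = (1 << k) - 1
    queue = list(fragments)  # fragments not yet dispatched
    pending = []             # dispatched fragments that are still unresolved
    stages = []
    while queue or pending:
        assigned = []  # (engine, job) pairs chosen for this stage
        for e in range(k):
            # Prefer a fragment that missed elsewhere and has not visited engine e.
            job = next((j for j in pending
                        if not ((j.vector >> e) & 1)
                        and all(j is not a for _, a in assigned)), None)
            if job is None and queue:
                job = Job(queue.pop(0))  # otherwise take the next new fragment
                pending.append(job)
            if job is not None:
                assigned.append((e, job))
        stage = []
        for e, job in assigned:
            hit = any(p in job.fragment for p in engines[e])
            job.vector |= 1 << e
            stage.append((e, job.fragment, hit))
            if hit or job.vector == full:  # matched, or clean after all engines
                pending.remove(job)
        stages.append(stage)
    return stages


engines = [{"ABCD", "DEFG"}, {"XYYZ", "XYZZ"}, {"1234", "4321"}]  # PM1, PM2, PM3
fragments = ["ABCDEFG", "ABCDE12", "12C1234", "ABCDE13", "ABCDE33"]  # S1..S5
for n, stage in enumerate(simulate_pipeline(fragments, engines), start=1):
    print(n, stage)
# Stage 1: S1 hits PM1, S2 misses PM2, S3 hits PM3.
# Stage 2: S2 moves to PM1 (hit); S4 and S5 go to PM2 and PM3.
```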
The performance of the parallel pattern matching scheme of the present invention is analyzed below.
Suppose the probability of a match hit is x%, and y TCAM chips are used as pattern matching engines, each handling one mutually exclusive subset.
For a fragment that may produce a match hit (a suspicious fragment), the expected number of TCAMs it must traverse is y/2; for a fragment with no possible match hit, it is y, since the fragment must be checked once by every TCAM chip. The fraction of TCAM lookups saved is therefore:
$$1 - \frac{(y/2)\cdot x\% + y\cdot(1 - x\%)}{y} = \frac{x\%}{2}$$
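As a quick numeric check of the formula, the sketch below computes the expected number of TCAM lookups per fragment under the same assumptions as the text (y/2 engines on average for a hitting fragment, all y for a non-hitting one). If the hitting engine's position in the visiting order is uniformly distributed, the exact mean is (y + 1)/2, so y/2 is a slight simplification. `lookup_saving` is a hypothetical name, and the hit rate below is simply the ratio of the two figures quoted in the experiment that follows.

```python
def lookup_saving(x: float, y: int) -> float:
    """Fraction of TCAM lookups saved versus checking every fragment in all y engines.

    x: match hit probability as a fraction (0.26 for 26%)
    y: number of TCAM engines
    """
    expected_lookups = x * (y / 2) + (1 - x) * y  # per fragment, as in the text
    return 1 - expected_lookups / y               # simplifies to x / 2


hit_rate = 56_072_428 / 219_148_064  # hits over payload bytes, as quoted in the text
print(round(hit_rate, 3))                    # 0.256
print(round(lookup_saving(hit_rate, 4), 3))  # 0.128
print(lookup_saving(1.0, 4))                 # 0.5
```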
In the ideal case where no match hit occurs, that is, x% = 0, the efficiency is the same as that of existing methods. IDS applications, however, must often consider the worst case, in which suspicious signatures occur frequently. In such a case an attacker may launch a DDoS attack in which the suspicious signatures come from a large number of "zombie" machines, and the match hit rate may approach 100%; many practical deployments face this situation.
In experiments, the applicant ran simulation tests using the DDoS attack packet traces provided by MIT[1] and the SNORT intrusion detection pattern set. In 219,148,064 bytes of packets carrying attack signatures (329,322,084 bytes including packet headers), 56,072,428 pattern match hits occurred, a match hit probability of about 26%; in this case the TCAM lookups saved reached 13%.
Furthermore, in "network workload/application classification/identification" scenarios, the match hit rate can be much higher than in IDS/IPS, because the content of every packet should ultimately match one of the patterns in the set, so x% approaches 100%. In this case the scheme of the present invention is even more advantageous, saving about 50% of TCAM lookups.
Should be understood that, the embodiment by Fig. 1 to 3 detailed description is only as example above, and is not limitation of the present invention.To the division of the cutting apart of input traffic, mode subset and load balancing and pipeline processes, also can adopt implementation, only otherwise deviating from the present invention adopts mutual exclusion principle to carry out the thought of parallel processing.Pattern matching engine is also not limited to TCAM, also can consider to adopt other to have the processor of PARALLEL MATCHING operational capability.
Embodiments of the present invention may be implemented in hardware, software, firmware, or a combination thereof. Those skilled in the art will recognize that the invention may also be embodied in a computer program product provided on a signal-bearing medium for use with any suitable data processing system. Such a signal-bearing medium may be a transmission medium or a recordable medium for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks in hard disk drives or floppy disks, optical discs for optical drives, magnetic tape, and other media that will occur to those skilled in the art. Those skilled in the art will also recognize that any communication terminal with suitable programming means can perform the steps of the method of the invention as embodied in a program product.
It should be noted that, to make the present invention easier to understand, the above description omits some more specific technical details that are well known to those skilled in the art and may be essential for implementing the invention.
The specification of the present invention is provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art.
The embodiments were therefore chosen and described to better explain the principles of the invention and its practical application. They are also intended to enable those of ordinary skill in the art to understand that all modifications and variations made without departing from the essence of the invention fall within the scope of protection of the invention as defined by the claims.

Claims (16)

1. A pattern matching method for a data stream, comprising the following steps:
dividing the input data stream into a plurality of fragments and distributing the fragments among a plurality of pattern matching engines, wherein each pattern matching engine stores one of a plurality of pattern subsets that are mutually exclusive under a given detection window length, and the length of each fragment obtained by dividing the data stream is not less than the detection window length; wherein mutual exclusion between patterns means that the same fragment cannot match both patterns simultaneously, and mutually exclusive pattern subsets means that every pattern in one pattern subset is mutually exclusive with every pattern in another pattern subset;
any one of the plurality of pattern matching engines performing pattern matching comparison on the fragment distributed to it according to the pattern subset stored therein;
in the case of a match hit between the fragment and the pattern subset stored in that pattern matching engine, reporting that a match hit has occurred for the fragment and outputting the fragment for subsequent processing, the fragment no longer being checked by other pattern matching engines;
in response to a pattern matching engine finding that the fragment distributed to it does not match the pattern subset stored in that engine, passing the fragment to another pattern matching engine to continue pattern matching comparison, wherein the fragment has not yet undergone pattern matching comparison in said other pattern matching engine; and
in response to a fragment having no match hit after pattern matching comparison in all pattern matching engines, reporting that no match hit has occurred for the fragment and ending the inspection of the fragment.
2. the method for claim 1 is wherein enclosed an indication vector in each fragment, is used to refer to this fragment and which pattern matching engine is carried out pattern matching by and contrast.
3. The method of claim 1, wherein, when the input data stream is divided into a plurality of fragments, the patterns to be compared are made not to span a division point between two fragments.
4. The method of claim 3, wherein division points in the data stream are determined according to anti-patterns of the patterns to be compared, and the data stream is divided at those division points, wherein an anti-pattern is a string that cannot form any of the patterns to be compared when any suffix and/or prefix is added to it.
5. The method of any one of claims 1 to 4, further comprising:
preprocessing a pattern set comprising a plurality of patterns so as to divide each pattern whose length exceeds the detection window length into patterns whose lengths do not exceed the detection window length;
dividing the preprocessed pattern set into a plurality of pattern subsets that are mutually exclusive under the given detection window length.
6. The method of claim 5, wherein the step of dividing the preprocessed pattern set into the plurality of pattern subsets that are mutually exclusive under the given detection window length comprises:
dividing the pattern set into as many mutually exclusive pattern subsets as possible;
merging smaller mutually exclusive pattern subsets so as to balance the sizes of the pattern subsets.
7. The method of claim 6, wherein the step of dividing the pattern set into as many mutually exclusive pattern subsets as possible comprises:
selecting a pattern from among the patterns not yet assigned to any already-divided pattern subset;
calculating the minimum merge length between this pattern and each already-divided pattern subset, wherein the minimum merge length of two patterns equals the minimum length of a pattern that contains both patterns as substrings, and the minimum merge length between a pattern and a pattern subset is the minimum of the minimum merge lengths between the pattern and each pattern in that subset;
judging whether all of the calculated minimum merge lengths are greater than the given detection window length; if so, taking this pattern as a new pattern subset; otherwise, merging this pattern together with all pattern subsets whose minimum merge length with this pattern is less than the detection window length into a new pattern subset;
repeating the above steps until all patterns in the pattern set have been assigned to pattern subsets, thereby obtaining several mutually exclusive pattern subsets.
8. The method of claim 1, further comprising: changing at least one of the detection window length and the expected number of mutually exclusive pattern subsets, and dividing the pattern set comprising a plurality of patterns multiple times, so as to balance the sizes of the resulting mutually exclusive pattern subsets.
9. A pattern matching device for a data stream, comprising:
a plurality of pattern matching engines, each pattern matching engine storing one of a plurality of pattern subsets that are mutually exclusive under a given detection window length;
a stream division and distribution unit, configured to divide the input data stream into a plurality of fragments and to distribute the fragments among the pattern matching engines, wherein the length of each fragment obtained by dividing the data stream is not less than the detection window length, wherein
any one of the plurality of pattern matching engines performs pattern matching comparison on the fragment distributed to it according to the pattern subset stored therein, and outputs the fragment in the case of a match hit between the fragment and the pattern subset stored in that engine;
wherein, when a pattern matching engine finds that the fragment distributed to it does not match the pattern subset stored in that engine, the stream division and distribution unit passes the fragment to another pattern matching engine to continue pattern matching comparison, wherein the fragment has not yet undergone pattern matching comparison in said other pattern matching engine; and
when the stream division and distribution unit finds that no such other pattern matching engine exists, it reports that no match hit has occurred for the fragment and ends the inspection of the fragment;
wherein mutual exclusion between patterns means that the same fragment cannot match both patterns simultaneously, and mutually exclusive pattern subsets means that every pattern in one pattern subset is mutually exclusive with every pattern in another pattern subset.
10. The device of claim 9, wherein the stream division and distribution unit attaches an indication vector to each fragment, and a pattern matching engine modifies the indication vector of a fragment after checking it, so as to indicate which pattern matching engines have performed pattern matching comparison on the fragment.
11. The device of claim 9, wherein, when dividing the input data stream into a plurality of fragments, the stream division and distribution unit ensures that the patterns to be compared do not span a division point between two fragments.
12. The device of claim 11, wherein the stream division and distribution unit determines division points in the data stream according to anti-patterns of the patterns to be compared and divides the data stream at those division points, wherein an anti-pattern is a string that cannot form any of the patterns to be compared when any suffix and/or prefix is added to it.
13. The device of any one of claims 9 to 12, further comprising a mutually exclusive subset division unit, configured to preprocess a pattern set comprising a plurality of patterns so as to divide each pattern whose length exceeds the detection window length into patterns whose lengths do not exceed the detection window length, and to divide the preprocessed pattern set into a plurality of pattern subsets that are mutually exclusive under the detection window length.
14. The device of claim 13, wherein the mutually exclusive subset division unit first divides the preprocessed pattern set into as many mutually exclusive pattern subsets as possible, and then merges smaller pattern subsets so as to balance the sizes of the pattern subsets.
15. The device of claim 14, wherein the mutually exclusive subset division unit performs the following operations when dividing the pattern set into as many mutually exclusive pattern subsets as possible:
selecting a pattern from among the patterns in the pattern set not yet assigned to any already-divided pattern subset; calculating the minimum merge length between this pattern and each already-divided pattern subset, wherein the minimum merge length of two patterns equals the minimum length of a pattern that contains both patterns as substrings, and the minimum merge length between a pattern and a pattern subset is the minimum of the minimum merge lengths between the pattern and each pattern in that subset;
judging whether all of the calculated minimum merge lengths are greater than the given detection window length; if so, taking this pattern as a new pattern subset; otherwise, merging this pattern together with all pattern subsets whose minimum merge length with this pattern is less than the detection window length into a new pattern subset;
repeating the above operations until all patterns in the pattern set have been assigned to pattern subsets, thereby obtaining several mutually exclusive pattern subsets.
16. The device of claim 13, wherein the mutually exclusive subset division unit is further configured to change at least one of the detection window length and the expected number of mutually exclusive pattern subsets, and to divide the pattern set comprising a plurality of patterns multiple times, so as to balance the sizes of the resulting mutually exclusive pattern subsets.
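The subset division procedure of claims 6, 7, 14 and 15 can be sketched in Python. This is a minimal illustration under stated assumptions and is not the patented implementation:

- Patterns are plain strings that have already been preprocessed so that none is longer than the window.
- Two patterns are treated as mutually exclusive when their minimum merge length exceeds the window. A merge length exactly equal to the window therefore counts as non-exclusive, which closes the gap between "greater than" and "less than" in the claim wording.
- The balancing step uses a hypothetical "merge the two smallest subsets" rule.

All function names are hypothetical.

```python
from typing import List


def min_merge_length(a: str, b: str) -> int:
    """Length of the shortest string containing both a and b as substrings."""
    if b in a:
        return len(a)
    if a in b:
        return len(b)

    def overlap(x: str, y: str) -> int:
        # Longest suffix of x that is also a prefix of y.
        for k in range(min(len(x), len(y)), 0, -1):
            if x.endswith(y[:k]):
                return k
        return 0

    return len(a) + len(b) - max(overlap(a, b), overlap(b, a))


def partition_mutually_exclusive(patterns: List[str], window: int) -> List[List[str]]:
    """Greedily divide patterns into as many mutually exclusive subsets as possible."""
    subsets: List[List[str]] = []
    for p in patterns:
        # Subsets that p is NOT exclusive with: min merge length <= window.
        close_idx = [
            i for i, s in enumerate(subsets)
            if min(min_merge_length(p, q) for q in s) <= window
        ]
        if not close_idx:
            subsets.append([p])  # exclusive with every existing subset
        else:
            # Merge p with every subset it conflicts with into one new subset.
            merged = [p] + [q for i in close_idx for q in subsets[i]]
            subsets = [s for i, s in enumerate(subsets) if i not in close_idx]
            subsets.append(merged)
    return subsets


def balance(subsets: List[List[str]], target: int) -> List[List[str]]:
    """Merge the two smallest subsets until at most `target` remain (assumed rule)."""
    groups = sorted((list(s) for s in subsets), key=len)
    while len(groups) > max(target, 1):
        a = groups.pop(0)
        b = groups.pop(0)
        groups.append(a + b)
        groups.sort(key=len)
    return groups


print(partition_mutually_exclusive(["abc", "bcd", "xyz"], window=4))
```

With a window of 4, "abc" and "bcd" merge into "abcd" (length 4), so they must share a subset, while "xyz" needs at least 6 bytes alongside either one and forms its own subset. The balancing rule only ever merges whole subsets, so the remaining subsets stay pairwise mutually exclusive.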
CN200910132546.1A 2009-03-31 2009-03-31 Pattern matching method and device for data streams Expired - Fee Related CN101854341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910132546.1A CN101854341B (en) 2009-03-31 2009-03-31 Pattern matching method and device for data streams

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910132546.1A CN101854341B (en) 2009-03-31 2009-03-31 Pattern matching method and device for data streams

Publications (2)

Publication Number Publication Date
CN101854341A CN101854341A (en) 2010-10-06
CN101854341B CN101854341B (en) 2014-03-12

Family

ID=42805613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910132546.1A Expired - Fee Related CN101854341B (en) 2009-03-31 2009-03-31 Pattern matching method and device for data streams

Country Status (1)

Country Link
CN (1) CN101854341B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016095103A1 (en) * 2014-12-16 2016-06-23 华为技术有限公司 Storage space management method and device
CN106549969B (en) * 2016-11-21 2019-10-22 英赛克科技(北京)有限公司 Data filtering method and device
CN108259426B (en) * 2016-12-29 2020-04-28 华为技术有限公司 DDoS attack detection method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
US6631466B1 (en) * 1998-12-31 2003-10-07 Pmc-Sierra Parallel string pattern searches in respective ones of array of nanocomputers
CN1748205A (en) * 2003-02-04 2006-03-15 尖端技术公司 Method and apparatus for data packet pattern matching
CN101296114A (en) * 2007-04-29 2008-10-29 国际商业机器公司 Parallel pattern matching method and system based on stream

Non-Patent Citations (4)

Title
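A new multi-pattern matching algorithm for intrusion detection; Li Geng et al.; Application Research of Computers; 2008-08-31; Vol. 25, No. 8; 2474-2476 *
Research on an improved string matching algorithm in intrusion detection technology; Chen Ying; China Master's Theses Full-text Database, Information Science and Technology; 2008-02-15 (No. 2); 1674-0246 *
Li Geng et al. A new multi-pattern matching algorithm for intrusion detection. Application Research of Computers. 2008, Vol. 25, No. 8, 2474-2476.
Chen Ying. Research on an improved string matching algorithm in intrusion detection technology. China Master's Theses Full-text Database, Information Science and Technology. 2008, No. 2, 1674-0246.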

Also Published As

Publication number Publication date
CN101854341A (en) 2010-10-06

Similar Documents

Publication Publication Date Title
EP3435623B1 (en) Malware detection using local computational models
US9990583B2 (en) Match engine for detection of multi-pattern rules
CN109117634B (en) Malicious software detection method and system based on network traffic multi-view fusion
CN101442540B (en) High speed mode matching algorithm based on field programmable gate array
Yuan Phd forum: Deep learning-based real-time malware detection with multi-stage analysis
US9160639B2 (en) Network flow abnormality detection system and a method of the same
Zheng et al. Algorithms to speedup pattern matching for network intrusion detection systems
CN104978521A (en) Method and system for realizing malicious code marking
Nguyen et al. Toward a deep learning approach for detecting php webshell
Dener et al. Stlgbm-dds: An efficient data balanced dos detection system for wireless sensor networks on big data environment
CN101854341B (en) Pattern matching method and device for data streams
More et al. Trust-based voting method for efficient malware detection
US11308393B2 (en) Cyber anomaly detection using an artificial neural network
US11223641B2 (en) Apparatus and method for reconfiguring signature
Aggarwal et al. Static malware analysis using pe header files api
CN1691581A (en) Multi-pattern matching algorithm based on characteristic value and hardware implementation
Ali et al. Scalable malware clustering using multi-stage tree parallelization
CN106936561A (en) A kind of side-channel attack protective capacities appraisal procedure and system
CN114398887A (en) Text classification method and device and electronic equipment
CN101848091B (en) Method and system for processing data search
Soewito et al. Methodology for evaluating dna pattern searching algorithms on multiprocessor
Gaurav et al. A DDoS Attack Detection System for Industry 5.0 using Digital Twins and Machine Learning
CN112995222B (en) Network detection method, device, equipment and medium
Komil et al. Development method of code detection system on based racewalk algorithm on platform FPGA
KR102053781B1 (en) Apparatus and method for extracting signiture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312

Termination date: 20210331