WO2012048744A1 - Application identification through data traffic analysis - Google Patents


Info

Publication number
WO2012048744A1
Authority
WO
WIPO (PCT)
Prior art keywords
collection
per
traffic
scores
application
Application number
PCT/EP2010/065413
Other languages
French (fr)
Inventor
Geza Szabo
Zoltán Richárd TURÁNYI
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to US13/876,288 (US20130194930A1)
Priority to PCT/EP2010/065413 (WO2012048744A1)
Publication of WO2012048744A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/0213 Standardised network management protocols, e.g. simple network management protocol [SNMP]
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/18 Protocol analysers

Definitions

  • The present invention relates to a method and apparatus for processing traffic in a packet switched telecommunications network.
  • ISP: Internet Service Providers
  • DPI: Deep Packet Inspection
  • ISPs may then apply different charging policies and traffic shaping, and offer different QoS guarantees, to selected users or applications.
  • Many critical network services may rely on the inspection of packet payload content, instead of only looking at the structured information found in packet headers. It is clear that forwarding or analyzing packets based on content requires new techniques in network devices.
  • NIDS/NIPS: Network Intrusion Detection and Prevention Systems
  • Layer 7 network devices: switches, firewalls, etc.
  • Content-based traffic management. Such systems frequently perform a set of time-critical operations to verify certain network patterns or behavior while trying to minimize packet processing latency.
  • The present applicant has appreciated the desirability of providing an improved method for processing and analysing traffic in a packet switched telecommunications network.
  • There is provided a method of or for use in processing, analysing or profiling traffic in a packet switched telecommunications network. During a first phase, for each of a plurality of applications, traffic generated by the application is analysed to identify a collection of one or more characteristic bit sequences for the application, or at least such a plurality of collections is provided.
  • During a second phase, traffic is received from the network, and the following steps are performed for each of at least one of the plurality of collections: (i) for each of at least one of the characteristic bit sequences in the collection, a sequence alignment process is performed on the received traffic against the characteristic bit sequence to derive a per-sequence score; and (ii) a per-collection score is assigned to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
  • The method may comprise managing traffic in the network based on the per-collection scores, or at least arranging for or causing such managing.
  • Managing traffic may comprise at least one of: determining or applying a charging policy in the network, traffic shaping in the network, and determining or applying a QoS guarantee in the network.
  • The method may comprise analysing or profiling the received traffic based on the per-collection scores, or at least arranging for or causing such analysing or profiling.
  • The method may comprise identifying the application that generated the received traffic based on the per-collection scores.
  • The application that generated the received traffic may be identified as being the application associated with the collection having a per-collection score that is indicative of the highest likelihood.
  • At least one of the applications may represent a group or class of applications, for example applications of the same or similar type.
  • The received traffic may comprise a plurality of packets.
  • Accumulated per-collection scores may be maintained for the respective collections, such that at least one step that is performed based on per-collection scores is performed at least partly based on the accumulated per-collection scores.
  • The accumulated per-collection scores may be normalised.
  • The per-collection score for a collection may be derived from at least one of the mean, mode and median of the per-sequence scores for the collection.
  • An apparatus is provided for processing, analysing or profiling traffic in a packet switched telecommunications network.
  • An element is provided for, in relation to each of a plurality of applications: analysing traffic generated by the application to identify a collection of one or more characteristic bit sequences for the application, or at least providing such a plurality of collections.
  • An element is provided for receiving traffic from the network.
  • An element is provided for, in relation to each of at least one of the plurality of collections, performing the following steps: for each of at least one of the characteristic bit sequences in the collection, performing a sequence alignment process on the received traffic against the characteristic bit sequence to derive a per-sequence score; and assigning a per-collection score to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
  • The program may be carried on a carrier medium.
  • The carrier medium may be a storage medium.
  • The carrier medium may be a transmission medium.
  • There is also provided an apparatus programmed by such a program, and a storage medium containing such a program.
  • An embodiment of the present invention offers a technical advantage of addressing the issue mentioned above relating to the prior art.
  • Technical advantages are set out in more detail below.
  • Figure 1 schematically illustrates apparatus according to an embodiment of the present invention
  • Figure 2 is a schematic flowchart illustrating a method according to an embodiment of the present invention
  • Figure 3 is a plot illustrating the total sum of alignment scores per application versus the application motifs
  • Figure 4 is a plot illustrating the sum of {sum score/number of flows} values per motif cluster
  • FIG. 5 shows several possible network nodes in which an embodiment of the present invention could be implemented.
  • Figure 6 schematically illustrates parts of the apparatus of Figure 1 in more detail.
  • Detailed description. As mentioned above, it is desirable to provide an improved method for processing and analysing traffic in a packet switched telecommunications network.
  • Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another.
  • In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages.
  • Sequence alignment is described, for example, in the book "Sequence Alignment: methods, models, concepts, and strategies" by Michael S. Rosenberg.
  • Motif finding algorithms can be used to create regular expressions ["Randomized algorithms and motif finding," http://bix.ucsd.edu/bioalgorithms/presentations/Ch12_RandAlgs.pdf].
  • Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs.
  • DNA: deoxyribonucleic acid
  • Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade.
  • An embodiment of the present invention uses an approximate string matching (ASM) technique based on a sequence alignment procedure for Deep Packet Inspection.
  • ASM defines scores that characterise how well a signature fits an input candidate.
  • Apparatus according to an embodiment of the present invention is shown illustratively in Figure 1, comprising three main units: unit A, unit B and unit C.
  • A schematic flowchart is provided in Figure 2 to illustrate the method performed by the apparatus of Figure 1.
  • The method of Figure 2 is shown divided into three phases: phase 1, phase 2 and phase 3. These three phases are performed by units A, B and C respectively.
  • Phase 1 is a characteristic bit sequence (or motif) finding phase.
  • Phase 2 is a sequence alignment phase.
  • Phase 3 is a phase in which various steps may be performed based on the results of phase 2.
  • Phase 1 is for finding characteristic bit sequences for a plurality of different applications.
  • A characteristic bit sequence for an application can be considered to be a sequence of bits that occurs regularly or consistently in traffic generated by that application (a recurring bit sequence), and/or which can be said to characterise the traffic generated by that application. Characteristic bit sequences are often referred to in the literature as motifs or signatures.
  • Unit A has a source A1 of traffic generated by a plurality of different applications, with the application that generated the traffic being known.
  • The source A1 may be a store (for example a temporary store) of traffic collected from network N, or may be a direct feed or input from the network N.
  • Traffic may comprise a single packet, though more usually it would comprise a plurality of packets.
  • A single application may generate a lot of traffic; the first several packets should be inspected, since they contain the characteristic bit sequences, whereas further packets usually carry data only, which is not generally characteristic of the application.
  • Steps S1 to S4 of Figure 2 are performed by processor A2 of unit A.
  • In step S1, one of (or the next one of) the applications is selected for processing, and in step S2 the traffic associated with that application is received or retrieved or filtered out from source A1 and analysed to identify a collection of one or more characteristic bit sequences (or motifs or signatures) for the application.
  • In step S3, the collection of characteristic bit sequences for the application is stored in storage A3 of unit A.
  • In step S4, it is determined whether there are further applications of the plurality to process; if so, processing passes back to step S1, and if not, processing continues to step S5.
  • There are several well-known motif finding tools which can be used in step S2.
  • For example, a tool such as that described in [PLoS Comput Biol 4(5): e1000071. doi:10.1371/journal.pcbi.1000071] can be used to process the network traffic and create application-specific characteristic bit sequences accordingly.
  • The process can yield several candidate characteristic bit sequences, which are expected to be characteristic of different types of application traffic. For example, several characteristic bit sequences can be found for the signaling and data transfer flows of the same peer-to-peer (P2P) application.
  • P2P: Peer-to-peer
  • Traffic for three applications, App 1, App 2 and App 3, is illustrated in traffic source A1.
  • Three collections, Collection 1, Collection 2 and Collection 3, of application-specific characteristic bit sequences, corresponding respectively to App 1, App 2 and App 3, have been found and placed in storage A3 (or sent directly to unit B).
  • Unit B has a source B1 of network traffic.
  • The source B1 may be a store (for example a temporary store) of traffic collected from network N, for processing offline, or may be a direct feed or input from the network N, for processing online or in real time. Receipt of the traffic at unit B is represented by step S5 of Figure 2. Steps S6 to S11 are performed by a sequence alignment processor B2 of unit B. In step S6, one (or the next) of the plurality of collections in the store A3 is selected for processing. Within the selected collection, one (or the next) of the characteristic bit sequences in the collection is selected in step S7 for processing.
  • In step S8, a sequence alignment process is performed on the received traffic against the selected characteristic bit sequence to derive a per-sequence score.
  • In step S9, it is determined whether there are any further characteristic bit sequences in the current collection to process. If yes, processing returns to step S7; if not, processing continues to step S10.
  • In step S10, a per-collection score is assigned to the current collection based on the per-sequence scores for the collection.
  • The per-collection score can be considered to be indicative of a likelihood that the traffic received in step S5 was generated by the application associated with the collection.
  • The per-collection score for the collection can be derived from the mean, mode or median of the per-sequence scores for the collection.
  • In step S11, it is determined whether there are any further collections of characteristic bit sequences from the store A3 to process. If yes, processing returns to step S6; if not, processing continues to step S12.
  • Step S12 is performed by the per-collection scores processor C1 of unit C. It can take various forms, the common factor being that step S12 represents a process that uses the per-collection scores from step S10.
  • Step S12 may comprise identifying the application that generated the traffic received in step S5 based on the per-collection scores. The application that generated the received traffic may be identified as being the application associated with the collection having a per-collection score that is indicative of the highest likelihood.
  • If the scoring scheme is such that a higher per-collection score is indicative of a higher likelihood, this amounts to selecting the application associated with the collection having the highest per-collection score.
  • For example, suppose traffic from an unknown application App X is received, and the per-collection scores derived for Collections 1, 2 and 3 are A, B and C respectively. If per-collection score C is the greatest, then App X can be identified as (most likely being) App 3, which is the application associated with Collection 3.
  • Step S12 may comprise analysing or profiling the received traffic based on the per-collection scores, or at least arranging for or causing such analysing or profiling.
  • Step S12 may comprise managing traffic in the network N based on the per-collection scores, or at least arranging for or causing such managing.
  • Managing traffic may comprise determining or applying a charging policy in the network. It may comprise traffic shaping in the network. It may comprise determining or applying a QoS guarantee in the network. This is particularly applicable where steps S5 to S11 are repeated multiple times, to gather information relating to a significant amount of network traffic. Repetition of these steps allows accumulated per-collection scores to be determined, so that further analysis or processing can be based on the accumulated per-collection scores.
  • The per-collection scores are accumulated by summing the respective per-collection scores from different passes through steps S5 to S11.
  • The accumulated per-collection scores can be analysed or reviewed to get a sense of which applications are generating most traffic over the network, which in turn may be used to manage traffic in the network as mentioned above.
  • These accumulated per-collection scores may be normalised, for example based on the number of traffic flows being processed. In this respect, in a TCP/IP context a "flow" can be considered to be a TCP/IP connection between two end points, identified e.g. by source/destination ports and IP addresses.
  • The traffic for a particular flow can be processed using a method as described above, with the information being used directly to determine which application most likely generated that traffic.
  • Alternatively, unknown flows can be considered per host and per port (i.e. flows from the same generating host and source port to several destination IPs and ports); this is regular behaviour for services.
  • For example, consider a web server that clients access on TCP port 80 from many different IPs and from any possible source port. From the web server's point of view, these flows can be considered to belong to the 'same' application, as they access the same service. If the web server also hosts an SMTP mail server, then flows going to port 25 behave similarly and can also be considered together.
  • Unknown flows can also be considered per host. In such a case it can be determined that the user has a mix of specific applications. This information is also helpful when the task is user profiling.
  • In another scenario, an active measurement is taken and the task is to categorise a new application into one of the existing ones. For example, suppose that a new P2P client is released. It is installed and a measurement is made with a PC. The task is to match it against the existing motif-application collections to determine whether the application uses the BitTorrent protocol, eDonkey, etc., or is of a completely new type. Normalisation can also be done in this case, since it is known in advance that the set of flows belongs to the same application. In each of the second to fourth scenarios described above, it may be appropriate to normalise the per-collection scores based on flow numbers.
  • FIG. 3 shows the accumulated per-collection alignment scores for each collection, depicted in contour form.
  • For application traffic known to be generated by Gnutella, reading along the horizontal axis labelled Gnutella, one can see a very high score of between 9000 and 10000 against the Gnutella characteristic bit sequence collection, with very low scores (around 0) at the surrounding intersections.
  • The various score contours in between are drawn onto the plot, resulting in very tightly packed contours around the Gnutella-Gnutella intersection.
  • The scoring scheme used for Figure 3 means that the number of flows influences the overall score, so that a large number of flows each generating a small score for a particular collection still has a large impact on the overall score for that collection.
  • Figure 4 shows another scoring scheme, in which the accumulated per-cluster scores have been normalised by the number of flows; such a scoring scheme avoids the possible dominating effect that applications generating large numbers of flows can have on the overall scores. The results show that the highest scores occur mostly on the diagonal. These scores reflect the existence of unambiguous characteristic bit sequence collections for most of the applications, e.g. BitTorrent, MSN, Gnutella, POP3, etc.
  • An "application" in the context of an embodiment of the present invention can be considered to represent a single application, or a group or class of applications, for example applications of the same or similar type, and the term "application" is to be understood accordingly.
  • A DFA has O(n) complexity, where n is the length of the input string.
  • Sequence alignment has O(nm) complexity [Hans-Joachim Bockenhauer, Dirk Bongartz: Algorithmic aspects of bioinformatics, Springer, ISBN 978-3-540-71912-0, 2007], where n is the length of the input string and m is the length of the motif.
  • The difference is only a linear factor (the motif length m), so the algorithm may be a suitable candidate for, e.g., post-processing traffic that cannot be identified with the common DPI techniques.
  • FIG. 5 illustrates several possible network nodes in which an embodiment of the present invention could be implemented.
  • Example network nodes that are suitable for supporting functionality according to an embodiment of the present invention are those such as gateway nodes (e.g. serving and packet gateway nodes) which are in a position to observe the network traffic of several users.
  • Examples shown in Figure 5 are a Radio Base Station (RBS) 2, a Serving GPRS Support Node (SGSN) 4 and a Gateway GPRS Support Node (GGSN) 6 in a 3G network, and a Broadband Remote Access Server (BRAS) 8 and a Digital Subscriber Line Access Multiplexer (DSLAM) 10 in a DSL network.
  • A Wireless Local Area Network (WLAN) access point 12 is a relatively low aggregation point and is therefore a less preferred candidate.
  • WLAN: Wireless Local Area Network
  • One advantage of an embodiment of the invention is that it enables DPI engines to use signature sets which would give false positive hits on their own.
  • For example, '@hotmail.com' is a good contributor to the summed characteristic bit sequence score for MSN (as MSN Passport usually creates a Hotmail address for the user), but it is not application-specific on its own.
  • Not every characteristic bit sequence is specific to only one application, but using the sum of the characteristic bit sequence scores for one specific application makes them a fairly reliable indicator for that application.
  • Characteristic bit sequences are also suitable for application descriptors that are known to be changed deliberately, e.g. in e-mail spam and other protocols with text-like characteristics (for example VIAGRA -> V.I.A.G.R.A.).
  • The characteristic bit sequences are even more robust than regular expressions to protocol version changes over time. For example, new option fields in a protocol do not affect the characteristic bit sequences much.
  • Blocks S1 to S4 can be considered to represent respective blocks within unit A2.
  • Blocks S5 to S11 can be considered to represent respective blocks within unit B2.
  • Block S12 can be considered to represent a block within unit C1.
  • Figure 6 shows processors P1 to P4 in unit A2 for performing steps S1 to S4 respectively, processors P5 to P11 in unit B2 for performing steps S5 to S11 respectively, and processor P12 in unit C1 for performing step S12.
  • Operation of one or more of the above-described components can be provided in the form of one or more processors or processing units, which processing unit or units could be controlled or provided at least in part by a program operating on the device or apparatus.
  • The function of several depicted components may in fact be performed by a single component.
  • A single processor or processing unit may be arranged to perform the function of multiple components.
  • Such an operating program can be stored on a computer-readable medium, or could, for example, be embodied in a signal such as a downloadable data signal provided from an Internet website.
  • The appended claims are to be interpreted as covering an operating program by itself, or as a record on a carrier, or as a signal, or in any other form.
  • Although units A, B and C as shown in Figure 1 may be provided in a single apparatus in a single location, it is also possible for the three units A, B and C to be provided in three separate locations. Example locations are illustrated in Figure 5 and described above.
  • The characteristic bit sequence finding tasks performed in phase 1 by unit A may be performed in advance by a third party, with the results (collections of characteristic bit sequences) from phase 1 being provided subsequently as input to phase 2.
  • The results (per-collection scores) from phase 2 need not be used straight away in phase 3; instead, they may be stored and distributed to another location where the phase 3 analysis is performed.
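Step S8 leaves the alignment algorithm and its scoring scheme open. A minimal sketch, assuming Smith-Waterman local alignment over raw bytes with a linear gap penalty and the hypothetical parameters MATCH=2, MISMATCH=-1 and GAP=-2 (none of which come from the document), shows how a per-sequence score could be derived and where the O(nm) cost comes from:

```python
# Hypothetical scoring parameters; the document does not specify any.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_alignment_score(traffic: bytes, motif: bytes) -> int:
    """Best Smith-Waterman local alignment score of `motif` within `traffic`.

    Linear gap penalty. Only the previous DP row is kept, so memory is O(m)
    and time is O(n*m) for n = len(traffic) and m = len(motif).
    """
    prev = [0] * (len(motif) + 1)
    best = 0
    for t in traffic:  # one DP row per traffic byte
        curr = [0] * (len(motif) + 1)
        for j, c in enumerate(motif, start=1):
            curr[j] = max(
                0,                                              # start afresh (local alignment)
                prev[j - 1] + (MATCH if t == c else MISMATCH),  # align byte t with motif byte c
                prev[j] + GAP,                                  # traffic byte t left unaligned
                curr[j - 1] + GAP,                              # motif byte c left unaligned
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score(b"xxGET /index", b"GET /"))   # 10: exact hit
print(local_alignment_score(b"xxGET x/index", b"GET /"))  # 8: one extra byte
```

Aligning the motif b"GET /" against b"xxGET x/index" still scores 8 out of a possible 10, because the extra byte costs a gap penalty instead of turning the hit into a miss.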

Abstract

There is provided a method of processing, analysing or profiling traffic in a packet switched telecommunications network. During a first phase (S1 to S4), for each of a plurality of applications, traffic generated by the application is analysed (S2) to identify a collection of one or more characteristic bit sequences for the application, or at least such a plurality of collections is provided. During a second phase (S5 to S11), traffic is received from the network (S5), and the following steps are performed for each of at least one of the plurality of collections: (i) for each of at least one of the characteristic bit sequences in the collection: a sequence alignment process (S8) is performed on the received traffic against the characteristic bit sequence to derive a per-sequence score; and (ii) a per-collection score is assigned to the collection (S10) based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
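The first phase (S2) relies on existing motif finding tools. As a much simpler stand-in, and not one of those tools, the sketch below keeps every k-byte substring that appears near the start of at least a given fraction of an application's flows; `find_motifs`, `k`, `min_support` and `prefix` are hypothetical names and parameters chosen for illustration.

```python
from collections import Counter
from typing import List


def find_motifs(flows: List[bytes], k: int = 4,
                min_support: float = 0.8, prefix: int = 64) -> List[bytes]:
    """Return k-byte substrings present in at least `min_support` of the flows.

    Only the first `prefix` bytes of each flow are examined, since the
    characteristic bytes tend to sit at the start of a flow.
    """
    counts: Counter = Counter()
    for payload in flows:
        head = payload[:prefix]
        # Use a set so that each substring is counted at most once per flow.
        counts.update({head[i:i + k] for i in range(len(head) - k + 1)})
    needed = min_support * len(flows)
    return sorted(m for m, c in counts.items() if c >= needed)


bt_flows = [b"\x13BitTorrent protocol" + b"abc",
            b"\x13BitTorrent protocol" + b"xyz"]
print(find_motifs(bt_flows, k=8))
```

A real motif finder would also discard substrings that are common to several applications and merge overlapping candidates; this sketch does neither.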

Description

APPLICATION IDENTIFICATION THROUGH DATA TRAFFIC ANALYSIS
Technical field
The present invention relates to a method and apparatus for processing traffic in a packet switched telecommunications network.
Background
Gaining an in-depth understanding of the Internet traffic profile is a challenging task, and an important requirement for most Internet Service Providers (ISPs). To this end, Deep Packet Inspection (DPI) helps ISPs in the quest to profile networked applications. With this information in hand, ISPs may then apply different charging policies and traffic shaping, and offer different QoS guarantees, to selected users or applications. Many critical network services may rely on the inspection of packet payload content, instead of only looking at the structured information found in packet headers. It is clear that forwarding or analyzing packets based on content requires new techniques in network devices.
The first DPI tools and techniques relied on simple mechanisms that basically compare the content of the packet payload with a set of strings, each of which essentially represents a given "signature" of an application. More recently, DPI techniques have been replacing string sets with regular expressions because of their increased expressiveness. Systems requiring DPI include Network Intrusion Detection and Prevention Systems (NIDS/NIPS), Layer 7 network devices (switches, firewalls, etc.) and content-based traffic management. Such systems frequently perform a set of time-critical operations to verify certain network patterns or behavior while trying to minimize packet processing latency.
Most DPI systems express patterns using regular expressions [Smith, R., Estan, C., Jha, S., and Kong, S. 2008. "Deflating the big bang: fast and scalable deep packet inspection with extended finite automata". SIGCOMM Comput. Commun. Rev. 38, 4 (Oct. 2008), 207-218. DOI=http://doi.acm.org/10.1145/1402946.1402983]. A natural way to perform pattern matching is through the use of finite automata (FAs). FAs are state machines that can recognize patterns expressed by regular expressions.
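The linear cost of automaton-based matching comes from taking exactly one state transition per input byte. A minimal sketch, assuming the signature is a literal byte string (the simplest regular expression), builds a Knuth-Morris-Pratt style DFA; `build_dfa` and `dfa_search` are illustrative names, and production DPI engines instead compile full regular expressions and many signatures at once.

```python
from typing import List


def build_dfa(pattern: bytes) -> List[List[int]]:
    """KMP-style DFA for a non-empty literal: dfa[state][byte] -> next state.

    State j means "the last j bytes seen equal pattern[:j]"; reaching
    len(pattern) means a full match.
    """
    m = len(pattern)
    dfa = [[0] * 256 for _ in range(m)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached by the input seen so far minus its first byte
    for j in range(1, m):
        dfa[j] = dfa[restart][:]    # on mismatch, behave like the restart state
        dfa[j][pattern[j]] = j + 1  # on match, advance
        restart = dfa[restart][pattern[j]]
    return dfa


def dfa_search(dfa: List[List[int]], data: bytes) -> int:
    """Offset of the first match in `data`, or -1; one transition per byte."""
    m = len(dfa)
    state = 0
    for i, b in enumerate(data):
        state = dfa[state][b]
        if state == m:
            return i - m + 1
    return -1


print(dfa_search(build_dfa(b"BitTorrent"), b"\x13BitTorrent protocol"))  # 1
```

Building the table costs O(256·m) for an m-byte signature; after that, scanning n bytes takes O(n) steps regardless of the signature.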
The most accurate method to recognize protocols would be complete protocol parsing. As this is very resource-consuming, DPI is used instead, which searches for characteristic byte signatures in the traffic. This technique is accepted to be the most accurate among the traffic classification techniques [A. Callado, G. Szabo, B. P. Gero, J. Kelner, S. Fernandes, D. Sadok: Survey on Internet Traffic Identification and Classification, IEEE Communications Surveys and Tutorials, 2009, Vol. 11, Num. 3, pp. 37-52], but it should be noted that this technique remains a heuristic. For example, the chance of encountering a specific DPI signature at a given position in network traffic whose byte values are uniformly distributed is about 1/256^L, where L is the length of the signature in bytes.
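The 1/256^L figure is a per-position probability. A short worked sketch, assuming independent, uniformly distributed bytes, turns it into the expected number of chance hits in one payload; by linearity of expectation the overlap between neighbouring positions does not matter. The function name is hypothetical.

```python
def expected_random_hits(payload_len: int, sig_len: int) -> float:
    """Expected chance occurrences of one fixed L-byte signature in a payload.

    There are payload_len - sig_len + 1 start positions, each matching with
    probability 256**-sig_len when bytes are independent and uniform.
    """
    positions = max(0, payload_len - sig_len + 1)
    return positions * 256.0 ** -sig_len


print(expected_random_hits(1500, 4))  # 1497 / 2**32
```

For a 4-byte signature in a 1500-byte payload this gives 1497/2^32, roughly 3.5e-7 per packet: small for one packet, but not negligible across millions of packets and thousands of signatures.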
The present applicant has appreciated that current DPI-based systems consider the result of the DPI system as a final verdict. When a match occurs, the traffic is classified as belonging to the application whose signature generated the hit. All information about the reliability of the hit is lost.
Signatures that are very characteristic of a protocol (e.g. '@hotmail.com' for MSN traffic) but that may also create false positive hits cannot be used in DPI systems at all, as they would make the whole process unreliable.
If there are minor changes in a protocol matched by a specific regular expression, e.g. the insertion of a new optional field, the regular expression has to be updated.
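This brittleness can be contrasted with approximate string matching. A minimal sketch, using Sellers' algorithm with unit edit costs (an illustrative choice; the document does not prescribe a particular ASM scoring), reports the smallest edit distance between a signature and any substring of the traffic. The message formats in the example are made up.

```python
def best_match_distance(text: bytes, pattern: bytes) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Sellers' algorithm: like Levenshtein distance, except that a match may
    start at any text position for free (new[0] is always 0). O(n*m) time.
    """
    m = len(pattern)
    col = list(range(m + 1))  # before any text byte: pattern[:i] costs i edits
    best = col[m]
    for t in text:
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            new[i] = min(
                col[i - 1] + (pattern[i - 1] != t),  # match or substitution
                col[i] + 1,                           # extra byte in the text
                new[i - 1] + 1,                       # pattern byte missing from text
            )
        col = new
        best = min(best, col[m])  # a match may end at any text position
    return best


old_version = b"xxCMD AUTH USER alice"
new_version = b"xxCMD AUTH+ USER alice"  # hypothetical new optional flag byte
signature = b"CMD AUTH USER"

print(signature in old_version, best_match_distance(old_version, signature))  # True 0
print(signature in new_version, best_match_distance(new_version, signature))  # False 1
```

The exact match flips from hit to miss after one inserted byte, whereas the approximate matcher degrades by a single edit; how such distances are turned into scores or thresholds is left open here.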
The present applicant has appreciated the desirability of providing an improved method for processing and analysing traffic in a packet switched telecommunications network.
Summary
There is provided a method of or for use in processing, analysing or profiling traffic in a packet switched telecommunications network. During a first phase, for each of a plurality of applications, traffic generated by the application is analysed to identify a collection of one or more characteristic bit sequences for the application, or at least such a plurality of collections is provided. During a second phase, traffic is received from the network, and the following steps are performed for each of at least one of the plurality of collections: (i) for each of at least one of the characteristic bit sequences in the collection: a sequence alignment process is performed on the received traffic against the characteristic bit sequence to derive a per-sequence score; and (ii) a per-collection score is assigned to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection. The method may comprise managing traffic in the network based on the per-collection scores, or at least arranging for or causing such managing.
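Steps (i) and (ii), followed by identification, can be sketched independently of the alignment algorithm. The sketch below assumes a caller-supplied `score_fn(traffic, motif)` into which any sequence alignment scorer could be plugged, uses the mean as the per-collection aggregate, and picks the highest-scoring collection; the function names and the toy scorer in the example are hypothetical placeholders, and the toy scorer is not an alignment.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[bytes, bytes], float]


def score_collections(traffic: bytes,
                      collections: Dict[str, List[bytes]],
                      score_fn: Scorer) -> Dict[str, float]:
    """(i) score every motif of every collection, (ii) average per collection."""
    # statistics.mean raises StatisticsError for an empty collection.
    return {app: mean(score_fn(traffic, motif) for motif in motifs)
            for app, motifs in collections.items()}


def identify(traffic: bytes,
             collections: Dict[str, List[bytes]],
             score_fn: Scorer) -> Tuple[str, Dict[str, float]]:
    """Return the application whose collection scores highest, plus all scores."""
    scores = score_collections(traffic, collections, score_fn)
    return max(scores, key=scores.get), scores


def toy_score(traffic: bytes, motif: bytes) -> float:
    """Placeholder scorer: motif length on an exact hit, else 0."""
    return float(len(motif)) if motif in traffic else 0.0


collections = {
    "App 1": [b"GET /", b"Host:"],
    "App 2": [b"BitTorrent protocol"],
    "App 3": [b"@hotmail.com", b"MSNP"],
}
app, scores = identify(b"VER 1 MSNP15\r\nUSR alice@hotmail.com", collections, toy_score)
print(app, scores)
```

With the mean, a motif that is weak on its own, such as '@hotmail.com', adds evidence without deciding the outcome alone; the median or mode could be substituted where outlier per-sequence scores are a concern.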
Managing traffic may comprise at least one of: determining or applying a charging policy in the network, traffic shaping in the network, and determining or applying a QoS guarantee in the network. The method may comprise analysing or profiling the received traffic based on the per- collection scores, or at least arranging for or causing such analysing or profiling.
The method may comprise identifying the application that generated the received traffic based on the per-collection scores.
The application that generated the received traffic may be identified as being the application associated with the collection having a per-collection score that is indicative of the highest likelihood.
At least one of the applications may represent a group or class of applications, for example applications of the same or similar type.
The received traffic may comprise a plurality of packets. Accumulated per-collection scores may be maintained for the respective collections, such that at least one step that is performed based on per-collection scores is performed at least partly based on the accumulated per-collection scores. The accumulated per-collection scores may be normalised. The per-collection score for a collection may be derived from at least one of the mean, mode and median of the per-sequence scores for the collection.
An apparatus is provided for processing, analysing or profiling traffic in a packet switched telecommunications network. An element is provided for, in relation to each of a plurality of applications: analysing traffic generated by the application to identify a collection of one or more characteristic bit sequences for the application, or at least providing such a plurality of collections. An element is provided for receiving traffic from the network. An element is provided for, in relation to each of at least one of the plurality of collections, performing the following steps: for each of at least one of the characteristic bit sequences in the collection: performing a sequence alignment process on the received traffic against the characteristic bit sequence to derive a per-sequence score; and assigning a per-collection score to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
There is provided a program for controlling an apparatus to perform a method as set out above or which, when loaded into an apparatus, causes the apparatus to become an apparatus as set out above. The program may be carried on a carrier medium. The carrier medium may be a storage medium. The carrier medium may be a transmission medium. There is provided an apparatus programmed by such a program. There is provided a storage medium containing such a program.
An embodiment of the present invention offers the technical advantage of addressing the issues relating to the prior art mentioned above. Technical advantages are set out in more detail below.
Brief description of the drawings
Figure 1 illustrates schematically apparatus according to an embodiment of the present invention;
Figure 2 is a schematic flowchart illustrating a method according to an embodiment of the present invention;
Figure 3 is a plot illustrating the total sum of alignment scores per application vs the application motifs;
Figure 4 is a plot illustrating the sum of {sum score/number of flows} value per motif cluster;
Figure 5 shows several possible network nodes in which an embodiment of the present invention could be implemented; and
Figure 6 schematically illustrates parts of the apparatus of Figure 1 in more detail.
Detailed description
As mentioned above, it is desirable to provide an improved method for processing and analysing traffic in a packet switched telecommunications network.
Advanced string matching techniques, known as sequence alignment techniques, are used in bioinformatics. Sequence alignment is a way of arranging sequences of DNA, RNA or protein to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages. The absence of substitutions, or the presence of only very conservative substitutions (that is, substitutions of amino acids whose side chains have similar biochemical properties), in a particular region of the sequence suggests that this region has structural or functional importance. Sequence alignment is described, for example, in the book "Sequence Alignment: Methods, Models, Concepts, and Strategies" by Michael S. Rosenberg.
Motif finding algorithms can be used to create regular expressions ["Randomized algorithms and motif finding," http://bix.ucsd.edu/bioalgorithms/presentations/Ch12_RandAlgs.pdf]. Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors.
These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade.
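To make the picture of aligned rows with inserted gaps concrete, here is a minimal Needleman-Wunsch global alignment sketch. It is a generic textbook algorithm rather than a scheme specified in this document, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns both sequences as rows of an alignment (gaps written as '-')
    together with the alignment score.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score: reward identical characters, penalise others.
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover the two aligned rows.
    row_a: List[str] = []
    row_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


top, bottom, s = global_align("GATTACA", "GATACA")
print(top)
print(bottom)
print("score:", s)
```

Aligning GATTACA with GATACA yields the rows GATTACA and GA-TACA with score 5: six identical characters line up in successive columns, and a single gap (an indel, in the biological reading) absorbs the extra T.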
[Tian Song, Yibo Xue, Dongsheng Wang, "An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security," First International Conference on Communications and Networking in China (ChinaCom '06), pp. 1-5, 25-27 Oct. 2006, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4149803&isnumber=4117415] introduces an approximate string matching technique for use on network traffic. However, that work focuses on the algorithm and measures its performance; a feasible system architecture and practical use cases were not investigated.
An embodiment of the present invention uses an approximate string matching (ASM) technique, based on a sequence alignment procedure, for Deep Packet Inspection. ASM assigns scores that characterise how well a signature fits an input candidate.
Apparatus according to an embodiment of the present invention is shown schematically in Figure 1 and comprises three main units: unit A, unit B and unit C. A schematic flow chart is provided in Figure 2 to illustrate the method performed by the apparatus of Figure 1. The method of Figure 2 is divided into three phases: phase 1, phase 2 and phase 3, which are performed respectively by units A, B and C. Phase 1 is a characteristic bit sequence (or motif) finding phase. Phase 2 is a sequence alignment phase. Phase 3 is a phase in which various steps may be performed based on the results of phase 2.
In more detail, phase 1 is for finding characteristic bit sequences for a plurality of different applications. A characteristic bit sequence for an application can be considered to be a sequence of bits that occurs regularly or consistently in traffic generated by that application (a recurring bit sequence), and/or which can be said to characterise the traffic generated by that application. Characteristic bit sequences are often referred to in the literature as motifs or signatures.
Unit A has a source A1 of traffic generated by a plurality of different applications, where the application that generated the traffic is known. The source A1 may be a store (for example a temporary store) of traffic collected from network N, or may be a direct feed or input from the network N. In this context, traffic may comprise a single packet, though more usually it comprises a plurality of packets. For example, a single application may generate a large amount of traffic, of which the first several packets should be inspected, since they contain the characteristic bit sequences; later packets carry only data, which is generally not characteristic of the application.
Steps S1 to S4 of Figure 2 are performed by processor A2 of unit A. In step S1, one of (or the next one of) the applications is selected for processing, and in step S2 the traffic associated with that application is received or retrieved or filtered out from source A1 and analysed to identify a collection of one or more characteristic bit sequences (or motifs or signatures) for the application. In step S3 the collection of characteristic bit sequences for the application is stored in storage A3 of unit A. In step S4 it is determined whether there are further applications of the plurality to process; if so, processing passes back to step S1, and if not, processing continues to step S5.
There are several well-known motif finding tools that can be used in step S2. For example, the technique disclosed in [Frith MC, Saunders NFW, Kobe B, Bailey TL, 2008, Discovering Sequence Motifs with Arbitrary Insertions and Deletions. PLoS Comput Biol 4(5): e1000071. doi:10.1371/journal.pcbi.1000071] can be used to process the network traffic and create application-specific characteristic bit sequences accordingly. Over several iterative runs the process can yield several candidate characteristic bit sequences, which are expected to be characteristic of different types of application traffic. For example, several characteristic bit sequences can be found for the signalling and data transfer flows of the same peer-to-peer (P2P) application. In the example shown in Figure 1, traffic for three applications App 1, App 2 and App 3 is illustrated in traffic source A1. After processing by the characteristic bit sequence finding processor A2, three collections (Collection 1, Collection 2 and Collection 3) of application-specific characteristic bit sequences, corresponding respectively to App 1, App 2 and App 3, have been found and placed in storage A3 (or sent directly to unit B).
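Purely to illustrate what a collection produced by steps S1 to S4 might look like, the sketch below substitutes a much simpler, hypothetical heuristic for the motif finding tools cited above: it keeps every k-byte sequence that appears in the first few packets of at least a given fraction of an application's flows. It is not the algorithm of Frith et al.; the parameters `k`, `first_packets` and `min_support` and the function names are assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, List

Flow = List[bytes]  # payloads of one flow's packets, in arrival order


def find_motifs(flows: List[Flow], k: int = 4, first_packets: int = 3,
                min_support: float = 0.8) -> List[bytes]:
    """Return k-byte sequences seen in at least min_support of the flows.

    Only the first few packets of each flow are inspected, because later
    packets tend to carry application data rather than protocol structure.
    """
    support: Counter = Counter()
    for flow in flows:
        head = b"".join(flow[:first_packets])
        # A set, so each k-gram is counted at most once per flow.
        support.update({head[i:i + k] for i in range(len(head) - k + 1)})
    threshold = min_support * len(flows)
    return sorted(gram for gram, count in support.items() if count >= threshold)


def build_collections(traffic: Dict[str, List[Flow]], **params) -> Dict[str, List[bytes]]:
    """Steps S1 to S4: one collection of candidate motifs per known application."""
    return {app: find_motifs(flows, **params) for app, flows in traffic.items()}
```

Exact k-grams cannot represent insertions or deletions inside a motif, which is precisely what dedicated tools such as the one cited above handle; the sketch only shows the input (labelled flows) and output (one collection per application) of phase 1.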
The analysis and/or identification of unknown traffic is subsequently performed by unit B by performing a sequence alignment process on the unknown traffic against the characteristic bit sequences found by unit A. Unit B has a source B1 of network traffic. The source B1 may be a store (for example a temporary store) of traffic collected from network N, for processing offline, or may be a direct feed or input from the network N, for processing online or in real time. Receipt of the traffic at unit B is represented by step S5 of Figure 2. Steps S6 to S11 are performed by a sequence alignment processor B2 of unit B. In step S6 one (or the next) of the plurality of collections in the store A3 is selected for processing. Within the selected collection, one (or the next) of the characteristic bit sequences in the collection is selected in step S7 for processing.
In step S8 a sequence alignment process is performed on the received traffic against the selected characteristic bit sequence to derive a per-sequence score. In step S9 it is determined whether there are any further characteristic bit sequences in the current collection to process. If so, processing returns to step S7; if not, processing continues to step S10. In step S10, a per-collection score is assigned to the current collection based on the per-sequence scores for the collection. The per-collection score can be considered to be indicative of a likelihood that the traffic received in step S5 was generated by the application associated with the collection. The per-collection score for the collection can be derived from the mean, mode or median of the per-sequence scores for the collection. In step S11 it is determined whether there are any further collections of characteristic bit sequences from the store A3 to process. If so, processing returns to step S6; if not, processing continues to step S12.
A number of different possibilities are envisaged for step S12, which is performed by the per-collection scores processor C1 of unit C; what they have in common is that step S12 uses the per-collection scores from step S10. For example, step S12 may comprise identifying the application that generated the traffic received in step S5 based on the per-collection scores. The application that generated the received traffic may be identified as being the application associated with the collection having a per-collection score that is indicative of the highest likelihood. Where the scoring scheme is such that a higher per-collection score indicates a higher likelihood, this amounts to selecting the application associated with the collection having the highest per-collection score.
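Steps S6 to S12 can be sketched as two nested loops followed by a selection. This is a minimal sketch under stated assumptions: the sequence alignment process of step S8 is taken to be Smith-Waterman local alignment (one common choice; the document does not name a specific algorithm), with assumed scores of match +2, mismatch -1 and gap -1, and step S12 assumes that a higher per-collection score means a higher likelihood. All function names are hypothetical.

```python
import statistics
from typing import Callable, Dict, List


def local_alignment_score(seq: bytes, motif: bytes,
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(motif) + 1)  # DP row for the previous byte of seq
    best = 0
    for a in seq:
        curr = [0]
        for j, b in enumerate(motif, start=1):
            score = max(0,
                        prev[j - 1] + (match if a == b else mismatch),  # align a with b
                        prev[j] + gap,                                  # extra byte in seq
                        curr[j - 1] + gap)                              # byte missing from seq
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


AGGREGATORS: Dict[str, Callable[[List[int]], float]] = {
    "mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}


def per_collection_scores(traffic: bytes, collections: Dict[str, List[bytes]],
                          agg: str = "mean") -> Dict[str, float]:
    """Steps S6 to S11: one per-collection score per application."""
    scores: Dict[str, float] = {}
    for app, motifs in collections.items():                                    # S6, S11
        per_sequence = [local_alignment_score(traffic, m) for m in motifs]     # S7 to S9
        scores[app] = AGGREGATORS[agg](per_sequence) if per_sequence else 0    # S10
    return scores


def identify_application(scores: Dict[str, float]) -> str:
    """Step S12 (one option): the application whose collection scores highest."""
    return max(scores, key=scores.get)
```

For example, traffic beginning with `SSH-2.0` scores 14 (the maximum, 2 × 7 bytes) against a collection containing the motif `SSH-2.0`, and only a few points against unrelated motifs, so that collection's application is selected. Raw alignment scores let long motifs outweigh short ones; dividing each per-sequence score by its maximum attainable value would be one possible refinement.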
For example, in the illustration shown in Figure 1, traffic from an unknown application App X is received, and the per-collection scores derived for Collections 1, 2 and 3 are A, B and C respectively. If per-collection score C is the greatest, then App X can be identified as (most likely being) App 3, which is the application associated with Collection 3.
Step S12 may comprise analysing or profiling the received traffic based on the per-collection scores, or at least arranging for or causing such analysing or profiling. Step S12 may also comprise managing traffic in the network N based on the per-collection scores, or at least arranging for or causing such managing. In this respect, managing traffic may comprise determining or applying a charging policy in the network, traffic shaping in the network, or determining or applying a QoS guarantee in the network.
This is particularly applicable where steps S5 to S11 are repeated multiple times to gather information relating to a significant amount of network traffic. Repeating these steps allows accumulated per-collection scores to be determined, so that further analysis or processing can be based on the accumulated per-collection scores. The per-collection scores are accumulated by summing the respective per-collection scores from different passes through steps S5 to S11. The accumulated per-collection scores can be analysed or reviewed to gain a sense of which applications are generating the most traffic over the network, which in turn may be used to manage traffic in the network as mentioned above. These accumulated per-collection scores may be normalised, for example based on the number of traffic flows being processed. In this respect, in a TCP/IP context a "flow" can be considered to be a TCP/IP connection between two end points, identified e.g. by source/destination ports and IP addresses. There are several scenarios that can be considered in relation to normalisation:
Firstly, where the unknown flows are considered one-by-one, no normalisation is required. The traffic for a particular flow can be processed using a method as described above, with the information being used directly to determine which application most likely generated that traffic.
Secondly, the unknown flows can be considered per host and per port (i.e. flows generated by the same host from the same source port to several destination IPs and ports); this is typical behaviour for services. A basic example is a web server, which clients access on TCP port 80 from many different IPs and from any source port. From the web server's point of view, these flows can be considered to belong to the 'same' application, as they access the same service. If the web server also hosts an SMTP mail server, then flows going to port 25 behave similarly and can also be considered together. These examples relate to well-known common services, but P2P clients work in a similar way, as each has to keep a server port open for incoming P2P connections.
Thirdly, the unknown flows can be considered per host. In this case it can be determined which mix of specific applications the user has. This information is also helpful when the task is user profiling.
Fourthly, another possible use case is where an active measurement is taken and the task is to categorise a new application into one of the existing ones. For example, suppose that a new P2P client is released. It is installed on a PC and a measurement is made. The task is then to match it against the existing motif-application collections to determine whether the application uses the BitTorrent protocol, eDonkey, etc., or is of a completely new type. In such a case normalisation can also be applied, since it is known in advance that the set of flows belongs to the same application. In each of the second to fourth scenarios described above, it may be appropriate to normalise the per-collection scores based on flow numbers.
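The accumulation and normalisation described above can be sketched as follows, assuming per-collection scores have already been computed for each flow (for example by steps S5 to S11). The flow key, the grouping functions for the four scenarios and the choice of dividing each group's sum by its flow count are illustrative assumptions, not definitions taken from the document.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, NamedTuple


class FlowKey(NamedTuple):
    """A TCP/IP flow identified by its end points (hypothetical structure)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


Scores = Dict[str, float]  # collection/application name -> per-collection score


def accumulate(flow_scores: Dict[FlowKey, Scores],
               group_of: Callable[[FlowKey], Hashable],
               normalise: bool = True) -> Dict[Hashable, Scores]:
    """Sum per-collection scores over the flows in each group.

    With normalise=True each sum is divided by the number of flows in the
    group, so groups are compared independently of how many flows they hold.
    """
    sums: Dict[Hashable, Scores] = defaultdict(lambda: defaultdict(float))
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, scores in flow_scores.items():
        group = group_of(key)
        counts[group] += 1
        for app, s in scores.items():
            sums[group][app] += s
    return {group: {app: s / counts[group] if normalise else s for app, s in apps.items()}
            for group, apps in sums.items()}


# Hypothetical grouping keys for the scenarios above.
def per_flow(k: FlowKey) -> Hashable:      # first scenario: flows one by one
    return k


def per_service(k: FlowKey) -> Hashable:   # second: one server host and port
    return (k.dst_ip, k.dst_port)          # clients connect *to* the service


def per_host(k: FlowKey) -> Hashable:      # third: everything one host does
    return k.src_ip
```

With a single flow per group, as in the first scenario, normalisation divides by one and changes nothing, which matches the observation that no normalisation is required there.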
By way of example, characteristic bit sequence collections were created for twelve different applications, and these collections were tested on each other's traffic (1000 sample flows of each application). Figure 3 shows the accumulated per-collection alignment scores for each collection, depicted in contour form. For example, for application traffic known to be generated by Gnutella, reading along the horizontal axis labelled Gnutella, one can see a very high score of between 9000 and 10000 against the Gnutella characteristic bit sequence collection, with a very low score (around 0) at the surrounding intersections. The intermediate score contours are drawn onto the plot, resulting in very tightly packed contours around the Gnutella-Gnutella intersection. Reading further along the horizontal axis labelled Gnutella, one can see a lower peak of between 1000 and 2000 against the SSH characteristic bit sequence collection, indicating that the traffic generated by the Gnutella application has at least some similarity with the SSH application and therefore receives a non-zero score for SSH. Although the details of Figure 3 are difficult to interpret without the benefit of colour, it should be appreciated that the first contour encountered when moving towards one of the peaks is the 0-1000 contour, and the other listed contours (1000-2000, 2000-3000, etc.) are encountered in turn as one moves towards the peak.
The scoring scheme used for Figure 3 means that the number of flows influences the overall score, so that a large number of flows each generating a small score for a particular collection will still have a large impact on the overall score for that collection. Figure 4 shows another scoring scheme, in which the accumulated per-collection scores have been normalised with flow number; such a scoring scheme avoids the possible dominating effect that applications generating large numbers of flows can have on the overall scores.
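The following sketch illustrates why the two scoring schemes can disagree. The plain sum follows the Figure 3 description; for the normalised scheme, the reading assumed here (the document states only that the scores are normalised with flow number) is to divide each collection's sum by the number of flows that actually scored above zero against that collection. The flow data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, float]  # collection name -> score


def sum_scheme(flows: List[Row]) -> Row:
    """Figure 3 style: plain sum of per-collection scores over all flows."""
    totals: Row = {}
    for scores in flows:
        for coll, s in scores.items():
            totals[coll] = totals.get(coll, 0.0) + s
    return totals


def per_hit_scheme(flows: List[Row]) -> Row:
    """Assumed Figure 4 style: each collection's sum divided by the number
    of flows that scored above zero against that collection."""
    totals = sum_scheme(flows)
    hits = {coll: sum(1 for scores in flows if scores.get(coll, 0.0) > 0) for coll in totals}
    return {coll: totals[coll] / hits[coll] if hits[coll] else 0.0 for coll in totals}


def best(row: Row) -> str:
    """Collection with the highest score under a given scheme."""
    return max(row, key=row.get)


# Hypothetical traffic: 2 flows match collection A strongly, 8 match B weakly.
flows = [{"A": 20.0, "B": 0.0}] * 2 + [{"A": 0.0, "B": 6.0}] * 8
print(best(sum_scheme(flows)), best(per_hit_scheme(flows)))
```

Here the plain sum favours B (48 against 40) purely because of its eight weak flows, whereas the normalised scheme favours A (20 against 6), which is why considering more than one scheme can help resolve ambiguous cases.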
The results show that the highest scores occur mostly in the diagonal. These scores reflect the existence of unambiguous characteristic bit sequence collections for most of the applications, e.g. BitTorrent, MSN, Gnutella, POP3, etc.
However, in some cases the collections can be ambiguous if only one of the scoring schemes is considered. For example, in Figure 3 eDonkey conflicts with DC (which may occur due to the same client using multiple protocols), but the case of RTP has no straightforward explanation. It is therefore advisable to take more than one scoring scheme into account during decision making.
It will be appreciated that an "application" in the context of an embodiment of the present invention can be considered to represent a single application, or a group or class of applications, for example applications of the same or similar type, and the term "application" is to be understood accordingly. In this regard, it may be useful to be able to classify traffic into a broad class of applications, such as "P2P applications", rather than identify the traffic as having been generated by a specific application.
Comparing the computational complexity of ASM with that of Deterministic Finite Automata (DFA) gives the following. A DFA has O(n) complexity, where n is the length of the input string. Sequence alignment has O(nm) complexity [Hans-Joachim Böckenhauer, Dirk Bongartz: Algorithmic Aspects of Bioinformatics, Springer, ISBN 978-3-540-71912-0, 2007], where n is the length of the input string and m is the length of the motif. The additional cost is therefore only a factor of m, i.e. linear in the motif length, so the algorithm may be a suitable candidate e.g. for post-processing traffic that cannot be identified with common DPI techniques.
Figure 5 illustrates several possible network nodes in which an embodiment of the present invention could be implemented. Network nodes suitable for supporting functionality according to an embodiment of the present invention are, for example, gateway nodes (e.g. serving and packet gateway nodes), which are in a position to observe the network traffic of several users. Examples shown in Figure 5 are a Radio Base Station (RBS) 2, a Serving GPRS Support Node (SGSN) 4 and a Gateway GPRS Support Node (GGSN) 6 in a 3G network, and a Broadband Remote Access Server (BRAS) 8 and a Digital Subscriber Line Access Multiplexer (DSLAM) 10 in a DSL network. A Wireless Local Area Network (WLAN) access point 12 is a relatively low aggregation point and is therefore a less preferred candidate.
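As a rough worked illustration of these bounds, assume (for illustration only; these sizes are not taken from the measurements above) a single 1500-byte payload, n = 1500, and a 20-byte motif, m = 20:

```latex
\begin{aligned}
T_{\mathrm{DFA}} &= O(n) \approx 1500 \text{ state transitions},\\
T_{\mathrm{SA}}  &= O(nm) \approx 1500 \times 20 = 30\,000 \text{ alignment-matrix cell updates},\\
T_{\mathrm{SA}} / T_{\mathrm{DFA}} &\approx m = 20.
\end{aligned}
```

Because each characteristic bit sequence is aligned separately in steps S7 to S9, a set of motifs with total length M costs roughly n·M cell updates per input, which is consistent with applying the approach to post-processing of traffic left unidentified by conventional DPI rather than to every packet.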
One advantage of an embodiment of the invention is that it enables DPI engines to use signature sets that would otherwise give false positive hits on their own. For example, '@hotmail.com' contributes usefully to the summed characteristic bit sequence score for MSN (as MSN Passport usually creates a Hotmail address for the user), but it is not application-specific on its own. Not every characteristic bit sequence is necessarily specific to only one application, but summing the characteristic bit sequence scores for a specific application makes them a fairly reliable indicator of that application.
A further advantage of an embodiment of the invention arises where the characteristic bit sequences are application descriptors that are known to be deliberately altered, e.g. in e-mail spam and in other protocols with text-like characteristics (for example VIAGRA -> V.I.A.G.R.A.).
Characteristic bit sequences are also more robust than regular expressions to protocol version changes over time. For example, new option fields in a protocol do not affect the characteristic bit sequences much.
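The robustness claims above can be illustrated by contrasting an exact substring test, standing in for a rigid DPI signature, with a local alignment score. The sketch assumes Smith-Waterman scoring with match +2, mismatch -1 and gap -1; the protocol header `HDR;ver=1;opt=x;len=20;` with an inserted option field is entirely hypothetical.

```python
def local_alignment_score(seq: bytes, motif: bytes,
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(motif) + 1)  # DP row for the previous byte of seq
    best = 0
    for a in seq:
        curr = [0]
        for j, b in enumerate(motif, start=1):
            score = max(0, prev[j - 1] + (match if a == b else mismatch),
                        prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def exact_hit(seq: bytes, signature: bytes) -> bool:
    """Exact substring match, standing in for a rigid DPI signature."""
    return signature in seq


cases = [
    # (observed traffic, characteristic bit sequence)
    (b"Buy V.I.A.G.R.A. now", b"VIAGRA"),          # deliberately obfuscated text
    (b"HDR;ver=1;opt=x;len=20;", b"ver=1;len="),   # new optional field inserted
]
for traffic, motif in cases:
    score = local_alignment_score(traffic, motif)
    print(motif, exact_hit(traffic, motif), score, "of max", 2 * len(motif))
```

Neither case produces an exact hit, yet the alignment still scores 7 of a possible 12 for the obfuscated word (six matches, five gaps) and 14 of a possible 20 for the header with the inserted field (ten matches, six gaps), so the evidence degrades gradually instead of disappearing.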
Each of the blocks illustrated in Figure 2 can be considered to represent physical means for performing the function associated with the block. Thus, blocks S1 to S4 can be considered to represent respective blocks within unit A2, blocks S5 to S11 can be considered to represent respective blocks within unit B2, and block S12 can be considered to represent a block within unit C1. This is illustrated in more detail in Figure 6, which shows processors P1 to P4 in unit A2 for performing steps S1 to S4 respectively, processors P5 to P11 in unit B2 for performing steps S5 to S11 respectively, and processor P12 in unit C1 for performing step S12.
It will be appreciated that operation of one or more of the above-described components can be provided in the form of one or more processors or processing units, which processing unit or units could be controlled or provided at least in part by a program operating on the device or apparatus. The function of several depicted components may in fact be performed by a single component. A single processor or processing unit may be arranged to perform the function of multiple components. Such an operating program can be stored on a computer-readable medium, or could, for example, be embodied in a signal such as a downloadable data signal provided from an Internet website. The appended claims are to be interpreted as covering an operating program by itself, or as a record on a carrier, or as a signal, or in any other form. It will also be appreciated that although units A, B and C as shown in Figure 1 may be provided in a single apparatus in a single location, it is also possible that the three units A, B and C are provided in three separate locations. Example locations are illustrated in Figure 5 and described above.
In particular, the characteristic bit sequence finding tasks performed in phase 1 by unit A may be performed in advance by a third party, with the results (collections of characteristic bit sequences) from phase 1 being provided subsequently as input to phase 2. Likewise, the results (per-collection scores) from phase 2 need not be used straight away in phase 3, but instead may be stored and distributed to another location for the performance there of the phase 3 analysis. The appended claims are intended in particular to cover the method of phase 2 and unit B in isolation, but are also intended to cover any of the other phases and units in isolation, and any combination of phases 1, 2 and 3, and any combination of units A, B and C. It will also be appreciated by the person of skill in the art that various modifications may be made to the above-described embodiments without departing from the scope of the present invention as defined by the appended claims.

Claims

1. A method of processing traffic in a packet switched telecommunications network, comprising:
(a) for each of a plurality of applications: analysing traffic generated by the application to identify a collection of one or more characteristic bit sequences for the application, or at least providing such a plurality of collections;
(b) receiving traffic from the network; and
(c) for each of at least one of the plurality of collections:
(i) for each of at least one of the characteristic bit sequences in the collection: performing a sequence alignment process on the received traffic against the characteristic bit sequence to derive a per-sequence score; and
(ii) assigning a per-collection score to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
2. A method as claimed in claim 1, comprising managing traffic in the network based on the per-collection scores, or at least arranging for or causing such managing.
3. A method as claimed in claim 2, wherein the step of managing traffic comprises at least one of: determining or applying a charging policy in the network, traffic shaping in the network, and determining or applying a QoS guarantee in the network.
4. A method as claimed in any preceding claim, comprising analysing or profiling the received traffic based on the per-collection scores, or at least arranging for or causing such analysing or profiling.
5. A method as claimed in any preceding claim, comprising identifying the application that generated the received traffic based on the per-collection scores.
6. A method as claimed in claim 5, wherein the application that generated the received traffic is identified as being the application associated with the collection having a per-collection score that is indicative of the highest likelihood.
7. A method as claimed in any preceding claim, wherein at least one of the applications represents a group or class of applications, for example applications of the same or similar type.
8. A method as claimed in any preceding claim, wherein the received traffic comprises a plurality of packets.
9. A method as claimed in any preceding claim, comprising repeating steps (b) and (c) to assign accumulated per-collection scores to the respective collections, and wherein at least one step that is performed based on per-collection scores is performed at least partly based on the accumulated per-collection scores.
10. A method as claimed in claim 9, comprising normalising the accumulated per-collection scores.
11. A method as claimed in any preceding claim, wherein the per-collection score for a collection is derived from at least one of the mean, mode and median of the per-sequence scores for the collection.
12. An apparatus for processing traffic in a packet switched telecommunications network, comprising:
(a) means for, in relation to each of a plurality of applications: analysing traffic generated by the application to identify a collection of one or more characteristic bit sequences for the application, or at least providing such a plurality of collections;
(b) means for receiving traffic from the network; and
(c) means for, in relation to each of at least one of the plurality of collections, performing the following steps:
(i) for each of at least one of the characteristic bit sequences in the collection: performing a sequence alignment process on the received traffic against the characteristic bit sequence to derive a per-sequence score; and
(ii) assigning a per-collection score to the collection based on the per-sequence scores for the collection, the per-collection score being indicative of a likelihood that the traffic was generated by the application associated with the collection.
13. A program for controlling an apparatus to perform a method as claimed in any one of claims 1 to 11, optionally being carried on a carrier medium such as a storage medium or a transmission medium.
14. A storage medium containing a program as claimed in claim 13.
PCT/EP2010/065413 2010-10-14 2010-10-14 Application identification through data traffic analysis WO2012048744A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/876,288 US20130194930A1 (en) 2010-10-14 2010-10-14 Application Identification Through Data Traffic Analysis
PCT/EP2010/065413 WO2012048744A1 (en) 2010-10-14 2010-10-14 Application identification through data traffic analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2010/065413 WO2012048744A1 (en) 2010-10-14 2010-10-14 Application identification through data traffic analysis

Publications (1)

Publication Number Publication Date
WO2012048744A1 true WO2012048744A1 (en) 2012-04-19

Family

ID=44122056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/065413 WO2012048744A1 (en) 2010-10-14 2010-10-14 Application identification through data traffic analysis

Country Status (2)

Country Link
US (1) US20130194930A1 (en)
WO (1) WO2012048744A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9596321B2 (en) 2015-06-24 2017-03-14 Cisco Technology, Inc. Server grouping system
WO2022033115A1 (en) * 2020-08-12 2022-02-17 华为技术有限公司 Communication method and communication apparatus

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9553817B1 (en) * 2011-07-14 2017-01-24 Sprint Communications Company L.P. Diverse transmission of packet content
US9887881B2 (en) 2013-10-30 2018-02-06 Cisco Technology, Inc. DNS-assisted application identification
US9853876B1 (en) * 2014-06-13 2017-12-26 Narus, Inc. Mobile application identification in network traffic via a search engine approach
US10560362B2 (en) * 2014-11-25 2020-02-11 Fortinet, Inc. Application control
US10250466B2 (en) 2016-03-29 2019-04-02 Juniper Networks, Inc. Application signature generation and distribution
US11140068B2 (en) 2018-06-25 2021-10-05 Edgewater Networks, Inc. Edge networking devices and systems for identifying a software application

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US8284662B2 (en) * 2007-03-06 2012-10-09 Ericsson Ab Flexible, cost-effective solution for peer-to-peer, gaming, and application traffic detection and treatment
US20090238071A1 (en) * 2008-03-20 2009-09-24 Embarq Holdings Company, Llc System, method and apparatus for prioritizing network traffic using deep packet inspection (DPI) and centralized network controller
US8201220B2 (en) * 2008-12-23 2012-06-12 Qwest Communications International Inc. Network user usage profiling
US8214487B2 (en) * 2009-06-10 2012-07-03 At&T Intellectual Property I, L.P. System and method to determine network usage
US8694619B2 (en) * 2009-07-30 2014-04-08 Telefonaktiebolaget L M Ericsson (Publ) Packet classification method and apparatus
US8750146B2 (en) * 2010-12-15 2014-06-10 At&T Intellectual Property I, L.P. Method and apparatus for applying uniform hashing to wireless traffic
US20120317151A1 (en) * 2011-06-09 2012-12-13 Thomas Walter Ruf Model-Based Method for Managing Information Derived From Network Traffic

Non-Patent Citations (5)

Title
"Application Protocol Identification in Data Network Based on Naive Bayes Identifier", IP.COM JOURNAL, IP.COM INC., WEST HENRIETTA, NY, US, 23 December 2008 (2008-12-23), XP013128336, ISSN: 1533-0001 *
A. CALLADO; G. SZABO; B. P. GERO; J. KELNER; S. FERNANDES; D. SADOK: "Survey on Internet Traffic Identification and Classification", IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, vol. 11, no. 3, 2009, pages 37 - 52, XP011272506, DOI: doi:10.1109/SURV.2009.090304
HAMZA DAHMOUNI ET AL: "A Markovian Signature-Based Approach to IP Traffic Classification", 12 June 2007 (2007-06-12), San Diego, California, pages 29 - 34, XP055001306, Retrieved from the Internet <URL:http://www.hsnlab.hu/twiki/pub/Targyak/Apr8Cikkek/p29-dahmouni.pdf> [retrieved on 20110623] *
SMITH, R.; ESTAN, C.; JHA, S.; KONG, S.: "Deflating the big bang: fast and scalable deep packet inspection with extended finite automata", SIGCOMM COMPUT. COMMUN. REV., vol. 38, no. 4, October 2008 (2008-10-01), pages 207 - 218, Retrieved from the Internet <URL:http://doi.acm.org/10.1145/1402946.1402983>
TIAN SONG; YIBO XUE; DONGSHENG WANG: "An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security", FIRST INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND NETWORKING IN CHINA, 2006. CHINACOM '06., 25 October 2006 (2006-10-25), pages 1 - 5, XP031074696, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4149803&isnumber=4117415> DOI: doi:10.1109/CHINACOM.2006.344758

Cited By (4)

Publication number Priority date Publication date Assignee Title
US9596321B2 (en) 2015-06-24 2017-03-14 Cisco Technology, Inc. Server grouping system
US9813442B2 (en) 2015-06-24 2017-11-07 Cisco Technology, Inc. Server grouping system
WO2022033115A1 (en) * 2020-08-12 2022-02-17 华为技术有限公司 Communication method and communication apparatus
US11855846B2 (en) 2020-08-12 2023-12-26 Huawei Technologies Co., Ltd. Communication method and communication apparatus

Also Published As

Publication number Publication date
US20130194930A1 (en) 2013-08-01

Similar Documents

Publication Publication Date Title
US20130194930A1 (en) Application Identification Through Data Traffic Analysis
Tongaonkar et al. Towards self adaptive network traffic classification
Park et al. Towards automated application signature generation for traffic identification
CN111371735B (en) Botnet detection method, system and storage medium
US20130332456A1 (en) Method and system for detecting operating systems running on nodes in communication network
US9680861B2 (en) Historical analysis to identify malicious activity
Grimaudo et al. Select: Self-learning classifier for internet traffic
US20140059216A1 (en) Methods and systems for network flow analysis
Qin et al. Robust application identification methods for P2P and VoIP traffic classification in backbone networks
US8213326B2 (en) Method and apparatus for the classification of ports on a data communication network node
Iliofotou et al. Graph-based p2p traffic classification at the internet backbone
DK2869495T3 (en) Node de-duplication in a network monitoring system
Yeganeh et al. Cute: Traffic classification using terms
CN112583657A (en) Distributed routing level network topology detection method based on embedded equipment
CN111953552B (en) Data flow classification method and message forwarding equipment
CN113872962B (en) Low-speed port scanning detection method for high-speed network sampling data acquisition scene
CN109088756A (en) A kind of network topology complementing method based on network equipment identification
Kardes et al. Graph based induction of unresponsive routers in internet topologies
JP2007074339A (en) Spread unauthorized access detection method and system
Kheir et al. Behavioral fine-grained detection and classification of P2P bots
Rostami et al. Analysis and detection of P2P botnet connections based on node behaviour
Lu et al. Botnet detection based on fuzzy association rules
Yuan et al. Harvesting unique characteristics in packet sequences for effective application classification
Berthier et al. An evaluation of connection characteristics for separating network attacks
Sengar et al. P2p bot detection system based on map reduce

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10770778

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 2004-01-01)
WWE WIPO information: entry into national phase

Ref document number: 13876288

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 10770778

Country of ref document: EP

Kind code of ref document: A1