EP2115642A1 - Web reputation scoring - Google Patents

Web reputation scoring

Info

Publication number
EP2115642A1
Authority
EP
European Patent Office
Prior art keywords
reputation
entity
engine
operable
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP08728168A
Other languages
German (de)
French (fr)
Other versions
EP2115642A4 (en)
Inventor
Dmitri Alperovitch
Tomo Foote-Lennox
Paula Greve
Alejandro M. Hernandez
Paul Judge
Sven Krasser
Tim Lange
Phyllis A. Schneck
Martin Stecher
Yuchun Tang
Aarjav Jyotindra Neeta Trivedi
Lamar L. Willis
Weilai Yang
Jonathan A. Zdziarski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
McAfee LLC
Original Assignee
Secure Computing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/626,479 external-priority patent/US7937480B2/en
Priority claimed from US11/626,470 external-priority patent/US8561167B2/en
Priority claimed from US11/626,620 external-priority patent/US7779156B2/en
Priority claimed from US11/626,644 external-priority patent/US8179798B2/en
Application filed by Secure Computing LLC filed Critical Secure Computing LLC
Publication of EP2115642A1 publication Critical patent/EP2115642A1/en
Publication of EP2115642A4 publication Critical patent/EP2115642A4/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/12 Applying verification of the received information
    • H04L 63/126 Applying verification of the received information the source of the received data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L 63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/2111 Location-sensitive, e.g. geographical location, GPS

Definitions

  • This document relates generally to systems and methods for processing communications and more particularly to systems and methods for classifying entities associated with communications.
  • spammers use various creative means for evading detection by spam filters.
  • the entity from which a communication originated can provide another indication of whether a given communication should be allowed into an enterprise network environment.
  • IP blacklists, sometimes called real-time blacklists (RBLs), and IP whitelists, sometimes called real-time whitelists (RWLs), typically provide a YES/NO binary-type response to each query.
  • blacklists and whitelists treat entities independently, and overlook the evidence provided by various attributes associated with the entities.
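  • For illustration, the following sketch shows the kind of binary lookup an RBL provides. It is an assumption-laden example: the zone name rbl.example.net is a placeholder, and the reverse-octet DNS query is the conventional DNSBL mechanism rather than anything specified by this document.

```python
# Minimal sketch of a DNS-based IP blacklist (RBL) lookup. The zone name
# is a placeholder; real deployments substitute their blacklist's zone.
import socket

def rbl_lookup(ip: str, zone: str = "rbl.example.net") -> bool:
    """Return True (YES) if `ip` is listed, False (NO) otherwise."""
    # DNSBL convention: reverse the octets and prepend them to the zone
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)   # an A record resolves -> listed
        return True
    except socket.gaierror:           # NXDOMAIN -> not listed
        return False
```

  • Note that the response is a bare YES/NO: no attributes of the entity survive the query, which is the limitation the attribute-based reputation approach described below addresses.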
  • Systems and methods for web reputation scoring are provided.
  • Systems used to assign reputation to web-based entities can include a communications interface, a communications analyzer, a reputation engine and a decision engine.
  • the communications interface can receive a web communication
  • the communication analyzer can analyze the web communication to determine an entity associated with the web communication.
  • the reputation engine can provide a reputation associated with the entity based upon previously collected data associated with the entity, and the decision engine can determine whether the web communication is to be communicated to a recipient based upon the reputation.
  • Methods of assigning reputation to web-based entities can include: receiving a hypertext transfer protocol communication at an edge protection device; identifying an entity associated with the received hypertext transfer protocol communication; querying a reputation engine for a reputation indicator associated with the entity; receiving the reputation indicator from the reputation engine; and taking an action with respect to the hypertext transfer protocol communication based upon the received reputation indicator associated with the entity.
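  • As a sketch only, the method above might look like the following at an edge protection device; the class and function names are assumptions, not part of the patent.

```python
# Sketch of the claimed method: receive an HTTP communication, identify
# the entity, query a reputation engine, and act on the indicator.
from dataclasses import dataclass

@dataclass
class HttpCommunication:
    source_ip: str
    url: str
    body: bytes

def handle_http(comm: HttpCommunication, reputation_engine) -> str:
    entity = comm.source_ip                       # identify the entity
    indicator = reputation_engine.query(entity)   # reputation indicator
    if indicator == "reputable":
        return "forward"      # deliver the communication to the recipient
    if indicator == "non-reputable":
        return "drop"
    return "quarantine"       # unknown entities get additional scrutiny
```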
  • Examples of computer readable media, operable on a processor to aggregate local reputation data to produce a global reputation vector, can perform the steps of: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and responding to the reputation query with the global reputation.
  • Other example systems can include a communications interface and a reputation engine.
  • the communications interface can receive global reputation information from a central server, the global reputation being associated with an entity.
  • the reputation engine can bias the global reputation received from the central server based upon defined local preferences.
  • Further example systems can include a communications interface, a reputation module and a traffic control module.
  • the communications interface can receive distributed reputation information from distributed reputation engines.
  • the reputation module can aggregate the distributed reputation information and derive a global reputation based upon the aggregation of the distributed reputation information. The reputation module can also derive local reputation information based upon communications received by the reputation module.
  • the traffic control module can determine handling associated with communications based upon the global reputation and the local reputation.
  • Systems and methods used to aggregate reputation information can include a centralized reputation engine and an aggregation engine.
  • the centralized reputation engine can receive feedback from a plurality of local reputation engines.
  • the aggregation engine can derive a global reputation for a queried entity based upon an aggregation of the plurality of local reputations.
  • the centralized reputation engine can further provide the global reputation of the queried entity to a local reputation engine responsive to receiving a reputation query from that local reputation engine.
  • Methods of aggregating reputation information can include: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and responding to the reputation query with the global reputation.
  • Examples of computer readable media, operable on a processor to aggregate local reputation data to produce a global reputation vector, can perform the steps of: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and responding to the reputation query with the global reputation.
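  • A minimal sketch of those five steps, assuming each local reputation engine exposes a reputation_for() method and reports a per-category score vector (both assumptions made for illustration):

```python
# Sketch of global reputation aggregation: retrieve the local reputations,
# aggregate them (here, an unweighted per-category mean), and respond.
def respond_to_query(entity, local_engines):
    vectors = [engine.reputation_for(entity) for engine in local_engines]
    categories = {c for v in vectors for c in v}
    global_reputation = {
        c: sum(v.get(c, 0.0) for v in vectors) / len(vectors)
        for c in categories
    }  # assumes at least one local engine has responded
    return global_reputation   # the response to the reputation query
```

  • A confidence-weighted variant of this aggregation appears with the discussion of FIG. 6 below.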
  • reputation aggregation systems can include a communications interface and a reputation engine.
  • the communications interface can receive global reputation information from a central server, the global reputation being associated with an entity.
  • the reputation engine can bias the global reputation received from the central server based upon defined local preferences.
  • reputation aggregation systems can include a communications interface, a reputation module and a traffic control module.
  • the communications interface can receive distributed reputation information from distributed reputation engines.
  • the reputation module can aggregate the distributed reputation information and derive a global reputation based upon the aggregation of the distributed reputation information. The reputation module can also derive local reputation information based upon communications received by the reputation module.
  • the traffic control module can determine handling associated with communications based upon the global reputation and the local reputation.
  • Systems and methods for a reputation based network security system are provided.
  • a reputation based network security system can include a communications interface, a communication analyzer, a reputation engine and a security engine.
  • the communications interface can receive incoming and outgoing communications associated with a network.
  • the communication analyzer can derive an external entity associated with a communication.
  • the reputation engine can derive a reputation vector associated with the external entity.
  • the security engine can receive the reputation vector and send the communication to interrogation engines, wherein the security engine determines which of the interrogation engines to apply to the communication based upon the reputation vector.
  • reputation based network security systems can include a communications interface, a communications analyzer, a reputation engine and a security engine.
  • the communications interface can receive incoming and outgoing communications associated with a network.
  • the communication analyzer can derive an external entity associated with a communication.
  • the reputation engine can derive a reputation associated with the external entity.
  • the security engine assigns priority information to a communication, wherein the security engine can assign a high priority to communications where the external entity is a reputable entity and can assign a low priority to communications where the external entity is a non-reputable entity, whereby the priority information is used by one or more interrogation engines to improve quality of service for reputable entities.
  • Methods of efficiently processing communication based on reputation for security threats can include: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning a priority to the communication based upon the derived reputation associated with the external entity; executing one or more tests on the communication based upon the priority assigned to the communication.
  • Methods of efficiently processing communication based on reputation can include: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning the communication to one or more interrogation engines selected from among a plurality of interrogation engines, the selection of the one or more interrogation engines being based upon the derived reputation associated with the external entity and capacity of the interrogation engines; and executing said one or more interrogation engines on the communication.
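  • The following sketch illustrates one way the priority assignment and interrogation-engine selection described above could fit together; the engine attributes (category, load) are assumptions made for the example.

```python
# Sketch: priority follows the derived reputation, and interrogation
# engines are selected by reputation category and spare capacity.
def schedule_interrogation(reputation: dict, engines: list):
    # reputable senders get high priority, improving their quality of service
    priority = "high" if reputation.get("overall", 0.0) >= 0.0 else "low"
    # select engines that test for the categories the sender is suspected of,
    # preferring engines with the most spare capacity
    relevant = [e for e in engines if reputation.get(e.category, 0.0) < 0.0]
    chosen = sorted(relevant, key=lambda e: e.load)
    return priority, chosen
```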
  • Systems and methods for reputation based connection throttling are provided.
  • Systems used for reputation based connection throttling can include a communications interface, a reputation engine and a connection control engine.
  • the communications interface can receive connection requests associated with an external entity prior to a connection being established to the external entity.
  • the reputation engine can derive a reputation associated with the external entity.
  • the connection control engine can deny connection requests to a protected network based upon the derived reputation of the external entity.
  • Methods of throttling connection requests based upon reputation can include: receiving a connection request, the connection request being related to an external entity; querying a reputation engine for a reputation associated with the external entity; comparing the reputation to a policy associated with a protected enterprise network; permitting the connection request based upon determining that the reputation of the external entity related to the connection request complies with the policy; and throttling the connection request based upon determining that the reputation of the external entity related to the connection request does not comply with the policy.
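  • As a sketch, with an assumed policy structure and illustrative thresholds, the throttling method could be expressed as follows:

```python
# Sketch of reputation-based connection throttling: the decision is made
# before the connection is established. Thresholds are illustrative and
# assume permit_threshold > deny_threshold.
def handle_connection_request(entity, reputation_engine, policy):
    reputation = reputation_engine.query(entity)
    if reputation >= policy["permit_threshold"]:
        return {"action": "permit"}
    if reputation >= policy["deny_threshold"]:
        # comply partially: allow the connection but cap its packet rate
        return {"action": "throttle",
                "max_packets_per_sec": policy["rate_cap"]}
    return {"action": "deny"}
```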
  • FIG. 1 is a block diagram depicting an example network in which systems and methods of this disclosure can operate.
  • FIG. 2 is a block diagram depicting an example network architecture of this disclosure.
  • FIG. 3 is a block diagram depicting an example of communications and entities including identifiers and attributes used to detect relationships between entities.
  • FIG. 4 is a flowchart depicting an operational scenario used to detect relationships and assign risk to entities.
  • FIG. 5 is a block diagram illustrating an example network architecture including local reputations stored by local security agents and a global reputation stored by one or more servers.
  • FIG. 6 is a block diagram illustrating a determination of a global reputation based on local reputation feedback.
  • FIG. 7 is a flow diagram illustrating an example resolution between a global reputation and a local reputation.
  • FIG. 8 is an example graphical user interface for adjusting the settings of a filter associated with a reputation server.
  • FIG. 9 is a block diagram illustrating reputation based connection throttling for voice over internet protocol (VoIP) or short message service (SMS) communications.
  • FIG. 10 is a block diagram illustrating a reputation based load balancer.
  • FIG. 11A is a flowchart illustrating an example operational scenario for geolocation based authentication.
  • FIG. 11B is a flowchart illustrating another example operational scenario for geolocation based authentication.
  • FIG. 11C is a flowchart illustrating another example operational scenario for geolocation based authentication.
  • FIG. 12 is a flowchart illustrating an example operational scenario for a reputation based dynamic quarantine.
  • FIG. 13 is an example graphical user interface display of an image spam communication.
  • FIG. 14 is a flowchart illustrating an example operational scenario for detecting image spam.
  • FIG. 15A is a flowchart illustrating an operational scenario for analyzing the structure of a communication.
  • FIG. 15B is a flowchart illustrating an operational scenario for analyzing the features of an image.
  • FIG. 15C is a flowchart illustrating an operational scenario for normalizing an image for spam processing.
  • FIG. 15D is a flowchart illustrating an operational scenario for analyzing the fingerprint of an image to find common fragments among multiple images.
  • FIG. 1 is a block diagram depicting an example network environment in which systems and methods of this disclosure can operate.
  • Security agent 100 can typically reside between a firewall system (not shown) and servers (not shown) internal to a network 110 (e.g., an enterprise network).
  • the network 110 can include a number of servers, including, for example, electronic mail servers, web servers, and various application servers as may be used by the enterprise associated with the network 110.
  • the security agent 100 monitors communications entering and exiting the network 110. These communications are typically received through the internet 120 from many entities 130a-f that are connected to the internet 120.
  • One or more of the entities 130a-f can be legitimate originators of communications traffic. However, one or more of the entities 130a-f can also be non-reputable entities originating unwanted communications.
  • the security agent 100 includes a reputation engine.
  • the reputation engine can inspect a communication and determine a reputation associated with an entity that originated the communication. The security agent 100 then performs an action on the communication based upon the reputation of the originating entity. If the reputation indicates that the originator of the communication is reputable, for example, the security agent can forward the communication to the recipient of the communication. However, if the reputation indicates that the originator of the communication is non-reputable, for example, the security agent can quarantine the communication, perform more tests on the message, or require authentication from the message originator, among many others. Reputation engines are described in detail in United States Patent Publication No. 2006/0015942, which is hereby incorporated by reference.
  • FIG. 2 is a block diagram depicting an example network architecture of this disclosure.
  • Security agents 100a-n are shown logically residing between networks 110a-n, respectively, and the internet 120. While not shown in FIG. 2, it should be understood that a firewall may be installed between the security agents 100a-n and the internet 120 to provide protection from unauthorized communications from entering the respective networks 110a-n. Moreover, intrusion detection systems (IDS) (not shown) can be deployed in conjunction with firewall systems to identify suspicious patterns of activity and to signal alerts when such activity is identified.
  • sender information included in the communication can be used to help determine whether or not a communication is legitimate.
  • sophisticated security agents 100a-n can track entities and analyze the characteristics of the entities to help determine whether to allow a communication to enter a network 110a-n.
  • the entities 130a-e can then be assigned a reputation. Decisions on a communication can take into account the reputation of an entity 130a-e that originated the communication.
  • one or more central systems 200 can collect information on entities 130a-e and distribute the collected data to other central systems 200 and/or the security agents 100a-n. Reputation engines can assist in identifying the bulk of the malicious communications without extensive and potentially costly local analysis of the content of the communication.
  • Reputation engines can also help to identify legitimate communications and prioritize their delivery and reduce the risk of misclassifying a legitimate communication.
  • reputation engines can provide dynamic and predictive approaches to the problem of identifying malicious, as well as legitimate, transactions in physical or virtual worlds. Examples include the process of filtering malicious communications in an email, instant messaging, VoIP, SMS or other communication protocol system using analysis of the reputation of sender and content.
  • a security agent 100a-n can then apply a global or local policy to the reputation result to determine what action to perform with respect to the communication (such as deny, quarantine, load balance, deliver with assigned priority, or analyze locally with additional scrutiny).
  • an entity 130a-e can connect to the internet in a variety of methods.
  • an entity 130a-e can have multiple identifiers (such as, for example, e-mail addresses, IP addresses, identifier documentation, etc.) at the same time or over a period of time.
  • a mail server with changing IP addresses can have multiple identities over time.
  • one identifier can be associated with multiple entities, such as, for example, when an IP address is shared by an organization with many users behind it.
  • the specific method used to connect to the internet can obscure the identification of the entity 130a-e.
  • an entity 130b may connect to the internet using an internet service provider (ISP) 200. Many ISPs 200 use dynamic host configuration protocol (DHCP) to assign IP addresses dynamically, such that the same entity can appear under different IP addresses over time.
  • Entities 130a-e can also disguise their identity by spoofing a legitimate entity.
  • collecting data on the characteristics of each entity 130a-e can help to categorize an entity 130a-e and determine how to handle a communication.
  • the ease of creation and spoofing of identities in both the virtual and physical worlds can create an incentive for users to act maliciously without bearing the consequences of that act.
  • the theft of an IP address on the Internet (or a passport in the physical world) belonging to a legitimate entity can enable a criminal to participate in malicious activity with relative ease by assuming the stolen identity.
  • reputation systems can influence reputable and non-reputable entities to operate responsibly for fear of becoming non-reputable, and being unable to correspond or interact with other network entities.
  • FIG. 3 is a block diagram depicting an example of communications and entities, including identifiers and attributes used to detect relationships between entities.
  • Security agents 100a-b can collect data by examining communications that are directed to an associated network.
  • Security agents 100a-b can also collect data by examining communications that are relayed by an associated network. Examination and analysis of communications can allow the security agents 100a-b to collect information about the entities 300a-c sending and receiving messages, including transmission patterns, volume, or whether the entity has a tendency to send certain kinds of message (e.g., legitimate messages, spam, virus, bulk mail, etc.), among many others.
  • each of the entities 300a-c is associated with one or more identifiers 310a-c, respectively.
  • the identifiers 310a-c can include, for example, IP addresses, universal resource locator (URL), phone number, IM username, message content, domain, or any other identifier that might describe an entity.
  • the identifiers 310a-c are associated with one or more attributes 320a-c.
  • the attributes 320a-c are fitted to the particular identifier 310a-c that is being described.
  • a message content identifier could include attributes such as, for example, malware, volume, type of content, behavior, etc.
  • attributes 320a-c associated with an identifier, such as an IP address, could include one or more IP addresses associated with an entity 300a-c.
  • this data can be collected from communications 330a-c (e.g., e-mail), which typically include some identifiers and attributes of the entity that originated the communication.
  • the communications 330a-c provide a transport for communicating information about the entity to the security agents 100a, 100b.
  • These attributes can be detected by the security agents 100a, 100b through examination of the header information included in the message, analysis of the content of the message, as well as through aggregation of information previously collected by the security agents 100a, 100b (e.g., totaling the volume of communications received from an entity).
  • the data from multiple security agents 100a, 100b can be aggregated and mined.
  • the data can be aggregated and mined by a central system which receives identifiers and attributes associated with all entities 300a-c for which the security agents 100a, 100b have received communications.
  • the security agents 100a, 100b can operate as a distributed system, communicating identifier and attribute information about entities 300a-c with each other.
  • the process of mining the data can correlate the attributes of entities 300a-c with each other, thereby determining relationships between entities 300a-c (such as, for example, correlations between an event occurrence, volume, and/or other determining factors). These relationships can then be used to establish a multi-dimensional reputation "vector" for all identifiers based on the correlation of attributes that have been associated with each identifier.
  • the security agent 100a can determine whether all or a portion of the first set of attributes 350a matches all or a portion of the second set of attributes 350b. When some portion of the first set of attributes 350a matches some portion of the second set of attributes 350b, a relationship can be created depending upon the particular identifiers 340a, 340b that included the matching attributes 350a, 350b.
  • the particular identifiers 340a, 340b which are found to have matching attributes can be used to determine a strength associated with the relationship between the entities 300a, 300b.
  • the strength of the relationship can help to determine how much of the non-reputable qualities of the non-reputable entity 300a are attributed to the reputation of the unknown entity 300b.
  • the unknown entity 300b may originate a communication 330c which includes attributes 350c that match some attributes 350d of a communication 330d originating from a known reputable entity 300c.
  • the particular identifiers 340c, 340d which are found to have matching attributes can be used to determine a strength associated with the relationship between the entities 300b, 300c.
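  • A sketch of the attribute-matching step follows. The identifier weights are invented for illustration; the document only says that the matching identifier type determines the strength of the relationship.

```python
# Sketch: relationship strength between two entities from the overlap of
# their attribute sets, weighted by the matching identifier type.
IDENTIFIER_WEIGHTS = {"ip": 0.9, "url": 0.7, "domain": 0.5, "content": 0.3}

def relationship_strength(attrs_a: dict, attrs_b: dict) -> float:
    """attrs_* map identifier type -> set of observed attribute values."""
    strength = 0.0
    for id_type, weight in IDENTIFIER_WEIGHTS.items():
        a, b = attrs_a.get(id_type, set()), attrs_b.get(id_type, set())
        if a and b:
            overlap = len(a & b) / len(a | b)    # Jaccard overlap
            strength = max(strength, weight * overlap)
    return strength   # 0.0 (unrelated) .. 1.0 (strongly related)
```

  • In this sketch, the strength would then scale how much of a non-reputable neighbor's reputation is attributed to the unknown entity, as described above.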
  • a distributed reputation engine also allows for real-time collaborative sharing of global intelligence about the latest threat landscape, providing instant protection benefits to the local analysis that can be performed by a filtering or risk analysis system, as well as identifying malicious sources of potential new threats before they even occur.
  • with sensors positioned at many different geographical locations, information about new threats can be quickly collected and shared with the central system 200, or with the distributed security agents 100a, 100b.
  • distributed sensors can include the local security agents 100a, 100b, as well as local reputation clients, traffic monitors, or any other device suitable for collecting communication data (e.g., switches, routers, servers, etc.).
  • security agents 100a, 100b can communicate with a central system 200 to provide sharing of threat and reputation information.
  • the security agents 100a, 100b can communicate threat and reputation information between each other to provide up to date and accurate threat information.
  • the first security agent 100a has information about the relationship between the unknown entity 300b and the non-reputable entity 300a
  • the second security agent 100b has information about the relationship between the unknown entity 300b and the reputable entity 300c.
  • the first security agent 100a may take a particular action on the communication based upon the detected relationship.
  • the first security agent 100a might take a different action with a received communication from the unknown entity 300b. Sharing of the relationship information between security agents thus provides a more complete set of relationship information upon which a determination can be made.
  • the system attempts to assign reputations (reflecting a general disposition and/or categorization) to physical entities, such as individuals or automated systems performing transactions. In the virtual world, entities are represented by identifiers (e.g., IPs, URLs, content) that are tied to those entities in the specific transactions (such as sending a message or transferring money out of a bank account) that the entities are performing. Reputation can thus be assigned to those identifiers based on their overall behavioral and historical patterns as well as their relationship to other identifiers, such as the relationship of IPs sending messages and URLs included in those messages.
  • a "bad" reputation for a single identifier can cause the reputation of other neighboring identifiers to worsen, if there is a strong correlation between the identifiers.
  • an IP that is sending URLs which have a bad reputation will worsen its own reputation because of the reputation of the URLs.
  • the individual identifier reputations can be aggregated into a single reputation (risk score) for the entity that is associated with those identifiers.
  • attributes can fall into a number of categories.
  • evidentiary attributes can represent physical, digital, or digitized physical data about an entity. This data can be attributed to a single known or unknown entity, or shared between multiple entities (forming entity relationships). Examples of evidentiary attributes relevant to messaging security include IP (internet protocol) address, known domain names, URLs, digital fingerprints or signatures used by the entity, TCP signatures, etc.
  • behavioral attributes can represent human or machine-assigned observations about either an entity or an evidentiary attribute. Such attributes may include one, many, or all attributes from one or more behavioral profiles. For example, a behavioral attribute generically associated with a spammer may be a high volume of communications being sent from that entity.
  • a number of behavioral attributes for a particular type of behavior can be combined to derive a behavioral profile.
  • a behavioral profile can contain a set of predefined behavioral attributes.
  • the attributive properties assigned to these profiles include behavioral events relevant to defining the disposition of an entity matching the profile. Examples of behavioral profiles relevant to messaging security might include "Spammer", "Scammer", and "Legitimate Sender". Events and/or evidentiary attributes relevant to each profile define appropriate entities to which a profile should be assigned. This may include a specific set of sending patterns, blacklist events, or specific attributes of the evidentiary data. Some examples include: Sender/Receiver Identification; Time Interval and sending patterns; Severity and disposition of payload; Message construction; Message quality; and Protocols and related signatures.
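  • One possible encoding of such a profile, sketched here with invented attribute names, is a named set of predefined behavioral attributes that an entity must exhibit:

```python
# Sketch of a behavioral profile as a predefined set of behavioral
# attributes; an entity matches when it exhibits all of them.
from dataclasses import dataclass, field

@dataclass
class BehavioralProfile:
    name: str
    required_attributes: set = field(default_factory=set)

    def matches(self, observed: set) -> bool:
        return self.required_attributes <= observed

SPAMMER = BehavioralProfile("Spammer",
                            {"high_send_volume", "bursty_send_intervals"})
# SPAMMER.matches({"high_send_volume", "bursty_send_intervals"}) -> True
```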
  • FIG. 4 is a flowchart depicting an operational scenario 400 used to detect relationships and assign risk to entities.
  • the operational scenario begins at step 410 by collecting network data.
  • Data collection can be done, for example, by a security agent 100, a client device, a switch, a router, or any other device operable to receive communications from network entities (e.g., e-mail servers, web servers, IM servers, ISPs, file transfer protocol (FTP) servers, gopher servers, VoIP equipment, etc.).
  • At step 420, identifiers are associated with the collected data (e.g., communication data).
  • Step 420 can be performed by a security agent 100 or by a central system 200 operable to aggregate data from a number of sensor devices, including, for example, one or more security agents 100.
  • step 420 can be performed by the security agents 100 themselves.
  • the identifiers can be based upon the type of communication received.
  • an e-mail can include one set of information (e.g., IP address of originator and destination, text content, attachment, etc.), while a VoIP communication can include a different set of information (e.g., originating phone number (or IP address if originating from a VoIP client), receiving phone number (or IP address if destined for a VoIP phone), voice content, etc.).
  • Step 420 can also include assigning the attributes of the communication to the associated identifiers.
  • At step 430, the attributes associated with the entities are analyzed to determine whether any relationships exist between entities for which communications information has been collected.
  • Step 430 can be performed, for example, by a central system 200 or one or more distributed security agents 100.
  • the analysis can include comparing attributes related to different entities to find relationships between the entities. Moreover, based upon the particular attribute which serves as the basis for the relationship, a strength can be associated with the relationship.
  • a risk vector is assigned to the entities.
  • the risk vector can be assigned by the central system 200 or by one or more security agents 100.
  • the risk vector assigned to an entity 130 (FIGS. 1-2), 300 (FIG. 3) can be based upon the relationship found between the entities and on the basis of the identifier which formed the basis for the relationship.
  • an action can be performed based upon the risk vector.
  • the action can be performed, for example, by a security agent 100.
  • the action can be performed on a received communication associated with an entity for which a risk vector has been assigned.
  • the action can include any of allow, deny, quarantine, load balance, deliver with assigned priority, or analyze locally with additional scrutiny, among many others.
  • a reputation vector can be derived separately.
  • FIG. 5 is a block diagram illustrating an example network architecture including local reputations 500a-e derived by local reputation engines 510a-e and a global reputation 520 stored by one or more servers 530.
  • the local reputation engines 510a-e can be associated with local security agents such as security agents 100.
  • the local reputation engines 510a-e can be associated, for example, with a local client.
  • Each of the reputation engines 510a-e includes a list of one or more entities for which the reputation engine 510a-e stores a derived reputation 500a-e.
  • reputation engine 1 510a may include a reputation that indicates a particular entity is reputable
  • reputation engine 2 510b may include a reputation that indicates that the same entity is non-reputable.
  • These local reputational inconsistencies can be based upon different traffic received from the entity.
  • the inconsistencies can be based upon the feedback from a user of local reputation engine 1 510a indicating a communication is legitimate, while a user of local reputation engine 2 510b provides feedback indicating that the same communication is not legitimate.
  • the server 530 receives reputation information from the local reputation engines 510a-e. However, as noted above, some of the local reputation information may be inconsistent with other local reputation information.
  • the server 530 can arbitrate between the local reputations 500a-e to determine a global reputation 520 based upon the local reputation information 500a-e. In some examples, the global reputation information 520 can then be provided back to the local reputation engines 510a-e to provide these local engines 510a-e with up-to-date reputational information.
  • the local reputation engines 510a-e can be operable to query the server 530 for reputation information. In some examples, the server 530 responds to the query with global reputation information 520.
  • the server 530 applies a local reputation bias to the global reputation 520.
  • the local reputation bias can perform a transform on the global reputation to provide the local reputation engines 510a-e with a global reputation vector that is biased based upon the preferences of the particular local reputation engine 510a-e which originated the query.
  • a local reputation engine 510a with an administrator or user(s) that has indicated a high tolerance for spam messages can receive a global reputation vector that accounts for the indicated tolerance.
  • the particular components of the reputation vector returned to the reputation engine 510a might include portions of the reputation vector that are deemphasized in relation to the rest of the reputation vector.
  • a local reputation engine 510b that has indicated, for example, a low tolerance for communications from entities with reputations for originating viruses may receive a reputation vector that amplifies the components of the reputation vector that relate to virus reputation.
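  • A sketch of that bias transform, assuming the preferences arrive as per-category multipliers (an assumption; the document does not fix a representation):

```python
# Sketch: bias a global reputation vector toward a querying engine's
# local preferences. Multipliers < 1.0 deemphasize a tolerated category;
# multipliers > 1.0 amplify a sensitive one.
def apply_local_bias(global_vector: dict, preferences: dict) -> dict:
    return {category: score * preferences.get(category, 1.0)
            for category, score in global_vector.items()}

# apply_local_bias({"spam": -0.8, "virus": -0.2},
#                  {"spam": 0.5, "virus": 2.0})
# -> {"spam": -0.4, "virus": -0.4}
```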
  • FIG. 6 is a block diagram illustrating a determination of a global reputation based on local reputation feedback.
  • a local reputation engine 600 is operable to send a query through a network 610 to a server 620.
  • the local reputation engine 600 originates a query in response to receiving a communication from an unknown entity.
  • the local reputation engine 600 can originate the query responsive to receiving any communications, thereby promoting use of more up-to-date reputation information.
  • the server 620 is operable to respond to the query with a global reputation determination.
  • the central server 620 can derive the global reputation using a global reputation aggregation engine 630.
  • the global reputation aggregation engine 630 is operable to receive a plurality of local reputations 640 from a respective plurality of local reputation engines.
  • the plurality of local reputations 640 can be periodically sent by the reputation engines to the server 620.
  • the plurality of local reputations 640 can be retrieved by the server upon receiving a query from one of the local reputation engines 600.
  • the local reputations can be combined by weighting them using confidence values related to each of the local reputation engines and then accumulating the results.
  • the confidence value can indicate the confidence associated with a local reputation produced by an associated reputation engine.
  • Reputation engines associated with individuals, for example, can receive a lower weighting in the global reputation determination.
  • local reputations associated with reputation engines operating on large networks can receive greater weight in the global reputation determination based upon the confidence value associated with that reputation engine.
  • the confidence values 650 can be based upon feedback received from users. For example, a reputation engine can be assigned low confidence values 650 for its local reputations 640 if it receives frequent feedback indicating that communications were not properly handled because the local reputation information 640 indicated the wrong action. Similarly, a reputation engine can be assigned a high confidence value 650 for its local reputations 640 if it receives feedback indicating that communications were handled correctly because the local reputation information 640 indicated the correct action.
  • Adjustment of the confidence values associated with the various reputation engines can be accomplished using a tuner 660, which is operable to receive input information and to adjust the confidence values based upon the received input. In some examples, the confidence values 650 can be provided to the server 620 by the reputation engine itself based upon stored statistics for incorrectly classified entities. In other examples, information used to weight the local reputation information can be communicated to the server 620.
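  • A sketch of the confidence-weighted combination and the feedback-driven tuner follows; the update rule is invented for illustration.

```python
# Sketch: combine local reputations using per-engine confidence values,
# and tune a confidence value from handling feedback.
def aggregate(local_reputations, confidences):
    total = sum(confidences)          # assumes at least one positive value
    return sum(r * c for r, c in zip(local_reputations, confidences)) / total

def tune_confidence(confidence, handled_correctly, rate=0.05):
    # correct handling nudges confidence up; misclassification, down
    delta = rate if handled_correctly else -rate
    return min(1.0, max(0.0, confidence + delta))
```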
  • a bias 670 can be applied to the resulting global reputation vector.
  • the bias 670 can normalize the reputation vector to provide a normalized global reputation vector to a reputation engine 600.
  • the bias 670 can be applied to account for local preferences associated with the reputation engine 600 originating the reputation query.
  • a reputation engine 600 can receive a global reputation vector matching the defined preferences of the querying reputation engine 600.
  • the reputation engine 600 can take an action on the communication based upon the global reputation vector received from the server 620.
  • FIG. 7 is a block diagram illustrating an example resolution between a global reputation and a local reputation.
  • the local security agent 700 communicates with a server 720 to retrieve global reputation information from the server 720.
  • the local security agent 700 can receive a communication at 702.
  • the local security agent can correlate the communication to identify attributes of the message at 704.
  • the attributes of the message can include, for example, an originating entity, a fingerprint of the message content, a message size, etc.
  • the local security agent 700 includes this information in a query to the server 720. In other examples, the local security agent 700 can forward the entire message to the server 720, and the server can perform the correlation and analysis of the message.
  • the server 720 uses the information received from the query to determine a global reputation based upon a configuration 725 of the server 720.
  • the configuration 725 can include a plurality of reputation information, including both information indicating that a queried entity is non-reputable 730 and information indicating that a queried entity is reputable 735.
  • the configuration 725 can also apply a weighting 740 to each of the aggregated reputations 730, 735.
  • a reputation score determinator 745 can provide the engine for weighting 740 the aggregated reputation information 730, 735 and producing a global reputation vector.
  • the local security agent 700 then sends a query to a local reputation engine at 706.
  • the local reputation engine 708 performs a determination of the local reputation and returns a local reputation vector at 710.
  • the local security agent 700 also receives a response to the reputation query sent to the server 720 in the form of a global reputation vector.
  • the local security agent 700 then mixes the local and global reputation vectors together at 712. An action is then taken with respect to the received message at 714.
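  • The mixing at 712 might be sketched as a weighted blend; the weight is an assumption, since the document leaves the mixing function open.

```python
# Sketch: mix a local and a global reputation vector per category.
def mix_reputations(local: dict, global_: dict, local_weight: float = 0.5) -> dict:
    categories = set(local) | set(global_)
    return {c: local_weight * local.get(c, 0.0)
               + (1.0 - local_weight) * global_.get(c, 0.0)
            for c in categories}
```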
  • FIG. 8 is an example graphical user interface 800 for adjusting the settings of a filter associated with a reputation server.
  • the graphical user interface 800 can allow the user of a local security agent to adjust the settings of a local filter in several different categories 810, such as, for example, "Virus," "Worms," "Trojan Horse," "Phishing," "Spyware," "Spam," "Content," and "Bulk."
  • the categories 810 depicted are merely examples, and the disclosure is not limited to the categories 810 chosen as examples here.
  • the categories 810 can be divided into two or more types of categories.
  • the categories 810 of FIG. 8 are divided into a "Security Settings" type 820 of category 810, and a "Policy Settings" type 830 of category 810.
  • a mixer bar representation 840 can allow the user to adjust the particular filter setting associated with the respective category 810 of communications or entity reputations.
  • categories 810 of "Policy Settings" type 830 can be adjusted freely based upon the user's own judgment.
  • categories of "Security Settings" type 820 can be limited to adjustment within a range. This distinction can be made in order to prevent a user from altering the security settings of the security agent beyond an acceptable range. For example, a disgruntled employee could attempt to lower the security settings, thereby leaving an enterprise network vulnerable to attack.
  • the ranges 850 placed on categories 810 in the "Security Settings" type 820 are operable to keep security at a minimum level to prevent the network from being compromised.
  • the "Policy Settings" type 830 categories 810 are those types of categories 810 that would not compromise the security of a network, but might only inconvenience the user or the enterprise if the settings were lowered.
  • range limits 850 can be placed upon all of the categories 810.
  • the local security agent would prevent users from setting the mixer bar representation 840 outside of the provided range 850.
  • the ranges may not be shown on the graphical user interface 800. Instead, the range 850 would be abstracted out of the graphical user interface 800 and all of the settings would be relative settings.
  • the category 810 could display and appear to allow a full range of settings, while transforming the setting into a setting within the provided range.
  • the "Virus" category 810 range 850 is provided in this example as being between level markers 8 and 13.
  • the "Virus" category 810 would allow setting of the mixer bar representation 840 anywhere between 0 and 14. However, the graphical user interface 800 could transform the 0-14 setting to a setting within the 8 to 13 range 850. Thus, if a user requested a setting of midway between 0 and 14, the graphical user interface could transform that setting into a setting of midway between 8 and 13.
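  • That displayed-to-effective mapping is a simple linear transform, sketched here with the example's numbers:

```python
# Sketch: compress a full 0-14 slider into the enforced 8-13 range, so a
# requested midpoint of 0-14 becomes the midpoint of 8-13.
def to_effective_setting(displayed: float, display_max: float = 14.0,
                         range_lo: float = 8.0, range_hi: float = 13.0) -> float:
    return range_lo + (displayed / display_max) * (range_hi - range_lo)

# to_effective_setting(7.0) -> 10.5, midway between 8 and 13
```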
  • FIG. 9 is a block diagram illustrating reputation based connection throttling for voice over internet protocol (VoIP) or short message service (SMS) communications.
  • an originating IP phone 900 can place a VoIP call to a receiving IP phone 910.
  • These IP phones 900, 910 can be, for example, computers executing soft-phone software, network enabled phones, etc.
  • the originating IP phone 900 can place a VoIP call through a network 920 (e.g., the internet).
  • the receiving IP phone 910 can receive the VoIP call through a local network 930 (e.g., an enterprise network).
  • Upon establishing a VoIP call, the originating IP phone 900 has established a connection to the local network 930.
  • This connection can be exploited similarly to the way e-mail, web, instant messaging, or other internet applications can be exploited to provide unregulated connections to a network.
  • a connection to a receiving IP phone can be exploited, thereby putting computers 940, 950 operating on the local network 930 at risk for intrusion, viruses, trojan horses, worms, and various other types of attacks based upon the established connection.
  • these communications are typically not examined to ensure that the connection is not being misused because, for example, voice conversations occur in real-time.
  • a local security agent 960 can use reputation information received from a reputation engine or server 970 to determine a reputation associated with the originating IP phone.
  • the local security agent 960 can use the reputation of the originating entity to determine whether to allow a connection to the originating entity.
  • the security agent 960 can prevent connections to non-reputable entities, as indicated by reputations that do not comply with the policy of the local security agent 960.
  • the local security agent 960 can include a connection throttling engine operable to control the flow rate of packets being transmitted using the connection established between the originating IP phone 900 and the receiving IP phone 910.
  • an originating entity 900 with a non-reputable reputation can be allowed to make a connection to the receiving IP phone 910.
  • the packet throughput will be capped, thereby preventing the originating entity 900 from exploiting the connection to attack the local network 930.
  • the throttling of the connection can be accomplished by performing a detailed inspection of any packets originating from non-reputable entities. As discussed above, the detailed inspection of all VoIP packets is not efficient.
  • Standard communication interrogation techniques can be performed on connections associated with non-reputable entities in order to discover whether any of the transmitted packets received from the originating entity comprise a threat to the network 930.
  • Various interrogation techniques and systems are described in U.S. Patent No. 6,941,467, No. 7,089,590, No. 7,096,498, and No. 7,124,438 and in U.S. Patent Application Nos. 2006/0015942, 2006/0015563, 2003/0172302, 2003/0172294, 2003/0172291, and 2003/0172166, which are hereby incorporated by reference.
  • the load balancer 1000 is operable to receive communications from reputable and non-reputable entities 1010, 1020 (respectively) through a network 1030 (e.g., the internet).
  • the load balancer 1000 communicates with a reputation engine 1040 to determine the reputation of entities 1010, 1020 associated with incoming or outgoing communications.
  • the reputation engine 1040 is operable to provide the load balancer with a reputation vector.
  • the reputation vector can indicate the reputation of the entity 1010, 1020 associated with the communication in a variety of different categories. For example, the reputation vector might indicate a good reputation for an entity 1010, 1020 with respect to the entity 1010, 1020 originating spam, while also indicating a poor reputation for the same entity 1010, 1020 with respect to that entity 1010, 1020 originating viruses.
  • the load balancer 1000 can use the reputation vector to determine what action to perform with respect to a communication associated with that entity 1010, 1020. In situations where a reputable entity 1010 is associated with the communication, the message is sent to a message transfer agent (MTA) 1050 and delivered to a recipient 1060.
  • MTA message transfer agent
  • the communication is forwarded to one of a plurality of virus detectors 1070.
  • the load balancer 1000 is operable to determine which of the plurality of virus detectors 1070 to use based upon the current capacity of the virus detectors and the reputation of the originating entity. For example, the load balancer 1000 could send the communication to the least utilized virus detector.
  • the load balancer 1000 might determine a degree of non-reputability associated with the originating entity and send slightly non-reputable communications to the least utilized virus detectors, while sending highly non-reputable communications to a highly utilized virus detector, thereby throttling the QoS of a connection associated with a highly non-reputable entity.
  • the load balancer can send the communication to specialized spam detectors 1080 to the exclusion of other types of testing. It should be understood that in situations where a communication is associated with a non-reputable entity 1020 that originates multiple types of non-reputable activity, the communication can be sent to be tested for each of the types of non-reputable activity that the entity 1020 is known to display, while avoiding tests associated with non-reputable activity that the entity 1020 is not known to display.
  • every communication can receive routine testing for multiple types of non-legitimate content. However, when an entity 1020 associated with the communication shows a reputation for certain types of activity, the communication can also be quarantined for detailed testing for the content that the entity shows a reputation for originating. In yet further examples, every communication may receive the same type of testing. However, communications associated with reputable entities 1010 are sent to the testing modules with the shortest queue or to testing modules with spare processing capacity. On the other hand, communications associated with non-reputable entities 1020 are sent to testing modules 1070, 1080 with the longest queue.
  • communications associated with reputable entities 1010 can receive priority in delivery over communications associated with non-reputable entities. Quality of service is therefore maximized for reputable entities 1010, while being reduced for non-reputable entities 1020.
  • reputation based load balancing can protect the network from exposure to attack by reducing the ability of a non-reputable entity to connect to the network 930.
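  • Pulling the FIG. 10 discussion together, one possible routing policy is sketched below; detector objects with a queue_len attribute and the numeric thresholds are assumptions made for the example.

```python
# Sketch of reputation-based load balancing: reputable senders go straight
# to the MTA; others go to detectors chosen by abused category and current
# utilization, degrading quality of service for highly non-reputable ones.
def route(reputation: dict, virus_detectors: list, spam_detectors: list):
    # assumes non-empty detector pools
    if all(score >= 0.0 for score in reputation.values()):
        return "mta"                               # deliver to the recipient
    if reputation.get("spam", 0.0) < 0.0 and reputation.get("virus", 0.0) >= 0.0:
        # spam-only reputation: specialized spam detectors, least utilized first
        return min(spam_detectors, key=lambda d: d.queue_len)
    detectors = sorted(virus_detectors, key=lambda d: d.queue_len)
    # slightly non-reputable -> least utilized; highly non-reputable -> most
    index = 0 if reputation.get("virus", 0.0) > -0.5 else len(detectors) - 1
    return detectors[index]
```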
  • FIG. 11A is a flowchart illustrating an example operational scenario for collection of geolocation based data for authentication analysis.
  • the operational scenario collects data from various login attempts.
  • Step 1100 can be performed, for example, by a local security agent, such as the security agent 100 of FIG. 1.
  • the collected data can include the IP address associated with the login attempt, the time of the login attempt, the number of login attempts before success, or the details of any unsuccessful passwords attempted, among many other types of information.
  • the collected data is then analyzed in step 1105 to derive statistical information such as, for example, a geographical location of the login attempts.
  • Step 1105 can be performed, for example, by a reputation engine.
  • the statistical information associated with the login attempts is then stored at step 1110.
  • the storing can be performed, for example, by a system data store.
  • FIG. 11B is a flowchart illustrating an example operational scenario for geolocation based authentication.
  • a login attempt is received at step 1115.
• the login attempt can be received, for example, by a secure web server operable to provide secure financial data over a network. It is then determined whether the login attempt matches a stored username and password combination at step 1120.
  • Step 1120 can be performed, for example, by a secure server operable to authenticate login attempts. If the username and password do not match a stored username/password combination, the login attempt is declared a failure at step 1125. However, if the username and password do match a legitimate username/password combination, the origin of the login attempt is ascertained at step 1130. The origin of the login attempt can be determined by a local security agent 100 as described in FIG. 1.
  • the origin of the login attempt can be determined by a reputation engine.
• the origin of the login attempt can then be compared with the statistical information derived in FIG. 11A, as shown in step 1135.
  • Step 1135 can be performed, for example, by a local security agent 100 or by a reputation engine. It is determined whether the origin matches statistical expectations at step 1140. If the actual origin matches statistical expectations, the user is authenticated at step 1145.
• if the actual origin does not match statistical expectations, further processing is performed in step 1150.
  • further processing can include requesting further information from the user to verify his or her authenticity.
  • Such information can include, for example, home address, mother's maiden name, place of birth, or any other piece of information known about the user (e.g., secret question).
  • additional processing can include searching previous login attempts to determine whether the location of the current login attempt is truly anomalous or merely coincidental.
  • a reputation associated with the entity originating the login attempt can be derived and used to determine whether to allow the login.
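The following sketch ties steps 1115-1150 together; the credential store, the origin-matching rule, and the challenge fallback are assumed details rather than requirements of the disclosure.

```python
# Hedged sketch of FIG. 11B (steps 1115-1150).
def authenticate(username, password, origin, credentials, origin_stats):
    if credentials.get(username) != password:   # step 1120
        return "failure"                        # step 1125
    expected = origin_stats.get(username, {})   # steps 1130-1135
    if origin in expected:                      # step 1140
        return "authenticated"                  # step 1145
    # step 1150: further processing, e.g., a secret-question challenge or a
    # reputation lookup on the originating entity
    return "challenge"

credentials = {"alice": "s3cret"}
origin_stats = {"alice": {"US-GA": 2}}
print(authenticate("alice", "s3cret", "US-GA", credentials, origin_stats))  # authenticated
print(authenticate("alice", "s3cret", "RO-B", credentials, origin_stats))   # challenge
```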
• FIG. 11C is a flowchart illustrating another example operational scenario for geolocation based authentication using reputation of an originating entity to confirm authentication.
  • a login attempt is received at step 1155.
• the login attempt can be received, for example, by a secure web server operable to provide secure financial data over a network. It is then determined whether the login attempt matches a stored username and password combination at step 1160.
  • Step 1160 can be performed, for example, by a secure server operable to authenticate login attempts. If the username and password do not match a stored username/password combination, the login attempt is declared a failure at step 1165. However, if the username and password do match a legitimate username/password combination, the origin of the login attempt is ascertained at step 1170.
  • the origin of the login attempt can be determined by a local security agent 100 as described in FIG. 1.
  • the origin of the login attempt can be determined by a reputation engine.
  • a reputation associated with the entity originating the login attempt can then be retrieved, as shown in step 1175.
• Step 1175 can be performed, for example, by a reputation engine. It is determined whether the originating entity is reputable at step 1180. If the originating entity is reputable, the user is authenticated at step 1185.
• if the originating entity is not reputable, further processing is performed in step 1190.
  • further processing can include requesting further information from the user to verify his or her authenticity.
  • Such information can include, for example, home address, mother's maiden name, place of birth, or any other piece of information known about the user (e.g., secret question).
  • additional processing can include searching previous login attempts to determine whether the location of the current login attempt is truly anomalous or merely coincidental.
  • reputation systems can be applied to identifying fraud in financial transactions.
• the reputation system can raise the risk score of a transaction depending on the reputation of the transaction originator or the data in the actual transaction (source, destination, amount, etc.). In such situations, the financial institution can better determine the probability that a particular transaction is fraudulent based upon the reputation of the originating entity.
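A toy sketch of such reputation-informed risk scoring is shown below; the score range and the multipliers are invented for illustration and are not taken from this disclosure.

```python
# Toy sketch: raise a transaction's risk score based on originator
# reputation and on the transaction data itself.
def transaction_risk(base_risk, originator_reputation, amount):
    risk = base_risk
    if originator_reputation < 0:   # negative score = non-reputable origin
        risk *= 2.0                 # raise risk based on originator reputation
    if amount > 10_000:             # raise risk based on transaction data
        risk *= 1.5
    return risk

print(transaction_risk(0.1, originator_reputation=-40, amount=25_000))  # ~0.3
```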
• FIG. 12 is a flowchart illustrating an example operational scenario for a reputation based dynamic quarantine. Communications are received at step 1200. The communications are then analyzed to determine whether they are associated with an unknown entity at step 1205. It should be noted, however, that this operational scenario could be applied to any communications received, not merely communications received from previously unknown entities. For example, communications received from a non-reputable entity could be dynamically quarantined until it is determined that the received communications do not pose a threat to the network. Where the communications are not associated with a new entity, the communications undergo normal processing for incoming communications as shown in step 1210. If the communications are associated with a new entity, a dynamic quarantine counter is initialized in step 1215. Communications received from the new entity are then sent to a dynamic quarantine at step 1220.
  • the counter is then checked to determine whether the counter has elapsed in step 1225. If the counter has not elapsed, the counter is decremented in step 1230.
  • the behavior of the entity as well as the quarantined communications can be analyzed in step 1235. A determination is made whether the quarantined communications or behavior of the entity is anomalous in step 1240. If there is no anomaly found, the operational scenario returns to step 1220, where new communications are quarantined. However, if the communications or behavior of the entity are found to be anomalous in step 1240, a non-reputable reputation is assigned to the entity in step 1245.
  • the process ends by sending notification to an administrator or recipients of communications sent by the originating entity.
• returning to step 1220, the process of quarantining and examining communications and entity behavior continues until anomalous behavior is discovered, or until the dynamic quarantine counter elapses in step 1225. If the dynamic quarantine counter elapses, a reputation is assigned to the entity at step 1255. Alternatively, in situations where the entity is not an unknown entity, the reputation would be updated in steps 1245 or 1255.
  • the operational scenario ends at step 1260 by releasing the dynamic quarantine where the dynamic quarantine counter has elapsed without discovery of an anomaly in the communications or in the originating entity behavior.
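The quarantine loop of FIG. 12 might be sketched as follows; the anomaly predicate and the counter semantics are simplified assumptions.

```python
# Sketch of the FIG. 12 quarantine loop (steps 1215-1260).
def dynamic_quarantine(incoming, counter, is_anomalous):
    quarantined = []
    for message in incoming:            # step 1220: quarantine new messages
        quarantined.append(message)
        if is_anomalous(quarantined):   # steps 1235-1240: analyze behavior
            return "non-reputable"      # step 1245 (notify at step 1250)
        if counter <= 0:                # step 1225: counter elapsed
            return "assign-reputation"  # step 1255 (release at step 1260)
        counter -= 1                    # step 1230
    return "assign-reputation"

verdict = dynamic_quarantine(["msg1", "msg2", "msg3"], counter=2,
                             is_anomalous=lambda msgs: len(msgs) > 100)
print(verdict)  # assign-reputation: no anomaly before the counter elapsed
```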
  • FIG. 13 is an example graphical user interface 1300 display of an image spam communication which can be classified as an unwanted image or message.
  • image spam poses a problem for traditional spam filters.
  • Image spam bypasses the traditional textual analysis of spam by converting the text message of the spam into an image format.
  • FIG. 13 shows an example of image spam.
• the message shows an image 1310. While the image 1310 appears to be textual, it is merely the graphic encoding of a textual message.
  • Image spam also typically includes a textual message 1320 comprising sentences which are structured correctly, but make no sense in the context of the message.
  • the message 1320 is designed to elude spam filters that key on communications that only include an image 1310 within the communication.
  • the message 1320 is designed to trick filters that apply superficial testing to the text of a communication that includes an image 1310. Further, while these messages do include information about the origination of the message in the header 1330, an entity's reputation for originating image spam might not be known until the entity is caught sending image spam.
  • FIG. 14 is a flowchart illustrating an example operational scenario for detecting unwanted images (e.g., image spam). It should be understood that many of the steps shown in FIG. 14 can be performed alone or in combination with any or all of the other steps shown in FIG. 14 to provide some detection of image spam. However, the use of each of the steps in FIG. 14 provides a comprehensive process for detecting image spam.
  • Step 1400 typically includes analyzing the communication to determine whether the communication includes an image that is subject to image spam processing.
  • the operational scenario performs a structural analysis of the communication to determine whether the image comprises spam.
  • the header of the image is then analyzed in step 1420. Analysis of the image header allows the system to determine whether anomalies exist with respect to the image format itself (e.g., protocol errors, corruption, etc.).
  • the features of the image are analyzed in step 1430. The feature analysis is intended to determine whether any of the features of the image are anomalous.
  • the image can be normalized in step 1440. Normalization of an image typically includes removal of random noise that might be added by a spammer to avoid image fingerprinting techniques. Image normalization is intended to convert the image into a format that can be easily compared among images. A fingerprint analysis can be performed on the normalized image to determine whether the image matches images from previously received known image spam.
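The overall flow of FIG. 14 can be sketched as a pipeline; every helper function below is a hypothetical placeholder for the analyses detailed in FIGS. 15A-15D.

```python
# High-level sketch of FIG. 14. All helpers are hypothetical stubs.
def extract_image(message): return message.get("image")   # step 1400
def analyze_structure(message): return False              # step 1410
def analyze_header(image): return False                   # step 1420
def analyze_features(image): return False                 # step 1430
def normalize(image): return image                        # step 1440
def matches_known_spam_fingerprint(image): return False

def detect_image_spam(message):
    image = extract_image(message)
    if image is None:
        return False  # no image subject to image spam processing
    suspicious = (analyze_structure(message) or analyze_header(image)
                  or analyze_features(image))
    return suspicious or matches_known_spam_fingerprint(normalize(image))

print(detect_image_spam({"image": "<raw bytes>"}))  # False with stub analyses
```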
  • FIG. 15A is a flowchart illustrating an operational scenario for analyzing the structure of a communication.
  • the operational scenario begins at step 1500 with analysis of the message structure.
• the hypertext markup language (HTML) structure of the communication is analyzed to introduce n-gram tags as additional tokens to a Bayesian analysis.
  • Such processing can analyze the text 1320 that is included in an image spam communication for anomalies.
  • the HTML structure of the message can be analyzed to define meta-tokens.
  • Meta-tokens are the HTML content of the message, processed to discard any irrelevant HTML tags and compressed by removing white space to create a "token" for Bayesian analysis.
  • Each of the above described tokens can be used as input to a Bayesian analysis for comparison to previously received communications.
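A small sketch of this tokenization follows; which tags count as irrelevant, and the n-gram length, are assumptions made for illustration.

```python
import re

# Sketch: n-grams over the HTML tag sequence, plus a whitespace-compressed
# meta-token, as tokens for a Bayesian classifier.
def html_tokens(html, n=3):
    tags = re.findall(r"</?\s*(\w+)", html.lower())
    ngrams = ["|".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    # meta-token: drop (assumed) irrelevant tags, then compress whitespace
    stripped = re.sub(r"</?(font|span|b|i)\b[^>]*>", "", html)
    meta_token = re.sub(r"\s+", "", stripped)
    return ngrams + [meta_token]

print(html_tokens("<html><body><img src='x.gif'><p>Buy now</p></body></html>"))
```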
  • the operational scenario then includes image detection at step 1515.
  • the image detection can include partitioning the image into a plurality of pieces and performing fingerprinting on the pieces to determine whether the fingerprints match pieces of previously received images.
  • FIG. 15B is a flowchart illustrating an operational scenario for analyzing the features of an image to extract features of the message for input into a clustering engine to identify components of the image which align with known image spam.
  • the operational scenario begins at step 1520 where a number of high level features of the image are detected for use in a machine learning algorithm. Such features can include values such as the number of unique colors, number of noise black pixels, number of edges in horizontal direction (sharp transitions between shapes), etc.
• One of the features extracted by the operational scenario can include the number of histogram modes of the image, as shown at step 1525. The number of modes is yielded by an examination of the spectral intensity of the image.
  • artificial images will typically include fewer modes than natural images, because natural image colors are typically spread through a broad spectrum.
  • the features extracted from the image can be used to identify anomalies.
• detection of anomalies can include analyzing the characteristics of a message to determine the level of similarity of a number of its features to the features of stored unwanted images.
• the image features can also be analyzed for comparison with known reputable images to determine similarity to reputable images. It should be understood that none of the extracted features alone is determinative of a classification. For example, a specific feature might be associated with 60% of unwanted messages, while also being associated with 40% of wanted messages. Moreover, as the value associated with the feature changes, the probability that the message is wanted or unwanted might also change. There are many features that can each indicate a slight tendency. If these features are combined, the image spam detection system can make a classification decision.
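One plausible way to combine such individually weak features is a naive-Bayes product, sketched below under the assumptions of equal class priors and feature independence; the disclosure does not prescribe this particular combination rule.

```python
# Combine weakly indicative features into one classification score.
def spam_probability(feature_likelihoods):
    p_unwanted, p_wanted = 1.0, 1.0
    for p_u, p_w in feature_likelihoods:
        p_unwanted *= p_u   # P(feature | unwanted message)
        p_wanted *= p_w     # P(feature | wanted message)
    return p_unwanted / (p_unwanted + p_wanted)

# Three weak features (e.g., 60/40 splits) combine into a usable score.
print(round(spam_probability([(0.6, 0.4), (0.6, 0.4), (0.7, 0.3)]), 2))  # 0.84
```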
  • the aspect ratio is then examined in step 1530 to determine whether there are any anomalies with respect to the image size or aspect.
  • anomalies in the aspect ratio could be indicated by similarity of the image size or aspect ratio to known sizes or aspect ratios which are common to known image spam.
  • image spam can come in specific sizes to make the image spam look more like common e-mail. Messages that include images which share a common size with known spam images are more likely to be spam themselves.
• there are image sizes which are not conducive to spam (e.g., a 1" x 1" square image might be difficult to read if a spammer inserted a message into the image).
  • Messages that include images which are known to be non-conducive to spam insertion are less likely to be image spam.
  • the aspect ratio of a message can be compared to common aspect ratios used in image spam to determine a probability that the image is an unwanted image or that the image is a reputable image.
  • the frequency distribution of the image is examined.
  • natural pictures have uniform frequency distribution with a relative scarcity of sharp frequency gradations.
• image spam typically includes a choppy frequency distribution as a result of black letters being placed on a white background.
  • the signal to noise ratio can be analyzed.
• a low signal to noise ratio might indicate that a spammer is trying to evade fingerprinting techniques by introducing noise into the image.
  • Increasing noise levels can thereby indicate an increasing probability that the image is an unwanted image.
  • the image can be subdivided into a plurality of subparts.
  • Each of the rectangles can be transformed into a frequency domain using a fast Fourier transform (FFT).
  • the predominance of frequencies in a plurality of directions can be extracted as features.
  • These subparts of the transformed image can also be examined to determine the amount of high frequencies and low frequencies.
  • the points that are further away from the origin represent higher frequencies.
  • these features can then be compared to known legitimate and unwanted images to determine which characteristics the unknown image shares with each type of known image.
  • the transformed (e.g., frequency domain) image can also be divided into subparts
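A sketch of this subdivision-and-FFT feature extraction, using NumPy, appears below; the grid size and the high-frequency radius are illustrative assumptions.

```python
import numpy as np

# Split an image into grid x grid rectangles, FFT each, and measure the
# share of spectral energy far from the origin (high frequencies).
def frequency_features(image, grid=4):
    h, w = image.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            part = image[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            spectrum = np.abs(np.fft.fftshift(np.fft.fft2(part)))
            cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
            yy, xx = np.ogrid[:spectrum.shape[0], :spectrum.shape[1]]
            # points far from the origin represent higher frequencies
            far = (yy - cy) ** 2 + (xx - cx) ** 2 > min(cy, cx) ** 2 / 4
            feats.append(float(spectrum[far].sum() / spectrum.sum()))
    return feats

image = np.random.rand(64, 64)  # stand-in for a decoded grayscale image
print(len(frequency_features(image)))  # 16 high-frequency-share features
```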
• FIG. 15C is a flowchart illustrating an operational scenario for normalizing an image for spam processing.
• obfuscation and noise are removed from the image.
  • these can be introduced by spammers to evade fingerprinting techniques such as hashing by varying the sum of the hash such that it does not match any previously received hash fingerprints of known image spam.
  • Obfuscation and noise removal can describe several techniques for removing artificial noise introduced by spammers. It should be understood that artificial noise can include techniques used by spammers such as banding (where a font included in the image is varied to vary the hash of the image).
  • An edge detection algorithm can be run on the normalized image at step 1550.
• the edge detected image can be provided to an optical character recognition engine to convert the edge detected image to text.
• the edge detection can be used to remove unnecessary detail from the picture which can cause inefficiency when processing the image against other images.
  • median filtering can be applied.
  • the median filtering is applied to remove random pixel noise. Such random pixels can cause problems to content analysis of the image.
  • the median filtering can help to remove single pixel type of noise introduced by spammers. It should be understood that single pixel noise is introduced by spammers using an image editor to alter one or more pixels in the image, which can make the image appear grainy in some areas, thereby making the image more difficult to detect.
• the image is quantized. Quantizing the image removes unnecessary color information.
  • the color information typically requires more processing and is unrelated to the attempted propagation of the spam.
  • spammers could vary the color scheme in an image slightly and again vary the hash such that known image spam hashes would not match the derived hash from the color variant image spam.
• contrast stretching is performed. Using contrast stretching, the color scale in the image is maximized from black to white, even if the colors only vary through shades of gray.
  • the lightest shade of the image is assigned a white value, while the darkest shade in the image is assigned a black value. All other shades are assigned their relative position in the spectrum in comparison to the lightest and darkest shades in the original image.
• Contrast stretching helps to define details in an image that may not make full use of the available spectrum and therefore can help to prevent spammers from using different pieces of the spectrum to avoid fingerprinting techniques. Spammers sometimes intentionally shift the intensity range of an image to defeat some types of feature identification engines. Contrast stretching can also help normalize an image such that it can be compared to other images to identify common features contained in the images.
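The normalization steps above (median filtering, quantization, contrast stretching) can be sketched with NumPy as follows; the 3x3 window and the 16 gray levels are assumptions made for illustration.

```python
import numpy as np

# Sketch of the FIG. 15C normalization steps.
def normalize_image(gray):
    padded = np.pad(gray, 1, mode="edge")
    neighborhoods = np.stack([padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
                              for dy in range(3) for dx in range(3)])
    filtered = np.median(neighborhoods, axis=0)   # remove single-pixel noise
    quantized = (filtered // 16) * 16             # discard fine color detail
    lo, hi = quantized.min(), quantized.max()
    # map the darkest shade to black (0) and the lightest to white (255)
    return (quantized - lo) * 255.0 / max(hi - lo, 1)

gray = np.random.randint(0, 256, (32, 32)).astype(float)
out = normalize_image(gray)
print(out.min(), out.max())  # 0.0 255.0
```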
• FIG. 15D is a flowchart illustrating an operational scenario for analyzing the fingerprint of an image to find common fragments among multiple images. The operational scenario begins at step 1570 by defining regions within an image. A winnowing algorithm is then performed on the defined regions to identify the relevant portions of the image upon which fingerprints should be taken at step 1575. At step 1580, the operational scenario fingerprints the resulting fragments from the winnowing operation and determines whether there is a match between the fingerprints of the received image and known spam images.
  • a similar winnowing fingerprint approach is described in United States Patent Application Publication No. 2006/0251068, which is hereby incorporated by reference.
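A minimal winnowing sketch, in the spirit of the cited fingerprinting publication, is shown below; the fragment length k and the window size are assumptions made for illustration.

```python
import hashlib

# Hash every k-byte fragment, then keep the minimum hash in each sliding
# window as a fingerprint.
def winnow(data, k=8, window=4):
    hashes = [int(hashlib.md5(data[i:i + k]).hexdigest(), 16) % 2**32
              for i in range(len(data) - k + 1)]
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}

spam = b"CHEAP MEDS CLICK HERE NOW" * 3
variant = b"xx" + spam  # spammer prepends noise to shift a whole-file hash
print(bool(winnow(spam) & winnow(variant)))  # True: shared fragments still match
```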

Abstract

Methods and systems for operation upon one or more data processors for assigning reputation to web-based entities based upon previously collected data.

Description

WEB REPUTATION SCORING
TECHNICAL FIELD
This document relates generally to systems and methods for processing communications and more particularly to systems and methods for classifying entities associated with communications.
BACKGROUND
In the anti-spam industry, spammers use various creative means for evading detection by spam filters. As such, the entity from which a communication originated can provide another indication of whether a given communication should be allowed into an enterprise network environment.
However, current tools for message sender analysis include internet protocol (IP) blacklists (sometimes called real-time blacklists (RBLs)) and IP whitelists (realtime whitelists (RWLs)). Whitelists and blacklists certainly add value to the spam classification process; however, whitelists and blacklists are inherently limited to providing a binary-type (YES/NO) response to each query. Moreover, blacklists and whitelists treat entities independently, and overlook the evidence provided by various attributes associated with the entities.
SUMMARY
Systems and methods for web reputation scoring are provided. Systems used to assign reputation to web-based entities can include a communications interface, a communications analyzer, a reputation engine and a decision engine. The communications interface can receive a web communication, and the communication analyzer can analyze the web communication to determine an entity associated with the web communication. The reputation engine can provide a reputation associated with the entity based upon previously collected data associated with the entity, and the decision engine can determine whether the web communication is to be communicated to a recipient based upon the reputation. Methods of assigning reputation to web-based entities can include: receiving a hypertext transfer protocol communication at an edge protection device; identifying an entity associated with the received hypertext transfer protocol communication; querying a reputation engine for a reputation indicator associated with the entity; receiving the reputation indicator from the reputation engine; and, taking an action with respect to the hypertext transfer protocol communication based upon the received reputation indicator associated with the entity.
Examples of computer readable media operating on a processor to aggregate local reputation data to produce a global reputation vector can perform the steps of: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and, responding to the reputation query with the global reputation. Other example systems can include a communications interface and a reputation engine. The communications interface can receive global reputation information from a central server, the global reputation being associated with an entity. The reputation engine can bias the global reputation received from the central server based upon defined local preferences. Further example systems can include a communications interface, a reputation module and a traffic control module. The communications interface can receive distributed reputation information from distributed reputation engines. The reputation module can aggregate the distributed reputation information and derive a global reputation based upon the aggregation of the distributed reputation information. The reputation module can also derive local reputation information based upon communications received by the reputation module. The traffic control module can determine handling associated with communications based upon the global reputation and the local reputation.
Systems and methods used to aggregate reputation information are provided. Systems used to aggregate reputation information can include a centralized reputation engine and an aggregation engine. The centralized reputation engine can receive feedback from a plurality of local reputation engines. The aggregation engine can derive a global reputation for a queried entity based upon an aggregation of the plurality of local reputations. The centralized reputation engine can further provide the global reputation of the queried entity to a local reputation engine responsive to receiving a reputation query from that local reputation engine. Methods of aggregating reputation information can include: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and, responding to the reputation query with the global reputation.
Examples of computer readable media operating on a processor to aggregate local reputation data to produce a global reputation vector can perform the steps of: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and, responding to the reputation query with the global reputation.
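The recited steps might be sketched as follows; the reputation vector keys and the plain averaging are assumptions made for illustration, not the claimed method.

```python
# Sketch: retrieve local reputations for the queried entity, aggregate
# them, and answer the query with a global reputation vector.
def global_reputation(entity, local_engines):
    local_reps = [eng[entity] for eng in local_engines if entity in eng]
    if not local_reps:
        return None  # no local reputation engine knows this entity
    keys = local_reps[0].keys()
    return {k: sum(rep[k] for rep in local_reps) / len(local_reps) for k in keys}

engine_a = {"mail.example.net": {"spam": 0.9, "virus": 0.2}}
engine_b = {"mail.example.net": {"spam": 0.5, "virus": 0.4}}
print(global_reputation("mail.example.net", [engine_a, engine_b]))
# spam 0.7, virus approximately 0.3
```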
Other example reputation aggregation systems can include a communications interface and a reputation engine. The communications interface can receive global reputation information from a central server, the global reputation being associated with an entity. The reputation engine can bias the global reputation received from the central server based upon defined local preferences.
Further example reputation aggregation systems can include a communications interface, a reputation module and a traffic control module. The communications interface can receive distributed reputation information from distributed reputation engines. The reputation module can aggregate the distributed reputation information and derive a global reputation based upon the aggregation of the distributed reputation information. The reputation module can also derive local reputation information based upon communications received by the reputation module. The traffic control module can determine handling associated with communications based upon the global reputation and the local reputation.
Systems and methods for a reputation based network security system are provided. A reputation based network security system can include a communications interface, a communication analyzer, a reputation engine and a security engine. The communications interface can receive incoming and outgoing communications associated with a network. The communication analyzer can derive an external entity associated with a communication. The reputation engine can derive a reputation vector associated with the external entity. The security engine can receive the reputation vector and send the communication to interrogation engines, wherein the security engine determines which of the interrogation engines interrogate the communication based upon the reputation vector.
Other reputation based network security systems can include a communications interface, a communications analyzer, a reputation engine and a security engine. The communications interface can receive incoming and outgoing communications associated with a network. The communication analyzer can derive an external entity associated with a communication. The reputation engine can derive a reputation associated with the external entity. The security engine assigns priority information to a communication, wherein the security engine can assign a high priority to communications where the external entity is a reputable entity and can assign a low priority to communications where the external entity is a non-reputable entity, whereby the priority information is used by one or more interrogation engines to improve quality of service for reputable entities.
Methods of efficiently processing communication based on reputation for security threats can include: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning a priority to the communication based upon the derived reputation associated with the external entity; executing one or more tests on the communication based upon the priority assigned to the communication.
Methods of efficiently processing communication based on reputation can include: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning the communication to one or more interrogation engines selected from among a plurality of interrogation engines, the selection of the one or more interrogation engines being based upon the derived reputation associated with the external entity and capacity of the interrogation engines; and, executing said one or more interrogation engines on the communication.
Systems and methods for reputation based connection throttling are provided.
Systems used for reputation based connection throttling can include a communications interface, a reputation engine and a connection control engine. The communications interface can receive connection requests associated with an external entity prior to a connection being established to the external entity. The reputation engine can derive a reputation associated with the external entity. The connection control engine can deny connection requests to a protected network based upon the derived reputation of the external entity.
Methods of throttling connection requests based upon reputation can include: receiving a connection request, the connection request being related to an external entity; querying a reputation engine for a reputation associated with the external entity; comparing the reputation to a policy associated with a protected enterprise network; permitting the connection request based upon determining that the reputation of the external entity related to the connection request complies with the policy; and, throttling the connection request based upon determining that the reputation of the external entity related to the connection request does not comply with the policy.
DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram depicting an example network in which systems and methods of this disclosure can operate.
FIG. 2 is a block diagram depicting an example network architecture of this disclosure.
FIG. 3 is a block diagram depicting an example of communications and entities including identifiers and attributes used to detect relationships between entities.
FIG. 4 is a flowchart depicting an operational scenario used to detect relationships and assign risk to entities.
FIG. 5 is a block diagram illustrating an example network architecture including local reputations stored by local security agents and a global reputation stored by one or more servers.
FIG. 6 is a block diagram illustrating a determination of a global reputation based on local reputation feedback.
FIG. 7 is a flow diagram illustrating an example resolution between a global reputation and a local reputation.
FIG. 8 is an example graphical user interface for adjusting the settings of a filter associated with a reputation server.
FIG. 9 is a block diagram illustrating reputation based connection throttling for voice over internet protocol (VoIP) or short message service (SMS) communications.
FIG. 10 is a block diagram illustrating a reputation based load balancer.
FIG. 11A is a flowchart illustrating an example operational scenario for geolocation based authentication.
FIG. 11B is a flowchart illustrating another example operational scenario for geolocation based authentication.
FIG. 11C is a flowchart illustrating another example operational scenario for geolocation based authentication.
FIG. 12 is a flowchart illustrating an example operational scenario for a reputation based dynamic quarantine.
FIG. 13 is an example graphical user interface display of an image spam communication.
FIG. 14 is a flowchart illustrating an example operational scenario for detecting image spam.
FIG. 15A is a flowchart illustrating an operational scenario for analyzing the structure of a communication.
FIG. 15B is a flowchart illustrating an operational scenario for analyzing the features of an image.
FIG. 15C is a flowchart illustrating an operational scenario for normalizing an image for spam processing.
FIG. 15D is a flowchart illustrating an operational scenario for analyzing the fingerprint of an image to find common fragments among multiple images.
DETAILED DESCRIPTION
FIG. 1 is a block diagram depicting an example network environment in which systems and methods of this disclosure can operate. Security agent 100 can typically reside between a firewall system (not shown) and servers (not shown) internal to a network 110 (e.g., an enterprise network). As should be understood, the network 110 can include a number of servers, including, for example, electronic mail servers, web servers, and various application servers as may be used by the enterprise associated with the network 110. The security agent 100 monitors communications entering and exiting the network 110. These communications are typically received through the internet 120 from many entities 130a-f that are connected to the internet 120. One or more of the entities 130a-f can be legitimate originators of communications traffic. However, one or more of the entities 130a-f can also be non-reputable entities originating unwanted communications. As such, the security agent 100 includes a reputation engine. The reputation engine can inspect a communication and determine a reputation associated with an entity that originated the communication. The security agent 100 then performs an action on the communication based upon the reputation of the originating entity. If the reputation indicates that the originator of the communication is reputable, for example, the security agent can forward the communication to the recipient of the communication. However, if the reputation indicates that the originator of the communication is non-reputable, for example, the security agent can quarantine the communication, perform more tests on the message, or require authentication from the message originator, among many other actions. Reputation engines are described in detail in United States Patent Publication No. 2006/0015942, which is hereby incorporated by reference.
FIG. 2 is a block diagram depicting an example network architecture of this disclosure. Security agents 100a-n are shown logically residing between networks 110a-n, respectively, and the internet 120. While not shown in FIG. 2, it should be understood that a firewall may be installed between the security agents 100a-n and the internet 120 to provide protection from unauthorized communications entering the respective networks 110a-n. Moreover, intrusion detection systems (IDS) (not shown) can be deployed in conjunction with firewall systems to identify suspicious patterns of activity and to signal alerts when such activity is identified.
While such systems provide some protection for a network, they typically do not address application level security threats. For example, hackers often attempt to use various network-type applications (e.g., e-mail, web, instant messaging (IM), etc.) to create a pre-textual connection with the networks 110a-n in order to exploit security holes created by these various applications using entities 130a-e. However, not all entities 130a-e pose threats to the network 110a-n. Some entities 130a-e originate legitimate traffic, allowing the employees of a company to communicate with business associates more efficiently. While examining the communications for potential threats is useful, it can be difficult to maintain current threat information because attacks are continually modified to account for the latest filtering techniques. Thus, security agents 100a-n can run multiple tests on a communication to determine whether the communication is legitimate.
Furthermore, sender information included in the communication can be used to help determine whether or not a communication is legitimate. As such, sophisticated security agents 100a-n can track entities and analyze the characteristics of the entities to help determine whether to allow a communication to enter a network 110a-n. The entities 130a-e can then be assigned a reputation. Decisions on a communication can take into account the reputation of an entity 130a-e that originated the communication. Moreover, one or more central systems 200 can collect information on entities 130a-e and distribute the collected data to other central systems 200 and/or the security agents 100a-n. Reputation engines can assist in identifying the bulk of the malicious communications without extensive and potentially costly local analysis of the content of the communication. Reputation engines can also help to identify legitimate communications, prioritize their delivery, and reduce the risk of misclassifying a legitimate communication. Moreover, reputation engines can provide dynamic and predictive approaches to the problem of identifying malicious, as well as legitimate, transactions in physical or virtual worlds. Examples include the process of filtering malicious communications in an e-mail, instant messaging, VoIP, SMS or other communication protocol system using analysis of the reputation of sender and content. A security agent 100a-n can then apply a global or local policy to determine what action to perform with respect to the communication (such as deny, quarantine, load balance, deliver with assigned priority, or analyze locally with additional scrutiny) based upon the reputation result.
However, the entities 130a-e can connect to the internet in a variety of ways. As should be understood, an entity 130a-e can have multiple identifiers (such as, for example, e-mail addresses, IP addresses, identifier documentation, etc.) at the same time or over a period of time. For example, a mail server with changing IP addresses can have multiple identities over time. Moreover, one identifier can be associated with multiple entities, such as, for example, when an IP address is shared by an organization with many users behind it. Moreover, the specific method used to connect to the internet can obscure the identification of the entity 130a-e. For example, an entity 130b may connect to the internet using an internet service provider (ISP) 200. Many ISPs 200 use dynamic host configuration protocol (DHCP) to assign IP addresses dynamically to entities 130b requesting a connection. Entities 130a-e can also disguise their identity by spoofing a legitimate entity. Thus, collecting data on the characteristics of each entity 130a-e can help to categorize an entity 130a-e and determine how to handle a communication. The ease of creation and spoofing of identities in both the virtual and physical worlds can create an incentive for users to act maliciously without bearing the consequences of those acts. For example, a criminal who steals the IP address of a legitimate entity on the Internet (or a passport in the physical world) can participate in malicious activity with relative ease by assuming the stolen identity. However, by assigning a reputation to the physical and virtual entities and recognizing the multiple identities that they can employ, reputation systems can influence reputable and non-reputable entities to operate responsibly for fear of becoming non-reputable, and being unable to correspond or interact with other network entities.
FIG. 3 is a block diagram depicting an example of communications and entities including identifiers and attributes used to detect relationships between entities. Security agents 100a-b can collect data by examining communications that are directed to an associated network. Security agents 100a-b can also collect data by examining communications that are relayed by an associated network. Examination and analysis of communications can allow the security agents 100a-b to collect information about the entities 300a-c sending and receiving messages, including transmission patterns, volume, or whether the entity has a tendency to send certain kinds of message (e.g., legitimate messages, spam, virus, bulk mail, etc.), among many others.
As shown in FIG. 3, each of the entities 300a-c is associated with one or more identifiers 310a-c, respectively. The identifiers 310a-c can include, for example, IP addresses, universal resource locator (URL), phone number, IM username, message content, domain, or any other identifier that might describe an entity. Moreover, the identifiers 310a-c are associated with one or more attributes 320a-c. As should be understood, the attributes 320a-c are fitted to the particular identifier 310a-c that is being described. For example, a message content identifier could include attributes such as, for example, malware, volume, type of content, behavior, etc. Similarly, attributes 320a-c associated with an identifier, such as IP address, could include one or more IP addresses associated with an entity 300a-c.
Furthermore, it should be understood that this data can be collected from communications 330a-c (e.g., e-mail), which typically include some identifiers and attributes of the entity that originated the communication. Thus, the communications 330a-c provide a transport for communicating information about the entity to the security agents 100a, 100b. These attributes can be detected by the security agents 100a, 100b through examination of the header information included in the message, analysis of the content of the message, as well as through aggregation of information previously collected by the security agents 100a, 100b (e.g., totaling the volume of communications received from an entity). The data from multiple security agents 100a, 100b can be aggregated and mined. For example, the data can be aggregated and mined by a central system which receives identifiers and attributes associated with all entities 300a-c for which the security agents 100a, 100b have received communications. Alternatively, the security agents 100a, 100b can operate as a distributed system, communicating identifier and attribute information about entities 300a-c with each other. The process of mining the data can correlate the attributes of entities 300a-c with each other, thereby determining relationships between entities 300a-c (such as, for example, correlations between an event occurrence, volume, and/or other determining factors). These relationships can then be used to establish a multi-dimensional reputation "vector" for all identifiers based on the correlation of attributes that have been associated with each identifier. For example, if an entity 300a with a known reputation for being non-reputable sends a message 330a with a first set of attributes 350a, and then an unknown entity 300b sends a message 330b with a second set of attributes 350b, the security agent 100a can determine whether all or a portion of the first set of attributes 350a matches all or a portion of the second set of attributes 350b. When some portion of the first set of attributes 350a matches some portion of the second set of attributes 350b, a relationship can be created depending upon the particular identifier 340a, 340b that included the matching attributes 350a, 350b. The particular identifiers 340a, 340b which are found to have matching attributes can be used to determine a strength associated with the relationship between the entities 300a, 300b. The strength of the relationship can help to determine how much of the non-reputable qualities of the non-reputable entity 300a are attributed to the reputation of the unknown entity 300b. However, it should also be recognized that the unknown entity 300b may originate a communication 330c which includes attributes 350c that match some attributes 350d of a communication 330d originating from a known reputable entity 300c. The particular identifiers 340c, 340d which are found to have matching attributes can be used to determine a strength associated with the relationship between the entities 300b, 300c. The strength of the relationship can help to determine how much of the reputable qualities of reputable entity 300c are attributed to the reputation of the unknown entity 300b.
A distributed reputation engine also allows for real-time collaborative sharing of global intelligence about the latest threat landscape, providing instant protection benefits to the local analysis that can be performed by a filtering or risk analysis system, as well as identifying malicious sources of potential new threats before they even occur. Using sensors positioned at many different geographical locations, information about new threats can be quickly collected and shared with the central system 200, or with the distributed security agents 100a, 100b. As should be understood, such distributed sensors can include the local security agents 100a, 100b, as well as local reputation clients, traffic monitors, or any other device suitable for collecting communication data (e.g., switches, routers, servers, etc.).
For example, security agents 100a, 100b can communicate with a central system 200 to provide sharing of threat and reputation information. Alternatively, the security agents 100a, 100b can communicate threat and reputation information between each other to provide up to date and accurate threat information. In the example of FIG. 3, the first security agent 100a has information about the relationship between the unknown entity 300b and the non-reputable entity 300a, while the second security agent 100b has information about the relationship between the unknown entity 300b and the reputable entity 300c. Without sharing the information, the first security agent 100a may take a particular action on the communication based upon the detected relationship. However, with the knowledge of the relationship between the unknown entity 300b and the reputable entity 300c, the first security agent 100a might take a different action with a received communication from the unknown entity 300b. Sharing the relationship information between security agents thus provides a more complete set of relationship information upon which a determination will be made.
The system attempts to assign reputations (reflecting a general disposition and/or categorization) to physical entities, such as individuals or automated systems performing transactions. In the virtual world, entities are represented by identifiers (e.g., IPs, URLs, content) that are tied to those entities in the specific transactions (such as sending a message or transferring money out of a bank account) that the entities are performing. Reputation can thus be assigned to those identifiers based on their overall behavioral and historical patterns as well as their relationship to other identifiers, such as the relationship of IPs sending messages and URLs included in those messages. A "bad" reputation for a single identifier can cause the reputation of other neighboring identifiers to worsen, if there is a strong correlation between the identifiers. For example, an IP that is sending URLs which have a bad reputation will worsen its own reputation because of the reputation of the URLs. Finally, the individual identifier reputations can be aggregated into a single reputation (risk score) for the entity that is associated with those identifiers.
It should be noted that attributes can fall into a number of categories. For example, evidentiary attributes can represent physical, digital, or digitized physical data about an entity. This data can be attributed to a single known or unknown entity, or shared between multiple entities (forming entity relationships). Examples of evidentiary attributes relevant to messaging security include IP (internet protocol) address, known domain names, URLs, digital fingerprints or signatures used by the entity, TCP signatures, etc. As another example, behavioral attributes can represent human or machine-assigned observations about either an entity or an evidentiary attribute. Such attributes may include one, many, or all attributes from one or more behavioral profiles. For example, a behavioral attribute generically associated with a spammer may be a high volume of communications being sent from that entity. A number of behavioral attributes for a particular type of behavior can be combined to derive a behavioral profile. A behavioral profile can contain a set of predefined behavioral attributes. The attributive properties assigned to these profiles include behavioral events relevant to defining the disposition of an entity matching the profile. Examples of behavioral profiles relevant to messaging security might include "Spammer", "Scammer", and "Legitimate Sender". Events and/or evidentiary attributes relevant to each profile define appropriate entities to which a profile should be assigned. This may include a specific set of sending patterns, blacklist events, or specific attributes of the evidentiary data. Some examples include: Sender/Receiver Identification; Time Interval and sending patterns; Severity and disposition of payload; Message construction; Message quality; Protocols and related signatures; Communications medium.
It should be understood that entities sharing some or all of the same evidentiary attributes have an evidentiary relationship. Similarly, entities sharing behavioral attributes have a behavioral relationship. These relationships help form logical groups of related profiles, which can then be applied adaptively to enhance a profile or to identify entities that conform more or less closely to the assigned profiles.
FIG. 4 is a flowchart depicting an operational scenario 400 used to detect relationships and assign risk to entities. The operational scenario begins at step 410 by collecting network data. Data collection can be done, for example, by a security agent 100, a client device, a switch, a router, or any other device operable to receive communications from network entities (e.g., e-mail servers, web servers, IM servers,
ISPs, file transfer protocol (FTP) servers, gopher servers, VoIP equipment, etc.).
At step 420 identifiers are associated with the collected data (e.g., communication data). Step 420 can be performed by a security agent 100 or by a central system 200 operable to aggregate data from a number of sensor devices, including, for example, one or more security agents 100. Alternatively, step 420 can be performed by the security agents 100 themselves. The identifiers can be based upon the type of communication received. For example, an e-mail can include one set of information (e.g., IP address of originator and destination, text content, attachment, etc.), while a VoIP communication can include a different set of information (e.g., originating phone number (or IP address if originating from a VoIP client), receiving phone number (or IP address if destined for a VoIP phone), voice content, etc.). Step 420 can also include assigning the attributes of the communication with the associated identifiers.
At step 430 the attributes associated with the entities are analyzed to determine whether any relationships exist between entities for which communications information has been collected. Step 430 can be performed, for example, by a central system 200 or one or more distributed security agents 100. The analysis can include comparing attributes related to different entities to find relationships between the entities. Moreover, based upon the particular attribute which serves as the basis for the relationship, a strength can be associated with the relationship.
At step 440 a risk vector is assigned to the entities. As an example, the risk vector can be assigned by the central system 200 or by one or more security agents 100. The risk vector assigned to an entity 130 (FIGS. 1-2), 300 (FIG. 3) can be based upon the relationship found between the entities and on the basis of the identifier which formed the basis for the relationship.
At step 450, an action can be performed based upon the risk vector. The action can be performed, for example, by a security agent 100. The action can be performed on a received communication associated with an entity for which a risk vector has been assigned. The action can include any of allow, deny, quarantine, load balance, deliver with assigned priority, or analyze locally with additional scrutiny, among many others. However, it should be understood that a reputation vector can be derived separately
FIG. 5 is a block diagram illustrating an example network architecture including local reputations 500a-e derived by local reputation engines 510a-e and a global reputation 520 stored by one or more servers 530. The local reputation engines 510a-e, for example, can be associated with local security agents such as security agents 100. Alternatively, the local reputation engines 510a-e can be associated, for example, with a local client. Each of the reputation engines 510a-e includes a list of one or more entities for which the reputation engine 510a-e stores a derived reputation 500a-e.
However, these stored derived reputations can be inconsistent between reputation engines, because each of the reputation engines may observe different types of traffic. For example, reputation engine 1 510a may include a reputation that indicates a particular entity is reputable, while reputation engine 2 510b may include a reputation that indicates that the same entity is non-reputable. These local reputational inconsistencies can be based upon different traffic received from the entity. Alternatively, the inconsistencies can be based upon the feedback from a user of local reputation engine 1 510a indicating a communication is legitimate, while a user of local reputation engine 2 510b provides feedback indicating that the same communication is not legitimate.
The server 530 receives reputation information from the local reputation engines 510a-e. However, as noted above, some of the local reputation information may be inconsistent with other local reputation information. The server 530 can arbitrate between the local reputations 500a-e to determine a global reputation 520 based upon the local reputation information 500a-e. In some examples, the global reputation information 520 can then be provided back to the local reputation engines 510a-e to provide these local engines 510a-e with up-to-date reputational information. Alternatively, the local reputation engines 510a-e can be operable to query the server 530 for reputation information. In some examples, the server 530 responds to the query with global reputation information 520.
In other examples, the server 530 applies a local reputation bias to the global reputation 520. The local reputation bias can perform a transform on the global reputation to provide the local reputation engines 510a-e with a global reputation vector that is biased based upon the preferences of the particular local reputation engine 510a-e which originated the query. Thus, a local reputation engine 510a with an administrator or user(s) that has indicated a high tolerance for spam messages can receive a global reputation vector that accounts for the indicated tolerance. The particular components of the reputation vector returned to the reputation engine 510a might include portions of the reputation vector that are deemphasized in relation to the rest of the reputation vector. Likewise, a local reputation engine 510b that has indicated, for example, a low tolerance for communications from entities with reputations for originating viruses may receive a reputation vector that amplifies the components of the reputation vector that relate to virus reputation.
FIG. 6 is a block diagram illustrating a determination of a global reputation based on local reputation feedback. A local reputation engine 600 is operable to send a query through a network 610 to a server 620. In some examples, the local reputation engine 600 originates a query in response to receiving a communication from an unknown entity. Alternatively, the local reputation engine 600 can originate the query responsive to receiving any communications, thereby promoting use of more up-to-date reputation information.
The server 620 is operable to respond to the query with a global reputation determination. The central server 620 can derive the global reputation using a global reputation aggregation engine 630. The global reputation aggregation engine 630 is operable to receive a plurality of local reputations 640 from a respective plurality of local reputation engines. In some examples, the plurality of local reputations 640 can be periodically sent by the reputation engines to the server 620. Alternatively, the plurality of local reputations 640 can be retrieved by the server upon receiving a query from one of the local reputation engines 600.
The local reputations can be combined using confidence values related to each of the local reputation engines and then accumulating the results. The confidence value can indicate the confidence associated with a local reputation produced by an associated reputation engine. Reputation engines associated with individuals, for example, can receive a lower weighting in the global reputation determination. In contrast, local reputations associated with reputation engines operating on large networks can receive greater weight in the global reputation determination based upon the confidence value associated with that reputation engine.
In some examples, the confidence values 650 can be based upon feedback received from users. For example, a reputation engine that receives substantial feedback indicating that communications were not properly handled, because the local reputation information 640 associated with the communications indicated the wrong action, can be assigned low confidence values 650 for local reputations 640 associated with that reputation engine. Similarly, reputation engines that receive feedback indicating that communications were handled correctly, because the local reputation information 640 associated with the communications indicated the correct action, can be assigned a high confidence value 650 for local reputations 640 associated with the reputation engine. Adjustment of the confidence values associated with the various reputation engines can be accomplished using a tuner 660, which is operable to receive input information and to adjust the confidence values based upon the received input. In some examples, the confidence values 650 can be provided to the server 620 by the reputation engine itself based upon stored statistics for incorrectly classified entities. In other examples, information used to weight the local reputation information can be communicated to the server 620.
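One hypothetical tuning rule, offered only to make the role of the tuner 660 concrete (the step size and bounds are assumptions of the sketch), multiplies an engine's confidence up for each correct-handling report and down for each misclassification report:

    # Hypothetical tuner: multiplicative updates to a confidence value
    # based on feedback counts, clamped to an assumed range.
    def tune_confidence(confidence, correct_reports, incorrect_reports,
                        step=0.05, floor=0.1, ceiling=10.0):
        adjusted = confidence * (1 + step) ** correct_reports
        adjusted *= (1 - step) ** incorrect_reports
        return min(max(adjusted, floor), ceiling)

    print(tune_confidence(2.0, correct_reports=8, incorrect_reports=3))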
In some examples, a bias 670 can be applied to the resulting global reputation vector. The bias 670 can normalize the reputation vector to provide a normalized global reputation vector to a reputation engine 600. Alternatively, the bias 670 can be applied to account for local preferences associated with the reputation engine 600 originating the reputation query. Thus, a reputation engine 600 can receive a global reputation vector matching the defined preferences of the querying reputation engine 600. The reputation engine 600 can take an action on the communication based upon the global reputation vector received from the server 620.
FIG. 7 is a block diagram illustrating an example resolution between a global reputation and a local reputation. The local security agent 700 communicates with a server 720 to retrieve global reputation information from the server 720. The local security agent 700 can receive a communication at 702. The local security agent can correlate the communication to identify attributes of the message at 704. The attributes of the message can include, for example, an originating entity, a fingerprint of the message content, a message size, etc. The local security agent 700 includes this information in a query to the server 720. In other examples, the local security agent
700 can forward the entire message to the server 720, and the server can perform the correlation and analysis of the message.
The server 720 uses the information received from the query to determine a global reputation based upon a configuration 725 of the server 720. The configuration 725 can include a plurality of reputation information, including both information indicating that a queried entity is non-reputable 730 and information indicating that a queried entity is reputable 735. The configuration 725 can also apply a weighting 740 to each of the aggregated reputations 730, 735. A reputation score determinator 745 can provide the engine for weighting 740 the aggregated reputation information 730, 735 and producing a global reputation vector.
The local security agent 700 then sends a query to a local reputation engine at 706. The local reputation engine 708 performs a determination of the local reputation and returns a local reputation vector at 710. The local security agent 700 also receives a response to the reputation query sent to the server 720 in the form of a global reputation vector. The local security agent 700 then mixes the local and global reputation vectors together at 712. An action is then taken with respect to the received message at 714.
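The mixing at 712 could take many forms; the sketch below assumes a convex combination of the two vectors controlled by a single tunable weight, which is an illustrative choice rather than the required mechanism:

    # Hypothetical mixing of a local and a global reputation vector.
    def mix(local_vector, global_vector, local_weight=0.4):
        categories = set(local_vector) | set(global_vector)
        return {c: local_weight * local_vector.get(c, 0.0)
                   + (1 - local_weight) * global_vector.get(c, 0.0)
                for c in categories}

    local_vec = {"spam": 0.9, "virus": 0.1}
    global_vec = {"spam": 0.6, "virus": 0.4, "phishing": 0.2}
    print(mix(local_vec, global_vec))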
FIG. 8 is an example graphical user interface 800 for adjusting the settings of a filter associated with a reputation server. The graphical user interface 800 can allow the user of a local security agent to adjust the settings of a local filter in several different categories 810, such as, for example, "Virus," "Worms," "Trojan Horse," "Phishing," "Spyware," "Spam," "Content," and "Bulk." However, it should be understood that the categories 810 depicted are merely examples, and that the disclosure is not limited to the categories 810 chosen as examples here.
In some examples, the categories 810 can be divided into two or more types of categories. For example, the categories 810 of FIG. 8 are divided into a "Security Settings" type 820 of category 810, and a "Policy Settings" type 830 of category 810. In each of the categories 810 and types 820, 830, a mixer bar representation 840 can allow the user to adjust the particular filter setting associated with the respective category 810 of communications or entity reputations.
Moreover, while categories 810 of the "Policy Settings" type 830 can be adjusted freely based upon the user's own judgment, categories of the "Security Settings" type 820 can be limited to adjustment within a range. This distinction can be made in order to prevent a user from altering the security settings of the security agent beyond an acceptable range. For example, a disgruntled employee could attempt to lower the security settings, thereby leaving an enterprise network vulnerable to attack. Thus, the ranges 850 placed on categories 810 of the "Security Settings" type 820 are operable to keep security at or above a minimum level so that the network is not compromised. However, it should be noted that the "Policy Settings" type 830 categories 810 are those categories 810 whose settings would not compromise the security of the network, but might merely inconvenience the user or the enterprise if lowered.
Furthermore, it should be recognized that in various examples, range limits 850 can be placed upon all of the categories 810. Thus, the local security agent would prevent users from setting the mixer bar representation 840 outside of the provided range 850. It should also be noted that in some examples, the ranges may not be shown on the graphical user interface 800. Instead, the range 850 would be abstracted out of the graphical user interface 800 and all of the settings would be relative settings. Thus, the category 810 could appear to allow a full range of settings, while transforming the selected setting into a setting within the provided range. For example, the "Virus" category 810 range 850 is provided in this example as being between level markers 8 and 13. If the graphical user interface 800 were set to abstract the allowable range 850 out of the graphical user interface 800, the "Virus" category 810 would allow setting of the mixer bar representation 840 anywhere between 0 and 14. However, the graphical user interface 800 could transform the 0-14 setting to a setting within the 8 to 13 range 850. Thus, if a user requested a setting midway between 0 and 14, the graphical user interface could transform that setting into a setting midway between 8 and 13.
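A minimal sketch of that transformation, assuming a simple linear rescaling from the displayed scale to the allowed range (the disclosure does not prescribe the mapping), is:

    # Hypothetical linear mapping from a displayed 0-14 mixer setting
    # to an effective setting within the administrator-provided range.
    def to_effective_setting(displayed, shown=(0, 14), allowed=(8, 13)):
        fraction = (displayed - shown[0]) / (shown[1] - shown[0])
        return allowed[0] + fraction * (allowed[1] - allowed[0])

    print(to_effective_setting(7))  # midway on 0-14 -> 10.5, midway on 8-13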
FIG. 9 is a block diagram illustrating reputation based connection throttling for voice over internet protocol (VoIP) or short message service (SMS) communications. As should be understood, an originating IP phone 900 can place a VoIP call to a receiving IP phone 910. These IP phones 900, 910 can be, for example, computers executing soft-phone software, network enabled phones, etc. The originating IP phone 900 can place a VoIP call through a network 920 (e.g., the internet). The receiving IP phone 910 can receive the VoIP call through a local network 930 (e.g., an enterprise network).
Upon establishing a VoIP call, the originating IP phone has established a connection to the local network 930. This connection can be exploited similarly to the way e-mail, web, instant messaging, or other internet applications can be exploited to provide an unregulated connection to a network. Thus, a connection to a receiving IP phone can be exploited, thereby putting computers 940, 950 operating on the local network 930 at risk for intrusion, viruses, trojan horses, worms, and various other types of attacks based upon the established connection. Moreover, because of the time sensitive nature of VoIP communications, these communications are typically not examined to ensure that the connection is not being misused. For example, voice conversations occur in real-time. If a few packets of a voice conversation are delayed, the conversation becomes stilted and difficult to understand. Thus, the contents of the packets typically cannot be examined once a connection is established. However, a local security agent 960 can use reputation information received from a reputation engine or server 970 to determine a reputation associated with the originating IP phone. The local security agent 960 can use the reputation of the originating entity to determine whether to allow a connection to the originating entity. Thus, the security agent 960 can prevent connections to non-reputable entities, as indicated by reputations that do not comply with the policy of the local security agent 960.
In some examples, the local security agent 960 can include a connection throttling engine operable to control the flow rate of packets being transmitted using the connection established between the originating IP phone 900 and the receiving IP phone 910. Thus, an originating entity 900 with a non-reputable reputation can be allowed to make a connection to the receiving IP phone 910. However, the packet throughput will be capped, thereby preventing the originating entity 900 from exploiting the connection to attack the local network 930. Alternatively, the throttling of the connection can be accomplished by performing a detailed inspection of any packets originating from non-reputable entities. As discussed above, the detailed inspection of all VoIP packets is not efficient. Thus, quality of service (QoS) can be maximized for connections associated with reputable entities, while reducing the QoS associated with connections to non-reputable entities. Standard communication interrogation techniques can be performed on connections associated with non-reputable entities in order to discover whether any of the transmitted packets received from the originating entity comprise a threat to the network 930. Various interrogation techniques and systems are described in U.S. Patent No. 6,941,467, No. 7,089,590, No. 7,096,498, and No. 7,124,438 and in U.S. Patent Application Nos. 2006/0015942, 2006/0015563, 2003/0172302, 2003/0172294, 2003/0172291, and 2003/0172166, which are hereby incorporated by reference.
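As a purely hypothetical example of such reputation based throttling (the rates, the threshold, and the reliance on a single virus component are all assumptions of the sketch):

    # Hypothetical throttle: cap the packets-per-second budget for
    # connections from entities with a poor virus reputation.
    def allowed_rate(reputation_vector, normal_rate=1000, capped_rate=50,
                     virus_threshold=0.7):
        if reputation_vector.get("virus", 0.0) > virus_threshold:
            return capped_rate   # throttled: reduced QoS
        return normal_rate       # full QoS for reputable entities

    print(allowed_rate({"virus": 0.9}))  # 50
    print(allowed_rate({"virus": 0.1}))  # 1000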
FIG. 10 is a block diagram illustrating an operation of a reputation based load balancer 1000. The load balancer 1000 is operable to receive communications from reputable and non-reputable entities 1010, 1020 (respectively) through a network 1030 (e.g., the internet). The load balancer 1000 communicates with a reputation engine 1040 to determine the reputation of entities 1010, 1020 associated with incoming or outgoing communications.

The reputation engine 1040 is operable to provide the load balancer with a reputation vector. The reputation vector can indicate the reputation of the entity 1010, 1020 associated with the communication in a variety of different categories. For example, the reputation vector might indicate a good reputation for an entity 1010, 1020 with respect to the entity 1010, 1020 originating spam, while also indicating a poor reputation for the same entity 1010, 1020 with respect to that entity 1010, 1020 originating viruses. The load balancer 1000 can use the reputation vector to determine what action to perform with respect to a communication associated with that entity 1010, 1020. In situations where a reputable entity 1010 is associated with the communication, the message is sent to a message transfer agent (MTA) 1050 and delivered to a recipient 1060.
In situations where a non-reputable entity 1020 has a reputation for viruses, but does not have a reputation for other types of non-reputable activity, the communication is forwarded to one of a plurality of virus detectors 1070. The load balancer 1000 is operable to determine which of the plurality of virus detectors 1070 to use based upon the current capacity of the virus detectors and the reputation of the originating entity. For example, the load balancer 1000 could send the communication to the least utilized virus detector. In other examples, the load balancer 1000 might determine a degree of non-reputability associated with the originating entity and send slightly non-reputable communications to the least utilized virus detectors, while sending highly non-reputable communications to a highly utilized virus detector, thereby throttling the QoS of a connection associated with a highly non-reputable entity.
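A minimal sketch of this selection policy, with hypothetical detector names, utilizations, and an assumed non-reputability threshold, might read:

    # Hypothetical routing policy: mildly non-reputable traffic goes to
    # the least utilized detector, highly non-reputable traffic to the
    # most utilized one (throttling its effective QoS).
    def pick_virus_detector(detectors, non_reputability, threshold=0.8):
        by_load = sorted(detectors, key=lambda d: d[1])
        if non_reputability > threshold:
            return by_load[-1][0]   # most utilized instance
        return by_load[0][0]        # least utilized instance

    detectors = [("vd1", 0.2), ("vd2", 0.9), ("vd3", 0.5)]
    print(pick_virus_detector(detectors, non_reputability=0.95))  # vd2
    print(pick_virus_detector(detectors, non_reputability=0.30))  # vd1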
Similarly, in situations where a non-reputable entity 1020 has a reputation for originating spam communications, but no other types of non-reputable activities, the load balancer can send the communication to specialized spam detectors 1080 to the exclusion of other types of testing. It should be understood that in situations where a communication is associated with a non-reputable entity 1020 that originates multiple types of non-reputable activity, the communication can be sent to be tested for each of the types of non-reputable activity that the entity 1020 is known to display, while avoiding tests associated with non-reputable activity that the entity 1020 is not known to display.
In some examples, every communication can receive routine testing for multiple types of non-legitimate content. However, when an entity 1020 associated with the communication shows a reputation for certain types of activity, the communication can also be quarantined for detailed testing for the content that the entity shows a reputation for originating. In yet further examples, every communication may receive the same type of testing. However, communications associated with reputable entities 1010 are sent to the testing modules with the shortest queue or to testing modules with spare processing capacity. On the other hand, communications associated with non-reputable entities 1020 are sent to testing modules 1070, 1080 with the longest queue.
Therefore, communications associated with reputable entities 1010 can receive priority in delivery over communications associated with non-reputable entities. Quality of service is therefore maximized for reputable entities 1010, while being reduced for non-reputable entities 1020. Thus, reputation based load balancing can protect the network from exposure to attack by reducing the ability of a non-reputable entity to connect to the network 1030.
FIG. 11A is a flowchart illustrating an example operational scenario for collection of geolocation based data for authentication analysis. At step 1100, the operational scenario collects data from various login attempts. Step 1100 can be performed, for example, by a local security agent, such as the security agent 100 of
FIG. 1. The collected data can include an IP address associated with the login attempt, the time of the login attempt, the number of login attempts before success, or the details of any unsuccessful passwords attempted, among many other types of information. The collected data is then analyzed in step 1105 to derive statistical information such as, for example, a geographical location of the login attempts. Step 1105 can be performed, for example, by a reputation engine. The statistical information associated with the login attempts is then stored at step 1110. The storing can be performed, for example, by a system data store.
FIG. 11B is a flowchart illustrating an example operational scenario for geolocation based authentication. A login attempt is received at step 1115. The login attempt can be received, for example, by a secure web server operable to provide secure financial data over a network. It is then determined whether the login attempt matches a stored username and password combination at step 1120. Step 1120 can be performed, for example, by a secure server operable to authenticate login attempts. If the username and password do not match a stored username/password combination, the login attempt is declared a failure at step 1125. However, if the username and password do match a legitimate username/password combination, the origin of the login attempt is ascertained at step 1130. The origin of the login attempt can be determined by a local security agent 100 as described in FIG. 1. Alternatively, the origin of the login attempt can be determined by a reputation engine. The origin of the login attempt can then be compared with the statistical information derived in FIG. 11A, as shown in step 1135. Step 1135 can be performed, for example, by a local security agent 100 or by a reputation engine. It is determined whether the origin matches statistical expectations at step 1140. If the actual origin matches statistical expectations, the user is authenticated at step 1145.
Alternatively, if the actual origin does not match statistical expectations for the origin, further processing is performed in step 1150. It should be understood that further processing can include requesting further information from the user to verify his or her authenticity. Such information can include, for example, home address, mother's maiden name, place of birth, or any other piece of information known about the user (e.g., secret question). Other examples of additional processing can include searching previous login attempts to determine whether the location of the current login attempt is truly anomalous or merely coincidental. Furthermore, a reputation associated with the entity originating the login attempt can be derived and used to determine whether to allow the login.
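Condensing FIGS. 11A and 11B into a single hypothetical fragment (the plaintext credential store and coarse country-level origin statistics are simplifications for illustration; a deployed system would store password hashes and richer statistics):

    # Hypothetical geolocation-gated authentication check.
    def check_login(username, password, origin, credentials, usual_origins):
        if credentials.get(username) != password:
            return "failure"                 # step 1125
        if origin in usual_origins.get(username, set()):
            return "authenticated"           # step 1145
        return "further processing"          # step 1150

    credentials = {"alice": "s3cret"}        # illustrative only
    usual_origins = {"alice": {"US", "CA"}}  # derived as in FIG. 11A
    print(check_login("alice", "s3cret", "US", credentials, usual_origins))
    print(check_login("alice", "s3cret", "RU", credentials, usual_origins))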
FIG. 11C is a flowchart illustrating another example operational scenario for geolocation based authentication using the reputation of an originating entity to confirm authentication. A login attempt is received at step 1155. The login attempt can be received, for example, by a secure web server operable to provide secure financial data over a network. It is then determined whether the login attempt matches a stored username and password combination at step 1160. Step 1160 can be performed, for example, by a secure server operable to authenticate login attempts. If the username and password do not match a stored username/password combination, the login attempt is declared a failure at step 1165. However, if the username and password do match a legitimate username/password combination, the origin of the login attempt is ascertained at step 1170. The origin of the login attempt can be determined by a local security agent 100 as described in FIG. 1. Alternatively, the origin of the login attempt can be determined by a reputation engine. A reputation associated with the entity originating the login attempt can then be retrieved, as shown in step 1175. Step 1175 can be performed, for example, by a reputation engine. It is determined whether the originating entity is reputable at step 1180. If the originating entity is reputable, the user is authenticated at step 1185.
Alternatively, if the originating entity is non-reputable, further processing is performed in step 1190. It should be understood that further processing can include requesting further information from the user to verify his or her authenticity. Such information can include, for example, home address, mother's maiden name, place of birth, or any other piece of information known about the user (e.g., secret question). Other examples of additional processing can include searching previous login attempts to determine whether the location of the current login attempt is truly anomalous or merely coincidental. Thus, it should be understood that reputation systems can be applied to identifying fraud in financial transactions. The reputation system can raise the risk score of a transaction depending on the reputation of the transaction originator or the data in the actual transaction (source, destination, amount, etc.). In such situations, the financial institution can better determine the probability that a particular transaction is fraudulent based upon the reputation of the originating entity.
FIG. 12 is a flowchart illustrating an example operational scenario for a reputation based dynamic quarantine. Communications are received at step 1200. The communications are then analyzed to determine whether they are associated with an unknown entity at step 1205. It should be noted, however, that this operational scenario could be applied to any communications received, not merely communications received from previously unknown entities. For example, communications received from a non-reputable entity could be dynamically quarantined until it is determined that the received communications do not pose a threat to the network. Where the communications are not associated with a new entity, the communications undergo normal processing for incoming communications as shown in step 1210. If the communications are associated with a new entity, a dynamic quarantine counter is initialized in step 1215. Communications received from the new entity are then sent to a dynamic quarantine at step 1220. The counter is then checked to determine whether the counter has elapsed in step 1225. If the counter has not elapsed, the counter is decremented in step 1230. The behavior of the entity as well as the quarantined communications can be analyzed in step 1235. A determination is made whether the quarantined communications or the behavior of the entity is anomalous in step 1240. If no anomaly is found, the operational scenario returns to step 1220, where new communications are quarantined. However, if the communications or the behavior of the entity are found to be anomalous in step 1240, a non-reputable reputation is assigned to the entity in step 1245. The process then ends by sending a notification to an administrator or to recipients of communications sent by the originating entity.
Returning to step 1220, the process of quarantining and examining communications and entity behavior continues until anomalous behavior is discovered, or until the dynamic quarantine counter elapses in step 1225. If the dynamic quarantine counter elapses, a reputation is assigned to the entity at step 1255. Alternatively, in situations where the entity is not an unknown entity, the reputation would be updated in steps 1245 or 1255. The operational scenario ends at step 1260 by releasing the dynamic quarantine where the dynamic quarantine counter has elapsed without discovery of an anomaly in the communications or in the originating entity behavior.
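The quarantine loop of FIG. 12 might be sketched as follows, with a placeholder anomaly test standing in for the analysis of steps 1235 and 1240; the batch-per-tick counter semantics are an assumption of the sketch:

    # Hypothetical dynamic quarantine: hold a new entity's traffic until
    # the counter elapses or an anomaly is detected.
    def dynamic_quarantine(batches, counter=5, is_anomalous=None):
        is_anomalous = is_anomalous or (lambda batch: False)
        quarantined = []
        for batch in batches:
            if counter <= 0:
                break                        # counter elapsed (step 1225)
            quarantined.append(batch)        # quarantine (step 1220)
            counter -= 1                     # decrement (step 1230)
            if is_anomalous(batch):          # analysis (steps 1235/1240)
                return "non-reputable", quarantined   # step 1245
        return "reputable", quarantined      # steps 1255/1260

    print(dynamic_quarantine([["msg1"], ["msg2"], ["msg3"]], counter=2))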
FIG. 13 is an example graphical user interface 1300 display of an image spam communication which can be classified as an unwanted image or message. As should be understood, image spam poses a problem for traditional spam filters. Image spam bypasses the traditional textual analysis of spam by converting the text message of the spam into an image format. FIG. 13 shows an example of image spam. The message shows an image 1310. While the image 1310 appears to be textual, it is merely the graphic encoding of a textual message. Image spam also typically includes a textual message 1320 comprising sentences which are structured correctly, but make no sense in the context of the message. The message 1320 is designed to elude spam filters that key on communications that only include an image 1310 within the communication. Moreover, the message 1320 is designed to trick filters that apply superficial testing to the text of a communication that includes an image 1310. Further, while these messages do include information about the origination of the message in the header 1330, an entity's reputation for originating image spam might not be known until the entity is caught sending image spam.
FIG. 14 is a flowchart illustrating an example operational scenario for detecting unwanted images (e.g., image spam). It should be understood that many of the steps shown in FIG. 14 can be performed alone or in combination with any or all of the other steps shown in FIG. 14 to provide some detection of image spam. However, the use of each of the steps in FIG. 14 provides a comprehensive process for detecting image spam.
The process begins at step 1400 with analysis of the communication. Step 1400 typically includes analyzing the communication to determine whether the communication includes an image that is subject to image spam processing. At step 1410, the operational scenario performs a structural analysis of the communication to determine whether the image comprises spam. The header of the image is then analyzed in step 1420. Analysis of the image header allows the system to determine whether anomalies exist with respect to the image format itself (e.g., protocol errors, corruption, etc.). The features of the image are analyzed in step 1430. The feature analysis is intended to determine whether any of the features of the image are anomalous.
The image can be normalized in step 1440. Normalization of an image typically includes removal of random noise that might be added by a spammer to avoid image fingerprinting techniques. Image normalization is intended to convert the image into a format that can be easily compared among images. A fingerprint analysis can be performed on the normalized image to determine whether the image matches images from previously received known image spam.
FIG. 15A is a flowchart illustrating an operational scenario for analyzing the structure of a communication. The operational scenario begins at step 1500 with analysis of the message structure. At step 1505 the hypertext markup language
(HTML) structure of the communication is analyzed to introduce n-gram tags as additional tokens to a Bayesian analysis. Such processing can analyze the text 1320 that is included in an image spam communication for anomalies. The HTML structure of the message can be analyzed to define meta-tokens. Meta-tokens are the HTML content of the message, processed to discard any irrelevant HTML tags and compressed by removing white space to create a "token" for Bayesian analysis. Each of the above described tokens can be used as input to a Bayesian analysis for comparison to previously received communications.
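For illustration only, a meta-token might be constructed as in the following sketch; the particular set of tags treated as irrelevant is an assumption of the example:

    # Hypothetical meta-token construction: strip presentational tags
    # and compress white space into a single token for Bayesian input.
    import re

    def meta_token(html):
        stripped = re.sub(r"</?(?:font|span|b|i)\b[^>]*>", "", html,
                          flags=re.IGNORECASE)
        return re.sub(r"\s+", "", stripped)

    print(meta_token('<div> <font color="red">Buy</font> now </div>'))
    # -> '<div>Buynow</div>'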
The operational scenario then includes image detection at step 1515. The image detection can include partitioning the image into a plurality of pieces and performing fingerprinting on the pieces to determine whether the fingerprints match pieces of previously received images.
FIG. 15B is a flowchart illustrating an operational scenario for analyzing the features of an image to extract features of the message for input into a clustering engine to identify components of the image which align with known image spam. The operational scenario begins at step 1520 where a number of high level features of the image are detected for use in a machine learning algorithm. Such features can include values such as the number of unique colors, the number of noise black pixels, the number of edges in the horizontal direction (sharp transitions between shapes), etc. One of the features extracted by the operational scenario can include the number of histogram modes of the image, as shown at step 1525. The number of modes is yielded by an examination of the spectral intensity of the image. As should be understood, artificial images will typically include fewer modes than natural images, because natural image colors are typically spread through a broad spectrum.
As described above, the features extracted from the image can be used to identify anomalies. In some examples, identifying anomalies can include analyzing the characteristics of a message to determine the level of similarity of a number of features to the features of stored unwanted images. Alternatively, in some examples, the image features can also be analyzed for comparison with known reputable images to determine similarity to reputable images. It should be understood that none of the extracted features alone is determinative of a classification. For example, a specific feature might be associated with 60% of unwanted messages, while also being associated with 40% of wanted messages. Moreover, as the value associated with the feature changes, there might be a change in the probability that the message is wanted or unwanted. There are many features that can each indicate a slight tendency. If each of these features is combined, the image spam detection system can make a classification decision.
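One conventional way to combine such weakly indicative features is a naive-Bayes-style product of per-feature likelihoods; the sketch below is illustrative (the 60%/40% figures echo the example above, and the feature-independence assumption belongs to the sketch, not the disclosure):

    # Hypothetical combination of weak per-feature evidence into a
    # single spam probability.
    def spam_probability(feature_likelihoods, prior=0.5):
        p_spam, p_ham = prior, 1 - prior
        for p_given_spam, p_given_ham in feature_likelihoods:
            p_spam *= p_given_spam
            p_ham *= p_given_ham
        return p_spam / (p_spam + p_ham)

    # Three weak features, each seen in 60% of spam and 40% of ham:
    print(spam_probability([(0.6, 0.4)] * 3))  # ~0.77

Individually each feature is barely informative, but together they can push the classification toward a usable decision.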
The aspect ratio is then examined in step 1530 to determine whether there are any anomalies with respect to the image size or aspect. Such anomalies in the aspect ratio could be indicated by similarity of the image size or aspect ratio to known sizes or aspect ratios which are common to known image spam. For example, image spam can come in specific sizes to make the image spam look more like common e-mail. Messages that include images which share a common size with known spam images are more likely to be spam themselves. Alternatively, there are image sizes which are not conducive to spam (e.g., a 1" x 1" square image might be difficult to read if a spammer inserted a message into the image). Messages that include images which are known to be non-conducive to spam insertion are less likely to be image spam. Thus, the aspect ratio of a message can be compared to common aspect ratios used in image spam to determine a probability that the image is an unwanted image or that the image is a reputable image.
At step 1535, the frequency distribution of the image is examined. Typically, natural pictures have a uniform frequency distribution with a relative scarcity of sharp frequency gradations. On the other hand, image spam typically includes a choppy frequency distribution as a result of black letters being placed on a white background.
Thus, such non-uniform frequency distribution can indicate image spam.
At step 1540, the signal to noise ratio can be analyzed. A low signal to noise ratio might indicate that a spammer is trying to evade fingerprinting techniques by introducing noise into the image. Increasing noise levels can thereby indicate an increasing probability that the image is an unwanted image.
It should be understood that some features can be extracted on the scale of the entire image, while other features can be extracted from subparts of the image. For example, the image can be subdivided into a plurality of subparts (e.g., rectangles). Each of the rectangles can be transformed into a frequency domain using a fast Fourier transform (FFT). In the transformed image, the predominance of frequencies in a plurality of directions can be extracted as features. These subparts of the transformed image can also be examined to determine the amount of high frequencies and low frequencies. In the transformed image, the points that are further away from the origin represent higher frequencies. Similarly to the other extracted features, these features can then be compared to known legitimate and unwanted images to determine which characteristics the unknown image shares with each type of known image. Moreover, the transformed (e.g., frequency domain) image can also be divided into subparts
(e.g., slices, rectangles, concentric circles, etc.) and compared against data from known images (e.g., both known unwanted images and known legitimate images).
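As a sketch of such frequency-domain feature extraction (the 2x2 grid, the distance cutoff, and the use of numpy's FFT are assumptions made for illustration):

    # Hypothetical feature: per-rectangle share of spectral energy far
    # from the origin, i.e., in the high frequencies.
    import numpy as np

    def high_frequency_ratios(image, grid=2):
        h, w = image.shape
        ratios = []
        for i in range(grid):
            for j in range(grid):
                part = image[i*h//grid:(i+1)*h//grid,
                             j*w//grid:(j+1)*w//grid]
                spectrum = np.abs(np.fft.fftshift(np.fft.fft2(part)))
                cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
                yy, xx = np.ogrid[:spectrum.shape[0], :spectrum.shape[1]]
                far = np.hypot(yy - cy, xx - cx) > min(cy, cx) / 2
                ratios.append(float(spectrum[far].sum() / spectrum.sum()))
        return ratios

    print(high_frequency_ratios(np.random.rand(64, 64)))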
FIG. 15C is a flowchart illustrating an operational scenario for normalizing an image for spam processing. At step 1545, obfuscation and noise are removed from the image. As discussed previously, these can be introduced by spammers to evade fingerprinting techniques such as hashing, by varying the sum of the hash such that it does not match any previously received hash fingerprints of known image spam. Obfuscation and noise removal can describe several techniques for removing artificial noise introduced by spammers. It should be understood that artificial noise can include techniques used by spammers such as banding (where a font included in the image is varied to vary the hash of the image).
An edge detection algorithm can be run on the normalized image at step 1550. In some examples, the edge detected image can be provided to an optical character recognition engine to convert the edge detected image to text. The edge detection can be used to remove unnecessary detail from the picture which can cause inefficiency in processing the image against other images.
At step 1555, median filtering can be applied. The median filtering is applied to remove random pixel noise. Such random pixels can cause problems for content analysis of the image. The median filtering can help to remove the single pixel type of noise introduced by spammers. It should be understood that single pixel noise is introduced by spammers using an image editor to alter one or more pixels in the image, which can make the image appear grainy in some areas, thereby making the image more difficult to detect.
At step 1560, the image is quantized. Quantizing the image removes unnecessary color information. The color information typically requires more processing and is unrelated to the attempted propagation of the spam. Moreover, spammers could vary the color scheme in an image slightly, again varying the hash such that known image spam hashes would not match the hash derived from the color variant image spam.
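A minimal sketch of such quantization on a grayscale image, where the number of levels is an arbitrary choice for illustration:

    # Hypothetical quantization: collapse 0-255 intensities to a few
    # levels, discarding fine color detail a spammer could perturb to
    # vary the image hash.
    import numpy as np

    def quantize(image, levels=4):
        step = 256 // levels
        return (image // step) * step

    img = np.array([[0, 100, 200], [40, 130, 255]], dtype=np.uint8)
    print(quantize(img))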
At step 1565, contrast stretching is performed. Using contrast stretching, the color scale in the image is maximized from black to white, even if the colors only vary through shades of gray. The lightest shade of the image is assigned a white value, while the darkest shade in the image is assigned a black value. All other shades are assigned their relative position in the spectrum in comparison to the lightest and darkest shades in the original image. Contrast stretching helps to define details in an image that may not make full use of the available spectrum and therefore can help to prevent spammers from using different pieces of the spectrum to avoid fingerprinting techniques. Spammers sometimes intentionally shift the intensity range of an image to defeat some types of feature identification engines. Contrast stretching can also help normalize an image such that it can be compared to other images to identify common features contained in the images.

FIG. 15D is a flowchart illustrating an operational scenario for analyzing the fingerprint of an image to find common fragments among multiple images. The operational scenario begins at step 1570 by defining regions within an image. A winnowing algorithm is then performed on the defined regions to identify the relevant portions of the image upon which fingerprints should be taken at step 1575. At step 1580, the operational scenario fingerprints the resulting fragments from the winnowing operation and determines whether there is a match between the fingerprints of the received image and known spam images. A similar winnowing fingerprint approach is described in United States Patent Application Publication No. 2006/0251068, which is hereby incorporated by reference.

As used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of "and" and "or" include both the conjunctive and disjunctive and may be used interchangeably unless the context clearly dictates otherwise. Ranges may be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.


CLAIMS

What is claimed is:
1. A computer implemented method operable to assign a reputation to a web-based entity associated with a hypertext transfer protocol communication, comprising: receiving a hypertext transfer protocol communication at an edge protection device; identifying an entity associated with the received hypertext transfer protocol communication; querying a reputation engine for a reputation indicator associated with the entity; receiving the reputation indicator from the reputation engine; and taking an action with respect to the hypertext transfer protocol communication based upon the received reputation indicator associated with the entity.
2. The method of claim 1, wherein the entity is a web entity comprising a destination universal resource locator, domain or IP address.
3. The method of claim 1, wherein the reputation of the entity is based upon previous communications received from the entity and public or private network information available about the entity comprising ownership or hosting information.
4. The method of claim 3, wherein the previous communications comprise one or more of an electronic message, hypertext transfer protocol communication, an instant message, a file transfer protocol communication, simple object access protocol messages, real-time transport protocol packages, a short message service communication, multimedia message service communication, or a voice over internet protocol communication.
5. The method of claim 1, wherein the action is to discard the communication and notify an enterprise network user associated with the hypertext transfer protocol communication.
6. The method of claim 1, wherein the entity is associated with a plurality of different types of network communications comprising at least a hypertext transfer protocol type of communication and at least one of electronic mail communications, file transfer protocol communications, instant messaging communications, gopher communications, short messaging service communications or voice over internet protocol communications.
7. The method of claim 1, wherein the reputation engine determines the reputation indicator based upon an aggregation of reputable criteria associated with the entity and non-reputable criteria associated with the entity.
8. The method of claim 7, wherein the reputation indicator is a vector which indicates reputation based upon a plurality of different criteria.
9. The method of claim 8, further comprising examining the reputation vector to determine whether a policy associated with an enterprise network protected by the edge protection device allows communication with the entity based upon its reputation vector.
10. The method of claim 1, wherein the reputation engine is a reputation server operable to provide a plurality of edge protection devices with reputation information.
11. The method of claim 10, wherein the reputation engine is operable to store a global reputation indicator and to bias the global reputation indicator with a local bias prior to outputting the reputation indicator.
12. The method of claim 1, wherein the reputation indicator comprises a reputation vector, the reputation vector comprising a multi-dimensional classification of the entity.
13. The method of claim 12, wherein the multi-dimensional classification comprises classification of the message in two or more of a porn category, a news category, a computer category, a secure category, a phishing category, a spyware category, a virus category, or an attack category.
14. The method of claim 12, wherein the reputation indicator further comprises a confidence associated with each of the multi-dimensional classifications of the entity.
15. The method of claim 1, further comprising detecting randomization of a universal resource locator.
16. The method of claim 15, wherein the randomization of a universal resource locator is determined by generating a hash of the universal resource locator and comparing the hash to previously identified non-reputable universal resource locators.
17. The method of claim 15, wherein the randomization of a universal resource locator is determined by fingerprinting a plurality of portions of the universal resource locator and comparing the fingerprints to previously identified non-reputable universal resource locators.
18. A web reputation system on an edge protection device operable to receive a web communication and to assign a reputation to an entity associated with the communication, the system comprising: a communications interface operable to receive a web communication; a communication analyzer operable to analyze the web communication to determine an entity associated with the web communication; a reputation engine operable to provide a reputation associated with the entity based upon previously collected data associated with the entity; a decision engine operable to receive the reputation indicator from the reputation engine and to determine whether the web communication is to be communicated to a recipient.
19. The system of claim 18, wherein the reputation of the entity is based upon previous communications received from the entity, the previous communications comprising one or more of an electronic message, hypertext transfer protocol communication, an instant message, a file transfer protocol communication, simple object access protocol messages, real-time transport protocol packages, a short message service communication, or a voice over internet protocol communication.
20. The system of claim 18, wherein the decision engine is operable to notify an enterprise network user associated with the hypertext transfer protocol communication in the event that the communication is not transmitted to the recipient.
21. The system of claim 18, wherein the reputation engine determines the reputation indicator based upon an aggregation of reputable criteria associated with the entity and non-reputable criteria associated with the entity.
22. The system of claim 21, wherein the reputation indicator is a vector which indicates reputation based upon a plurality of different criteria.
23. The system of claim 22, further comprising examining the reputation vector to determine whether a policy associated with an enterprise network protected by the edge protection device allows communication with the entity based upon its reputation vector.
24. The system of claim 18, wherein the reputation engine is a reputation server operable to provide a plurality of edge protection devices with reputation information and is operable to store a global reputation indicator and to bias the global reputation indicator with a local bias prior to outputting the reputation indicator.
25. The system of claim 18, further comprising an interrogation engine operable to perform a plurality of tests on the communication and to determine a profile associated with the web communication.
26. The system of claim 25, wherein the decision engine is operable to determine whether to forward the web communication based upon the profile associated with the web communication.
27. The system of claim 26, wherein the reputation engine is operable to use the profile to update reputation information associated with the entity.
28. The system of claim 18, wherein the reputation comprises a reputation vector, the reputation vector comprising a multi-dimensional classification of the entity.
29. The system of claim 28, wherein the multi-dimensional classification comprises classification of the message in two or more of a porn category, a news category, a computer category, a secure category, a phishing category, a spyware category, a virus category, or an attack category.
30. The system of claim 28, wherein the reputation further comprises a confidence associated with each of the multi-dimensional classifications of the entity.
31. The system of claim 18, further comprising detecting randomization of a universal resource locator.
32. The system of claim 31, wherein the randomization of a universal resource locator is determined by generating a hash of the universal resource locator and comparing the hash to previously identified non-reputable universal resource locators.
33. The system of claim 31, wherein the randomization of a universal resource locator is determined by fingerprinting a plurality of portions of the universal resource locator and comparing the fingerprints to previously identified non-reputable universal resource locators.
34. One or more computer readable media having software program code operable to assign a reputation to a messaging entity associated with a received communication, comprising: receiving a hypertext transfer protocol communication at an edge protection device; identifying an entity associated with the received hypertext transfer protocol communication; querying a reputation engine for a reputation indicator associated with the entity; receiving the reputation indicator from the reputation engine; and taking an action with respect to the hypertext transfer protocol communication based upon the received reputation indicator associated with the entity.
35. A reputation system, the system comprising: a centralized reputation engine operable to receive feedback from a plurality of local reputation engines, the plurality of local reputation engines being operable to determine local reputations based upon one or more entities respectively associated with the local reputation engines; an aggregation engine operable to derive a global reputation for a queried entity based upon an aggregation of the plurality of local reputations; and wherein the centralized reputation engine is operable to provide the global reputation of the queried entity to one or more of the local reputation engines responsive to receiving a reputation query from said one or more of the local reputation engines.
36. The system of claim 35, wherein the aggregation engine is operable to store confidence values associated with a respective local reputation engine, the aggregation engine being further operable to aggregate the plurality of local reputations using the confidence values associated with each of the plurality of local reputations through its respective local reputation engine.
37. The system of claim 36, wherein the local reputation system is a subsystem of the centralized reputation system and performs reputation scoring on a local scale based upon the communications received by the local reputation engine, and the centralized reputation engine performs reputation scoring based upon communications received by the centralized reputation engine and reputation information received from the local reputation engines.
38. The system of claim 36, wherein the local reputations are weighted based on their respective confidence values prior to aggregation of the local reputations.
39. The system of claim 38, wherein the confidence values are tuned based upon feedback received from the plurality of local reputation engines.
40. The system of claim 35, wherein the local reputations and global reputations are vectors which identify the characteristics of the respective entities with which they are associated.
41. The system of claim 40, wherein the characteristics comprise one or more of a spamming characteristic, a phishing characteristic, a bulk mailing characteristic, a virus source characteristic, a legitimate communication characteristic, an intrusion characteristic, an attack characteristic, a spyware characteristic, or a geolocation characteristic.
42. The system of claim 35, wherein the local reputations are based upon an aggregation of reputable and non-reputable criteria.
43. The system of claim 35, wherein the centralized reputation system is operable to apply a local reputation bias to the global reputation based on the local reputation engine that originated the reputation query.
44. The system of claim 43, wherein the local reputation bias is based upon input received from the local reputation engine that originated the reputation query.
45. The system of claim 43, wherein the local reputation bias is based upon feedback received from the local reputation engine that originated the reputation query.
46. The system of claim 43, wherein the local reputation bias is operable to amplify certain criteria for reputation while moderating other criteria for reputation based upon the local reputation bias.
47. The system of claim 35, wherein a local reputation engine is operable to apply a local reputation bias to the global reputation prior to applying the global reputation to a communication received from the queried entity.
48. The system of claim 35, wherein the local reputation engine originates the reputation query responsive to receiving a communication associated with an external entity with respect to a protected enterprise network associated with the local reputation engine.
49. The system of claim 48, wherein the local reputation engine originates the reputation query responsive to a local reputation associated with the external entity being indeterminate.
50. The system of claim 35, wherein the centralized reputation engine is further operable to aggregate reputation for a plurality of identities associated with one or more of the plurality of entities.
51. The system of claim 50, wherein the centralized reputation engine is further operable to correlate attributes associated with different identities to identify relationships between the identities, and to assign a portion of the reputation associated with one entity to the reputation of another entity where a relationship has been identified between the entities.
52. A method of producing a global reputation, comprising the steps of: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and responding to the reputation query with the global reputation.
53. The method of claim 52, further comprising retrieving confidence values associated with the local reputation engines, the deriving step using the confidence values to derive the global reputation.
54. The method of claim 53, wherein the deriving step further comprises weighting the local reputations using their respective confidence values and combining the weighted reputations to generate the global reputation.
55. The method of claim 54, further comprising tuning the confidence values based upon feedback from the plurality of local reputation engines.
56. The method of claim 52, wherein the local reputations and global reputations are vectors which identify the characteristics of the respective entities with which they are associated.
57. The method of claim 56, wherein the characteristics comprise one or more of a spamming characteristic, a phishing characteristic, a bulk mailing characteristic, a malware source characteristic, or a legitimate mail characteristic.
58. The method of claim 52, wherein the local reputations are based upon an aggregation of reputable and non-reputable criteria.
59. The method of claim 52, further comprising applying a local reputation bias to the aggregation of local reputations to produce the global reputation vector, the local reputation bias being based upon the requesting local reputation engine.
60. The method of claim 59, wherein the local reputation bias is based upon input received from the requesting local reputation engine.
61. The method of claim 59, wherein the local reputation bias is based upon feedback received from the requesting local reputation engine.
62. The method of claim 59, further comprising amplifying certain criteria for reputation based upon the local reputation bias and moderating other criteria for reputation based upon the local reputation bias.
63. The method of claim 52, wherein the requesting local reputation engine originates the reputation query responsive to receiving a communication associated with an external entity with respect to a protected enterprise network associated with the requesting local reputation engine.
64. The method of claim 63, wherein the requesting local reputation engine originates the reputation query responsive to a local reputation associated with the external entity being indeterminate.
65. The method of claim 52, wherein deriving the global reputation is further based upon public and private information not available to any of the plurality of local reputation engines.
66. One or more computer readable media having software program code operable to perform steps to aggregate a plurality of local reputation vectors to produce a global reputation vector, the steps comprising: receiving a reputation query from a requesting local reputation engine; retrieving a plurality of local reputations, the local reputations being respectively associated with a plurality of local reputation engines; aggregating the plurality of local reputations; deriving a global reputation from the aggregation of the local reputations; and responding to the reputation query with the global reputation.
67. A reputation system, the system comprising: a communications interface operable to receive global reputation information from a central server, the central server being operable to determine global reputations based upon feedback received from one or more local reputation engines, the global reputations being respectively associated with one or more entities; a reputation engine operable to bias the global reputation received from the central server based upon defined local preferences; and wherein the centralized reputation engine is operable to provide the global reputation of the queried entity to the communications interface responsive to receiving a reputation query from said communications interface.
68. A reputation system, the system comprising: a communications interface operable to receive distributed reputation information from one or more distributed reputation engines, the distributed reputation engines being operable to examine communications and derive a reputation associated with one or more entities originating the communications; a reputation module operable to aggregate the distributed reputation information, and to derive a global reputation based upon the aggregation of the distributed reputation information, the reputation module being further operable to derive local reputation information based upon communications received by the reputation module; and a traffic control module operable to determine handling associated with communications based upon the global reputation and the local reputation.
69. A reputation based network security system, the system comprising: a communications interface operable to receive incoming and outgoing communications associated with a network; a communication analyzer operable to derive an external entity associated with a communication; a reputation engine operable to derive a reputation vector associated with the external entity, the reputation vector comprising an aggregation of reputable and non-reputable criteria in a plurality of categories comprising different types of communications; a security engine operable to receive the reputation vector and to send the communication to one or more of a plurality of interrogation engines, wherein the security engine is operable to determine to which of the plurality of interrogation engines to send the communication based upon the reputation vector.
70. The system of claim 69, wherein the security engine is operable to avoid sending the communication to an unwarranted interrogation engine where the reputation vector does not indicate that the external entity has a reputation for participating in activity identified by the unwarranted interrogation engine.
71. The system of claim 69, wherein each of the one or more interrogation engines includes a plurality of instances of the interrogation engine.
72. The system of claim 71, wherein upon selection of an interrogation engine, the security engine can select a chosen instance of the interrogation engine, wherein the chosen instance of the interrogation engine is selected based upon the capacity of the chosen instance of the interrogation engine.
73. The system of claim 69, wherein the security engine is operable to assign a high priority to the communication in an interrogation queue associated with the plurality of interrogation engines if the external entity is a reputable entity and to assign a low priority to the communication in the interrogation queue if the external entity is a disreputable entity.
74. The system of claim 73, wherein quality of service is maximized for reputable entities, while quality of service is minimized for non-reputable entities.
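Claims 73 and 74 tie queue priority, and hence quality of service, to the sender's reputation. A minimal sketch of such an interrogation queue follows; the two-level priority scheme, the reputable cutoff, and the names are assumptions made for illustration.

    import heapq

    HIGH, LOW = 0, 1   # smaller value is served first

    def enqueue(queue, communication, reputation_score, sequence, reputable_cutoff=0.7):
        """Place a communication in the interrogation queue: high priority for
        reputable senders, low priority for disreputable ones; sequence breaks
        ties in arrival order."""
        priority = HIGH if reputation_score >= reputable_cutoff else LOW
        heapq.heappush(queue, (priority, sequence, communication))

    queue = []
    enqueue(queue, "msg-from-known-partner", 0.9, 0)
    enqueue(queue, "msg-from-unknown-host", 0.1, 1)
    print(heapq.heappop(queue)[2])   # the reputable sender's message is served first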
75. The system of claim 69, wherein each of the one or more interrogation engines comprises a plurality of instances of the interrogation engine, the instances of the interrogation engine being operable to reside on an edge protection device or an enterprise client device.
76. The system of claim 69, wherein the reputation engine is a reputation server operable to provide reputation information to a plurality of edge protection devices and client devices.
77. A reputation based network security system, the system comprising: a communications interface operable to receive incoming and outgoing communications associated with a network; a communication analyzer operable to derive an external entity associated with a communication; a reputation engine operable to derive a reputation associated with the external entity, the reputation comprising an aggregation of reputable and non-reputable criteria associated with the external entity; and a security engine operable to assign priority information to a communication, wherein the security engine is operable to receive the reputation and assign a high priority to communications where the external entity is a reputable entity and to assign a low priority to communications where the external entity is a non-reputable entity, whereby the priority information is used by one or more interrogation engines to improve quality of service for reputable entities.
78. A computer implemented method operable to efficiently process communications based on a reputation associated with an external entity, comprising: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning a priority to the communication based upon the derived reputation associated with the external entity; and executing one or more tests on the communication based upon the priority assigned to the communication.
79. The method of claim 78, further comprising maximizing quality of service for messages that are assigned a high priority.
80. The method of claim 78, wherein the derived reputation is a reputation vector, the reputation vector communicating the reputation associated with the external entity in a plurality of categories.
81. The method of claim 80, further comprising bypassing any of said one or more tests if the reputation vector associated with the communication indicates that the external entity is a reputable entity with regard to the criteria tested by the bypassed test.
82. The method of claim 78, wherein each of the one or more tests includes a plurality of engines operable to perform said one or more tests.
83. The method of claim 82, wherein the security engine is operable to distribute testing of communications including the received communication evenly across the plurality of engines based upon the capacity of the engines.
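Claims 72 and 83 select among multiple instances of an interrogation engine based upon capacity. A greedy most-headroom choice is one plausible reading; pick_instance and the load bookkeeping below are hypothetical.

    def pick_instance(instances):
        """Choose the engine instance with the most spare capacity.

        instances maps an instance id to (current_load, max_capacity); the
        spare-capacity heuristic is an illustrative assumption.
        """
        return max(instances, key=lambda name: instances[name][1] - instances[name][0])

    instances = {"edge-1": (40, 100), "edge-2": (10, 100), "client-7": (5, 20)}
    print(pick_instance(instances))   # -> edge-2, which has the most headroom (90)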
84. The method of claim 78, wherein said one or more tests are executed by a plurality of engines operable to perform the test, the engines being operable to reside on an edge protection device or an enterprise client device.
85. The method of claim 78, wherein the reputation is retrieved from a reputation server operable to provide reputation information to a plurality of edge protection devices and client devices.
86. The method of claim 78, wherein the reputation is retrieved from a local reputation engine.
87. A computer implemented method operable to efficiently process communications based on a reputation associated with an external entity, comprising: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning the communication to one or more interrogation engines selected from among a plurality of interrogation engines, the selection of the one or more interrogation engines being based upon the derived reputation associated with the external entity and capacity of the interrogation engines; and executing said one or more interrogation engines on the communication.
88. One or more computer readable media having software program code operable to efficiently process communications based upon reputation of external entities associated with the communications, comprising: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; assigning a priority to the communication based upon the derived reputation associated with the external entity; and executing one or more tests on the communication based upon the priority assigned to the communication.
89. A computer implemented method operable to process communications based upon a reputation associated with an external entity, comprising: receiving a communication associated with an external entity based upon origination or destination information associated with the communication; identifying the external entity associated with the received communication; deriving a reputation associated with the external entity based upon reputable and non-reputable criteria associated with the external entity; and assigning a processing path to the communication based upon the derived reputation associated with the external entity.
90. A reputation based connection throttling system for voice over internet protocol communications, the system comprising: a communications interface operable to receive voice over internet protocol connection requests associated with an external entity prior to a connection being established between the external entity and a protected network associated with the communications interface; a reputation engine operable to derive a reputation associated with the external entity; and a connection control engine operable to deny a voice over internet protocol connection request to a protected network based upon the derived reputation of the external entity associated with the voice over internet protocol connection request.
91. The system of claim 90, wherein the reputation engine derives the reputation of the external entity based upon an aggregation of reputable criteria and non-reputable criteria associated with the external entity.
92. The system of claim 90, wherein the connection control engine prevents non-reputable entities from creating a connection with the protected network.
93. The system of claim 92, wherein the non-reputable entities are operable to attempt to send voice over internet protocol communications to the protected network in an attempt to create a pre-textual voice over internet protocol connection with the protected network and to exploit the pre-textual voice over internet protocol connection for non-legitimate activities.
94. The system of claim 90, wherein the communications interface is further operable to receive short message service connection requests, and the connection control engine is operable to deny the short message service connection request based upon a reputation associated with a short message service entity originating the short message service connection request.
95. The system of claim 90, further comprising a message interrogation engine operable to examine the contents of communications originating from the external entity to determine whether the external entity is exploiting a voice over internet protocol connection.
96. The system of claim 90, wherein the reputation engine is a reputation server operable to receive a reputation query from the connection control engine and to provide the connection control engine with the derived reputation.
97. The system of claim 96, wherein the reputation server derives the reputation of the external entity by aggregating a plurality of local reputations associated with the external entity, the plurality of local reputations being supplied by a plurality of local reputation engines.
98. The system of claim 90, wherein the connection control engine comprises a policy against which the reputation is compared to determine whether to allow the voice over internet protocol connection request.
99. The system of claim 98, wherein the policy defines one or more categories of external entities to which voice over internet protocol requests are allowed.
100. The system of claim 90, wherein the connection control engine is operable to reduce quality of service for any connections received from non-reputable external entities and to maximize quality of service for any connections received from reputable external entities.
101. The system of claim 90, wherein the communications interface is further operable to receive a plurality of simultaneous connection requests, and wherein the connection control engine is operable to correlate the simultaneous connection requests to determine that the requests comprise an attack, and to update a reputation associated with one or more entities associated with the simultaneous connection requests so as to cause throttling of the plurality of connection requests.
102. The system of claim 90, wherein the reputation derived by the reputation engine indicates a reputation of the external entity for participating in denial of service attacks, and wherein a reputation for participating in a denial of service attack triggers the connection control engine to immediately throttle a connection based upon input from a handset or a policy.
103. The system of claim 90, wherein the connection is requested to a device on the protected network, the device comprising a mobile, location aware device.
104. A reputation based connection throttling system for short message service communications, the system comprising: a communications interface operable to receive short message service connection requests associated with an external entity prior to a connection being established between the external entity and a protected network associated with the communications interface; a reputation engine operable to derive a reputation associated with the external entity; and a connection control engine operable to deny a short message service connection request to a protected network based upon the derived reputation of the external entity associated with the short message service connection request.
105. The system of claim 104, wherein the reputation engine derives the reputation of the external entity based upon an aggregation of reputable criteria and non-reputable criteria associated with the external entity.
106. The system of claim 105, wherein the connection control engine prevents non-reputable entities from creating a connection with the protected network.
107. The system of claim 106, wherein the non-reputable entities are operable to attempt to send short message service communications to the protected network in an attempt to create a pre-textual short message service connection with the protected network and to exploit the pre-textual short message service connection for non-legitimate activities.
108. The system of claim 104, further comprising a message interrogation engine operable to examine the contents of communications originating from the external entity to determine whether the external entity is exploiting a short message service connection.
109. The system of claim 104, wherein the reputation engine is a reputation server operable to receive a reputation query from the connection control engine and to provide the connection control engine with the derived reputation.
110. The system of claim 109, wherein the reputation server derives the reputation of the external entity by aggregating a plurality of local reputations associated with the external entity, the plurality of local reputations being supplied by a plurality of local reputation engines.
111. The system of claim 104, wherein the connection control engine comprises a policy against which the reputation is compared to determine whether to allow the short message service connection request.
112. The system of claim 111, wherein the policy defines one or more categories of external entities to which short message service requests are allowed.
113. A method for reputation based connection throttling, comprising the steps of: receiving a voice over internet protocol connection request, the voice over internet protocol connection request being related to an external entity; querying a reputation engine for a reputation associated with the external entity; comparing the reputation to a policy associated with a protected enterprise network; permitting the connection request based upon determining that the reputation of the external entity related to the voice over internet protocol connection request complies with the policy; and throttling the connection request based upon determining that the reputation of the external entity related to the voice over internet protocol connection request does not comply with the policy.
114. A method for reputation based connection throttling, comprising the steps of: receiving a connection request, the connection request requesting a connection between an external entity and a protected enterprise network; querying a reputation engine for a reputation associated with the external entity, the reputation comprising an aggregation of the reputable and non-reputable criteria associated with the external entity; comparing the reputation to a policy associated with the protected enterprise network; permitting the connection request based upon determining that the reputation of the external entity related to the connection request complies with the policy; and throttling the connection request based upon determining that the reputation of the external entity related to the connection request does not comply with the policy.
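The throttling method of claims 113 and 114 reduces to: query a reputation, compare it to the protected network's policy, then permit or throttle. A minimal sketch, assuming a scalar reputation score and a minimum-score policy (both assumptions):

    def handle_connection_request(entity, reputation_engine, policy):
        """Permit or throttle a connection request by comparing the entity's
        reputation against the protected network's policy; the minimum-score
        policy form is an illustrative assumption."""
        score = reputation_engine(entity)
        return "permit" if score >= policy["minimum_reputation"] else "throttle"

    lookup = {"10.0.0.5": 0.85, "203.0.113.9": 0.05}.get
    policy = {"minimum_reputation": 0.5}
    print(handle_connection_request("203.0.113.9", lookup, policy))   # -> throttle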
115. A reputation based firewall, comprising: a firewall operable to receive a data packet directed to a protected network and to determine handling of the data packet based upon a security policy associated with the protected network, the security policy comprising at least one rule based upon a reputation of an external entity associated with the data packet; a reputation engine operable to determine the external entity associated with the data packet and to provide the reputation to the firewall based upon determination of the external entity; and wherein handling comprises permitting the data packet entry to the protected network, or denying the data packet entry to the protected network.
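Claim 115's firewall handling, permitting or denying a data packet entry based on a reputation rule in the security policy, can be pictured as below. The deny-below threshold form of the rule and all names are assumptions for the sketch.

    def firewall_decide(packet_src, security_policy, reputation_of):
        """Apply a reputation-based rule: deny the packet entry when the source
        entity's score falls below the policy threshold, otherwise permit it."""
        if reputation_of(packet_src) < security_policy["deny_below"]:
            return "deny entry to protected network"
        return "permit entry to protected network"

    scores = {"198.51.100.1": 0.2}
    print(firewall_decide("198.51.100.1", {"deny_below": 0.4},
                          lambda ip: scores.get(ip, 0.5)))   # -> deny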
116. A system comprising: a security control interface operable to produce a plurality of security control representations, each of the plurality of security control representations being operable to control a plurality of security settings associated with a protected entity; a policy control interface operable to produce a plurality of policy control representations, each of the plurality of policy control representations being operable to control a plurality of policy settings associated with the protected entity; and a filtering module operable to filter one or more communication streams based upon the plurality of security settings and based upon the plurality of policy settings.
117. The system of claim 116, wherein the security control representations comprise a plurality of security slider representations in a plurality of security categories operable to control the security settings associated with the protected entity.
118. The system of claim 117, wherein the plurality of security categories comprise two or more of a virus category, a phishing category, a worms category, or a trojan horse category.
119. The system of claim 118, wherein the plurality of security control representations are each operable to adjust a threshold sensitivity for an associated one of the security settings.
120. The system of claim 119, wherein the threshold sensitivity comprises a level of similarity between communication stream characteristics and characteristics associated with the security categories.
121. The system of claim 117, wherein the policy control representations comprise a plurality of policy slider representations in a plurality of policy categories operable to control the policy settings associated with the protected entity.
122. The system of claim 121, wherein the plurality of policy categories comprise two or more of a spam category, a content category, a spyware category, or a bulk mail category.
123. The system of claim 122, wherein the plurality of policy control representations are each operable to adjust a threshold sensitivity for an associated one of the policy settings.
124. The system of claim 123, wherein the threshold sensitivity comprises a level of similarity between communication stream characteristics and characteristics associated with the policy categories.
125. The system of claim 116, wherein the protected entity is one of a computing device, a communications device, a mobile device, or a network.
126. The system of claim 116, wherein the security control interface and the policy control interface are controlled by an administrator.
127. The system of claim 116, wherein the security control interface and the policy control interface are controlled by an end user.
128. The system of claim 127, wherein the security control interface includes a plurality of ranges associated with the plurality of security control representations, the security control settings being operable to be adjusted within the associated ranges.
129. A computer implemented method comprising: receiving a plurality of ranges from an administrator; providing a security control interface to a user, the security control interface comprising a plurality of security control representations associated with security controls, each of the security control representations including an associated range from among the plurality of ranges, the associated range defining a minimum and maximum setting associated with the respective security controls; receiving a plurality of security control settings from the user through the security control interface; adjusting a plurality of thresholds related to the plurality of security control settings received from the user, the plurality of thresholds being associated with tolerance for a classification of potential security violation; and filtering communications streams from a protected entity associated with the user based upon the plurality of thresholds.
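Claim 129's interplay of administrator ranges and user slider settings amounts to clamping each requested setting into its permitted range before using it as a filtering threshold. A minimal sketch under that reading; the category names and the direct setting-to-threshold mapping are assumptions.

    def apply_user_settings(admin_ranges, user_settings):
        """Clamp each user slider setting to its administrator-defined
        (minimum, maximum) range and use the result as the filtering
        threshold for that category."""
        thresholds = {}
        for category, (lo, hi) in admin_ranges.items():
            requested = user_settings.get(category, lo)
            thresholds[category] = min(max(requested, lo), hi)
        return thresholds

    admin_ranges = {"virus": (0.6, 1.0), "spam": (0.2, 0.9)}
    print(apply_user_settings(admin_ranges, {"virus": 0.3, "spam": 0.95}))
    # -> {'virus': 0.6, 'spam': 0.9}: out-of-range requests are clamped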
130. The method of claim 129, wherein the security control representations comprise a plurality of security slider representations in a plurality of security categories operable to control the security settings associated with the protected entity.
131. The method of claim 130, wherein the plurality of security categories comprise two or more of a virus category, a phishing category, a worms category, a trojan horse category, a spam category, a content category, a spyware category, or a bulk mail category.
132. The method of claim 131, wherein the plurality of security control representations are each operable to adjust a threshold sensitivity for an associated one of the security settings.
133. The method of claim 132, wherein the threshold sensitivity comprises a level of similarity between communication stream characteristics and characteristics associated with the security categories.
134. The method of claim 129, wherein the protected entity is one of a computing device, a communications device, a mobile device, or a network.
135. One or more computer readable media having software program code operable to enable filter adjustments for incoming and outgoing communications streams, comprising: receiving a plurality of ranges from an administrator; providing a security control interface to a user, the security control interface comprising a plurality of security control representations associated with a plurality of security control settings, each of the security control representations including an associated range from among the plurality of ranges, the associated range defining a minimum and maximum setting associated with the respective security controls; receiving input from the user through the security control interface, the input requesting adjustment of the security control settings; adjusting a plurality of thresholds related to the plurality of security control settings received from the user, the plurality of thresholds being associated with tolerance for a classification of potential security violation; and filtering communications streams from a protected entity associated with the user based upon the plurality of thresholds.
EP08728168.9A 2007-01-24 2008-01-24 Web reputation scoring Ceased EP2115642A4 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US11/626,479 US7937480B2 (en) 2005-06-02 2007-01-24 Aggregation of reputation data
US11/626,470 US8561167B2 (en) 2002-03-08 2007-01-24 Web reputation scoring
US11/626,620 US7779156B2 (en) 2007-01-24 2007-01-24 Reputation based load balancing
US11/626,644 US8179798B2 (en) 2007-01-24 2007-01-24 Reputation based connection throttling
PCT/US2008/051865 WO2008091980A1 (en) 2007-01-24 2008-01-24 Web reputation scoring

Publications (2)

Publication Number Publication Date
EP2115642A1 (en) 2009-11-11
EP2115642A4 (en) 2014-02-26

Family

ID=39644880

Family Applications (1)

Application Number Title Priority Date Filing Date
EP08728168.9A Ceased EP2115642A4 (en) 2007-01-24 2008-01-24 Web reputation scoring

Country Status (4)

Country Link
EP (1) EP2115642A4 (en)
CN (1) CN101730892A (en)
AU (1) AU2008207924B2 (en)
WO (1) WO2008091980A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120215853A1 (en) * 2011-02-17 2012-08-23 Microsoft Corporation Managing Unwanted Communications Using Template Generation And Fingerprint Comparison Features
CN103559413B (en) * 2013-11-15 2016-11-02 北京搜房科技发展有限公司 A kind of data processing method and device
US9817843B2 (en) * 2014-09-26 2017-11-14 Mcafee, Inc. Notification of human safety reputation of a place based on historical events, profile data, and dynamic factors
US10291584B2 (en) * 2016-03-28 2019-05-14 Juniper Networks, Inc. Dynamic prioritization of network traffic based on reputation
US10938844B2 (en) 2016-07-22 2021-03-02 At&T Intellectual Property I, L.P. Providing security through characterizing mobile traffic by domain names
CN108876270B (en) * 2018-09-19 2022-08-12 惠龙易通国际物流股份有限公司 Automatic goods source auditing system and method

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US20060015942A1 (en) * 2002-03-08 2006-01-19 Ciphertrust, Inc. Systems and methods for classification of messaging entities
US7467206B2 (en) * 2002-12-23 2008-12-16 Microsoft Corporation Reputation system for web services
US20040177120A1 (en) * 2003-03-07 2004-09-09 Kirsch Steven T. Method for filtering e-mail messages
US20060095404A1 (en) * 2004-10-29 2006-05-04 The Go Daddy Group, Inc Presenting search engine results based on domain name related reputation
US20060155553A1 (en) * 2004-12-30 2006-07-13 Brohman Carole G Risk management methods and systems
US7912192B2 (en) * 2005-02-15 2011-03-22 At&T Intellectual Property Ii, L.P. Arrangement for managing voice over IP (VoIP) telephone calls, especially unsolicited or unwanted calls
US7822620B2 (en) * 2005-05-03 2010-10-26 Mcafee, Inc. Determining website reputations using automatic testing
US20060277259A1 (en) * 2005-06-07 2006-12-07 Microsoft Corporation Distributed sender reputations

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
US20060212931A1 (en) * 2005-03-02 2006-09-21 Markmonitor, Inc. Trust evaluation systems and methods

Non-Patent Citations (1)

Title
See also references of WO2008091980A1 *

Cited By (12)

Publication number Priority date Publication date Assignee Title
US8549611B2 (en) 2002-03-08 2013-10-01 Mcafee, Inc. Systems and methods for classification of messaging entities
US8561167B2 (en) 2002-03-08 2013-10-15 Mcafee, Inc. Web reputation scoring
US8578480B2 (en) 2002-03-08 2013-11-05 Mcafee, Inc. Systems and methods for identifying potentially malicious messages
US8635690B2 (en) 2004-11-05 2014-01-21 Mcafee, Inc. Reputation based message processing
US8763114B2 (en) 2007-01-24 2014-06-24 Mcafee, Inc. Detecting image spam
US8762537B2 (en) 2007-01-24 2014-06-24 Mcafee, Inc. Multi-dimensional reputation scoring
US9544272B2 (en) 2007-01-24 2017-01-10 Intel Corporation Detecting image spam
US10050917B2 (en) 2007-01-24 2018-08-14 Mcafee, Llc Multi-dimensional reputation scoring
US8621559B2 (en) 2007-11-06 2013-12-31 Mcafee, Inc. Adjusting filter or classification control settings
US8589503B2 (en) 2008-04-04 2013-11-19 Mcafee, Inc. Prioritizing network traffic
US8606910B2 (en) 2008-04-04 2013-12-10 Mcafee, Inc. Prioritizing network traffic
US8621638B2 (en) 2010-05-14 2013-12-31 Mcafee, Inc. Systems and methods for classification of messaging entities

Also Published As

Publication number Publication date
WO2008091980A1 (en) 2008-07-31
EP2115642A4 (en) 2014-02-26
CN101730892A (en) 2010-06-09
AU2008207924A1 (en) 2008-07-31
AU2008207924B2 (en) 2012-09-27

Similar Documents

Publication Publication Date Title
US9544272B2 (en) Detecting image spam
US10050917B2 (en) Multi-dimensional reputation scoring
EP2115688B1 (en) Correlation and analysis of entity attributes
US7779156B2 (en) Reputation based load balancing
US7937480B2 (en) Aggregation of reputation data
US8179798B2 (en) Reputation based connection throttling
US8561167B2 (en) Web reputation scoring
AU2008207924B2 (en) Web reputation scoring
US20180343254A1 (en) Method and system for tracking machines on a network using fuzzy guid technology
US20120174219A1 (en) Identifying mobile device reputations
US20110280160A1 (en) VoIP Caller Reputation System
US20110296519A1 (en) Reputation based connection control
US8291024B1 (en) Statistical spamming behavior analysis on mail clusters
NL1040630C2 (en) Method and system for email spam elimination and classification, using recipient defined codes and sender response.
Benson Edwin Raj et al. A novel approach for the early detection and identification of botnets

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090821

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MCAFEE, INC.

A4 Supplementary search report drawn up and despatched

Effective date: 20140129

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 17/30 20060101AFI20140123BHEP

Ipc: G06Q 10/10 20120101ALI20140123BHEP

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MCAFEE, INC.

17Q First examination report despatched

Effective date: 20170124

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MCAFEE, LLC

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20180802
