US20110225200A1 - Privacy-preserving method for skimming of data from a collaborative infrastructure - Google Patents

Privacy-preserving method for skimming of data from a collaborative infrastructure

Info

Publication number
US20110225200A1
Authority
US
United States
Prior art keywords
collaboration data
privacy policy
fields
user
collaboration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/723,193
Other versions
US8959097B2 (en
Inventor
Catalina M. Danis
Thomas D. Erickson
Mary E. Helander
Wendy A. Kellogg
Rhonda Rosenbaum
David S. Singer
Calvin B. Swart
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US12/723,193
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ROSENBAUM, RHONDA, DANIS, CATALINA M., ERICKSON, THOMAS D., HELANDER, MARY E., KELLOGG, WENDY A., SINGER, DAVID S., SWART, CALVIN B.
Publication of US20110225200A1
Application granted
Publication of US8959097B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24564: Applying rules; Deductive queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3347: Query execution using vector based model

Definitions

  • the present invention relates generally to the harvesting of collaboration data, and particularly to a method and system that harvests collaboration data while preserving the privacy of the senders and recipients of the collaboration data.
  • Computational systems that enable people to communicate with each other play an increasingly central role in the functioning of large organizations. These computational systems provide a collaborative infrastructure that facilitates communication.
  • the modern collaborative infrastructure can include file sharing, document libraries, chat rooms, application sharing, video conferencing and discussion forums to name only a few.
  • Communications may be categorized as linguistic, such as an email, and as non-linguistic, such as file or application sharing.
  • The communications, by their very nature, contain data that is of potential value to the organization. For example, an email not only contains information within the body of the email, but also associated metadata about who is communicating with whom, and when that communication occurs. The information contained within the metadata is just as valuable to the organization as the original message conveyed in the email.
  • An improved methodology and framework for harvesting and analyzing information from an organization's collaboration data is desirable. It is further desirable that the improved methodology and system preserve the privacy of the communicators.
  • a method and system for producing a set of collaboration data in accordance with a privacy policy comprises defining a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested; harvesting the collaboration data associated with the fields specified as allowable under the privacy policy; transforming the collaboration data associated with the fields specified as allowable if transformed in accordance with a set of rules defined in the privacy policy; and storing the harvested collaboration data in a database.
  • a system for harvesting collaboration data comprising a processor operable to define a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested, harvest the collaboration data associated with the fields specified as allowable under the privacy policy, transform the collaboration data associated with the fields specified as allowable if transformed in accordance with a set of rules defined in the privacy policy, and store the harvested collaboration data in a database.
  • a computer program product employing the above method is also provided.
  • FIG. 1 is a flow diagram illustrating a method of the present invention in one embodiment for harvesting collaboration data.
  • FIG. 2 is an example of how a user may edit a privacy policy.
  • FIG. 3 provides several “before” and “after” examples of the effects of the privacy policy on collaboration data.
  • FIG. 4 is an architectural diagram illustrating an infrastructure in which the invention is implemented according to one embodiment.
  • a method and system of the present invention allows collaboration data to be harvested in a manner that preserves the privacy of the communicators.
  • Collaboration data includes, but is not limited to, email messages, calendar entries in calendar programs and meeting and appointment schedules, and other information related to how groups of people work together in an organization.
  • an email may contain a message that is extremely personal and confidential between the sender and the recipient.
  • metadata associated with the email such as the time and date the email was sent, who sent the email, and who received the email, may all be of value to an organization. It is not necessary to know the content of the email for the organization to benefit from the non-identifying metadata associated with the email. Therefore, information beneficial to the organization may be harvested from the metadata by the method and system of the present invention. It should be understood, however, that the method and system of the present invention can also be applied to collaboration data generated by calendar programs and scheduling software etc., and is not limited to only email.
  • the method and system harvests collaboration data in accordance with a user defined privacy policy.
  • A series of user-defined rules determines what collaboration data is allowable under the privacy policy and what data is unallowable. Where possible, data that is unallowable under the privacy policy is transformed into data that is allowable under the policy. For example, if the policy provides for anonymity of the senders and receivers, then personally identifying information, such as a name, is replaced with a character string or text such as a pseudonym, allowing the anonymized data to be harvested.
  • the privacy policy may be set or adjusted by a user so that all of the user's collaboration data is harvestable, or so that only certain types of collaboration data are harvestable.
  • FIG. 3 depicts several “before and after” examples that may be generated for a user, showing how the collaboration data may be transformed prior to harvesting in accordance with the rules of the privacy policy.
  • FIG. 3 shows a calendar entry generated by the collaboration software.
  • names associated with the calendar entry (shown in section 302 ) are transformed in accordance with one embodiment of the invention. For example, “Bob Jones” is replaced by “John Doe” and “Mary Smith” is replaced by “Jane Doe”.
  • telephone numbers have their last 7 digits masked by x's so that “1-888-555-1212” is replaced by “1-888-xxx-xxxx” in accordance with a privacy policy.
  • Other information present in section 304 such as a host password for a teleconference is also masked by “xxxx”.
  • website addresses such as “http://www.ibm.com” may be replaced with the character string “URL” as shown in FIG. 3 .
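The masking and replacement rules described for FIG. 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `mask_digits` and `replace_urls`, the `keep_digits` parameter, and the URL regular expression are all hypothetical choices made for the example.

```python
import re


def mask_digits(text: str, keep_digits: int = 4) -> str:
    """Replace every digit after the first `keep_digits` digits with 'x'.

    Non-digit characters (hyphens, spaces) are left in place so the
    masked value keeps the shape of the original.
    """
    seen = 0
    out = []
    for ch in text:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep_digits else "x")
        else:
            out.append(ch)
    return "".join(out)


# Assumed pattern: an http or https URL running up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def replace_urls(text: str, placeholder: str = "URL") -> str:
    """Replace each URL in `text` with a fixed placeholder string."""
    return URL_PATTERN.sub(placeholder, text)


masked_phone = mask_digits("1-888-555-1212")           # keeps "1-888"
masked_password = mask_digits("1234", keep_digits=0)   # masks everything
cleaned = replace_urls("Agenda at http://www.ibm.com today")
```

Keeping the leading digits mirrors the example in which only the last seven digits of the telephone number are hidden; setting `keep_digits=0` corresponds to fully masking a value such as a host password.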
  • the information deemed sensitive or confidential by a policy rule may be replaced by hashes, such as an MD5 hash.
  • a hash obscures the value of the field while still allowing a user or a program to compare different field values to determine if the value stored in the field is the same as another harvested field value.
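The comparison property of hashed fields can be illustrated with Python's standard `hashlib` module. The helper name `hash_field` is an assumption made for this sketch; the text names MD5 only as one example of a suitable hash.

```python
import hashlib


def hash_field(value: str) -> str:
    """Return the MD5 hex digest of a field value (hypothetical helper)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


first = hash_field("Bob Jones")
second = hash_field("Bob Jones")
other = hash_field("Mary Smith")

# Equal inputs give equal digests, so harvested values can still be
# compared for equality without the original name being stored.
same_person = first == second
different_person = first != other
```

Because the digest is deterministic, a field with few possible values (for example, a short list of employee names) could be matched by hashing every candidate, so a deployment might prefer a keyed hash. That refinement is outside what the text describes.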
  • the “before and after examples” allow the user to view the effects of and/or adjust the privacy policy settings.
  • the user can generate ‘before and after’ examples of the privacy policy's effect on the collaboration data by selecting buttons 306 , 308 or 310 .
  • selecting the button ‘Generate example from calendar data’ 306 generates an anonymized calendar entry from collaboration data stored in a ‘mailfile’.
  • An example of an anonymous calendar entry is shown throughout FIG. 3 .
  • the user may also generate an example of an anonymized email (not shown) by selecting the button ‘Generate example from email data’ 308 and generate an example of an anonymized instant message (not shown) by selecting the button ‘Generate example from instant message logs’ 310 . Further details on how an anonymized email, calendar entry and instant message are generated are presented below.
  • a privacy policy for collaboration data is defined.
  • the rules of the privacy policy are set by the end user through a graphical user interface (GUI) presented to the user on a monitor or a display device by an application plug-in.
  • tabs 225 , 226 , 227 and 228 running across the top of the GUI allow the user to switch between different screens related to the privacy policy.
  • the user may view the current privacy policy and the rules associated with the current privacy policy by selecting tab ‘Current Policy’ 225 .
  • the user may edit the current privacy policy by selecting tab ‘Edit This Policy’ 226 .
  • An example screenshot of the GUI in ‘Edit This Policy’ mode is shown throughout FIG. 2.
  • the user may also view the effects of the privacy policy on collaboration data by selecting tab ‘Generate examples from applying this policy’ 227 .
  • An example of the effects of the privacy policy on collaboration data is further shown in FIG. 3.
  • the user may also view a change log related to the privacy policy by selecting tab ‘Policy amendment and use history’ 228 .
  • One section of the GUI associated with tab ‘Edit This Policy’ 226, section 202, allows the user to select a “default privacy policy” 203, to “opt out” 204 from sharing or providing access to any collaboration data to third parties, or to “select a shared privacy policy from a library of privacy policies” 205.
  • the user marks or selects among these options 203 , 204 and 205 by clicking on an appropriate radio button.
  • the end user also has the ability to edit the current privacy policy by setting privileges for each category of collaboration data in sections 212 , 206 and 209 .
  • the privacy policy defines: 1) specific types or categories of collaboration data that can be captured ( 212 ); 2) specific types or categories of collaboration data that cannot be captured ( 209 ); and 3) specific types or categories of collaboration data that can be captured if transformed prior to harvesting in certain ways ( 206 ).
  • Categories of collaboration data may include the following: ‘names’ 215 , ‘phone numbers’ 216 , ‘URLs’ 217 , ‘photos’ 218 , ‘attachments’ (i.e., file attachments to emails) 219 , and ‘private calendar entries’ 220 .
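One way to picture the three-part policy is as a mapping from each category to a privilege. The sketch below is an assumption-laden illustration: the names `Privilege` and `PrivacyPolicy`, the string values, and the choice to deny categories that are not listed are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum


class Privilege(Enum):
    ALLOW = "allow"                      # can be captured (section 212)
    DENY = "deny"                        # cannot be captured (section 209)
    ALLOW_IF_TRANSFORMED = "transform"   # captured only after transformation (section 206)


@dataclass
class PrivacyPolicy:
    # Maps a category name to the privilege the user has set for it.
    rules: dict = field(default_factory=dict)
    # Assumption for this sketch: unlisted categories are not harvested.
    default: Privilege = Privilege.DENY

    def privilege_for(self, category: str) -> Privilege:
        return self.rules.get(category, self.default)


policy = PrivacyPolicy(rules={
    "names": Privilege.ALLOW_IF_TRANSFORMED,
    "phone numbers": Privilege.ALLOW_IF_TRANSFORMED,
    "URLs": Privilege.ALLOW,
    "private calendar entries": Privilege.DENY,
})
```

Calling `policy.privilege_for("photos")` falls back to the default because "photos" has not been added to any section in this example.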
  • Other categories of collaboration data rely upon a user set rule.
  • email filtering rules can sort email based upon information in the ‘to’ ‘from’ and ‘subject’ lines of the email.
  • the category ‘email with addresses not @ global_inc’ 221 identifies collaboration data that does not originate from the domain ‘global_inc’.
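A user-set rule of this kind might reduce to a simple domain test on an address. The function name `is_external_address` and its handling of malformed addresses are assumptions for illustration only; the domain ‘global_inc’ is the one used in the example category.

```python
def is_external_address(address: str, internal_domain: str = "global_inc") -> bool:
    """Return True when the address does not belong to the internal domain.

    Assumption: an address with no '@' is treated as external, because
    the whole string is then compared against the internal domain.
    """
    _, _, domain = address.rpartition("@")
    return domain.lower() != internal_domain.lower()


internal = is_external_address("bob.jones@global_inc")
external = is_external_address("mary.smith@example.com")
```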
  • Categories of collaboration data can be added to a section 212 , 206 or 209 by the user clicking on the ‘Add amendment’ button 223 within the appropriate section. For example, a user may want to allow ‘telephone numbers’ 216 to be harvested as long as the telephone number is transformed. The user can add ‘telephone numbers’ 216 to section 206 by clicking on the ‘Add amendment’ button 223 and selecting ‘telephone numbers’. Other categories, as defined by the collaboration software, may also be selected and added to section 206 . Once a category is added to a section the privileges associated with that category can be set or edited by the user.
  • the ‘Show base policy template’ button 224 shows which categories of data are present in a section 212 , 206 or 209 as defined by a default privacy policy. This allows the user to compare the current privacy policy to the default privacy policy and determine if the user privacy policy is more or less restrictive than the default privacy policy.
  • Privileges for each category may be set by the user clicking on the “edit” button 208 associated with that particular category, while information about the current privileges and rules associated with that particular category may be shown by clicking on the “info” button 207 associated with that particular category.
  • the current policy may be saved by selecting the “save policy” 213 option, or any changes made to the current policy can be undone by selecting the “revert policy” 214 option.
  • any changes to the current privacy policy are recorded to a change log, which allows the user to revert back to a prior privacy policy.
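The save and revert behaviour backed by a change log could be modelled as a list of policy snapshots. `PolicyHistory` and its methods are hypothetical names for this sketch, and representing a policy as a plain dictionary is an assumption.

```python
import copy


class PolicyHistory:
    """Keeps every saved policy version so a prior one can be restored."""

    def __init__(self, initial: dict):
        self._versions = [copy.deepcopy(initial)]

    @property
    def current(self) -> dict:
        # Return a copy so callers cannot mutate the logged version.
        return copy.deepcopy(self._versions[-1])

    def save(self, policy: dict) -> None:
        """Record a new version of the policy in the change log."""
        self._versions.append(copy.deepcopy(policy))

    def revert(self) -> dict:
        """Drop the latest version, unless only the initial one remains."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = PolicyHistory({"names": "transform"})
history.save({"names": "allow"})
after_save = history.current
after_revert = history.revert()
```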
  • these categories correspond to defined fields within collaboration software such as IBM® LOTUS® NOTES® available from International Business Machines Corp. of Armonk, N.Y. These fields are well documented metadata fields such as ‘$PublicAccess’ which stores a value that controls whether a calendar and scheduling entry is publicly viewable and ‘BlindCopyTo’ which stores the names of any ‘BCC’ recipients. A complete description of these fields and their associated field values are publicly available from IBM® LOTUS® NOTES® Calendaring & Scheduling Schema July 2007 from the website http://www.ibm.com/developerworks/lotus/documentation/dw-I-calendarschema.html which is incorporated by reference in its entirety. Other examples of fields include telephone numbers, telephone number area codes and exchanges, business department identifiers, meeting times, and acceptance or rejection of a meeting time and any subsequent rescheduling of a rejected meeting schedule.
  • Each of these fields is associated with one or more user defined privileges.
  • The privileges, or privacy policy settings, indicate whether collaboration data can be harvested from the field.
  • One or more rules, such as whether or not the collaboration data stored within the field should be made anonymous and how to anonymize the collaboration data, are also associated with each field.
  • The privacy policy may grant the privilege of harvesting information about the sender and the recipient of an email.
  • the privacy policy may also require that the names of the sender and the recipient remain anonymous.
  • the collaboration data is harvested from the specified metadata fields of the email, but the names of the sender and recipient of the email are replaced with their job titles, e.g., “manager”, “assistant”, “associate”, etc. thus preserving anonymity.
  • the names of the sender and recipient may also be replaced with a hash value to provide greater anonymity.
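Replacing names with job titles can be sketched as a lookup applied to selected fields. The directory `JOB_TITLES`, the field keys `sender` and `recipient`, the fallback value `"unknown"`, and the function name are all assumptions made for this example; the patent does not specify where job titles come from.

```python
# Hypothetical directory mapping employee names to job titles.
JOB_TITLES = {"Bob Jones": "manager", "Mary Smith": "assistant"}


def anonymize_participants(message: dict, titles: dict,
                           fields=("sender", "recipient")) -> dict:
    """Return a copy of `message` with names in `fields` replaced by job titles."""
    result = dict(message)
    for name in fields:
        if name in result:
            # Names missing from the directory get a generic label
            # rather than leaking through unchanged.
            result[name] = titles.get(result[name], "unknown")
    return result


email_metadata = {
    "sender": "Bob Jones",
    "recipient": "Mary Smith",
    "sent": "2010-03-12T09:30",
}
anonymized = anonymize_participants(email_metadata, JOB_TITLES)
```

The original dictionary is left untouched, so the unmodified message stays available to the collaboration software while only the anonymized copy is harvested.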
  • user defined privacy policies can be stored in a library of privacy policies and shared with other users by selecting the “share policy” 211 option. This allows a department manager to create a privacy policy and a set of rules and privileges for each field and share the privacy policy with an entire group of co-workers. A person would be able to select the predefined privacy policy from the library without individually setting the rules and privileges for each category of collaboration data.
  • a written description of how the privacy policy affects collaboration data is provided within dialog box 210 . Dialog box 210 may also be used to provide any legal disclaimers or other information about the use of the privacy policy.
  • the user is made aware of the privacy policy.
  • the user of the collaboration software is made aware of the privacy policy by a message displayed on a display screen of the user device after the user connects to a communications network.
  • the privacy policy is displayed to the user as a splash screen, and the user must acknowledge the privacy policy and its terms before being allowed to send any collaboration data, e.g., email, across the communications network.
  • the user is given the choice to “opt out”, i.e., not allow the harvesting of any collaboration data. If the user opts out the method immediately ends.
  • At step 106, filtering software on the client computer retrieves the collaboration data processed by the collaboration software.
  • In collaboration software such as LOTUS NOTES®, collaboration data pertaining to email messages, calendaring and schedules is stored remotely on a server in a ‘mailfile’ and locally on the client computer in a ‘replica mailfile’, which is a duplicate copy of the remotely stored ‘mailfile’.
  • the collaboration software may also have an instant messaging feature that generates collaboration data.
  • LOTUS NOTES® has a built-in instant messaging feature that allows users to communicate with each other and in groups of users.
  • collaboration data related to instant messaging is stored in a file in the form of a ‘chat log’. These chat logs may be stored locally on the user's computer or remotely on a server and include at least the names or ‘user ids’ of the communicators in addition to other collaboration data.
  • the filtering software is a plug-in that interfaces with the collaboration software and is coded in an object oriented programming language such as JAVA®, PYTHON® or RUBY®.
  • the filtering software is a stand-alone external program that is separately operated on the client computer aside from the collaboration software and acts independently on the collaboration data. Both implementations of the filtering software access the collaboration data through either an actual file path or a relative file path to the ‘mailfile’.
  • the plug-in applies the privacy policy and the rules that were set by the user at step 102 to the collaboration data.
  • the collaboration data is scanned by the plug-in to determine if any fields present match the fields set in the privacy policy. If the fields match, the privileges associated with those fields are checked to determine if the collaboration data stored in those fields can be harvested.
  • the plug-in separates the collaboration data into one of three categories: 1) collaboration data that is allowable to be captured; 2) collaboration data that is not allowable to be captured; and 3) collaboration data that is allowable to be captured if transformed.
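The three-way separation performed by the plug-in can be sketched as a partition over the fields of one record. The function name `partition_fields`, the string privilege values, and the rule that fields absent from the policy count as not allowable are assumptions for illustration.

```python
def partition_fields(record: dict, policy: dict) -> tuple:
    """Split a record's fields into (allowed, denied, transformable).

    `policy` maps a field name to "allow", "deny", or "transform".
    Assumption: fields that the policy does not mention are denied.
    """
    allowed, denied, transformable = {}, {}, {}
    for name, value in record.items():
        privilege = policy.get(name, "deny")
        if privilege == "allow":
            allowed[name] = value
        elif privilege == "transform":
            transformable[name] = value
        else:
            denied[name] = value
    return allowed, denied, transformable


example_policy = {"sent": "allow", "sender": "transform", "body": "deny"}
example_record = {
    "sent": "2010-03-12T09:30",
    "sender": "Bob Jones",
    "body": "confidential text",
    "subject": "Quarterly review",
}
allowed, denied, transformable = partition_fields(example_record, example_policy)
```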
  • the collaboration data is harvested by extracting the collaboration data from the metadata field in accordance with the user defined rules and privileges.
  • the collaboration data may include the identity of the communicators, the time of communication, whether the communication was a reply to a prior communication, or a forward of a prior communication, routing information related to the communication, telephone numbers, etc.
  • the harvested collaboration data is stored in another file separate from the ‘mailfile’ or stored in a memory storage device, e.g., searchable database.
  • the method determines if the collaboration data can be transformed in accordance with one or more user defined rules that would make it allowable under the privacy policy.
  • Certain collaboration data may be “MARKED PRIVATE” by the end user and thus never harvestable or transformable.
  • the “MARKED PRIVATE” function is a well documented feature of collaboration software such as IBM® LOTUS® NOTES®. If the decision is no, i.e., the collaboration data cannot be transformed, then the method ends. If the decision is yes, then the method proceeds to step 113 . At step 113 the collaboration data is transformed in accordance with user defined rules.
  • emails from an attorney may be identified by the plug-in and removed from the collaboration data to preserve attorney-client privilege.
  • the transformation process may entail degrading the quality of the image so the people or objects in the image cannot be readily identified.
  • Image quality may be degraded by applying a filter, such as a blur filter to the image, and only allowing the filtered image to be harvested.
  • an attached file may be identified by a hash, such as an MD5 hash, which identifies the file without revealing the contents of the file.
  • The collaboration data may include information gathered from calendars and schedules. Calendars and schedules often contain information about meetings, times of meetings, meeting participants, telephone numbers associated with the meetings, and meeting locations. Often, a telephone number may be enough to reveal the identity of the meeting participants. For example, a telephone number beginning with an (888) area code may reveal that a teleconference call took place at a certain date and time. A pass code associated with the telephone number may reveal the identity of the participants who dialed into the teleconference call. The collaboration data may be transformed by masking all or part of the telephone number or masking the pass code with a character such as an ‘x’ to protect the identity of the participants. In another embodiment, the telephone number or a portion of the telephone number may be replaced by a hash. After the data is transformed, the method then proceeds to step 114, and as discussed above the collaboration data is harvested.
  • Collaboration data that is not allowed to be harvested and that cannot be transformed is allowed to pass through the plug-in unaltered and is not stored.
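The decision flow described above (harvest if allowable, otherwise transform if a rule exists, otherwise skip) can be summarized in a short sketch. All names here (`harvest_field`, `harvest_record`, the `transformers` mapping, the string privilege values) are hypothetical, and treating the "MARKED PRIVATE" state as a per-record flag is an assumption.

```python
from typing import Callable, Dict, Optional


def harvest_field(name: str, value: str, policy: Dict[str, str],
                  transformers: Dict[str, Callable[[str], str]],
                  marked_private: bool = False) -> Optional[str]:
    """Return the value to store for one field, or None if nothing is stored."""
    if marked_private:
        return None                      # never harvestable or transformable
    privilege = policy.get(name, "deny")
    if privilege == "allow":
        return value                     # harvested unchanged
    if privilege == "transform":
        transform = transformers.get(name)
        if transform is None:
            return None                  # no rule available, so not harvested
        return transform(value)          # transformed, then harvested
    return None                          # not allowable under the policy


def harvest_record(record: Dict[str, str], policy: Dict[str, str],
                   transformers: Dict[str, Callable[[str], str]],
                   marked_private: bool = False) -> Dict[str, str]:
    """Apply harvest_field to every field and keep only the stored values."""
    harvested = {}
    for name, value in record.items():
        result = harvest_field(name, value, policy, transformers, marked_private)
        if result is not None:
            harvested[name] = result
    return harvested


flow_policy = {"sent": "allow", "phone": "transform", "body": "deny"}
flow_transformers = {
    "phone": lambda v: "".join("x" if c.isdigit() else c for c in v),
}
flow_record = {"sent": "2010-03-12", "phone": "555-1212", "body": "secret"}
harvested = harvest_record(flow_record, flow_policy, flow_transformers)
```

The input record is never modified, which matches the description of non-harvestable data passing through unaltered.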
  • the collaboration data that is harvested is stored in a searchable database for later analysis.
  • the stored information may reveal communication patterns, connections between coworkers, connections between employees and the outside world, and how decisions are made within the organization.
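As a hedged illustration of the kind of analysis the stored data might support, the sketch below counts how often each anonymized sender and recipient pair communicates. The record layout and the function name `communication_counts` are assumptions; the patent does not prescribe a particular analysis.

```python
from collections import Counter


def communication_counts(records):
    """Count messages per (sender, recipient) pair in harvested records."""
    return Counter((r["sender"], r["recipient"]) for r in records)


# Hypothetical harvested records in which names were replaced by job titles.
harvested_records = [
    {"sender": "manager", "recipient": "assistant"},
    {"sender": "manager", "recipient": "assistant"},
    {"sender": "assistant", "recipient": "associate"},
]
pair_counts = communication_counts(harvested_records)
busiest_pair = pair_counts.most_common(1)
```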
  • FIG. 4 is an architectural overview of a computing environment 400 in which the invention may be implemented.
  • a server 401 is situated between two client computers 416 1 and 416 2 .
  • the client computers 416 may be desktop computers, laptop computers, or any other device that may benefit from connection to a computer network.
  • the server 401 may be coupled directly to the client computers 416 as shown, or coupled indirectly via a network such as the Internet, Ethernet, private local area network (LAN) and the like.
  • the server 401 comprises a central processing unit (CPU) 402 , a memory 404 , mass storage 412 , and support circuitry 403 .
  • the CPU 402 is interconnected to the memory 404 and the support circuitry 403 .
  • the support circuitry includes cache, power supplies, clocks, input/output interface circuitry, a network interface and the like.
  • the mass storage 412 may be physically present within the server or operably coupled to the server 401 as part of a common mass storage system that is shared by a plurality of servers.
  • the mass storage comprises a searchable database 418 .
  • the database 418 stores the information harvested from the collaboration data.
  • the memory 404 may include random access memory, read only memory, removable disk memory, flash memory, and various combinations of these types of memory.
  • the memory 404 is sometimes referred to as a main memory and may in part be used as cache memory.
  • the memory 404 stores an operating system (OS) 406 and individual ‘mailfiles’ for each user of the collaboration software.
  • ‘mailfile’ 405 1 corresponds to the user of client computer 416 2
  • ‘mailfile’ 428 1 corresponds to the user of client computer 416 1 .
  • the mailfiles 405 1 and 428 1 store the collaboration data generated by their respective users.
  • the client computers 416 comprise a central processing unit (CPU) 420 , a memory 424 , and support circuitry 422 .
  • the CPU 420 is interconnected to the memory 424 and the support circuitry 422 .
  • the support circuitry includes cache, power supplies, clocks, input/output interface circuitry, a network interface and the like.
  • the memory 424 may include random access memory, read only memory, removable disk memory, flash memory, and various combinations of these types of memory.
  • the memory 424 is sometimes referred to as a main memory and may in part be used as cache memory.
  • the memory 424 stores an operating system (OS) 425 , collaboration software 426 such as IBM® LOTUS® NOTES®, and filtering software 427 .
  • Each of the client computers 416 also stores a local copy or a ‘replica mailfile’ of the user's ‘mailfile’ that is stored on the server 401.
  • client computer 416 2 stores a ‘replica mailfile’ 405 2 of remotely stored ‘mailfile’ 405 1 and client computer 416 1 stores a ‘replica mailfile’ 428 2 of remotely stored ‘mailfile’ 428 1 .
  • the ‘mailfiles’ store email data (including email metadata) utilized by the collaboration software and the filtering software 427 implements the privacy policy and the rules described above.
  • the filtering software 427 is a plug-in which interacts with the collaboration software 426 via an API.
  • the filtering software 427 is an external process initiated by the user separately from the running of the collaboration software 426 .
  • the filtering software 427 operates on the collaboration data stored in the user's ‘replica mailfile’ 405 2 associated with the collaboration software 426 to apply the rules defined in the privacy policy.
  • the filtering software 427 could operate on the collaboration data remotely stored on the server 401 in the user's ‘mailfile’ 405 1 .
  • the filtering software 427 retrieves the collaboration data from the ‘replica mailfile’ 405 2 and filters the collaboration data in accordance with the privacy policy and the user defined rules as discussed above.
  • the filtering software then harvests the filtered data and stores the harvested data, i.e., the fields and the content of the fields, in a searchable database, e.g., database 418 or another file, e.g., ‘harvested data’ 429 .
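Storing harvested fields and their content in a searchable database can be sketched with Python's built-in `sqlite3` module. The table name `harvested`, its three columns, the message identifier `msg-1`, and the function name `store_harvested` are assumptions for this example; the patent does not specify a schema or a database product.

```python
import sqlite3


def store_harvested(conn: sqlite3.Connection, message_id: str, fields: dict) -> None:
    """Store each harvested field name and value as one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvested "
        "(message_id TEXT, field TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO harvested VALUES (?, ?, ?)",
        [(message_id, name, value) for name, value in fields.items()],
    )
    conn.commit()


# An in-memory database stands in for database 418 in this sketch.
conn = sqlite3.connect(":memory:")
store_harvested(conn, "msg-1", {"sender": "John Doe", "phone": "1-888-xxx-xxxx"})
rows = conn.execute(
    "SELECT value FROM harvested WHERE field = ?", ("phone",)
).fetchall()
```

Only already-filtered values are passed to `store_harvested`, so the database never holds the untransformed content.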
  • the filtering software also enforces the privacy policy when another user or client computer requests information. For example, if client computer 416 2 requests information from client computer 416 1 , the filtering software 427 may intercept the request and provide client computer 416 2 with access to only the anonymized data stored in file 429 .
  • aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider such as AT&T, MCI, Sprint, EarthLink, MSN, or GTE).

Abstract

A method and system for harvesting collaboration data in accordance with a privacy policy is provided. In one embodiment, the method comprises defining a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested; harvesting the collaboration data associated with the fields specified as allowable under the privacy policy; transforming the collaboration data associated with the fields specified as allowable if said collaboration data can be transformed in accordance with a set of rules defined in the privacy policy; and storing the harvested collaboration data in a database.

Description

  • BACKGROUND
  • The present invention relates generally to the harvesting of collaboration data, and particularly to a method and system that harvests collaboration data while preserving the privacy of the senders and recipients of the collaboration data.
  • Computational systems that enable people to communicate with each other play an increasingly central role in the functioning of large organizations. These computational systems provide a collaborative infrastructure that facilitates communication. The modern collaborative infrastructure can include file sharing, document libraries, chat rooms, application sharing, video conferencing and discussion forums to name only a few. Communications may be categorized as linguistic, such as an email, and as non-linguistic, such as file or application sharing. The communications, by their very nature, contain data that is of potential value to the organization. For example, an email not only contains information within the body of the email, but also associated metadata about who is communicating with whom, and when that communication occurs. The information contained within the metadata is just as valuable to the organization as the original message conveyed in the email.
  • Certain aspects of communication between individuals are often regarded as confidential and private. This is true regardless of whether the organization has a policy explicitly stating that all communications that occur over its systems are property of the organization. An expectation of privacy facilitates communication about a wide range of issues, some of which may be unpopular, tentative, or informal. Free and unimpeded communication between parties improves the quality of the decision making process of an organization and enables the organization to reach better decisions.
  • Existing solutions to ensure privacy include such methods as user authentication to the computational systems, which prevents unauthorized access to the collaboration data. P3P, also known as the Platform for Privacy Preferences, enables a website to express its privacy practices in a standard format that can be retrieved automatically and interpreted easily by users. However, neither of these solutions allows the collection or analysis of collaboration data in an adjustable manner while also preserving the privacy of the communicators.
  • Therefore, an improved methodology and framework for harvesting and analyzing information from an organization's collaboration data is desirable. It is further desirable that the improved methodology and system preserves the privacy of the communicators.
  • SUMMARY
  • A method and system for producing a set of collaboration data in accordance with a privacy policy is provided. In one embodiment, the method comprises defining a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested; harvesting the collaboration data associated with the fields specified as allowable under the privacy policy; transforming the collaboration data associated with the fields specified as allowable if transformed in accordance with a set of rules defined in the privacy policy; and storing the harvested collaboration data in a database.
  • In another embodiment, a system for harvesting collaboration data, comprising a processor operable to define a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested, harvest the collaboration data associated with the fields specified as allowable under the privacy policy, transform the collaboration data associated with the fields specified as allowable if transformed in accordance with a set of rules defined in the privacy policy, and store the harvested collaboration data in a database.
  • A computer program product employing the above method is also provided.
  • Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow diagram illustrating a method of the present invention in one embodiment for harvesting collaboration data;
  • FIG. 2 is an example of how a user may edit a privacy policy;
  • FIG. 3 provides several “before” and “after” examples of the effects of the privacy policy on collaboration data; and
  • FIG. 4 is an architectural diagram illustrating an infrastructure in which the invention is implemented according to one embodiment.
  • DETAILED DESCRIPTION
  • A method and system of the present invention allows collaboration data to be harvested in a manner that preserves the privacy of the communicators. Collaboration data includes, but is not limited to, email messages, calendar entries in calendar programs and meeting and appointment schedules, and other information related to how groups of people work together in an organization. For example, an email may contain a message that is extremely personal and confidential between the sender and the recipient. However, metadata associated with the email, such as the time and date the email was sent, who sent the email, and who received the email, may all be of value to an organization. It is not necessary to know the content of the email for the organization to benefit from the non-identifying metadata associated with the email. Therefore, information beneficial to the organization may be harvested from the metadata by the method and system of the present invention. It should be understood, however, that the method and system of the present invention can also be applied to collaboration data generated by calendar programs and scheduling software etc., and is not limited to only email.
  • In one embodiment, the method and system harvests collaboration data in accordance with a user defined privacy policy. A series of user defined rules determine what collaboration data is allowable under the privacy policy, and what data is unallowable under the privacy policy. If possible, unallowable data under the privacy policy is transformed into data that is allowable under the policy. For example, if the policy provides for anonymity of the senders and receivers, then personally identifying information, such as a name, is replaced with a character string or text such as a pseudonym, allowing the anonymized data to be harvested. The privacy policy may be set or adjusted by a user so that all of the user's collaboration data is harvestable, or so that only certain types of collaboration data are harvestable.
  • Referring now to FIG. 3, there are depicted several “before and after” examples that may be generated for a user showing how the collaboration data may be transformed prior to harvesting in accordance with the rules of the privacy policy. FIG. 3, as an example, shows a calendar entry generated by the collaboration software. In one embodiment, names associated with the calendar entry (shown in section 302) are transformed in accordance with one embodiment of the invention. For example, “Bob Jones” is replaced by “John Doe” and “Mary Smith” is replaced by “Jane Doe”. As shown in section 304 of the calendar entry, telephone numbers have their last 7 digits masked by x's so that “1-888-555-1212” is replaced by “1-888-xxx-xxxx” in accordance with a privacy policy. Other information present in section 304, such as a host password for a teleconference, is also masked by “xxxx”. In other embodiments, website addresses such as “http://www.ibm.com” may be replaced with the character string “URL” as shown in FIG. 3. In another embodiment, the information deemed sensitive or confidential by a policy rule may be replaced by hashes, such as an MD5 hash. A hash obscures the value of the field while still allowing a user or a program to compare different field values to determine if the value stored in the field is the same as another harvested field value. The “before and after” examples allow the user to view the effects of and/or adjust the privacy policy settings.
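  • The transformations described for FIG. 3 can be illustrated with a minimal Python sketch. The function names, the regular expressions, and the assumption that telephone numbers are written in the form “1-888-555-1212” are illustrative only and do not come from the specification; MD5 is used solely because the text names it as an example.

```python
import hashlib
import re

# Hypothetical helpers: the names and patterns below are assumptions
# made for illustration, not part of the described system.
PHONE_RE = re.compile(r"\b(\d-\d{3})-\d{3}-\d{4}\b")
URL_RE = re.compile(r"https?://\S+")


def replace_names(text: str, pseudonyms: dict) -> str:
    """Replace each real name with its configured pseudonym."""
    for real_name, alias in pseudonyms.items():
        text = text.replace(real_name, alias)
    return text


def mask_phone(text: str) -> str:
    """Mask the last 7 digits of numbers shaped like 1-888-555-1212."""
    return PHONE_RE.sub(r"\1-xxx-xxxx", text)


def replace_urls(text: str) -> str:
    """Replace every http(s) address with the literal string 'URL'."""
    return URL_RE.sub("URL", text)


def md5_token(value: str) -> str:
    """Return a hex MD5 digest; equal inputs yield equal tokens."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    entry = "Bob Jones invites Mary Smith: dial 1-888-555-1212"
    entry = replace_names(entry, {"Bob Jones": "John Doe",
                                  "Mary Smith": "Jane Doe"})
    print(mask_phone(entry))
    print(replace_urls("Agenda at http://www.ibm.com today"))
    # Two harvested values can be compared without revealing either one.
    print(md5_token("Bob Jones") == md5_token("Bob Jones"))
```

  • Because the hash function is deterministic, two harvested fields that held the same original value produce the same token, which is the comparison property the paragraph above relies on.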
  • In one embodiment, the user can generate ‘before and after’ examples of the privacy policy's effect on the collaboration data by selecting buttons 306, 308 or 310. In one embodiment, selecting the button ‘Generate example from calendar data’ 306 generates an anonymized calendar entry from collaboration data stored in a ‘mailfile’. An example of an anonymous calendar entry is shown throughout FIG. 3. The user may also generate an example of an anonymized email (not shown) by selecting the button ‘Generate example from email data’ 308 and generate an example of an anonymized instant message (not shown) by selecting the button ‘Generate example from instant message logs’ 310. Further details on how an anonymized email, calendar entry and instant message are generated are presented below.
  • Referring back to FIG. 1, a method in accordance with one embodiment of the present invention for harvesting collaboration data is provided. At step 102, a privacy policy for collaboration data is defined. Referring now to FIG. 2, in one embodiment, the rules of the privacy policy are set by the end user through a graphical user interface (GUI) presented to the user on a monitor or a display device by an application plug-in. In one embodiment, tabs 225, 226, 227 and 228 running across the top of the GUI allow the user to switch between different screens related to the privacy policy. The user may view the current privacy policy and the rules associated with the current privacy policy by selecting tab ‘Current Policy’ 225. The user may edit the current privacy policy by selecting tab ‘Edit This Policy’ 226. An example screenshot of the GUI in ‘Edit This Policy’ mode is shown throughout FIG. 2. The user may also view the effects of the privacy policy on collaboration data by selecting tab ‘Generate examples from applying this policy’ 227. An example of the effects of the privacy policy on collaboration data is further shown in FIG. 3. The user may also view a change log related to the privacy policy by selecting tab ‘Policy amendment and use history’ 228.
  • One section of the GUI associated with tab ‘Edit This Policy’ 226, section 202, allows the user to select a “default privacy policy” 203, “opt out” 204 from sharing or providing access to any collaboration data to third parties or “select a shared privacy policy from a library of privacy policies” 205. In one embodiment, the user marks or selects among these options 203, 204 and 205 by clicking on an appropriate radio button. The end user also has the ability to edit the current privacy policy by setting privileges for each category of collaboration data in sections 212, 206 and 209. The privacy policy defines: 1) specific types or categories of collaboration data that can be captured (212); 2) specific types or categories of collaboration data that cannot be captured (209); and 3) specific types or categories of collaboration data that can be captured if transformed prior to harvesting in certain ways (206). Categories of collaboration data may include the following: ‘names’ 215, ‘phone numbers’ 216, ‘URLs’ 217, ‘photos’ 218, ‘attachments’ (i.e., file attachments to emails) 219, and ‘private calendar entries’ 220. Other categories of collaboration data rely upon a user set rule. For example, email filtering rules can sort email based upon information in the ‘to’ ‘from’ and ‘subject’ lines of the email. As an example, the category ‘email with addresses not @ global_inc’ 221 identifies collaboration data that does not originate from the domain ‘global_inc’.
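  • One way to picture the three sections of the policy (212, 209 and 206) is as a mapping from category to privilege. The sketch below is an assumption about how such a policy might be represented in Python; the class names, the `opted_out` flag, and the choice to treat unlisted categories as not capturable are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Privilege(Enum):
    ALLOW = "can be captured"                   # section 212
    DENY = "cannot be captured"                 # section 209
    TRANSFORM = "can be captured if transformed"  # section 206


@dataclass
class PrivacyPolicy:
    """Hypothetical in-memory form of a user's privacy policy."""
    opted_out: bool = False
    privileges: dict = field(default_factory=dict)

    def privilege_for(self, category: str) -> Privilege:
        # Assumption: a category the policy does not list is not captured.
        return self.privileges.get(category, Privilege.DENY)


policy = PrivacyPolicy(privileges={
    "meeting times": Privilege.ALLOW,
    "names": Privilege.TRANSFORM,
    "phone numbers": Privilege.TRANSFORM,
    "attachments": Privilege.DENY,
})

if __name__ == "__main__":
    for category in ("names", "attachments", "photos"):
        print(category, "->", policy.privilege_for(category).value)
```

  • Defaulting unlisted categories to `DENY` is a conservative design choice made for this sketch; the specification itself does not state how unlisted categories are handled.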
  • Categories of collaboration data can be added to a section 212, 206 or 209 by the user clicking on the ‘Add amendment’ button 223 within the appropriate section. For example, a user may want to allow ‘telephone numbers’ 216 to be harvested as long as the telephone number is transformed. The user can add ‘telephone numbers’ 216 to section 206 by clicking on the ‘Add amendment’ button 223 and selecting ‘telephone numbers’. Other categories, as defined by the collaboration software, may also be selected and added to section 206. Once a category is added to a section the privileges associated with that category can be set or edited by the user. The ‘Show base policy template’ button 224 shows which categories of data are present in a section 212, 206 or 209 as defined by a default privacy policy. This allows the user to compare the current privacy policy to the default privacy policy and determine if the user privacy policy is more or less restrictive than the default privacy policy.
  • Privileges for each category may be set by the user clicking on the “edit” button 208 associated with that particular category, while information about the current privileges and rules associated with that particular category may be shown by clicking on the “info” button 207 associated with that particular category. After a user sets the privacy policy, the current policy may be saved by selecting the “save policy” 213 option, or any changes made to the current policy can be undone by selecting the “revert policy” 214 option. In one embodiment, any changes to the current privacy policy are recorded to a change log, which allows the user to revert back to a prior privacy policy.
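  • The change log behavior can be sketched as a simple version history. This is only an illustration under the assumption that each saved policy is kept as a full snapshot; the class `PolicyHistory` and its methods are hypothetical names, and the specification does not describe how the log is stored.

```python
import copy


class PolicyHistory:
    """Hypothetical change log that keeps every saved policy version."""

    def __init__(self, initial: dict):
        self._versions = [copy.deepcopy(initial)]

    @property
    def current(self) -> dict:
        # Return a copy so callers cannot alter the logged version.
        return copy.deepcopy(self._versions[-1])

    def save(self, policy: dict) -> None:
        """Record a new version of the policy in the log."""
        self._versions.append(copy.deepcopy(policy))

    def revert(self) -> dict:
        """Drop the latest version, unless it is the only one left."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


if __name__ == "__main__":
    history = PolicyHistory({"names": "transform"})
    history.save({"names": "allow"})
    print(history.current)
    print(history.revert())
```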
  • In one embodiment, these categories correspond to defined fields within collaboration software such as IBM® LOTUS® NOTES® available from International Business Machines Corp. of Armonk, N.Y. These fields are well documented metadata fields such as ‘$PublicAccess’ which stores a value that controls whether a calendar and scheduling entry is publicly viewable and ‘BlindCopyTo’ which stores the names of any ‘BCC’ recipients. A complete description of these fields and their associated field values are publicly available from IBM® LOTUS® NOTES® Calendaring & Scheduling Schema July 2007 from the website http://www.ibm.com/developerworks/lotus/documentation/dw-I-calendarschema.html which is incorporated by reference in its entirety. Other examples of fields include telephone numbers, telephone number area codes and exchanges, business department identifiers, meeting times, and acceptance or rejection of a meeting time and any subsequent rescheduling of a rejected meeting schedule.
  • Each of these fields is associated with one or more user defined privileges. The privileges, or privacy policy settings, indicate whether collaboration data can be harvested from the field. One or more rules, such as whether or not the collaboration data stored within the field should be made anonymous and how to anonymize the collaboration data, are also associated with each field. For example, the privacy policy may allow the privilege of information about the sender and the recipient of an email to be harvested. However, the privacy policy may also require that the names of the sender and the recipient remain anonymous. As one example of how the invention functions, the collaboration data is harvested from the specified metadata fields of the email, but the names of the sender and recipient of the email are replaced with their job titles, e.g., “manager”, “assistant”, “associate”, etc. thus preserving anonymity. In another embodiment, the names of the sender and recipient may also be replaced with a hash value to provide greater anonymity.
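  • The job title example can be sketched as follows. The metadata keys (`from`, `to`, `sent`), the directory that maps names to job titles, and the fallback label for people missing from that directory are all assumptions made for illustration.

```python
def anonymize_email_metadata(meta: dict, job_titles: dict) -> dict:
    """Return email metadata with names replaced by job titles.

    `meta` is assumed to hold 'from', 'to' (a list) and 'sent' keys;
    `job_titles` is a hypothetical directory mapping names to titles.
    """
    def role(name: str) -> str:
        # Assumption: people missing from the directory get a generic label.
        return job_titles.get(name, "unknown role")

    return {
        "from": role(meta["from"]),
        "to": [role(name) for name in meta["to"]],
        "sent": meta["sent"],  # harvested unchanged
    }


if __name__ == "__main__":
    directory = {"Bob Jones": "manager", "Mary Smith": "assistant"}
    message = {"from": "Bob Jones", "to": ["Mary Smith"],
               "sent": "2010-03-12T09:30"}
    print(anonymize_email_metadata(message, directory))
```

  • Only metadata is read here; the body of the message is never touched, which matches the paragraph's point that the sender and recipient information can be harvested without the content.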
  • In one embodiment, user defined privacy policies can be stored in a library of privacy policies and shared with other users by selecting the “share policy” 211 option. This allows a department manager to create a privacy policy and a set of rules and privileges for each field and share the privacy policy with an entire group of co-workers. A person would be able to select the predefined privacy policy from the library without individually setting the rules and privileges for each category of collaboration data. A written description of how the privacy policy affects collaboration data is provided within dialog box 210. Dialog box 210 may also be used to provide any legal disclaimers or other information about the use of the privacy policy.
  • Referring back to FIG. 1, at step 104, the user is made aware of the privacy policy. In one embodiment, the user of the collaboration software is made aware of the privacy policy by a message displayed on a display screen of the user device after the user connects to a communications network. In one embodiment, the privacy policy is displayed to the user as a splash screen, and the user must acknowledge the privacy policy and its terms before being allowed to send any collaboration data, e.g., email, across the communications network. At decision step 105, the user is given the choice to “opt out”, i.e., not allow the harvesting of any collaboration data. If the user opts out, the method immediately ends. There may be many reasons why the user may elect to opt out, including legal obligations to the organization, e.g., the user is part of the organization's legal department, or a heightened need for privacy, e.g., the user is a high level officer of the organization. If the user does not opt out, then the method proceeds to step 106. At step 106, filtering software on the client computer retrieves the collaboration data processed by the collaboration software. In one embodiment implementing collaboration software LOTUS NOTES®, collaboration data pertaining to email messages, calendaring and schedules are stored remotely on a server in a ‘mailfile’ and locally on the client computer in a ‘replica mailfile’ which is a duplicate copy of the remotely stored ‘mailfile’. The collaboration software may also have an instant messaging feature that generates collaboration data. As an example, LOTUS NOTES® has a built-in instant messaging feature that allows users to communicate with each other and in groups of users. In one embodiment, collaboration data related to instant messaging is stored in a file in the form of a ‘chat log’.
  • These chat logs may be stored locally on the user's computer or remotely on a server and include at least the names or ‘user ids’ of the communicators in addition to other collaboration data. In one embodiment, the filtering software is a plug-in that interfaces with the collaboration software and is coded in an object oriented programming language such as JAVA®, PYTHON® or RUBY®. In another embodiment, the filtering software is a stand-alone external program that is separately operated on the client computer aside from the collaboration software and acts independently on the collaboration data. Both implementations of the filtering software access the collaboration data through either an actual file path or a relative file path to the ‘mailfile’.
  • At step 108, the plug-in applies the privacy policy and the rules that were set by the user at step 102 to the collaboration data. In one embodiment, the collaboration data is scanned by the plug-in to determine if any fields present match the fields set in the privacy policy. If the fields match, the privileges associated with those fields are checked to determine if the collaboration data stored in those fields can be harvested. In one embodiment, the plug-in separates the collaboration data into one of three categories: 1) collaboration data that is allowable to be captured; 2) collaboration data that is not allowable to be captured; and 3) collaboration data that is allowable to be captured if transformed.
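  • The scan at step 108 can be sketched as a partition of a record's fields into the three categories. The record layout, the string labels for the privileges, and the rule that fields absent from the policy are treated as not capturable are assumptions for illustration; apart from ‘BlindCopyTo’, which the description mentions, the field names are illustrative.

```python
def partition_fields(record: dict, privileges: dict) -> dict:
    """Sort a record's fields into 'allow', 'deny' and 'transform'.

    `privileges` maps a field name to one of those three labels.
    Assumption: a field the policy does not mention is treated as 'deny'.
    """
    buckets = {"allow": {}, "deny": {}, "transform": {}}
    for field_name, value in record.items():
        label = privileges.get(field_name, "deny")
        buckets[label][field_name] = value
    return buckets


if __name__ == "__main__":
    record = {
        "PostedDate": "2010-03-12T09:30",
        "From": "Bob Jones",
        "BlindCopyTo": "Mary Smith",
        "Body": "Confidential text",
    }
    policy = {"PostedDate": "allow", "From": "transform",
              "BlindCopyTo": "deny"}
    for label, fields in partition_fields(record, policy).items():
        print(label, sorted(fields))
```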
  • At decision step 110, a determination is made as to whether the collaboration data is allowable to be harvested by the filtering software under the privacy policy. If the collaboration data is allowable to be harvested then the method proceeds to step 114. At step 114, the collaboration data is harvested by extracting the collaboration data from the metadata field in accordance with the user defined rules and privileges. The collaboration data may include the identity of the communicators, the time of communication, whether the communication was a reply to a prior communication, or a forward of a prior communication, routing information related to the communication, telephone numbers, etc. In one embodiment, the harvested collaboration data is stored in another file separate from the ‘mailfile’ or stored in a memory storage device, e.g., searchable database.
  • If the collaboration data is not allowable under the privacy policy, then the method proceeds to decision step 112. At step 112, the method determines if the collaboration data can be transformed in accordance with one or more user defined rules that would make it allowable under the privacy policy. Certain collaboration data may be “MARKED PRIVATE” by the end user and thus never harvestable or transformable. The “MARKED PRIVATE” function is a well documented feature of collaboration software such as IBM® LOTUS® NOTES®. If the decision is no, i.e., the collaboration data cannot be transformed, then the method ends. If the decision is yes, then the method proceeds to step 113. At step 113 the collaboration data is transformed in accordance with user defined rules. For example, emails from an attorney may be identified by the plug-in and removed from the collaboration data to preserve attorney-client privilege. In another embodiment, if the collaboration data includes an image or a photograph (commonly identified by the file extension .jpg, .gif, .bmp etc.) the transformation process may entail degrading the quality of the image so the people or objects in the image cannot be readily identified. Image quality may be degraded by applying a filter, such as a blur filter to the image, and only allowing the filtered image to be harvested. In another embodiment, an attached file may be identified by a hash, such as an MD5 hash, which identifies the file without revealing the contents of the file.
  • In yet another embodiment, the collaboration data may include information gathered from calendars and schedules. Calendars and schedules often contain information about meetings, times of meetings, meeting participants, telephone numbers associated with the meetings, and meeting locations. Often, a telephone number may be enough to reveal the identity of the meeting participants. For example, a telephone number beginning with an (888) area code may reveal that a teleconference call took place at a certain date and time. A pass code associated with the telephone number may reveal the identity of the participants who dialed into the teleconference call. The collaboration data may be transformed by masking all or part of the telephone number or masking the pass code with a character such as an ‘x’ to preserve the identity of the participants. In another embodiment, the telephone number or a portion of the telephone number may be replaced by a hash. After the data is transformed, the method then proceeds to step 114, and as discussed above the collaboration data is harvested.
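  • The decision flow of steps 110 through 114 can be sketched for a single field as shown below. The function names, the use of `None` to mean “not harvested”, and the `marked_private` flag are assumptions for illustration and are not taken from the specification.

```python
from typing import Callable, Optional

# A transformation rule takes the original value and returns a safe one.
Rule = Callable[[str], str]


def mask_all(value: str) -> str:
    """Replace every letter or digit with 'x', keeping separators."""
    return "".join("x" if ch.isalnum() else ch for ch in value)


def harvest_field(value: str, allowed: bool,
                  rule: Optional[Rule] = None,
                  marked_private: bool = False) -> Optional[str]:
    """Return the value to harvest, or None if nothing may be harvested."""
    if marked_private:
        return None            # never harvestable or transformable
    if allowed:
        return value           # step 110 yes -> step 114
    if rule is None:
        return None            # step 112 no -> method ends
    return rule(value)         # step 113, then step 114


if __name__ == "__main__":
    print(harvest_field("10:00", allowed=True))
    print(harvest_field("4321", allowed=False, rule=mask_all))
    print(harvest_field("4321", allowed=False))
```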
  • Collaboration data that is not allowed to be harvested and that cannot be transformed is allowed to pass through the plug-in unaltered and is not stored. In one embodiment, the collaboration data that is harvested is stored in a searchable database for later analysis. The stored information may reveal communication patterns, connections between coworkers, connections between employees and the outside world, and how decisions are made within the organization.
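  • A searchable store for harvested data might look like the following sketch, which uses Python's built-in `sqlite3` module. The table layout, column names, and helper functions are assumptions for illustration; the specification does not name a database product or schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical store for harvested fields."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS harvested "
        "(item_id TEXT, field TEXT, value TEXT)"
    )
    return conn


def store_item(conn: sqlite3.Connection, item_id: str, fields: dict) -> None:
    """Store one row per harvested field of a collaboration item."""
    conn.executemany(
        "INSERT INTO harvested VALUES (?, ?, ?)",
        [(item_id, name, value) for name, value in fields.items()],
    )
    conn.commit()


def count_by_value(conn: sqlite3.Connection, field: str) -> list:
    """Count how often each value of a field occurs, e.g. per sender token."""
    return conn.execute(
        "SELECT value, COUNT(*) FROM harvested "
        "WHERE field = ? GROUP BY value ORDER BY value",
        (field,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    store_item(conn, "mail-1", {"from": "manager", "to": "assistant"})
    store_item(conn, "mail-2", {"from": "manager", "to": "associate"})
    print(count_by_value(conn, "from"))
```

  • Because only allowed or transformed values are stored, queries such as the count above can expose communication patterns without exposing the underlying names.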
  • FIG. 4 is an architectural overview of a computing environment 400 in which the invention may be implemented. As illustrated in FIG. 4, a server 401 is situated between two client computers 416 1 and 416 2. The client computers 416 may be desktop computers, laptop computers, or any other device that may benefit from connection to a computer network. One would appreciate that there could be multiple client computers 416 n routing collaboration data through the server 401. The server 401 may be coupled directly to the client computers 416 as shown, or coupled indirectly via a network such as the Internet, Ethernet, private local area network (LAN) and the like.
  • In one embodiment, the server 401 comprises a central processing unit (CPU) 402, a memory 404, mass storage 412, and support circuitry 403. The CPU 402 is interconnected to the memory 404 and the support circuitry 403. The support circuitry includes cache, power supplies, clocks, input/output interface circuitry, a network interface and the like. The mass storage 412 may be physically present within the server or operably coupled to the server 401 as part of a common mass storage system that is shared by a plurality of servers. In one embodiment, the mass storage comprises a searchable database 418. In one embodiment, the database 418 stores the information harvested from the collaboration data.
  • The memory 404 may include random access memory, read only memory, removable disk memory, flash memory, and various combinations of these types of memory. The memory 404 is sometimes referred to as a main memory and may in part be used as cache memory. The memory 404 stores an operating system (OS) 406 and individual ‘mailfiles’ for each user of the collaboration software. As an example, ‘mailfile’ 405 1 corresponds to the user of client computer 416 2 and ‘mailfile’ 428 1 corresponds to the user of client computer 416 1. The mailfiles 405 1 and 428 1 store the collaboration data generated by their respective users.
  • The client computers 416 comprise a central processing unit (CPU) 420, a memory 424, and support circuitry 422. The CPU 420 is interconnected to the memory 424 and the support circuitry 422. The support circuitry includes cache, power supplies, clocks, input/output interface circuitry, a network interface and the like.
  • The memory 424 may include random access memory, read only memory, removable disk memory, flash memory, and various combinations of these types of memory. The memory 424 is sometimes referred to as a main memory and may in part be used as cache memory. The memory 424 stores an operating system (OS) 425, collaboration software 426 such as IBM® LOTUS® NOTES®, and filtering software 427. Each of the client computers 416 also stores a local copy or a ‘replica mailfile’ of the user's ‘mailfile’ that is stored on the server 401. As an example, client computer 416 2 stores a ‘replica mailfile’ 405 2 of remotely stored ‘mailfile’ 405 1 and client computer 416 1 stores a ‘replica mailfile’ 428 2 of remotely stored ‘mailfile’ 428 1. In one embodiment, the ‘mailfiles’ store email data (including email metadata) utilized by the collaboration software and the filtering software 427 implements the privacy policy and the rules described above. In one embodiment, the filtering software 427 is a plug-in which interacts with the collaboration software 426 via an API. In another embodiment, the filtering software 427 is an external process initiated by the user separately from the running of the collaboration software 426. The following example assumes the user's collaboration data is stored in both ‘mailfile’ 405 1 and ‘replica mailfile’ 405 2. In one embodiment, the filtering software 427 operates on the collaboration data stored in the user's ‘replica mailfile’ 405 2 associated with the collaboration software 426 to apply the rules defined in the privacy policy. In another embodiment, the filtering software 427 could operate on the collaboration data remotely stored on the server 401 in the user's ‘mailfile’ 405 1.
  • The filtering software 427 retrieves the collaboration data from the ‘replica mailfile’ 405 2 and filters the collaboration data in accordance with the privacy policy and the user defined rules as discussed above. The filtering software then harvests the filtered data and stores the harvested data, i.e., the fields and the content of the fields, in a searchable database, e.g., database 418 or another file, e.g., ‘harvested data’ 429. In one embodiment, the filtering software also enforces the privacy policy when another user or client computer requests information. For example, if client computer 416 2 requests information from client computer 416 1, the filtering software 427 may intercept the request and provide client computer 416 2 with access to only the anonymized data stored in file 429.
  • As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • Referring now to FIGS. 1 through 3, the flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • While the present invention has been particularly shown and described with respect to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in forms and details may be made without departing from the spirit and scope of the present invention. It is therefore intended that the present invention not be limited to the exact forms and details described and illustrated, but fall within the scope of the appended claims.

Claims (24)

1. A method for harvesting collaboration data, comprising:
defining a privacy policy for collaboration data, said privacy policy including a list of fields that identify the collaboration data to be harvested and privileges for harvesting the collaboration data from said fields;
harvesting the collaboration data associated with the fields whose privileges are specified as allowable under the privacy policy;
transforming the collaboration data associated with the fields whose privileges are specified as allowable if said collaboration data can be transformed in accordance with a set of rules defined in the privacy policy; and
storing the harvested collaboration data in a file.
2. The method of claim 1, wherein the set of rules includes replacing a name with a pseudonym.
3. The method of claim 1, wherein the set of rules includes masking digits of a telephone number with a character.
4. The method of claim 1, wherein the set of rules includes replacing at least a portion of a website address with a character string.
5. The method of claim 1, wherein the set of rules includes degrading quality of an image.
6. The method of claim 1, wherein the set of rules includes calculating a hash of the collaboration data.
7. The method of claim 1, further comprising:
alerting a user to the privacy policy and the set of rules defined in the privacy policy for the collaboration data;
alerting the user to the fields associated with the collaboration data to be harvested and the privileges associated with the fields under the privacy policy;
allowing the user to modify the privileges associated with the fields; and
allowing the user to opt out from participating in the harvesting of the collaboration data.
8. The method of claim 1, further comprising:
retrieving the collaboration data from a file;
applying the set of rules defined in the privacy policy to the collaboration data stored in the file; and
harvesting the collaboration data after application of the set of rules defined in the privacy policy and storing the harvested collaboration data in at least one of another file and the database.
9. A computer program product for harvesting collaboration data, comprising:
a storage medium readable by a processor and storing instructions for execution by the processor for performing a method comprising:
defining a privacy policy for collaboration data, said privacy policy including a list of fields that identify the collaboration data to be harvested and privileges for harvesting the collaboration data from said fields;
harvesting the collaboration data associated with the fields whose privileges are specified as allowable under the privacy policy;
transforming the collaboration data associated with the fields whose privileges are specified as allowable if said collaboration data can be transformed in accordance with a set of rules defined in the privacy policy; and
storing the harvested collaboration data in a file.
10. The computer program product of claim 9, wherein the set of rules includes replacing a name with a pseudonym.
11. The computer program product of claim 9, wherein the set of rules includes masking digits of a telephone number with a character.
12. The computer program product of claim 9, wherein the set of rules includes replacing at least a portion of a website address with a character string.
13. The computer program product of claim 9, wherein the set of rules includes degrading quality of an image.
14. The computer program product of claim 9, wherein the set of rules includes calculating a hash of the collaboration data.
15. The computer program product of claim 9, further comprising:
alerting a user to the privacy policy and the set of rules defined in the privacy policy for the collaboration data;
alerting the user to the fields associated with the collaboration data to be harvested and the privileges associated with the fields under the privacy policy;
allowing the user to modify the privileges associated with the fields; and
allowing the user to opt out from participating in the harvesting of the collaboration data.
16. The computer program product of claim 9, further comprising:
retrieving the collaboration data from a file;
applying the set of rules defined in the privacy policy to the collaboration data stored in the file; and
harvesting the collaboration data after application of the set of rules defined in the privacy policy and storing the harvested collaboration data in at least one of another file and the database.
17. A system for harvesting collaboration data, comprising:
a processor operable to define a privacy policy for collaboration data, said privacy policy including a list of fields associated with the collaboration data to be harvested and privileges for harvesting the collaboration data from said fields, harvest the collaboration data associated with the fields whose privileges are specified as allowable under the privacy policy, transform the collaboration data associated with the fields whose privileges are specified as allowable if said collaboration data can be transformed in accordance with a set of rules defined in the privacy policy, and store the harvested collaboration data in a file.
18. The system of claim 17, wherein the processor is further operable to alert a user to the privacy policy and the set of rules defined in the privacy policy for the collaboration data, alert the user to the fields associated with the collaboration data to be harvested under the privacy policy, allow the user to modify the privileges associated with the fields and allow the user the opportunity to opt out from participating in the harvesting of the collaboration data.
19. The system of claim 18, wherein the processor is further operable to retrieve the collaboration data from a file, apply the set of rules defined in the privacy policy to the collaboration data stored in the file and harvest the collaboration data after application of the set of rules defined in the privacy policy and store the harvested collaboration data in at least one of another file and the database.
20. The system of claim 17, wherein the processor is further operable to replace a name with a pseudonym.
21. The system of claim 17, wherein the processor is further operable to mask digits of a telephone number with a character.
22. The system of claim 17, wherein the processor is further operable to replace at least a portion of a website address with a character string.
23. The system of claim 17, wherein the processor is further operable to degrade quality of an image.
24. The system of claim 17, wherein the processor is further operable to calculate a hash of the collaboration data.
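The transformation rules recited in the dependent claims (pseudonym replacement, telephone digit masking, website address replacement, and hashing) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names, the `user-N` pseudonym scheme, the choice to keep the last four digits, the `REDACTED` placeholder, and the use of SHA-256 are all hypothetical. Image degradation is omitted because it would require an imaging library.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def pseudonymize(name, table):
    """Replace a name with a pseudonym; the same name always maps to the same one."""
    if name not in table:
        table[name] = f"user-{len(table) + 1}"  # hypothetical naming scheme
    return table[name]


def mask_phone(number, keep_last=4, mask_char="X"):
    """Mask every digit except the last `keep_last`, leaving punctuation intact."""
    total_digits = sum(ch.isdigit() for ch in number)
    seen = 0
    out = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)


def replace_url_path(url, placeholder="REDACTED"):
    """Replace the path, query, and fragment of a website address with a string."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/" + placeholder, "", ""))


def hash_value(value):
    """Return the SHA-256 hex digest of the field content."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def transform(record, rules):
    """Apply a rule to each field that has one; other fields pass through unchanged."""
    return {name: rules[name](value) if name in rules else value
            for name, value in record.items()}


# Example usage with made-up data.
pseudonyms = {}
rules = {
    "From": lambda v: pseudonymize(v, pseudonyms),
    "Phone": mask_phone,
    "Link": replace_url_path,
    "Body": hash_value,
}
record = {
    "From": "Alice Smith",
    "Phone": "555-123-4567",
    "Link": "https://example.com/projects/alpha?id=7",
    "Body": "meeting notes",
    "Date": "2010-03-12",
}
transformed = transform(record, rules)
```

Fields without a rule, such as `Date` here, are passed through unchanged, which mirrors the claim language that data is transformed only if it can be transformed under the rules. Note that an unsalted hash of a short or guessable value can be matched by hashing candidate values, so a keyed or salted hash may be preferable in practice.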
US12/723,193 2010-03-12 2010-03-12 Privacy-preserving method for skimming of data from a collaborative infrastructure Expired - Fee Related US8959097B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/723,193 US8959097B2 (en) 2010-03-12 2010-03-12 Privacy-preserving method for skimming of data from a collaborative infrastructure

Publications (2)

Publication Number Publication Date
US20110225200A1 (en) 2011-09-15
US8959097B2 (en) 2015-02-17

Family

ID=44560935

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/723,193 Expired - Fee Related US8959097B2 (en) 2010-03-12 2010-03-12 Privacy-preserving method for skimming of data from a collaborative infrastructure

Country Status (1)

Country Link
US (1) US8959097B2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120102120A1 (en) * 2010-10-20 2012-04-26 Qualcomm Incorporated Methods and apparatuses for affecting programming of content for transmission over a multicast network
US20130064142A1 (en) * 2011-09-12 2013-03-14 Plantronics, Inc. Method and Systems For Connection Into Conference Calls
US20130174211A1 (en) * 2011-12-30 2013-07-04 Nokia Corporation Method And Apparatus Providing Privacy Setting And Monitoring User Interface
US20150067881A1 (en) * 2013-09-03 2015-03-05 Kabel Deutschland Vertrieb Und Service Gmbh Method and system for providing anonymized data from a database
US9081986B2 (en) 2012-05-07 2015-07-14 Nokia Technologies Oy Method and apparatus for user information exchange
CN104813625A (en) * 2013-07-26 2015-07-29 华为终端有限公司 Synchronization signal bearing method and user equipment
US9130920B2 (en) * 2013-01-07 2015-09-08 Zettaset, Inc. Monitoring of authorization-exceeding activity in distributed networks
US9277364B2 (en) 2012-06-25 2016-03-01 Nokia Technologies Oy Methods and apparatus for reporting location privacy
US9401886B2 (en) 2012-05-30 2016-07-26 International Business Machines Corporation Preventing personal information from being posted to an internet
US9678617B2 (en) 2013-01-14 2017-06-13 Patrick Soon-Shiong Shared real-time content editing activated by an image
US9740876B1 (en) * 2015-09-15 2017-08-22 Symantec Corporation Securely storing and provisioning security telemetry of multiple organizations for cloud based analytics
US20180027019A1 (en) * 2016-07-20 2018-01-25 International Business Machines Corporation Privacy-preserving user-experience monitoring
US20180253560A1 (en) * 2017-03-02 2018-09-06 International Business Machines Corporation Presenting a data instance based on presentation rules
WO2019055573A1 (en) * 2017-09-15 2019-03-21 Endgame, Inc. Improved voice and textual interface for closed-domain environment
US20190087604A1 (en) * 2017-09-21 2019-03-21 International Business Machines Corporation Applying a differential privacy operation on a cluster of data
US10548105B2 (en) 2013-07-26 2020-01-28 Huawei Device Co., Ltd. Synchronization signal carrying method and user equipment
US20230229809A1 (en) * 2019-10-31 2023-07-20 Blackberry Limited Stored image privacy violation detection method and system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9161187B2 (en) * 2012-05-21 2015-10-13 Alcatel Lucent Caller ID for text messaging

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040181579A1 (en) * 2003-03-13 2004-09-16 Oracle Corporation Control unit operations in a real-time collaboration server
US20040240652A1 (en) * 2003-05-26 2004-12-02 Yasushi Kanada Human communication system
US20050108372A1 (en) * 2003-10-29 2005-05-19 Nokia Corporation System, method and computer program product for managing user identities
US20050138110A1 (en) * 2000-11-13 2005-06-23 Redlich Ron M. Data security system and method with multiple independent levels of security
US20060041648A1 (en) * 2001-03-15 2006-02-23 Microsoft Corporation System and method for identifying and establishing preferred modalities or channels for communications based on participants' preferences and contexts
US20060052945A1 (en) * 2004-09-07 2006-03-09 Gene Security Network System and method for improving clinical decisions by aggregating, validating and analysing genetic and phenotypic data
US20080082526A1 (en) * 2006-09-28 2008-04-03 Takuya Kanawa Method, apparatus, and computer program product for searching structured document
US20080196098A1 (en) * 2004-12-31 2008-08-14 Cottrell Lance M System For Protecting Identity in a Network Environment
US20090300716A1 (en) * 2008-05-27 2009-12-03 Open Invention Network Llc User agent to exercise privacy control management in a user-centric identity management system
US7784097B1 (en) * 2004-11-24 2010-08-24 The Trustees Of Columbia University In The City Of New York Systems and methods for correlating and distributing intrusion alert information among collaborating computer systems
US7945622B1 (en) * 2008-10-01 2011-05-17 Adobe Systems Incorporated User-aware collaboration playback and recording

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977767B2 (en) * 2010-10-20 2015-03-10 Qualcomm Incorporated Methods and apparatuses for affecting programming of content for transmission over a multicast network
US20120102120A1 (en) * 2010-10-20 2012-04-26 Qualcomm Incorporated Methods and apparatuses for affecting programming of content for transmission over a multicast network
US20130064142A1 (en) * 2011-09-12 2013-03-14 Plantronics, Inc. Method and Systems For Connection Into Conference Calls
US9673989B2 (en) * 2011-09-12 2017-06-06 Plantronics, Inc. Method and systems for connection into conference calls
US20130174211A1 (en) * 2011-12-30 2013-07-04 Nokia Corporation Method And Apparatus Providing Privacy Setting And Monitoring User Interface
US8646032B2 (en) * 2011-12-30 2014-02-04 Nokia Corporation Method and apparatus providing privacy setting and monitoring user interface
US9081986B2 (en) 2012-05-07 2015-07-14 Nokia Technologies Oy Method and apparatus for user information exchange
US9401886B2 (en) 2012-05-30 2016-07-26 International Business Machines Corporation Preventing personal information from being posted to an internet
US9277364B2 (en) 2012-06-25 2016-03-01 Nokia Technologies Oy Methods and apparatus for reporting location privacy
US9130920B2 (en) * 2013-01-07 2015-09-08 Zettaset, Inc. Monitoring of authorization-exceeding activity in distributed networks
US11861154B2 (en) 2013-01-14 2024-01-02 Nant Holdings Ip, Llc Shared real-time content editing activated by an image
US10891039B2 (en) 2013-01-14 2021-01-12 Nant Holdings Ip, Llc Shared real-time content editing activated by an image
US9678617B2 (en) 2013-01-14 2017-06-13 Patrick Soon-Shiong Shared real-time content editing activated by an image
US11543953B2 (en) 2013-01-14 2023-01-03 Nant Holdings Ip, Llc Shared real-time content editing activated by an image
US9857964B2 (en) 2013-01-14 2018-01-02 Patrick Soon-Shiong Shared real-time content editing activated by an image
US11237715B2 (en) 2013-01-14 2022-02-01 Nant Holdings Ip, Llc Shared real-time content editing activated by an image
CN104813625A (en) * 2013-07-26 2015-07-29 华为终端有限公司 Synchronization signal bearing method and user equipment
US10548105B2 (en) 2013-07-26 2020-01-28 Huawei Device Co., Ltd. Synchronization signal carrying method and user equipment
US10986598B2 (en) 2013-07-26 2021-04-20 Huawei Device Co., Ltd. Synchronization signal carrying method and user equipment
US20150067881A1 (en) * 2013-09-03 2015-03-05 Kabel Deutschland Vertrieb Und Service Gmbh Method and system for providing anonymized data from a database
US9971898B2 (en) * 2013-09-03 2018-05-15 Kabel Deutschland Vertrieb Und Service Gmbh Method and system for providing anonymized data from a database
US9740876B1 (en) * 2015-09-15 2017-08-22 Symantec Corporation Securely storing and provisioning security telemetry of multiple organizations for cloud based analytics
US20180027019A1 (en) * 2016-07-20 2018-01-25 International Business Machines Corporation Privacy-preserving user-experience monitoring
US11316896B2 (en) * 2016-07-20 2022-04-26 International Business Machines Corporation Privacy-preserving user-experience monitoring
US20180253560A1 (en) * 2017-03-02 2018-09-06 International Business Machines Corporation Presenting a data instance based on presentation rules
US10552500B2 (en) * 2017-03-02 2020-02-04 International Business Machines Corporation Presenting a data instance based on presentation rules
US10380998B2 (en) 2017-09-15 2019-08-13 Endgame, Inc. Voice and textual interface for closed-domain environment
WO2019055573A1 (en) * 2017-09-15 2019-03-21 Endgame, Inc. Improved voice and textual interface for closed-domain environment
US10769306B2 (en) * 2017-09-21 2020-09-08 International Business Machines Corporation Applying a differential privacy operation on a cluster of data
US20190087604A1 (en) * 2017-09-21 2019-03-21 International Business Machines Corporation Applying a differential privacy operation on a cluster of data
US20230229809A1 (en) * 2019-10-31 2023-07-20 Blackberry Limited Stored image privacy violation detection method and system

Also Published As

Publication number Publication date
US8959097B2 (en) 2015-02-17

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DANIS, CATALINA M.;ERICKSON, THOMAS D.;HELANDER, MARY E.;AND OTHERS;SIGNING DATES FROM 20100219 TO 20100224;REEL/FRAME:024139/0068

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20190217