|Publication number||US20030028796 A1|
|Application number||US 10/208,189|
|Publication date||6 Feb 2003|
|Filing date||31 Jul 2002|
|Priority date||31 Jul 2001|
|Also published as||EP1421521A2, US8468357, US20100158488, US20100161656, WO2003012695A2, WO2003012695A3|
|Inventors||Dale Roberts, David Hyman, Stephen White|
|Original Assignee||Gracenote, Inc.|
 This application is related to and claims priority to U.S. provisional application entitled DIGITAL MUSIC MULTIPLE STEP IDENTIFICATION METHOD AND SYSTEM having serial No. 60/308,594, by Dale T. Roberts, et al., filed Jul. 31, 2001, and incorporated by reference herein.
 1. Field of the Invention
 The present invention is directed to recognition of recordings from their content and, more particularly, to combining fingerprint recognition with other information about a recording to increase the reliability of recognition, and to accomplishing reliable recognition efficiently by using the least expensive forms of recognition first and layering on more complex forms as needed.
 2. Description of the Related Art
 There are many uses for recognition of audio (and video) recordings. Many of them relate to compensation or control by rights holders for the reproduction and performance of the recorded works. This use has increased in importance since the development of file-sharing software, such as Napster, and the many similar services available at the end of the twentieth century and the beginning of the twenty-first century. Although the need for accurate recognition has been significant for several years, no system has succeeded in meeting it.
 Another use of recording recognition is to provide added value to users when they listen to (or watch) recordings. One example is the CDDB Music Recognition Service from Gracenote, Inc. of Berkeley, Calif., which recognizes compact discs (CDs) and supplies information about a recognized CD, such as the album name, artist, track names and access to related content on the Internet, including album covers, artist and fan websites, etc. While the CDDB service is effective for recognizing compact discs, it has several drawbacks when used to recognize files that are not stored on a removable disc, such as a CD or DVD.
 All audio fingerprinting techniques have “blind spots”, places where a system using that technique sees similarities or differences in audio where it should not. Because they rely on just one fingerprinting technique, single-source solutions are less accurate when they encounter a “blind spot”.
 One of the more popular uses for the Gracenote CDDB system is in applications that digitally encode audio files into MP3 and other formats. These encoding applications use Gracenote's CDDB service to recognize the compact disc being encoded and to write the correct metadata into the title and ID tags. Gracenote's CDDB service returns a unique ID (TUID) for each track and supports the insertion of such IDs in the ID3v2 tags of MP3 files. The TUID is both hashed and proprietary, and can only be read by the Gracenote system. However, ID3v2 tags can easily be manipulated to store the TUID for one file in the ID3v2 tag of another file; therefore, the TUID alone is not a reliable identifier of the audio content in a file.
 Gracenote's CDDB service also provides text matching capability that can be utilized to identify digital audio files from their file names, file paths, ID tags (titles), etc. by matching the text extracted by a client device to a metadata database of track, artist, and album names. Although this text matching utilizes user-generated spelling variants associated with each record to improve recognition, there has been no way to verify that the text matches the audio content of the recording once the recording has been separated from a compact disc and stored in a file in any format.
 An aspect of the present invention is maximizing identification of recordings while minimizing resource usage.
 Another aspect of the present invention is using multiple identification methods so that resource-intensive methods, such as audio fingerprinting, are employed only when necessary.
 A further aspect of the invention is minimization of processing of unidentified data.
 Yet another aspect of the present invention is to use the least expensive recognition technique, with progressively more expensive recognition techniques layered onto the process until a desired confidence level is reached.
 A still further aspect of the invention is validation of content-based identification of a recording by comparing text associated with an unidentified recording and text associated with identification records.
 Yet another aspect of the present invention is use of recording identification methods from different sources to increase reliability.
 A still further aspect of the invention is validation of content-based recording identification using fuzzy track length analysis.
 Yet another aspect of the invention is automatic extraction of identification data for use in a reference database and for identification of recordings.
 A still further aspect of the invention is that unidentified recordings are periodically re-run through the system to determine if recently added data or recently improved techniques will result in recognition.
 The above aspects can be attained by a method of identifying recordings by extracting information about an unknown recording stored in media possessed by a user and at least one algorithmically determined fingerprint from at least one portion of the unknown recording; determining a possible identification of the unknown recording using at least one piece of the information extracted from the unknown recording and an identification database of corresponding information for reference recordings; and identifying the unknown recording when the possible identification based on each of the at least one piece of the information in combination with the at least one algorithmically determined fingerprint identifies a single reference recording with respective confidence levels. The at least one portion of the unknown recording may contain audio, video or both.
 Preferably, the database is maintained by a provider of identification services which supplies unique identifiers that can be recognized only by servers under the control of the provider of identification services. The unique identifiers are associated with recordings once they have been identified. Subsequently, copies of the recordings are recognized using the unique identifiers to greatly speed up the process. The unique identifiers optionally are cached in high-speed RAM or specially indexed database tables.
 When non-waveform data is not available for an unknown recording, the unknown recording is preferably identified by extracting fingerprints from at least one portion of the unknown recording using a plurality of algorithms; determining a possible identification of the unknown recording using at least two of the fingerprints extracted from the unknown recording and at least one database of correspondingly generated fingerprints for reference recordings; and identifying the unknown recording when the possible identification based on each of the fingerprints identifies a single reference recording with respective confidence levels.
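The rule that every fingerprint algorithm must point to the same single reference recording can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: each algorithm is modeled as a hypothetical callable returning a `(reference_id, confidence)` pair, and the per-algorithm thresholds are illustrative values.

```python
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical matcher: takes the unknown recording and returns
# (reference_id or None, confidence in [0, 1]) for one fingerprint algorithm.
Matcher = Callable[[bytes], Tuple[Optional[str], float]]


def identify_by_agreement(
    audio: bytes,
    matchers: Mapping[str, Matcher],
    thresholds: Mapping[str, float],
) -> Optional[str]:
    """Return a reference ID only if at least two algorithms run and every one
    of them names the same reference with confidence at or above its own
    threshold; otherwise return None."""
    if len(matchers) < 2:
        raise ValueError("need at least two fingerprint algorithms")
    agreed: Optional[str] = None
    for name, match in matchers.items():
        ref_id, confidence = match(audio)
        if ref_id is None or confidence < thresholds[name]:
            return None  # this algorithm produced no confident match
        if agreed is None:
            agreed = ref_id
        elif ref_id != agreed:
            return None  # algorithms disagree: no single reference recording
    return agreed
```

Requiring unanimity is the strictest reading of "identifies a single reference recording"; a deployment could instead weight algorithms differently.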
 Preferably, an existing database, used to identify recordings possessed by users, which does not contain fingerprint information is expanded by obtaining non-waveform data associated with a recording possessed by a user of the database; extracting at least one fingerprint from at least one portion of the recording; and storing the at least one fingerprint as identifying information for the recording, when a match is found in the database for the non-waveform data. One example is that during the process of encoding digital music files from an audio CD possessed by a user, a recognition system can be used to identify the audio CD so that fingerprints extracted during the encoding process can be directly associated with the audio CD using a unique ID system.
 Recognition of recordings using either fingerprints or unique identifiers is preferably validated by other information maintained in the identification database, such as the length of the recording or a numeric identifier embedded within the recording. Information about recordings that do not pass validation, or that match some, but not all, of the information used for identification, may be stored for later analysis of the reason for the error. If the fingerprints are obtained as described above, there may have been an error in obtaining the fingerprint. Therefore, errors may be output to an operator, or the system could correct the information stored in the database, based on recognition of patterns in the information that is stored for improper matches. For example, if a large percentage of matching fingerprints are stored, but the other information consistently does not match them, there could be an error in the fingerprint database which needs to be flagged to an operator.
 The present invention includes a system for identifying recordings that includes an extraction unit to extract information about an unknown recording stored in media possessed by a user and at least one algorithmically determined fingerprint from at least one portion of the unknown recording; and an identification unit, coupled to the extraction unit, to make a possible identification of the unknown recording using at least one piece of the information extracted from the unknown recording and an identification database of corresponding information for reference recordings, and to identify the unknown recording when the possible identification based on each of the at least one piece of the information in combination with the at least one algorithmically determined fingerprint identifies a single reference recording with respective confidence levels.
 The present invention also includes a system for identifying recordings that includes an extraction unit to extract fingerprints from at least one portion of an unknown recording using a plurality of algorithms, and an identification unit, coupled to said extraction unit, to make a possible identification of the unknown recording using at least two of the fingerprints extracted from the unknown recording and at least one database of correspondingly generated fingerprints for reference recordings, and to identify the unknown recording when the possible identification based on each of the fingerprints identifies a single reference recording with respective confidence levels.
 In either of the systems described above, the extraction unit is typically a client unit connected by a network, such as the Internet, to at least one server as the identification unit. The client device may be a personal computer with a drive accessing the recording, a consumer electronics device with a network connection, or a server computer transmitting the unknown recording from one location to another. Furthermore, a portion of the database may be available locally and the extraction unit and identification unit may reside in the same device and share components.
 The present invention also includes a system for obtaining reference information stored in a database used to identify unknown recordings, including a receiving unit to obtain non-waveform data associated with a recording possessed by a user of the database for identification of recordings possessed by the user; an extraction unit to extract at least one fingerprint from at least one portion of the recording; and a storage unit, coupled to said receiving unit and said extraction unit, to store the at least one fingerprint as identifying information for the recording, when a match is found in the database for the non-waveform data.
 These together with other aspects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.
FIG. 1 is a functional block diagram of a system according to the present invention.
FIG. 2 is a flowchart of fingerprint extraction according to the present invention.
FIG. 3 is a flowchart of a method of recognizing unknown recordings.
 FIGS. 4A-4C are a block diagram of a system according to the present invention.
 According to the present invention, a suite of identification components is provided in a system like that illustrated in FIG. 1 to facilitate analysis and identification of audio (and video) files using multiple methods. Preferably, an existing database 90 containing recording identifiers and text data is combined with text-based digital audio and audio fingerprinting identification methods. Preferably, the text data in database 90 is obtained from user submissions and includes user-submitted spelling variants. One such database is available as the CDDB Music Recognition Service from Gracenote, Inc.
 As illustrated in FIG. 1, a recording 100 is accessed by client device 110 via any conventional method, such as reading a digital audio file from a hard drive or a compact disc. Information is extracted from recording 100 and from its associated information (metadata), and fingerprints are extracted from recording 100, as described in more detail below. The information extracted from the metadata includes the duration of the recording (the track length from the table of contents (TOC) for a CD track), the filename and ID3 tag if the recording is in an MP3 file, and the TOC data if the recording is on a CD. If the file containing the recording was produced by a client device operating according to the invention, a unique ID will be extracted from the ID3 tag; initially, however, it will be assumed that this information is not available.
 In an exemplary embodiment, the extracted information is sent from client 110 to server 120 to determine a possible identification of the unknown recording using at least one piece of the information extracted from recording 100 and a database 130 of correspondingly generated fingerprints for reference recordings. If text or a unique ID was extracted, an attempt is made to find a match. If a match is found using the text or unique ID, at least one algorithmically determined fingerprint is compared with the fingerprint(s) stored in the matching records to determine whether a single reference recording matches the information extracted from recording 100, with a respective confidence level for each item of information that matches. If no match can be found based on text and unique ID, an attempt is made to identify a single reference recording using at least two of the fingerprints extracted from recording 100. If a single reference recording is located using either method, the duration of recording 100 is preferably compared with the duration of the single reference recording as a final validation step.
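When text or a unique ID has already narrowed the search, the fingerprint check only needs to run against those candidate records. The sketch below assumes fixed-length 64-bit binary fingerprints compared by the fraction of matching bits, an illustrative choice rather than the fingerprint format the system actually uses; `confirm_candidate` and its threshold are hypothetical.

```python
from typing import Dict, Optional

FP_BITS = 64  # assumed fingerprint length in bits (illustrative only)


def bit_similarity(a: int, b: int, nbits: int = FP_BITS) -> float:
    """Fraction of matching bits between two fixed-length binary fingerprints."""
    differing = bin((a ^ b) & ((1 << nbits) - 1)).count("1")
    return 1.0 - differing / nbits


def confirm_candidate(
    unknown_fp: int,
    candidates: Dict[str, int],  # reference_id -> stored fingerprint
    min_similarity: float = 0.9,
) -> Optional[str]:
    """Among references already matched by text or unique ID, return the one
    whose stored fingerprint also matches; None unless exactly one qualifies."""
    hits = [ref for ref, fp in candidates.items()
            if bit_similarity(unknown_fp, fp) >= min_similarity]
    return hits[0] if len(hits) == 1 else None
```

Returning None when two candidates both pass keeps the "single reference recording" requirement; such ties would fall through to the multi-fingerprint path.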
 Preferably, related metadata is used to validate the match obtained by fingerprint recognition. Like any recognition system, fingerprinting can produce erroneous results. Without a validation component, such an error can propagate throughout the system and return erroneous data to a large percentage of users. Validation criteria such as track length comparison enable the system to catch potential errors and flag them for validation.
 A system according to the present invention preferably includes custom result reporting and flexible administrative interfaces 130 to enable weighting of the various identification methods and of the order in which they are engaged. Analysis of successful match rates for specific identification methods allows an administrator to adjust the identifying criteria for each component to maximize the identification probability. A system according to the present invention preferably incorporates usage data from over 28 million users of the CDDB database, via the Gracenote Data Services division, to help guide results 140.
 The flexibility of a system according to the present invention allows different configurations to be used for identifying recordings in different environments. An application that monitors streaming audio, for example, requires a very different system and solution architecture than one that identifies files in a peer-to-peer system, or one that identifies analog input. However, the present invention can be configured for identification of recordings in each of these situations.
 A system according to the present invention maximizes identification while minimizing resource usage. The use of multiple identification methods ensures that more resource-intensive methods, such as audio fingerprinting, are employed only when necessary. The use of multiple audio fingerprinting technologies reduces data collisions and covers any “blind spots” in a given fingerprinting technology. The “blind spots” found in single-source fingerprinting systems are avoided by using multiple sources for different fingerprinting techniques. This also makes it possible to fine-tune deployment for specific target applications.
 Preferably, fingerprints are obtained using multiple fingerprint recognition services using the method illustrated in FIG. 2. This increases the ability of the system to accurately recognize recordings of various types.
 As illustrated in FIG. 2, when unidentified (unknown) recording 100 is accessed by fingerprint extraction client 110, conventional TOC/file recognition is performed, if possible, by recognition system 210, and results 220 are returned to fingerprint client 110. If the TUID is found, results 220 include the unique identifier (TUID), which points into a master metadata database (not shown in FIG. 2). Recording 100 is also processed by fingerprint extractor 230 using at least one, and preferably several, different algorithmically derived fingerprint extraction systems to obtain fingerprint(s), which are stored in fingerprint/ID send cache 240. As described in more detail below, instructions are received regarding when fingerprint uploader 250 should send the fingerprints to fingerprint recognition server 120.
 In fingerprint recognition server 120, the fingerprints transmitted by fingerprint uploader 250 are initially stored in fingerprint receive cache 260. The fingerprints then undergo fingerprint validation 270 using an algorithmic comparator that attempts to cross-correlate the fingerprints for a recording with fingerprints for the same recording uploaded and extracted by different end users. If the fingerprints are found to be substantially similar, they are validated. This is not the only available validation method, but it serves as one example of a process that could be used to reject bad data.
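One way to realize the cross-correlation comparator is to treat each fingerprint as a sequence of per-frame feature values and take the peak normalized (Pearson) correlation over a small range of time shifts, which tolerates slight timing offsets between uploads. The sketch below is an assumption about how such a comparator might look; the frame representation, `max_lag`, the 0.8 threshold and the two-agreeing-users rule are all illustrative.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def peak_cross_correlation(a: Sequence[float], b: Sequence[float],
                           max_lag: int = 4) -> float:
    """Highest correlation between a and b over shifts of up to max_lag frames."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag compares a[lag + i] with b[i]; negative compares a[i] with b[i - lag].
        x, y = (a[lag:], b) if lag >= 0 else (a, b[-lag:])
        n = min(len(x), len(y))
        best = max(best, pearson(x[:n], y[:n]))
    return best


def validate_upload(new_fp: Sequence[float], other_users: List[Sequence[float]],
                    threshold: float = 0.8, min_agreeing: int = 2) -> bool:
    """Accept a new fingerprint if enough other users' uploads correlate with it."""
    agreeing = sum(1 for fp in other_users
                   if peak_cross_correlation(new_fp, fp) >= threshold)
    return agreeing >= min_agreeing
```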
 In this embodiment, fingerprints that are determined to be valid and related undergo stitching 280. For example, if fingerprints are taken from 30 second segments of the recording, the fingerprints are assembled into a continuous fingerprint stream. This could simplify recognition of segments of the recording. The resulting fingerprints are stored in fingerprint database 290 associated with existing database 90 (FIG. 1).
 The CDDB database has in part been generated through user submissions to create a metadata database with over 12 million tracks and 900,000 albums as of mid-2002. This database contains both basic metadata (artist, album, and track names) as well as extended data (genre, label, etc.).
 A similar distributed collection method may be used to create a waveform database using the system illustrated in FIG. 1. Where recording 100 is a raw audio waveform, e.g., when a CD is encoded into another format such as an MP3 file, client device 110 obtains non-waveform data associated with recording 100, which is possessed by a user of database 90, and executes extraction algorithm(s) to extract fingerprints from at least one portion of the recording. The fingerprints are then sent to server 120 with a unique ID, preferably derived from the TOC of the CD. When the unique ID is available, i.e., when a match is found in the database for the non-waveform data, server 120 is able to associate the fingerprint(s) with the appropriate metadata in database 90 with the same level of accuracy as the identification of CDs by existing database 90, which is provided for identification of recordings possessed by users. Fingerprints gathered dynamically in this manner may be sent to a fingerprint collection server (not shown in FIG. 1), which accumulates fingerprints from authenticated clients, as described in more detail below, before the at least one fingerprint is stored as identifying information for the recording.
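Gracenote's TUID is proprietary, so it is not reproduced here. As a publicly documented example of an identifier derived from a CD table of contents, the sketch below computes the freedb/CDDB1-style disc ID from TOC frame offsets (75 frames per second, offsets including the 150-frame lead-in) and files submitted fingerprints under that ID and a track number. The in-memory `collected` store and the `submit_fingerprint` helper are hypothetical stand-ins for the fingerprint collection server.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

FRAMES_PER_SECOND = 75  # CD addressing: 75 frames per second


def _digit_sum(n: int) -> int:
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb1_disc_id(track_offsets: List[int], leadout_offset: int) -> int:
    """freedb/CDDB1-style disc ID from TOC frame offsets (lead-in included)."""
    n = sum(_digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    t = leadout_offset // FRAMES_PER_SECOND - track_offsets[0] // FRAMES_PER_SECOND
    return ((n % 0xFF) << 24) | (t << 8) | len(track_offsets)


# Hypothetical collection store: (disc_id, track_number) -> submitted fingerprints.
collected: DefaultDict[Tuple[int, int], List[bytes]] = defaultdict(list)


def submit_fingerprint(disc_id: int, track_number: int, fingerprint: bytes) -> None:
    """File a fingerprint under the TOC-derived ID so it can later be linked
    to the metadata record that the same ID already identifies."""
    collected[(disc_id, track_number)].append(fingerprint)
```

For example, a one-track disc starting at frame 150 with its lead-out at frame 4650 yields the ID `0x02003C01`.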
 Multiple fingerprint gathering extractors can also be run over a set of static waveforms from a commercial encoder such as Loudeye or Muse. The challenge with this approach is associating the fingerprints with the appropriate metadata. The method described above enables audio fingerprints to be logically associated with parent records and associated back to the original audio source. In the preferred embodiment, the unique ID provides differentiation between live and studio versions of the same song while simultaneously linking those records to the same artist and their respective albums.
 Preferably, server(s) 120 store information in a parallel record set that is linked by unique IDs. When client 110 asks server 120 to recognize media (a CD, digital audio file or video file), server 120 may also return a record describing how fingerprints should be gathered for this particular CD. This is called the Gathering Instructions Record (GIR). The GIR may include a set of instructions that the remote fingerprint gathering code follows. The record may be pre-computed in off hours or may be computed dynamically at the time of recognition.
 Server 120 may use what it knows about the popularity of a CD to drive gathering decisions. Everything about a rare CD could be gathered, because the opportunity to obtain its fingerprints should not be missed, even if gathering is somewhat burdensome to the user. The opposite could be true for a very popular CD: the load may be distributed across so many users that none of them would notice that any fingerprint gathering was occurring.
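The popularity rule might be expressed as follows. This is a hypothetical policy, not one stated in the source: discs looked up fewer than `full_gather_below` times per week have every track gathered, while for popular discs each user fingerprints only a small, rotating slice of tracks chosen by a hypothetical `user_slot` number (for example, one derived from a user ID).

```python
import math
from typing import List


def tracks_to_gather(n_tracks: int, lookups_per_week: int, user_slot: int,
                     full_gather_below: int = 10) -> List[int]:
    """Pick the 1-based track numbers this user should fingerprint.

    Rare discs: gather every track. Popular discs: spread the work so each
    user handles roughly n_tracks * full_gather_below / lookups tracks."""
    if lookups_per_week < full_gather_below:
        return list(range(1, n_tracks + 1))
    share = max(1, math.ceil(n_tracks * full_gather_below / lookups_per_week))
    start = (user_slot * share) % n_tracks  # rotate the slice between users
    return [((start + i) % n_tracks) + 1 for i in range(min(share, n_tracks))]
```

Rotating the starting track by `user_slot` means different users cover different tracks, so a popular disc is still fully covered over many lookups.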
 The rules and procedures for building the GIR may be manual or automated, and may change over time. They may also be applied specifically to particular users, applications or geographic locations.
 In one embodiment, the server dynamically gathers fingerprints by modifying the GIR to remove fingerprints that have already been gathered. GIRs may be updated immediately or after delays of days, weeks or months. Some example instructions that may be included in the GIR are:
 - A list of tracks and segments to be gathered, and their priority.
 - The fingerprint generator algorithm to use.
 - Parameters that tell the fingerprint generator how to process the fingerprint, such as:
   - Frequency of audio samples
   - Bands of the frequency domain to process
   - Resolution of the fingerprint
   - Desired quality of audio
 - When to do the fingerprint gathering, such as:
   - Before encoding the track
   - After encoding the track
   - In parallel with encoding the track
 - Instructions for caching the fingerprint and when to transmit it back to the server, such as:
   - Before encoding the track
   - After encoding the track
   - After the CD has been fully encoded
   - When the communication channel back to the server is not busy
   - When the next CD is looked up
   - When a group of fingerprints is ready for transmission
 - Instructions for limiting the CPU power devoted to the process, so as not to overload the computer
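A GIR could be represented as a typed record whose fields mirror the instructions listed above, together with the pruning step in which the server removes segments it has already gathered. Every field name, the priority convention (lower number gathered first) and the default CPU cap are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Set, Tuple


class GatherWhen(Enum):
    """When to fingerprint a track relative to encoding it."""
    BEFORE_ENCODING = "before"
    AFTER_ENCODING = "after"
    IN_PARALLEL = "parallel"


class UploadWhen(Enum):
    """When cached fingerprints are transmitted back to the server."""
    BEFORE_ENCODING = "before_encoding"
    AFTER_ENCODING = "after_encoding"
    AFTER_CD_ENCODED = "after_cd"
    CHANNEL_IDLE = "channel_idle"
    NEXT_LOOKUP = "next_lookup"
    BATCH_READY = "batch_ready"


@dataclass(frozen=True)
class Segment:
    track: int       # 1-based track number
    start_s: float   # segment start, seconds into the track
    length_s: float  # segment length in seconds
    priority: int    # lower number = gather first (assumed convention)


@dataclass(frozen=True)
class GatheringInstructionsRecord:
    segments: Tuple[Segment, ...]
    algorithm: str                              # fingerprint generator to use
    sample_rate_hz: int                         # frequency of audio samples
    bands_hz: Tuple[Tuple[float, float], ...]   # frequency bands to process
    resolution: int                             # fingerprint resolution
    desired_quality_index: int                  # per the Source Quality Table
    gather_when: GatherWhen
    upload_when: UploadWhen
    max_cpu_fraction: float = 0.25              # assumed cap on CPU share

    def without_gathered(self, gathered: Set[Tuple[int, float]]) -> "GatheringInstructionsRecord":
        """Copy of this GIR minus segments already held, sorted by priority."""
        remaining = sorted(
            (s for s in self.segments if (s.track, s.start_s) not in gathered),
            key=lambda s: s.priority,
        )
        return replace(self, segments=tuple(remaining))
```

Making the record immutable means each pruned GIR is a new value, which suits pre-computing GIRs off-hours and serving them unchanged.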
 Preferably, the system attempts to improve the quality of the fingerprints during operation. The quality of the source signal and the parameters used for fingerprinting, along with improvements in the fingerprinting algorithms, result in a complex quality matrix that server 120 uses to determine which fingerprints to gather when higher quality is available. Preferably, database 90 or a similar database maintained by fingerprint collection server(s) stores the source quality of each stored fingerprint, so that when a fingerprint from a higher-quality source becomes available, the stored fingerprint may be replaced. An example of source quality ratings is provided below:
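Source Quality Table

| Name | Bit Rate | Compression | Error Correction | Quality Index |
|---|---|---|---|---|
| CD_Audio_HEC | 44.1 kHz PCM (1411 kbps) | None | Hardware | 1 |
| CD_Audio_SEC | 44.1 kHz PCM (1411 kbps) | None | Software | 2 |
| CD_Audio | 44.1 kHz PCM (1411 kbps) | None | None | 3 |
| CDR_Audio | 44.1 kHz PCM (1411 kbps) | None | None | 4 |
| CDR_Made_From_MP3 | 44.1 kHz PCM (1411 kbps) | mp3 | None | 5 |
| MP3_File | 160 kbps | mp3 | None | 6 |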
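Using the quality indexes in the Source Quality Table, where a lower index indicates a better source, the replacement decision reduces to a comparison. The dictionary and function names below are hypothetical.

```python
from typing import Dict, Optional

# Quality indexes from the Source Quality Table (lower is better).
QUALITY_INDEX: Dict[str, int] = {
    "CD_Audio_HEC": 1,
    "CD_Audio_SEC": 2,
    "CD_Audio": 3,
    "CDR_Audio": 4,
    "CDR_Made_From_MP3": 5,
    "MP3_File": 6,
}


def should_replace(stored_source: Optional[str], new_source: str) -> bool:
    """True if a fingerprint from new_source should replace the stored one."""
    if stored_source is None:
        return True  # nothing stored yet for this recording
    return QUALITY_INDEX[new_source] < QUALITY_INDEX[stored_source]
```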
 Fingerprints dynamically gathered may contain information that helps validate quality. Information such as errors while reading from the media may be sent up to the fingerprint collector. The system may reject fingerprints that had high error rates from the source media.
 As noted above, instead of immediately storing a fingerprint, a fingerprint collection server may gather multiple fingerprints for a recording before one is added to the database. These fingerprints may be compared algorithmically to determine their correlation. If the correlation is not adequate, additional fingerprints may be gathered until adequate correlation is achieved, at which point one of the fingerprints or a composite fingerprint is stored in the database. This prevents bad fingerprints from becoming part of the database.
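The collection step, together with the read-error rejection described in the preceding paragraph, can be sketched as a small accumulator. Assumptions: fingerprints are equal-length numeric sequences, agreement is measured by Pearson correlation, a submission counts as agreeing when it correlates with at least `min_count - 1` other accepted submissions (a simple heuristic), and the composite is the element-wise mean of the agreeing fingerprints. All thresholds are illustrative.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


class FingerprintCollector:
    """Accumulates fingerprint submissions for one recording until enough agree."""

    def __init__(self, min_count: int = 3, min_corr: float = 0.9,
                 max_read_error_rate: float = 0.01) -> None:
        self.min_count = min_count
        self.min_corr = min_corr
        self.max_read_error_rate = max_read_error_rate
        self.accepted: List[List[float]] = []

    def submit(self, fp: Sequence[float], read_error_rate: float) -> Optional[List[float]]:
        """Add one submission; return a composite fingerprint once enough
        submissions agree, otherwise None (keep gathering)."""
        if read_error_rate > self.max_read_error_rate:
            return None  # reject: source media was read with too many errors
        self.accepted.append(list(fp))
        agreeing = [
            a for i, a in enumerate(self.accepted)
            if sum(correlation(a, b) >= self.min_corr
                   for j, b in enumerate(self.accepted) if j != i) >= self.min_count - 1
        ]
        if len(agreeing) < self.min_count:
            return None
        # Composite: element-wise mean of the mutually agreeing submissions.
        return [mean(values) for values in zip(*agreeing)]
```

An outlier submission simply fails to find partners and is left out of the composite, rather than blocking storage indefinitely.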
 Stitching of the segmented fingerprints may be necessary because slight variations in timing can cause the fingerprints to overlap. Algorithmic stitching could produce a higher-quality continuous fingerprint. Simple stitching appends the segmented fingerprints in their order of appearance in the recording. Complex stitching could involve scaling fingerprints of different qualities to the lowest common denominator and then appending them in their order of appearance in the recording. If the fingerprint segmentation contains jitter, some form of mathematical fitting is preferably used, so that appending is a fuzzy process rather than a simple concatenation of the datastreams.
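The difference between simple and fuzzy stitching can be shown with per-frame fingerprint values. In this sketch, which is an assumption rather than the source's algorithm, each new segment is checked for the longest run of leading values that matches the tail of the stream built so far (within a mean-absolute-difference tolerance), and matched frames are averaged instead of duplicated. Scaling fingerprints of different qualities is not shown.

```python
from typing import List, Sequence


def best_overlap(prev: Sequence[float], nxt: Sequence[float],
                 max_overlap: int, tol: float) -> int:
    """Largest k <= max_overlap such that the last k values of prev match the
    first k values of nxt within tol (mean absolute difference); 0 if none."""
    for k in range(min(max_overlap, len(prev), len(nxt)), 0, -1):
        diff = sum(abs(a - b) for a, b in zip(prev[-k:], nxt[:k])) / k
        if diff <= tol:
            return k
    return 0


def stitch(segments: List[Sequence[float]], max_overlap: int = 8,
           tol: float = 0.05) -> List[float]:
    """Join per-segment fingerprints in recording order, merging overlapping
    frames by averaging them rather than appending them twice."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        k = best_overlap(out, seg, max_overlap, tol)
        for i in range(k):
            idx = len(out) - k + i
            out[idx] = (out[idx] + seg[i]) / 2  # fuzzy merge of the overlap
        out.extend(seg[k:])
    return out
```

Setting `max_overlap=0` disables overlap detection, which reduces the function to simple stitching (plain concatenation in recording order).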
 One example of audio fingerprinting that can be used is described in the U.S. patent application entitled Automatic Identification of Sound Recordings, filed by Maxwell Wells et al. on Jul. 22, 2002 and incorporated herein by reference. However, any known algorithmically derived fingerprinting technique may be used, not only for digital audio, but also video, TV programs (both analog and digital) and DVDs. Appropriate identifiers and recognition techniques will be used for the media to be recognized in a particular application.
 The present invention provides great flexibility and can be used in a wide variety of environments, including MP3 recognition in a peer-to-peer environment and identification of an audio stream for monitoring and reporting purposes. No other solution is known to use multiple recognition components, so it is the only solution that can be customized to meet the needs of any audio (or video) recognition application.
 A deployment of the present invention in a peer-to-peer application is described below with reference to FIG. 3. In this embodiment, audio files are identified before public access to them is provided, to determine whether the files are allowed in the system, a process known as “filter-in”.
 Client device 110 (FIG. 1) extracts information 310 (FIG. 3) from an audio file at the time of upload to server 120 (FIG. 1). The extracted information, which preferably includes non-waveform data (such as a unique ID, ID3 tag, filename text data and track duration) and fingerprint(s) extracted from the recording, is sent to server 120 for recognition.
 The initial match 320 is performed against the unique ID, if present. Use of Gracenote's TUID enables a match to be returned with 99.9% accuracy. This is also the least resource-intensive recognition method and can achieve very fast recognition rates. If the unique ID is present, the system moves to the validation stage. If no unique ID is present, the system attempts identification using the next recognition methods 330.
 In this embodiment, text-based identification is tried next, using a metadata database, such as the Gracenote CDDB service which contains over 900,000 albums and over 12 million songs. Text matching utilizes available text, such as the filename, file path or text within the ID3 tag for MP3 files, to provide a set of data from which to attempt recognition. If an acceptable match is returned, the system moves to the validation stage. If a successful match is not returned, the system attempts identification utilizing the next recognition method.
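Text matching against records that carry user-submitted spelling variants might look like the following sketch. The normalization rules, the use of Python's `difflib.SequenceMatcher` similarity ratio and the 0.85 cut-off are assumptions chosen for illustration, not the matching method of the CDDB service.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lower-case, drop a common audio extension, turn separators into spaces,
    strip a leading track number and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\.(mp3|wav|flac|ogg)$", "", text)
    text = re.sub(r"[_\-\.]+", " ", text)
    text = re.sub(r"^\d+\s+", "", text)
    return " ".join(text.split())


def best_text_match(query: str, records: Dict[str, List[str]],
                    min_ratio: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (record_id, ratio) for the closest spelling variant, or None
    if even the best variant falls below min_ratio."""
    q = normalize(query)
    best: Optional[Tuple[str, float]] = None
    for record_id, variants in records.items():
        for variant in variants:
            ratio = SequenceMatcher(None, q, normalize(variant)).ratio()
            if best is None or ratio > best[1]:
                best = (record_id, ratio)
    return best if best is not None and best[1] >= min_ratio else None
```

Storing misspellings such as "Artst" as explicit variants lets an exact variant hit score 1.0 even when the canonical title would score lower.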
 The next step is fingerprint identification, in this case using audio fingerprints. The fingerprints from an unknown recording are compared to the fingerprints in database 90 for reference recordings, one fingerprint at a time (or in parallel using different processors for different fingerprints). Each fingerprinting technology returns a match and a level of confidence. If a single reference recording has acceptable confidence levels the system moves to the validation stage. If an unsuccessful match is returned the system can, depending on the target application, ask the user for validation of the most likely result or it can return a “no match found” result.
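The cheapest-first cascade of FIG. 3 can be sketched as an ordered list of stages, each tried only if the previous one failed. The stage functions, their in-memory indexes and the confidence values they return are hypothetical placeholders (the 0.999 merely echoes the 99.9% figure quoted for TUID matching); a confident result would then proceed to the validation stage, which is not shown.

```python
from typing import Callable, List, Optional, Tuple

Match = Optional[Tuple[str, float]]          # (reference_id, confidence)
Stage = Tuple[str, Callable[[dict], Match]]  # (stage name, matcher)

# Hypothetical in-memory indexes standing in for the real databases.
TEXT_INDEX = {"artist song title": "T1"}
FP_INDEX = {0xBEEF: "T2"}


def by_unique_id(x: dict) -> Match:
    return (x["tuid"], 0.999) if x.get("tuid") else None


def by_text(x: dict) -> Match:
    ref = TEXT_INDEX.get(x.get("title", "").lower())
    return (ref, 0.9) if ref else None


def by_fingerprint(x: dict) -> Match:
    ref = FP_INDEX.get(x.get("fingerprint"))
    return (ref, 0.85) if ref else None


# Ordered from least to most resource-intensive.
STAGES: List[Stage] = [("unique_id", by_unique_id), ("text", by_text),
                       ("fingerprint", by_fingerprint)]


def identify(extracted: dict, stages: List[Stage],
             min_confidence: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Try stages cheapest-first; stop at the first confident match."""
    for name, stage in stages:
        result = stage(extracted)
        if result is not None and result[1] >= min_confidence:
            return name, result[0], result[1]
    return None  # "no match found" (or ask the user, per application)
```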
 Validation is a key component of any successful recognition system. Preferably, key file attributes, such as the duration of the recording, are used to validate that a file is what the recognition system says it is, by comparing the extracted length of the unknown recording with the stored length of the single reference recording.
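A fuzzy track-length comparison, in the spirit of the fuzzy track length analysis mentioned earlier, could accept small discrepancies such as those caused by encoder padding or trimmed silence. The tolerances below (2 seconds or 1% of the reference length, whichever is larger) are assumed values; the CD track length is derived from TOC frame offsets at 75 frames per second.

```python
FRAMES_PER_SECOND = 75  # CD addressing: 75 frames per second


def toc_track_seconds(start_frame: int, next_start_frame: int) -> float:
    """Track length from consecutive TOC offsets (next track or lead-out)."""
    return (next_start_frame - start_frame) / FRAMES_PER_SECOND


def durations_match(extracted_s: float, reference_s: float,
                    abs_tol_s: float = 2.0, rel_tol: float = 0.01) -> bool:
    """Fuzzy check: lengths may differ by abs_tol_s seconds or rel_tol of the
    reference length, whichever allowance is larger."""
    allowed = max(abs_tol_s, rel_tol * reference_s)
    return abs(extracted_s - reference_s) <= allowed
```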
 Preferably heuristic and voting algorithms 340 are used to determine if a match is what the system says it is. This self-monitoring reduces the possibility that the system returns inaccurate data that pollutes the system. The heuristics may be manually controlled or algorithmically controlled to produce the best match. These heuristics may also be used to determine which recognition techniques to apply and in what sequence.
 The administrator of each application can determine the level of accuracy needed by each stage (or component) of the system, and therefore has explicit control in optimizing the system. For example, if a 90% aggregate match is required the system administrator can use administrative interfaces 130 to adjust the levels of acceptable return to 90% and a successful result will not be generated unless that threshold is met. The administrator can also set result levels for each component. For example, a 99% text match can be required but only an 85% audio fingerprint match.
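The administrator's thresholds can be applied as in the sketch below, which reuses the example figures from the text (99% for text, 85% for fingerprints, 90% aggregate). How the aggregate is computed is not specified in the source; a weighted mean of the component confidences, with weights an administrator could tune through administrative interfaces 130, is an assumption, and `accept` is a hypothetical name.

```python
from typing import Dict


def accept(scores: Dict[str, float], component_min: Dict[str, float],
           weights: Dict[str, float], aggregate_min: float = 0.90) -> bool:
    """Apply per-component and aggregate thresholds to confidence scores.

    scores: confidence (0..1) for each component that produced a result."""
    if not scores:
        return False
    for name, score in scores.items():
        if score < component_min.get(name, 0.0):
            return False  # a component fell below its own threshold
    total_weight = sum(weights.get(n, 1.0) for n in scores)
    aggregate = sum(weights.get(n, 1.0) * s for n, s in scores.items()) / total_weight
    return aggregate >= aggregate_min
```

With equal weights, a 99.5% text match and an 86% fingerprint match pass; raising the fingerprint weight can pull the aggregate below 90% even when each component clears its own bar.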
 Once a successful identification is returned, the file will be retagged 350 with the unique ID, allowing the file to carry the correct ID throughout the system. As a result, future identification of the file will require only the least resource-intensive recognition method.
 The unique ID (TUID) assigned to the file is then matched 360 against a list 370 of TUIDs populated through the submission of Title/Artist pairs 370 by labels, publishers, and content owners of those files allowed in the system. In one embodiment, if the TUID is present in the database, the file is allowed to be shared, but if the TUID is not present in the database, the file is blocked. In another embodiment, if the TUID is present in the database, the file is blocked. Either of these embodiments could be applied to files recognized as they are accessed by a user, or transmitted from one computer to another.
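The two embodiments differ only in how membership in the TUID list is interpreted. A minimal sketch, with a hypothetical `may_share` function and mode names:

```python
from typing import Set


def may_share(tuid: str, listed: Set[str], mode: str = "allow") -> bool:
    """mode="allow": only listed TUIDs may be shared (filter-in).
    mode="block": listed TUIDs are blocked; everything else passes."""
    if mode == "allow":
        return tuid in listed
    if mode == "block":
        return tuid not in listed
    raise ValueError(f"unknown mode: {mode!r}")
```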
 As illustrated in FIG. 4A, an embodiment of the present invention uses a plurality of related databases. Master metadata database 410 contains the title, artist/author name, owner name, and date. The related databases include audio fingerprint database 430 and video fingerprint database 440, which together form fingerprint database 290 (FIG. 2), as well as track length/TOC database 450, text database 460, hash ID database 470, and guaranteed unique ID database 480.
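One way to picture FIG. 4A is a master table keyed by TUID with side indexes that each map an extracted key back to a TUID. This is a sketch under stated assumptions: the field names are hypothetical, the key-to-TUID mapping for the unique ID database is assumed, and fingerprint indexes are shown as exact-key dictionaries purely for brevity (real fingerprint lookup needs approximate matching).

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, Optional, Tuple


@dataclass(frozen=True)
class MasterRecord:
    """One row of the master metadata database (410)."""
    tuid: str
    title: str
    artist: str
    owner: str
    date: str


@dataclass
class RecognitionStore:
    """Master metadata plus side indexes that each resolve a key to a TUID."""
    master: Dict[str, MasterRecord] = field(default_factory=dict)     # 410: TUID -> record
    audio_fingerprint: Dict[str, str] = field(default_factory=dict)   # 430: key -> TUID
    video_fingerprint: Dict[str, str] = field(default_factory=dict)   # 440: key -> TUID
    toc: Dict[str, str] = field(default_factory=dict)                 # 450: TOC key -> TUID
    duration_seconds: Dict[str, float] = field(default_factory=dict)  # 450: TUID -> length
    text: Dict[str, str] = field(default_factory=dict)                # 460: "artist|title" -> TUID
    hash_id: Dict[str, str] = field(default_factory=dict)             # 470: hash -> TUID
    unique_id: Dict[str, str] = field(default_factory=dict)           # 480: unique ID -> TUID (assumed)

    # Indexes that map an extracted key to a TUID; durations are used for validation instead.
    LOOKUP_INDEXES: ClassVar[Tuple[str, ...]] = (
        "audio_fingerprint", "video_fingerprint", "toc", "text", "hash_id", "unique_id",
    )

    def resolve(self, index_name: str, key: str) -> Optional[MasterRecord]:
        """Look a key up in one side index and return the master record, if any."""
        if index_name not in self.LOOKUP_INDEXES:
            raise ValueError(f"{index_name!r} is not a key-to-TUID index")
        tuid = getattr(self, index_name).get(key)
        return self.master.get(tuid) if tuid is not None else None
```

Keeping every side index pointed at the same TUID means any recognition method that succeeds lands on the same master record.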
 As illustrated in FIG. 4B, when unidentified (unknown) recording 100 is accessed by client device 110, information is extracted from it, including fingerprints 540, 550, metadata 560, and unique ID 570, if present. In addition, the duration 580 of the recording is determined and a numerical hash 590 is calculated. The extracted fingerprints are compared with fingerprints 600, 610. Similarly, matching 620, 630, 640 is performed on the numerical hash, text, and unique ID. If a reference recording is located, validation is performed by comparing the duration of unidentified recording 100 with the duration of the reference recording. Results 660-710, each with a confidence level for its method of comparison, are supplied to result aggregator 730.
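The flow of FIG. 4B, combined with the stated goal of using the least expensive forms of recognition first, can be sketched as a cascade. Matchers run in the order given (cheapest first), each hit is validated against the reference duration, and the loop stops once a result is confident enough. The matcher interface, the `stop_at` threshold, and the rule that a failed duration check halves the confidence are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher receives the extracted features and returns (tuid, confidence) or None.
Matcher = Callable[[dict], Optional[Tuple[str, float]]]


def identify(features: dict,
             matchers: List[Tuple[str, Matcher]],
             reference_durations: Dict[str, float],
             stop_at: float = 0.95,
             duration_tolerance: float = 2.0) -> Dict[str, Tuple[str, float]]:
    """Run matchers cheapest-first and collect per-method results for the aggregator."""
    results: Dict[str, Tuple[str, float]] = {}
    extracted = features.get("duration")
    for name, matcher in matchers:
        hit = matcher(features)
        if hit is None:
            continue
        tuid, confidence = hit
        reference = reference_durations.get(tuid)
        if extracted is None or reference is None or abs(extracted - reference) > duration_tolerance:
            confidence *= 0.5  # assumption: a failed duration check halves the confidence
        results[name] = (tuid, confidence)
        if confidence >= stop_at:
            break  # confident enough: skip the more expensive matchers
    return results
```

A typical (hypothetical) ordering is unique ID, then hash, then text, then fingerprint, since fingerprint extraction and search usually cost the most.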
 If no reference recording matching unidentified recording 100 is found 750, the extracted information 540-590 and the results are stored in unrecognized holding bin 760 for periodic resubmission to recognition server 120 (FIGS. 2 & 4B). In this embodiment, if a reference recording is located 770 with a low aggregate confidence level, post-recognition processing 780 is performed by applying heuristics 790 or a manual review 810, e.g., by presenting one or more possible matches to the user and receiving the user's selection in response. The results of such user selections may be incorporated into the heuristics stored in heuristics database 820. If post-recognition processing 780 identifies a single reference recording, or if result aggregator 730 outputs recognized results 770 with a high aggregate confidence level, the hash ID is generated 810 and sent to hash database 480 and client device 110, so that the hash and unique ID (TUID) can be stored in the ID3 tag if a file is being created.
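The three outcomes above can be sketched as a routing function plus a resubmission loop for the holding bin. The threshold value is an assumption, and so is the hash ID: the text does not say what is hashed or with which function, so SHA-1 over the file bytes is used here purely as a placeholder.

```python
import hashlib
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    HOLD_FOR_RESUBMISSION = "holding_bin"  # no reference recording found (750 -> 760)
    POST_PROCESS = "post_recognition"      # match found, low aggregate confidence (780)
    RECOGNIZED = "recognized"              # match found, high aggregate confidence


def route(aggregate_confidence: Optional[float], high_threshold: float = 0.9) -> Outcome:
    """Decide the next step; None means no reference recording was located."""
    if aggregate_confidence is None:
        return Outcome.HOLD_FOR_RESUBMISSION
    if aggregate_confidence < high_threshold:
        return Outcome.POST_PROCESS
    return Outcome.RECOGNIZED


def make_hash_id(file_bytes: bytes) -> str:
    # Assumption: SHA-1 of the raw file bytes stands in for the unspecified hash ID.
    return hashlib.sha1(file_bytes).hexdigest()


def resubmit(holding_bin: List[dict], recognize: Callable[[dict], Optional[str]]) -> List[dict]:
    """Retry every held item; return only those that are still unrecognized."""
    return [item for item in holding_bin if recognize(item) is None]
```

Because the holding bin keeps the extracted features rather than the file, resubmission can be retried cheaply as the reference databases grow.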
 In one embodiment, the system improves its results by learning from errors observed in repeated attempts to recognize similar files. It may also receive manual input from users who report errors in the results. This allows recognition to be continuously validated over time. For example, a file could be recognized by a system according to the invention; over time, the system could determine that the recognition of that file was flawed and alert an operator. In another embodiment, the system determines what is wrong by monitoring non-fingerprint-based data and changes the recognition results accordingly.
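The operator alert described above could be driven by something as simple as counting user error reports per recognition. This is a hypothetical sketch: the class name, keying each report by (hash ID, TUID), and the review threshold are all assumptions, and real learning from errors would also feed back into the heuristics.

```python
from collections import Counter


class DisputeMonitor:
    """Counts user error reports and flags recognitions that keep being disputed."""

    def __init__(self, review_threshold: int = 3):
        self.review_threshold = review_threshold  # assumed value
        self.reports = Counter()
        self.flagged = set()

    def report_error(self, hash_id: str, tuid: str) -> bool:
        """Record one report; return True once this recognition needs operator review."""
        key = (hash_id, tuid)
        self.reports[key] += 1
        if self.reports[key] >= self.review_threshold:
            self.flagged.add(key)  # surfaced to an operator
        return key in self.flagged
```

Requiring several independent reports before flagging keeps a single mistaken or malicious report from overturning a recognition.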
 The present invention can be used to identify any audio content for tracking purposes. Digital audio streams, analog inputs, and local audio files can all be tracked. Such a tracking system could be deployed on the server side at the point of audio delivery and integrated with a reporting, digital rights management (DRM), or rights payment system. If the audio content being tracked is from a non-participating third party, a client version of the system may be deployed to monitor the content being distributed. In either case, multiple identification methods would be used to achieve the highest possible accuracy.
 Waveform recognition can also serve as a digital rights management component, deployed to compare user-created digital audio files against lists of approved content. This enables a filter-in approach within a peer-to-peer file sharing architecture such as the one described above.
 Audio fingerprinting technologies can be used as an anti-piracy tool and can be customized to the type of audio being investigated. In the case of pirated CDs, Gracenote's CDDB service may be used to provide table of contents (TOC) recognition to augment audio fingerprinting.
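To illustrate what TOC recognition works from, the sketch below computes the widely documented legacy freedb/CDDB1-style disc ID from track start offsets. It is shown only as an example of a TOC-derived lookup key; the specification does not say that Gracenote's service uses this exact computation. Offsets are in CD frames (75 per second) and include the standard 150-frame lead-in.

```python
from typing import List


def cddb1_disc_id(track_offsets: List[int], leadout_offset: int) -> int:
    """Compute a freedb/CDDB1-style disc ID from a CD table of contents.

    track_offsets: start of each track in frames (75 frames per second),
    including the 150-frame lead-in; leadout_offset uses the same units.
    """
    def digit_sum(n: int) -> int:
        total = 0
        while n > 0:
            total += n % 10
            n //= 10
        return total

    # Checksum over the decimal digits of each track's start time in seconds.
    checksum = sum(digit_sum(offset // 75) for offset in track_offsets)
    # Total playing time in whole seconds, from the first track to the lead-out.
    playing_seconds = leadout_offset // 75 - track_offsets[0] // 75
    return ((checksum % 0xFF) << 24) | (playing_seconds << 8) | len(track_offsets)
```

Because distinct discs can collide on this 32-bit value, it serves as a fast lookup key whose candidates still need confirming, for example by comparing the full TOC or by audio fingerprinting.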
 Identification is the enabling component for delivering value-added services. Without explicit knowledge of the content being distributed, it is impossible to deliver value-added content and services related to that audio content.
 The many features and advantages of the invention are apparent from the detailed specification and, thus, it is intended by the appended claims to cover all such features and advantages of the invention that fall within the true spirit and scope of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly all suitable modifications and equivalents may be resorted to, falling within the scope of the invention. For example, the system and method have been described as using a unique identifier; however, a hashed identifier could be used instead.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US2151733||4 May 1936||28 Mar 1939||American Box Board Co||Container|
|CH283612A *||Title not available|
|FR1392029A *||Title not available|
|FR2166276A1 *||Title not available|
|GB533718A||Title not available|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|US6973451||10 Oct 2003||6 Dec 2005||Sony Corporation||Medium content identification|
|US6993532||30 May 2001||31 Jan 2006||Microsoft Corporation||Auto playlist generator|
|US7024424||11 Mar 2005||4 Apr 2006||Microsoft Corporation||Auto playlist generator|
|US7082394||25 Jun 2002||25 Jul 2006||Microsoft Corporation||Noise-robust feature extraction using multi-layer principal component analysis|
|US7248715||20 Sep 2001||24 Jul 2007||Digimarc Corporation||Digitally watermarking physical media|
|US7269596 *||17 Oct 2003||11 Sep 2007||Sony United Kingdom Limited||Audio and/or video generation apparatus|
|US7277766||24 Oct 2000||2 Oct 2007||Moodlogic, Inc.||Method and system for analyzing digital audio files|
|US7296031||11 Mar 2005||13 Nov 2007||Microsoft Corporation||Auto playlist generator|
|US7313571||31 Oct 2005||25 Dec 2007||Microsoft Corporation||Auto playlist generator|
|US7313591||18 Jul 2003||25 Dec 2007||Microsoft Corporation||Methods, computer readable mediums and systems for requesting, retrieving and delivering metadata pages|
|US7334023 *||26 Mar 2003||19 Feb 2008||Kabushiki Kaisha Toshiba||Data transfer scheme for reducing network load using general purpose browser on client side|
|US7359900 *||29 Jul 2003||15 Apr 2008||All Media Guide, Llc||Digital audio track set recognition system|
|US7428572||8 Sep 2005||23 Sep 2008||Microsoft Corporation||Transferring metadata to a client|
|US7440975||21 Dec 2005||21 Oct 2008||Musicgiants, Inc.||Unified media collection system|
|US7451078||30 Dec 2004||11 Nov 2008||All Media Guide, Llc||Methods and apparatus for identifying media objects|
|US7477739||21 Jan 2003||13 Jan 2009||Gracenote, Inc.||Efficient storage of fingerprints|
|US7526506||23 Sep 2005||28 Apr 2009||Microsoft Corporation||Interlinking sports and television program listing metadata|
|US7548934||30 Mar 2006||16 Jun 2009||Microsoft Corporation||Auto playlist generator|
|US7549052||11 Feb 2002||16 Jun 2009||Gracenote, Inc.||Generating and matching hashes of multimedia content|
|US7549175 *||17 Apr 2007||16 Jun 2009||Sony Corporation||Recording medium, recording method, recording apparatus, reproduction apparatus, data transmission method, and server device|
|US7567899||30 Dec 2004||28 Jul 2009||All Media Guide, Llc||Methods and apparatus for audio recognition|
|US7574451 *||2 Nov 2004||11 Aug 2009||Microsoft Corporation||System and method for speeding up database lookups for multiple synchronized data streams|
|US7644077||21 Oct 2004||5 Jan 2010||Microsoft Corporation||Methods, computer readable mediums and systems for linking related data from at least two data sources based upon a scoring algorithm|
|US7647128||22 Apr 2005||12 Jan 2010||Microsoft Corporation||Methods, computer-readable media, and data structures for building an authoritative database of digital audio identifier elements and identifying media items|
|US7668059||11 Jul 2005||23 Feb 2010||Sony Corporation||Commercial/non-commercial medium test|
|US7672873||10 Sep 2004||2 Mar 2010||Yahoo! Inc.||Music purchasing and playing system and method|
|US7685210||30 Dec 2005||23 Mar 2010||Microsoft Corporation||Media discovery and curation of playlists|
|US7706570||9 Feb 2009||27 Apr 2010||Digimarc Corporation||Encoding and decoding auxiliary signals|
|US7707221||2 Apr 2003||27 Apr 2010||Yahoo! Inc.||Associating and linking compact disc metadata|
|US7711564||27 Jun 2002||4 May 2010||Digimarc Corporation||Connected audio and other media objects|
|US7711838||9 Nov 2000||4 May 2010||Yahoo! Inc.||Internet radio and broadcast method|
|US7720852||22 Jun 2006||18 May 2010||Yahoo! Inc.||Information retrieval engine|
|US7747864||29 Jun 2006||29 Jun 2010||Microsoft Corporation||DVD identification and managed copy authorization|
|US7761513||18 May 2004||20 Jul 2010||Sony Corporation||Information recording device, information recording method, and information recording program|
|US7788684||8 Oct 2003||31 Aug 2010||Verance Corporation||Media monitoring, management and information system|
|US7849131||12 May 2006||7 Dec 2010||Gracenote, Inc.||Method of enhancing rendering of a content item, client system and server system|
|US7853344||16 Aug 2007||14 Dec 2010||Rovi Technologies Corporation||Method and system for analyzing ditigal audio files|
|US7877408||5 Feb 2008||25 Jan 2011||Rovi Technologies Corporation||Digital audio track set recognition system|
|US7890374||24 Oct 2000||15 Feb 2011||Rovi Technologies Corporation||System and method for presenting music to consumers|
|US7904503||21 Aug 2001||8 Mar 2011||Gracenote, Inc.||Method of enhancing rendering of content item, client system and server system|
|US7921296||7 May 2007||5 Apr 2011||Gracenote, Inc.||Generating and matching hashes of multimedia content|
|US7958485 *||21 Nov 2007||7 Jun 2011||General Electric Company||Methods and systems for managing content dependency deployment|
|US7979464||11 Apr 2007||12 Jul 2011||Motion Picture Laboratories, Inc.||Associating rights to multimedia content|
|US8005258||25 Sep 2009||23 Aug 2011||Verance Corporation||Methods and apparatus for enhancing the robustness of watermark extraction from digital host content|
|US8005724||26 Mar 2003||23 Aug 2011||Yahoo! Inc.||Relationship discovery engine|
|US8069255||18 Jun 2003||29 Nov 2011||AT&T Intellectual Property I, L.P.||Apparatus and method for aggregating disparate storage on consumer electronics devices|
|US8121843||23 Apr 2007||21 Feb 2012||Digimarc Corporation||Fingerprint methods and systems for media signals|
|US8131708 *||30 Oct 2008||6 Mar 2012||Vobile, Inc.||Methods and systems for monitoring and tracking videos on the internet|
|US8150096||23 Mar 2006||3 Apr 2012||Digimarc Corporation||Video fingerprinting to identify video content|
|US8170273||27 Apr 2010||1 May 2012||Digimarc Corporation||Encoding and decoding auxiliary signals|
|US8259938||19 Jun 2009||4 Sep 2012||Verance Corporation||Efficient and secure forensic marking in compressed|
|US8271333||30 Oct 2001||18 Sep 2012||Yahoo! Inc.||Content-related wallpaper|
|US8280103||19 Nov 2010||2 Oct 2012||Verance Corporation||System reactions to the detection of embedded watermarks in a digital host content|
|US8316238||25 Oct 2006||20 Nov 2012||Verizon Patent And Licensing Inc.||Method and system for providing image processing to track digital information|
|US8340348||||25 Dec 2012||Verance Corporation||Methods and apparatus for thwarting watermark detection circumvention|
|US8346567||6 Aug 2012||1 Jan 2013||Verance Corporation||Efficient and secure forensic marking in compressed domain|
|US8352259||20 Jun 2009||8 Jan 2013||Rovi Technologies Corporation||Methods and apparatus for audio recognition|
|US8352331||30 Apr 2001||8 Jan 2013||Yahoo! Inc.||Relationship discovery engine|
|US8451086||30 Jan 2012||28 May 2013||Verance Corporation||Remote control signaling using audio watermarks|
|US8458156 *||18 May 2012||4 Jun 2013||Google Inc.||Learning common spelling errors through content matching|
|US8468357||9 Mar 2010||18 Jun 2013||Gracenote, Inc.||Multiple step identification of recordings|
|US8483423||20 Apr 2006||9 Jul 2013||Sony Pictures Entertainment Inc.||Fingerprinting of data|
|US8490131 *||5 Nov 2009||16 Jul 2013||Sony Corporation||Automatic capture of data for acquisition of metadata|
|US8495075 *||8 Mar 2006||23 Jul 2013||Apple Inc.||Fuzzy string matching of media meta-data|
|US8533481||3 Nov 2011||10 Sep 2013||Verance Corporation||Extraction of embedded watermarks from a host content based on extrapolation techniques|
|US8538066||4 Sep 2012||17 Sep 2013||Verance Corporation||Asymmetric watermark embedding/extraction|
|US8549307||29 Aug 2011||1 Oct 2013||Verance Corporation||Forensic marking using a common customization function|
|US8595315||19 Oct 2011||26 Nov 2013||At&T Intellectual Property I, L.P.||Apparatus and method for aggregating disparate storage on consumer electronics devices|
|US8601504||20 Jun 2003||3 Dec 2013||Verance Corporation||Secure tracking system and method for video program content|
|US8615104||3 Nov 2011||24 Dec 2013||Verance Corporation||Watermark extraction based on tentative watermarks|
|US8615506 *||27 Jan 2012||24 Dec 2013||Vobile, Inc.||Methods and systems for monitoring and tracking videos on the internet|
|US8620967||11 Jun 2009||31 Dec 2013||Rovi Technologies Corporation||Managing metadata for occurrences of a recording|
|US8677400||30 Sep 2009||18 Mar 2014||United Video Properties, Inc.||Systems and methods for identifying audio content using an interactive media guidance application|
|US8681978||17 Dec 2012||25 Mar 2014||Verance Corporation||Efficient and secure forensic marking in compressed domain|
|US8682026||3 Nov 2011||25 Mar 2014||Verance Corporation||Efficient extraction of embedded watermarks in the presence of host content distortions|
|US8689337||27 Feb 2007||1 Apr 2014||Vobile, Inc.||Systems and methods of fingerprinting and identifying video objects|
|US8700641 *||1 Aug 2011||15 Apr 2014||Google Inc.||Detecting repeating content in broadcast media|
|US8726304||13 Sep 2012||13 May 2014||Verance Corporation||Time varying evaluation of multimedia content|
|US8738354||19 Jun 2009||27 May 2014||Microsoft Corporation||Trans-lingual representation of text documents|
|US8745403||23 Nov 2011||3 Jun 2014||Verance Corporation||Enhanced content management based on watermark extraction records|
|US8745404||20 Nov 2012||3 Jun 2014||Verance Corporation||Pre-processed information embedding system|
|US8781967||7 Jul 2006||15 Jul 2014||Verance Corporation||Watermarking in an encrypted domain|
|US8791789||24 May 2013||29 Jul 2014||Verance Corporation||Remote control signaling using audio watermarks|
|US8806517||10 May 2010||12 Aug 2014||Verance Corporation||Media monitoring, management and information system|
|US8811655||4 Sep 2012||19 Aug 2014||Verance Corporation||Circumvention of watermark analysis in a host content|
|US8838977||5 Apr 2011||16 Sep 2014||Verance Corporation||Watermark extraction and content screening in a networked environment|
|US8838978||5 Apr 2011||16 Sep 2014||Verance Corporation||Content access management using extracted watermark information|
|US8869222||13 Sep 2012||21 Oct 2014||Verance Corporation||Second screen content|
|US8886531||13 Jan 2010||11 Nov 2014||Rovi Technologies Corporation||Apparatus and method for generating an audio fingerprint and using a two-stage query|
|US8918382||8 May 2013||23 Dec 2014||Google Inc.||Learning common spelling errors through content matching|
|US8918428||13 Mar 2012||23 Dec 2014||United Video Properties, Inc.||Systems and methods for audio asset storage and management|
|US8923548||3 Nov 2011||30 Dec 2014||Verance Corporation||Extraction of embedded watermarks from a host content using a plurality of tentative watermarks|
|US8977067||1 Apr 2013||10 Mar 2015||Google Inc.||Audio identification using wavelet-based signatures|
|US9009482||26 Sep 2013||14 Apr 2015||Verance Corporation||Forensic marking using a common customization function|
|US9055239||19 Jul 2007||9 Jun 2015||Verance Corporation||Signal continuity assessment using embedded watermarks|
|US9069771 *||8 Dec 2009||30 Jun 2015||Xerox Corporation||Music recognition method and system based on socialized music server|
|US9106964||8 Feb 2013||11 Aug 2015||Verance Corporation||Enhanced content distribution using advertisements|
|US9117270||2 Jun 2014||25 Aug 2015||Verance Corporation||Pre-processed information embedding system|
|US20020023123 *||26 Jul 1999||21 Feb 2002||Justin P. Madison||Geographic data locator|
|US20020028000 *||21 Jun 2001||7 Mar 2002||Conwell William Y.||Content identifiers triggering corresponding responses through collaborative processing|
|US20020146148 *||20 Sep 2001||10 Oct 2002||Levy Kenneth L.||Digitally watermarking physical media|
|US20020157099 *||12 Jul 2001||24 Oct 2002||Schrader Joseph A.||Enhanced television service|
|US20020157101 *||12 Jul 2001||24 Oct 2002||Schrader Joseph A.||System for creating and delivering enhanced television services|
|US20020178410 *||11 Feb 2002||28 Nov 2002||Haitsma Jaap Andre||Generating and matching hashes of multimedia content|
|US20030236661 *||25 Jun 2002||25 Dec 2003||Chris Burges||System and method for noise-robust feature extraction|
|US20040073916 *||8 Oct 2003||15 Apr 2004||Verance Corporation||Media monitoring, management and information system|
|US20040085342 *||17 Oct 2003||6 May 2004||Williams Michael John||Audio and/or video generation apparatus|
|US20040249859 *||15 Mar 2004||9 Dec 2004||Relatable, Llc||System and method for fingerprint based media recognition|
|US20050010671 *||18 Jun 2003||13 Jan 2005||Sbc Knowledge Ventures, L.P.||Apparatus and method for aggregating disparate storage on consumer electronics devices|
|US20050015551 *||18 Jul 2003||20 Jan 2005||Microsoft Corporation||Methods, computer readable mediums and systems for requesting, retrieving and delivering metadata pages|
|US20050027689 *||29 Jul 2003||3 Feb 2005||Aec One Stop Group, Inc.||Digital audio track set recognition system|
|US20050091268 *||3 Dec 2004||28 Apr 2005||Meyer Joel R.||Systems and methods of managing audio and other media|
|US20050187968 *||28 Apr 2005||25 Aug 2005||Dunning Ted E.||File splitting, scalable coding, and asynchronous transmission in streamed data transfer|
|US20050197906 *||10 Sep 2004||8 Sep 2005||Kindig Bradley D.||Music purchasing and playing system and method|
|US20050198061 *||17 Feb 2005||8 Sep 2005||David Robinson||Process and product for selectively processing data accesses|
|US20050229204 *||22 Apr 2003||13 Oct 2005||Koninklijke Philips Electronics N.V.||Signal processing method and arragement|
|US20050249075 *||11 Jul 2005||10 Nov 2005||Laronne Shai A||Commercial/non-commercial medium test|
|US20060020879 *||8 Sep 2005||26 Jan 2006||Microsoft Corporation||Transferring metadata to a client|
|US20060041753 *||11 Aug 2003||23 Feb 2006||Koninklijke Philips Electronics N.V.||Fingerprint extraction|
|US20060075237 *||31 Oct 2003||6 Apr 2006||Koninklijke Philips Electronics N.V.||Fingerprinting multimedia contents|
|US20060090020 *||8 Oct 2004||27 Apr 2006||Time Trax Technologies Corporation||Connector for satellite radio-computer interface|
|US20060106867 *||2 Nov 2004||18 May 2006||Microsoft Corporation||System and method for speeding up database lookups for multiple synchronized data streams|
|US20060136502 *||21 Dec 2005||22 Jun 2006||Musicgiants, Inc.||Unified media collection system|
|US20060149533 *||30 Dec 2004||6 Jul 2006||Aec One Stop Group, Inc.||Methods and Apparatus for Identifying Media Objects|
|US20060149552 *||30 Dec 2004||6 Jul 2006||Aec One Stop Group, Inc.||Methods and Apparatus for Audio Recognition|
|US20060156374 *||14 Feb 2003||13 Jul 2006||Hu Carl C||Automatic synchronization of audio and video based media services of media content|
|US20060177096 *||20 Apr 2006||10 Aug 2006||Sony Pictures Entertainment, Inc.||Fingerprinting of Data|
|US20060229878 *||27 May 2004||12 Oct 2006||Eric Scheirer||Waveform recognition method and apparatus|
|US20060242193 *||22 Jun 2006||26 Oct 2006||Dunning Ted E||Information retrieval engine|
|US20060242198 *||22 Apr 2005||26 Oct 2006||Microsoft Corporation||Methods, computer-readable media, and data structures for building an authoritative database of digital audio identifier elements and identifying media items|
|US20060253207 *||22 Apr 2005||9 Nov 2006||Microsoft Corporation||Methods, computer-readable media, and data structures for building an authoritative database of digital audio identifier elements and identifying media items|
|US20070050409 *||16 Dec 2005||1 Mar 2007||Harris Corporation||System, methods, and program product to trace content genealogy|
|US20070073649 *||18 May 2004||29 Mar 2007||Hiroyuki Kikkoji||Information recording device, information recording method, and information recording program|
|US20070078773 *||31 Aug 2006||5 Apr 2007||Arik Czerniak||Posting digital media|
|US20080201201 *||25 Sep 2007||21 Aug 2008||Sms.Ac||Methods and systems for finding, tagging, rating and suggesting content provided by networked application pods|
|US20090106297 *||13 Mar 2008||23 Apr 2009||David Howell Wright||Methods and apparatus to create a media measurement reference database from a plurality of distributed sources|
|US20090324006 *||||31 Dec 2009||Jian Lu||Methods and systems for monitoring and tracking videos on the internet|
|US20110102684 *||||5 May 2011||Nobukazu Sugiyama||Automatic capture of data for acquisition of metadata|
|US20120059845 *||1 Aug 2011||8 Mar 2012||Google Inc.||Detecting Repeating Content In Broadcast Media|
|US20120179666 *||||12 Jul 2012||Vobile, Inc.||Methods and systems for monitoring and tracking videos on the internet|
|CN1742492B||14 Feb 2003||20 Jul 2011||Thomson Licensing||Automatic synchronization of audio and video based media services of media content|
|EP1595200A2 *||20 Feb 2004||16 Nov 2005||Sony Electronics Inc.||Medium content identification|
|EP1872199A2 *||16 Mar 2006||2 Jan 2008||Microsoft Corporation|
|EP2074527A2 *||25 Oct 2007||1 Jul 2009||Verizon Business Global LLC||Method and system for providing image processing to track digital information|
|EP2811416A1 *||6 Jun 2013||10 Dec 2014||Vestel Elektronik Sanayi ve Ticaret A.S.||An identification method|
|WO2004017180A2 *||18 Aug 2003||26 Feb 2004||Digital Innovations Llc||System and method for creating an index of audio tracks|
|WO2006112843A1 *||19 Apr 2005||26 Oct 2006||Sean Ward||Distributed acoustic fingerprint based recognition|
|WO2006115617A2||16 Mar 2006||2 Nov 2006||Microsoft Corp|
|WO2010071455A1 *||15 Dec 2009||24 Jun 2010||Muller Montgomerie Media Limited||File transfer method and apparatus|
|WO2011019473A1 *||15 Jul 2010||17 Feb 2011||Rovi Technologies Corporation||Content recognition and synchronization on a television or consumer electronics device|
|U.S. Classification||713/193, 707/E17.028, 707/E17.009|
|International Classification||G10L15/10, G10L11/00, G10L15/00, G06F17/30|
|Cooperative Classification||G06F17/30758, G06F17/30787, G06F17/30817, G06F17/30825, G06F17/30796, G06F17/30743|
|European Classification||G06F17/30V2, G06F17/30V3E, G06F17/30V1T, G06F17/30V1A, G06F17/30U3E, G06F17/30U1|
|13 May 2004||AS||Assignment|
Owner name: GRACENOTE, INC., CALIFORNIA
Free format text: CHANGE OF NAME;ASSIGNOR:CDDB, INC.;REEL/FRAME:015341/0243
Effective date: 20020625
|13 Jun 2008||AS||Assignment|
Owner name: GRACENOTE, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ROBERTS, DALE T.;HYMAN, DAVID C.;WHITE, STEPHEN HELLING;REEL/FRAME:021097/0860;SIGNING DATES FROM 20080505 TO 20080605
|19 Mar 2014||AS||Assignment|
Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, IL
Free format text: SECURITY INTEREST;ASSIGNOR:GRACENOTE, INC.;REEL/FRAME:032480/0272
Effective date: 20140314