WO2005091622A1 - Device for capturing audio/video data and metadata - Google Patents

Info

Publication number
WO2005091622A1
WO2005091622A1 (PCT/EP2005/050863)
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
event
video
video data
type
Prior art date
Application number
PCT/EP2005/050863
Other languages
French (fr)
Inventor
Pierrick Jouet
Lionel Oisel
Philippe Schmouker
Philippe Robert
Robert Forthofer
Original Assignee
Thomson Licensing Sa
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing Sa filed Critical Thomson Licensing Sa
Publication of WO2005091622A1 publication Critical patent/WO2005091622A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/268Signal distribution or switching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof

Abstract

The invention relates to a device for capturing audio/video data representative of an event, said device comprising means of adding metadata to the captured data. According to the invention, the means of adding metadata associate a predefined metadata item with a type of event. Application to sport-type events.

Description

Device for capturing audio/video data and metadata
The invention relates to a device and a method of capturing video data and associated metadata.
The invention relates more particularly to the addition of metadata to audio/video data running in real time.
Content indexing has become a necessity in devices supporting large storage capacities. The emergence of digital products and the integration in such products of high-capacity storage means, such as hard disks and optical disks, has led to data indexing requirements which enable high-speed access to the stored data.
In the known systems, the video clips were indexed manually by keywords, but the accumulation of digital data has made it necessary to develop robust tools for automatically analysing videos by their content, in other words using attributes extracted automatically.
In other systems, the images are indexed automatically according to their content. The images are analysed with respect to certain attributes that can be low-level, such as colour or texture, or of semantic type, such as the presence of landscapes or people. Such systems are therefore not always suited to the requirements of the users.
Patent Application US 2002/0001395 published on 3 January 2002 and registered in the name of Digimarc Corporation proposes a device for authenticating metadata and including authentication information in the data. However, such a device is dedicated to indexing devices and does not apply to real time devices. The added metadata is not specifically representative of actions representative of the video clip.
The invention relates more specifically to the insertion of metadata concerning an event which takes place live in the associated video data and also relates to the triggering of actions in a post-production studio according to the metadata associated with the video data.
The invention proposes a device for capturing audio/video data representative of an event, said device comprising means of adding metadata to the captured data, characterized in that the means of adding metadata associate a predefined metadata item with a type of event.
The invention can therefore make it possible to mark the captured audio/video data according to its content. This means that, if necessary, in a subsequent processing step, the captured data can be manipulated more easily to edit, modify or save it for example.
The invention will be better understood and illustrated by means of exemplary and advantageous embodiments, by no means limiting, given with reference to the appended figures in which:
- Figure 1 represents a system comprising a device according to an embodiment of the invention, - Figure 2 represents an embodiment of an application of the invention.
According to Figure 1, the device 1 is, in the preferred embodiment, a video camera comprising a user interface 3 with three buttons 6, 7 and 8.
The buttons (or keys) 6, 7 and 8 enable the user to add information to the video that is being filmed in real time by the use of audio/video capture means 2 which are the conventional capture means of a standard video camera.
The standard configuration of a video camera is well known to those skilled in the art. The camera comprises an optical system, an image sensor and control means such as a microprocessor, storage means and various means of communication with the external environment. The microprocessor runs a known operating system such as Windows CE, marketed by Microsoft.
The memory can include ROM (read-only memory) or RAM (random-access memory) type memories, or memory cards in PCMCIA format, for example.
The camera is distinguished from the conventional cameras known to those skilled in the art by the fact that it also comprises a user interface for adding information directly to the content of the filmed video.
The user interface is represented in Figure 1 by three buttons 6, 7 and 8. The buttons 6, 7 and 8 are each associated with different events.
When an event that is noteworthy with respect to the event being filmed occurs, the cameraman presses one of the buttons associated with the event.
Each event is then transcribed into an MXF (material exchange format) type stream to be transferred with the video data to which the event relates, in the MXF stream.
The MXF standard is described in the SMPTE standard document 380M.
The representative events are events that can be contained in the "shot" type field:
- the "shot start position" field is associated with the time at which the event occurs,
- the "shot duration" field is not used,
- the "shot track Ids" field is incremented with each noteworthy event,
- the "shot description" field comprises the event type, in other words "goal", "red card", etc.
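The shot fields above can be sketched as a simple record (a hypothetical illustration only; real MXF metadata is KLV-encoded per SMPTE 380M, and the class and field names here are illustrative):

```python
from dataclasses import dataclass, field
from itertools import count

# Counter for the "shot track Ids" field, incremented with each
# noteworthy event, as described in the text.
_track_id = count(1)

@dataclass
class ShotEvent:
    start_position: int          # "shot start position": time at which the event occurs
    description: str             # "shot description": event type, e.g. "goal", "red card"
    duration: int = 0            # "shot duration" field is not used
    track_id: int = field(default_factory=lambda: next(_track_id))

goal = ShotEvent(start_position=754, description="goal")
card = ShotEvent(start_position=1432, description="red card")
```

Each button press would create one such record, later wrapped into the MXF stream alongside the video data it relates to.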
According to the type of event being filmed, it is also possible to modify the functions associated with these buttons. To this end, the user interface 3 is provided with an additional button, a control knob for example, not shown, for indicating the type of event being captured.
Each camera can thus be not dedicated to a type of event but configurable according to the event.
In the case of an event corresponding to a football match, the three buttons 6, 7 and 8 correspond, for example, to the following functions:
Button 6: goal.
Button 7: card (red or yellow).
Button 8: fight.
In the case of an event corresponding to a tennis match, the three buttons correspond, for example, to the following functions:
Button 6: set.
Button 7: point.
Button 8: injury.
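The per-event reconfiguration of the buttons via the control knob can be sketched as a lookup table (a hypothetical illustration; the profile names and labels are taken from the examples above, not from any defined API):

```python
# Hypothetical mapping from the event type selected on the control knob
# to the functions of buttons 6, 7 and 8.
BUTTON_PROFILES = {
    "football": {6: "goal", 7: "card", 8: "fight"},
    "tennis":   {6: "set",  7: "point", 8: "injury"},
}

def button_function(event_type: str, button: int) -> str:
    """Return the metadata label produced by pressing a button
    while the camera is configured for the given event type."""
    return BUTTON_PROFILES[event_type][button]
```

The same camera thus serves any event type: only the selected profile changes, not the hardware.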
Each button 6, 7 and 8 can also correspond not to a single function but to several functions. It is possible, for example, to press the buttons for a longer or shorter time, or even to press them a number of times to obtain another function.
The number of buttons is of course given by way of indication and does not constitute a limitation of the invention. It is, of course, possible to have wider capabilities in user interface terms.
In other embodiments, the user interface 3 can also be provided with sound pick-up means, independently of the conventional sound pick-up means of a video camera included in the means 2. Thus, the cameraman can add representative words for the event, such as "goal", "red card", "yellow card", "substitution", in the context of a football match.
It is also possible to associate a set of authorized words with each event, the cameraman then using only those words.
The remote editing device 5 then searches for the words associated with the type of event in the MXF stream received and sets up the video editing in the same way as it does when the information is entered via the buttons.
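The authorized-word matching performed by the remote editing device can be sketched as follows (a minimal illustration; the word sets and function names are assumptions, not part of the patent):

```python
# Only words from the authorized set for the current event type are
# treated as metadata; anything else the cameraman says is ignored.
AUTHORIZED_WORDS = {
    "football": {"goal", "red card", "yellow card", "substitution"},
}

def extract_event_words(event_type, spoken_words):
    """Keep only the spoken words authorized for this event type,
    in the order they were spoken."""
    allowed = AUTHORIZED_WORDS.get(event_type, set())
    return [w for w in spoken_words if w in allowed]
```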
It is also possible, in other embodiments, to combine a user interface having both buttons and sound pick-up means.
The number of buttons and the layout of the buttons must also be designed according to the ergonomics required for the camera. Too many buttons can lead to operating difficulties on the part of the user. The ergonomics of this interface are not the concern of this patent application.
The information received from the means 3 and the means 2 is then transmitted to means 4 of creating MXF streams.
The functions associated with the buttons or the sound pick-up means are then converted into MXF data according to the event indicated by the control knob.
The conversion of the functions associated with the buttons into MXF data is achieved through means available in the camera, such as programmable-type circuits hardwired to the buttons 6, 7 and 8 and/or to the sound pick-up means, or even through processors.
The MXF data is normally used to transmit information linked to the shot parameters.
The added camera parameters are linked to the lens (aperture, depth of field) to which can be added metadata values such as the time code.
The stream creation means 4 create the MXF stream.
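The encapsulation performed by the stream creation means 4 can be sketched as interleaving video payload and event metadata into one time-ordered sequence (a stand-in for a real MXF multiplexer; the data shapes are illustrative):

```python
def build_stream(video_frames, events):
    """Merge (timestamp, frame) and (timestamp, event) items into a
    single stream, ordered by timestamp, mimicking how metadata is
    carried in the same stream as the video it relates to."""
    tagged = [(t, "video", f) for t, f in video_frames]
    tagged += [(t, "meta", e) for t, e in events]
    return sorted(tagged, key=lambda item: item[0])

stream = build_stream([(0, "f0"), (1, "f1")], [(0.5, "goal")])
```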
Figure 2 represents an embodiment of the device according to the invention, the event corresponding to a football-match-type event.
Two actions are illustrated, one representing a storage action and the other an alarm action.
Cameras 10, 11 and 12 film a football match 9. The cameras 10, 11 and 12 are arranged around a football pitch and designed to cover the entire pitch area.
The cameras are connected to parsers 13, 14, 15, 17, 18 and 19. The parsers 13, 14, 15, 17, 18 and 19 receive the video streams from the cameras and analyse them. The video streams are then transmitted to the wall of video screens 16 which displays the videos from the different cameras on a plurality of screens.
The metadata is representative of an important action, in other words a highlight of the event.
The reception device will act according to the metadata.
Several types of action are provided for according to the variety of metadata. The reception devices are also designed to arrange the metadata.
The metadata is classified according to its order of importance for the current event. For example, a metadata item carrying goal information is metadata of the highest importance for a football match. The action of the reception device will therefore in this case be to transmit the information coming from the camera displaying this goal and not from the camera showing any player on the pitch or a fight on the terraces. To this end, the reception device is provided with selection means enabling it to select the video information. Another action that can be considered in the case of a goal is to retransmit a slow-motion replay of the last few seconds of video representative of the action.
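The selection means can be sketched as ranking incoming metadata by a per-event importance table and airing the camera whose item ranks highest (the importance values and function names are illustrative assumptions):

```python
# A goal outranks a card, which outranks a fight on the terraces,
# per the example in the text.
IMPORTANCE = {"goal": 3, "red card": 2, "fight": 1}

def select_camera(items):
    """items: list of (camera_id, event_label) pairs currently received.
    Return the id of the camera showing the most important action."""
    camera, _ = max(items, key=lambda it: IMPORTANCE.get(it[1], 0))
    return camera
```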
There now follows a description of the first application: the conditional alert.
The reception device constantly analyses the metadata that it receives from the various capture devices.
The reception device is programmed to generate an alarm on reception of a particular metadata item: in this case, for a football match, a goal.
When a goal is detected, the reception device transmits the video images received from the camera having captured the goal. Naturally, when talking of a goal, this can comprise the action preceding the goal, in other words, a situation that may perhaps lead to a goal. In this case, the cameraman presses the corresponding button and the alarm is triggered on the reception device.
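The conditional alert can be sketched as a scan of the incoming metadata stream that raises an alarm whenever the watched item appears (a minimal sketch; the trigger label and tuple shapes are assumptions):

```python
def watch_stream(metadata_stream, trigger="goal"):
    """Yield an alarm for every metadata item matching the trigger,
    identifying the camera that produced it."""
    for camera_id, label in metadata_stream:
        if label == trigger:
            yield ("ALARM", camera_id)

alarms = list(watch_stream([(10, "fight"), (11, "goal")]))
```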
There now follows a description of the second application: conditional storage.
It is possible to store in memory only video sequences associated with metadata having a predefined value, this predefined value, furthermore, possibly also being dependent on the event displayed. The mechanism implemented is similar to that implemented in the conditional alarm application. The data is stored only if it is associated with a particular metadata item, or with several particular metadata items. This application can then be used for teaching purposes.
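Conditional storage can be sketched the same way as the conditional alarm: only sequences whose metadata carries one of the predefined values are kept (the set of kept values is illustrative):

```python
# Predefined metadata values worth storing; in practice this set
# could itself depend on the event displayed, as the text notes.
KEEP = {"goal", "red card"}

def filter_sequences(sequences):
    """sequences: list of (metadata_label, clip) pairs.
    Store only the clips associated with a kept metadata value."""
    return [clip for label, clip in sequences if label in KEEP]
```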
Other actions can, of course, be considered and fall within the context of this invention. Worthy of note, but in a non-exhaustive manner, is the example of video editing which consists in remodelling the video information, retaining, for example, only the highlights that are therefore associated with metadata. This could, for example, lead to the automatic or non-automatic creation of audio and/or video summaries.

Claims

Claims
1. Device for capturing audio/video data representative of an event, said device comprising means of adding metadata to the captured data, characterized in that the means of adding metadata associate a predefined metadata item with a type of event.
2. Device according to Claim 1, characterized in that the means of adding metadata can be configured according to the type of event.
3. Device according to Claim 1 or 2, characterized in that the means of adding metadata comprise a number of keys corresponding to the number of metadata items representative of the event, a key being associated with a metadata item.
4. Device according to one of the preceding claims, characterized in that it comprises means of picking up sounds for associating audio-type metadata with the captured video.
5. Device according to one of the preceding claims, characterized in that it comprises means of encapsulating the captured audio/video data and the associated metadata in one and the same stream.
6. Method of capturing audio/video data representative of an event, said method comprising a step for adding metadata to the captured data, characterized in that, in the step for adding metadata, a predefined metadata item is associated with a type of event.
PCT/EP2005/050863 2004-03-18 2005-03-01 Device for capturing audio/video data and metadata WO2005091622A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0402830 2004-03-18
FR04/02830 2004-03-18

Publications (1)

Publication Number Publication Date
WO2005091622A1 true WO2005091622A1 (en) 2005-09-29

Family

ID=34944314

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/050863 WO2005091622A1 (en) 2004-03-18 2005-03-01 Device for capturing audio/video data and metadata

Country Status (1)

Country Link
WO (1) WO2005091622A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITTO20110042A1 * 2011-01-20 2012-07-21 Sisvel Technology Srl Processes and devices for recording and reproducing multimedia contents using dynamic metadata

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001052178A1 (en) * 2000-01-13 2001-07-19 Digimarc Corporation Authenticating metadata and embedding metadata in watermarks of media signals
US20020170068A1 (en) * 2001-03-19 2002-11-14 Rafey Richter A. Virtual and condensed television programs
US20030002715A1 (en) * 1999-12-14 2003-01-02 Kowald Julie Rae Visual language classification system
US20030187730A1 (en) * 2002-03-27 2003-10-02 Jai Natarajan System and method of measuring exposure of assets on the client side

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030002715A1 (en) * 1999-12-14 2003-01-02 Kowald Julie Rae Visual language classification system
WO2001052178A1 (en) * 2000-01-13 2001-07-19 Digimarc Corporation Authenticating metadata and embedding metadata in watermarks of media signals
US20020170068A1 (en) * 2001-03-19 2002-11-14 Rafey Richter A. Virtual and condensed television programs
US20030187730A1 (en) * 2002-03-27 2003-10-02 Jai Natarajan System and method of measuring exposure of assets on the client side

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRUCE DEVLIN, SNELL & WILLCOX, UK: "The Material eXchange Format", ABE FACTS 2002, 31 August 2002 (2002-08-31), pages 1 - 5, XP002297732, Retrieved from the Internet <URL:www.broadcastpapers.com/sigdis/Snell&WilcoxMXF01.htm> [retrieved on 20040923] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITTO20110042A1 (en) * 2011-01-20 2012-07-21 Sisvel Technology Srl Processes and devices for recording and reproducing multimedia contents using dynamic metadata
WO2012098509A1 (en) * 2011-01-20 2012-07-26 Sisvel Technology S.R.L. Processes and devices for recording and reproducing multimedia contents using dynamic metadata
US9800854B2 (en) 2011-01-20 2017-10-24 Sisvel Technology S.R.L. Processes and devices for recording and reproducing multimedia contents using dynamic metadata

Similar Documents

Publication Publication Date Title
EP2710594B1 (en) Video summary including a feature of interest
US9013604B2 (en) Video summary including a particular person
US8212911B2 (en) Imaging apparatus, imaging system, and imaging method displaying recommendation information
US7483624B2 (en) System and method for indexing a video sequence
US8289410B2 (en) Recording apparatus and method, playback apparatus and method, and program
US20060039674A1 (en) Image editing apparatus, method, and program
WO2006098418A1 (en) Image capturing apparatus, image capturing method, album creating apparatus, album creating method, album creating system and program
TW200904181A (en) Information processing apparatus, imaging apparatus, image display control method and computer program
JP2009118510A (en) System and method for managing video file efficiently
EP1347455A2 (en) Contents recording/playback apparatus and contents edit method
US20060050166A1 (en) Digital still camera
KR100967551B1 (en) Method and device for linking multimedia data
US20200051594A1 (en) Generating method and playing method of multimedia file, multimedia file generation apparatus and multimedia file playback apparatus
JPWO2009150827A1 (en) Content editing device
KR101752759B1 (en) Method and apparatus for editing event image in game
US20050001903A1 (en) Methods and apparatuses for displaying and rating content
JP4723901B2 (en) Television display device
WO2005091622A1 (en) Device for capturing audio/video data and metadata
US8850323B2 (en) Electronic device, content reproduction method, and program therefor
KR20150108562A (en) Image processing apparatus, control method thereof and computer readable medium having computer program recorded therefor
KR101063768B1 (en) Simultaneous storage of camera name when saving video data
JP4725892B2 (en) Facial image extraction and storage system and facial image extraction and storage method
CN112822554A (en) Multimedia processing method and device and electronic equipment
JP2017121001A (en) Image processing device, image processing method and program
JP2006060741A (en) Television terminal and server

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase