WO2002062062A1

WO2002062062A1 - Method and arrangement for creation of a still shot video sequence, via an apparatus, and transmission of the sequence to a mobile communication device for utilization

Info

Publication number: WO2002062062A1
Application number: PCT/CH2002/000047
Authority: WO
Inventors: Fabrice Moscheni; Stefan Fischer
Original assignee: Fastcom Technology Sa
Priority date: 2001-01-30
Filing date: 2002-01-28
Publication date: 2002-08-08
Also published as: WO2002062062A8

Abstract

An arrangement and associated method create, for display, a video sequence (2) (e.g., from a television show) that has a reduced data quantity (e.g., a still shot video sequence) and transfer the sequence to a remote, wireless location. Within the arrangement, continuous video (1) and audio data (17) are provided. A sequence group of still video images (3) is selected from the continuous video. Each still video image is associated with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio each still video image is presented during the respective associated portion of the audio (27a, 27b). The selection and association are accomplished by an apparatus. Data of the selected sequence of still video images and the continuous audio is transmitted to the remote, wireless location.

Description

Method and Arrangement for Creation of a Still Shot Video Sequence, via an Apparatus, and Transmission of the Sequence to a Mobile Communication Device for Utilization

Field of the Invention

The present invention relates to the provision of video and audio information to a remote, wireless location. The present invention also relates to the mobile communication terminals (e.g., wireless, mobile computers and telephones) that are increasingly in commonplace use.

Background of the Invention

Various methods for processing and manipulating video imagery have been developed.

In one technology, methods have been developed to detect cuts or splices in the normal recorded sequence. This technology involves the use of pixel-based or region-based metrics. Examples of such technology are found in US Patent 05835163 entitled "Apparatus for detecting a cut in a video", US Patent 05767923 entitled "Method and system for detecting cuts in a video signal", US Patent 05732146 entitled "Scene change detecting method for video and movie", US Patent 05719643 entitled "Scene cut frame detector and scene cut frame group detector", and European Patent 00659016B1 entitled "Method and apparatus for video cut detection".
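As an illustration only, a pixel-based metric of the kind mentioned above can be sketched as follows. The sketch is not taken from any of the cited patents: the frame representation (a flat list of grayscale values), the function names and the threshold value are all assumptions chosen for the example.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flat sequence of grayscale values (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cuts(frames: List[Frame], threshold: float = 40.0) -> List[int]:
    """Return each index i at which frame i differs sharply from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


# Three dark frames followed by three bright frames: a single cut at index 3.
frames = [[10, 12, 11, 9]] * 3 + [[200, 198, 205, 201]] * 3
cuts = detect_cuts(frames)
```

A fixed threshold of this kind reacts to abrupt cuts only; gradual transitions such as dissolves would need a different measure.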

In another technology, methods have been developed for detecting streaming transitions between scenes. Examples of such technology are found in US Patent 05959697 entitled "Method and system for detecting dissolve transitions in a video signal", and US Patent 05990980 entitled "Detection of transitions in video sequences".

Methods have been developed for detecting scene transitions in MPEG data streams based on changes in the macro blocks of MPEG-coded images. See for example US Patent 05900919 entitled "Efficient shot change detection on compressed video data".

Some known methods for detecting transitional markers are based on determining homogeneous fields in video signals. See for example US Patent 05778108 entitled "Method and system for detecting transitional markers such as uniform fields in a video signal", or US Patent 06061471 entitled "Method and system for detecting uniform images in video signal".

Complete systems for generating visual indexes are also known to be suitable for creating video summaries. See for example US Patent 06125229 entitled "Visual indexing system", or US Patent 05995095 entitled "Method for hierarchical summarization and browsing of digital video".

Mobile communication terminals are increasingly being utilized. Mobile devices such as so-called portable digital assistants (PDAs) or mobile, portable computers increasingly have the ability to receive and transmit data wirelessly. As such, these portable devices have evolved into mobile communication terminals. Further, these mobile devices display visual data, such as text and images.

One well known type of mobile communication terminal is the mobile telephone. Many mobile telephones now also have displays that make it possible to display visual information.

The quantity and quality of the visual information displayed by such mobile communication terminals are limited by the transfer bandwidth to the terminals. Compared to terrestrial television transmission or other transmission via satellite networks or cable networks, transmissions associated with such mobile communication terminals have a much smaller transfer bandwidth. As such, the mobile communication terminals mentioned above cannot support reception and display of visual data, such as television shows or the like, at the same level as the mentioned television transmissions and networks. It would be beneficial to permit users of mobile communication terminals to watch television shows or the like in some manner, taking into account the low transfer bandwidth of the mobile communication terminals.

Summary of the Invention

In accordance with one aspect, the present invention provides a method for providing video imagery and associated audio data at a remote, wireless location. Continuous video and audio data are provided. The method is characterized in that a sequence group of still video images is selected from the continuous video. Each still video image is associated with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio, each still video image is presented during the respective associated portion of the audio. Data of the selected sequence of still video images and the continuous audio is transmitted to the remote, wireless location.

In accordance with another aspect, the present invention provides an arrangement for providing video imagery and associated audio data at a remote, wireless location. A provision device provides continuous video and audio data. The arrangement is characterized in that an image selection device selects a sequence group of still video images from the continuous video. An audio association device associates each still video image with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio, each still video image is presented during the respective associated portion of the audio. A transmission device transmits data of the selected sequence of still video images and the continuous audio to the remote, wireless location.

In accordance with yet another aspect, the present invention provides an apparatus for use in an arrangement for providing video imagery and associated audio data at a remote, wireless location. The arrangement includes a provision device for providing continuous video and audio data. The arrangement includes a transmission device for transmitting data of a selected sequence of still video images and a continuous audio to the remote, wireless location. The apparatus is characterized in that it includes an image selection device for selecting the sequence group of still video images from the continuous video, and an audio association device for associating each still video image with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio each still video image is presented during the respective associated portion of the audio.

Brief Description of the Drawings

The foregoing and other features and advantages of the present invention will become apparent to those skilled in the art to which the present invention relates upon reading the following description with reference to the accompanying drawings, in which:

Fig. 1 is a pictorial view of an example of an arrangement that includes a device in accordance with the present invention wherein still shot video signals are selected and transmitted; and

Fig. 2 is a schematic representation of images to show an example of key image selection and presentation.

An example of an arrangement 100 for providing video imagery and associated audio data at a remote, wireless location is shown in Fig. 1.

In the example arrangement 100, remote, wireless locations are at a mobile computer 110 (e.g., a personal digital assistant) and at a mobile telephone 112. The mobile computer 110 and the mobile telephone 112 are examples of mobile communication terminals. However, it is to be appreciated that the remote, wireless location may be another, different mobile communication terminal.

A video camera device 114 of the arrangement 100 directly records perceived images and sounds and provides video and audio data signals to a processing device 116 (e.g., a computer-based video processor). As such, the video camera device 114 is an example provision device for providing continuous video and audio data. However, it is also possible for the provided data not to be perceived/recorded in real-time, but rather to originate from some other provision device, such as a video server, video player, television show archive, data carrier, etc.

A further example of an alternative provision device is a receiving unit that receives analog or digital signals from a terrestrial transmitter, cable network or satellite external to the arrangement 100. Such signals could convey a television show. The received signals are provided as output accordingly. The data from the provision device may be analog or digital, and may be uncompressed or compressed.

Within the processing device 116, the data (e.g., the camera or television provided signals) is compressed in accordance with the present invention. A stream of the compressed data is forwarded from the processing device 116 to a transmitting device 118 that interacts with a wireless communication network 120, like a GSM mobile radiotelephone network. The network 120 can be of any configuration, for example, DECT, GSM, GPRS or UMTS standard. Via the network 120 the compressed data is conveyed toward at least one remote, wireless location (e.g., the mobile computer 110 or the mobile telephone 112) via wireless signal.

The mobile computer 110 receives the transmitted data via an associated wireless modem 122. At the mobile computer 110, the data may be utilized currently and/or at a later time. Utilization of the data currently may include the user perceiving video and audio as the data is received (e.g., in real time). Utilization of the data at a later time may include storage of the data within a memory of the mobile computer 110.

The mobile telephone 112 receives the transmitted data via an integrated receiver arrangement. At the mobile telephone 112, the data is typically utilized currently. However, the data may be stored for later utilization, dependent upon memory capability of the mobile telephone 112.

In accordance with the present invention, not all of the provided images, as provided by the provision device (e.g., camera device 114), are transmitted to the remote location (e.g., telephone 112), but rather only a limited selection of images. Specifically, only the data of a sequence group of still video images from the continuous video is transmitted. As such, the video aspect is reduced and can be considered to be compressed. The selection of images occurs at the processing device 116. In other words, the processing device 116 creates a video summary. However, all of the provided audio data, as provided by the video camera device 114, is transmitted. As such, the audio aspect can be considered to be uncompressed, or at least uninterrupted, i.e. without cuts in the audio stream. In one example, the processing to achieve the data reduction is accomplished by a suitably programmed digital signal processor within the processing device 116.

The continuous audio is associated with the selected sequence group of images. Specifically, each image is associated with a time portion of the continuous audio. In the illustrated example, the processing device 116 includes processing means to accomplish the selection and the association. As such, in one example the selection and association is performed automatically by the means of the processing device 116. It is to be appreciated that the processing device 116 may have a different configuration to be utilized for manual selection and association.

In an embodiment, in order to achieve a better data compression rate, each still image is compressed with a still image compression process, such as but not limited to JPEG, GIF or LZW-TIFF. In a variant embodiment, the stream of selected still sequences and the associated audio data are compressed in one of the MPEG formats or another appropriate format. Furthermore, the continuous stream of audio data can be compressed with an appropriate audio compression method, such as but not limited to MP3, without losing the synchronization between each time portion of the continuous audio and the associated still image.

The presented (i.e., displayed) sequence on the mobile computer 110 or telephone 112 consists of individual, still images of the continuous video accompanied by continuous audio. An appropriate cue or trigger could be transmitted to control change of image at the mobile communication terminal (e.g., computer 110 or telephone 112). This advantageously provides for a reduced volume of data transmitted to and handled by the mobile communication terminal (e.g., computer 110 or telephone 112).
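One possible way for a terminal to act on such triggers is sketched below. It assumes, purely for illustration, that the triggers arrive as a sorted list of audio playback times at which each still image starts being shown; the description does not specify a trigger format, and the function name and the example times are hypothetical.

```python
from bisect import bisect_right
from typing import List


def image_at(change_times_s: List[float], t: float) -> int:
    """Return the index of the still image that should be on screen at time t.

    change_times_s[i] is the audio playback time (in seconds) at which image i
    starts being shown. The list is assumed to be sorted and to start at 0.0.
    """
    if t < 0:
        raise ValueError("playback time must be non-negative")
    return bisect_right(change_times_s, t) - 1


# Hypothetical cue times: image 0 from 0 s, image 1 from 4 s, image 2 from 7 s.
cues = [0.0, 4.0, 7.0]
shown = [image_at(cues, t) for t in (0.0, 3.9, 4.0, 9.5)]
```

Because the lookup depends only on the audio playback position, the displayed image stays aligned with the audio even if playback is paused or resumed.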

It is to be appreciated that the data quantity to be transmitted for the selected images and the audio data can be further reduced by a suitable data compression method. For example, the transmitting device 118 may contain a data processing unit that is able to create summaries from a digital data stream. In this connection, the data can be compressed as a digital data stream in the formats MPEG-1, MPEG-2, MPEG-4 or other appropriate formats. Such data compression is in distinction to the compression aspect provided to the video achieved via the transmission of only selected video images. Along these lines, the processing device 116 or the transmitting device 118 can be provided with a digitalization unit that makes it possible to digitalize analog video signals.

The selection of images can be based upon any useful criteria. In one example, each selected image to be transmitted represents a scene, that is, a segment of a video sequence between two scene changes. A scene is defined as a video sequence with sound, in which the primary information can be passed on to the viewer with a single image. A scene can also be defined as a segment of continuous action recorded in one place with one camera or image source. A scene change is defined as an event in which a change takes place in the image content. The identification of scenes and scene changes is particularly important when the video camera device 114 moves and, in so doing, records a lot of new information for understanding the television show. As one example of a scene, consider two people in a room holding a conversation; one image is sufficient to provide a good representation of this scene.

Turning to identification of a scene change, a change in image content occurs either at a cut, that is, an abrupt switch to another image source, or through trick processes in which the transition between image sources takes place continuously and a superimposition of the images of the two image sources is visible.

In a first step, the processing device 116 detects the scene changes in the continuous video 1 received from the provision device (camera) 114. This detection can be performed with any of the above described methods for detecting cuts or transitions between scenes. Each time a scene change is found in the video, an inaudible synchronization marker is recorded in the audio data stream or in a separate synchronization stream. The time position of each scene change can also be stored in a file. In this way, the continuous audio is split into a plurality of successive continuous time portions. In a next step, the processing device 116 selects a reduced number, preferably one, of representative pictures in each scene. Each selected picture is associated with the corresponding audio time portion. The resulting audio and video data streams can then be further compressed using conventional compression methods, stored on a recording medium and transmitted to a remote wireless device 110, 112.
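The two steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the processing device 116: the scene-change positions are taken as given, the middle frame of each scene is used as the representative picture (the description does not say which picture is chosen), and the names StillCue and build_summary are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StillCue:
    frame_index: int  # index of the representative frame in the source video
    start_s: float    # start of the associated audio time portion, in seconds
    end_s: float      # end of the associated audio time portion, in seconds


def build_summary(num_frames: int, fps: float, scene_starts: List[int]) -> List[StillCue]:
    """Split the timeline at scene changes and pick one frame per scene.

    scene_starts holds the frame index at which each new scene begins
    (frame 0 excluded). Choosing the middle frame of each scene as its
    representative is an assumption of this sketch.
    """
    bounds = [0] + sorted(scene_starts) + [num_frames]
    cues = []
    for start, end in zip(bounds, bounds[1:]):
        if end <= start:
            continue  # skip empty scenes caused by duplicate boundaries
        representative = start + (end - start) // 2
        cues.append(StillCue(representative, start / fps, end / fps))
    return cues


# 250 frames at 25 frames/s with scene changes at frames 100 and 175.
summary = build_summary(num_frames=250, fps=25.0, scene_starts=[100, 175])
```

The resulting time portions are contiguous, so the audio itself is never cut; only the image shown during each portion changes.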

Another example event that can be used as a criterion to select an image is the mixing-in of writing into the video portion. Also, the audio itself can be used to select an image. One example event that can be used as a criterion to select an image is the utterance of a keyword in the audio data. As another example, an image may be selected when the volume suddenly changes.
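The volume-based criterion could, for instance, be approximated by comparing the level of successive audio windows. The following sketch assumes audio samples normalized to the range -1.0 to 1.0, a fixed window length and a level ratio of 2; none of these values come from the description, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms(window: Sequence[float]) -> float:
    """Root-mean-square level of one window of audio samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def volume_jumps(samples: Sequence[float], window: int, ratio: float = 2.0) -> List[int]:
    """Return sample offsets of windows whose level rises or falls by `ratio`."""
    levels = [
        rms(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]
    eps = 1e-9  # avoids division by zero on silent windows
    jumps = []
    for k in range(1, len(levels)):
        prev, cur = levels[k - 1] + eps, levels[k] + eps
        if cur / prev >= ratio or prev / cur >= ratio:
            jumps.append(k * window)
    return jumps


# Quiet audio for 8 samples, then loud audio for 8 samples.
audio = [0.1, -0.1] * 4 + [0.8, -0.8] * 4
jumps = volume_jumps(audio, window=4)
```

Each returned offset marks a candidate moment at which an image could be selected from the video.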

As yet another criterion used in selecting each still video image, the selection may be based upon choosing an image that either represents a related portion of the continuous video or is necessary for understanding the video data. Watching a video sequence (e.g., a television program) provided in accordance with the present invention can be compared to reading a comic book or a photo album. The important or key images in each case are shown as stills. The dialog and all other sounds are not presented in caption balloons as in a comic book, but rather the dialog and other sounds are conveyed continuously and in sync with the shown images.

Compared to taking in information that is read from data carriers or that was previously stored in the memory of the mobile communication terminal (e.g., the mobile computer 110), the present invention makes it possible to receive and display television and multimedia content in real time.

Fig. 2 shows the reduction in transmitted data (e.g., compression) that can be achieved by application of the present invention. In the illustrated example of Fig. 1, the processing device 116 performs these functions. An image sequence 1 is the original sequence from which a video summary is intended to be created and transmitted. The original sequence 1 consists of the individual images 10-16. Audio data 17 that accompanies the image sequence 1 is continuous.

Within the image sequence 1, each individual image (e.g., 10) can contain different data. The sequence 1 consists of two scenes. The first scene includes images 10-12, and the second scene includes images 14-16. Image 13 shows an example of a scene transition in which portions of both sequences are superimposed.

Image sequence 2 indicates the clearly reduced data quantity that has to be transmitted. The audio data 27 accompanying the image sequence 2 is unchanged from the audio data 17 and is continuous. For the presentation as sequence 3, in this example, the image 11 is selected from the original sequence, to provide an image 21 within image sequence 2. The image 21 is transmitted and displayed over a long time span (images 30-33) within the image sequence 3 at the mobile communication terminal (e.g., the mobile computer 110). The next scene is handled in the same way. The image 15 of the original sequence is selected as being representative of the second scene to provide an image 25. The image 25 is transmitted and shown for a long time span (images 34-36) within the image sequence 3 at the mobile communication terminal (e.g., the mobile computer 110).

A first section 27A of the audio data 27 is associated with the use of image 21, and a second section 27B of the audio data is associated with the use of image 25. The audio data 27 is transmitted continuously and unchanged to provide the unchanged audio data 37. Of course, some audio data losses may occur due to transmission issues, data compression or change in the speed of restitution unrelated to the present invention. At the remote wireless location, the still image changes at the junction between the first and second portions 37A and 37B of the audio data 37.
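The Fig. 2 example can be restated as a small worked mapping. The numbers below are the reference numerals used in the description (images 10-16, 21, 25 and 30-36); the variable names are hypothetical, and the final fraction counts images only, not bytes.

```python
# Reference numerals from Fig. 2: original images 10-16, one selected image per scene.
original = [10, 11, 12, 13, 14, 15, 16]
selected = {11: 21, 15: 25}  # original image -> transmitted image
display_spans = {
    21: range(30, 34),  # image 21 is shown during images 30-33 of sequence 3
    25: range(34, 37),  # image 25 is shown during images 34-36 of sequence 3
}

# Build the presented sequence 3: one transmitted image per display slot.
presented = {slot: img for img, span in display_spans.items() for slot in span}

# Only 2 of the 7 original images are transmitted.
frames_sent = len(selected)
fraction_sent = frames_sent / len(original)
```

In this mapping the slot corresponding to the transition image 13 is covered by image 21, which matches the description of image 21 being displayed during images 30-33.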

It is to be noted that between the processing of the image sequence 1 to provide sequence 2 and the presentation of the image sequence 3, there is typically a time delay brought about by creation of the image sequence 2, as a video summary. Time delay also may occur due to transmission, decoding, and display generation. However, such delays typically will not detract from the use and enjoyment provided to the user of the mobile communication terminal.

With such capability, the transmission of data can occur shortly after selection of the images and association of the images with the continuous audio. Of course, it is possible that transmission of data may occur after protracted delay subsequent to selection and association. Suitable memory or storage (e.g., at the processing device 116 or the transmitting device 118) would then be utilized.

The video and audio (e.g., sequence 3) at the mobile communication terminal (e.g., the mobile computer 110) achieves a presentation of the video information similar to television or the like. The essential difference between television and operation of the present invention is that only a representative image of a scene needs to be transmitted and displayed.

Accordingly, one objective of the present invention is to offer users of mobile devices a perception of image and audio information comparable to television. As such the present invention can adapt existing and already available content to the transmission conditions of such devices. The quantity of data of a video sequence or a TV show is reduced without losing the primary information of the show.

The present invention has other applications and other objectives that are provided. For example, the present invention is useable for video monitoring surveillance with remote transmission of selected images and continuous audio. Such video monitoring is useful for smoke detection in a building or a tunnel, intrusion detection in a building or other secure area, etc. Typically, conventional video surveillance results in long, almost still scenes. With the present invention, such long scenes may be represented by a single image, with associated continuous audio. The data transmitted and possibly stored is significantly reduced.

The direct creation of video summaries from video signals of analog or digital cameras represents another application of the invention.

Various configurations of the components of the arrangement are provided via the present invention. It is to be appreciated that various components, having various circuitries, processes, etc. may be employed to accomplish the present invention. As one specific example, technology provided by an iMVS-155, from FASTCOM TECHNOLOGY, Switzerland, may be employed.

It is to be appreciated that some or all of the components associated with providing, processing, and transmitting could be integrated together. For example, the camera device 114, the processing device 116, and the transmitting device 118 could be combined into a single unit. Also, various other aspects are provided via the present invention. For example, it is possible to transmit other data before, during, and after transmission of the video summary. The other transmitted data may contain personal profiles, personal preferences, subjective evaluations of the content received, comments, or replies from the user.

Another aspect provided via the present invention is the transmission of text information that describes an original sequence or contains additional background information on the content of the original television show. Such information could aid the viewer in quickly understanding the presented video summary.

Still additional aspects provided via the present invention are security and/or viewing control. To accomplish these aspects, the data (e.g., the image data) could be encoded. A security key and decoding ability would be required at the mobile communication terminal. As such, the user would only be able to display the video summary if the user has the key and decoding process. These approaches could be utilized to ensure proper subscription or licensing. Also, these approaches could be utilized to provide parental control over viewed content.

Claims

1. A method for providing video imagery and associated audio data at a remote, wireless location (110, 112), the method including: providing continuous video and audio data; and characterized in that the method includes: selecting a sequence group (2) of still video images (21, 25) from the continuous video (1); associating each still video image with a time portion of the continuous audio (17) such that upon presentation of the still video images and the continuous audio each still video image is presented during the respective associated portion of the audio; and transmitting data of the selected sequence of still video images and the continuous audio to the remote, wireless location.

2. A method as set forth in claim 1, wherein the step of selecting a sequence group of still video images includes automatically selecting the images.

3. A method as set forth in one of claims 1 or 2, wherein the step of selecting a sequence group of still video images includes selecting each still video image that either represents a related portion of the continuous video or is necessary for understanding the video data.

4. A method as set forth in one of claims 1 to 3, wherein each said selected still image corresponds to a scene in said continuous video.

5. A method as set forth in one of claims 1 to 4, wherein the step of transmitting data occurs shortly after the step of associating each still video image with a time portion of the continuous audio.

6. A method as set forth in one of claims 1 to 5, wherein the step of transmitting data occurs after protracted delay from the step of associating each still video image with a time portion of the continuous audio.

7. A method as set forth in one of claims 1 to 6, wherein the step of transmitting data includes transmission via a wireless communication network to a mobile computer (110, 112).

8. A method as set forth in claim 7, wherein said mobile computer (110, 112) is capable of presenting or storing the still video images and the continuous audio.

9. A method as set forth in one of claims 1 to 8, wherein the step of providing continuous video and audio data includes providing the video and audio from a television show.

10. A method as set forth in one of claims 1 to 9, wherein the step of selecting a sequence group of still video images includes using audio signals for the selection of the individual images.

11. A method as set forth in one of claims 1 to 10, wherein the step of selecting using audio signals occurs during a manual selection of the individual images.

12. A method as set forth in one of claims 1 to 11, wherein the step of selecting using audio signals occurs during an automatic selection of the individual images.

13. An arrangement for providing video imagery and associated audio data at a remote, wireless location, the arrangement including: a provision device (114) for providing continuous video and audio data; and characterized in that the arrangement includes: an image selection device (116) for selecting a sequence group of still video images from the continuous video; an audio association device (116) for associating each still video image with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio each still video image is presented during the respective associated portion of the audio; and a transmission device (118) for transmitting data of the selected sequence of still video images and the continuous audio to the remote, wireless location.

14. An arrangement as set forth in claim 13, wherein the image selection device (116) includes means for automatically selecting the images.

15. An arrangement as set forth in one of claims 13 to 14, wherein the image selection device (116) includes means for selecting each still video image that either represents a related portion of the continuous video or is necessary for understanding the video data.

16. An arrangement as set forth in one of claims 13 to 15, wherein the transmission device (118) includes means for wirelessly transmitting to a mobile computer (110, 112) at the remote location.

17. An arrangement as set forth in one of claims 13 to 16, wherein the mobile computer (110, 112) includes means for storing the transmitted video images and continuous audio.

18. An arrangement as set forth in claim 17, wherein the image selection device (116) includes means for using keywords in the audio data to select individual images.

19. An apparatus for use in an arrangement for providing video imagery and associated audio data at a remote, wireless location, the arrangement including a provision device (114) for providing continuous video and audio data, and a transmission device for transmitting data of a selected sequence of still video images and a continuous audio to the remote, wireless location, the apparatus is characterized in that it includes: an image selection device (116) for selecting the sequence group of still video images from the continuous video; and an audio association device (116) for associating each still video image with a time portion of the continuous audio such that upon presentation of the still video images and the continuous audio each still video image is presented during the respective associated portion of the audio.

20. An apparatus as set forth in claim 19, wherein the image selection device (116) includes means for automatically selecting the images.

21. An apparatus as set forth in one of the claims 19 or 20, wherein the image selection device (116) includes means for selecting each still video image that either represents a related portion of the continuous video or is necessary for understanding the video data.

22. An apparatus as set forth in one of the claims 19 or 20, wherein the image selection device (116) includes means for selecting a limited number of still video images from each scene in said continuous video.

23. A wireless apparatus for receiving and restituting video, including wireless reception means for receiving imagery and associated audio data including a plurality of time portions of various lengths, characterized in that it further includes: audio restitution means for restituting said continuous audio data, video restitution means for displaying a sequence of still video images, said video restitution means including means for changing the displayed video sequence once during each of said time portions.

24. A computer program product stored on a computer-usable medium comprising computer-readable program means for causing said computer to perform the steps of one of the claims 1 to 12.