WO2011057507A1

WO2011057507A1 - Method and apparatus for emphasizing video conference on-site atmosphere

Info

Publication number: WO2011057507A1
Application number: PCT/CN2010/075229
Authority: WO
Inventors: 黄杰华
Original assignee: 华为终端有限公司
Priority date: 2009-11-11
Filing date: 2010-07-19
Publication date: 2011-05-19
Also published as: CN102065266B; CN102065266A

Abstract

Embodiments of the present invention provide a method and apparatus for emphasizing the on-site atmosphere of a video conference. The method includes the following steps: receiving video and audio data from each assembly room; performing image and voice processing on the video and audio data to emphasize the atmosphere of the current video conference; and feeding the processed video and audio data back to each assembly room. By applying image processing technology to the video and audio data from each assembly room, the embodiments emphasize the atmosphere of the current video conference, so that conferees can directly perceive the on-site atmosphere of the conference from the video and audio.

Description

The present invention claims priority to Chinese Patent Application No. 200910221646. filed on Nov. 11, 2009, the entire contents of which are hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention relates to the field of image processing, and more particularly to a method and apparatus for highlighting a live atmosphere of a video conference.

BACKGROUND

A video conference system is a system in which individuals or groups at two or more different locations transmit sound, images, and document data to one another through transmission lines and multimedia devices (such as video cameras), achieving real-time, interactive communication for the purpose of holding a conference. It is a typical image communication system. At the transmitting end of the video conferencing system, the image and sound signals are encoded into digital signals, which are then decoded and presented as visual and audible information at the receiving end. Compared with a telephone conference call, a video conference is more intuitive and conveys more information.

However, current video conferencing systems cannot reflect the atmosphere or theme of the conference site. For example, when a video conferencing system is used to hold a memorial service with a sad theme, the existing system can only transmit the live video captured by the camera. The picture is merely a faithful reproduction of the scene, so viewers can only infer the sadness of the scene from the content of the picture and cannot perceive it intuitively.

Therefore, in general, video conferencing systems in the prior art cannot clearly highlight the atmosphere of the conference.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a method and apparatus for highlighting the live atmosphere of a video conference, so that participants can intuitively feel the atmosphere of the conference.

An embodiment of the present invention provides a method for highlighting a live atmosphere of a video conference, the method comprising: receiving video and audio data of each venue; performing image and sound processing on the video and audio data to highlight the atmosphere of the current video conference; and feeding the video and audio data that has undergone the image and sound processing back to each venue.

An embodiment of the present invention further provides an apparatus for highlighting a live atmosphere of a video conference, comprising: a receiving unit, configured to receive video and audio data of each venue; an ambience processing unit, configured to perform image and sound processing on the video and audio data to highlight the atmosphere of the current video conference; and a sending unit, configured to feed the video and audio data that has undergone the image and sound processing back to each venue.

In the embodiments of the present invention, image and sound processing is performed on the video and audio data of each venue by means of image processing technology to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the live atmosphere of the conference from the video and audio.

DRAWINGS

FIG. 1 is a structural diagram of a video conference system according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 2 of the present invention;

FIG. 4 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 3 of the present invention;

FIG. 5 is a structural diagram of a device for highlighting a live atmosphere of a video conference according to Embodiment 4 of the present invention;

FIG. 6 is a structural diagram of a device for highlighting a live atmosphere of a video conference according to Embodiment 5 of the present invention.

DETAILED DESCRIPTION

Embodiment 1

The working environment of the embodiments of the present invention is first briefly introduced. FIG. 1 is a structural diagram of a video conference system according to Embodiment 1 of the present invention. The system includes a multipoint control unit (MCU) 101 and site cameras 102, 103, and 104. The site cameras 102, 103, and 104 are located in three different venues and are connected to the multipoint control unit 101 through transmission lines, for example via an IP network. The multipoint control unit 101 can forward the video and audio data from each site to the other sites, for example forwarding the data collected by the site camera 102 to the sites of cameras 103 and 104. The multipoint control unit 101 can also perform certain processing on the data while forwarding it, such as splicing the pictures together to enhance the conference experience.

FIG. 2 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 1 of the present invention, where the method includes the following steps:

S201: Receive video and audio data of each site. Specifically, the multipoint control unit receives video and audio data from each site. The process of receiving data belongs to the prior art and is not described here.

S202: Perform image and sound processing on the video and audio data to highlight the atmosphere of the current video conference. In order to make the video and audio highlight the atmosphere of the current video conference, the multi-point control unit needs to perform special image and sound processing on the received video and audio data, so that the participants can intuitively feel the atmosphere of the conference from the video and audio.

As an embodiment of the present invention, the image and sound processing described above may include various processing techniques, such as rhythm control processing, color rendering processing, line optimization processing, background fusion processing, special effect generation and superimposition processing, or texture simulation processing.

Rhythm control processing changes the frame rate of the received video and audio data, and thereby the pace of the whole video and audio sequence, so that playback becomes faster or slower. Speeding up can highlight a cheerful or urgent live atmosphere, while slowing down can highlight a heavy or serious one. For example, a 25 frames-per-second video and audio sequence may be converted to 30 frames per second to give a cheerful, urgent, or tense feeling, or a 30 frames-per-second sequence may be converted to 25 frames per second to give a heavy or serious feeling. Of course, embodiments of the present invention are not limited to these frame rate conversions.
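The rhythm control idea can be illustrated with a small calculation. The sketch below assumes that "converting" a sequence from one frame rate to another means playing the same frames back at the new rate, so that the speed changes by the ratio of the two rates. The function names are illustrative and do not come from the source.

```python
from fractions import Fraction


def playback_speed(src_fps: int, dst_fps: int) -> Fraction:
    """Speed factor when frames captured at src_fps are played at dst_fps.

    Assumption: the frames themselves are unchanged and only the playback
    rate differs, so a factor above 1 means faster playback.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    return Fraction(dst_fps, src_fps)


def playback_seconds(frame_count: int, fps: int) -> Fraction:
    """Time needed to play frame_count frames at the given rate."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return Fraction(frame_count, fps)


# 25 -> 30 frames/second speeds playback up; 30 -> 25 slows it down.
speed_up = playback_speed(25, 30)    # Fraction(6, 5)
slow_down = playback_speed(30, 25)   # Fraction(5, 6)
```

Under this assumption, 750 frames that take 30 seconds at 25 frames per second take 25 seconds at 30 frames per second.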

Color rendering processing adjusts the color and tone of the received video and audio data. For example, for a positive, upbeat live atmosphere, colors are enhanced and more warm colors are used, giving a sunny, uplifting feeling; for a negative live atmosphere, colors are faded and more cool colors are used, giving a dark, gloomy feeling.
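Color enhancement and color fading can be sketched as scaling the saturation of each pixel. The source does not say how the color adjustment is computed, so the HSV-based approach and the names below are assumptions made purely for illustration; the warm or cool tone shift is not modeled.

```python
import colorsys


def adjust_saturation(rgb, factor):
    """Scale the saturation of one 8-bit RGB pixel.

    factor > 1 enhances color (positive atmosphere in this sketch);
    factor < 1 fades color (negative atmosphere in this sketch).
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = max(0.0, min(1.0, s * factor))  # keep saturation within [0, 1]
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r2, g2, b2))


pixel = (200, 100, 100)
enhanced = adjust_saturation(pixel, 1.5)  # more vivid red
faded = adjust_saturation(pixel, 0.5)     # closer to gray
```

A full implementation would apply such a transform to every pixel of every frame, typically with an array library rather than per-pixel Python calls.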

Line optimization processing examines the received video data and finds the large contour lines with distinct edges in the image. For example, for a positive live atmosphere, the contour lines are optimized into curves so that the shapes of objects in the image appear more graceful; for a negative atmosphere, the contour lines are optimized into straight lines so that the shapes and outlines of objects appear more monotonous, highlighting dignity and seriousness.

Background fusion processing selects a set of theme-atmosphere backgrounds in advance, covering different themes, industries, seasons, meeting places, and meeting sizes. Then, according to the actual meeting, a suitable background is chosen from this material and fused with the video and audio actually received.
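The source does not specify how the background and the received picture are fused. One common option is alpha blending, shown here as a minimal sketch; the choice of alpha blending and the function name are assumptions, not the method of the source.

```python
def blend_pixel(foreground, background, alpha):
    """Alpha-blend one RGB pixel of the received frame over a background pixel.

    alpha = 1.0 keeps only the received frame; alpha = 0.0 keeps only the
    preselected background.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(
        round(alpha * f + (1.0 - alpha) * b)
        for f, b in zip(foreground, background)
    )


frame_pixel = (200, 100, 0)
background_pixel = (0, 100, 200)
fused = blend_pixel(frame_pixel, background_pixel, 0.5)
```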

Special effect generation and superimposition processing generates special effects according to the needs of particular applications in order to highlight the atmosphere of the meeting. For example, for a meeting with a sad theme, an image of tears can be generated and superimposed on objects in the meeting scene, such as tables and cups.

Texture simulation processing first detects the texture areas in the image and then processes them according to the current environmental parameters, such as temperature, humidity, and brightness, so that participants at other venues see the texture of an object, such as a table, just as the people at the meeting site see it.

Of course, it should be noted that the image and sound processing performed by the multipoint control unit in the embodiments of the present invention is not limited to the modes listed above, and several of these processing modes may also be used at the same time.

S203: Feed the video and audio data that has undergone the image and sound processing back to each venue. Taking the video conference scenario shown in FIG. 1 as an example, the multipoint control unit 101 receives the video and audio data from the site camera 102, performs the image and sound processing of step S202, and then sends the processed data to the sites of cameras 102, 103, and 104. The data received from the sites of cameras 103 and 104 is processed in the same way.

In this embodiment of the present invention, image and sound processing is performed on the video and audio data of each venue by means of image processing technology to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the live atmosphere of the conference from the video and audio.

Embodiment 2

FIG. 3 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 2 of the present invention. It should be noted that this embodiment also describes the present invention from the perspective of the multipoint control unit. The method includes the following steps:

S301: Set an ambient ambience mode for the current video conference, where the ambient ambience mode includes an ambience preset value. Before the video conference starts, an ambient ambience mode that matches the atmosphere of the conference is first set in the multipoint control unit. The ambient ambience modes can be pre-stored in a storage unit of the multipoint control unit, and one of them is selected and run when needed. Each ambient ambience mode includes a series of ambience preset values. For example, a sad ambience mode may include the following series of preset values:

30-25; 0; 0; 4; 5.

This group of preset values is divided into five parts, which correspond to the following five processing methods: rhythm control processing, color rendering processing, line optimization processing, background fusion processing, and special effect generation and superimposition processing. Their meanings are:

Rhythm control processing uses slow playback, converting a 30 frames/second sequence into 25 frames/second;

Color rendering processing uses color fading;

Line optimization processing applies straight-line optimization to the contour lines;

Background fusion processing uses the preset background numbered 4;

Special effect generation and superimposition processing uses the preset special effect numbered 5.

Of course, the above is only a list of possible preset values. The embodiment of the present invention is not limited to this form, and other forms of various preset values are also possible.
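The example preset group "30-25; 0; 0; 4; 5" can be read into a small record, as in the sketch below. The class name, field names, and parsing rules are hypothetical; the source only gives the five-part layout and the meaning of each part in the sad ambience mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbiencePreset:
    src_fps: int        # rhythm control: original frame rate
    dst_fps: int        # rhythm control: converted frame rate
    color_mode: int     # color rendering (0 = color fading in the example)
    line_mode: int      # line optimization mode
    background_id: int  # number of the preset background
    effect_id: int      # number of the preset special effect


def parse_preset(text: str) -> AmbiencePreset:
    """Parse a five-part preset string such as '30-25; 0; 0; 4; 5'."""
    fields = [field.strip() for field in text.strip().rstrip(".").split(";")]
    if len(fields) != 5:
        raise ValueError("expected five ';'-separated parts")
    rates = fields[0].split("-")
    if len(rates) != 2:
        raise ValueError("rhythm part must look like '30-25'")
    src_fps, dst_fps = (int(rate) for rate in rates)
    color, line, background, effect = (int(field) for field in fields[1:])
    return AmbiencePreset(src_fps, dst_fps, color, line, background, effect)


sad_mode = parse_preset("30-25; 0; 0; 4; 5")
```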

S302: Receive the video and audio data and the ambience parameters of each venue.

The atmosphere of a video conference may change as the conference progresses. For example, a conference that began calmly may become more and more tense because of a dispute. In that case, continuing to use the original ambience preset value would no longer match the actual situation.

Therefore, in this step the multipoint control unit receives not only the video and audio data of each site but also the ambience parameters sent by each site, which reflect changes in the atmosphere at each venue. In this embodiment, each site can send the ambience parameters to the multipoint control unit together with the video and audio data, or send them through an auxiliary stream. In practical applications, the ambience parameters can be sent by the person in charge of recording at each venue.

S303: Determine, according to the ambience parameters, whether the ambience preset value needs to be corrected. If so, correct the ambience preset value and then proceed to step S304; if not, proceed directly to step S304.

The ambience parameter may include information about whether the ambience preset value needs to be modified and how to modify it, in which case the multipoint control unit only needs to perform the corresponding operations according to that information. Alternatively, the ambience parameter may not directly include this information, in which case the multipoint control unit must perform corresponding processing to obtain it, for example by comparing the ambience parameter with the ambience preset value.
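The first case, in which the ambience parameter itself says whether and how to modify the preset, can be sketched as follows. The dictionary layout (the "modify" and "changes" keys) and the preset field names are hypothetical; the source does not define the format of the ambience parameter.

```python
from typing import Dict, Optional


def correct_preset(preset: Dict[str, int],
                   ambience_param: Optional[dict]) -> Dict[str, int]:
    """Return the preset to use in S304, corrected if the parameter asks for it.

    Assumed layout of ambience_param:
      {"modify": bool, "changes": {preset_field: new_value, ...}}
    """
    corrected = dict(preset)  # never mutate the stored preset
    if ambience_param and ambience_param.get("modify", False):
        corrected.update(ambience_param.get("changes", {}))
    return corrected


preset = {"src_fps": 30, "dst_fps": 25, "color_mode": 0}
# A site reports that the meeting has become tense: speed playback up.
tense = {"modify": True, "changes": {"src_fps": 25, "dst_fps": 30}}
updated = correct_preset(preset, tense)
unchanged = correct_preset(preset, {"modify": False})
```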

S304: Perform image and sound processing on the video and audio data according to the ambience preset value to highlight an atmosphere of the current video conference.

In this step, the way the ambience preset value governs the image and sound processing of the video and audio data is as described in step S301, and the specific content of the image and sound processing is similar to that of Embodiment 1, so the details are not repeated here.

S305: Perform an effect evaluation on the video and audio that has undergone the image and sound processing to determine whether the ambience preset value needs to be updated.

To ensure that the video and audio processed in step S304 match the atmosphere of the current venue, or that the processing effect is satisfactory, this step performs an effect evaluation on the processed video and audio. The effect evaluation can be carried out by comparing the processed video and audio with a preset template. Based on the evaluation result, the multipoint control unit judges whether the ambience preset value needs to be updated: if so, it modifies the ambience preset value and returns to step S304; if not, it proceeds to step S306.
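The loop between steps S304 and S305 can be sketched as below. The callables `process`, `evaluate`, and `update`, and the `max_rounds` limit, are assumptions introduced for illustration; the source describes the loop but neither bounds it nor specifies how the template comparison is computed.

```python
from typing import Any, Callable, Dict, Tuple

Preset = Dict[str, int]


def process_with_feedback(
    data: Any,
    preset: Preset,
    process: Callable[[Any, Preset], Any],   # S304: image and sound processing
    evaluate: Callable[[Any], bool],         # S305: True if the effect is acceptable
    update: Callable[[Preset], Preset],      # S305: produce an updated preset
    max_rounds: int = 3,                     # assumed safety limit, not in the source
) -> Tuple[Any, Preset]:
    """Process, evaluate, and retry with an updated preset when needed."""
    for _ in range(max_rounds):
        processed = process(data, preset)
        if evaluate(processed):
            return processed, preset          # proceed to S306
        preset = update(preset)               # return to S304
    return process(data, preset), preset      # give up after max_rounds updates


# Toy stand-ins: "processing" multiplies by a gain, and the "template"
# comparison accepts any result of at least 30.
result, final_preset = process_with_feedback(
    10,
    {"gain": 1},
    process=lambda value, p: value * p["gain"],
    evaluate=lambda value: value >= 30,
    update=lambda p: {"gain": p["gain"] + 1},
)
```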

S306: Feed the video and audio data that has undergone the image and sound processing back to each venue.

In this embodiment of the present invention, image and sound processing is performed on the video and audio data of each site by means of image processing technology to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the atmosphere of the conference from the video and audio. In addition, this embodiment not only monitors the situation of each venue in real time and changes the ambience preset value accordingly, but also evaluates the effect of the processed video and audio, so that the result better matches the actual atmosphere of the conference and highlights that atmosphere more clearly.

Embodiment 3

FIG. 4 is a flowchart of a method for highlighting a live atmosphere of a video conference according to Embodiment 3 of the present invention, where the method includes the following steps:

S401: Each venue selects its own ambient atmosphere mode according to the atmosphere of the venue, and the ambient atmosphere mode includes an atmosphere preset value.

S402: Each site performs prior image and sound processing on the video and audio data it collects, according to its own atmosphere preset value.

Steps S401 and S402 are similar to steps S301 and S304 in Embodiment 2, respectively. The difference is that steps S301 and S304 in Embodiment 2 are performed by the multipoint control unit, whereas in this embodiment the operations are performed at each site. Specifically, the operations may be performed by the video recording device at each site, or by a separate device connected to the video recording device.

S403: After the prior image and sound processing is completed, each site sends the processed video and audio data to the multipoint control unit.

S404: The multipoint control unit performs unified adaptation optimization on the video and audio data that has undergone the prior image and sound processing, to highlight the overall atmosphere of the current video conference.

In a video conference, the video and audio protocols, formats, network bandwidth, and so on of the sites are not necessarily the same. Therefore, so that every site can view the video data of the other sites, the data of each venue needs to be converted to some extent. This conversion process is called "adaptation optimization".
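As a rough illustration of why conversion is needed, the sketch below lists, for each sender and receiver pair, whether the stream must be converted because the two sites use different formats. The `SiteFormat` record, its fields, and the example codec names are illustrative assumptions; the source does not specify which properties are compared or how the conversion is performed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SiteFormat:
    name: str
    codec: str   # video coding format used by the site (illustrative)
    height: int  # picture height in lines (illustrative)


def plan_conversions(sites: List[SiteFormat]) -> Dict[Tuple[str, str], bool]:
    """Map (sender, receiver) to True when the stream needs conversion."""
    plan = {}
    for sender in sites:
        for receiver in sites:
            if sender.name == receiver.name:
                continue  # a site's own format never needs conversion
            plan[(sender.name, receiver.name)] = (
                sender.codec != receiver.codec
                or sender.height != receiver.height
            )
    return plan


sites = [
    SiteFormat("A", "H.264", 720),
    SiteFormat("B", "H.264", 720),
    SiteFormat("C", "H.263", 480),
]
plan = plan_conversions(sites)
```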

S405: The multi-point control unit feeds back the video and audio data after the unified adaptation optimization processing to each site.

As an embodiment of the present invention, after completing step S402, each site may also perform an effect evaluation on the video and audio data that has undergone the prior image and sound processing to determine whether its atmosphere preset value needs to be modified. If so, the site modifies the preset value and performs the prior image and sound processing on the video and audio data again according to the modified value; if not, it sends the video and audio data to the multipoint control unit.

In this embodiment of the present invention, each site performs prior image and sound processing on its own video and audio data to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the atmosphere of the conference from the video and audio. In addition, because most of the image and sound processing is done at the sites, the burden on the multipoint control unit is greatly reduced.

Embodiment 4

FIG. 5 is a structural diagram of a device for highlighting a live atmosphere of a video conference according to Embodiment 4 of the present invention. The device includes: a receiving unit 510, an ambience processing unit 520, and a sending unit 530, which are sequentially connected.

The receiving unit 510 is configured to receive video and audio data of each site. In this embodiment, the receiving unit 510 can receive video and audio data from each site through the Internet, a dedicated line network, a direct cable, or the like; specifically, it receives the video and audio data from the camera unit of each venue.

The ambience processing unit 520 is configured to perform image and sound processing on the video and audio data to highlight the atmosphere of the current video conference. So that the video and audio highlight the atmosphere of the current video conference, the device of this embodiment performs special image and sound processing on the received video and audio data, allowing the participants to intuitively feel the atmosphere of the meeting from the video and audio. Various image and sound processing methods can be used, such as those described in Embodiment 1: rhythm control processing, color rendering processing, line optimization processing, background fusion processing, special effect generation and superimposition processing, or texture simulation processing. The processing principles and procedures are similar to those in Embodiment 1 and are not described again.

The sending unit 530 is configured to feed the video and audio data that has undergone the image and sound processing back to each venue.
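The sequential connection of the three units can be sketched as a simple pipeline. The class and method names are hypothetical, the streams are stand-in strings rather than real media, and the "effect" is an arbitrary callable supplied by the caller.

```python
from typing import Callable, Dict, Iterable


class ReceivingUnit:
    def receive(self, streams: Dict[str, str]) -> Dict[str, str]:
        """Collect the data sent by each site (site name -> data)."""
        return dict(streams)


class AmbienceProcessingUnit:
    def __init__(self, effect: Callable[[str], str]):
        self.effect = effect  # stand-in for the image and sound processing

    def process(self, streams: Dict[str, str]) -> Dict[str, str]:
        return {site: self.effect(data) for site, data in streams.items()}


class SendingUnit:
    def send(self, streams: Dict[str, str],
             venues: Iterable[str]) -> Dict[str, Dict[str, str]]:
        """Feed every processed stream back to every venue."""
        return {venue: dict(streams) for venue in venues}


receiver = ReceivingUnit()
processor = AmbienceProcessingUnit(effect=str.upper)
sender = SendingUnit()

received = receiver.receive({"site102": "frame-a", "site103": "frame-b"})
delivered = sender.send(processor.process(received), ["site102", "site103"])
```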

As an embodiment of the present invention, the receiving unit 510 is further configured to receive video and audio data that has undergone prior image and sound processing. The prior image and sound processing is similar to the image and sound processing described above, for example rhythm control processing, color rendering processing, line optimization processing, background fusion processing, special effect generation and superimposition processing, or texture simulation processing. The difference is that the prior image and sound processing is performed at each site. Specifically, the operation may be performed by the video recording device at each site, or by a separate device connected to the video recording device.

The ambience processing unit 520 is further configured to perform unified adaptation optimization on the video and audio data that has undergone the prior image and sound processing, to highlight the overall atmosphere of the current video conference.

In this way, most of the image and sound processing is shared among the venues, which greatly reduces the burden on the device of this embodiment.

In this embodiment of the present invention, image and sound processing is performed on the video and audio data of each site by means of image processing technology to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the live atmosphere of the conference from the video and audio.

Embodiment 5

FIG. 6 is a structural diagram of a device for highlighting a live atmosphere of a video conference according to Embodiment 5 of the present invention. The device includes: a receiving unit 610, an ambience processing unit 620, a sending unit 630, a mode setting unit 640, a modifying unit 650, a judging unit 660, and an updating unit 670.

The receiving unit 610 is configured to receive the video and audio data of each site and the ambience parameters of each site. Because the atmosphere of a video conference may change as the conference progresses, always using the original ambience preset value may not match the actual situation. The ambience parameters received by the receiving unit 610 in this embodiment reflect changes in the atmosphere at each venue. In practical applications, the ambience parameters can be sent by the person in charge of recording at each venue.

In this embodiment, each venue may send the ambience parameter together with the video and audio data to the multipoint control unit, and the ambience parameter may also be sent through the auxiliary stream.

The mode setting unit 640 is configured to set an ambient ambience mode for the current video conference, where the ambient ambience mode includes an ambience preset value. The ambient ambience modes may be pre-stored in a storage unit of the device, and one of them is selected and run when needed.

The ambience processing unit 620 may include an ambience processing sub-unit, configured to perform image and sound processing on the video and audio data received by the receiving unit 610 according to the ambience preset value, to highlight the atmosphere of the current video conference.

The modifying unit 650 is configured to modify the ambience preset value according to the ambience parameter received by the receiving unit 610. The ambience parameter may include information about whether the ambience preset value needs to be modified and how to modify it, in which case the modifying unit 650 only needs to perform the corresponding operations according to that information. Alternatively, the ambience parameter may not directly include this information, in which case the modifying unit 650 must perform corresponding processing to obtain it, for example by comparing the ambience parameter with the ambience preset value.

The judging unit 660 is configured to perform an effect evaluation on the video and audio data that has undergone the image and sound processing, to determine whether the ambience preset value needs to be updated. To ensure that the video and audio processed by the ambience processing unit 620 match the atmosphere of the current conference site, or that the processing effect is satisfactory, the judging unit 660 of this embodiment evaluates the effect of the processed video and audio. The effect evaluation can be carried out by comparing the processed video and audio with a preset template.

The updating unit 670 is configured to update the ambience preset value according to the effect evaluation of the judging unit 660, so that the video and audio ambience processed by the ambience processing unit 620 is better.

In this embodiment of the present invention, image and sound processing is performed on the video and audio data of each site by means of image processing technology to highlight the atmosphere of the current video conference, so that the participants can intuitively feel the atmosphere of the conference from the video and audio. In addition, this embodiment not only monitors the situation of each site in real time and changes the ambience preset value accordingly, but also evaluates the effect of the processed video and audio, so that the result better matches the actual atmosphere of the conference and highlights that atmosphere more clearly.

All or part of the flows of the method embodiments described above may be accomplished by a computer program instructing the associated hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

The specific embodiments of the present invention have been described in detail above with reference to preferred embodiments. They are not intended to limit the scope of protection of the present invention; any modifications, equivalents, improvements, and the like made within the spirit and scope of the invention are intended to be included within the scope of protection of the invention.

Claims

1. A method for highlighting a live atmosphere of a video conference, the method comprising:

receiving video and audio data of each venue;

performing image and sound processing on the video and audio data to highlight the atmosphere of the current video conference; and

feeding the video and audio data that has undergone the image and sound processing back to each venue.

2. The method according to claim 1, wherein before receiving the video and audio data of each site, the method further comprises:

setting an ambient ambience mode of the current video conference, the ambient ambience mode including an ambience preset value;

and wherein performing image and sound processing on the video and audio data to highlight the atmosphere of the current video conference comprises:

performing image and sound processing on the video and audio data according to the ambience preset value to highlight the atmosphere of the current video conference.

3. The method of claim 2, further comprising:

Receiving the ambience parameter of each site, and modifying the ambience preset value according to the ambience parameter.

4. The method of claim 2, further comprising:

Performing an effect evaluation on the video and audio data after the image and sound processing to determine whether it is necessary to update the ambience preset value, and if the ambience preset value needs to be updated, performing the image and sound processing on the video and audio data again according to the updated ambience preset value.

5. The method of claim 1, wherein the receiving video and audio data of each venue comprises:

receiving video and audio data of each site that has undergone prior image and sound processing;

and wherein performing image and sound processing on the video and audio data to highlight the atmosphere of the current video conference comprises:

performing unified adaptation optimization on the video and audio data that has undergone the prior image and sound processing, to highlight the overall atmosphere of the current video conference.

6. The method according to claim 5, wherein the receiving video and audio data of each site that has undergone prior image and sound processing comprises:

receiving video and audio data on which each site has performed the prior image and sound processing according to its own ambience preset value.

7. The method according to any one of claims 1 to 6, wherein the image and sound processing comprises: rhythm control processing, color rendering processing, line optimization processing, background fusion processing, special effect generation and superimposition processing, or texture simulation processing.

8. A device for highlighting the atmosphere of a video conference, characterized in that it comprises:

a receiving unit, configured to receive video and audio data of each site;

an ambience processing unit, configured to perform image and sound processing on the video and audio data to highlight an atmosphere of the current video conference; and

a sending unit, configured to feed the video and audio data that has undergone the image and sound processing back to each venue.

9. The device according to claim 8, further comprising:

a mode setting unit, configured to set an ambient ambience mode of the current video conference, where the ambient ambience mode includes an ambience preset value;

wherein the ambience processing unit includes an ambience processing sub-unit, configured to perform image and sound processing on the video and audio data according to the ambience preset value to highlight an atmosphere of the current video conference.

10. The device according to claim 9, wherein:

the receiving unit is further configured to receive an ambience parameter of each site; and

the device further comprises a modifying unit, configured to modify the ambience preset value according to the ambience parameter.

11. The device according to claim 9, further comprising:

a determining unit, configured to perform an effect evaluation on the video and audio data after the image and sound processing to determine whether it is necessary to update the ambience preset value; and

an updating unit, configured to update the ambience preset value according to the effect evaluation.

12. The device according to claim 8, wherein the receiving unit is further configured to receive video and audio data that has undergone prior image and sound processing; and the ambience processing unit is further configured to perform unified adaptation optimization on the video and audio data that has undergone the prior image and sound processing, to highlight the overall atmosphere of the current video conference.