DE102005000728A1

DE102005000728A1 - Image processing system for face detection uses still frame images subjected to semantic analysis

Info

Publication number: DE102005000728A1
Application number: DE102005000728A
Authority: DE
Inventors: Fu-Sheng Wang
Original assignee: Ulead Systems Inc
Current assignee: Ulead Systems Inc
Priority date: 2005-01-04
Filing date: 2005-01-04
Publication date: 2006-07-13

Abstract

The images obtained by a digital still camera are subjected to semantic analysis [10] to identify regions of interest [11]. These regions can be used to represent a movement from one region to another combined with scaling. The analysis can include the detection of human faces. Start and end points are determined [12] and the scaling levels fixed [13] before viewing occurs [14].

Description

The invention relates to a method for generating a slide show of an image, in particular a method for generating a slide show of one or more digital images by scaling (zooming) and panning the digital images.

Background of the invention

People love to take pictures at special events or at tourist destinations. With the progress of digital imaging technology, conventional cameras are gradually being replaced by digital cameras. In recent years, the number of pixels of the CCD image sensor in a digital camera has been increased substantially in order to improve the photo quality of a captured image. The greater the number of display pixels, the slower the processing speed becomes, which makes it difficult to display a line drawing smoothly. Consequently, it is not realistic to carry out a display process using all pixels of the image pickup section. These photos can be stored in digital form as image files, so that the user can display them on a computer or any other digital device, for example a mobile phone, a PDA, or the digital camera itself. Image data captured with a digital camera are generally transferred to a personal computer or other data processing device in order to be edited, stored, or printed. The digital camera is equipped with a host interface for connecting it to a personal computer, which makes it possible to transfer image data to the personal computer acting as host computer in order to process and/or store the image data. Owing to the digital operation of the digital camera and the data processing capability of the personal computer, applications of digitized images are therefore becoming more and more popular in daily life. Furthermore, photographs stored in a personal computer can be processed or provided with multimedia effects. Even pictures taken with a conventional camera can be stored as files by way of a scanner. Consequently, almost all photos or images can be handled as digital files. The user can therefore use a device capable of simple image processing to look back on the events and scenery in the pictures. Photos, however, can only capture and hold a still image. Compared with video, a still image is dull and lifeless: everyone in the photo is at rest and stands still. In particular, many friends and acquaintances who attended the event may appear in it.

The conventional way to make a still image livelier is to generate a slide show from many images. First, the user selects some image files from the storage means and, if necessary, specifies the display order or a time interval. The computer then displays the selected images one after another, either manually or automatically. With more complex functions, the computer can also reduce or enlarge the entire image to fit the window or the display screen.

Even when photos are shown in a conventional slide show, they remain still images. There is a need for an attractive way of presenting an image in which the photos or still images are more animated and provided with multimedia effects.

The invention

In view of the problems described, the invention provides methods and storage media for generating a slide show of an image. With the aid of the invention, photos can be displayed so as to simulate the effect of a video camera. In other words, when the slide show is presented, the user can feel as if he were watching a video. The invention therefore provides more interesting and more attractive ways of enjoying photos.

According to one aspect of the invention, a method for generating a slide show of an image is provided. First, a semantic analysis of the image is performed to detect semantic regions; the semantic analysis may include human face detection, and the semantic regions may include a human face or a neighborhood of predetermined area around a focus of the image. A first region and a second region are then selected from the detected semantic regions, and first and second scaling (zoom) levels are determined for the first and second regions, respectively. The slide show of the image is then generated by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

According to another aspect of the invention, a method for generating a slide show of an image is provided. First, a semantic analysis of the image is performed to detect a semantic region; the semantic analysis may include human face detection, and the semantic region may include a human face or a neighborhood of predetermined area around a focus of the image. The scaling state of the semantic region is then determined according to a scaling code. The data in the scaling code may include a first count of reductions, a second count of enlargements, the ratio of the semantic region area to the image area, and the distance between the center of the semantic region and the center of the image. The slide show of the image is then generated by scaling the image according to the scaling state.

According to a further aspect of the invention, a method for generating a slide show of an image is provided. First, a non-semantic analysis of the image is performed to detect non-semantic regions; the non-semantic analysis may include symmetric region detection, and the non-semantic regions may include regions with symmetric patterns and a neighborhood of predetermined area around a focus of the image. A first region and a second region are then selected from the detected non-semantic regions, and first and second scaling levels are determined for the first and second regions, respectively. The slide show of the image is then generated by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

Exemplary embodiments

The invention is explained in more detail below on the basis of exemplary embodiments with reference to the figures of the drawing, in which:

Fig. 1 shows a flow diagram of the method according to the invention; and

Fig. 2 shows a storage medium according to the invention.

The invention is explained with reference to preferred embodiments and the drawing. It should be noted that all embodiments serve for illustration only. Consequently, the invention can also be applied in various embodiments that differ from the preferred embodiments. Moreover, the invention is not limited to any particular embodiment but is defined by the claims and their equivalents.

Fig. 1 shows a method for generating a slide show of an image. Although the preferred embodiment generates a slide show for a single image, the invention can be applied to several images by processing them one after another. The invention can thus be used to generate a slide show for several images.

Before the semantic analysis is performed in step 10, at least one image has to be selected. If two or more images are selected, they are processed one after another. The images are preferably stored in a storage means as digital files. In step 10, the semantic analysis of the image is performed to detect semantic regions. These semantic regions may contain semantic objects that have a certain meaning for a user, for example human faces, cars, text, and other objects in which a user may be interested. In one embodiment of the invention, the semantic analysis may include human face detection, and each semantic region may then contain a human face. Techniques for detecting human faces are varied and well known to those skilled in the art, so a detailed description is omitted in order not to obscure the understanding of the invention. References on face detection algorithms include: "Neural Network-Based Face Detection", Rowley H., Baluja S. and Kanade T., Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 203–207; "Example-based learning for view-based human face detection", Poggio T., Sung K.K., Proc. of the ARPA Image Understanding Workshop, pp. 843–850, 1994; "Face detection in color images", R.-L. Hsu, M. Abdel-Mottaleb and A.K. Jain, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696–706, May 2002; and "Face detection by aggregated Bayesian network classifiers", T.V. Pham, M. Worring and A.W.M. Smeulders, Technical Report 2001-04, Intelligent Sensory Information Systems Group, University of Amsterdam, February 2001. Because human faces are usually the relatively important parts of an image, the generated slide show should emphasize these semantic regions.

After the semantic analysis has been performed, some semantic regions may have been detected. In step 11, the number of detected semantic regions is checked. If two or more semantic regions are detected in an image, two of them are selected as the start region and the end region (in the case of two regions), as shown in step 12. For example, the semantic regions can be sorted by their x-coordinates, after which the leftmost semantic region is selected as the start region and the rightmost semantic region as the end region. Alternatively, it may be specified that more than two regions are selected, so that a first, a second, and a third region are possible. Next, the ratio of the start region area to the image area and the ratio of the end region area to the image area are calculated. In addition, the distance between the center of the start region and the center of the image and the distance between the center of the end region and the center of the image are calculated. These ratios and distances are the parameters used in step 13 to determine the first scaling level of the start region and the second scaling level of the end region. In other embodiments of the invention, the scaling levels can also be determined according to other factors, for example the scaling states of other images. In addition, the first scaling level is preferably different from the second scaling level.
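The selection in step 12 and the parameters fed into step 13 can be illustrated with a short Python sketch. The `Region` type, the function names, and the choice of sorting by the x-coordinate of the region center are assumptions made for illustration; the description only states that the regions are sorted by their x-coordinates and that area ratios and center distances are calculated.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def area(self):
        return self.w * self.h


def select_start_end(regions):
    """Return (leftmost, rightmost) region, sorted by center x (an assumption)."""
    ordered = sorted(regions, key=lambda r: r.center[0])
    return ordered[0], ordered[-1]


def region_parameters(region, image_w, image_h):
    """Return (area ratio, distance from region center to image center)."""
    cx, cy = region.center
    area_ratio = region.area / (image_w * image_h)
    center_distance = hypot(cx - image_w / 2, cy - image_h / 2)
    return area_ratio, center_distance


regions = [Region(300, 50, 100, 100), Region(20, 40, 80, 80), Region(150, 60, 50, 50)]
start, end = select_start_end(regions)
start_ratio, start_distance = region_parameters(start, image_w=400, image_h=300)
```

How the two parameters are mapped to a concrete scaling level is not specified in the description, so the sketch stops at computing them.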

To generate the slide show in step 14, the image is panned from the start region to the end region. At the same time, if the first and second scaling levels differ, the image is gradually scaled (zoomed) from the first scaling level to the second scaling level during the panning process. If the two scaling levels are identical, however, the image can be panned at a fixed scaling level. The generated slide show can therefore simulate a video camera filming the scene in which the picture was taken. In other words, the user feels as if he were watching a video of the scene, although he is actually viewing a slide show of a still image.
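A minimal Python sketch of the simultaneous pan and zoom in step 14 follows. The description says only that the image is panned while being scaled "gradually"; the linear interpolation, the function name `pan_zoom_frames`, and the representation of a frame as a (center x, center y, scaling level) tuple are assumptions for illustration.

```python
def pan_zoom_frames(start_center, end_center, start_level, end_level, n_frames):
    """Interpolate the view center and scaling level over n_frames.

    Linear interpolation is an illustrative assumption; any monotonic
    easing curve would also satisfy "gradually".
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the start region, 1.0 at the end region
        cx = start_center[0] + t * (end_center[0] - start_center[0])
        cy = start_center[1] + t * (end_center[1] - start_center[1])
        level = start_level + t * (end_level - start_level)
        frames.append((cx, cy, level))
    return frames


frames = pan_zoom_frames((60, 80), (350, 100), start_level=2.0, end_level=1.0, n_frames=5)
```

If both scaling levels are equal, the same function yields a pure pan at a fixed level, which matches the special case described above.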

In one embodiment of the invention, the scaling state is also determined in step 13. There are two kinds of scaling state, namely reduction and enlargement. To select a scaling state, the scaling code is consulted. The scaling code is mainly used to store certain information about the current image and about the scaling states of previous images. In one embodiment of the invention, the scaling code may include the number of reductions, the number of enlargements, the ratio of the detected semantic region to the entire image, and the distance between the center of the detected semantic region and the center of the image. The number of reductions and the number of enlargements record the scaling states of previous images. For example, when the ratio of the detected semantic region area to the entire image area is smaller than a predetermined threshold, reduction is preferably selected as the scaling state in order to show the regions of interest clearly to the user. If, however, the number of reductions is greater than a predetermined threshold, the scaling state is preferably set to enlargement. Too many consecutive reductions or enlargements can be tiring for viewers, which is why the number of reductions and the number of enlargements should also be taken into account.
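The decision described above can be sketched in Python as follows. The field names, the default thresholds, the symmetric rule for too many enlargements, and the final fallback are assumptions for illustration; the description itself only gives the two example rules (a small area ratio prefers reduction, and too many reductions switch the state to enlargement).

```python
from dataclasses import dataclass


@dataclass
class ScalingCode:
    """The four items the description lists for the scaling code."""
    reductions: int         # reductions applied to previous images
    enlargements: int       # enlargements applied to previous images
    area_ratio: float       # semantic region area / image area
    center_distance: float  # region center to image center (unused in this sketch)


def choose_scaling_state(code, ratio_threshold=0.25, max_repeats=3):
    """Return "reduce" or "enlarge". Threshold values are hypothetical."""
    if code.reductions > max_repeats:
        return "enlarge"  # rule from the description: avoid too many reductions
    if code.enlargements > max_repeats:
        return "reduce"   # assumed mirror-image rule, not stated explicitly
    if code.area_ratio < ratio_threshold:
        return "reduce"   # rule from the description for small regions
    return "enlarge"      # assumed fallback


state = choose_scaling_state(ScalingCode(0, 0, 0.05, 120.0))
```

The repeat checks come first so that the history of previous images can override the ratio rule, which is one plausible reading of "should also be taken into account".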

If only one semantic region is detected, the scaling level of this semantic region and the scaling state of the image are determined in step 13. The slide show of the image with only one detected semantic region is then generated in step 14. In one embodiment of the invention, the image is scaled centered on the semantic region and according to the scaling state. The zoom function of a video camera can thus be simulated when the slide show is presented. When a person takes a photo, however, the focus is usually directed at the most interesting spot. In general, therefore, the neighborhood around the focus is important to the user. In one embodiment of the invention, the semantic regions further include the neighborhood around the focus in the image. The area of such a neighborhood is preferably predetermined or determined according to the area of the entire image. In addition, when an image has several focus points, the semantic regions include the neighborhood around each focus point in that image. In one embodiment of the invention, the focus information can be obtained from the image file in Exif format. When only one semantic region is detected in an image, the neighborhood around the focus of the image can be taken as an additional semantic region, and the image can then be treated as an image with several detected semantic regions. When the image has several detected semantic regions, the above neighborhoods can still be included as semantic regions.
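As an illustration of the focus neighborhood, the following Python sketch builds a rectangle around a given focus point whose area is a fixed fraction of the image area. The function name, the `fraction` parameter, the choice to keep the image's aspect ratio, and the clamping to the image bounds are all assumptions; reading the focus point from Exif metadata is not shown, and the point is simply passed in.

```python
def focus_neighborhood(focus, image_w, image_h, fraction=0.1):
    """Return (x, y, w, h) of a rectangle centered on the focus point.

    The rectangle covers `fraction` of the image area, keeps the image's
    aspect ratio, and is shifted as needed to stay inside the image.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    scale = fraction ** 0.5  # side lengths scale with the square root of the area
    w, h = image_w * scale, image_h * scale
    x = min(max(focus[0] - w / 2, 0), image_w - w)
    y = min(max(focus[1] - h / 2, 0), image_h - h)
    return (x, y, w, h)


centered = focus_neighborhood((200, 150), 400, 300, fraction=0.25)
near_corner = focus_neighborhood((0, 0), 400, 300, fraction=0.25)
```

For an image with several focus points, the same function can be called once per point to obtain one additional region each.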

If no semantic region is detected in step 10, the non-semantic analysis is performed in step 15. In one embodiment of the invention, such non-semantic analysis may include detecting symmetric regions, and the detected non-semantic regions may include regions having a symmetric pattern. Symmetric region detection can be implemented by searching for symmetric points and setting a region that contains each group of relatively symmetric points. Furthermore, the non-semantic regions of an image are its salient parts, which contain low-level image information that may attract the user's attention, for example an object with high symmetry or a region with high contrast or sharp edges. In general, the user's attention tends to be drawn to non-semantic regions even when the user is not looking for a specific object. Techniques for detecting symmetric regions are varied and well known to those skilled in the art, so a detailed description is omitted in order not to obscure the invention. References on symmetry operators and low-level image features include: "Context free attentional operators: the generalized symmetry transform", D. Reisfeld, H. Wolfson and Y. Yeshurun, Int. J. of Computer Vision, Special Issue on Qualitative Vision, 14:119–130, 1995; "The discrete symmetry transform in computer vision", V. Di Gesù and C. Valenti, Tech. Rep. DMA-011/95, DMA Univ. Palermo, Palermo, 1995; "Detecting symmetry in grey level images: The global optimization approach", Y. Gofman and N. Kiryati, in ICPR, page A94.2, 1996; "Automatic identification of perceptually important regions in an image using a model of the human visual system", W. Osberger and A. Maeder, submitted to the 14th International Conference on Pattern Recognition, Aug 1998; and "Evaluation of Interest Point Detectors", C. Schmid, R. Mohr and C. Bauckhage, Int'l Journal of Computer Vision, 37(2), 151–172, 2000.
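The idea of searching for symmetric points can be illustrated with a deliberately simple sketch. It is not the generalized or discrete symmetry transform from the cited references; it only scores left-right mirror symmetry around a candidate vertical axis in a grayscale image given as a list of rows with values 0 to 255. The function names and the scoring formula are assumptions made for illustration.

```python
def mirror_symmetry_score(gray, axis_col, half_width):
    """Score in [0, 1]; 1.0 means the pixels left and right of axis_col
    mirror each other exactly within half_width columns.
    Hypothetical measure, not the method of the cited papers."""
    total_diff = 0
    count = 0
    for row in gray:
        for d in range(1, half_width + 1):
            left, right = axis_col - d, axis_col + d
            if left < 0 or right >= len(row):
                continue  # skip pairs that fall outside the image
            total_diff += abs(row[left] - row[right])
            count += 1
    if count == 0:
        return 0.0
    return 1.0 - total_diff / (255 * count)


def most_symmetric_axis(gray, half_width):
    """Return the column whose neighborhood is most mirror-symmetric."""
    width = len(gray[0])
    candidates = range(half_width, width - half_width)
    return max(candidates,
               key=lambda c: mirror_symmetry_score(gray, c, half_width))
```

A region of width 2 * half_width + 1 around the returned column could then serve as a candidate non-semantic region. Note that a completely flat area also scores 1.0, so a practical detector would combine such a score with a contrast or edge measure.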

With these detected non-semantic regions, the image is processed in the same way as an image with multiple semantic regions, as described above. In step 12, two non-semantic regions are selected as the start region and the end region. Then, according to the same factors as mentioned above for step 13, a first and a second scaling level are determined for the start region and the end region, respectively. Finally, a slide show of the image with non-semantic regions is generated in step 14. Because the process for the non-semantic case is similar to that for the semantic case, most details are omitted for brevity.
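The fallback from semantic to non-semantic analysis (steps 10 and 15) followed by the selection of a start and an end region (step 12) can be sketched as follows. The detectors are passed in as callables because the description leaves their implementation open. The names `choose_regions` and `select_start_end` are hypothetical, and picking the first and last detected region is only an assumption made for this example.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height)


def choose_regions(image,
                   semantic_detector: Callable[[object], List[Region]],
                   non_semantic_detector: Callable[[object], List[Region]]):
    """Step 10, with a fallback to step 15 when nothing semantic is found."""
    regions = semantic_detector(image)
    if regions:
        return "semantic", regions
    return "non-semantic", non_semantic_detector(image)


def select_start_end(regions: List[Region]) -> Tuple[Region, Region]:
    """Step 12. Assumption: use the first and last region in detection order."""
    if len(regions) < 2:
        raise ValueError("need at least two regions to pan between")
    return regions[0], regions[-1]
```

Steps 13 and 14 would then operate on the returned pair regardless of which kind of analysis produced it, which mirrors the statement that both cases are processed alike.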

Another aspect of the invention relates to a computer-readable storage medium having computer-executable instructions for generating a slide show of an image according to FIG. 2. In one embodiment of the invention, such a computer-readable storage medium 20 comprises four modules: the non-semantic analysis module 201, the semantic analysis module 202, the scaling level/state determination module 203 and the slide show generation module 204. These modules may be computer software or sequences of computer-executable instructions. In one embodiment of the invention, the computer-readable storage medium may be a compact disc (CD), a digital video disc (DVD), a Blu-ray disc, a floppy disk, a hard disk or a flash memory. It should be noted that these types serve only as illustration and are not limiting. All other possible types of storage media capable of storing digital data are encompassed by the invention.

The image is first processed by the semantic analysis module 201 in order to detect semantic regions. If multiple semantic regions are detected, two of them are selected as the start region and the end region. The scaling level/state determination module 203 decides the first scaling level of the start region and the second scaling level of the end region. Each scaling level is decided according to the ratio of the region's area to the image area and the distance between the region's center and the image center. The scaling level/state determination module 203 also decides the scaling state according to the scaling code. The data and functions of the scaling code were described above and are therefore not repeated here.
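The description names the two inputs of a scaling level (the ratio of region area to image area, and the distance between region center and image center) but gives no formula. The following sketch therefore uses a purely hypothetical heuristic: smaller regions get a larger level, regions far from the image center get a somewhat smaller one, and the result is clamped to the range from 1.0 to `max_level`. The function name, the weighting factor 0.5 and the clamp are all assumptions.

```python
import math


def scaling_level(region, image_size, max_level=4.0):
    """Hypothetical heuristic combining area ratio and center distance.

    region is (x, y, width, height); image_size is (width, height).
    """
    x, y, w, h = region
    img_w, img_h = image_size
    if w <= 0 or h <= 0 or img_w <= 0 or img_h <= 0:
        raise ValueError("region and image must have positive size")

    # Ratio of the region's area to the image area.
    area_ratio = (w * h) / (img_w * img_h)

    # Distance between region center and image center, normalized by
    # the half diagonal of the image so that it lies in [0, 1].
    dx = (x + w / 2) - img_w / 2
    dy = (y + h / 2) - img_h / 2
    norm_dist = math.hypot(dx, dy) / math.hypot(img_w / 2, img_h / 2)

    # Assumed combination: zoom until the region roughly fills the view,
    # damped by up to 50 % for off-center regions.
    level = math.sqrt(1.0 / area_ratio) * (1.0 - 0.5 * norm_dist)
    return max(1.0, min(max_level, level))
```

With this heuristic, a small region at the image center receives a higher level than an equally small region in a corner, and a region covering the whole image receives level 1.0.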

In one embodiment of the invention, the semantic analysis module 201 may include a human face detection function. The semantic regions may furthermore comprise the region of a human face and the neighborhood around each focus.

If, unfortunately, no semantic region has been detected, the image is processed by the non-semantic analysis module 202 in order to detect non-semantic regions. Similarly, two of the non-semantic regions are selected as the start region and the end region. The scaling level/state determination module 203 decides the first scaling level of the start region and the second scaling level of the end region.

To generate a slide show, the image with the start region and the end region is panned from the start region to the end region. If the scaling levels of the two regions differ, the image is also enlarged or reduced while panning. For example, if the first scaling level is two times (2X) and the second scaling level is one third (1/3X), the image is first shown enlarged two times and is then gradually scaled down to one third during the pan. If the image has only one semantic region, the image is centered on that semantic region and, at the same time, enlarged or reduced according to the scaling state. In this way a lively slide show of the image is generated that simulates the effect of video.
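Panning from the start region to the end region while gradually changing the scaling level can be sketched as a per-frame interpolation. The example below assumes linear interpolation of both the view center and the level, which the description does not prescribe; the function names and the frame representation `(center_x, center_y, level)` are hypothetical.

```python
def pan_zoom_frames(start_center, end_center, start_level, end_level, n_frames):
    """Return a list of (center_x, center_y, level), one entry per frame.

    Assumption: center and level are interpolated linearly over the pan.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the start region, 1.0 at the end
        cx = (1 - t) * start_center[0] + t * end_center[0]
        cy = (1 - t) * start_center[1] + t * end_center[1]
        level = (1 - t) * start_level + t * end_level
        frames.append((cx, cy, level))
    return frames


def viewport(frame, image_size):
    """Crop rectangle (x, y, width, height) shown for one frame.

    A level of 2.0 shows half the image width and height; the rectangle
    is not clamped to the image bounds in this sketch.
    """
    cx, cy, level = frame
    w = image_size[0] / level
    h = image_size[1] / level
    return (cx - w / 2, cy - h / 2, w, h)
```

Rendering each viewport at the output resolution, frame after frame, yields the combined pan and zoom motion. A real implementation would additionally clamp or pad viewports that extend past the image border.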

In one embodiment of the invention, a pan path is determined according to the distribution of the semantic or non-semantic regions. In the slide show, the image is panned along the pan path over at least two regions. The two ends of the pan path are preferably the center of the start region and the center of the end region. The shape of the path varies with the distribution of the regions. For example, the pan path may be a straight line or a curve.
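A pan path that is either a straight line or a curve between the two region centers can be sketched with a quadratic Bezier curve. This is only one possible choice, since the description does not say how a curved path is constructed. The control point, for instance the center of a third region the pan should pass near, and the function name `pan_path` are assumptions.

```python
def pan_path(start, end, control=None, steps=10):
    """Sample points along a pan path from start to end.

    With control=None the path is a straight line; otherwise it is a
    quadratic Bezier curve bent toward the (hypothetical) control point.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if control is None:
        # The midpoint as control point degenerates to a straight line.
        control = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
    points = []
    for i in range(steps + 1):
        t = i / steps
        a = (1 - t) ** 2       # weight of the start point
        b = 2 * (1 - t) * t    # weight of the control point
        c = t ** 2             # weight of the end point
        points.append((a * start[0] + b * control[0] + c * end[0],
                       a * start[1] + b * control[1] + c * end[1]))
    return points
```

The first and last sampled points coincide with the centers of the start and end regions, which matches the preferred end points named above.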

Furthermore, although the above explanations all refer to processing a single image, the aforementioned modules of the computer-readable storage medium 20 are able to process multiple images. To process multiple images, the modules can handle them one after another and generate a slide show of multiple images. Such a slide show can thus present several images in succession while simulating video effects. In addition, photos stored on a personal computer or a portable device can be given striking multimedia effects.

Those skilled in the art will appreciate that the preferred embodiments described above illustrate rather than limit the invention. It is intended to cover various modifications and similar arrangements that fall within the scope of the claims. The scope of the claims is to be interpreted as broadly as possible and is intended to encompass all such modifications and similar structures. While preferred embodiments of the invention have been described, it will be appreciated that various changes can be made without departing from the scope of the invention.

Claims

Method for generating a slide show of an image, the method comprising the following steps: performing a semantic analysis of the image to detect semantic regions; selecting a first region and a second region from the semantic regions; determining a first scaling level of the first region and a second scaling level of the second region; and generating the slide show of the image by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

Method according to claim 1, characterized in that the semantic analysis comprises human face detection.

Method according to claim 1, characterized in that the semantic regions comprise a neighborhood of predetermined area around a focus of the image, and in that information about the focus is stored in the identifier of the image in Exif format.

Method according to claim 1, characterized in that the image is panned along a pan path over at least two of the semantic regions.

Method according to claim 1, characterized in that the first scaling level is determined according to a first ratio of the area of the first region to the image area and a first distance between the center of the first region and the image center, and the second scaling level is determined according to a second ratio of the area of the second region to the image area and a second distance between the center of the second region and the image center.

Method according to claim 1, characterized by a step of determining a scaling state of the image according to the scaling state of at least one previous image.

Method for generating a slide show of an image, the method comprising the following steps: performing a semantic analysis of the image to detect a semantic region; determining a scaling level of the region and a scaling state of the image according to a scaling code, the scaling code comprising information about the image; and generating the slide show of the image by scaling the image, centered on the semantic region, according to the scaling level and the scaling state.

Method according to claim 7, characterized in that the data in the scaling code further comprise: a first number of reductions; a second number of enlargements; a ratio of the area of the semantic region to the image area; and a distance between the center of the semantic region and the image center.

Method according to claim 7, characterized in that the semantic analysis comprises human face detection.

Method for generating a slide show of an image, the method comprising the following steps: performing a non-semantic analysis of the image to detect non-semantic regions; selecting a first region and a second region from the non-semantic regions; determining a first scaling level of the first region and a second scaling level of the second region; and generating the slide show of the image by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

Method according to claim 10, characterized in that the non-semantic analysis comprises symmetric region detection.

Method according to claim 10, characterized in that the non-semantic regions comprise a neighborhood of predetermined area around a focus of the image, and in that information about the focus is stored in the identifier of the image in Exif format.

Method according to claim 10, characterized in that the image is panned along a pan path over at least two of the non-semantic regions.

Method according to claim 10, characterized in that the first scaling level is determined according to a first ratio of the area of the first region to the image area and a first distance between the center of the first region and the image center, and the second scaling level is determined according to a second ratio of the area of the second region to the image area and a second distance between the center of the second region and the image center.

Method according to claim 10, characterized by a step of determining a scaling state of the image according to the scaling state of at least one previous image.

Computer-readable storage medium having computer-executable instructions for generating a slide show of an image by performing the following steps: performing a semantic analysis of the image to detect semantic regions; selecting a first region and a second region from the semantic regions; determining a first scaling level of the first region and a second scaling level of the second region; and generating the slide show of the image by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

Computer-readable storage medium according to claim 16, characterized in that the semantic analysis comprises human face detection.

Computer-readable storage medium according to claim 16, characterized in that the semantic regions comprise a neighborhood of predetermined area around a focus of the image, and in that information about the focus is stored in the identifier of the image in Exif format.

Computer-readable storage medium according to claim 16, characterized in that the image is panned along a pan path over at least two of the semantic regions.

Computer-readable storage medium according to claim 16, characterized in that the first scaling level is determined according to a first ratio of the area of the first region to the image area and a first distance between the center of the first region and the image center, and the second scaling level is determined according to a second ratio of the area of the second region to the image area and a second distance between the center of the second region and the image center.

Computer-readable storage medium according to claim 16, characterized by a step of determining a scaling state of the image according to the scaling state of at least one previous image.

Computer-readable storage medium having computer-executable instructions for generating a slide show of an image by performing the following steps: performing a semantic analysis of the image to detect a semantic region; determining a scaling level of the region and a scaling state of the image according to a scaling code, the scaling code comprising information about the image; and generating the slide show of the image by scaling the image, centered on the semantic region, according to the scaling level and the scaling state.

Computer-readable storage medium according to claim 22, characterized in that the data in the scaling code comprise: a first number of reductions; a second number of enlargements; a ratio of the area of the semantic region to the image area; and a distance between the center of the semantic region and the image center.

Computer-readable storage medium according to claim 22, characterized in that the semantic analysis comprises human face detection.

Computer-readable storage medium having computer-executable instructions for generating a slide show of an image by performing the following steps: performing a non-semantic analysis of the image to detect non-semantic regions; selecting a first region and a second region from the non-semantic regions; determining a first scaling level of the first region and a second scaling level of the second region; and generating the slide show of the image by panning the image from the first region to the second region while gradually scaling the image from the first scaling level to the second scaling level.

Computer-readable storage medium according to claim 25, characterized in that the non-semantic analysis comprises symmetric region detection.

Computer-readable storage medium according to claim 25, characterized in that the non-semantic regions comprise a neighborhood of predetermined area around a focus of the image, and in that information about the focus is stored in the identifier of the image in Exif format.

Computer-readable storage medium according to claim 25, characterized in that the image is panned along a pan path over at least two of the non-semantic regions.

Computer-readable storage medium according to claim 25, characterized in that the first scaling level is determined according to a first ratio of the area of the first region to the image area and a first distance between the center of the first region and the image center, and the second scaling level is determined according to a second ratio of the area of the second region to the image area and a second distance between the center of the second region and the image center.

Computer-readable storage medium according to claim 25, characterized by a step of determining a scaling state of the image according to the scaling state of at least one previous image.