WO2000039998A2

WO2000039998A2 - System and method for recording and broadcasting three-dimensional video

Info

Publication number: WO2000039998A2
Application number: PCT/US1999/031233
Authority: WO
Inventors: Amber Davidson; Robert Boatright
Original assignee: Chequemate International, Inc.
Priority date: 1998-12-30
Filing date: 1999-12-30
Publication date: 2000-07-06
Also published as: WO2000039998A3; AU2487000A

Abstract

The present invention is directed to systems and methods for synthesizing a three-dimensional video stream from a two-dimensional video source (see fig.9). A frame from the two-dimensional video source is digitized and split into a plurality of field (fig.9, 202, 204, 206). Each field contains a portion of the information in the frame. The fields are then separately processed and transformed to introduce visual clues that, when assembled with the other fields, will be interpreted by a viewer as a three-dimensional image (fig.9, 212). The present invention is also directed to systems and methods for recording (fig.9, 214) and broadcasting (fig.9, 226) the three-dimensional video stream once synthesized from the two-dimensional video source. Further aspects of the invention include a selective scaling system and method (see fig.10) and a dynamic variance of temporal and/or spatial delay system and method (see fig.11).

Description

SYSTEM AND METHOD FOR RECORDING AND BROADCASTING THREE-DIMENSIONAL VIDEO

BACKGROUND OF THE INVENTION 1. The Field of the Invention

The present invention relates to systems and methods for processing, recording, broadcasting, and displaying video imagery.

2. The Prior State of the Art Realistic three-dimensional video is useful in entertainment, business, industry, and research. Each area has differing requirements and differing goals. Some systems that are suitable for use in one area are totally unsuitable for use in other areas due to the differing requirements. In general, however, three-dimensional video imagery must be comfortable to view for extended periods of time without having the viewing system impart stress and eye strain. In addition, systems should be of sufficient resolution and quality to allow for a pleasing experience. However, prior art systems have not always accomplished these goals in a sufficient manner.

Any approach designed to produce three-dimensional video images relies on the ability to project a different video stream to each eye of the viewer. The video streams contain visual clues that are inteφreted by the viewer as a three-dimensional image. Many different systems have been developed to present these two video streams to different eyes of an individual. Some systems utilize twin screen displays using passive polarized or differently colored viewing lenses and glasses that are worn by the viewer in order to allow each eye to perceive a different video stream. Other approaches use field or frame multiplexing which utilizes a single display screen that quickly switches between the two video streams. These systems typically have a pair of shuttered glasses that are worn by an individual and the shutters alternately cover one eye and then the other in order to allow each eye to perceive a different video stream. Finally, some systems, such as those commonly used in virtual reality systems, use dual liquid crystal or dual CRT displays that are built into an assembly worn on the viewers head. Other technologies include projection systems and various auto stereoscopic systems that do not require the wearing of glasses.

Prior art systems that generate and display three-dimensional video imagery have typically taken one of two approaches. The first approach has been to employ a binocular system, e.g., two lenses or two cameras to produce two channels of visual information. The spacial offset of the two channels creates a parallax effect that mimics the effect created by an individual's eyes. The key factor in producing high quality stereoscopic video that uses two cameras is the maintenance of proper alignment of the two channels of image data. The alignment of the camera lenses must be maintained and the video signals generated by the cameras must maintain a proper temporal alignment as they are processed by system electronics or optics. Misalignment will be perceived as distortion to a viewer. Twin screen viewing systems are known to be particularly prone to misalignment, tend to be bulky and cumbersome, and tend to be rather expensive due to the cost of multiple displays. Single screen solutions which multiplex fields or frames tend to minimize the problems associated with dual display monitors, yet these systems also rely on the accuracy of alignment of the input video data. The second approach taken by various systems has been an attempt to convert an input two-dimensional video signal into a form that is suitable for stereoscopic display. These systems traditionally have split the two-dimensional video signal into two separate channels of visual information and have delayed one channel of video information with respect to the other channel of video information. Systems which synthesize a simulated three-dimensional scene from two-dimensional input data tend to be somewhat less expensive due to the reduced hardware requirements necessary to receive and process two separate channels of information.

In addition, such systems may utilize any conventional video source rather than requiring generation of special video produced by a stereoscopic camera system. The reliance on temporal shifting of portions of the data in order to create a simulated three-dimensional scene, however, does not work well for objects that are not moving in the scene. Thus, there presently does not exist a system that can produce high quality simulated three-dimensional video from a two-dimensional input signal.

Another factor limiting the commercial success of traditional three-dimensional video has been adverse physical reactions including eye strain, headaches, and nausea experienced by a significant number of viewers of these systems. This is illustrated, for example, by the three-dimensional movies that were popular in the 1950s and 1960s. Today, however, outside of theme parks and similar venues, these movies are typically limited to less than about thirty minutes in length, because the average viewer tolerance for this media is limited. Viewer tolerance problems seem to be intrinsic to the methodology of traditional stereoscopy, and result from the inability of these systems to realistically emulate the operation of the human visual system. Such systems also seem to suffer from the inability to account for the central role of the human brain and the neuro-cooperation between the brain and eyes for effective visual processing. Additionally, distribution of three-dimensional video is currently limited. If a consumer desires to view three-dimensional video, he must generally either visit a business dedicated to showing three-dimensional video or purchase a three-dimensional decoder box, glasses, and one of the handful of titles converted to three-dimensional. A limited amount of media is currently available in three-dimensional. Accordingly, the opportunities for viewing three-dimensional video are likewise limited.

In summary, prior art systems have suffered from poor image quality, low user tolerance, and high cost. In would be an advancement in the art to produce a three-dimension video system that did not suffer from these problems. It would also be an advancement in the art to provide a manner of mass distribution of three-dimensional video media and a system for efficiently generating three-dimensional video titles, recording those titles, and distributing those titles to viewers.

SUMMARY AND OBJECTS OF THE INVENTION

The problems of the prior art have been successfully overcome by the present invention which is directed to systems and methods for synthesizing a simulated three-dimensional video image from a two-dimensional input video signal. The present invention is relatively inexpensive, produces high quality video, and has high user tolerance. The systems of the present invention do not rely on temporal shifting in order to create a simulated three-dimensional scene. However, certain embodiments may use temporal shifting in combination with other processing to produce simulated three-dimensional video from a two-dimensional video source.

Traditional video sources, such as an NTSC compatible video source is composed of a sequence of frames that are displayed sequentially to a user in order to produce a moving video image. The frame rate for NTSC video is thirty frames per second. Frames are displayed on a display device, such as a monitor or television, by displaying the individual horizontal scan lines of the frame on the display device. Traditionally, televisions have been designed to display the frame by interlacing two different fields. In other words, the television first displays all the odd numbered scan lines and then interlaces the even numbered scan lines in order to display a complete frame. Thus, a frame is typically broken down into an even field which contains the even numbered scan lines and an odd field which contains the odd numbered scan lines.

The present invention takes a two-dimensional video input signal and digitizes the signal so that it can be digitally processed. In one embodiment, the digitized frame is separated into the even field and the odd field. The even field and/or the odd field are then processed through one or more transformations in order to impart characteristics to the field that, when combined with the other field, and properly displayed to a viewer will result in a simulated three-dimensional video stream. The fields are then placed in a digital memory until they are needed for display. When the fields are needed for display, they are extracted from the digital memory and sent to the display device for display to the user.

The fields are displayed to the user in such a manner that one field is viewed by one eye and the other field is viewed by the other eye. Many mechanisms may be used to achieve this, including the various prior art mechanisms previously discussed. In one embodiment, the system utilizes a pair of shuttered glasses that are synchronized with the display of the different fields so that one eye is shuttered or blocked during the display of one field and then the other eye is shuttered or blocked during the display of the other field. By alternating the fields in this manner, three-dimensional video may be viewed on a conventional display device, such as a conventional television. The mind, when receiving signals from the eyes, will interpret the visual clues included in the video stream and will fuse the two fields into a single simulated three-dimensional image.

The fields may be spatially offset, temporally offset, or may be transformed with a combination of both temporal and spatial offset. Additionally, a frame may be only partially transformed, such that certain objects within a frame are transformed into three-dimensional, while other objects remain two-dimensional.

Thus, in one embodiment, a system for broadcasting and displaying a three dimensional video stream created from a two dimensional video stream is provided. Under this embodiment, the two dimensional video stream comprises a plurality of video frames intended to be sequentially displayed on a display device, each of said video frames comprising at least a first field and a second field.

Preferably, the system comprises a receiving module configured to receive a frame of said two dimensional video stream and for digitizing said frame so that said frame can be further processed by the system. The system may also comprise a separating module configured to separate said frame into at least a first field and a second field, each of said fields containing a portion of the video data in said frame. Additionally, a transforming module may also be provided. In one embodiment, the transforming module is configured to transform at least one of said first field or said second field using a selected transform that will produce a simulated three-dimensional video frame when said first field and said second field are recombined and displayed on a display device.

The system also preferably comprises a recombining module, a broadcasting station, and a decoding module. In one embodiment, the recombining module is configured to recombine said first field and said second field for transferring said recombined first field and second field to a display device in order to create said simulated three-dimensional video frame The broadcasting station is preferably configured to broadcast the first and second fields to a plurality of viewers. The decoding module is preferably configured to control said display device so that said first field is viewed by one eye of an individual viewing said display device and said second field is viewed by the other eye of the individual. In further embodiments, the recombining module is configured to recombine said first field and said second field without temporally shifting either said first field or said second field. Additionally, the transforming module may be configured to transform at least one of said first field or said second field with a spatial transformation or alternatively, with a temporal transformation.

In one embodiment, the selected transform comprises a skew transform that skews one field in the horizontal direction relative to the other field. In an alternative embodiment, the selected transform comprises a skew transform that skews one field in the vertical direction relative to the other field. Additionally, the selected transform may comprise a shift transform that shifts one field in the horizontal direction relative to the other field and may also comprise a shift transform that shifts one field in the vertical direction relative to the other field. The present invention also comprises a method for broadcasting a video stream that is synthesized from two-dimensional video into three-dimensional video. In one embodiment, the method comprises the steps of receiving a two-dimensional digitized video stream comprising a plurality of video frames that are intended to be displayed sequentially on a display device, each frame comprising a plurality of fields which together contain all digital video information to be displayed for a frame; generating information adapted to transform at least one of said plurality of fields in a manner that renders the video frames to collectively appear to a viewer to be at least partially three dimensional; and broadcasting the information from a broadcasting station to be received by a viewer station.

In further embodiments, the method also comprises providing a decoding device at the viewer station; receiving the information broadcast from the broadcasting station into the decoding device; transforming the video frames for transmission in three-dimensional on a television set; and displaying the video frames in three-dimensional on a television set. The information may comprise a spatial transformation of the field and may also comprise a temporal transformation of the field. The method may also comprise receiving and displaying a simulated three- dimensional video frame on a display device disposed at said viewer station by alternating said first field and said second field such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual. Other alternative steps may include separating said plurality of fields of said two- dimensional digital video frame into at least a first field and a second field; extracting from said video stream a single two-dimensional digital video frame for processing; and separating said plurality of fields of said single two-dimensional digital video frame into at least a first field and a second field. This may be accompanied by spatially transforming at least one of said first field or said second field in order to produce a simulated three-dimensional video frame when said first field and said second field are recombined and viewed on a display device. Additionally, the method may comprise displaying said first field and said second field without temporally shifting either said first field or said second field in order to create said simulated three-dimensional video frame by displaying said first field and said second field on a display device within a single frame such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual. In certain embodiments, the first field and said second field each comprise a plurality of pixels arranged in a matrix having a plurality of rows and columns, and said spatial transformation step skews one field in the vertical direction relative to the other field by performing at least the steps of selecting a total skew value; selecting a starting column of pixels; and for each column after said selected starting column, shifting the column relative to the preceding column in a chosen vertical direction by a predetermined value derived from the total skew value.

Additionally, said transformation step may comprise spatial transformation and further comprise shifting one field in the vertical direction relative to the other field. The spatial transformation step in one embodiment scales one field in the vertical direction relative to the other field. The method may further comprise the step of temporally shifting at least one of said first field or said second field in order to introduce a time delay relative to its original location in said two dimensional video stream.

Also provided within the present invention is a system and method for recording a three-dimensional video stream that is synthesized from a two-dimensional video stream. In one embodiment, the system comprises the components described above and in addition, a recording station.

The three-dimensional recording method in one embodiment comprises the steps of receiving a two-dimensional digitized video stream comprising a plurality of video frames that are intended to be displayed sequentially on a display device, each frame comprising a plurality of fields which together contain all digital video information to be displayed for a frame and extracting from said video stream a single two-dimensional digital video frame for processing.

The method also preferably comprises the steps of separating said plurality of fields of said single two-dimensional digital video frame into at least a first field and a second field and spatially transforming at least one of said first field or said second field in order to produce a simulated three-dimensional video frame when said first field and said second field are recombined and viewed on a display device.

Additionally, the method also preferably comprises recording the plurality of fields on a suitable media and displaying said first field and said second field without temporally shifting either said first field or said second field. This is preferably conducted in order to create said simulated three-dimensional video frame by displaying said first field and said second field on a display device within a single frame such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual.

Additionally, the present invention provides a system and method for selective scaling wherein certain objects within a frame are scaled in time or space to appear in three dimensions while the nonmoving or slower moving objects within a frame are not transformed.

A further aspect of the present invention is a system and method for dynamic variance of temporal and/or spatial delay. Under this aspect, the temporal or spatial delay is dynamically varied according to what is taking place within the frames. If not much action is occurring, the temporal delay is increased. Conversely, if significant action is occurring, the temporal delay is decreased. Accordingly, for instance, during cut scenes, the transformation is scaled down or eliminated in order to make viewing easier on the eyes. Likewise, with spatial delays, when a cut scene is detected, the field that is ahead is preferably frozen until the field that is behind catches up. The spatial delay may also be increased or decreased to enhance 3-D effects according to the amount of action detected as occurring in the frame.

Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.

BRTEF DESCRIPTION OF THE DRAWINGS In order that the manner in which the above-recited and other advantages and objects of the invention are obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

Figure 1 is a diagram illustrating the conceptual processing that occurs in one embodiment of the present invention;

Figure 2 illustrates the conceptual processing that takes place in another embodiment of the present invention;

Figures 3 A through 3D illustrate various transformations that may be used to impart visual clues to the synthesized three-dimensional scene;

Figures 4A through 4D illustrate a specific example using a scaling transformation; Figure 5 illustrates temporal transformation; and

Figures 6 A through 8B illustrate the various circuitry of one embodiment of the present invention.

Figure 9 is a schematic block diagram illustrating the components of one embodiment of a system for recording and broadcasting three-dimensional video.

Figure 10 is a schematic block diagram illustrating the components of one embodiment of a system for selective scaling of three-dimensional video.

Figure 11 is a schematic block diagram illustrating the general components of one embodiment of a system and method for dynamic variance of temporal and/or spatial delay. Figure 12 is one embodiment of a state diagram of a system and method of dynamic variance of temporal and/or spatial delay.

Figure 13 is one embodiment of a timing diagram of the system and method of dynamic variance of temporal and/or spatial delay.

Figure 14 is a schematic block diagram illustrating one embodiment of the components of a histogram circuit of the system and method of Figure 11.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Synthesizing three-dimensional video from a two-dimensional source One embodiment of the present invention is directed to systems and methods for synthesizing a three-dimensional video stream from a two-dimensional video source. The video source may be any source of video such as a television signal, the signal from a VCR, DVD, video camera, cable television, satellite TV, or any other source of video. Since the present invention synthesizes a three-dimensional video stream from a two-dimensional video stream no special video input source is required. However, if a video source produces two video channels, each adapted to be viewed by an eye of a user, then the present invention may also be used with appropriate modification. From the discussion below, those skilled in the art will quickly recognize the modifications that should be made.

The following discussion presents the basics of a video signal and is intended to provide a context for the remainder of the discussion of the invention. Although specific examples and values may be used in this discussion, such should be construed as exemplary only and not as limiting the present invention. As previously explained, the present invention may be adapted to be used with any video source.

In general, a video signal is comprised of a plurality of frames that are intended to be displayed in a sequential fashion to the user or viewer of a display device in order to provide a moving scene for the viewer. Each frame is analogous to the frame on a movie film in that it is intended to be displayed in its entirety before the next frame is displayed. Traditional display devices, such as television sets or monitors, may display these video frames in a variety of ways. Due to limitations imposed by early hardware, televisions display a frame in an interlaced manner. This means that first one sequence of lines is scanned along the monitor and the then another sequence of lines is scanned along the monitor. In this case, a television will scan the odd numbered lines first and then return and scan the even numbered lines. The persistence of the phosphor on the television screen allows the entire frame to be displayed in such a manner that the human eye perceives the entire frame displayed at once even though all lines are not displayed at once. The two different portions of the frame that are displayed in this interlaced manner are generally referred to as fields. The even field contains the even numbered scan lines, and the odd field contains the odd numbered scan lines. Due to hardware advances, many computer monitors and some television sets are capable of displaying images in a non-interlaced manner where the lines are scanned in order. Conceptually, the even field and odd field are still displayed, only in a progressive manner. In addition, it is anticipated that with the introduction of advanced TV standards, that there may be a move away from interlaced scanning to progressive scanning. The present invention is applicable to either an interlaced scanning or a progressive scanning display. The only difference is the order in which information is displayed.

As an example of the particular scan rates, consider standard NTSC video. Standard NTSC video has a frame rate of thirty frames per second. The field rate is thus sixty fields per second since each frame has two fields. Other video sources use different frame rates. This, however, is not critical to the invention and the general principles presented herein will work with any video source.

Referring now to Figure 1, a general diagram of the processing of one embodiment of the present invention is illustrated. In Figure 1, an input video stream, shown generally as 20 is comprised of a plurality of frames 22 labeled FI through F8. In Figure 1, frame 24 is extracted for processing. As illustrated in Figure 1, frame 24 is comprised of a plurality of scan lines. The even scan lines of frame 24 are labeled 26 and the odd scan lines of frame 24 are labeled 28. This is done simply for notational puφoses and to illustrate that a frame, such as frame 24, may be divided into a plurality of fields. Although two fields are illustrated in Figure 1, comprising even scan lines 26 and odd scan lines 28, other delineations may be made. For example, it may be possible to divide the frame into more than two fields.

The frame is digitized by encoder 30. Encoder 30, among other things, samples the video data of frame 24 and converts it from analog format to a digital format. Encoder 30 may also perform other processing functions relating to color correction/translation, gain adjustments, and so forth. It is necessary that encoder 30 digitize frame 24 with a sufficient number of bits per sample in order to avoid introducing unacceptable distortion into the video signal. In addition, it may be desirable to sample various aspects of the video signal separately. In NTSC video, it may be desirable to sample the luminescence and chrominance of the signal separately. Finally, the sample rate of encoder 30 must be sufficient to avoid introducing aliasing artifacts into the signal. In one embodiment, a 13.5 MHZ sample rate using sixteen bits to represent the signal has been found to be sufficient for standard NTSC video. Other video sources may require different sample rates and sample sizes. In Figure 1, the digitized frame is illustrated as 32 Digitized frame 32 is processed by modification processing component 34.

Modification processing component 34 performs various transformations and other processing on digitized frame 32 in order to introduce visual clues into the frame that, when displayed to a viewer, will cause the frame to be inteφreted as a three-dimensional image. A wide variety of processing may be utilized in modification processing component 34 to introduce appropriate visual clues. Various transformations and other processing are discussed below. In general, however, modification processing component 34 will prepare the frame to be displayed to a user so that the frame is inteφreted as a three-dimensional object. The transformations and other processing performed by modification processing component 34 often entail separating frame 32 into two or more components and transforming one component relative to the other. The resultant modified frame is illustrated in Figure 1 as 36.

After the frame has been modified, the next step is to save the modified frame and display it on a display device at the appropriate time and in the appropriate fashion. Depending on the processing speed of encoder 30 and modification processor 34, it may be necessary to hold modified frame 36 for a short period of time. In the embodiment illustrated in Figure 1, controller 38 stores modified frame 36 in memory 40 until it is needed. When it is needed, modified frame 36 is extracted and sent to the appropriate display device to be displayed. This may require controller 38, or another component, to control the display device or other systems so that the information is displayed appropriately to the viewer. The exact process of extracting the modified frame and displaying it on a display device will be wholly dependent upon the type of display device used. In general, it will be necessary to use a display device that allows one eye of a viewer to view a portion of the frame and another eye of the viewer to view the other portion of the frame. For example, one display system previously described separates the frame into two fields that are multiplexed on a single display device. A pair of shuttered glasses, or other shuttering device is then used so that one field is viewed by one eye while the other eye is covered and then the other field is viewed while the shutter switches. In this manner, one eye is used to view one field and the other eye is used to view the other field. The brain will take the visual clues introduced by modification processing component 34 and fuse the two fields into a single image that is inteφreted in a three-dimensional manner. Other mechanisms may also be utilized. These mechanisms include multidisplay systems where one eye views one display and the other eye views the other display. The traditional polarized or colored approach which utilizes a pair of passive glasses may also be used, as previously described. In the embodiment illustrated in Figure 1, controller 38 is illustrated as controlling a shuttering device 42 in order to allow images multiplexed on monitor 44 to be viewed appropriately. In addition, decoder 46 converts modified frame 36 from a digital form to an analog form appropriate for display on monitor 44. Decoder 46 may also generate various control signals necessary to control monitor 44 in conjunction with shuttering device 42 so that the appropriate eye views the appropriate portion of frame 36. Decoder 46 may also perform any other functions necessary to ensure proper display of frame 36 such as retrieving the data to be displayed in the appropriate order.

Referring now to Figure 2, a more detailed explanation of one embodiment of the present invention is presented. The embodiment of Figure 2 has many elements in common with the embodiment illustrated in Figure 1. However, a more detailed explanation of certain processing that is performed to modify the frame from two-dimensional to three-dimensional is illustrated.

In Figure 2 a video frame, such as frame 48, is received and encoded by encoder 50. Encoder 50 represents an example of means for receiving a frame from a two-dimensional video stream and for digitizing the frame so that the frame can be processed. Encoder 50, therefore, digitizes frame 48 among other things. The digitized frame is illustrated in Figure 2 as digitized frame 52. Encoder 50 may also perform other functions as previously described in conjunction with the encoder of Figure 1. Digitized frame 52 is split by splitter 54 into odd field 56 and even field 58.

Splitter 54 represents an example of means for separating a frame into a plurality of fields. Odd field 56 and even field 58 are simply representative of the ability to split a frame, such as digitized frame 52, into multiple fields. When interlaced display devices are utilized, it makes sense to split a frame into the even and odd fields that will be displayed on the device. In progressively scanned display devices, even and odd fields may be used, or other criteria may be used to split a frame into multiple fields. For example, at one time it was proposed that an advanced TV standard may use vertical scanning rather than the traditional horizontal scanning. In such a display device, the criteria may be based on a vertical separation rather than the horizontal separation as illustrated in Figure 2. All that needs happen is that splitter 54 separate frame 52 into at least two fields that will be processed separately. Odd field 56 and even field 58 are processed by modification processing components 60 and 62, respectively. Modification processing component 60 and 62 represent the conceptual processing that occurs to each of the fields separately. In actuality, the fields may be processed by the same component. Modification processing component 60 and 62 represent but one example of means for transforming at least one field using a selected transform. Such a means may be implemented using various types of technologies such as a processor which digitally processes the information or discrete hardware which transforms the information in the field. Examples of one implementation are presented below. In Figure 2, modified odd field 64 and modified even field 66 represent the fields that are transformed by modification processing components 60 and 62, respectively. Note that although Figure 2 illustrates modified field 64 and 66, in various embodiments one, the other, or both fields may be modified. The fields may be transformed in any manner that is desirable to introduce appropriate visual clues into the field, as previously explained.

Examples of some transforms that have been found useful to introduce visual clues in order to convert a two-dimensional video stream into a three-dimensional video stream are presented and discussed below. In general, such transforms involve shifting, scaling, or otherwise modifying the information contained in one or both fields. Note that the transforms performed by modification processing components 60 and 62 may be performed either in the horizontal direction, the vertical direction, or both.

Modified fields 64 and 66 are then stored by controller 68 in memory 70 until they are needed for display. Once they are needed for display, controller 68 will extract the information in the desired order and transfer the information to decoder 72. If the display requires an interlaced display of one field and then the other, controller 68 will transfer one field and then the other field for appropriate display. If, however, the display is progressively scanned, then controller 68 may supply the information in a different order. Thus, controller 68 represents an example of means for recombining fields and for transferring the recombined fields to a display device. In the alternative, certain of this functionality may be included in decoder 72.

Decoder 72 is responsible for taking the information and converting it from a digital form to an analog form in order to allow display of the information. Decoder 72 may also be responsible for generating appropriate control signals that controls the display. In the alternative, controller 68 may also supply certain control signals in order to allow proper display and inteφretation of the information. As yet another example, a separate device, such as a processor or other device, may be responsible for generating control signals that control the display device so that the information is properly displayed. From the standpoint of the invention, all that is required is that the information be converted from a digital format to a format suitable for use with the display device. Currently, in most cases this will be an analog format, although other display devices may prefer to receive information in a digital format. The display device is then properly controlled so that the information is presented to the viewer in an appropriate fashion so that the scene is inteφreted as three-dimensional. This may include, for example, multiplexing one field and then the other on the display device while, simultaneously, operating a shuttering device which allows one eye to view one field and the other eye to view the other field. In the alternative, any of the display devices previously discussed may also be used with appropriate control circuitry in order to allow presentation to an individual. In general, however, all these display systems are premised on the fact that one eye views a certain portion of the information and another eye views a different portion of the information. How this is accomplished is simply a matter of choice, given the particular implementation and use of the present invention.

Referring next to Figures 3 A through 3D, some of the transforms that have been found useful for providing visual clues that are included in the data and inteφreted by a viewer as three-dimensional. The examples illustrated in Figures 3A through 3D present transformations in the horizontal direction. Furthermore, the examples illustrate transformation in a single horizontal direction. Such should be taken as exemplary only. These transformations may also be used in a different horizontal direction or in a vertical direction. Finally, combinations of any of the above may also be used. Those of skill in the art will recognize how to modify the transformations presented in Figures 3 A through 3D as appropriate.

Referring first to Figure 3A, a skew transform is presented. This transform skews the data in the horizontal or vertical direction. In Figure 3 A a field that is to be transformed is illustrated generally as 74. This field has already been digitized and may be represented by a matrix of data points. In Figure 3 this matrix is five columns across by three rows down. The transformations used in the present invention will shift or otherwise modify the data of the field matrix. Typical field matrices are hundreds of columns by hundreds of rows. For example, in NTSC video an even or odd field may contain between eight and nine hundred columns and two to three hundred rows. The skew transform picks a starting row or column and then shifts each succeeding row or column by an amount relative to the column or row that precedes it. In the example in Figure 3 A, each row is shifted by one data point relative to the row above it. Thus, the transformed field, illustrated generally as 76, has row 78 being unshifted, row 80 being shifted by one data point, and row 82 being shifted by two data points. As illustrated in Figure 3 A, the data points of original matrix are thus bounded by dashed lines 84 and takes on a skew shape. The total shift from the beginning row to the ending row is a measure of the amount of skew added to the frame.

When each row is shifted, the data points begin to move outside the original matrix boundaries, illustrated in Figure 3A by solid lines 86. As the data points are shifted, "holes" begin to develop in the field matrix as illustrated by data points 88. Thus, the question becomes what to place in data points 88. Several options may be utilized. In one embodiment, as the data points are shifted they are wrapped around and placed in the holes created at the beginning of the row or column. Thus, in row 80 when the last data point was shifted outside the field matrix boundary it would be wrapped and placed at the beginning of the row. The process would be similar for any other rows. In the alternative, if the holes opened in the field matrix lie outside the normal visual range presented on the display, then they may simply be ignored or filled with a fixed value, such as black. In the alternative, various inteφolation schemes may be used to calculate a value to place in the holes. As previously mentioned, this transformation may be performed in the horizontal direction, the vertical direction, or a combination of both.

Referring next to Figure 3B, a shifting transform is presented. In the shifting transform, each row or column in the field matrix is shifted by a set amount. In Figure 3B, the unshifted field matrix is illustrated as 90, while the shifted field matrix is illustrated as 92. As indicated in Figure 3B, this again places certain data points outside the boundaries of the field matrix. The data points may be wrapped to the beginning of the row and placed in the holes opened up, or the holes that opened up may be filled with a different value and the data points that fall beyond the boundaries of the field matrix may simply be ignored. Again, various schemes may be used to fill the holes such as filling with a fixed data point or using a myriad of inteφolation schemes.

Figures 3C and 3D illustrate various scaling transformations. Figure 3C illustrates a scaling transformation that shrinks the number of data points in the field matrix while Figure 3D illustrates a scaling transformation that increases the number of data points. This would correspond to making something smaller and larger respectively. In Figure 3C, the unsealed matrix is illustrated as 96 while the scaled field matrix is illustrated by 98. When a scaling is applied that reduces the number of data points, such as a scaling illustrated in Figure 3C, appropriate data points are simply dropped and the remainder of the data points are shifted to eliminate any open space for data points that were dropped. Because the number of data points is reduced by the scaling, values must be placed in the holes that are opened by the reduced number of data points. Again, such values may be from a fixed value or may be derived through some inteφolation or other calculation. In one embodiment, the holes are simply filled with black data points.

Figure 3D represents a scaling that increases the number of data points in a field matrix. In Figure 3D the unsealed field matrix is illustrated by 100 and the scaled field matrix is illustrated by 102. Generally, when a matrix of data points is scaled up, the "holes" open up in the middle of the data points. Thus, again a decision must be made as to what values to fill in the holes. In this situation, it is typically adequate to inteφolate between surrounding data values to arrive at a particular value to put in a particular place. In addition, since the data points grow, any data points that fall outside the size of the field matrix are simply ignored. This means that the only values that must be inteφolated and filled are those that lie within the boundaries of the field matrix.

Although the transformations illustrated in Figures 3 A and 3D have been applied separately, it is also possible to apply them in combination with each other. Thus, a field may be scaled and then skewed or shifted and then skewed or scaled and then shifted.

Furthermore, other transformations may also be utilized. For example, transformations that skew a field matrix from the center outward in two directions may be useful. In addition, it may also be possible to transform the values of the data points during the transformation process. In other words, it may be possible to adjust the brightness or other characteristic of a data point during the transformation.

Referring next to Figures 4A through 4D, a specific example is presented in order to illustrate another aspect of the various transformations. It is important to note that when a field is shifted or otherwise transformed, it is possible to pick an alignment point between the transformed field and the other field. For example, it may be desirable to align the fields at the center and then allow the skewing, shifting, scaling, or other transforms to grow outward from the alignment point. In other words, when fields are transformed it is generally necessary to pick an alignment point and then shift the two fields in order to align them to the alignment point. This will determine how the values are then used to fill in the holes that are opened up. As a simple example, consider a skew transform which begins not at the first row as illustrated in Figure 3 A but at the center row. The rows above the center row may then be shifted one direction and the rows below the center row may then be shifted the other direction. Obviously such a skew transform would be different than a skewed transform which began at the top row and then proceeded downward or a skewed transform that began at the bottom row and then proceeded upward.

Referring first to Figure 4 A, an untransformed frame 104 is illustrated. This frame comprises six rows, numbered 105 through 110 and seven columns. The rows of the frame are first separated into an even field and an odd field. Odd field 112 contains rows 105, 107, and 109 while even field 114 contains rows 106, 108 and 110. Such a function may be performed, for example, by a splitter or other means for separating a frame into a plurality of fields. Splitter 54 of Figure 2 is but one example.

Referring next to Figure 4B, the process of transforming one or both fields is illustrated. In the example illustrated in 4B, odd field 112 will be transformed while even field 114 remains untransformed. The untransformed fields are illustrated on the left-hand side of Figure 4B while the transformed fields are illustrated on the right-hand side of Figure 4B. In this case, a scaling transform which increases the number of data points in the horizontal direction is applied to odd field 112. This results in transformed odd field 116. As previously explained in conjunction with Figure 3D, when a transform that expands the number of data points is applied, "holes" will open up between various data points in the field matrix. In Figure 4B, these holes are illustrated by the grey data points indicated by 118. These "holes" may be filled in any desired manner. As previously explained, a good way to fill these holes is to inteφolate among the surrounding data points in order to arrive at a value that should be placed therein. Referring next to Figure 4C, the alignment issues that can be created when a transform is applied are illustrated. Such a situation is particularly apparent when a transform is applied that changes the number of data points in a field. For example, transformed odd field 116 has ten columns instead of the normal seven. In such a situation, as previously explained it is desirable to pick an alignment point and shift the data points until the field matrices are aligned. For example, suppose it is desired to align the second column of transformed odd field 116 with the first column of even field 114. In such a situation, the fields would appropriately shifted as shown on the right hand side of Figure 4C. The edge of the field matrix is then indicated by dashed lines 120 and any data points that fall outside those lines can simply be discarded.

Picking an alignment point and performing the shifting in order to properly align the fields is an important step. Depending on the alignment point selected and the shifting that is performed, very different results may be achieved when the reconstructed simulated three-dimensional frame is displayed. Shifting tends to create visual clues that begin to indicate depth. In general, shifting one direction will cause something to appear to move out of the screen while shifting the other direction will cause something to appear to move into the background of the screen. Thus, depending on the alignment point and the direction of shift, various features can be brought in or out of the display. Furthermore, these effects may be applied to one edge of the screen or the other edge of the screen depending on the alignment point selected. Since most action in traditional programs takes place near the center of the screen, it may be desirable to apply transformations that enhance the three dimensional effect at the center of the screen.

Referring next to Figure 4D, the process of recombining the fields to create a simulated three-dimensional frame is illustrated. The left-hand side of Figure 4D illustrates transformed odd field 116 that has been cropped to the appropriate size. Figure 4D also illustrates even field 114. The frame is reconstructed by interleaving the appropriate rows as indicated on the right-hand side of Figure 4D. The reconstructed frame is illustrated generally as 122. Such a reconstruction may take place, for example, when the fields are displayed on a display device. If the display device is an interlaced display, as for example a conventional television set, then the odd field may be displayed after which the even field is displayed in order to create the synthesized three-dimensional frame.

In various embodiments of the present invention, the synthesized three-dimensional frame is referred to as being constructed from a recombining of the various fields of the frame. The reconstructed frame is then illustrated as being displayed on a display device. In actuality, these two steps may take place virtually simultaneously. In other words, in the case of an interlaced monitor or display device, one field is displayed after which the other field is displayed. The total display of the two fields, however, represents the reconstructed frame. Similarly, if a two-display system is utilized, then the total frame is never physically reconstructed except in the mind of the viewer. However, conceptually the step of creating the synthesized three-dimensional frame by recombining the fields is performed. Thus, the examples presented herein should not be construed as limiting the scope of the invention, but the steps should be inteφreted broadly. The embodiments presented above have processed a frame and then displayed the same frame. In other words, the frame rate of the output video stream is equal to the frame rate of the input video stream. Technologies exist, however, that either increase or decrease the output frame rate relative to the input frame rate. It may be desirable to employ such technologies with the present invention.

In employing technologies that increase the output frame rate relative to the input frame rate, decisions must be made as to what data will be used to supply the increased frame rate. One of two approaches may be used. The first approach is simply to send the data of a frame more often. For example, if the output frame rate is doubled, the information of a frame may simple be sent twice. In the alternative, it may be desirable to create additional data to send to the display through further transformations. For example, two different transformations may be used to create two different frames which are then displayed at twice the normal frame rate.

The embodiments and the discussions presented above have illustrated how a single frame is broken down into two or more fields and those fields are then processed and then recombined to create a synthesized three-dimensional frame. An important aspect of the embodiments presented above is that they do not temporarily shift either of the fields when performing the synthesis of a three-dimensional frame. In other words, both the fields are extracted from a frame the fields are processed and then the fields are displayed within the exact same frame. In the alternative, however, with certain transformations it may be desirable to introduce a temporal transformation or a temporal shift into the processing that creates the synthesized three-dimensional frame. Referring next to Figure 5, the concept of temporal shifting is presented.

In Figure 5 an input video stream comprising a plurality of frames is illustrated generally as 124. In accordance with the present invention, a single frame is extracted for processing. This frame is illustrated in Figure 5 as 126. The frame is broken down into a plurality of fields, as for example field 128 and 130. As previously discussed, although two fields are illustrated, the frame may be broken into more than two fields if desired.

The individual fields are then processed by applying one or more transformations as illustrated in Figure 5 by modification processing components 132 and 134. Modified field 130 is illustrated as field 136. In the case of field 128, however, the embodiment illustrated in Figure 5 introduces a temporal shift as illustrated by delay 138. Delay 138 simply holds the transformed field for a length of time and substitutes a transformed field from a previous frame. Thus, a field from frame 1 may not be displayed until frames 2 or 3. A delayed field, illustrated in Figure 5 as 140, is combined with field 136 to create frame 142. Frame 142 is then placed in the output video stream 144 for proper display.

Referring next to Figures 6 A through Figures 8B, one embodiment of the present invention is presented. These figures represent circuit diagrams with which one of skill in the art is readily familiar. The discussion which follows, therefore, will be limited to a very high level which discusses the functionality incoφorated into some of the more important functional blocks. The embodiments illustrated in Figures 6A through 8B is designed to operate with a conventional display, such as a television and shuttered glasses which operate to alternatively block one eye and the other so that one field of the frame is seen by one eye and another field of the frame is seen by the other eye.

Referring first to Figure 6 A, a first part of the circuitry of the embodiment is presented. In Figure 6 A, processor 144 is illustrated. Processor 144 is responsible for overall control of the system. For example, processor 144 is responsible for receiving various user input commands, as from a remote control or other input devices in order to allow user input for various parameters of the system. Such inputs may, for example, adjust various parameters in the transforms that are used to produce the synthesized three-dimensional images. Such an ability allows a user to adjust the synthesized three-dimensional scene to suit his or her own personal tastes. Processor 144 will then provide this information to the appropriate components. In addition, processor 144 may help perform various transformations that are used in producing the synthesized three-dimensional scenes. Figure 6A also illustrates a schematic representation of shuttered glasses 150, which is discussed in greater detail below.

Figure 6B illustrates a block level connection diagram of video board 146. Video board 146 will be more particularly described in conjunction with Figures 7 A through 71 below. Video board 146 contains all necessary video circuitry to receive a video signal, digitize the video signal, store and receive transformed fields in memory, reconvert transformed fields back to analog signals, and provide the analog signals to the display device. In addition, video board 146 may contain a logic to generate control signals that are used to drive shuttered glasses used by this embodiment to produce a synthesized three-dimensional effect when worn by a viewer.

Block 148 of Figure 6C contains a schematic representation of the drivers which are used to drive the shuttered glasses. The shuttered glasses are illustrated schematically in Figure 6A by block 150. Figures 6D - 6F contain various types of support circuitry and connectors as for example, power generation and filtering, various ground connectors, voltage converters, and so forth. The support circuitry is labeled generally as 152.

Referring next to Figures 7A through 71 a more detailed schematic diagram of video board 146 of Figure 6B is presented. Video board 146 comprises decoder 154 (Figure 7 A), controller 156 (Figure 7B), memory 158 (Figures 7C and 7D), and encoder 162 (Figure 7E). In addition, in Figure 7F an alternate memory configuration is illustrated as block 160.

Various support circuitry is illustrated in Figures 7G through 71. Block 164 of Figure 7G contains various input circuitry that receives video and other data from a variety of sources. Block 165 of Figure 7G illustrates how the pinouts of video board 146 of Figure 6B translate into signals of Figures 7 A through 71. Block 166 of Figures 7H and 71 contains output and other support circuitry.

Decoder 154 (Figure 7A) is responsible for receiving the video signal and for digitizing the video signal. The digitized video signal is stored in memory 158 (Figures 7C and 7D) under the control of controller 156 (Figure 7B). Controller 156 is a highly sophisticated controller that basically allows information to be written into memory 158 while, information is being retrieved from memory 158 by encoder 162 (Figure 7E) for display. The various frames and fields of an input video received by decoder 154 may be identified from the control signals in the video data. The fields may then be separated out for processing and transformation, as previously described.

It should be noted that if transformations occur in the horizontal direction, then the transformation may be applied line by line as the field is received. If, on the other hand, a transformation occurs in the vertical direction, it may be necessary to receive the entire field before transformation can occur. The exact implementation of the transformations will be dependent upon various design choices that are made for the embodiment.

Turning now to controller 156 of Figure 7B, it should be noted that in addition to storing and retrieving information from memory 158, controller 156 also generates the control signals which drive the shuttered glasses. This allows controller 156 to synchronize the shuttering action of the glasses with the display of information that is retrieved from memory 158 and passed to encoder 162 for display on the display device.

Encoder 162 (Figure 7E) takes information retrieved from memory 158 and creates the appropriate analog signals that are then sent to the display device.

Alternate memory 160 (Figure 7F), which is more fully illustrated in Figures 8 A and 8B, is an alternate memory configuration using different component parts that may be used in place of memory 158. Figure 8A illustrates the various memory chips used by alternate memory 160. Figure 8B illustrate how the pinouts of Figure 7F translate into the signals of Figures 8 A and 8B in pinout block 161. Figure 8B also illustrates filtering circuitry 163.

In summary, the present invention produces high-quality, synthesized, three-dimensional video. Because the present invention converts a two-dimensional video source into a synthesized three-dimensional video source, the present invention may be used with any video source. The system will work, for example, with television signals, cable television signals, satellite television signals, video signals produced by laser disks, DVD devices, VCRs, video cameras, and so forth. The use of two-dimensional video as an input source substantially reduces the overall cost of creating three-dimensional video since no specialized equipment must be used to generate an input video source.

The present invention retrieves the video source, digitizes it, splits the video frame into a plurality of fields, transforms one or more of the fields, and then reassembles the transformed fields into a synthesized, three-dimensional video stream. The synthesized three-dimensional video stream may be displayed on any appropriate display device. Such display devices include, but are not limited to, multiplexed systems that use a single display to multiplex two video streams and coordinate the multiplexing with a shuttering device such as a pair of shutter glasses worn by a viewer. Additional display options may be multiple display devices which allow each eye to independently view a separate display. Other single or multidisplay devices are also suitable for use with the present invention and have been previously discussed.

Recording Transformed Three-Dimensional Video

In accordance with another aspect of the present invention, the three-dimensional conversion system and method mentioned above may also be used to record three- dimensional versions of video works for later viewing. The recorded and converted work may then be viewed with the use of a relatively simple and inexpensive synchronizer which synchronizes the viewing glasses with the three-dimensional video display.

One embodiment of a three-dimensional recording and broadcasting system 200 is shown in Figure 9. As seen in Figure 9, when recording a three-dimensional work, the work is first obtained from a two-dimensional source 202. The two-dimensional source 202 is typically as described above, and may be recorded in the NTSC format, and may be on DVD, VHS, Beta, or other video media. The two-dimensional video source 202 is preferably broken into fields 204, 206.

The two-dimensional video source 202 is in one embodiment input as a video stream through an encoding and transforming module 210. The encoding and transforming module 210 performs the steps of encoding and transforming the video stream to simulate three- dimensional video to the user in the manner described above. In so doing, the encoding and transforming module 210 may convert the video stream from the two-dimensional source 202 into a three-dimensional stream 212. The three-dimensional stream 212 contains odd and even fields 204, 206 which are transformed spatially and/or temporally to simulate a three- dimensional frame. The resulting three-dimensional video stream 212 is then recorded onto the appropriate media with a recording module 214. Once again, the media may be any suitable video media, including DVD, VHS, and Beta. The three-dimensional video stream 212 is passed from the encoding and transforming module 210 into the recording module 214. The recording module may be a readable DVD, a disk drive device, a VHS recorder, etc. The resultant recording 215 may then be used for later broadcast or private viewing within a user's home.

Broadcasting Three-Dimensional Video In a further aspect of the present invention, the system 200 may be used for broadcasting three-dimensional video. Accordingly, as seen in Figure 9, a three-dimensional video stream 212 is obtained. The three-dimensional video stream 212 may be obtained by conversion of a two-dimensional video source 202 as described. Alternatively, the three- dimensional video stream 212 may be produced by other transformation techniques, including spatial and temporal displacement of one field relative to the other. Additionally, the video stream may be received as production three-dimensional video shot by dual camera lenses or other known three-dimensional production techniques.

The three-dimensional video is transmitted with three-dimensional transformation information for later assembly at the viewer station. This information may be digital video fields for recombining at the viewer station or other suitable information to enable a decoder at the viewer station to assemble the video into three-dimensional video.

The three-dimensional video stream 212 may be recorded onto storage media such as the recording 215, or may be converted on-the-fly from a two-dimensional video source 202. The two-dimensional video source 202 may be live or a recording. In either case, the video stream 202 is passed through the encoding and transforming module 210 to produce three- dimensional transformation information. As discussed, in one embodiment, this information comprises a resulting three-dimensional video information stream 212.

The three-dimensional video information stream 212 may then be recorded and the recording supplied to a transmission station 216. Alternatively, the three-dimensional video stream 212 is transmitted directly to the transmission station 216. The transmission station 216 converts the three-dimensional transformation information into a suitable format for transmission to receiving stations 226. In the depicted embodiment, the format is MPEG video. The MPEG video is uplinked 220 to a satellite 222 at a high rate of speed, in one embodiment, 25 megabits per second.

Alternatively, the transmission may be by cable, radio magnetic frequencies, etc. The transmission may also be received directly by viewing stations 230. Referring again to the depicted embodiment, once the three-dimensional video information stream 212 is uplinked 220 to the satellite 222, the satellite 222 then transmits or broadcasts 224 the video stream 212 to a receiving station 226. In the depicted embodiment, the receiving station 226 is a cable company. A satellite dish 228 at the receiving station 226 receives the satellite transmission or broadcast 224. Of course, many such local cable companies 226 preferably receive the broadcast 224 at the same time.

The receiving stations 226 may be located at varying locations throughout the world. If necessary, the video stream 212 may be passed between several satellites or even several ground links prior to being received by the receiving station 226.

In one embodiment, the receiving station 226 decodes the video stream from the MPEG or other format in which it was transmitted. The result is a standard video stream 229 containing the three-dimensional transformation information described above. The video stream 229 is then transmitted through communication channels 227 to individual users. The communication channels 227 may be satellite transmission to satellite disks. The communication channels 227 may also be comprise cable transmission, RF transmission, direct links, and the like.

Alternatively, the satellite 222 may broadcast 238 the video stream 212 directly to the user stations 230 In this embodiment, the user stations 230 are provided with receiving stations 240 which may comprise satellite disks 242 for receiving the satellite broadcast 238. The satellite disks may comprise C-band receivers or small dish receivers. The video stream 212 is broadcast in a format 229 receivable by the user stations 230, depending upon the particular medium chosen.

The video stream 229 is eventually received by the viewing stations 230. At the viewing stations 230, a decoding module is preferably provided for decoding the three- dimensional transformation information. The decoding module may decode the video stream in a manner suitable to the format of the three-dimensional transformation information. In one embodiment, the decoding module comprises a synchronizing module 236. Under this embodiment, the synchronizing module 236 is used read the transformation information in the form of a vertical synchronization signal of each of the odd and even fields 204, 206 of each frame of the video stream 229. Synchronization signals are then transmitted to viewer glasses 244 which alternately shutter the lenses 246 thereof, as described above.

Thus, the viewer is allowed to view the three-dimensional video 229 from the comfort of his/her home. Conveniently, under the present invention, the encoding and transforming module 210 need not be supplied to each viewing station 230. Instead only the synchronizing module 236 and the viewer glasses 244 are needed at each viewing station 230. It is an important concept of the present invention that the three-dimensional video stream 229 may be of several types which are seamlessly mixed and programmed for the viewer. In one embodiment, a separate three-dimensional video channel programming station is used to program up to 24 hours per day of programming most of all of which is in three- dimensional video format. As part of this programming, pre-recorded three-dimensional video may be used.

Also, live video which is shot with dual camera or other three-dimensional video generation techniques may also be interspersed therein. Additionally, live events such as sporting events can be translated on-the-fly by the system 200 of the present invention and immediately transmitted through the above-described channels to the viewing stations 230. The end viewer then views each of these different types of three-dimensional video seamlessly through his/her viewing glasses 244 without ever knowing that the video is generated in different manners. The three-dimensional video is in one embodiment transformed with spacial displacement to simulate the three-dimensional effect as described above. Nevertheless, the three-dimensional may also be transformed by temporal displacement to simulate the three- dimensional effect to the viewer. This temporal displacement may be modified on-the-fly according to the amount of motion in the particular scenes being depicted. Thus, for instance, objects moving fast, such as a race car, may be displaced by less frames, e.g, one frame displacement. Objects not moving at may be displaced more, such as three frame displacement. Objects moving slow may have an intermediate displacement, for instance, a two frame displacement. This manner of selective scaling will be discussed in greater detail below.

In a further embodiment, the video stream 212 may be transformed within a studio prior to broadcasting and be altered or "sweetened" through computer manipulation. The sweetening may comprise different types of manipulation of the different fields 202, 204 of the three-dimensional video stream 212. For instance, under this sweetening scheme, video may be added into the stream 212 through animation or inteφolation. Colors and tones may be changed. For instance, colors which appear to pop out of the screen may be added in instances, and colors which appear to regress into the screen may be added in other instances.

Shadowing effects may be added, objects may be added, and the two-dimensional video stream may be passed through the encoding and transforming module 210 in part, in whole, or not at all. The resultant sweetened video stream is then mastered into a recording 215 and distributed for viewing to the end stations 230. This distribution may be through video rental shops, direct cable, or through the broadcast network described above.

Selective Scaling A further aspect of the present invention is a system and method for selective scaling.

According to one embodiment of the present invention, selective scaling comprises expanding or reducing objects within a frame while leaving the background the same or scaling the background in the other direction. In one embodiment, the objects are chosen based on the pixel movement from frame to frame. By taking the difference in pixels from succeeding frames that change beyond a certain threshold, a shape can be isolated. This shape represents the movement or changes from frame to frame. Complete scene changes would be detected as a massive change and would not be scaled. Slight changes from one frame to the next would also not be scaled.

Referring to Figure 10, shown therein is a schematic block diagram of one embodiment of a selective scaling system. The system 250 comprises a video in line 252 on which digitized video is received into an incoming video sequence storage module 254. The storage module 254 disassembles the video into a plurality of frames and stores those individual frames temporarily on frame buffers 256. The buffers 256 are also used in the embodiments described above for effecting frame delays.

The differences in the individual pixels of the different frames stored in the frame buffers 256 is observed and calculated with a pixel difference sequence control. The differences between frames is stored in a pixel difference buffer 262. Using these differences, the outline of a shape of an object that is moving faster or more slowly than the remainder of objects in the frame is determined with a determine shape difference module 266.

Once a shape that is moving more or less than the remainder of objects in a frame is determined, the shape is scaled with a calculate scaling data module 268. In so doing, the shape may be scaled to pop out more from the screen, or to regress into the screen. In so doing, the shape is preferably transformed to have a lesser or greater temporal or spatial displacement from the other objects in the frame. This process is preferably conducted for all frames in which the shape is distinguishable as moving faster or slower than others in the frame. Thus, shapes can be made to "pop out" from the screen. In so doing, other objects or background could be scaled in the opposite direction, to further exaggerate movement of the shape.

The calculate scaling data module 268 temporarily stores the calculated transformation data in scaling data buffers 270, until the transformation can be completed by successive circuitry. Thus, a video control out module 260 coordinates, together with a master logic and timing control 264, the addition of the selectively scaled data into the frames. This is done through a scale shape logic module 272 which provides the scaling data from the buffers 270 to a scaling process 274 which in one embodiment operates similar to the embodiments discussed above, to temporally or spatially transform one field with respect to another, with the exception that the selected moving objects are transformed to a greater or lesser degree as described. Of course, only the object could be transformed, and the manner of so doing will be readily apparent from the present discussion and the transformation techniques discussed above.

Once the frames are transformed, they are provided to a video out buffer 276 which provides them to a video out line 278 in a sequence and with a selected timing for recording, broadcasting, and/or viewing.

Dynamic Variance of Temporal and/or Spatial Delay A further aspect of the present invention is a system and method for dynamic variance of temporal and/or spatial delay. According to this aspect of the present invention, the three- dimensional transformations conducted above, are dynamically varied according to what is taking place within the frames of the video stream. This is preferably measured by a change of luminescence from one screen to the next. Thus, in one embodiment, if little action is occurring, the temporal or spatial transformation is increased. Likewise, if there is significant movement, the transformation may be reduced. During lulls and times of little movement, the transformation is preferably increased. During cut scenes, the transformation is preferably substantially scaled down or totally eliminated in order to make viewing easier on the eyes.

Figure 11 illustrates one embodiment of a system for effecting dynamic variance of temporal and/or spatial delay. The system 300 of Figure 11 is shown comprising a histogram circuit 304, a processor 320, and a memory controller 324. In one embodiment, the processor 320 is the processor U5 of Figure 6A. A series of lines 302 (eight lines in one embodiment) carries a video stream Y-in into the histogram circuit where the video stream is examined for movement. The histogram circuit examines the video stream for activity and communicates with the processor 320 over lines D0-D7 306, Start* 308, RD* 310, FC* 312, Func(0...1) 314, LD* 316, and Clear 318. Lines D0-D7 306 provides the output of the histogram circuit to the processor 320. The remainder of the lines 308, 310, 312, 314, 316, 318 are for control puφoses.

In one embodiment, the histogram circuit 300 samples one of two fields (e.g., the even or odd field) in an area of interest, sums up the luminance values of all pixels in that field for a given number of frames, (in one example, five frames are summed) and compares the sum of luminance values for each new frame to the average value for the previous five frames. A threshold value is set, and if the luminance varies by the threshold value, a signal is sent to the processor indicating that the threshold value has been exceeded. The processor then takes action to phase in or phase out three-dimensional transformation.

In an alternative embodiment, the histogram circuit 304 merely sums each pixel in each line in an area of interest, then sums the lines in the area of interest and passes the luminance of the most recent frame to the processor 320 on lines 306. The processor 320 then compares to new line to an average value of the last five lines to see if the luminance of the most recent frame is within the threshold value of the average luminance for the five previous frames. Again, action is taken according to the variance of luminance in the recent frame. The memory controller 324, which may correspond to controller 38 of Figure 1, controls RAM memory, which may correspond to memory 40 of Figure 1. In one embodiment, the RAM memory is provided with frames of a digitized video stream, as discussed above. Preferably, both transformed versions and original versions of several video frames are stored in the memory. The transforms may be conducted as discussed above, and in one embodiment comprises several types of transformations, including horizontal and vertical scaling, horizontal and vertical shifting, and a temporal time delay.

The memory controller 324 receives instructions from the processor 320 regarding which frames of the video stream are to be passed along and viewed by a viewer. When it is determined that the luminance variance has not exceeded the threshold value, the transformed frames of one of the two fields (horizontal and vertical) are selected for viewing. When the it is determined that the threshold has been exceeded, the three-dimensional transformation is phased out by selecting the original, untransformed frames to be viewed. This phasing out may be sudden, as when a cut scene occurs. It may also be gradual, by signaling the transformation circuitry to transform the selected field to a lesser degree. It may also be scaled by the particular change in luminance.

Referring now to Figure 12, shown therein is a state chart 330 illustrating the various states of a histogram circuit of one embodiment of the present invention. Timing for the histogram circuit is illustrated in Figure 13, while one embodiment of a histogram circuit 360 suitable for use with the present invention is illustrated in Figure 14.

In Figure 12, when in a 0 0 histogram mode 331 of the state chart 330, pixel data of a frame of a two-dimensional video stream is sampled during positive clock pulses on a line Y- in(0...7) 362 through a D-type flip-flop 376. The lines 362 carry a luminance value for each pixel. In the depicted embodiment, there are 256 possible luminance values for each pixel. The RAM memory 388 (which preferably comprises SDRAM) is provided with 256 storage locations, one for each possible luminance value. For each pixel within the current line, the luminance of that pixel is sampled and the particular luminance value is used to reference one of the memory locations of the RAM memory 388 according to the luminance values stored therein. Thus, for instance, if a sampled pixel had a luminance value of 155, a one is added to the 155 address location of the RAM memory 388.

This is conducted for an entire line of pixels in the area of interest. In one embodiment, the area of interest is the center 256 by 256 pixels of each frame. Accordingly, each line of the area of interest comprises 256 pixels, and there are 256 such lines that are observed. When each pixel of the area of interest of a line has been read, the processor transmits signals to a function decoder 416 to cause a state machine 404 to alter the state to 01 (referenced at 332 in Figure 12). State 01 is the histogram accumulator state. In this state, the pixels of the current line are added to the current pixel total for the current frame. The addition is conducted with a summing node 398 and a 3-to-one input Multiplexor 420. The sum is passed on a line 390 back to the RAM memory 388, and the total sum of luminescence for all previous lines in a frame is read by a plurality of output flip-flops 434 and stored therein. This process continues, summing lines at a 0 0 histogram state 331 and adding the current line to the running total at a 0 1 histogram accumulator state 332 until all lines of a frame have been read and added. When all lines of a frame have been read and added, the processor signals the function decoder 416 to cause the state control machine 404 to go to state 1 0, the data out state 333. In this state, the sum located in the output flip-flops 434 is passed to the processor 320. The processor then, as discussed, compares the newest total luminescence to the average luminescence for the previous five frames to determine if the threshold value has been exceeded. Of course, the comparison could also be done with logic circuitry, such as a comparator.

If the threshold has been exceeded, the processor 320 then signals the memory controller 324 to select normal, nontransformed frames for viewing, rather than transformed frames. The viewer thus experiences a reduction or elimination of the three-dimensional effect. Alternatively, of course, if the luminance increases by a certain value, rather than decreases, the three-dimensional transformations may be resumed or increased.

Figure 13 is a timing diagram illustrating the timing occurring on various clock, data, and control lines. The timing diagram is broken up into histogram timing, histogram accumulate timing, and data out timing. In the histogram timing, one embodiment of timing 332 for a elk line 364 is shown, together with timing 334 for a start* line 366, timing 336 for the Yin data line 362, and timing 338 on a RAM in line 386.

In the histogram accumulate timing, the timing 340 for the elk line 364, timing for the start line 366, and timing 334 for an accumulate data operation of the output buffers 434 are shown. In a data out timing diagram, the timing 346 of a rd* line 368 is shown, as well as the timing 348 of OSEL lines 438, and the timing 350 of the DO lines 440 which are read by the processor 320 on node 458.

The elk line 364 is connected to a fixed frequency clock. As discussed, the timing signals on the lines 366 and 368, together with other timing signals are generated by the processor 320 in a manner that will be readily appreciated and understood by one of skill in the art.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

CLAIMS:

1. A method for broadcasting a video stream that is synthesized from two- dimensional video into three-dimensional video, the method comprising the steps of: receiving a two-dimensional digitized video stream comprising a plurality of video frames configured for sequential display on a display device, each frame comprising a plurality of fields which together contain all digital video information to be displayed for a frame; generating information adapted to transform at least one of said plurality of fields into a format that when assembled with others of the plurality of fields provides a viewer with a three dimensional image; and broadcasting the information from a broadcasting station for reception by a viewer station.

2. The method of claim 1 , further comprising: providing a decoding device at the viewer station; receiving the information broadcast from the broadcasting station into the decoding device; assembling the information into a three-dimensional image for display on a television set; and displaying the three-dimensional image on a television set.

3. The method of claim 1, wherein the information comprises a spatial transformation of the at least one of said plurality of fields.

4. The method of claim 3, wherein said transformation step comprises spatial transformation and further comprises shifting one field in the vertical direction relative to the other field.

5. The method of claim 4, wherein said spatial transformation step scales said one field in the vertical direction relative to said other field.

6. The method of claim 1, wherein the information comprises a temporal transformation of the at least one of said plurality of fields.

7. The method of claim 1 , further comprising: receiving and assembling the information into a simulated three-dimensional video frame on a display device disposed at said viewer station by alternating a first field and a second field such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual.

8. The method of claim 1, further comprising: separating said plurality of fields of said two-dimensional digital video frame into at least a first field and a second field; extracting from said video stream a single two-dimensional digital video frame for processing; separating said plurality of fields of said single two-dimensional digital video frame into at least a first field and a second field; spatially transforming at least one of said first field or said second field in order to produce a simulated three-dimensional video frame when said first field and said second field are recombined and viewed on a display device; and displaying said first field and said second field without temporally shifting either said first field or said second field in order to create said simulated three- dimensional video frame by displaying said first field and said second field on a display device within a single frame such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual.

9. The method of claim 8, wherein said first field and said second field each comprise a plurality of pixels arranged in a matrix having a plurality of rows and columns, and wherein said spatial transformation step skews one field in the vertical direction relative to the other field by performing at least the steps of: selecting a total skew value; selecting a starting column of pixels; and for each column after said selected starting column, shifting the column relative to the preceding column in a chosen vertical direction by a predetermined value derived from the total skew value.

10. The method of claim 8, further comprising the step of temporally shifting at least one of said first field or said second field in order to introduce a time delay relative to its original location in said two dimensional video stream.

11. The method of claim 1 , further comprising providing three-dimensional sweetening transformation to the information through computer manipulation prior to broadcasting the information.

12. A method for broadcasting a three-dimensional video stream created from a two-dimensional video stream, the method comprising the steps of: receiving a two-dimensional digitized video stream comprising a plurality of video frames that are intended to be displayed sequentially on a display device, each frame comprising a plurality of fields which together contain all digital video information to be displayed for a frame; extracting from said video stream a single two-dimensional video frame for processing; separating said plurality of fields of said single two-dimensional video frame into at least a first field and a second field; spatially transforming at least one of said first field or said second field using a transform comprising at least one of (a) a skewing transform that skews one field relative to the other, (b) a scaling transform that scales one field relative to the other, and (c) a shifting transform that shifts one field relative to the other; recombining said first field and said second field without temporally shifting either said first field or said second field in order to create in order to produce a simulated three-dimensional video frame; broadcasting the first and second fields from a broadcasting station to a plurality of viewers; and displaying said simulated three-dimensional video frame on a display device by alternating said first field and said second field such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual.

13. A system for broadcasting and displaying a three dimensional video stream created from a two dimensional video stream, said two dimensional video stream comprising a plurality of video frames intended to be sequentially displayed on a display device, each of said video frames comprising at least a first field and a second field, said system comprising: a receiving module configured to receive a frame of said two dimensional video stream and for digitizing said frame so that said frame can be further processed by the system; a separating module configured to separate said frame into at least a first field and a second field, each of said fields containing a portion of the video data in said frame; a transforming module configured to transform at least one of said first field or said second field using a selected transform that will produce a simulated three- dimensional video frame when said first field and said second field are recombined and displayed on a display device; a recombining module configured to recombine said first field and said second field for transferring said recombined first field and second field to a display device in order to create said simulated three-dimensional video frame; a broadcasting station configured to broadcast the first and second fields to a plurality of viewers; and a decoding module configured to control said display device so that said first field is viewed by one eye of an individual viewing said display device and said second field is viewed by the other eye of the individual.

14. The system of claim 13, wherein the recombining module is configured to recombine said first field and said second field without temporally shifting either said first field or said second field.

15. The system of claim 13, wherein the transforming module is configured to transform at least one of said first field or said second field with a spatial transformation.

16. The system of claim 13, wherein the transforming module is configured to transform at least one of said first field or said second field with a temporal transformation.

17. The system of claim 13, wherein said selected transform comprises a skew transform that skews one field in the horizontal direction relative to the other field.

18. The system of claim 13, wherein said selected transform comprises a skew transform that skews one field in the vertical direction relative to the other field.

19. The system of claim 13 wherein said selected transform comprises a shift transform that shifts one field in the horizontal direction relative to the other field.

20. The system of claim 13, wherein said selected transform comprises a shift transform that shifts one field in the vertical direction relative to the other field.

21. The system of claim 13, further comprising a sweetening module configured to add sweetening information to the video stream through computer manipulation prior to broadcasting the first and second fields.

22. A method for recording a three-dimensional video stream that is synthesized from a two-dimensional video stream, comprising the steps of: receiving a two-dimensional digitized video stream comprising a plurality of video frames that are intended to be displayed sequentially on a display device, each frame comprising a plurality of fields which together contain all digital video information to be displayed for a frame; extracting from said video stream a single two-dimensional digital video frame for processing; separating said plurality of fields of said single two-dimensional digital video frame into at least a first field and a second field; spatially transforming at least one of said first field or said second field in order to produce a simulated three-dimensional video frame when said first field and said second field are recombined and viewed on a display device; recording on a suitable media the plurality of fields; and displaying said first field and said second field without temporally shifting either said first field or said second field in order to create said simulated three- dimensional video frame by displaying said first field and said second field on a display device within a single frame such that said first field is viewed by one eye of an individual viewing the display device and said second field is viewed by the other eye of the individual.