|Publication number||WO2000067249 A1|
|Publication date||9 Nov 2000|
|Filing date||26 Apr 2000|
|Priority date||29 Apr 1999|
|Application number||PCT/US2000/011182|
|Inventors||Jeffrey D. Marsh|
|Applicant||Marsh Jeffrey D|
SYSTEM FOR STORING, DISTRIBUTING, AND COORDINATING DISPLAYED TEXT OF BOOKS WITH VOICE SYNTHESIS
This invention relates to a system and a method for the storage of a multiplicity of books (or other text based documents) in a repository (e.g., a computer database), which allows a user to select one of such books and to produce a spoken version of such book. This selected book will then be stored on media (e.g., a CD disk or other suitable computer readable media) so that a text-to-speech version of the book may be stored and be re-played at will by an end user.
Background Art
Personal Computers (PCs) and laptop computers now typically include both sound cards and compact disk (CD) players. Laptop computer capability continues to dramatically increase and their prices continue to fall such that laptop computers are now widely available to many people.
Compact disk recorders (CDR) are now capable of 4X recording speeds and CD readers are now capable of 24X and higher speeds. A typical 20 Mbyte file can be recorded in less than 36 seconds. Data transmission rates of 20 Mbytes per second are now achievable and cost effective. Data for an audio book of the present invention can be compressed to 1 Kbyte per page.
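The recording-time figure above can be checked with a short calculation. The following is a minimal sketch, assuming the usual 1X CD-ROM data rate of 150 KiB per second and ignoring lead-in and lead-out overhead; the function name `record_seconds` is hypothetical and is not part of the described system.

```python
# Assumption: 1X CD-ROM data rate of 150 KiB/s; disc lead-in/lead-out overhead is ignored.
CD_1X_BYTES_PER_SEC = 150 * 1024


def record_seconds(file_bytes: int, speed_multiple: int) -> float:
    """Return the time needed to write file_bytes at a multiple of the 1X rate."""
    return file_bytes / (speed_multiple * CD_1X_BYTES_PER_SEC)


# A 20 Mbyte file (taken here as 20 * 1024 * 1024 bytes) on a 4X recorder.
seconds_at_4x = record_seconds(20 * 1024 * 1024, 4)
print(f"{seconds_at_4x:.1f} s")  # roughly 34 s, under the 36 s stated in the text
```

Under these assumptions the result is consistent with the statement that a 20 Mbyte file can be recorded in less than 36 seconds.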
The current "print and hold" method of book distribution has been recognized as very inefficient. In such "print and hold" method of distributing books, books are printed on paper and bound, sent to distributors who in turn distribute the printed books to stores, and then the books are sold to customers.
While such prior book distribution and marketing systems worked well for many years, there were problems because many of the books went unsold by the stores and thus were returned to the distributor or to the publisher. Moreover, resources were tied up in keeping books in stock. Still further, once a book went "out of print", consumers could not order such books. In recent years, on demand publishing systems, such as shown in U. S. Patent 5,465,213, have addressed such problems by storing a large number of books in digital form on a computer, by allowing a customer to review information on the books so stored and to select a book to be ordered, and by allowing the printing and binding of single copies of such books on demand.
Currently, certain "books on tape" versions of books are commercially available. These books are used by sight impaired persons and by persons that prefer to listen to books on tape as, for example, they commute in their automobiles. However, the number of books available on tape is a small portion of the books published. Currently, producing a book on tape requires that a person orally read the book and that the spoken book be recorded on an audio tape. This requires that the reader spend considerable time to read the book. In turn, the cost of producing such a book on tape is great and there is a considerable wait to produce a new book on tape. This is a considerable problem for sight impaired persons and very much limits the books to which they have access. Moreover, there are applications other than for sight impaired persons in which having a book or other written document available in a spoken form is of considerable value.
There are text to speech programs available that audibly play back a computer file containing a document. Typically, such text to speech programs utilize a voice synthesizer to convert the words of the text file to audible speech. While these synthesizer systems work for their intended purposes, it is difficult to change the time between the words, and, in general, since the words are synthesized rather than spoken by a human voice, they are difficult for many people to understand.
There has been a long-standing need for a fast, inexpensive, and secure method of transforming the text of written books or other documents into a spoken version, particularly for a single copy or for small runs which offers strong protection against unauthorized copying, and which may be readily and conveniently played back on demand utilizing inexpensive and widely available equipment.
Summary of Invention
Among the several objects and features of the present invention may be noted the provision of a system and method of automatically converting books or other text documents from text to spoken versions of the book without the need for a person to read the book onto audio tape or the like and where the book may be readily re-played on demand by the end user;
The provision of such a system and method which reliably converts text works to spoken works substantially without errors;
The provision of such a system which has a library of digitized words spoken by a person wherein the system of this invention recognizes the next word in the text to be read, looks up the digitized spoken word corresponding to the next word, and digitally plays back the spoken word rather than generating a synthesized word, thus resulting in a more natural and more easily understood text to speech system;
The provision of such a system and method where if a word in a document to be spoken is not found in the library, the letters of the word will be audibly spoken (spelled) so that the user may discern the word by recognizing the spelling of the word;
The provision of such a system and method in which the time interval between the spoken words can be varied by the user;
The provision of such a system and method in which the rate at which words are spoken can be speeded up or slowed down by the user without substantially impairing the understandability of the spoken words;
The provision of such a system and method in which a library of spoken words is digitized and recorded such that upon the words of a document being played back, each word in the document, as it is to be played or spoken, will in turn be "looked up" in the dictionary and spoken such that an electronic file of most documents or books may readily be aurally heard by the user without the necessity of special preparation of the text for play back on the present system;
The provision of such a system and method where, in preparing a book file, a digitized word corresponding to each word in the text of the book is recorded so that the digitized words do not have to be looked up in a master dictionary, thus speeding up the reading of the text file;
The provision of such a system and method in which the words being spoken by the system of this invention may be displayed on a computer screen in a large format thereby to enable sight impaired persons to visually see the words of the text as they are spoken;
The provision of such a system and method in which the user may use the computer to make notes as the user is playing back a book or other document, and where the notes may be played back (spoken) to the user;
The provision of a system and method in which the user may use the computer to make bookmarks in the book file being played so that the user may readily find certain passages or notes in the document or book;
The provision of such a system and method which may be used with virtually any book or document (so long as a text file recognizable by a computer exists or can be readily created) so that such books and text documents can be played back in spoken form at moderate cost and can be done substantially on demand;
The provision of such a system and method in which such books or documents so converted to speech documents may be economically stored and re-produced on demand at remote locations;
The provision of such a system and method in which copies of such spoken versions of such books and documents are copy protected such that publishers and authors will have confidence that unauthorized copies cannot be made;
The provision of such a system which uses widely available and relatively inexpensive PC based equipment to play back the spoken book;
The provision of such a system in which the spoken book may be randomly accessed and searched; and
The provision of such a system and method which uses currently available technology to translate the book or other text document to a spoken version, which compresses the data so that it may be economically stored in a computer or the like, which allows the data to be rapidly and economically transmitted from a central computer data base or the like to a remote user, and which is convenient to use.
Briefly stated, the system of the present invention converts text-based documents (e.g., books or other such documents) from text to speech. The system of the present invention stores a plurality of documents in a central repository, for example, in a central computer data base or the like, and then distributes a selected one of such stored documents to an end user. The system of this invention allows the end user to selectively play back such selected document (or a portion thereof) in a spoken format. More specifically, this system includes a program for converting such text-based documents into a digital format, means for storing a plurality of such documents in a digital format in the above-noted central repository, means for selecting one of such documents stored in such repository, means for recording or storing such selected document onto a medium which allows such document to be played back on computer means, means for encrypting the selected document as it is recorded onto such medium so as to prevent copying of such document, and means for selectively playing back the stored document, the play back means including means for deciphering the encrypted document stored on the medium, and means for converting such stored document to speech upon play back of the document.
Still further, the system of this invention is intended for use with a computer so as to play back a text file in a spoken format. The system comprises a medium (e.g., a CDROM or the hard drive of the computer) on which at least part of a book or other text file is recorded in a format readable by the computer. A program is provided for reading the words of the text file which are to be played back. The program has access to a plurality of digitized words that have been spoken by a person. Generally, there is a digitized spoken word for each word in the text file. The computer has a sound system for audibly playing back the digitized word files, where the computer reads each word in a text file and plays back the digitized word corresponding to that word over the sound system so that the end user hears the spoken words in the order they appear in the text file.
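The word-by-word playback just described, together with the spelling fallback noted among the objects of the invention, can be illustrated with a minimal sketch. It is not the reader program of the invention: the function `plan_playback`, the dictionary `word_library`, and the clip file names are hypothetical, and a real system would play digitized recordings rather than return their names.

```python
import re
import string


def plan_playback(text, word_library):
    """Return, in reading order, the sound clips to play for the given text.

    word_library is a hypothetical mapping from a lowercase word (or single
    letter) to the name of a digitized recording of a person speaking it.
    A word that is not in the library is spelled out letter by letter.
    """
    clips = []
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if key in word_library:
            clips.append(word_library[key])
        else:
            # Fallback: speak each letter so the listener can discern the word.
            clips.extend(word_library[letter] for letter in key)
    return clips


# Illustrative library: three recorded words plus one recording per letter.
word_library = {w: f"{w}.wav" for w in ("the", "cat", "sat")}
word_library.update({c: f"letter_{c}.wav" for c in string.ascii_lowercase})

plan = plan_playback("The cat sat quietly.", word_library)
```

In this sketch "quietly" has no recording of its own, so the plan ends with seven letter clips. A user-adjustable pause between words could be modeled by inserting a silence of chosen length between successive clips.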
The method of the present invention involves storing a multiplicity of books or other text-based documents in a digital format in a central repository in a digital computer. An end user selects one of the multiplicity of such books stored in the computer. Then, the selected book is downloaded and stored on suitable playback media. As the one book is recorded on the media, it is protected against copying. The end user selectively plays back the one book recorded on the media on a suitable play back computer. The play back computer reads an encryption key and converts the text of the selected book from text to speech.
Brief Description of the Drawings
Fig. 1 is a block diagram illustrating the steps of storing a plurality of text based documents (e.g., books) in a central computer repository;
Fig. 2 is a block diagram illustrating the steps of managing the documents stored in the central computer repository;
Fig. 3 is a block diagram of the main steps or components of a point of sale terminal (which could include an end user's computer) which may be connected via a wide area network (e.g., the Internet) to the central book (or other text based document) repository shown in Fig. 2 and which may be used for browsing through the documents stored in the central repository, for ordering a selected one of such documents, for downloading this one selected document (or a portion thereof) from the central computer repository, and for recording such selected document on suitable playback media (e.g., on a CDROM or on the hard drive of the end user's computer);
Fig. 4 is a block diagram illustrating the main components or steps in playing back the selected one document recorded on such playback media in a spoken mode;
Fig. 5 is a flow diagram depicting the steps for creating a master dictionary and for updating (adding words to) this master dictionary;
Fig. 5A is a flow diagram of a program for preparing an optimal database for the digitized word files for use with the present invention;
Fig. 6 is a flow diagram for checking the master dictionary to ensure that sound files exist in the master dictionary corresponding to all of the words in a book or other text based document to be played over the system and method of this invention;
Figs. 7 and 7A are flow diagrams for a book reader program for use with the present invention;
Fig. 8 is a flow diagram in accordance with this invention for the preparation of a minimized book database;
Fig. 9 is a flow diagram for a minimized version of a book reader program for use with the present invention;
Fig. 10 is another flow diagram for preparing a book to be entered in the data base; and
Figs. 11 and 11A show another flow diagram illustrating the steps in accordance with this invention for reading a book.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
Best Mode for Carrying Out the Invention
This invention relates to a system and a method for allowing a text based document, such as a book, stored in a digital computer, to be stored on a central data base and distributed to an end user, either on computer storage media, such as a CDROM disk, or over a wide area network, such as the Internet, so that the text based document may be stored on computer storage media within an end user's computer (e.g., on suitable portable media, such as a CD ROM disk or a floppy disk, or on the hard drive of the end user computer), and then played back on the sound system of the end user computer so that the user may hear the words of the text in a spoken format.
This invention involves the creation of a database in a central computer (also referred to as the central book repository, as shown in Fig. 2) of optically read (scanned) books (which preferably are converted into text by, for example, performing an optical character reading (OCR) routine on the scanned book or document) or other computer readable text-based documents. Preferably, the text of the document is stored as a simple ASCII text file (or as any other text file, such as a conventional word processing format, as a PDF file, or in a vector based format such as an Adobe PostScript® file) in the computer data base. For security purposes, the text may optionally be encrypted (as will be hereinafter described) to prevent piracy. The data representing a book (or any text based document) may be compressed (as will be hereinafter described in detail) to an average of 1 Kbyte per page. This results in an "average" book (or other document) of 250 pages in length requiring 250 Kbytes of storage. Due to low costs of computer disk storage, the available database of books may be stored in a single site and/or partially distributed to the points of sale. A small PC can now easily contain 28 Gbytes of storage. This would translate to 112,000 books stored in compressed format. Local storage could provide caching of often requested books, thus reducing the demand on the central repository.
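The storage figures in the preceding paragraph follow from simple arithmetic, sketched below. Decimal units are assumed (1 Kbyte taken as 1,000 bytes and 1 Gbyte as 1,000,000,000 bytes), and the names used are illustrative only.

```python
# Assumption: decimal units (1 Kbyte = 1,000 bytes, 1 Gbyte = 1,000,000,000 bytes).
BYTES_PER_PAGE = 1_000   # compressed size of an average page, per the text
PAGES_PER_BOOK = 250     # length of an "average" book, per the text

BYTES_PER_BOOK = BYTES_PER_PAGE * PAGES_PER_BOOK  # 250,000 bytes, i.e. 250 Kbytes


def books_that_fit(storage_bytes: int) -> int:
    """Number of average compressed books that fit in the given storage."""
    return storage_bytes // BYTES_PER_BOOK


print(books_that_fit(28 * 10**9))  # 112000 books on a 28 Gbyte disk
```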
Specifically, the central book repository used for the central repository database may be a personal computer having a Pentium processor manufactured by Intel, preferably having a speed of 200 MHz or more and a random access memory (RAM) of 64 Mbytes or more. Such a computer will have adequate speed to service requests coming from multiple other computers requesting copies of books stored on the central computer. The central computer would preferably have a relatively large hard disk storage capability, such as about 7 - 20 Gigabytes, having a SCSI-WIDE interface thus allowing fast access to the data. Also, such SCSI interfaces allow "daisy chaining" of numerous disk units such that with current technology, perhaps as much as 70 Gigabytes of storage may be provided on a PC, which with current data compression technology, would allow approximately 250,000 books (at about 250 pages/book) to be stored in the central book repository. The central computer is provided with an Internet interface, such as a BayStack Instant Internet interface, and would preferably have at least a T1 connection to allow data transmission at the rate of about 1 Mbit/second or more. It will be understood that the central book repository computer would preferably be part of a wide area network which could include the Internet. In this manner, the books or documents stored in the data base would be remotely accessible by a customer or other authorized person or entity. While the central computer was described as a PC, it will be understood that for high volume applications a suitable mainframe computer may be preferred.
In accordance with the present invention, the process of audio book distribution involves creating a central repository, as shown in Fig. 2, in which a multiplicity of books or other text based documents are stored on the central computer which, as noted above, may contain compressed text of a plurality (multiplicity) of books (or other text-based documents) that had been entered into the repository either by scanning and optical character recognition or by electronic transmission from book publishers where these electronic files (in a suitable format) may now exist. If the book contains graphics (e.g., photographs or charts), a written description of the graphics in the book or document may be added to enhance the understandability of the book or document as it is audibly played back by the user of the system and method of the present invention. The manner in which such book data is stored could be any one of suitable data storage methods. For example, the book data can be stored on a hard drive (as described above), on CD disks in a juke box player, or by any other suitable method.
As shown in Fig. 2, the central book repository computer may be located remotely from a point of sale system POS or end user computer, as shown in Fig. 3, and connected to it by the above-mentioned wide area network or by the Internet. Such a point of sale system includes a customer (or end user) computer from which an end user may review the books, or may review sales information concerning the books in the data base. Such sales information may include such information as book reviews, summaries, abstracts, key words, card catalog, cross reference to other books by the same author or on the same subject, or other information stored on the central repository and associated with the books. This sales information may be used by the end user (e.g., a potential customer connected to the central book repository computer by means of the Internet) or by a clerk in a store or an authorized person at a school or order processing center to select a book or other document from the data base. Then the customer or the clerk may order a copy of any book or other document stored on the central repository. As shown in Fig. 2, the customer's computer may be located remotely from the central computer C and may be connected to the central repository computer by means of a satellite link, a connection to the Internet, or by a direct link, as part of a wide area network or a local area network. For sight impaired persons, such book ordering and selecting information may be available both as words that appear on the screen and may be printed, and as speech played back over the sound system of the customer or end user's computer by means of a text to speech program. When the customer on the Internet selects a book to be ordered, the customer transmits transaction data, such as billing information (e.g., credit card information), along with shipping instructions.
Once all of the required ordering and shipping information is obtained, the text of the selected book (or other document) may be copied onto a recordable compact disk, typically referred to as a CDR (or onto other suitable media), so the selected book may be shipped to the customer. Alternately, the selected book may be downloaded to the customer's computer over the Internet or other network such that the use of the above-described media is not required. The CDR contains the text of the selected book, which may be in an encrypted format, and an encryption key is recorded on a copy protected diskette which is shipped to the customer along with the CDR by mail or by other suitable delivery service.
Within the description of this invention, it will be understood that the term "distributing a book to a customer or an end user" would encompass recording the selected book or document on computer media, such as on a CD ROM disk or on a floppy disk or the like, and physically shipping or giving the media to the end user, or allowing the end user's computer to connect to an electronic computer file over a wide area network, such as the Internet, and allowing the selected text file to be downloaded onto storage media in the end user's computer (i.e., on the hard drive or onto a suitable floppy or the like). Likewise, the term "recording the book or document on suitable media" or words to that effect would include both recording the text file on a CD ROM or floppy disk and recording it on the hard drive of another computer.
As noted, the text stored on the central computer may be encrypted to prevent unauthorized copying of copyrighted materials. A variety of encryption techniques may be used to encrypt the data stored on computer C. The encryption technique shown below in C language is preferred because it offers reasonable copy protection, but operates quickly so as to not substantially slow down the transfer of data. An encryption program may be as follows:
It will be understood that rather than recording the text on suitable computer storage media, such as on a CDR, the text of the selected book stored in the data base may be delivered electronically to the customer or end user via the wide area network or via the Internet. Such electronic delivery would make any of the books or documents stored in the data base virtually instantly available from anywhere in the world. As shown in Fig. 2, the central repository also contains software and hardware to accept orders, transmit orders either by Internet, direct telecommunications, surface mail or satellite link, and to provide billing information both to the purchaser of the book selected and to the publisher of the book. In addition, the central repository has the capability of transmitting data corresponding to a selected book to a point of sale or customer computer where the above-noted CDR and the encryption key diskette (described above) may be recorded.
To operate the hardware described above in the central repository, an operating system is used, such as Microsoft's Windows 95 or Windows NT. To provide an interface between the point of sale or customer computers C1 and central computer C, an Internet server software system (ISS) is provided. One ISS system that has been found to work well is Webstar commercially available from QuarterDeck®, or NetServe® available from NetScape®. Custom software to interface the ISS with the tasks of fetching and transmitting requested files as well as maintaining transaction history for billing and publisher notification is included as part of a CGI engine program. This last-noted program is launched by the ISS, or is communicated with by the ISS through Dynamic Data Exchange (DDE). When a request comes to the central computer from a point of sale or customer computer via the Internet, the ISS notifies the CGI engine that a request has been made and informs the CGI engine of the name of the file it has created that contains the information about the request and the name of the file that will contain the requested data. When the CGI engine is finished, it notifies the ISS that it is done and the ISS transmits the data back to the requesting point of sale or customer computer by an appropriate data transmission link, as shown in Fig. 2.
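The file-based handoff between the ISS and the CGI engine described above can be sketched as follows. This is an illustration under stated assumptions, not the custom software of the invention: the JSON request format, the field names `terminal_id` and `book_id`, and the functions `cgi_engine` and `serve` are all hypothetical, and the server is reduced to a plain function call.

```python
import json
import tempfile
from pathlib import Path


def cgi_engine(request_path, response_path, repository, history):
    """Read the request file, write the requested data, and log the transaction."""
    request = json.loads(Path(request_path).read_text())
    book_id = request["book_id"]
    data = repository.get(book_id)
    # An empty response file stands for "book not found" in this sketch.
    Path(response_path).write_bytes(data if data is not None else b"")
    history.append({
        "terminal_id": request["terminal_id"],
        "book_id": book_id,
        "status": "sent" if data is not None else "not_found",
    })


def serve(request, repository, history):
    """Stand-in for the ISS: create both files, invoke the engine, return the data."""
    with tempfile.TemporaryDirectory() as tmp:
        request_path = Path(tmp) / "request.json"
        response_path = Path(tmp) / "response.bin"
        request_path.write_text(json.dumps(request))
        cgi_engine(request_path, response_path, repository, history)
        return response_path.read_bytes()


repository = {"book-001": b"compressed book data"}  # illustrative contents
history = []
reply = serve({"terminal_id": "pos-7", "book_id": "book-001"}, repository, history)
```

The transaction history kept by the engine is what would later support billing and publisher notification.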
As shown in Fig. 3, one or more point of sale terminals or customer computers are shown to be located remotely from the central repository computer and are interconnected to the central computer by a suitable network or communication link. Such point of sale terminals or customer computers may be connected directly to the central computer such that upon an order being received from a remote location via the Internet, the selected book is transmitted to the point of sale computer and the CDR with the book data thereon and the appropriate encryption key may be recorded. In certain instances, it may be desirable to locate the point of sale computer in a kiosk in a store or library, or, in certain instances, the point of sale terminal may be a personal computer (PC) located in an end-user's home or office. The point of sale terminal or customer computer includes a suitable personal computer (PC) terminal which allows the user to browse or to otherwise interrogate the central repository system for books (or other documents) available from the central repository, along with ordering information, such as the cost of ordering such a book. In the case of a point of sale terminal, such as a kiosk located at a store or a school, it may be desirable to include local storage for a large number of books. In such case, the computer used in the point of sale terminal should have about 7 Gigabytes of disk space and a moderate memory (e.g., 32 Mbytes of RAM). Any current Pentium®-based PC will be sufficient. As is typical, such computers have a monitor, a keyboard, and a sound system including a suitable sound card for playing digital sound recordings over speakers connected to the computer.
In addition, the point of sale terminal or a similar computer located in conjunction with the central repository would preferably have both a floppy disk drive and a compact disk recorder/reader (CDR), such as are readily commercially available from any number of companies, for copying the book data (which may be in an encrypted format) transmitted from the central repository computer upon receipt of an order for a selected book. In addition, the CDR disk containing the book or document will have sufficient space such that the suitable text-to-speech conversion program of the present invention (as will be hereinafter described) may also be transmitted with and stored on the disk. It is further preferred that the end user or point of sale computer have a modem for connection to the Internet. In the case of a point of sale computer located in a kiosk or located at a school or other "public" location, it is preferred that such point of sale location also have a label printer such that labels for the floppy disk and the CDR disk for the book selected can be printed with information to identify the book recorded on the disks. The point of sale or customer computer is preferably provided with software to enable it to run and to connect to the central computer and to allow an end user or clerk to order a book, to transmit the book from the central computer to the remote or customer computer, and to record the book on the CDR. Such software would include a basic operating system, such as Microsoft's Windows 95, and a suitable Internet browser. In addition, custom software may be provided that will run on such operating system. The functions of this custom software include:
1. A query function that will allow a clerk (via an appropriate communication link) or end user (via the Internet) to query or browse the local database of titles. This database may include the above-described sales information relating to the books stored on the central computer. Such sales information may include a synopsis of the book and sufficient information such that the end user may search the database for books that may relate to certain topics or categories so as to aid the end user in finding books that may be of interest. Included in this database will be searchable fields of titles, author names, topics, and the like to aid the clerk/customer in the selection of a book to be ordered. If the full text of the books is stored in a searchable format, full text searches of the books in the repository may be conducted, as well. Of course, this end user or browser database is updated as new books are added to the central repository.
2. An order function that will transmit the text of a selected book from the central repository computer to a point of sale or end user computer in response to an order being placed for the selected book.
3. A receiving function that accepts the data transmitted from the central computer.
4. A copying function that takes the transmitted book data and copies this data onto a blank recordable CDR.
5. An optional key diskette function that creates an encryption key diskette, preferably a 3-1/2" floppy diskette with a copy protected sector(s), and places on it an encryption key transmitted from the central computer C with the book data.
6. A printing function that prints labels for both the key diskette and the CDR.
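The query function listed first above can be illustrated with a small sketch of a searchable title database. The record layout, the `search` function, and the sample entries are all hypothetical placeholders; they are not data from the repository described here.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One record of the browsable title database (hypothetical layout)."""
    title: str
    author: str
    topics: list = field(default_factory=list)
    synopsis: str = ""


def search(catalog, term):
    """Return entries whose title, author, topics, or synopsis contain the term."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in entry.title.lower()
        or needle in entry.author.lower()
        or any(needle in topic.lower() for topic in entry.topics)
        or needle in entry.synopsis.lower()
    ]


# Placeholder entries for illustration only.
catalog = [
    CatalogEntry("Example Title A", "Author One", ["gardening"], "A guide to roses."),
    CatalogEntry("Example Title B", "Author Two", ["history"], "A survey of canals."),
]

matches = search(catalog, "garden")
```

A full text search of the books themselves, where the text is stored in a searchable format, would extend the same idea to the book contents.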
Once a customer has ordered a book and has received the book which has been recorded on the CDR, as above described, or which has been downloaded over the Internet (as described above), the end user may readily replay the book in spoken form. In order to replay the book in spoken form, an end user must have a suitable personal computer that can read the floppy diskette and the CDR (or other media upon which the book has been recorded) containing the book data and, preferably, a suitable text-to-speech conversion program, preferably the text to speech program as hereinafter described which constitutes a part of the present invention. However, it will be recognized that the text-to-speech program may be resident on the end user's PC. In addition, the end user's PC must have a suitable sound card and speakers. Those skilled in the art will recognize that contemporary personal computers (circa 2000), both desk top and laptop computers, commonly have such capabilities.
In the creation of the book repository on the central computer, selected books may be optically scanned and the scanned image of each book page is read by a suitable optical character reading (OCR) program, such as OmniPage® commercially available from Calera. The resulting book page is then preferably stored on the hard disk drive of the central computer in an ASCII format. The book pages are proofed and a written description of any graphics appearing on a page may be added to aid the listener of the book upon playback in its spoken version. Of course, if an electronic file in a word processing or in a vector based file format (such as PostScript or another widely known computer file format) is available, it would not be necessary to optically scan the book and to use an OCR program to convert it to a text file. The book data may optionally be encrypted, using, for example, the above described encryption program. Preferably, the text file containing the book is compressed and stored in the central repository computer data base. A variety of standard compression techniques may be used, but one that has been found to work well is PKZIP, commercially available from PKWare. Upon the book data being entered in the central repository, an entry is made in the repository database regarding the specifics of the book and of the selection/marketing information regarding that particular book. This information will be sent to the point of sale or end user computers.
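The compression step can be illustrated with Python's standard `zlib` module, which implements DEFLATE, the compression method commonly used in ZIP archives. This is a stand-in chosen for the sketch, not the PKZIP program itself; the function names and the sample page are hypothetical, and the ratio achieved on real book text will differ from this deliberately repetitive sample.

```python
import zlib


def compress_book(text: str) -> bytes:
    """Compress ASCII book text with DEFLATE at the highest compression level."""
    return zlib.compress(text.encode("ascii"), 9)


def decompress_book(blob: bytes) -> str:
    """Recover the original ASCII text from the compressed bytes."""
    return zlib.decompress(blob).decode("ascii")


# Illustrative page only; real prose is less repetitive and compresses less well.
sample_page = "It was a quiet morning in the village. " * 60
blob = compress_book(sample_page)
restored = decompress_book(blob)

print(len(sample_page), "bytes before,", len(blob), "bytes after")
```

Because the compression is lossless, the text recovered at the point of sale or end user computer is identical to the text stored in the repository.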
A typical transaction would comprise a clerk at a point of sale terminal connected to the central computer via an appropriate communications link, or an end user at a remote computer connected directly to the central computer via the Internet. For a point of sale terminal, it is anticipated that a sighted person would use the terminal. However, for an end user terminal, it is anticipated that it may be equipped for use by sight impaired persons. In any event, the end user or a clerk could browse the books on the central repository to locate books of interest and to select one or more books to be ordered. In order to initiate the ordering process, the end user or clerk enters a financial portion of the transaction in a known manner. The point of sale or end user computer would transmit the financial data (e.g., credit card information) via the Internet to the central computer along with a positive identification code for the point of sale or end user terminal. The central computer would, upon receiving an order request, first verify that the requesting point of sale or end user terminal was in good standing, and would then fetch the requested book data from the central repository and transmit this book data over a suitable link back to the point of sale or end user terminal. The central computer would also record the transaction for billing information and for notification of the publisher of the selected book that an order for that book has been filled.
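The order handling performed by the central computer might be sketched as follows. This is an illustration only: the class CentralComputer, its fields, and the fill_order method are hypothetical names, and payment processing is reduced to recording an opaque string.

```python
from dataclasses import dataclass, field


@dataclass
class CentralComputer:
    books: dict                       # book_id -> book data (bytes)
    terminals_in_good_standing: set   # positive identification codes
    transaction_log: list = field(default_factory=list)

    def fill_order(self, terminal_id, book_id, payment):
        # Step 1: verify the requesting terminal is in good standing.
        if terminal_id not in self.terminals_in_good_standing:
            raise PermissionError(f"terminal {terminal_id!r} not in good standing")
        # Step 2: fetch the requested book data from the repository.
        data = self.books[book_id]
        # Step 3: record the transaction for billing and publisher notification.
        self.transaction_log.append(
            {"terminal": terminal_id, "book": book_id, "payment": payment})
        # Step 4: return (transmit) the book data to the terminal.
        return data
```

A rejected terminal never reaches the logging step, so the log holds only filled orders.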
The point of sale terminal or end user terminal receiving the book data would then record the book file along with a "playing" program (i.e., a suitable text-to-speech (TTS) program) onto a compact disk (CD) using a CDR recorder. In addition, a copy-protected 3½ inch floppy disk may be recorded on which the encryption key is included. For orders received over the Internet, it is preferable that the order be filled at the central computer (or some other location under the control of the system owner) so that unauthorized copying of the books can be controlled.
The CDR and the encryption key (if the file is encrypted) on the diskette can then be mailed or shipped by express courier to the customer.
Upon receiving the CDR on which the book data and the playing program are recorded, and the floppy including the encryption key for the CDR, the customer inserts the CDR and the floppy in the corresponding drives of a personal computer, as described above. Since the operating system (e.g., Windows 95 or the like) autosenses the presence of a compact disk in the CD drive of the computer, an autorun program is initiated which launches the playing (TTS) program contained on the CDR. The TTS program then searches for a diskette in the floppy drive of the computer for an encryption key. If such an encryption key is not found, the book data on the CD cannot be accessed and thus cannot be copied or played. Once the encryption key is read from the floppy, the program loads, decompresses, and decrypts the book data, and converts the text of the book to speech, which is audibly played back (spoken) via the sound card and speakers of the end user's PC. Suitable TTS programs are commercially available, such as The Open Book program. Other TTS software, such as is available from Ref Software-Quelle Datentechnik GmbH, may be utilized. Alternatively, .WAV files corresponding to each word in the text file could be issued to the computer's sound card. Each page of the book could be played sequentially. As the book is read, the computer may keep track of the current page being read (spoken) on the writable key disk for future startup. Such bookmarks aid in returning to where the end user left off in a previous session. The end user also has random access to any page of the book, and is able to flag selected pages and to search the pages of the book for key words or the like.
The software for the end product performs the following functions. The program is loaded into computer memory and is given control (i.e., load and execute).
Upon initialization, the program looks for any "last opened" bookmarks (as will be hereinafter described) indicating that the user had stopped "reading" the book prior to finishing. The "pages" of the book are stored as individual files. A bookmark is the file number of the page together with the offset into that file of the last word "read". Once a page is "opened" (the file read into memory), the page is expanded by reading the encryption key that was placed in the copy-proof segment of the disk/CD, and is then decrypted into ASCII text (or other suitable format) in memory. The program then parses and converts the text word by word into speech, either by current conventional text-to-speech means or by the process described below.
As the program processes the decrypted text, a file used as the aforementioned "bookmark" is updated with the current file and offset. Subsequent pages are "opened" by the above process, clearing memory of the preceding page and loading and decrypting the data for the next, until the last of the sequentially numbered files is "read". Upon reaching the end of the book, the bookmark is closed.
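The page-by-page reading loop with its bookmark can be sketched as below. The sketch makes several assumptions: pages are held as an in-memory list rather than as encrypted files, the offset is taken to be the character position just past the last word read, and the names read_book, speak, and max_words are hypothetical.

```python
import re

WORD = re.compile(r"\S+")


def read_book(pages, bookmark=None, speak=print, max_words=None):
    """Speak words page by page and return the updated bookmark.

    pages     -- list of page texts, standing in for the numbered page files
    bookmark  -- (page_number, offset) pair, or None to start at the beginning
    max_words -- stop after this many words (simulates the user quitting)
    """
    page_no, offset = bookmark if bookmark else (0, 0)
    spoken = 0
    while page_no < len(pages):
        # "Open" the page and continue from the bookmarked offset.
        for match in WORD.finditer(pages[page_no], offset):
            if max_words is not None and spoken >= max_words:
                return (page_no, offset)
            speak(match.group())
            spoken += 1
            offset = match.end()       # update the bookmark position
        page_no, offset = page_no + 1, 0
    return None                        # end of book: the bookmark is closed
```

Passing the returned pair back in as bookmark resumes with the word that follows the last one spoken.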
An alternative to processing a number of files corresponding to the number of pages is to compress each page of text into a binary storage field of a database. Thus each page in a book is contained in a separate record of a database with a field related to the page number and a field containing the textual data of that page. Additional searchable fields pertaining to and describing said data on that page could be included for search capabilities.
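The database alternative might look like the following sketch, which assumes Python's built-in sqlite3 and zlib modules. The table name, the keywords column, and the helper functions are hypothetical choices for illustration.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " page_number INTEGER PRIMARY KEY,"
    " keywords TEXT,"    # hypothetical searchable field describing the page
    " body BLOB)"        # compressed page text in a binary field
)


def store_page(page_number, text, keywords=""):
    """Compress one page and store it as a single record."""
    conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
                 (page_number, keywords, zlib.compress(text.encode("utf-8"))))


def load_page(page_number):
    """Return the decompressed text of a page, or None if absent."""
    row = conn.execute("SELECT body FROM pages WHERE page_number = ?",
                       (page_number,)).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None


def find_pages(keyword):
    """Return the page numbers whose keywords field mentions keyword."""
    rows = conn.execute(
        "SELECT page_number FROM pages WHERE keywords LIKE ? ORDER BY page_number",
        (f"%{keyword}%",))
    return [r[0] for r in rows]


store_page(1, "Call me Ishmael.", "narrator sea")
store_page(2, "The whale surfaced.", "whale sea")
```

Keeping the searchable fields uncompressed lets the database search them without expanding every page.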
Most conventional text-to-speech conversion routines use a voice synthesizing process that applies a limited set of phonic rules to the word to be converted, thus saving considerable storage. In accordance with one version of the system and method of the present invention, an alternative method of higher quality is used in which a "dictionary" of spoken words, letters and numbers is created. The terms "spoken word" or "spoken format" are herein defined as a sound file, preferably a digitized sound file, which, when the program reads a word in a text based document, such as a book, is played over the sound system of the end user's computer such that a sound recording of a human voice (not a synthesized voice) is heard by the user. This dictionary involves a database with a field for the text to be matched to the selected word in the "book" along with one or more fields that contain recorded, digitized speech for that word. The words in this dictionary are created by a person speaking the words and recording the words on tape or the like. The analog recording of the spoken words is then converted into a digital format, and the digitized words may be edited or "trimmed" so as to place the words in a uniform format. This dictionary will contain tens of thousands of such words, such that all of the words required to play back a book in spoken form are included in the dictionary. Further, similar dictionaries using Male, Female, Child, and other voices for each word in the dictionary may be provided, thus allowing the user to select the voice style most pleasing to the user.
Fast record indexing routines are available for finding the matching word in the dictionary and passing the digitized pattern to the sound card. If no match is found, the program reverts to spelling the selected word, using the digitized letters also contained in the dictionary.
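The lookup with its spelling fallback can be sketched as follows. A Python dict stands in for the indexed dictionary database, and short byte strings stand in for digitized recordings; both are assumptions of this sketch, as is the function name sounds_for.

```python
import string


def sounds_for(word, dictionary):
    """Return the list of sound clips to play for one word."""
    key = word.lower()
    if key in dictionary:
        return [dictionary[key]]
    # No match: spell the word using the recorded letters and digits.
    return [dictionary[ch] for ch in key if ch in dictionary]


# Placeholder "recordings": one clip per letter and digit, plus whole words.
dictionary = {c: f"<{c}>".encode() for c in string.ascii_lowercase + string.digits}
dictionary.update({"cat": b"<cat>", "sat": b"<sat>"})
```

For a word such as "zyx" that has no recording, the result is the three letter clips in order.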
Referring now to Figs. 5 - 9, flow diagrams illustrate all necessary steps for operating and using the system of this invention for playing back a text file, such as a book or other text based document, in the above-described spoken format. In Fig. 5, the steps for the creation of a master dictionary of digitized words, as spoken by a person, and for updating this master dictionary (Fig. 5A) are shown. It is believed that the steps shown in Figs. 5 and 5A fully describe the process of creating the master dictionary to one skilled in the art and thus further description herein is not required.
In Fig. 6, the steps for the preparation of a book database are shown. The book database includes a record layout that may have a first text field including processing instructions (PROCESS), a second text field (35 characters) for the text to be displayed (DISPLAY), and a memo field for the compressed digitized sound corresponding to that text (SOUND). The steps for the preparation of the database are shown in Fig. 6 and are thus fully described to one skilled in the art.
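The record layout with its PROCESS, DISPLAY, and SOUND fields might be modeled as in the sketch below. The class and function names are hypothetical, zlib is an assumed compression method, and rejecting over-long DISPLAY text is a choice of this sketch rather than something the description specifies.

```python
import zlib
from dataclasses import dataclass

DISPLAY_WIDTH = 35  # width of the DISPLAY text field, per the description


@dataclass
class BookDbRecord:
    process: str   # PROCESS: processing instructions
    display: str   # DISPLAY: text to be displayed
    sound: bytes   # SOUND: compressed digitized sound for the text

    def __post_init__(self):
        # Assumption: text longer than the field is rejected, not truncated.
        if len(self.display) > DISPLAY_WIDTH:
            raise ValueError(f"DISPLAY exceeds {DISPLAY_WIDTH} characters")


def make_record(text, raw_sound, process=""):
    """Build one record, compressing the digitized sound for storage."""
    return BookDbRecord(process, text, zlib.compress(raw_sound))
```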
The book reader program of this invention reads a text file of a book (or a passage therefrom) and commands the end user's computer to speak each word in the book (or in a selected passage of the book or other document) over the sound system so that each word may be heard by the end user. In one embodiment of this invention, as shown in Figs. 7 and 7A, the basic reading system comprises a reader program, a complete master dictionary including digitized sound files of all of the words in the dictionary (which may number hundreds of thousands of such sound files), and a text file in an ASCII or other suitable format.
Alternatively, as shown in Fig. 10, instead of providing the entire master dictionary, a special dictionary (referred to as a Book Database in Fig. 10) may be constructed for each book which includes only the words in the book to be read, thereby minimizing the size of the dictionary and increasing speed with which words can be looked up in the dictionary by the computer.
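Building such a per-book dictionary amounts to taking the subset of the master dictionary that the book actually uses. A minimal sketch follows, assuming a dict-based master dictionary and a simple letters-and-apostrophes definition of a word; the function name build_book_dictionary is hypothetical.

```python
import re


def build_book_dictionary(book_text, master):
    """Return (book_dictionary, missing_words) for one book.

    book_dictionary holds only the master entries the book uses;
    missing_words lists words with no recording in the master dictionary.
    """
    words = {w.lower() for w in re.findall(r"[A-Za-z']+", book_text)}
    book_dict = {w: master[w] for w in words if w in master}
    missing = sorted(w for w in words if w not in master)
    return book_dict, missing


master = {"the": b"<the>", "cat": b"<cat>", "sat": b"<sat>", "dog": b"<dog>"}
book_dict, missing = build_book_dictionary("The cat sat. The cat purred.", master)
```

The list of missing words shows which recordings would have to be added to the master dictionary, or spelled out at playback.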
It will be appreciated that with both reading programs, as described above and as shown in Figs. 7, 7A, 10 and 11, special or auxiliary dictionaries for special books or documents (e.g., a legal or a technical dictionary) may be used in conjunction with the master or special dictionary described above.
In accordance with the system and method of converting the text of a book (or other document) to spoken words played back over the sound system of the end user's computer, the time interval between the words, the rate (speed) at which the program reads the words, and the pitch of the spoken words may be selectively varied by the end user to suit the speed and sound desired by the end user. It will be recognized that some end users may want the words read by the computer at a faster pace than other users. However, this is not merely a matter of increasing the speed or rate at which the computer reads the words. While, within limits, such an increase or decrease in reading speed may be made without substantially affecting the sound of the words, increases or decreases in the speaking rate above or below such limits will result in a decrease in the understandability of the words spoken by the system. If the speaking rate is speeded up, the pitch of the spoken words will also increase and the spoken words may become unintelligible. Likewise, if the rate is unduly slowed down, the pitch may become so low that the words become unintelligible. The system of this invention has a routine that allows the pitch of the words to be selectively adjusted up or down by the end user so that the rate (speed) at which the computer plays back (speaks) the words of the text may be increased or decreased and yet the words remain understandable to the end user. This is shown in Figs. 7A, 9A, and 11.
Further in accordance with this invention, if the master dictionary of digitized words does not have a digitized word or sound corresponding to a word that appears in the text of the book being read, the text-to-speech program of this invention will spell the letters of the word so that the end user may discern the word. The steps for spelling such words that are not in the dictionary of digitized sound files are shown, for example, in Figs. 7A and 9A.
Still further, Fig. 7 illustrates that the system of this invention allows the end user to place a bookmark at desired locations in the text as the text is being read aloud by the computer so that the end user may readily return to a desired passage in the book.
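Returning to the rate and pitch adjustment discussed above, the coupling between the two can be put in numbers. The sketch below assumes the simplest playback model, in which a recording is sped up by playing its samples faster, so every frequency is multiplied by the rate factor; the function names are hypothetical and the description does not state how its pitch routine is implemented.

```python
import math


def pitch_shift_semitones(rate_factor):
    """Pitch change, in semitones, caused by playing samples rate_factor times faster."""
    return 12 * math.log2(rate_factor)


def compensating_pitch_factor(rate_factor):
    """Pitch multiplier that restores the original pitch after a rate change."""
    return 1.0 / rate_factor
```

Under this model, doubling the speaking rate raises the voice by a full octave (12 semitones), which is why an independent pitch adjustment is needed to keep fast playback intelligible.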
The system and method of this invention may convert text to spoken words in two ways. The first uses a master dictionary of a very large number of words (i.e., digitized sound recordings of words spoken by a person and stored or recorded in a digitized format). As the computer reads the next word in the text, the computer searches the master dictionary for the digitized sound file corresponding to the word being read, and if such word is found, the sound is played back over the sound system of the computer, and then the next word in the text is read.
As shown in Fig. 8, a minimized dictionary for a particular book may be prepared such that the dictionary supplied with a particular book text file contains only the words in that book. This requires a new dictionary for each book. Alternatively, the entire master dictionary may be supplied to the end user so that no matter what book or document is being read by the computer, the master dictionary will have substantially all of the words contained in the book or document. It will be recognized that special dictionaries may be created for technical or specialized vocabularies or for different languages. While the system of this invention using such dictionaries works well when the end user's computer has a fast CD reader, such dictionaries place a premium on the response time of the CD reader. It will be recognized, however, that this system of providing a master dictionary with a very large number of digitized sounds allows virtually any text file to be converted from text to speech merely by using a text file with the text to speech program of this invention.
A second method of reading the words of a book is, upon making up the file for the book, to record the digitized sounds for the words in the order the words appear in the book. The program for creating or preparing such a book file is shown in Fig. 10 and the program for playing such a book file is shown in Fig. 11. As shown in Fig. 10, there is a master dictionary which contains all of the digitized words. Each word in the book text file is read by the computer and, if a corresponding digitized word exists in the master dictionary, the digitized sound from the master dictionary is laid down on the book file. This process is repeated for each word in the book. Such a book file is read back by the program shown in Fig. 11. This program, in which the digitized word is recorded for each word in the book file, eliminates the need to look up the digitized sounds in the master dictionary and thus speeds up the reading of the book.
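The second method can be sketched as two small routines: one that lays the sounds down in reading order, and one that plays them back without any lookup. The names build_book_file and play_book_file are hypothetical, and a word with no recording is simply stored with a sound of None, an assumption of this sketch.

```python
def build_book_file(words, master):
    """Lay down (word, sound) pairs in the order the words appear."""
    book_file = []
    for word in words:
        sound = master.get(word.lower())   # None when no recording exists
        book_file.append((word, sound))
    return book_file


def play_book_file(book_file, play):
    """Play the pre-sequenced sounds; no dictionary lookup is needed."""
    for word, sound in book_file:
        if sound is not None:
            play(sound)


master = {"the": b"<the>", "cat": b"<cat>", "sat": b"<sat>"}
book_file = build_book_file("The cat sat".split(), master)
```

The trade-off is storage: a sound is stored once per occurrence rather than once per distinct word.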
Further in accordance with this invention, while an end user is playing back a book, the end user may make notes regarding passages in the book, and at a later time, the end user may play back and hear the words of the notes.
These notes may be entered by the end user by typing on the keyboard of the end user computer. For example, a sight impaired student may study a book and may take appropriate notes and then play back the notes to aid in studying the text. Also, the bookmarks, as noted above, may be used by the user to find certain passages in the book. It will also be recognized that since the book file contains all of the words in the book, simple word searches may be employed to find desired passages in the book. This is a distinct advantage over recording books on audio tape. Still further, in accordance with this invention, as the words of a book file are being converted to speech and played back over the sound system of the computer, the word, or (preferably) a string of words containing the word being spoken so that the word can be seen in context, may be displayed in an enlarged font on the monitor screen of the end user's computer. It will be recognized that many sight impaired persons do have partial vision and the enlarged font display aids these persons in comprehending the spoken words. Also, as the notes a person has made are played back, such notes may be displayed in the enlarged format.
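Selecting the string of words to display around the word being spoken can be sketched as below. The function name context_string and the radius parameter are hypothetical, and upper case is used here only to mark the current word; an actual display would render the string in an enlarged font.

```python
def context_string(words, index, radius=2):
    """Return the words around position index, with the current word marked."""
    start = max(0, index - radius)
    end = min(len(words), index + radius + 1)
    return " ".join(w.upper() if i == index else w
                    for i, w in enumerate(words[start:end], start))


words = "the quick brown fox jumps".split()
```

For example, context_string(words, 2, 1) gives "quick BROWN fox".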
Still further, it is an object of the system and method of this invention that the speech generated by the computer sound system in speaking the words of the book passage or other text file sound as much like a human voice as possible. Of course, the use of the above-described digitized sound files of a human voice offers substantial advantages over current voice synthesized sounds. Another feature of this invention which makes the speech generated or spoken by the system sound more like a human voice is the use of the punctuation in the text to add pauses between sentences or phrases, or to change the pitch of the last word or the last syllable of a word at the end of a sentence or phrase. For example, if the punctuation following a word is a "period", a "colon", or a "semi-colon", the reading program of this invention will increase the time interval until the next word is read and spoken. If the punctuation is a "question mark", the pitch of the last word may be increased or inflected to generate a sound of the word indicating an interrogative sound. Likewise, if the punctuation is an "exclamation point", the pitch and/or the loudness of the last word may be increased. Those skilled in the art will recognize that words from other languages may be incorporated into the master or the book dictionary. Also, the dictionary may include definitions of the words that may be selectively played back by the end user upon the end user commanding the computer to do so, as with key strokes or the like.
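The punctuation handling described above can be sketched as a function that maps a token to playback settings. The function name prosody_for and all numeric values (pause multiplier, pitch and loudness factors) are arbitrary placeholders chosen for illustration, not values from the specification.

```python
def prosody_for(token, base_pause=0.1):
    """Return (word, pause_seconds, pitch_factor, loudness_factor) for a token."""
    word = token.rstrip(".,:;?!")
    mark = token[len(word):][-1:]   # last trailing punctuation mark, or ""
    pause, pitch, loudness = base_pause, 1.0, 1.0
    if mark in (".", ":", ";"):
        pause = base_pause * 4          # longer interval before the next word
    elif mark == "?":
        pitch = 1.2                     # raise pitch for an interrogative sound
    elif mark == "!":
        pitch, loudness = 1.2, 1.3      # raise pitch and loudness
    return word, pause, pitch, loudness
```

The playback loop would apply these settings to the last word of each sentence or phrase before moving on to the next word.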
In view of the above, it will be seen that the several objects and features of this invention are achieved and other advantageous results attained.
As various changes could be made in the above constructions and methods without departing from the scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4700322 *||25 May 1984||13 Oct 1987||Texas Instruments Incorporated||General technique to add multi-lingual speech to videotex systems, at a low data rate|
|US5384893 *||23 Sep 1992||24 Jan 1995||Emerson & Stern Associates, Inc.||Method and apparatus for speech synthesis based on prosodic analysis|
|US5475399 *||27 Jul 1993||12 Dec 1995||Borsuk; Sherwin M.||Portable hand held reading unit with reading aid feature|
|US5661635 *||14 Dec 1995||26 Aug 1997||Motorola, Inc.||Reusable housing and memory card therefor|
|US5820379 *||14 Apr 1997||13 Oct 1998||Hall; Alfred E.||Computerized method of displaying a self-reading child's book|
|US5864823 *||25 Jun 1997||26 Jan 1999||Virtel Corporation||Integrated virtual telecommunication system for E-commerce|
|US5991594 *||21 Jul 1997||23 Nov 1999||Froeber; Helmut||Electronic book|
|1||*||ZoomText Xtra User's Guide, Version 6.1. Manchester Center, Vermont, USA: Ai Squared, 1997. pp. 85, 88, 96, 106-108, 117-124, XP002929376|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|EP1122716A2 *||8 Dec 2000||8 Aug 2001||Deutsche Telekom AG||Appararus for the transformation of printed text into speech|
|EP1122716A3 *||8 Dec 2000||14 Nov 2001||Deutsche Telekom AG||Appararus for the transformation of printed text into speech|
|EP2316076A2 *||4 Aug 2009||4 May 2011||Hewlett-Packard Development Company, L.P.||Bookmarks for flexible integrated access to published material|
|EP2316076A4 *||4 Aug 2009||10 Aug 2011||Hewlett Packard Development Co||Bookmarks for flexible integrated access to published material|
|9 Nov 2000||AL||Designated countries for regional patents|
Kind code of ref document: A1
Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
|9 Nov 2000||AK||Designated states|
Kind code of ref document: A1
Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW
|21 Dec 2000||DFPE||Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)|
|3 Jan 2001||121||Ep: the epo has been informed by wipo that ep was designated in this application|
|28 Feb 2002||REG||Reference to national code|
Ref country code: DE
Ref legal event code: 8642
|25 Sep 2002||122||Ep: pct application non-entry in european phase|
|29 Jan 2004||NENP||Non-entry into the national phase in:|
Ref country code: JP