WO2002003374A1 - A method for generating a musical tone - Google Patents

A method for generating a musical tone

Info

Publication number
WO2002003374A1
Authority
WO
WIPO (PCT)
Prior art keywords
note
musical
musical tone
based code
generating
Prior art date
Application number
PCT/FI2001/000630
Other languages
French (fr)
Inventor
Tero Tolonen
Matti Airas
Original Assignee
Oy Elmorex Ltd
Priority date
Filing date
Publication date
Application filed by Oy Elmorex Ltd filed Critical Oy Elmorex Ltd
Priority to AU2001282156A priority Critical patent/AU2001282156A1/en
Publication of WO2002003374A1 publication Critical patent/WO2002003374A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M19/00Current supply arrangements for telephone systems
    • H04M19/02Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone
    • H04M19/04Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone the ringing-current being generated at the substations
    • H04M19/041Encoding the ringing signal, i.e. providing distinctive or selective ringing capability
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0033Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058Transmission between separate instruments or between individual components of a musical system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H3/00Instruments in which the tones are generated by electromechanical means
    • G10H3/12Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
    • G10H3/125Extracting or recognising the pitch or fundamental frequency of the picked up signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2230/00General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
    • G10H2230/005Device type or category
    • G10H2230/021Mobile ringtone, i.e. generation, transmission, conversion or downloading of ringing tones or other sounds for mobile telephony; Special musical data formats or protocols herefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/046File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H2240/056MIDI or other note-oriented file format
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/201Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/241Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H2240/251Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analog or digital, e.g. DECT GSM, UMTS
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/281Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H2240/321Bluetooth
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/135Autocorrelation

Abstract

A method for generating a musical tone, such as a ringing tone, the method comprising inputting (1, 2) a musical seed, and providing (3, 4) the musical tone on the basis of the musical seed. If the musical seed is in the form of a note-based code, the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code. If the musical seed is in the form of an audio signal, an audio-to-notes conversion is applied (3) to the audio signal for generating a note-based code representing the musical seed, and the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code.

Description

A method for generating a musical tone
Field of the Invention
The invention relates to a method for generating a musical tone, such as a ringing tone. The invention is particularly well suited to generating ringing or warning tones for mobile terminals or multimedia devices.
Background of the Invention
With increasing use of mobile terminals, personal computers, multimedia terminals, and other similar devices, the need to personalize these devices, for example by personalized ringing and alarm tones, increases as well. Herein, the term 'musical tone' refers to ringing tones, warning tones, alarm tones or any other similar type of tone.
Typically, users of mobile terminals have been able to download ringing tone melodies that have been created and provided, for example, by network operators. Alternatively, the users may have used web-based tools for creating a melody of their own, or a tool for creating a melody may have been incorporated in the user device, such as a mobile terminal. The latter two methods employ common music notation in their user interfaces, and thus the user of these methods has to possess knowledge of music theory, or at least of musical notation, in order to create new melodies.
Disclosure of the Invention
An object of the present invention is to provide a method for generating a musical tone, such as a ringing tone, without musical skills. Another object of the invention is a device, such as a network server or a user terminal, which implements the method according to the invention. These objects are achieved with the method and device, which are characterized by what is disclosed in the attached independent claims. Preferred embodiments of the invention are disclosed in the attached dependent claims.
The invention is based on using a musical seed for providing the musical tone. The musical seed is musical content provided by a user, and it may be in audio format or in note-based code format. If the musical seed is an audio signal, the audio signal is converted into a note-based code by an audio-to-notes conversion. The musical tone is generated on the basis of the note-based code. The audio signal may be produced, for example, by singing, humming, whistling, or by playing an instrument. The method of the invention is preferably executed in a network server; alternatively, the method may be executed in a user terminal. If a network server is employed, a user connects to the server via a wireless or a fixed connection. The possible connection protocols include, but are not limited to, the Internet protocol (IP), a wireless voice protocol of the Global System for Mobile Communications (GSM) or the like, wireless data protocols (e.g. data over GSM), short message service (SMS), wireless application protocol (WAP), telephone voice connection, modem connection, ISDN, infrared connection, and local radio connection (e.g. Bluetooth). The Bluetooth technique may be employed, for example, when the method of the invention is executed in a user terminal.
The user provides a melody or a musical seed for the tone generation method. The forms of the user input can be categorized into audio formats and note-based code formats. The audio formats include, but are not limited to, waveform audio (digitized audio), encoded audio (obtained by using for example speech coding methods, such as methods based on linear prediction, or general audio coding methods, such as the transform codecs in the MPEG family), streaming audio, and audio files in the aforementioned formats. The note-based formats include, but are not limited to, MIDI, MIDI files, ringing tone formats, music representation languages, such as CSound, and MPEG-4 synthetic audio.
The server provides a musical tone on the basis of the user's input. According to a first embodiment of the invention, the musical tone is provided by generating a code sequence corresponding to new melody lines, i.e. a new combination of notes, by using said note-based code as an input for a composing method which produces a new melody and by converting said new melody into a musical tone. Herein, the term 'melody line' refers generally to musical content formed by a combination of notes and pauses. In contrast to the new melody lines, the note-based code may be considered as an old melody line.
According to a second embodiment of the invention, the note-based code is converted directly into a musical tone. Thus, the second embodiment is similar to the above-described first embodiment with the distinction that now the composing method is not employed, but the note-based code is used as such for generating the tone.
According to a third embodiment of the invention, the note-based code is compared to melodies which have been previously stored in a memory, then the melody that is the closest match with the note-based code is selected from the memory and converted into a musical tone.
According to a fourth embodiment of the invention, a code sequence corresponding to new melody lines is generated by using said note-based code as an input for a composing method which produces a new melody. The new melody is compared to melodies which have been previously stored in a memory, and the melody that is the closest match with the note-based code is selected from the memory and converted into a musical tone. In other words, the fourth embodiment is a combination of the above-described first and third embodiments.
Converting the note-based code into a musical tone means converting the note-based code into a tone of a form suitable for delivery to the user or for storage. For example, the note-based code may simply be encoded into the form of a ringing tone in Nokia Smart Message form or similar. The musical tone may be stored on the server and/or delivered to the user by using the aforementioned connections and formats. The tone can be delivered to the user terminal, for example, by using vendor-specific means, such as Nokia Smart Messaging, by making the tone available for download at a web site, by downloading the tone directly over IP or via a WAP (Wireless Application Protocol) gateway, or in any other suitable manner.
The musical tone is delivered to the user either in the form of common musical notation for editing with some suitable software tool, or in a non-editable form. The tone is delivered for editing, for example, in the form of a common musical notation, such as written notes, or as MIDI code. Typically, the server includes functionality for playback and/or for editing the musical tone.
The audio-to-notes conversion method according to a further embodiment of the invention preferably comprises estimating fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies and detecting note events on the basis of the sequence of fundamental frequencies for obtaining the note-based code.
In an audio-to-notes conversion method according to a still further embodiment of the invention, the audio signal containing musical information is processed in frames, and the note-based code representing the musical information is constructed at the same time as the input signal is provided. The signal level of a frame is first measured and compared to a predetermined signal level threshold. If the signal level threshold is exceeded, a voicing decision is executed for judging whether the frame is voiced or unvoiced. If the frame is judged voiced, the fundamental frequency of the frame is estimated and quantized for obtaining a quantized present fundamental frequency. Then, it is decided on the basis of the quantized present fundamental frequency whether a note is found. If a note is found, the quantized present fundamental frequency is compared to the fundamental frequency of the previous frame. If the previous and present fundamental frequencies are different, a note-off event is applied first, followed by a note-on event. If the previous and present fundamental frequencies are the same, nothing is done. If the signal level threshold is not exceeded, or if the frame is judged unvoiced, or if a note is not found, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. The procedure is repeated frame by frame at the same time as the audio signal is received for obtaining the note-based code.
An advantage of the invention is that it can be used by people without knowledge of music theory for producing a musical tone, such as a ringing tone, by providing a musical presentation, for example, by singing, humming, whistling or playing an instrument. Thus, the invention provides a simple method for personalizing mobile terminals and other similar devices. Additionally, self-made musical content can be stored in the form of a musical tone.
Brief Description of the Drawings
In the following the invention will be described in greater detail by means of the preferred embodiments with reference to the accompanying drawings, in which
Figure 1A is a flow diagram illustrating a method according to the invention,
Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention,
Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention,
Figure 2 illustrates the audio-to-notes conversion according to an embodiment of the invention,
Figure 3 is a flow diagram illustrating fundamental frequency estimation according to an embodiment of the invention,
Figures 4A and 4B illustrate time-domain windowing,
Figures 5A to 6B illustrate an example of the effect of the LPC whitening,
Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
Preferred Embodiments of the Invention
The principle of the invention is to provide a musical tone, i.e. a ringing tone or the like, on the basis of a musical seed given by the user in the form of an audio signal or in the form of a note-based code.
Figure 1A is a flow diagram illustrating a method according to the invention for generating a musical tone. In step 1, the musical seed is provided in the form of an audio signal, and this audio signal is converted into a note-based code with an audio-to-notes conversion method in step 3. In a preferred embodiment of the invention, which is described in detail with reference to Figure 2, the audio-to-notes conversion comprises fundamental frequency estimation and note detection. In step 2, the musical seed is provided in the form of a note-based code.
The note-based code obtained by the audio-to-notes conversion or from the user is used for generating a musical tone in one of steps 4a, 4b, 4c and 4d. In step 4a, the note-based code is used as a seed sequence for a composition method. An automated composition method, which is preferably used for this, is disclosed in [2]. This composition method generates code sequences corresponding to new melody lines on the basis of a seed sequence (training sequence). The new melody adapts to changes in the input signal, but it is not necessarily the same. In this way, for example deficiencies in the input signal, if any, are corrected or smoothened. The new melody lines are then converted into the musical tone.
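For illustration only, the sketch below shows how a seed sequence of quantized pitches might drive a simple generative step. The actual composition method used in step 4a is the one disclosed in [2]; the first-order Markov model, the function name compose_from_seed and the example pitches are assumptions made purely for this sketch.

```python
# Illustrative sketch only: a first-order Markov chain over the seed's pitch
# transitions stands in for the composition method of [2], which is not detailed here.
import random
from collections import defaultdict

def compose_from_seed(seed_notes, length=16, rng=None):
    """Generate a new melody line (list of MIDI note numbers) from a seed sequence."""
    rng = rng or random.Random(0)
    transitions = defaultdict(list)
    for a, b in zip(seed_notes, seed_notes[1:]):
        transitions[a].append(b)                 # learn which pitches follow which
    melody = [seed_notes[0]]
    for _ in range(length - 1):
        candidates = transitions.get(melody[-1]) or seed_notes
        melody.append(rng.choice(candidates))
    return melody

# Example: a hummed seed quantized to MIDI pitches
print(compose_from_seed([60, 62, 64, 62, 60, 67, 65, 64]))
```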
In step 4b, the note-based code is converted directly into a musical tone. This method allows users to sing a melody, for example, and to receive the melody they sang in the form of a ringing tone.
In step 4c, the note-based code is compared to melodies stored in a memory to find the melody that is the closest match with the note-based code. The melody that is the closest match is then converted into the musical tone. In this way, users can, for example, sing a part of a melody and receive a ringing tone according to the melody they sang. Step 4d is a combination of steps 4a and 4c. First, the note-based code is used for generating new melody lines with a composition method; the new melody lines are then compared to the melodies stored in a memory, and the melody corresponding to the closest match is converted into a musical tone. The composition method enables deficiencies in the input signal, if any, to be corrected or smoothened, and therefore comparison to stored melodies may become easier. The comparison may be based on a distance measure computed on the intervals of the seed sequence, the duration of individual notes in the sequence, the absolute pitches of the notes in the sequence, or other musical information contained in the sequence.
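The description leaves the exact distance measure open; the sketch below assumes a simple sum of absolute differences over pitch intervals and note durations, with illustrative helper names interval_distance and closest_melody.

```python
# Minimal sketch of the closest-match retrieval of steps 4c/4d. The distance measure
# (absolute differences of intervals and durations) is an assumption for illustration.
def interval_distance(seq_a, seq_b):
    """Compare two note sequences given as (midi_pitch, duration) tuples."""
    n = min(len(seq_a), len(seq_b)) - 1
    if n < 1:
        return float("inf")
    dist = 0.0
    for i in range(n):
        int_a = seq_a[i + 1][0] - seq_a[i][0]     # pitch interval in semitones
        int_b = seq_b[i + 1][0] - seq_b[i][0]
        dist += abs(int_a - int_b)
        dist += abs(seq_a[i][1] - seq_b[i][1])    # duration difference
    return dist / n

def closest_melody(seed, stored_melodies):
    """Return the stored melody with the smallest distance to the seed."""
    return min(stored_melodies, key=lambda m: interval_distance(seed, m))
```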
In step 5, the musical tone is delivered to the user in the form of a common musical notation for editing with some suitable software tool or for playback. In step 6, the tone is delivered to the user. Step 5 or 6 may also include storing the tone in a file. The file may be for example a MIDI file in which sound event descriptions are stored, or it may be a sound file which stores synthesized sound.
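As a hedged example of storing the sound event descriptions in a MIDI file, the following sketch uses the mido library; the library choice, the fixed velocity and the fixed note length are assumptions and are not prescribed by the description.

```python
# Sketch only: write a list of MIDI pitches to a standard MIDI file as
# note-on/note-off event pairs. mido is one possible library for this.
import mido

def save_notes_as_midi(notes, path="tone.mid", ticks_per_note=240):
    """notes: list of MIDI pitch numbers; each becomes a note-on/note-off pair."""
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for pitch in notes:
        track.append(mido.Message("note_on", note=pitch, velocity=64, time=0))
        track.append(mido.Message("note_off", note=pitch, velocity=64, time=ticks_per_note))
    mid.save(path)

save_notes_as_midi([60, 62, 64, 65, 67])
```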
Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention. A user connects from a mobile user terminal 8 or from a fixed user terminal 9 to a server 10a through a suitable connection. The mobile user terminal 8 is typically a mobile phone or some other wireless device and the fixed user terminal 9 is typically a workstation or a personal computer. The server process may be incorporated in the user terminal, but typically the server is a separate network server. The user provides a musical seed, and the musical seed is transmitted to the server 10a in any suitable form. Some possible data formats and transmission protocols were described in the above description. The server 10a executes the tone generation method according to the invention and returns the generated tone to the user terminal 8 or 9.
Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention. The arrangement includes a wireless communication network 13 and the Internet 15. The wireless network may be for example a GSM or a UMTS (Universal Mobile Telecommunications System) network.
A mobile user terminal 8 and a server 10b are connected to the wireless network. The mobile user terminal 8 is used for providing a musical seed to the server 10b for example through a voice connection. The server 10b generates a musical tone and returns the musical tone to the mobile user terminal 8 for example in ringing tone format via SMSC 17 (Short Message Service Center).
A fixed user terminal 9 and a server 10c are connected to the Internet. The fixed user terminal 9 is used for providing a musical seed to the server 10c for example through a voice over IP connection. Alternatively, the mobile user terminal 8 may be used for providing a musical seed to the server 10c. The connection between the mobile user terminal 8 and the server 10c is established through a WAP gateway 14, which connects the wireless network and the Internet and provides Internet services to mobile networks, and the server 10c then generates a musical tone. The musical tone is returned to the fixed user terminal 9 for example as audio over IP or by placing the musical tone into a file available for download on an Internet site. To the mobile user terminal 8 the musical tone is transmitted through the WAP gateway.
The audio-to-notes conversion according to an embodiment of the invention can be divided into two steps, as shown in Figure 2: fundamental frequency estimation 21 and note detection 22. In step 21, an audio input is segmented into frames in time and the fundamental frequency of each frame is estimated. The processing of the signal is executed in the digital domain, and therefore the audio input is digitized with an A/D converter prior to the fundamental frequency estimation, if the audio input is not already in digital form. However, fundamental frequency estimation alone is not sufficient for producing the note-based code. Therefore, in step 22, consecutive fundamental frequencies are further processed for detecting the notes. In the following description, the operation of these two steps according to the preferred embodiments of the invention is explained in detail.
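A minimal sketch of the framing step described above, assuming a digitized input signal and illustrative frame and hop lengths:

```python
# Sketch: cut the digitized audio into fixed-length frames before per-frame
# fundamental frequency estimation. Frame and hop lengths are assumptions.
import numpy as np

def split_into_frames(x, frame_len=1024, hop=512):
    """Return an array of overlapping frames taken from the 1-D signal x."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

fs = 22050                               # assumed sampling rate in Hz
x = np.random.randn(fs)                  # stand-in for one second of digitized audio
frames = split_into_frames(x)
print(frames.shape)                      # (n_frames, frame_len)
```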
Numerous techniques exist for estimation of the fundamental frequency of audio signals, such as speech or musical melodies. The autocorrelation function has been widely adopted for fundamental frequency estimation, and it is also preferred in the method according to the invention. However, it is not mandatory for the method of the invention to employ autocorrelation in fundamental frequency estimation; other fundamental frequency estimation methods can also be applied. Other techniques for fundamental frequency estimation can be found for example in [3]. The present estimation algorithm is based on detection of a fundamental period in an audio signal segment (frame). The fundamental period is denoted as T_0 (in samples) and it is related to the fundamental frequency f_0 as

f_0 = f_s / T_0,    (1)

where f_s is the sampling frequency in Hz. The fundamental frequency is obtained from the estimated fundamental period by using Equation 1.
Figure 3 is a flow diagram illustrating the operation of the fundamental frequency (or period) estimation. The input signal is segmented into frames in time and the frames are treated separately. In step 30, the input signal Audio In is first filtered with a high-pass filter (HPF) in order to remove the DC component of the signal Audio In. The transfer function of the HPF may be for example

H(z) = (1 - z^{-1}) / (1 - a z^{-1}),    0 < a < 1,    (2)

where a is the filter coefficient.
The next step 31 in the chain is optional linear predictive coding (LPC) whitening of the spectrum of the signal segment (frame). In step 32, the signal is then autocorrelated. The fundamental period estimate is obtained from the autocorrelation function of the signal by using peak detection in step 33. Finally, in step 34, the fundamental period estimate is filtered with a median filter in order to remove spurious peaks. In the following paragraphs, LPC whitening, autocorrelation and peak detection will be explained in detail.
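A possible realization of the DC-removing high-pass filter of Equation 2, assuming the transfer function reconstructed above and an illustrative coefficient a = 0.98:

```python
# Sketch: high-pass filter a frame according to Equation 2 using scipy.
# The coefficient value a = 0.98 is an illustrative assumption.
import numpy as np
from scipy.signal import lfilter

def remove_dc(x, a=0.98):
    """Apply H(z) = (1 - z^-1) / (1 - a z^-1) to remove the DC component."""
    return lfilter([1.0, -1.0], [1.0, -a], x)

frame = np.sin(2 * np.pi * 220 * np.arange(1024) / 22050) + 0.5   # tone plus DC offset
print(np.mean(remove_dc(frame)))   # close to zero compared with the original 0.5 offset
```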
The human voice production mechanism is typically considered as a source-filter system, i.e. an excitation signal is created and filtered by a linear system that models a vocal tract. In voiced (harmonic) tones or in voiced speech, the excitation signal is periodic and it is produced at the glottis. The period of the excitation signal determines the fundamental frequency of the tone. The vocal tract may be considered as a linear resonator that affects the periodic excitation signal, for example, the shape of the vocal tract determining the vowel that is perceived. In practice, it is often attractive to minimize the contribution of the vocal tract in the signal prior to the fundamental period detection. In signal processing terms this means inverse filtering (whitening) in order to remove the contribution of the linear model that corresponds to the vocal tract. The vocal tract can be modeled for example by using an all-pole model, i.e. as an Nth order digital filter with a transfer function of

H(z) = 1 / (1 - Σ_{k=1}^{N} a_k z^{-k}),    (3)

where a_k are the filter coefficients. The filter coefficients may be obtained by using linear prediction, that is, by solving a linear system involving an autocorrelation matrix and the parameters a_k. The linear system is most conveniently solved using the Levinson-Durbin recursion, which is disclosed for example in [4]. After solving the parameters a_k, the whitened signal x(n) is obtained by inverse filtering the non-whitened signal x'(n) by using the inverse of the transfer function in Equation 3.
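The sketch below illustrates the whitening step under stated assumptions: an LPC order of 12 and a generic Toeplitz solver in place of the Levinson-Durbin recursion of [4].

```python
# Sketch of optional LPC whitening: fit an all-pole model (Equation 3) by linear
# prediction and inverse-filter the frame with it. Model order 12 is an assumption;
# the normal equations are solved with a Toeplitz solver instead of Levinson-Durbin.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_whiten(frame, order=12):
    """Return the inverse-filtered (whitened) frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]   # autocorrelation r[0..]
    a = solve_toeplitz(r[:order], r[1 : order + 1])                 # predictor coefficients a_k
    inverse = np.concatenate(([1.0], -a))                           # A(z) = 1 - sum a_k z^-k
    return lfilter(inverse, [1.0], frame)
```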
Figures 4A and 4B illustrate time-domain windowing. Figure 4A shows a signal windowed with a rectangular window and Figure 4B shows a signal windowed with a Hamming window. Windowing is not shown in Figure 3, but it is assumed that the signal is windowed before step 32.
An example of the effect of LPC whitening is illustrated in Figures 5A to 6B. Figures 5A, 5B and 5C depict a spectrum, an LPC spectrum and an inverse-filtered (whitened) spectrum of the Hamming-windowed signal of Figure 4B, respectively. Figures 6A and 6B illustrate an example of the effect of LPC whitening in the autocorrelation function. Figure 6A illustrates the autocorrelation function of the whitened signal of Figure 5C, and Figure 6B that of the (non-whitened) signal of Figure 5A. It can be seen that local maxima stand out relatively more clearly in the autocorrelation function of the whitened spectrum of Figure 6A than in that of the non-whitened spectrum of Figure 6B. Therefore, this example suggests that it is advantageous to apply LPC whitening to the autocorrelation maximum detection problem.
However, tests have revealed that in some cases LPC whitening decreases the accuracy of the estimator. This concerns particularly signals that contain high-pitched tones. Therefore, it is not always advantageous to employ LPC whitening, and, consequently, the present fundamental period estimation can be applied either with or without LPC whitening. The autocorrelation of the signal is implemented by using short-time autocorrelation analysis, as disclosed in [5]. The short-time autocorrelation function operating on a short segment of signal x(n) is defined as
φ_k(m) = (1/N) Σ_{n=0}^{N-1-m} [x(n + k) w(n)] [x(n + k + m) w(n + m)],    0 ≤ m ≤ C - 1,    (4)

where C is the number of autocorrelation points to be analyzed, N is the number of samples, and w(n) is the time-domain window function, such as a Hamming window.
The length of the time-domain window function w(n) determines the time resolution of the analysis. In practice, it is feasible to use a tapered window that is at least twice the period of the lowest fundamental frequency. This means that if, for example, 50 Hz is chosen as the lower limit for the fundamental frequency estimation, the minimum window length is 40 ms. At a sampling frequency of 22 050 Hz, this corresponds to 882 samples. In practice, it is attractive to choose the window length to be the smallest power of two (in samples) that is larger than this 40 ms minimum. This is because the Fast Fourier Transform (FFT) is used to calculate the autocorrelation function, and the FFT requires that the window length is a power of two.
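A short worked example of the window-length rule, reproducing the 882-sample and power-of-two figures from the text:

```python
# Worked example: minimum window length is twice the period of the lowest
# fundamental frequency, rounded up to a power of two in samples for the FFT.
import math

fs = 22050          # sampling frequency in Hz (example from the text)
f_min = 50          # lowest fundamental frequency to be detected, in Hz
min_len = int(math.ceil(2 * fs / f_min))        # 882 samples, i.e. 40 ms
win_len = 2 ** math.ceil(math.log2(min_len))    # next power of two: 1024 samples
print(min_len, win_len)                         # 882 1024
```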
Since the autocorrelation function for a signal of N samples is 2N-1 samples long, the sequence has to be zero-padded before FFT calculation. Zero padding simply refers to appending zeros to the signal segment in order to increase the signal length to the required value. After zero-padding, the short-time autocorrelation function is calculated as
φ(m) = IFFT( |FFT(x(n))|^2 ),    (5)

where x(n) is the windowed signal segment and IFFT denotes the inverse FFT.
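A sketch combining Equations 4 and 5: the frame is Hamming-windowed, zero-padded to a power-of-two length of at least 2N-1 samples, and autocorrelated via the FFT. The helper name and the power-of-two padding choice are illustrative.

```python
# Sketch of the short-time autocorrelation computed via FFT with zero padding.
import numpy as np

def short_time_autocorr(frame):
    """Return the one-sided short-time autocorrelation of a frame."""
    n = len(frame)
    xw = frame * np.hamming(n)                       # time-domain window w(n)
    nfft = 1 << (2 * n - 1).bit_length()             # power-of-two FFT length >= 2N-1
    spectrum = np.fft.rfft(xw, nfft)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, nfft)  # Equation 5
    return acf[:n] / n                               # lags 0..N-1, scaled by 1/N
```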
The estimated fundamental period To is obtained by peak detection, which searches for the local maximum value of φ (m) (autocorrelation peak) for each k in a meaningful range of the autocorrelation lag m. The global maximum of the autocorrelation function occurs at location m=0 and the local maximum corresponding to the fundamental period is one of the local maxima. The peak detection is further improved by parabolic interpolation. In parabolic interpolation, a parabola is fitted to the three points consisting of a local maximum and two values adjacent to the local maximum. If A = φ(/) is the value of the local maximum at autocorrelation lag I, and Ai = φ(/-1) and An = φ(/+1) are the adjacent values on the left and the right of the maximum at lags 1-1 and 1+1 , respectively, the interpolated location of the autocorrelation peak T is expressed as
$$T = l + \frac{1}{2}\,\frac{A_{-1} - A_{+1}}{A_{-1} - 2A + A_{+1}} \tag{6}$$
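A direct transcription of equation (6) is sketched below; `acf` is assumed to hold the autocorrelation values and `l` the integer lag of a detected local maximum.

```python
def parabolic_peak(acf, l):
    """Refine an integer-lag autocorrelation peak using equation (6)."""
    a_m1, a, a_p1 = acf[l - 1], acf[l], acf[l + 1]
    return l + 0.5 * (a_m1 - a_p1) / (a_m1 - 2 * a + a_p1)
```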
The median filter preferably used in the method according to the invention is a three-tap median filter.
Further information on the LPC, autocorrelation analysis, and the FFT can be found in textbooks on digital signal processing and spectral analysis.
The above-described method for the estimation of the fundamental frequency is quite reliable in detecting the fundamental frequency of a sound signal with a single prominent harmonic source (for example voiced speech, singing, or musical instruments that produce harmonic sound). Furthermore, the method derives a time trajectory of the estimated fundamental frequencies so that it follows the changes in the fundamental frequency of the sound signal. However, as was stated before, the time trajectory of the fundamental frequencies needs to be further processed to obtain a note-based code. Specifically, the time trajectory needs to be analyzed into a sequence of event pairs indicating the start, pitch and end of a note, which is referred to as note detection. In other words, note detection refers to the forming of note events from the fundamental frequency trajectory. A note event comprises for example a starting position (note-on event), pitch, and ending position (note-off event) of a note. For example, the time trajectory may be transformed into a sequence of single length units, such as quavers, according to a user-determined tempo.

Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention. One frame of the audio signal is investigated at a time. In step 70, the signal level of a frame of the audio signal is measured. Typically, an energy-based signal-level measurement is applied, although it is possible to use more sophisticated methods, e.g. auditorily motivated loudness measurements. In step 71, the signal level obtained from step 70 is compared to a predetermined threshold. If the signal level is below the threshold, it is decided that no tone is present in the current frame. Therefore, the analysis is aborted and step 76 is executed. If the signal level is above the threshold, a voicing decision
(voiced/unvoiced) is made in steps 72 and 73. The voicing decision is made on the basis of the ratio of the signal level at a prominent lag in the autocorrelation function of the frame to the frame energy. This ratio is determined in steps 72 and 73, and the ratio is compared with a predetermined threshold. In other words, it is determined if there is voice or a pause in the original signal during that frame. If the frame is judged unvoiced in step 73, i.e. it is decided that no prominent harmonic tones are present in the current frame, the analysis is aborted and step 76 is executed. Otherwise, the execution proceeds to step 74. In step 74, the fundamental frequency of the frame is estimated.
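A minimal sketch of such a voicing test is given below; the threshold value is an arbitrary illustrative figure, and the "prominent lag" is taken here simply as the highest autocorrelation value inside the allowed lag range.

```python
import numpy as np

def is_voiced(acf, min_lag, max_lag, threshold=0.3):
    """Crude voicing decision for one frame.

    acf              : autocorrelation of the frame (lag 0 first)
    min_lag, max_lag : lag range corresponding to the allowed
                       fundamental-frequency range
    The frame is declared voiced if the most prominent autocorrelation
    peak, normalised by the frame energy acf[0], exceeds the threshold.
    """
    peak = np.max(acf[min_lag:max_lag])
    return peak / acf[0] > threshold
```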
Typically, the voicing decision is integrated in the fundamental frequency estimation, but logically they are independent blocks and are therefore presented as separate steps. In step 74, the fundamental frequency of the frame is also quantized, preferably onto a semitone scale, such as the MIDI pitch scale. In step 75, median filtering is applied for removing spurious peaks and for deciding whether a note was found or not. In other words, for example three consecutive fundamental frequencies are examined, and if one of them deviates markedly from the others, that particular frequency is rejected because it is probably a noise peak. If no note is found in step 75, the execution proceeds to step 76. In step 76, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. If a note-on event is not valid, nothing is done.
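As an illustration of the quantization and outlier rejection described above (the Hz-to-MIDI formula is standard, the three-frame window follows the example in the text, and the two-semitone rejection limit is an assumption made for this sketch only):

```python
import numpy as np

def hz_to_midi(f0):
    """Quantize a fundamental frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f0 / 440.0)))

def reject_outlier(three_notes, max_jump=2):
    """Return the middle of three consecutive note estimates, or None if it
    deviates from both neighbours by more than max_jump semitones."""
    prev, cur, nxt = three_notes
    if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
        return None           # probably a noise peak
    return cur
```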
If a note was found in step 75, the fundamental frequency estimated in step 74 is compared to the fundamental frequency of the currently active note (of the previous frame). If the values are different, a note-off event is applied to stop the currently active note, and a note-on event is applied to start a new note event. If the fundamental frequency estimated in step 74 is the same as the fundamental frequency of the currently active note, nothing is done.
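The following sketch ties the per-frame decisions of Figure 7 together into note-on/note-off events; the frame analysis itself is represented by a placeholder function `analyse_frame`, assumed for this example to return a quantized note number or None, and the tuple-based event format is a simplified stand-in for an actual MIDI stream.

```python
def frames_to_note_events(frames, analyse_frame):
    """Turn a sequence of audio frames into (event, note, frame_index) tuples.

    analyse_frame(frame) is expected to perform the level test, voicing
    decision, fundamental-frequency estimation and median filtering of
    Figure 7, returning a quantized note number or None.
    """
    events = []
    active_note = None
    for i, frame in enumerate(frames):
        note = analyse_frame(frame)
        if note is None:                       # steps 70-73 or 75 failed -> step 76
            if active_note is not None:
                events.append(("note_off", active_note, i))
                active_note = None
        elif note != active_note:              # new pitch: close old note, open new one
            if active_note is not None:
                events.append(("note_off", active_note, i))
            events.append(("note_on", note, i))
            active_note = note
        # same pitch as the active note: nothing is done
    if active_note is not None:                # close a note left open at the end
        events.append(("note_off", active_note, len(frames)))
    return events
```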
The figures and the related description are only intended to illustrate the present invention. The principle of the invention, that is, the providing of a ringing tone, or any other similar musical tone, on the basis of a musical seed provided in the form of an audio signal or a note-based code may be implemented in different ways. In its details, the invention may vary within the scope of the attached claims.
References
[1] MIDI 1.0 specification, Document No. MIDI-1.0, August 1983, International MIDI Association
[2] Kohonen T., US Pat. No. 5 418 323 "Method for controlling an electronic musical device by utilizing search arguments and rules to generate digital code sequences", 1993.
[3] Hess, W., "Pitch Determination of Speech Signals", Springer-Verlag, Berlin, Germany, p. 3-48, 1983.
[4] Therrien, C. W., "Discrete Random Signals and Statistical Signal Processing", Prentice Hall, Englewood Cliffs, New Jersey, pp. 422-430, 1992.
[5] Rabiner, L. R., "On the use of autocorrelation analysis for pitch detection", IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1): pp. 24-33, 1977.

Claims
1. A method for generating a musical tone, such as a ringing tone, characterized by inputting (1, 2) a musical seed, and providing (3, 4) the musical tone on the basis of the musical seed.
2. A method according to claim 1, characterized by inputting (1) the musical seed in the form of a note-based code, and generating (4a, 4b, 4c, 4d) the musical tone on the basis of said note-based code.
3. A method according to claim 1, characterized by inputting (1) the musical seed in the form of an audio signal, applying (3) an audio-to-notes conversion to the audio signal for generating a note-based code representing the musical seed, and generating (4a, 4b, 4c, 4d) the musical tone on the basis of said note-based code.
4. A method according to claim 2 or 3, characterized in that generating (4a) the musical tone comprises the steps of generating a code sequence corresponding to new melody lines by using said note-based code as an input for a composing method, and converting said new melody lines into a musical tone.
5. A method according to claim 2 or 3, characterized in that generating (4b) the musical tone comprises the step of converting said note-based code into a musical tone.
6. A method according to claim 2 or 3, characterized in that generating (4c) the musical tone comprises the steps of comparing the note-based code to melodies previously stored in a memory, selecting from the memory the melody that is the closest match with the note-based code, and converting said melody into a musical tone.
7. A method according to claim 2 or 3, characterized in that generating (4d) the musical tone comprises the steps of generating a code sequence corresponding to new melody lines by using said note-based code as an input for a composing method, comparing the new melody lines to melodies previously stored in a memory, selecting from the memory the melody that is the closest match with the new melody, and converting said melody into a musical tone.
8. A method according to any one of claims 2 to 7, characterized by delivering (5) the musical tone in the form of a common musical notation for editing.
9. A method according to any one of claims 2 to 8, characterized by delivering (6) the musical tone in a non-editable ringing tone format.
10. A method according to any one of claims 3 to 9, characterized in that the audio-to-notes conversion comprises the steps of estimating (21) fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies; and detecting (22) note-events on the basis of the sequence of fundamental frequencies for obtaining the note-based code.
11. A method according to any one of claims 3 to 9, characterized in that the audio-to-notes conversion comprises the steps of a) segmenting the audio signal into frames in time for obtaining a sequence of frames; b) measuring (90) the signal level of a frame; c) comparing (91) said signal level to a predetermined signal level threshold; d) if said signal level threshold is exceeded in step c, executing (92, 93) a voicing decision for judging if the frame is voiced or unvoiced; e) if the frame is judged voiced in step d, estimating and quantizing (94) the fundamental frequency of the frame for obtaining a quantized present fundamental frequency; f) deciding (95) on the basis of the fundamental frequency obtained in step e if a note is found; g) if a note is found in step f, comparing (97) the quantized present fundamental frequency to the fundamental frequency of the previous frame and applying a note-off event and a note-on event after the note-off event if said fundamental frequencies are different; h) if said signal level threshold is not exceeded in step c, or if the frame is judged unvoiced in step d, or if a note is not found in step f, detecting (96) if a note-on event is currently valid and applying a note-off event if a note-on event is currently valid; and
- repeating steps a to h frame by frame at the same time as the audio signal is received for obtaining the note-based code.
12. A method according to any one of claims 3 to 11, characterized by producing the audio signal by singing, humming, whistling or by playing an instrument.
13. A device for generating a musical tone, such as a ringing tone, characterized in that the device is adapted to receive a musical seed, and to provide the musical tone on the basis of the musical seed.
14. A device according to claim 13, characterized in that the device is adapted to receive the musical seed in the form of a note-based code, and to generate the musical tone on the basis of said note-based code.
15. A device according to claim 13, characterized in that the device is adapted to receive the musical seed in the form of an audio signal, to apply an audio-to-notes conversion to the audio signal for generating a note-based code representing the musical seed, and to generate the musical tone on the basis of said note-based code.
16. A device according to any one of claims 13 to 15, characterized in that the device is a user terminal or a network server connected to a wireless or fixed communication network.
17. A device according to any one of claims 13 to 15, characterized in that the device is a mobile phone, a workstation or a personal computer.
PCT/FI2001/000630 2000-07-03 2001-07-02 A method for generating a musical tone WO2002003374A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001282156A AU2001282156A1 (en) 2000-07-03 2001-07-02 A method for generating a musical tone

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20001591A FI20001591A0 (en) 2000-07-03 2000-07-03 Generating a musical tone
FI20001591 2000-07-03

Publications (1)

Publication Number Publication Date
WO2002003374A1 true WO2002003374A1 (en) 2002-01-10

Family

ID=8558715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2001/000630 WO2002003374A1 (en) 2000-07-03 2001-07-02 A method for generating a musical tone

Country Status (3)

Country Link
AU (1) AU2001282156A1 (en)
FI (1) FI20001591A0 (en)
WO (1) WO2002003374A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202528A (en) * 1990-05-14 1993-04-13 Casio Computer Co., Ltd. Electronic musical instrument with a note detector capable of detecting a plurality of notes sounded simultaneously
US5250745A (en) * 1991-07-31 1993-10-05 Ricos Co., Ltd. Karaoke music selection device
US5616876A (en) * 1995-04-19 1997-04-01 Microsoft Corporation System and methods for selecting music on the basis of subjective content
US5886274A (en) * 1997-07-11 1999-03-23 Seer Systems, Inc. System and method for generating, distributing, storing and performing musical work files

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004049300A1 (en) * 2002-11-22 2004-06-10 Hutchison Whampoa Three G Ip(Bahamas) Limited Method for generating an audio file on a server upon a request from a mobile phone
WO2004072944A1 (en) * 2003-02-14 2004-08-26 Koninklijke Philips Electronics N.V. Mobile telecommunication apparatus comprising a melody generator
FR2861527A1 (en) * 2003-10-22 2005-04-29 Mobivillage Coded audio sequence adaptation method, involves finding processing method, for each mobile terminal family, for applying to audio sequence for adapting to family characteristics so that sequence is reproduced by classified terminal
WO2005094053A1 (en) * 2004-03-05 2005-10-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for providing a signal melody
WO2006039993A1 (en) * 2004-10-11 2006-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for smoothing a melody line segment
EP1691555A1 (en) * 2005-02-14 2006-08-16 Sony NetServices GmbH System for providing a music channel with true ring-tone download capability
WO2006084594A1 (en) * 2005-02-14 2006-08-17 Sony Netservices Gmbh System for providing a music channel with true ring-tone download capability
WO2008086288A1 (en) * 2007-01-07 2008-07-17 Apple Inc. Creating and purchasing ringtones
TWI411304B (en) * 2007-05-29 2013-10-01 Mediatek Inc Electronic apparatus of playing and editing multimedia data

Also Published As

Publication number Publication date
AU2001282156A1 (en) 2002-01-14
FI20001591A0 (en) 2000-07-03

Similar Documents

Publication Publication Date Title
US6541691B2 (en) Generation of a note-based code
JP5543640B2 (en) Perceptual tempo estimation with scalable complexity
US7346500B2 (en) Method of translating a voice signal to a series of discrete tones
EP1252621B1 (en) System and method for modifying speech signals
WO2019138871A1 (en) Speech synthesis method, speech synthesis device, and program
WO2002097798A1 (en) Method and apparatus for improved voicing determination in speech signals containing high levels of jitter
TWI281657B (en) Method and system for speech coding
JP2016161919A (en) Voice synthesis device
WO2002003374A1 (en) A method for generating a musical tone
JP2018004870A (en) Speech synthesis device and speech synthesis method
WO1997035301A1 (en) Vocoder system and method for performing pitch estimation using an adaptive correlation sample window
JP2006171751A (en) Speech coding apparatus and method therefor
JP2020076844A (en) Acoustic processing method and acoustic processing device
US7389231B2 (en) Voice synthesizing apparatus capable of adding vibrato effect to synthesized voice
Rodet et al. Spectral envelopes and additive+ residual analysis/synthesis
KR100579797B1 (en) System and Method for Construction of Voice Codebook
Helen et al. Perceptually motivated parametric representation for harmonic sounds for data compression purposes
CN115171729B (en) Audio quality determination method and device, electronic equipment and storage medium
JP6515945B2 (en) Code extraction apparatus and method
Alexandraki Real-time machine listening and segmental re-synthesis for networked music performance
Modegi Evaluation method for quality losses generated by miscellaneous audio signal processings using MIDI encoder tool “Auto-F”
Nishimura Aerial Acoustic Modem with Decoding Capabilities Using a CELP-Based Speech Encoder
Edwards Advanced signal processing techniques for pitch synchronous sinusoidal speech coders
CN115699161A (en) Sound processing method, sound processing system, and program
Airas Development of a mobile interactive musical service

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP