WO2002003374A1 - A method for generating a musical tone - Google Patents

A method for generating a musical tone

Info

Publication number
WO2002003374A1
Authority
WO
WIPO (PCT)
Prior art keywords
note
musical
musical tone
based code
generating
Prior art date
Application number
PCT/FI2001/000630
Other languages
French (fr)
Inventor
Tero Tolonen
Matti Airas
Original Assignee
Oy Elmorex Ltd
Priority date
Filing date
Publication date
Application filed by Oy Elmorex Ltd filed Critical Oy Elmorex Ltd
Priority to AU2001282156A priority Critical patent/AU2001282156A1/en
Publication of WO2002003374A1 publication Critical patent/WO2002003374A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M19/00Current supply arrangements for telephone systems
    • H04M19/02Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone
    • H04M19/04Current supply arrangements for telephone systems providing ringing current or supervisory tones, e.g. dialling tone or busy tone the ringing-current being generated at the substations
    • H04M19/041Encoding the ringing signal, i.e. providing distinctive or selective ringing capability
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0033Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H1/0058Transmission between separate instruments or between individual components of a musical system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H3/00Instruments in which the tones are generated by electromechanical means
    • G10H3/12Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
    • G10H3/125Extracting or recognising the pitch or fundamental frequency of the picked up signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2230/00General physical, ergonomic or hardware implementation of electrophonic musical tools or instruments, e.g. shape or architecture
    • G10H2230/005Device type or category
    • G10H2230/021Mobile ringtone, i.e. generation, transmission, conversion or downloading of ringing tones or other sounds for mobile telephony; Special musical data formats or protocols herefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/011Files or data streams containing coded musical information, e.g. for transmission
    • G10H2240/046File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H2240/056MIDI or other note-oriented file format
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/201Physical layer or hardware aspects of transmission to or from an electrophonic musical instrument, e.g. voltage levels, bit streams, code words or symbols over a physical link connecting network nodes or instruments
    • G10H2240/241Telephone transmission, i.e. using twisted pair telephone lines or any type of telephone network
    • G10H2240/251Mobile telephone transmission, i.e. transmitting, accessing or controlling music data wirelessly via a wireless or mobile telephone receiver, analog or digital, e.g. DECT GSM, UMTS
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments
    • G10H2240/281Protocol or standard connector for transmission of analog or digital data to or from an electrophonic musical instrument
    • G10H2240/321Bluetooth
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/135Autocorrelation

Abstract

A method for generating a musical tone, such as a ringing tone, the method comprising inputting (1, 2) a musical seed, and providing (3, 4) the musical tone on the basis of the musical seed. If the musical seed is in the form of a note-based code, the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code. If the musical seed is in the form of an audio signal, an audio-to-notes conversion is applied (3) to the audio signal for generating a note-based code representing the musical seed, and the musical tone is generated (4a, 4b, 4c, 4d) on the basis of said note-based code.

Description

A method for generating a musical tone
Field of the Invention
The invention relates to a method for generating a musical tone, such as a ringing tone. The invention is particularly well suited to generating ringing or warning tones for mobile terminals or multimedia devices.
Background of the Invention
With increasing use of mobile terminals, personal computers, multimedia terminals, and other similar devices, the need to personalize these devices, for example by personalized ringing and alarm tones, increases as well. Herein, the term 'musical tone' refers to ringing tones, warning tones, alarm tones or any other similar type of tone.
Typically, users of mobile terminals have been able to download ringing tone melodies that have been created and provided, for example, by network operators. Alternatively, the users may have used web-based tools for creating a melody of their own, or a tool for creating a melody may have been incorporated in the user device, such as a mobile terminal. The latter two methods employ common music notation in their user interfaces, and thus the user of these methods has to possess knowledge of music theory, or at least of musical notation, in order to create new melodies.
Disclosure of the Invention
An object of the present invention is to provide a method for generating a musical tone, such as a ringing tone, without musical skills. Another object of the invention is a device, such as a network server or a user terminal, which implements the method according to the invention. These objects are achieved with the method and device, which are characterized by what is disclosed in the attached independent claims. Preferred embodiments of the invention are disclosed in the attached dependent claims.
The invention is based on using a musical seed for providing the musical tone. The musical seed is musical content provided by a user, and it may be in audio format or in note-based code format. If the musical seed is an audio signal, the audio signal is converted into a note-based code by an audio-to-notes conversion. The musical tone is generated on the basis of the note-based code. The audio signal may be produced, for example, by singing, humming, whistling, or by playing an instrument. The method of the invention is preferably executed in a network server; alternatively, the method may be executed in a user terminal. If a network server is employed, a user connects to the server via a wireless or a fixed connection. The possible connection protocols include, but are not limited to, the Internet protocol (IP), a wireless voice protocol of the Global System for Mobile Communications (GSM) or the like, wireless data protocols (e.g. data over GSM), short message service (SMS), wireless application protocol (WAP), telephone voice connection, modem connection, ISDN, infrared connection, and local radio connection (e.g. Bluetooth). The Bluetooth technique may be employed, for example, when the method of the invention is executed in a user terminal.
The user provides a melody or a musical seed for the tone generation method. The forms of the user input can be categorized into audio formats and note-based code formats. The audio formats include, but are not limited to, waveform audio (digitized audio), encoded audio (obtained by using for example speech coding methods, such as methods based on linear prediction, or general audio coding methods, such as the transform codecs in the MPEG family), streaming audio, and audio files in the aforementioned formats. The note-based formats include, but are not limited to, MIDI, MIDI files, ringing tone formats, music representation languages, such as CSound, and MPEG-4 synthetic audio.
The server provides a musical tone on the basis of the user's input. According to a first embodiment of the invention, the musical tone is provided by generating a code sequence corresponding to new melody lines, i.e. a new combination of notes, by using said note-based code as an input for a composing method which produces a new melody and by converting said new melody into a musical tone. Herein, the term 'melody line' refers generally to musical content formed by a combination of notes and pauses. In contrast to the new melody lines, the note-based code may be considered as an old melody line.
According to a second embodiment of the invention, the note-based code is converted directly into a musical tone. Thus, the second embodiment is similar to the above-described first embodiment with the distinction that now the composing method is not employed, but the note-based code is used as such for generating the tone.
According to a third embodiment of the invention, the note-based code is compared to melodies which have been previously stored in a memory, then the melody that is the closest match with the note-based code is selected from the memory and converted into a musical tone.
According to a fourth embodiment of the invention, a code sequence corresponding to new melody lines is generated by using said note-based code as an input for a composing method which produces a new melody. The new melody is compared to melodies which have been previously stored in a memory, and the melody that is the closest match with the note-based code is selected from the memory and converted into a musical tone. In other words, the fourth embodiment is a combination of the above-described first and third embodiments.
Converting the note-based code into a musical tone means converting the note-based code into a tone of a form suitable for delivery to the user or for storage. For example, the note-based code may simply be encoded into the form of a ringing tone in Nokia Smart Message form or similar. The musical tone may be stored on the server and/or delivered to the user by using the aforementioned connections and formats. The tone can be delivered to the user terminal, for example, by using vendor-specific means, such as Nokia Smart Messaging, by making the tone available for download at a web site, by downloading the tone directly over IP or via a WAP (Wireless Application Protocol) gateway, or in any other suitable manner.
The musical tone is delivered to the user either in the form of common musical notation for editing with some suitable software tool, or in a non-editable form. The tone is delivered for editing, for example, in the form of a common musical notation, such as written notes, or as MIDI code. Typically, the server includes functionality for playback and/or for editing the musical tone.
The audio-to-notes conversion method according to a further embodiment of the invention preferably comprises estimating fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies and detecting note events on the basis of the sequence of fundamental frequencies for obtaining the note-based code.
In an audio-to-notes conversion method according to a still further embodiment of the invention, the audio signal containing musical information is processed in frames, and the note-based code representing the musical information is constructed at the same time as the input signal is provided. The signal level of a frame is first measured and compared to a predetermined signal level threshold. If the signal level threshold is exceeded, a voicing decision is executed for judging whether the frame is voiced or unvoiced. If the frame is judged voiced, the fundamental frequency of the frame is estimated and quantized for obtaining a quantized present fundamental frequency. Then, it is decided on the basis of the quantized present fundamental frequency whether a note is found. If a note is found, the quantized present fundamental frequency is compared to the fundamental frequency of the previous frame. If the previous and present fundamental frequencies are different, a note-off event is applied first, followed by a note-on event. If the previous and present fundamental frequencies are the same, nothing is done. If the signal level threshold is not exceeded, or if the frame is judged unvoiced, or if a note is not found, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. The procedure is repeated frame by frame at the same time as the audio signal is received for obtaining the note-based code.
An advantage of the invention is that it can be used by people without knowledge of music theory for producing a musical tone, such as a ringing tone, by providing a musical presentation, for example, by singing, humming, whistling or playing an instrument. Thus, the invention provides a simple method for personalizing mobile terminals and other similar devices. Additionally, self-made musical content can be stored in the form of a musical tone.
Brief Description of the Drawings
In the following the invention will be described in greater detail by means of the preferred embodiments with reference to the accompanying drawings, in which
Figure 1A is a flow diagram illustrating a method according to the invention,
Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention,
Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention,
Figure 2 illustrates the audio-to-notes conversion according to an embodiment of the invention,
Figure 3 is a flow diagram illustrating fundamental frequency estimation according to an embodiment of the invention,
Figures 4A and 4B illustrate time-domain windowing,
Figures 5A to 6B illustrate an example of the effect of the LPC whitening,
Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention.
Preferred Embodiments of the Invention
The principle of the invention is to provide a musical tone, i.e. a ringing tone or the like, on the basis of a musical seed given by the user in the form of an audio signal or in the form of a note-based code.
Figure 1A is a flow diagram illustrating a method according to the invention for generating a musical tone. In step 1, the musical seed is provided in the form of an audio signal, and this audio signal is converted into a note-based code with an audio-to-notes conversion method in step 3. In a preferred embodiment of the invention, which is described in detail with reference to Figure 2, the audio-to-notes conversion comprises fundamental frequency estimation and note detection. In step 2, the musical seed is provided in the form of a note-based code.
The note-based code obtained by the audio-to-notes conversion or from the user is used for generating a musical tone in one of steps 4a, 4b, 4c and 4d. In step 4a, the note-based code is used as a seed sequence for a composition method. An automated composition method, which is preferably used for this, is disclosed in [2]. This composition method generates code sequences corresponding to new melody lines on the basis of a seed sequence (training sequence). The new melody adapts to changes in the input signal, but it is not necessarily the same. In this way, for example deficiencies in the input signal, if any, are corrected or smoothened. The new melody lines are then converted into the musical tone.
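For illustration only, the sketch below shows how a seed sequence of quantized pitches might drive a simple generative step. The actual composition method used in step 4a is the one disclosed in [2]; the first-order Markov model, the function name compose_from_seed and the example pitches are assumptions made purely for this sketch.

```python
# Illustrative sketch only: a first-order Markov chain over the seed's pitch
# transitions stands in for the composition method of [2], which is not detailed here.
import random
from collections import defaultdict

def compose_from_seed(seed_notes, length=16, rng=None):
    """Generate a new melody line (list of MIDI note numbers) from a seed sequence."""
    rng = rng or random.Random(0)
    transitions = defaultdict(list)
    for a, b in zip(seed_notes, seed_notes[1:]):
        transitions[a].append(b)                 # learn which pitches follow which
    melody = [seed_notes[0]]
    for _ in range(length - 1):
        candidates = transitions.get(melody[-1]) or seed_notes
        melody.append(rng.choice(candidates))
    return melody

# Example: a hummed seed quantized to MIDI pitches
print(compose_from_seed([60, 62, 64, 62, 60, 67, 65, 64]))
```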
In step 4b, the note-based code is converted directly into a musical tone. This method allows users to sing a melody, for example, and to receive the melody they sang in the form of a ringing tone.
In step 4c, the note-based code is compared to melodies stored in a memory to find the melody that is the closest match with the note-based code. The melody that is the closest match is then converted into the musical tone. In this way, users can, for example, sing a part of a melody and receive a ringing tone according to the melody they sang. Step 4d is a combination of steps 4a and 4c. First, the note-based code is used for generating new melody lines with a composition method; the new melody lines are then compared to the melodies stored in a memory, and the melody corresponding to the closest match is converted into a musical tone. The composition method enables deficiencies in the input signal, if any, to be corrected or smoothened, and therefore comparison to stored melodies may become easier. The comparison may be based on a distance measure computed on the intervals of the seed sequence, the duration of individual notes in the sequence, the absolute pitches of the notes in the sequence, or other musical information contained in the sequence.
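The description leaves the exact distance measure open; the sketch below assumes a simple sum of absolute differences over pitch intervals and note durations, with illustrative helper names interval_distance and closest_melody.

```python
# Minimal sketch of the closest-match retrieval of steps 4c/4d. The distance measure
# (absolute differences of intervals and durations) is an assumption for illustration.
def interval_distance(seq_a, seq_b):
    """Compare two note sequences given as (midi_pitch, duration) tuples."""
    n = min(len(seq_a), len(seq_b)) - 1
    if n < 1:
        return float("inf")
    dist = 0.0
    for i in range(n):
        int_a = seq_a[i + 1][0] - seq_a[i][0]     # pitch interval in semitones
        int_b = seq_b[i + 1][0] - seq_b[i][0]
        dist += abs(int_a - int_b)
        dist += abs(seq_a[i][1] - seq_b[i][1])    # duration difference
    return dist / n

def closest_melody(seed, stored_melodies):
    """Return the stored melody with the smallest distance to the seed."""
    return min(stored_melodies, key=lambda m: interval_distance(seed, m))
```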
In step 5, the musical tone is delivered to the user in the form of a common musical notation for editing with some suitable software tool or for playback. In step 6, the tone is delivered to the user. Step 5 or 6 may also include storing the tone in a file. The file may be for example a MIDI file in which sound event descriptions are stored, or it may be a sound file which stores synthesized sound.
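As a hedged example of storing the sound event descriptions in a MIDI file, the following sketch uses the mido library; the library choice, the fixed velocity and the fixed note length are assumptions and are not prescribed by the description.

```python
# Sketch only: write a list of MIDI pitches to a standard MIDI file as
# note-on/note-off event pairs. mido is one possible library for this.
import mido

def save_notes_as_midi(notes, path="tone.mid", ticks_per_note=240):
    """notes: list of MIDI pitch numbers; each becomes a note-on/note-off pair."""
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for pitch in notes:
        track.append(mido.Message("note_on", note=pitch, velocity=64, time=0))
        track.append(mido.Message("note_off", note=pitch, velocity=64, time=ticks_per_note))
    mid.save(path)

save_notes_as_midi([60, 62, 64, 65, 67])
```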
Figure 1B is a block diagram illustrating an arrangement according to an embodiment of the invention. A user connects from a mobile user terminal 8 or from a fixed user terminal 9 to a server 10a through a suitable connection. The mobile user terminal 8 is typically a mobile phone or some other wireless device and the fixed user terminal 9 is typically a workstation or a personal computer. The server process may be incorporated in the user terminal, but typically the server is a separate network server. The user provides a musical seed, and the musical seed is transmitted to the server 10a in any suitable form. Some possible data formats and transmission protocols were described in the above description. The server 10a executes the tone generation method according to the invention and returns the generated tone to the user terminal 8 or 9.
Figure 1C is a block diagram illustrating an arrangement according to another embodiment of the invention. The arrangement includes a wireless communication network 13 and the Internet 15. The wireless network may be for example a GSM or a UMTS (Universal Mobile Telecommunications System) network.
A mobile user terminal 8 and a server 10b are connected to the wireless network. The mobile user terminal 8 is used for providing a musical seed to the server 10b for example through a voice connection. The server 10b generates a musical tone and returns the musical tone to the mobile user terminal 8 for example in ringing tone format via SMSC 17 (Short Message Service Center).
A fixed user terminal 9 and a server 10c are connected to the Internet. The fixed user terminal 9 is used for providing a musical seed to the server 10c for example through a voice over IP connection. Alternatively, the mobile user terminal 8 may be used for providing a musical seed to the server 10c. The connection between the mobile user terminal 8 and the server 10c is established through a WAP gateway 14, which connects the wireless network and the Internet and provides Internet services to mobile networks, and the server 10c then generates a musical tone. The musical tone is returned to the fixed user terminal 9 for example as audio over IP or by placing the musical tone into a file available for download on an Internet site. To the mobile user terminal 8 the musical tone is transmitted through the WAP gateway.
The audio-to-notes conversion according to an embodiment of the invention can be divided into two steps, as shown in Figure 2: fundamental frequency estimation 21 and note detection 22. In step 21, an audio input is segmented into frames in time and the fundamental frequency of each frame is estimated. The processing of the signal is executed in the digital domain, and therefore the audio input is digitized with an A/D converter prior to the fundamental frequency estimation, if the audio input is not already in digital form. However, fundamental frequency estimation alone is not sufficient for producing the note-based code. Therefore, in step 22, consecutive fundamental frequencies are further processed for detecting the notes. In the following description, the operation of these two steps according to the preferred embodiments of the invention is explained in detail.
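A minimal sketch of the framing step described above, assuming a digitized input signal and illustrative frame and hop lengths:

```python
# Sketch: cut the digitized audio into fixed-length frames before per-frame
# fundamental frequency estimation. Frame and hop lengths are assumptions.
import numpy as np

def split_into_frames(x, frame_len=1024, hop=512):
    """Return an array of overlapping frames taken from the 1-D signal x."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

fs = 22050                               # assumed sampling rate in Hz
x = np.random.randn(fs)                  # stand-in for one second of digitized audio
frames = split_into_frames(x)
print(frames.shape)                      # (n_frames, frame_len)
```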
Numerous techniques exist for estimation of the fundamental frequency of audio signals, such as speech or musical melodies. The autocorrelation function has been widely adopted for fundamental frequency estimation, and it is also preferred in the method according to the invention. However, it is not mandatory for the method of the invention to employ autocorrelation in fundamental frequency estimation; other fundamental frequency estimation methods can also be applied. Other techniques for fundamental frequency estimation can be found for example in [3]. The present estimation algorithm is based on detection of a fundamental period in an audio signal segment (frame). The fundamental period is denoted as T_0 (in samples) and it is related to the fundamental frequency f_0 as

f_0 = f_s / T_0,    (1)

where f_s is the sampling frequency in Hz. The fundamental frequency is obtained from the estimated fundamental period by using Equation 1.
Figure 3 is a flow diagram illustrating the operation of the fundamental frequency (or period) estimation. The input signal is segmented into frames in time and the frames are treated separately. In step 30, the input signal Audio In is first filtered with a high-pass filter (HPF) in order to remove the DC component of the signal Audio In. The transfer function of the HPF may be for example

H(z) = (1 - z^{-1}) / (1 - a z^{-1}),    0 < a < 1,    (2)

where a is the filter coefficient.
The next step 31 in the chain is optional linear predictive coding (LPC) whitening of the spectrum of the signal segment (frame). In step 32, the signal is then autocorrelated. The fundamental period estimate is obtained from the autocorrelation function of the signal by using peak detection in step 33. Finally, in step 34, the fundamental period estimate is filtered with a median filter in order to remove spurious peaks. In the following paragraphs, LPC whitening, autocorrelation and peak detection will be explained in detail.
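A possible realization of the DC-removing high-pass filter of Equation 2, assuming the transfer function reconstructed above and an illustrative coefficient a = 0.98:

```python
# Sketch: high-pass filter a frame according to Equation 2 using scipy.
# The coefficient value a = 0.98 is an illustrative assumption.
import numpy as np
from scipy.signal import lfilter

def remove_dc(x, a=0.98):
    """Apply H(z) = (1 - z^-1) / (1 - a z^-1) to remove the DC component."""
    return lfilter([1.0, -1.0], [1.0, -a], x)

frame = np.sin(2 * np.pi * 220 * np.arange(1024) / 22050) + 0.5   # tone plus DC offset
print(np.mean(remove_dc(frame)))   # close to zero compared with the original 0.5 offset
```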
The human voice production mechanism is typically considered as a source-filter system, i.e. an excitation signal is created and filtered by a linear system that models a vocal tract. In voiced (harmonic) tones or in voiced speech, the excitation signal is periodic and it is produced at the glottis. The period of the excitation signal determines the fundamental frequency of the tone. The vocal tract may be considered as a linear resonator that affects the periodic excitation signal, for example, the shape of the vocal tract determining the vowel that is perceived. In practice, it is often attractive to minimize the contribution of the vocal tract in the signal prior to the fundamental period detection. In signal processing terms this means inverse filtering (whitening) in order to remove the contribution of the linear model that corresponds to the vocal tract. The vocal tract can be modeled for example by using an all-pole model, i.e. as an Nth order digital filter with a transfer function of

H(z) = 1 / (1 - Σ_{k=1}^{N} a_k z^{-k}),    (3)

where a_k are the filter coefficients. The filter coefficients may be obtained by using linear prediction, that is, by solving a linear system involving an autocorrelation matrix and the parameters a_k. The linear system is most conveniently solved using the Levinson-Durbin recursion, which is disclosed for example in [4]. After solving the parameters a_k, the whitened signal x(n) is obtained by inverse filtering the non-whitened signal x'(n) by using the inverse of the transfer function in Equation 3.
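The sketch below illustrates the whitening step under stated assumptions: an LPC order of 12 and a generic Toeplitz solver in place of the Levinson-Durbin recursion of [4].

```python
# Sketch of optional LPC whitening: fit an all-pole model (Equation 3) by linear
# prediction and inverse-filter the frame with it. Model order 12 is an assumption;
# the normal equations are solved with a Toeplitz solver instead of Levinson-Durbin.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_whiten(frame, order=12):
    """Return the inverse-filtered (whitened) frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]   # autocorrelation r[0..]
    a = solve_toeplitz(r[:order], r[1 : order + 1])                 # predictor coefficients a_k
    inverse = np.concatenate(([1.0], -a))                           # A(z) = 1 - sum a_k z^-k
    return lfilter(inverse, [1.0], frame)
```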
Figures 4A and 4B illustrate time-domain windowing. Figure 4A shows a signal windowed with a rectangular window and Figure 4B shows a signal windowed with a Hamming window. Windowing is not shown in Figure 3, but it is assumed that the signal is windowed before step 32.
An example of the effect of LPC whitening is illustrated in Figures 5A to 6B. Figures 5A, 5B and 5C depict a spectrum, an LPC spectrum and an inverse-filtered (whitened) spectrum of the Hamming-windowed signal of Figure 4B, respectively. Figures 6A and 6B illustrate an example of the effect of LPC whitening in the autocorrelation function. Figure 6A illustrates the autocorrelation function of the whitened signal of Figure 5C, and Figure 6B that of the (non-whitened) signal of Figure 5A. It can be seen that local maxima stand out relatively more clearly in the autocorrelation function of the whitened spectrum of Figure 6A than in that of the non-whitened spectrum of Figure 6B. Therefore, this example suggests that it is advantageous to apply LPC whitening to the autocorrelation maximum detection problem.
However, tests have revealed that in some cases LPC whitening decreases the accuracy of the estimator. This concerns particularly signals that contain high-pitched tones. Therefore, it is not always advantageous to employ LPC whitening, and, consequently, the present fundamental period estimation can be applied either with or without LPC whitening. The autocorrelation of the signal is implemented by using short-time autocorrelation analysis, as disclosed in [5]. The short-time autocorrelation function operating on a short segment of signal x(n) is defined as
φ_k(m) = (1/N) Σ_{n=0}^{N-1-m} [x(n + k) w(n)] [x(n + k + m) w(n + m)],    0 ≤ m ≤ C - 1,    (4)

where C is the number of autocorrelation points to be analyzed, N is the number of samples, and w(n) is the time-domain window function, such as a Hamming window.
The length of the time-domain window function w(n) determines the time resolution of the analysis. In practice, it is feasible to use a tapered window that is at least twice the period of the lowest fundamental frequency. This means that if, for example, 50 Hz is chosen as the lower limit for the fundamental frequency estimation, the minimum window length is 40 ms. At a sampling frequency of 22 050 Hz, this corresponds to 882 samples. In practice, it is attractive to choose the window length to be the smallest power of two (in samples) that is larger than this 40 ms minimum. This is because the Fast Fourier Transform (FFT) is used to calculate the autocorrelation function, and the FFT requires that the window length is a power of two.
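A short worked example of the window-length rule, reproducing the 882-sample and power-of-two figures from the text:

```python
# Worked example: minimum window length is twice the period of the lowest
# fundamental frequency, rounded up to a power of two in samples for the FFT.
import math

fs = 22050          # sampling frequency in Hz (example from the text)
f_min = 50          # lowest fundamental frequency to be detected, in Hz
min_len = int(math.ceil(2 * fs / f_min))        # 882 samples, i.e. 40 ms
win_len = 2 ** math.ceil(math.log2(min_len))    # next power of two: 1024 samples
print(min_len, win_len)                         # 882 1024
```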
Since the autocorrelation function for a signal of N samples is 2N-1 samples long, the sequence has to be zero-padded before FFT calculation. Zero padding simply refers to appending zeros to the signal segment in order to increase the signal length to the required value. After zero-padding, the short-time autocorrelation function is calculated as
φ(m) = IFFT( |FFT(x(n))|^2 ),    (5)

where x(n) is the windowed signal segment and IFFT denotes the inverse FFT.
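A sketch combining Equations 4 and 5: the frame is Hamming-windowed, zero-padded to a power-of-two length of at least 2N-1 samples, and autocorrelated via the FFT. The helper name and the power-of-two padding choice are illustrative.

```python
# Sketch of the short-time autocorrelation computed via FFT with zero padding.
import numpy as np

def short_time_autocorr(frame):
    """Return the one-sided short-time autocorrelation of a frame."""
    n = len(frame)
    xw = frame * np.hamming(n)                       # time-domain window w(n)
    nfft = 1 << (2 * n - 1).bit_length()             # power-of-two FFT length >= 2N-1
    spectrum = np.fft.rfft(xw, nfft)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, nfft)  # Equation 5
    return acf[:n] / n                               # lags 0..N-1, scaled by 1/N
```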
The estimated fundamental period To is obtained by peak detection, which searches for the local maximum value of φ (m) (autocorrelation peak) for each k in a meaningful range of the autocorrelation lag m. The global maximum of the autocorrelation function occurs at location m=0 and the local maximum corresponding to the fundamental period is one of the local maxima. The peak detection is further improved by parabolic interpolation. In parabolic interpolation, a parabola is fitted to the three points consisting of a local maximum and two values adjacent to the local maximum. If A = φ(/) is the value of the local maximum at autocorrelation lag I, and Ai = φ(/-1) and An = φ(/+1) are the adjacent values on the left and the right of the maximum at lags 1-1 and 1+1 , respectively, the interpolated location of the autocorrelation peak T is expressed as
$$T = l + \frac{1}{2}\,\frac{A_{-1} - A_{+1}}{A_{-1} - 2A + A_{+1}} \tag{6}$$
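A direct transcription of equation (6) is sketched below; `acf` is assumed to hold the autocorrelation values and `l` the integer lag of a detected local maximum.

```python
def parabolic_peak(acf, l):
    """Refine an integer-lag autocorrelation peak using equation (6)."""
    a_m1, a, a_p1 = acf[l - 1], acf[l], acf[l + 1]
    return l + 0.5 * (a_m1 - a_p1) / (a_m1 - 2 * a + a_p1)
```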
The median filter preferably used in the method according to the invention is a three-tap median filter.
Further information on the LPC, autocorrelation analysis, and the FFT can be found in textbooks on digital signal processing and spectral analysis.
The above-described method for the estimation of the fundamental frequency is quite reliable in detecting the fundamental frequency of a sound signal with a single prominent harmonic source (for example voiced speech, singing, or musical instruments that produce harmonic sound). Furthermore, the method derives a time trajectory of the estimated fundamental frequencies so that it follows the changes in the fundamental frequency of the sound signal. However, as was stated before, the time trajectory of the fundamental frequencies needs to be further processed to obtain a note-based code. Specifically, the time trajectory needs to be analyzed into a sequence of event pairs indicating the start, pitch and end of a note, which is referred to as note detection. In other words, note detection refers to the forming of note events from the fundamental frequency trajectory. A note event comprises for example a starting position (note-on event), pitch, and ending position (note-off event) of a note. For example, the time trajectory may be transformed into a sequence of single length units, such as quavers, according to a user-determined tempo.

Figure 7 is a flow diagram illustrating the audio-to-notes conversion according to an embodiment of the invention. One frame of the audio signal is investigated at a time. In step 70, the signal level of a frame of the audio signal is measured. Typically, an energy-based signal-level measurement is applied, although it is possible to use more sophisticated methods, e.g. auditorily motivated loudness measurements. In step 71, the signal level obtained from step 70 is compared to a predetermined threshold. If the signal level is below the threshold, it is decided that no tone is present in the current frame. Therefore, the analysis is aborted and step 76 is executed. If the signal level is above the threshold, a voicing decision
(voiced/unvoiced) is made in steps 72 and 73. The voicing decision is made on the basis of the ratio of the signal level at a prominent lag in the autocorrelation function of the frame to the frame energy. This ratio is determined in steps 72 and 73, and the ratio is compared with a predetermined threshold. In other words, it is determined if there is voice or a pause in the original signal during that frame. If the frame is judged unvoiced in step 73, i.e. it is decided that no prominent harmonic tones are present in the current frame, the analysis is aborted and step 76 is executed. Otherwise, the execution proceeds to step 74. In step 74, the fundamental frequency of the frame is estimated.
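A minimal sketch of such a voicing test is given below; the threshold value is an arbitrary illustrative figure, and the "prominent lag" is taken here simply as the highest autocorrelation value inside the allowed lag range.

```python
import numpy as np

def is_voiced(acf, min_lag, max_lag, threshold=0.3):
    """Crude voicing decision for one frame.

    acf              : autocorrelation of the frame (lag 0 first)
    min_lag, max_lag : lag range corresponding to the allowed
                       fundamental-frequency range
    The frame is declared voiced if the most prominent autocorrelation
    peak, normalised by the frame energy acf[0], exceeds the threshold.
    """
    peak = np.max(acf[min_lag:max_lag])
    return peak / acf[0] > threshold
```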
Typically, the voicing decision is integrated in the fundamental frequency estimation, but logically they are independent blocks and are therefore presented as separate steps. In step 74, the fundamental frequency of the frame is also quantized, preferably onto a semitone scale, such as the MIDI pitch scale. In step 75, median filtering is applied for removing spurious peaks and for deciding whether a note was found or not. In other words, for example three consecutive fundamental frequencies are examined, and if one of them deviates markedly from the others, that particular frequency is rejected because it is probably a noise peak. If no note is found in step 75, the execution proceeds to step 76. In step 76, it is detected whether a note-on event is currently valid, and if so, a note-off event is applied. If a note-on event is not valid, nothing is done.
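As an illustration of the quantization and outlier rejection described above (the Hz-to-MIDI formula is standard, the three-frame window follows the example in the text, and the two-semitone rejection limit is an assumption made for this sketch only):

```python
import numpy as np

def hz_to_midi(f0):
    """Quantize a fundamental frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f0 / 440.0)))

def reject_outlier(three_notes, max_jump=2):
    """Return the middle of three consecutive note estimates, or None if it
    deviates from both neighbours by more than max_jump semitones."""
    prev, cur, nxt = three_notes
    if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
        return None           # probably a noise peak
    return cur
```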
If a note was found in step 75, the fundamental frequency estimated in step 74 is compared to the fundamental frequency of the currently active note (of the previous frame). If the values are different, a note-off event is applied to stop the currently active note, and a note-on event is applied to start a new note event. If the fundamental frequency estimated in step 74 is the same as the fundamental frequency of the currently active note, nothing is done.
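The following sketch ties the per-frame decisions of Figure 7 together into note-on/note-off events; the frame analysis itself is represented by a placeholder function `analyse_frame`, assumed for this example to return a quantized note number or None, and the tuple-based event format is a simplified stand-in for an actual MIDI stream.

```python
def frames_to_note_events(frames, analyse_frame):
    """Turn a sequence of audio frames into (event, note, frame_index) tuples.

    analyse_frame(frame) is expected to perform the level test, voicing
    decision, fundamental-frequency estimation and median filtering of
    Figure 7, returning a quantized note number or None.
    """
    events = []
    active_note = None
    for i, frame in enumerate(frames):
        note = analyse_frame(frame)
        if note is None:                       # steps 70-73 or 75 failed -> step 76
            if active_note is not None:
                events.append(("note_off", active_note, i))
                active_note = None
        elif note != active_note:              # new pitch: close old note, open new one
            if active_note is not None:
                events.append(("note_off", active_note, i))
            events.append(("note_on", note, i))
            active_note = note
        # same pitch as the active note: nothing is done
    if active_note is not None:                # close a note left open at the end
        events.append(("note_off", active_note, len(frames)))
    return events
```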
The figures and the related description are only intended to illustrate the present invention. The principle of the invention, that is, the providing of a ringing tone, or any other similar musical tone, on the basis of a musical seed provided in the form of an audio signal or a note-based code may be implemented in different ways. In its details, the invention may vary within the scope of the attached claims.
References
[1] MIDI 1.0 specification, Document No. MIDI-1.0, August 1983, International MIDI Association
[2] Kohonen T., US Pat. No. 5 418 323 "Method for controlling an electronic musical device by utilizing search arguments and rules to generate digital code sequences", 1993.
[3] Hess, W., "Pitch Determination of Speech Signals", Springer-Verlag, Berlin, Germany, p. 3-48, 1983.
[4] Therrien, C. W., "Discrete Random Signals and Statistical Signal Processing", Prentice Hall, Englewood Cliffs, New Jersey, pp. 422-430, 1992.
[5] Rabiner, L. R., "On the use of autocorrelation analysis for pitch detection", IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1): pp. 24-33, 1977.

Claims
1. A method for generating a musical tone, such as a ringing tone, characterized by inputting (1, 2) a musical seed, and providing (3, 4) the musical tone on the basis of the musical seed.
2. A method according to claim 1, characterized by inputting (1) the musical seed in the form of a note-based code, and generating (4a, 4b, 4c, 4d) the musical tone on the basis of said note-based code.
3. A method according to claim 1, characterized by inputting (1) the musical seed in the form of an audio signal, applying (3) an audio-to-notes conversion to the audio signal for generating a note-based code representing the musical seed, and generating (4a, 4b, 4c, 4d) the musical tone on the basis of said note-based code.
4. A method according to claim 2 or 3, characterized in that generating (4a) the musical tone comprises the steps of generating a code sequence corresponding to new melody lines by using said note-based code as an input for a composing method, and converting said new melody lines into a musical tone.
5. A method according to claim 2 or 3, characterized in that generating (4b) the musical tone comprises the step of converting said note-based code into a musical tone.
6. A method according to claim 2 or 3, characterized in that generating (4c) the musical tone comprises the steps of comparing the note-based code to melodies previously stored in a memory, selecting from the memory the melody that is the closest match with the note-based code, and converting said melody into a musical tone.
7. A method according to claim 2 or 3, characterized in that generating (4d) the musical tone comprises the steps of generating a code sequence corresponding to new melody lines by using said note-based code as an input for a composing method, comparing the new melody lines to melodies previously stored in a memory, selecting from the memory the melody that is the closest match with the new melody, and converting said melody into a musical tone.
8. A method according to any one of claims 2 to 7, characterized by delivering (5) the musical tone in the form of a common musical notation for editing.
9. A method according to any one of claims 2 to 8, characterized by delivering (6) the musical tone in a non-editable ringing tone format.
10. A method according to any one of claims 3 to 9, characterized in that the audio-to-notes conversion comprises the steps of estimating (21) fundamental frequencies of the audio signal for obtaining a sequence of fundamental frequencies; and detecting (22) note-events on the basis of the sequence of fundamental frequencies for obtaining the note-based code.
11. A method according to any one of claims 3 to 9, characterized in that the audio-to-notes conversion comprises the steps of a) segmenting the audio signal into frames in time for obtaining a sequence of frames; b) measuring (90) the signal level of a frame; c) comparing (91) said signal level to a predetermined signal level threshold; d) if said signal level threshold is exceeded in step c, executing (92, 93) a voicing decision for judging if the frame is voiced or unvoiced; e) if the frame is judged voiced in step d, estimating and quantizing (94) the fundamental frequency of the frame for obtaining a quantized present fundamental frequency; f) deciding (95) on the basis of the fundamental frequency obtained in step e if a note is found; g) if a note is found in step f, comparing (97) the quantized present fundamental frequency to the fundamental frequency of the previous frame and applying a note-off event and a note-on event after the note-off event if said fundamental frequencies are different; h) if said signal level threshold is not exceeded in step c, or if the frame is judged unvoiced in step d, or if a note is not found in step f, detecting (96) if a note-on event is currently valid and applying a note-off event if a note-on event is currently valid; and
- repeating steps a to h frame by frame at the same time as the audio signal is received for obtaining the note-based code.
12. A method according to any one of claims 3 to 11, characterized by producing the audio signal by singing, humming, whistling or by playing an instrument.
13. A device for generating a musical tone, such as a ringing tone, characterized in that the device is adapted to receive a musical seed, and to provide the musical tone on the basis of the musical seed.
14. A device according to claim 13, characterized in that the device is adapted to receive the musical seed in the form of a note-based code, and to generate the musical tone on the basis of said note-based code.
15. A device according to claim 13, characterized in that the device is adapted to receive the musical seed in the form of an audio signal, to apply an audio-to-notes conversion to the audio signal for generating a note-based code representing the musical seed, and to generate the musical tone on the basis of said note-based code.
16. A device according to any one of claims 13 to 15, characterized in that the device is a user terminal or a network server connected to a wireless or fixed communication network.
17. A device according to any one of claims 13 to 15, characterized in that the device is a mobile phone, a workstation or a personal computer.
PCT/FI2001/000630 2000-07-03 2001-07-02 A method for generating a musical tone WO2002003374A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001282156A AU2001282156A1 (en) 2000-07-03 2001-07-02 A method for generating a musical tone

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20001591A FI20001591A0 (en) 2000-07-03 2000-07-03 Generating a musical tone
FI20001591 2000-07-03

Publications (1)

Publication Number Publication Date
WO2002003374A1 true WO2002003374A1 (en) 2002-01-10

Family

ID=8558715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2001/000630 WO2002003374A1 (en) 2000-07-03 2001-07-02 A method for generating a musical tone

Country Status (3)

Country Link
AU (1) AU2001282156A1 (en)
FI (1) FI20001591A0 (en)
WO (1) WO2002003374A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5202528A (en) * 1990-05-14 1993-04-13 Casio Computer Co., Ltd. Electronic musical instrument with a note detector capable of detecting a plurality of notes sounded simultaneously
US5250745A (en) * 1991-07-31 1993-10-05 Ricos Co., Ltd. Karaoke music selection device
US5616876A (en) * 1995-04-19 1997-04-01 Microsoft Corporation System and methods for selecting music on the basis of subjective content
US5886274A (en) * 1997-07-11 1999-03-23 Seer Systems, Inc. System and method for generating, distributing, storing and performing musical work files

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004049300A1 (en) * 2002-11-22 2004-06-10 Hutchison Whampoa Three G Ip(Bahamas) Limited Method for generating an audio file on a server upon a request from a mobile phone
WO2004072944A1 (en) * 2003-02-14 2004-08-26 Koninklijke Philips Electronics N.V. Mobile telecommunication apparatus comprising a melody generator
FR2861527A1 (en) * 2003-10-22 2005-04-29 Mobivillage Coded audio sequence adaptation method, involves finding processing method, for each mobile terminal family, for applying to audio sequence for adapting to family characteristics so that sequence is reproduced by classified terminal
WO2005094053A1 (en) * 2004-03-05 2005-10-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for providing a signal melody
WO2006039993A1 (en) * 2004-10-11 2006-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for smoothing a melody line segment
EP1691555A1 (en) * 2005-02-14 2006-08-16 Sony NetServices GmbH System for providing a music channel with true ring-tone download capability
WO2006084594A1 (en) * 2005-02-14 2006-08-17 Sony Netservices Gmbh System for providing a music channel with true ring-tone download capability
WO2008086288A1 (en) * 2007-01-07 2008-07-17 Apple Inc. Creating and purchasing ringtones
TWI411304B (en) * 2007-05-29 2013-10-01 Mediatek Inc Electronic apparatus of playing and editing multimedia data

Also Published As

Publication number Publication date
AU2001282156A1 (en) 2002-01-14
FI20001591A0 (en) 2000-07-03

Similar Documents

Publication Publication Date Title
US6541691B2 (en) Generation of a note-based code
JP5543640B2 (en) Perceptual tempo estimation with scalable complexity
US7346500B2 (en) Method of translating a voice signal to a series of discrete tones
EP1252621B1 (en) System and method for modifying speech signals
WO2019138871A1 (en) Speech synthesis method, speech synthesis device, and program
WO2002097798A1 (en) Method and apparatus for improved voicing determination in speech signals containing high levels of jitter
TWI281657B (en) Method and system for speech coding
JP2016161919A (en) Voice synthesis device
WO2002003374A1 (en) A method for generating a musical tone
JP2018004870A (en) Speech synthesis device and speech synthesis method
WO1997035301A1 (en) Vocoder system and method for performing pitch estimation using an adaptive correlation sample window
JP2006171751A (en) Speech coding apparatus and method therefor
JP2020076844A (en) Acoustic processing method and acoustic processing device
US7389231B2 (en) Voice synthesizing apparatus capable of adding vibrato effect to synthesized voice
Rodet et al. Spectral envelopes and additive+ residual analysis/synthesis
KR100579797B1 (en) System and Method for Construction of Voice Codebook
Helen et al. Perceptually motivated parametric representation for harmonic sounds for data compression purposes
CN115171729B (en) Audio quality determination method and device, electronic equipment and storage medium
JP6515945B2 (en) Code extraction apparatus and method
Alexandraki Real-time machine listening and segmental re-synthesis for networked music performance
Modegi Evaluation method for quality losses generated by miscellaneous audio signal processings using MIDI encoder tool “Auto-F”
Nishimura Aerial Acoustic Modem with Decoding Capabilities Using a CELP-Based Speech Encoder
Edwards Advanced signal processing techniques for pitch synchronous sinusoidal speech coders
CN115699161A (en) Sound processing method, sound processing system, and program
Airas Development of a mobile interactive musical service

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP