WO2006082764A1

WO2006082764A1 - Method and system for controlling a vehicle using voice commands

Info

Publication number: WO2006082764A1
Application number: PCT/JP2006/301375
Authority: WO
Inventors: Yu-Han Chiu; Chia-Shin Yen; Chien-Ming Wu; Che-Ming Lin
Original assignee: Matsushita Electric Industrial Co., Ltd.
Priority date: 2005-02-01
Filing date: 2006-01-23
Publication date: 2006-08-10
Also published as: CN1815556A

Abstract

A method for controlling a vehicle using voice commands includes: using a blind source separator to separate sounds collected by a plurality of microphones into a plurality of sound sources; identifying a voice command from sound source signals sent from the blind source separator according to predetermined voice command data in a command database, and determining whether the voice command is a driving-purpose command; if the voice command is a driving-purpose command, calculating an issuing direction of the driving-purpose command according to position-related information of the driving-purpose command; determining whether the driving-purpose command was issued by a driver according to the issuing direction of the driving-purpose command thus calculated; and controlling a corresponding controlled device in the vehicle if the driving-purpose command was issued by the driver. A system for controlling a vehicle using voice commands is also disclosed.

Description

DESCRIPTION

METHOD AND SYSTEM FOR CONTROLLING A VEHICLE USING VOICE COMMANDS

Technical Field

The invention relates to a method and system for controlling a vehicle, more particularly to a method and system for controlling a vehicle using voice commands.

Background Art

The driver of a vehicle and passengers therein generally use their hands and feet to control controlled devices (e.g., stepping on the accelerator pedal or brake) in the vehicle. However, with the advance of speech recognition technology, there are now practical examples of using speech to control vehicles. For instance, referring to Figure 1, Japanese Patent Publication No. JP04119400 discloses a speech recognition device for a vehicle, which includes a plurality of microphones 11, a phase shift unit 13, an adder unit 14, a maximum amplitude detection unit 15, and a speech recognition unit 16. The microphones 11 are separately disposed in the vehicle, and are used to collect speech sounds coming from a certain seat. The phase shift unit 13 shifts the phase of the electrical signal obtained by each microphone 11 by a phase shifting amount that corresponds to the location of the respective microphone 11. The adder unit 14 is used to add a corresponding output signal transmitted from the phase shift unit 13 to each of the source electrical signals obtained by the microphones 11. The maximum amplitude detection unit 15 detects the signals having the maximum amplitude among the output signals of the adder unit 14, treats the detected signals as voice commands from the aforesaid seat, and outputs the detected signals to the speech recognition unit 16 for recognition.

The aforesaid conventional speech recognition device can indeed identify a voice command coming from a certain seat for controlling the vehicle if only one person (e.g., the driver or a passenger) in the vehicle speaks. However, if several people in different seats produce speech sounds at the same time, the microphones 11 will receive mixed voice commands, and the aforesaid conventional speech recognition device is unable to process such mixed voice commands. Therefore, there is a need for a method and system for controlling a vehicle using voice commands that can cope with a scenario in which several speakers in different seats in the vehicle produce speech sounds at the same time.

Disclosure of Invention

Therefore, the main object of the present invention is to provide a method for controlling a vehicle using voice commands, which can be employed to separate mixed voice commands. Then, if a voice command thus separated is related to driving, the method of the present invention can be further employed to determine whether such a driving-purpose command was indeed issued by the driver of the vehicle.

Accordingly, a method for controlling a vehicle using voice commands comprises the following steps. Initially, a blind source separator is used to separate sounds collected by a plurality of microphones into a plurality of sound sources. Then, according to predetermined voice command data in a command database, a voice command is identified from sound source signals sent from the blind source separator, and a determination is made as to whether the voice command is a driving-purpose command. Next, if the voice command is a driving-purpose command, an issuing direction of the driving-purpose command is calculated according to position-related information of the driving-purpose command. Subsequently, according to the issuing direction of the driving-purpose command thus calculated, a determination is made as to whether the driving-purpose command was issued by a driver. Thereafter, upon determining that the driving-purpose command was issued by the driver, a controlled device in the vehicle to which the driving-purpose command corresponds is controlled accordingly.

In addition, another object of the present invention is to provide a system for controlling a vehicle using voice commands, which can be employed to separate mixed voice commands. Then, if a voice command thus separated is related to driving, the system of the present invention can be further employed to determine whether such a driving-purpose command was indeed issued by the driver of the vehicle.

Accordingly, a system for controlling a vehicle using voice commands of this invention can separate a plurality of sound sources collected by a plurality of microphones into a plurality of voice commands, and subsequently utilize the voice commands to control a plurality of controlled devices in the vehicle. The system includes a blind source separator, a command database, a speech recognizer, a direction calculator, and a controller. The blind source separator is used to separate sounds collected by the microphones into the sound sources. The speech recognizer is used to receive sound source signals sent from the blind source separator, and to identify the voice commands from the sound source signals according to predetermined voice command data in the command database. The direction calculator calculates an issuing direction of a voice command coming from each sound source according to the voice commands identified by the speech recognizer and position-related information of the sound sources sent from the blind source separator. The controller determines whether or not to control the corresponding controlled devices according to the voice commands identified by the speech recognizer and the issuing directions of the voice commands as calculated by the direction calculator.

Brief Description of Drawings

Other features and advantages of the present invention will become apparent in the following detailed description of the preferred embodiment with reference to the accompanying drawings, of which:

Figure 1 is a block diagram illustrating a conventional speech recognition device;

Figure 2 is a system block diagram showing a preferred embodiment of a system for controlling a vehicle using voice commands according to the present invention;

Figure 3 is a flowchart illustrating a method for controlling a vehicle using voice commands according to the present invention;

Figure 4 is a schematic diagram showing the positions of microphones in the preferred embodiment of the present invention, as well as the relative relationship of the microphones with sounds emitted by the driver and a passenger;

Figure 5 is a flowchart illustrating a preferred embodiment of a method for controlling a vehicle using voice commands according to the present invention; and

Figure 6 is a schematic view illustrating use of a hyperbolic equation to derive an issuing direction of a driving-purpose command.

Best Mode for Carrying Out the Invention

Referring to Figure 2, the preferred embodiment of a system for controlling a vehicle using voice commands according to this invention is capable of separating a plurality of sound sources collected by a plurality of microphones (e.g., two microphones, M1 and M2) into a plurality of voice commands, and subsequently utilizing these voice commands to control a plurality of controlled devices 4 in a vehicle (not shown). The system includes an amplifier 31, an analog/digital converter 32, a blind source separator 33, a speech recognizer 34, a command database 35, a direction calculator 36, and a controller 37.

The microphones M1, M2 are used to receive speech sounds uttered by at least one speaker (who may be the driver or a passenger) in the vehicle, and to convert the sound energy into electrical signals. As blind source separation (BSS) techniques are employed in this invention, the system can process the mixed speech sounds received by the microphones M1, M2 when several speakers speak at the same time. The amplifier 31 is used to amplify the electrical signals sent from the microphones M1, M2. The analog/digital converter 32 is used to convert the analog signals sent from the amplifier 31 into a set of discrete values that represent the magnitude of the sound energy. The blind source separator 33 is used to separate the mixed speech signals sent from the analog/digital converter 32, and to transmit the speech signals thus separated to the speech recognizer 34. The blind source separator 33 further sends information related to issuing directions of the sounds to the direction calculator 36. The speech recognizer 34 is used to receive the separated speech signals sent from the blind source separator 33. According to predetermined voice command data retrieved from the command database 35, the speech recognizer 34 identifies, from the separated speech signals received thereby, voice commands for controlling a corresponding one of the controlled devices 4 in the vehicle, and outputs a command code of the corresponding one of the controlled devices 4. In addition, the speech recognizer 34 can be further used to determine whether the identified voice command is a driving-purpose command (e.g., to control an external rear view mirror, windscreen wipers, etc.) or a general-purpose command (e.g., to control entertainment equipment, air-conditioning equipment, etc.). If the voice command code outputted to the controller 37 by the speech recognizer 34 is a driving-purpose command code, the direction calculator 36 is activated.

The direction calculator 36 is used to receive control signals sent from the speech recognizer 34, and uses the information related to the issuing directions of the sounds as sent from the blind source separator 33 to calculate the direction from which the driving-purpose command was issued, for transmission to the controller 37.

The controller 37 is used to process the command codes sent from the speech recognizer 34. If the command code received by the controller 37 is a driving-purpose command code, a determination is made according to the calculation result transmitted from the direction calculator 36 as to whether the driving-purpose command was issued by the driver. If the driving-purpose command was issued by the driver, a control signal is sent to control the controlled device 4 which corresponds to that driving-purpose command code. If the command code received by the controller 37 is a general-purpose command code, a control signal is directly sent to control the controlled device 4 which corresponds to that general-purpose command code.

Referring to Figures 2 and 3, the method for controlling a vehicle using voice commands according to this invention includes the following steps. Initially, in step 51, analog mixed sounds of speech uttered by a passenger and the driver in the vehicle are inputted through the microphones M1, M2, and the amplifier 31 and the analog/digital converter 32 are used in sequence to amplify the sounds and convert the sounds into digital signals.

Then, in step 52, the mixed speech signals sent from the analog/digital converter 32 are separated into a plurality of original sound sources using the blind source separator 33 for transmission to the speech recognizer 34, and information related to issuing directions of the sounds is transmitted to the direction calculator 36.

Next, in step 53, a speech signal is identified from the plurality of separated original sound sources sent from the blind source separator 33 using the speech recognizer 34.

Subsequently, in step 54, a determination is made as to whether the identified speech signal is a driving-purpose command according to the command database 35. If the speech signal is not a driving-purpose command but a general-purpose command, a general-purpose command code is directly transmitted to the controller 37, regardless of whether the general-purpose command was issued by the driver or the passenger. Then, in step 55, the controller 37 processes the general-purpose command code, and directly sends a control signal to control the controlled device 4, such as entertainment equipment or air-conditioning equipment, which corresponds to the general-purpose command code.

On the contrary, if the speech signal identified by the speech recognizer 34 in the determination step is a driving-purpose command, step 56 is performed to transmit a control signal to the direction calculator 36 for activating the latter to calculate the issuing direction of the driving-purpose command for subsequent transmission to the controller 37. At the same time, the speech recognizer 34 also outputs a driving-purpose command code to the controller 37. Then, in step 57, the controller 37 is used to determine whether the driving-purpose command came from the direction of the driver. If yes, this indicates that the driving-purpose command was issued by the driver, and, in step 58, the controller 37 uses the driving-purpose command code to control the controlled device, such as an external rear view mirror or a windscreen wiper, which corresponds to the driving-purpose command code. Otherwise, this indicates that the driving-purpose command was issued by the passenger. Then, in step 59, the driving-purpose command is ignored without further processing.
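
The branching in steps 54 to 59 can be sketched as follows. This is only an illustrative sketch: the command phrases, the device names, the `COMMAND_DATABASE` table, and the `Recognized` and `dispatch` names are hypothetical and do not come from the specification, and the result of the direction check of step 57 is simply passed in as a flag.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the command database 35.
# Phrases and device names are illustrative assumptions.
COMMAND_DATABASE = {
    "wipers on": ("driving", "windscreen_wiper"),
    "fold mirror": ("driving", "rear_view_mirror"),
    "radio on": ("general", "entertainment"),
    "cooler": ("general", "air_conditioning"),
}


@dataclass
class Recognized:
    phrase: str                  # output of the speech recognizer
    from_driver_direction: bool  # outcome of the direction check (step 57)


def dispatch(cmd: Recognized) -> Optional[str]:
    """Return the device to control, or None if the command is ignored."""
    entry = COMMAND_DATABASE.get(cmd.phrase)
    if entry is None:
        return None              # not a known voice command
    kind, device = entry
    if kind == "general":
        return device            # step 55: controlled without a direction check
    # Steps 57 to 59: a driving-purpose command is obeyed only if it
    # came from the driver's direction; otherwise it is ignored.
    return device if cmd.from_driver_direction else None
```

In this sketch a general-purpose command is accepted from any seat, while a driving-purpose command from a passenger yields `None`, mirroring step 59.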

Referring to Figure 4, in the preferred embodiment of this invention, the microphones M1, M2 for collecting mixed sounds are disposed on a rear view mirror within the vehicle. In general, sounds within a vehicle compartment 20 (e.g., sounds coming from the driver S1 or the passenger or other sound sources S2) will be convolved with impulse responses of the vehicle compartment 20. Therefore, the microphones M1, M2 will pick up convolved sounds. As shown in the following equation (1), mixing matrix $A(\tau)$ represents the impulse response of the vehicle compartment 20, where $\tau$ represents the delay of the impulse response. $a_{11}(\tau)$ represents the impulse response from the driver S1 to the microphone M1. $a_{21}(\tau)$ represents the impulse response from the driver S1 to the microphone M2. $a_{12}(\tau)$ represents the impulse response from the passenger or other sound sources S2 to the microphone M1. $a_{22}(\tau)$ represents the impulse response from the passenger or other sound sources S2 to the microphone M2.

$$A(\tau) = \begin{bmatrix} a_{11}(\tau) & a_{12}(\tau) \\ a_{21}(\tau) & a_{22}(\tau) \end{bmatrix} \tag{1}$$

As shown in the following equation (2), elements $s_1(t)$ and $s_2(t)$ of the matrix $S(t)$ represent the sound source signals of the driver S1 and the passenger or other sound sources S2, respectively, where $t$ represents a point in time of the sound signal.

$$S(t) = \begin{bmatrix} s_1(t) \\ s_2(t) \end{bmatrix} \tag{2}$$

As shown in the following equation (3), elements $\mathrm{mix}_1(t)$ and $\mathrm{mix}_2(t)$ of the matrix $X(t)$ represent the signals received by the microphones M1 and M2, respectively.

$$X(t) = \begin{bmatrix} \mathrm{mix}_1(t) \\ \mathrm{mix}_2(t) \end{bmatrix} \tag{3}$$

Accordingly, a mixed signal $X(t)$ convolved with the impulse response of the vehicle compartment 20 can be obtained from equation (4), where $P$ represents the length of the impulse response.

$$\sum_{\tau=1}^{P} A(\tau)\,S(t-\tau) = \sum_{\tau=1}^{P} \begin{bmatrix} a_{11}(\tau) & a_{12}(\tau) \\ a_{21}(\tau) & a_{22}(\tau) \end{bmatrix} \begin{bmatrix} s_1(t-\tau) \\ s_2(t-\tau) \end{bmatrix} = \begin{bmatrix} \mathrm{mix}_1(t) \\ \mathrm{mix}_2(t) \end{bmatrix} = X(t) \tag{4}$$
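
As a minimal sketch of the convolutive mixture in equation (4), the sum can be evaluated directly for two sources and two microphones. The function name `convolutive_mix`, the list-based data layout, and the convention that samples outside the recorded range count as zero are assumptions made for this illustration only.

```python
def convolutive_mix(A, S, t):
    """Evaluate X(t) = sum over tau = 1..P of A(tau) S(t - tau).

    A: list of P matrices (2x2 nested lists); A[tau - 1] holds A(tau).
    S: list of two equal-length source sequences [s1, s2].
    Samples outside the recorded range are treated as zero (assumption).
    Returns [mix1(t), mix2(t)].
    """
    x = [0.0, 0.0]
    for tau in range(1, len(A) + 1):
        idx = t - tau
        if not 0 <= idx < len(S[0]):
            continue                      # no source sample at this lag
        a = A[tau - 1]
        for i in range(2):                # microphone index
            for j in range(2):            # source index
                x[i] += a[i][j] * S[j][idx]
    return x
```

Each microphone signal is thus a sum of delayed and scaled copies of both sources, which is why a simple per-sample matrix inverse is not enough to separate them.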

Referring to Figures 2 and 5, a preferred embodiment of a method for controlling a vehicle using voice commands according to the present invention employs multiple adaptive decorrelation (MAD), and a BSS algorithm of frequency domain independent component analysis (FDICA) to perform step 52 so as to separate the mixed sounds for generating a driver command and a passenger command.

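Such a BSS algorithm can be used to yield a de-mixing matrix $W(\tau)$ shown in the following equation (5) so that $W * X = S$, where the symbol "*" represents the convolution operator.

$$W(\tau) = \begin{bmatrix} w_{11}(\tau) & w_{12}(\tau) \\ w_{21}(\tau) & w_{22}(\tau) \end{bmatrix} \tag{5}$$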

Thus, the BSS equation is as shown in the following equation (6), where $Q$ represents the length of a filter.

$$\sum_{\tau=1}^{Q} W(\tau)\,X(t-\tau) = S(t) \tag{6}$$

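Therefore, the following equations (7) and (8) can be obtained according to equations (4) and (6):

$$A = W^{-1} \tag{7}$$

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \frac{1}{w_{11}w_{22} - w_{12}w_{21}} \begin{bmatrix} w_{22} & -w_{12} \\ -w_{21} & w_{11} \end{bmatrix} \tag{8}$$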

In the ideal case, the de-mixing matrix $W(\tau)$ will be equal to an inverse matrix of the mixing matrix $A(\tau)$. However, in a general case, the de-mixing matrix $W(\tau)$ is an approximation of an inverse matrix of the mixing matrix $A(\tau)$. Thus, the de-mixing matrix $W(\tau)$ can be used to estimate time delay samples between impulse responses. As shown in equation (8), the time delay samples between impulse responses $a_{11}$ and $a_{21}$ are equivalent to the time delay samples between impulse responses $w_{22}$ and $-w_{21}$. The number of time delay samples is equivalent to the time difference between the maximum peak values of two impulse responses that come from the same sound source. For example, if the maximum peak value of $a_{11}$ occurs in the 10th sample, and the maximum peak value of $a_{21}$ occurs in the 14th sample, the time delay is 4 samples.
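
The peak-based delay estimate described above can be illustrated with a short sketch. The function names are hypothetical, and locating the peak by largest absolute value is an assumption of this example; the text only speaks of the maximum peak value.

```python
def peak_index(impulse_response):
    """Index of the sample with the largest magnitude (assumed peak rule)."""
    return max(range(len(impulse_response)),
               key=lambda i: abs(impulse_response[i]))


def delay_samples(response_to_m1, response_to_m2):
    """Delay, in samples, between two impulse responses of one sound source."""
    return peak_index(response_to_m2) - peak_index(response_to_m1)
```

With a response to M1 that peaks at sample 10 and a response to M2 that peaks at sample 14, `delay_samples` returns 4, matching the example in the text. A negative result would mean the sound reached M2 first.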

Therefore, in the preferred embodiment of a method for controlling a vehicle using voice commands according to the present invention as shown in Figure 5, step 56 further includes sub-steps 561, 562, and 563. In sub-step 561, the direction calculator 36 uses the de-mixing matrix $W$ transmitted from the blind source separator 33 to calculate an inverse matrix $W^{-1}$ thereof. Then, in sub-step 562, the time delay is calculated. Subsequently, in sub-step 563, the issuing direction of the driving-purpose command is calculated in a manner to be described hereinafter.

Referring to Figure 6, according to the time delay samples, the issuing direction of the driving-purpose command can be derived using a hyperbolic equation. As shown in Figure 6, generally speaking, a rear view mirror (not shown) in the vehicle will be rotated by an angle $\theta_2$, which can be inputted into the system of the present invention. Symbols $(m, n)$ represent coordinates of the driver S1 or the passenger or other sound sources S2 in the x'y' coordinate system. Symbol $d$ represents the distance between the two microphones M1, M2. Symbol $a$ represents the delay distance, which is equal to $(v \times k)/F_s$, where $v$ is the speed of sound ($v = 331.4 + 0.6 \times$ temperature (°C), in m/sec); $k$ is the number of time delay samples; and $F_s$ is the sampling rate. Angle $\theta_1$ is the angle to be derived.
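
The delay distance defined above can be computed as in the following sketch. The function names are hypothetical, and the sampling rate and temperature used in the usage note are example values, not values given in the specification.

```python
def speed_of_sound(temp_c):
    """Speed of sound in m/sec: v = 331.4 + 0.6 * temperature (deg C)."""
    return 331.4 + 0.6 * temp_c


def delay_distance(k, fs, temp_c):
    """Delay distance a = (v * k) / Fs, in metres.

    k: number of time delay samples
    fs: sampling rate in Hz
    temp_c: cabin temperature in degrees Celsius
    """
    return speed_of_sound(temp_c) * k / fs
```

For instance, with an assumed sampling rate of 16000 Hz at 20 °C, a delay of 4 samples corresponds to a delay distance of roughly 0.086 m.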

In Figure 6, assuming the numerical values to the left of the points of origin in the xy and x'y' coordinate systems are positive, an equation for line L in the xy coordinate system is $x = h$, where $h$ represents a horizontal distance from the sound source S(m,n) to the center of the rear view mirror (i.e., the points of origin of the xy and x'y' coordinate systems), and the horizontal distance $h$ is supplied by the manufacturer of the vehicle.

Accordingly, a linear equation of line L in the x'y' coordinate system is shown in equation (9).

$$x'\cos\theta_2 + y'\sin\theta_2 = h \tag{9}$$

An equation for the hyperbolic curve C in the x'y' coordinate system is shown in the following equation (10):

$$\frac{x'^2}{(a/2)^2} - \frac{y'^2}{(d/2)^2 - (a/2)^2} = 1 \tag{10}$$

According to equations (9) and (10), the coordinates of the sound source S(m,n) in the x'y' coordinate system are shown in the following equations (11) and (12):

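$$m = \frac{2h(d^2-a^2)\sin^2\theta_2 - 2ha^2\cos^2\theta_2 - \sin\theta_2\left(2h(d^2-a^2)\sin\theta_2 \pm \sqrt{4h^2(d^2-a^2)^2\sin^2\theta_2 - \left((d^2-a^2)\sin^2\theta_2 - a^2\cos^2\theta_2\right)\left(4h^2(d^2-a^2) - a^2(d^2-a^2)\cos^2\theta_2\right)}\right)}{2(d^2-a^2)\sin^2\theta_2\cos\theta_2 - 2a^2\cos^3\theta_2} \tag{11}$$

$$n = \frac{2h(d^2-a^2)\sin\theta_2 \pm \sqrt{4h^2(d^2-a^2)^2\sin^2\theta_2 - \left((d^2-a^2)\sin^2\theta_2 - a^2\cos^2\theta_2\right)\left(4h^2(d^2-a^2) - a^2(d^2-a^2)\cos^2\theta_2\right)}}{2(d^2-a^2)\sin^2\theta_2 - 2a^2\cos^2\theta_2} \tag{12}$$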

Thus, angle $\theta_1$ can be calculated as $\tan^{-1}(n/m)$. Then, by subtracting $\theta_2$ from $\theta_1$, the direction of the sound source S(m,n) can be obtained.
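
The calculation of sub-step 563 can be sketched numerically by intersecting the line $x'\cos\theta_2 + y'\sin\theta_2 = h$ with a hyperbola whose foci are the two microphones and whose distance difference is the delay distance $a$. This is a hedged illustration rather than the patented implementation: the function names, the `sign` parameter used to pick one of the two roots, and the use of `math.atan2` in place of $\tan^{-1}(n/m)$ (the two agree when $m > 0$) are assumptions of this example.

```python
import math


def source_position(h, d, a, theta2_deg, sign=1.0):
    """Intersect the line with the hyperbola; return (m, n) in x'y'.

    h: horizontal distance from the source to the mirror centre (m)
    d: microphone spacing (m)
    a: delay distance (m), assumed to satisfy 0 < |a| < d
    theta2_deg: mirror rotation in degrees, assumed not to be +/-90
    sign: +1.0 or -1.0, selects one of the two roots (caller's choice)
    """
    t2 = math.radians(theta2_deg)
    s, c = math.sin(t2), math.cos(t2)
    g = d * d - a * a
    half_den = g * s * s - a * a * c * c   # half of the denominator for n
    disc = (4 * h * h * g * g * s * s
            - half_den * (4 * h * h * g - a * a * g * c * c))
    if disc < 0 or half_den == 0:
        raise ValueError("degenerate or missing intersection for these inputs")
    n = (2 * h * g * s + sign * math.sqrt(disc)) / (2 * half_den)
    m = (h - n * s) / c                    # rearranged line equation
    return m, n


def direction_deg(h, d, a, theta2_deg, sign=1.0):
    """Return theta1 - theta2 in degrees for the selected intersection."""
    m, n = source_position(h, d, a, theta2_deg, sign)
    theta1 = math.degrees(math.atan2(n, m))  # same as atan(n/m) when m > 0
    return theta1 - theta2_deg
```

Which root is physically meaningful depends on the seating geometry and on which microphone the sound reached first, so the sketch leaves that choice to the caller.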

In the method and system for controlling a vehicle using voice commands according to the present invention, the actual angle of the driver S1 relative to the point of origin in the xy coordinate system can be pre-defined to be between 30 and 60 degrees, for instance.

Therefore, when the direction calculator 36 calculates the angle $\theta_1 - \theta_2$ of a driving-purpose command to be between 30 and 60 degrees, the controller 37 will determine that the driving-purpose command was issued by the driver S1, and will then control the controlled device 4 which corresponds to the driving-purpose command. On the contrary, if the direction calculator 36 calculates the angle $\theta_1 - \theta_2$ of the driving-purpose command to be outside the range of from 30 to 60 degrees, the controller 37 will determine that the driving-purpose command was issued by the passenger or other sound sources S2, and will then ignore the driving-purpose command without further processing. In addition, aside from the method of utilizing time delay samples and hyperbolic equations to calculate the issuing direction of a voice command, other methods such as beamforming and cross-power spectrum phase (CSP) suitable for calculating the issuing direction of the voice command also fall within the scope of the present invention.
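
The angular test applied by the controller 37 can be written as a one-line check. The function name and the inclusive treatment of the 30 and 60 degree boundaries are assumptions of this sketch; the text only gives the range as an example.

```python
def issued_by_driver(theta1_deg, theta2_deg, low_deg=30.0, high_deg=60.0):
    """True if theta1 - theta2 falls inside the pre-defined driver range.

    The default range of 30 to 60 degrees follows the example in the text;
    treating both boundaries as inclusive is an assumption.
    """
    return low_deg <= theta1_deg - theta2_deg <= high_deg
```

A command arriving at 45 degrees relative to the rotated mirror would be accepted as the driver's, while one arriving at 90 degrees would be ignored.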

In sum, the method and system for controlling a vehicle using voice commands according to the present invention employ the blind source separator 33 to separate mixed voice commands received by the microphones M1, M2. Then, if the voice command thus separated is a driving-purpose command, the method and system of the present invention can further determine whether the driving-purpose command was issued by the driver S1 by calculating the issuing direction of the driving-purpose command. If the driving-purpose command was indeed issued by the driver S1, the controlled device in the vehicle to which the driving-purpose command corresponds will be controlled. On the contrary, if the driving-purpose command was not issued by the driver S1, the driving-purpose command will not be processed.

While the present invention has been described in connection with what is considered the most practical and preferred embodiment, it is understood that this invention is not limited to the disclosed embodiment but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.

Industrial Applicability

The present invention can be applied to a method and system for controlling a vehicle using voice commands.

Claims

1. A method for controlling a vehicle using voice commands, comprising the following steps:

(a) using a blind source separator to separate sounds collected by a plurality of microphones into a plurality of sound sources;

(b) according to predetermined voice command data in a command database, identifying a voice command from sound source signals sent from the blind source separator, and determining whether the voice command is a driving-purpose command;

(c) if the voice command is a driving-purpose command, calculating an issuing direction of the driving-purpose command according to position-related information of the driving-purpose command;

(d) according to the issuing direction of the driving-purpose command thus calculated, determining whether the driving-purpose command was issued by a driver; and

(e) if the driving-purpose command was issued by the driver, controlling a controlled device in the vehicle to which the driving-purpose command corresponds.

2. The method for controlling a vehicle using voice commands as claimed in Claim 1, further comprising a step of using an amplifier to amplify the sounds collected by the microphones before step (a).

3. The method for controlling a vehicle using voice commands as claimed in Claim 1, further comprising a step of using an analog/digital converter to convert analog signals of the sounds collected by the microphones to digital signals before step (a).

4. The method for controlling a vehicle using voice commands as claimed in Claim 1, wherein if the voice command is a general-purpose command, a controlled device in the vehicle to which the general-purpose command corresponds is directly controlled.

5. The method for controlling a vehicle using voice commands as claimed in Claim 1, wherein if the driving-purpose command was not issued by the driver, the driving-purpose command is ignored.

6. The method for controlling a vehicle using voice commands as claimed in Claim 1, wherein step (a) further includes using a de-mixing matrix to separate the sounds collected by the microphones into the sound sources, and step (c) further includes calculating in sequence an inverse matrix of the de-mixing matrix and a time delay so as to calculate the issuing direction of the driving-purpose command.

7. A system for controlling a vehicle using voice commands, which can separate a plurality of sound sources collected by a plurality of microphones into a plurality of voice commands, and which subsequently utilizes the voice commands to control a plurality of controlled devices in the vehicle, said system comprising:

a blind source separator for separating sounds collected by the microphones into the sound sources;

a command database;

a speech recognizer for receiving sound source signals sent from the blind source separator, and for identifying the voice commands from the sound source signals according to predetermined voice command data in the command database;

a direction calculator which calculates an issuing direction of a voice command coming from each sound source according to the voice commands identified by the speech recognizer and position-related information of the sound sources sent from the blind source separator; and

a controller which, according to the voice commands identified by the speech recognizer and the issuing directions of the voice commands as calculated by the direction calculator, determines whether or not to control the corresponding controlled devices.

8. The system for controlling a vehicle using voice commands as claimed in Claim 7, further comprising an amplifier for amplifying the sounds collected by the microphones.

9. The system for controlling a vehicle using voice commands as claimed in Claim 7, further comprising an analog/digital converter for converting analog signals of the sounds collected by the microphones to digital signals.