Implementation of Vehicle Navigation Human-machine Voice Interactive System

Introduction


As a natural human-machine interface, voice can make a car navigation system safer and more user-friendly. A comparison of the functions of car navigation systems at home and abroad shows that support for voice interaction is a development trend. In addition, research data from the market information services company JD Power and Associates showed that 56% of consumers are more inclined to choose voice-activated navigation systems. It is therefore worthwhile to develop a car navigation system that supports voice interaction. The technical basis for developing in-vehicle voice navigation systems is now in place in China: in particular, text-to-speech (TTS) technology and voice command recognition based on small and medium vocabularies have reached a practical level. Building on the research group's vehicle navigation system and two domestic speech engines, this paper develops a car navigation system that supports voice interaction.

Car voice navigation system structure

Functionally, the car voice navigation system covers two areas: car navigation and navigation voice interaction. The car navigation function includes GPS satellite positioning, electronic map browsing and query, intelligent path planning, and real-time display of navigation information such as the vehicle's geographic location and speed. The navigation voice interaction function is divided into two parts: voice operation and voice prompts. The hardware framework of the voice navigation system, designed according to the needs of human-computer interaction, is shown in Figure 1.

The human-computer interaction interface between the voice navigation system and the user consists of five interactive devices: a touch screen, a button, a microphone, a display, and a loudspeaker. The hardware framework supports both a conventional manual interaction mode and a voice interaction mode. The whole system is divided into three subsystems: the navigation subsystem, the speech recognition subsystem, and the speech synthesis subsystem. The subsystems communicate through interfaces and cooperate to complete voice navigation tasks.

Design of dialogue mode for car navigation human-machine voice interaction system

Navigation system state transition network

The whole navigation system is a complex human-computer interaction system. To facilitate the design of the voice dialogue mode, the system is first divided into states, and the state transition network of the whole system is then described from the perspective of human-computer interaction. The system is divided into six functional states, including map browsing and function selection, plus an exit state. Figure 2 depicts the state transition network between these states.

The nodes in the figure represent the states of the system, and the arrows represent transitions from a source state to a target state. The state transition network takes the user's operations as driving events that move the system from one state to another, and a path through the network represents a specific interaction process.
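A state transition network of this kind can be sketched in Python as a lookup table from (source state, event) to target state. This is only an illustration: the four state names are the ones mentioned in the text, while the event names and the edges between states are hypothetical and do not reproduce Figure 2.

```python
from enum import Enum, auto


class State(Enum):
    MAP_BROWSING = auto()
    FUNCTION_SELECTION = auto()
    PATH_PLANNING = auto()
    EXIT = auto()


# (source state, driving event) -> target state.
# Events and edges are illustrative assumptions, not the network of Figure 2.
TRANSITIONS = {
    (State.MAP_BROWSING, "open_menu"): State.FUNCTION_SELECTION,
    (State.FUNCTION_SELECTION, "plan_route"): State.PATH_PLANNING,
    (State.FUNCTION_SELECTION, "back"): State.MAP_BROWSING,
    (State.PATH_PLANNING, "route_ready"): State.MAP_BROWSING,
    (State.FUNCTION_SELECTION, "quit"): State.EXIT,
}


def step(state, event):
    """Return the target state, or stay in place if the event is not accepted."""
    return TRANSITIONS.get((state, event), state)


def run(start, events):
    """Follow a sequence of events and return the path of visited states."""
    path = [start]
    for event in events:
        path.append(step(path[-1], event))
    return path
```

Calling `run(State.MAP_BROWSING, ["open_menu", "plan_route", "route_ready"])` returns one path through the network, which corresponds to one specific interaction process.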

Design of dialogue mode for each state node of navigation system

To facilitate the description of the dialogue mode within each state node, the state nodes are numbered S1 to S7 as shown in Figure 2, and the transition from state node Sm to state node Sn is denoted Tmn. In addition, with reference to the representation used by the Stateflow model, a dialogue model is proposed for describing the in-vehicle navigation human-machine voice interaction system. The description of a transition is redefined so that a transition within a state node is described by four attributes:

t = {P1, P2, P3, P4} (1)

where t denotes a transition and P1 to P4 are its attributes: P1 is a speech event, P2 is a speech output, P3 is an additional condition, and P4 is a transition action.

Thus, a transition t describes the user's voice input in a dialogue, the voice output of the system, the constraints that apply to the dialogue, and the action performed by the system.

Take the map browsing state as an example to illustrate how the dialogue mode is designed. The map browsing state consists of two mutually exclusive substates: the map roaming state and the vehicle guidance state (see Figure 2). The human-computer interaction in these two substates is mostly the same, so both are grouped under the map browsing state. Where an interaction must treat the two substates differently, the current substate is determined through the additional condition, and the processing then differs accordingly. The dialogue mode design of the map browsing state node is shown in Figure 3.
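The four-attribute description and the use of the additional condition to tell the substates apart can be sketched in Python as follows. This is only an illustration under assumptions: the class `Transition`, the function `fire`, the context dictionary, and the example commands and prompts ("zoom in", "stop guidance") are hypothetical and are not taken from Figure 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transition:
    speech_event: str                  # P1: speech event (user's voice input)
    speech_output: str                 # P2: speech output of the system
    condition: Callable[[dict], bool]  # P3: additional condition
    action: Callable[[dict], None]     # P4: action performed by the system


def fire(transitions, event, context) -> Optional[str]:
    """Run the first transition that matches the event and whose condition holds.

    Returns the speech output, or None if no transition applies.
    """
    for t in transitions:
        if t.speech_event == event and t.condition(context):
            t.action(context)
            return t.speech_output
    return None


def zoom_in(ctx):
    ctx["zoom"] += 1


def stop_guidance(ctx):
    ctx["substate"] = "roaming"


# Hypothetical dialogue mode of the map browsing state node.
MAP_BROWSING = [
    # Accepted in both substates: the condition is always true.
    Transition("zoom in", "Map zoomed in", lambda ctx: True, zoom_in),
    # Accepted only in the vehicle guidance substate.
    Transition("stop guidance", "Guidance stopped",
               lambda ctx: ctx["substate"] == "guidance", stop_guidance),
]
```

With `ctx = {"substate": "guidance", "zoom": 3}`, the call `fire(MAP_BROWSING, "stop guidance", ctx)` succeeds once and switches the substate to roaming; a second identical call returns `None` because the additional condition no longer holds.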



Implementation of human-machine voice interactive system

Implementation of voice control commands

The implementation of voice control commands is shown in Figure 4. The box on the left of the figure represents the state transition network (STN) of the dialogue mode of the entire voice navigation system. According to the dialogue mode design, the system is divided into seven state nodes, including the map browsing state, the function selection state, and the path planning state. Each state node has its own voice dialogue mode, which is composed of several internal transitions. The entire voice navigation system is therefore a two-layer state transition network whose internal transitions are driven by speech events. A speech event is generated by the interface module of the navigation subsystem from the user intent sent by the speech recognition subsystem.

The implementation process of the voice control command is divided into the following four steps:

* The speech recognition engine recognizes the user's voice according to the current command vocabulary and obtains the recognition result.

* The management window receives the recognition result, looks up the corresponding control command in the "recognition word-control command" mapping, and sends that command to the interface module of the navigation subsystem as the user's intent.

* The interface module responds to the user's intent by changing the state of the voice navigation system through voice events.

* The interface module determines whether the current command vocabulary needs to be changed according to the state of the voice navigation system, and if necessary, changes the current command vocabulary through the management window.
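The four steps above can be sketched in Python as a single handling loop. This is a simplified illustration, not the system's actual code: the vocabularies, command identifiers, state names, and the `recognize` stand-in (which simply checks membership in the current vocabulary instead of processing audio) are all hypothetical.

```python
# All words, commands and states below are hypothetical placeholders.
VOCABULARIES = {
    "map_browsing": {"zoom in", "zoom out", "menu"},
    "function_selection": {"plan route", "back"},
}
WORD_TO_COMMAND = {
    "zoom in": "CMD_ZOOM_IN",
    "zoom out": "CMD_ZOOM_OUT",
    "menu": "CMD_MENU",
    "plan route": "CMD_PLAN_ROUTE",
    "back": "CMD_BACK",
}
NEXT_STATE = {
    ("map_browsing", "CMD_MENU"): "function_selection",
    ("function_selection", "CMD_BACK"): "map_browsing",
}


def recognize(utterance, vocabulary):
    """Stand-in for the recognition engine: accept only in-vocabulary words."""
    return utterance if utterance in vocabulary else None


class VoiceCommandLoop:
    def __init__(self, state="map_browsing"):
        self.state = state
        self.vocabulary = VOCABULARIES[state]

    def handle(self, utterance):
        # Step 1: recognize against the current command vocabulary.
        word = recognize(utterance, self.vocabulary)
        if word is None:
            return None
        # Step 2: map the recognized word to a control command (user intent).
        command = WORD_TO_COMMAND[word]
        # Step 3: the interface module updates the system state.
        new_state = NEXT_STATE.get((self.state, command), self.state)
        # Step 4: change the current vocabulary only if the state changed.
        if new_state != self.state:
            self.state = new_state
            self.vocabulary = VOCABULARIES[new_state]
        return command
```

In this sketch, "plan route" is rejected while the system is in the map browsing state, because the word is not in the current vocabulary; it becomes recognizable only after "menu" has moved the system to the function selection state.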

POI name identification method

In addition to control commands, the recognition subsystem also needs to recognize POI (point of interest, such as a landmark) names. The biggest difference between POI name recognition and control command recognition is the size of the candidate set. In this system, the candidate set contains about 30 entries when control commands are recognized. For POI names, taking the Beijing electronic map as an example, there are 20,172 POIs, so the candidate set is orders of magnitude larger than for control commands.

When a command word recognition engine is used, the engine must be given a current vocabulary, so the words in the candidate set must first be converted into a vocabulary before they can be recognized. However, an ASR engine designed for small and medium vocabularies cannot generate a vocabulary of more than 20,000 entries. POI name recognition therefore uses a different scheme from control command recognition. For control commands, the candidate set fits in a single vocabulary, so online recognition is used. For POI names, a single vocabulary cannot hold all the names, so an offline traversal recognition scheme is proposed that uses the offline recognition function of the engine and describes the entire candidate set with multiple vocabularies. The implementation process is shown in Figure 5.

The scheme divides the candidate POI set into n subsets and generates a vocabulary for each subset. Each vocabulary is then loaded in turn as the current vocabulary, and the utterance is recognized offline against it. The partial recognition results are collected into a temporary vocabulary, and a final recognition pass over this temporary vocabulary yields the globally optimal result. Because the process traverses every subset, it is equivalent to searching for the best match in the entire candidate set, so recognition accuracy is preserved. The cost is that the larger number of recognition passes makes the recognition time correspondingly longer.
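The traversal idea can be sketched in Python as follows. This is an illustration under strong assumptions: a real engine scores recorded audio against a vocabulary, whereas this sketch substitutes text similarity from the standard library's `difflib` for acoustic scoring, and the function names, POI names, and subset size are hypothetical.

```python
from difflib import SequenceMatcher


def recognize(utterance, vocabulary):
    """Stand-in for one recognition pass: best-scoring entry of a vocabulary.

    String similarity replaces acoustic scoring, purely for illustration.
    """
    return max(vocabulary,
               key=lambda w: SequenceMatcher(None, utterance, w).ratio())


def split_into_subsets(names, max_size):
    """Divide the candidate set into subsets that each fit one vocabulary."""
    return [names[i:i + max_size] for i in range(0, len(names), max_size)]


def traverse_and_recognize(utterance, poi_names, max_size):
    subsets = split_into_subsets(poi_names, max_size)
    # One offline pass per subset; each pass yields a partial result.
    partial = [recognize(utterance, subset) for subset in subsets]
    # Final pass: the partial results form the temporary vocabulary.
    return recognize(utterance, partial)


POI_NAMES = [
    "Beijing Zoo", "Summer Palace", "Temple of Heaven",
    "Forbidden City", "Beijing Railway Station",
]
result = traverse_and_recognize("Sumer Palace", POI_NAMES, max_size=2)
```

The number of recognition passes is the number of subsets plus one, which is where the extra recognition time comes from. Because the best entry of the whole set is also the best entry of its own subset, it always reaches the final pass.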

Navigation system voice prompt implementation

The voice prompts of the navigation system are produced by a dedicated speech synthesis subsystem. Issuing a voice prompt involves two steps: making a request and executing it. The requester and the executor form a client/server (C/S) model in which the speech synthesis subsystem acts as the server. Since a speech synthesis engine usually cannot output several synthesized utterances at the same time, requests can conflict. When a conflict occurs, the most straightforward strategies are either to abort the ongoing synthesis and start the new one, or to continue the ongoing synthesis and ignore the new request. A management module is therefore designed into the speech synthesis subsystem to decide how a synthesis conflict is handled.

From the point of view of the speech synthesis subsystem, synthesis requests arrive as random events, denoted Qi. Each synthesis request Qi has a priority attribute that depends on the importance of the requested prompt information, as shown in Table 1. The processing flow of the management module is shown in Figure 6. If the priority of the next request Qi+1 is higher than that of the current request Qi, Qi+1 is synthesized first.
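A priority-based management module of this kind can be sketched in Python as follows. This is a minimal sketch, not the flow of Figure 6: the class names, the numeric priorities, and the example prompts are hypothetical, and the rule that a request of equal or lower priority is ignored is an assumption, since the text only states what happens when the new request has higher priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    text: str
    priority: int  # larger means more important (assumed convention)


class SynthesisManager:
    def __init__(self):
        self.current: Optional[Request] = None
        self.log = []  # records what a real engine would be told to do

    def submit(self, request):
        """Decide how to handle a new request; return True if it is accepted."""
        if self.current is None:
            self._start(request)
            return True
        if request.priority > self.current.priority:
            # Higher priority: abort the ongoing synthesis and start the new one.
            self.log.append(("abort", self.current.text))
            self._start(request)
            return True
        # Assumption: requests of equal or lower priority are ignored.
        self.log.append(("ignore", request.text))
        return False

    def finished(self):
        """Called when the engine has finished the current synthesis."""
        self.current = None

    def _start(self, request):
        self.current = request
        self.log.append(("speak", request.text))
```

For example, if a low-priority speed announcement is being synthesized when a high-priority turn instruction arrives, the manager aborts the former and speaks the latter; a medium-priority request arriving during the turn instruction is ignored.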


Experimental verification of vehicle voice navigation system

Figure 7 is a photograph of the in-vehicle voice navigation system described here. Verification experiments on voice navigation were carried out on this system, and the car navigation functions listed in Table 2 were completed through voice interaction. The experiments show that the system changes state correctly and completely according to the designed dialogue mode, that the human-machine dialogue for each navigation function completes correctly, and that the system's voice prompts work correctly.



In addition, the system's ability to respond correctly to voice control commands was tested. All 49 voice control command words available in the map browsing state were spoken in a clear and steady voice, three times each, for a total of 49×3=147 trials. Of these, 132 succeeded and 15 failed, a success rate of 89.8%. This shows that voice control commands are effective in the system.
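The reported figures can be checked with a few lines of Python; the variable names are illustrative, and the numbers are the ones given in the text.

```python
commands = 49      # command words in the map browsing state
repetitions = 3    # times each command was tested
successes = 132    # successful trials reported

trials = commands * repetitions
failures = trials - successes
success_rate = successes / trials

print(trials, failures, f"{success_rate:.1%}")  # prints: 147 15 89.8%
```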

In the test of large-scale POI name recognition, POI names of 2 to 10 words were tested, with 10 names tested for each length. Each POI name was tested at most twice; the second test was carried out only if the first failed. The test results are shown in Table 3.

The results show that the offline traversal recognition scheme achieves a correct recognition rate of 86.7% on the first attempt and 93.3% when the second attempt is included. The average time for a correct recognition is between 6.1 s and 10.4 s, and the average weighted by the statistical distribution of POI name lengths is 8.3 s. These data show that the scheme can recognize a large vocabulary of POI names with a small-vocabulary keyword recognition engine and achieve satisfactory accuracy, but at the cost of a long recognition time.

Conclusion

This paper presented the design and implementation of a vehicle navigation human-machine voice interaction system and verified it experimentally in a laboratory environment.

The results show that synthesized speech makes rich and flexible voice prompts possible, so that the user can operate the navigation system without diverting much attention. Future work will aim to improve recognition accuracy and reduce the average time needed for a correct recognition.
