With the advancement of artificial intelligence and natural language processing, user demand for intelligent, convenient applications keeps growing. Voice interaction technology replaces traditional manual input with intuitive voice commands, simplifying operations and making applications easier and more efficient to use.
Through voice interaction, users can operate by voice when touchscreen input is inconvenient, such as while driving or cooking; when a large amount of text needs to be entered, voice input can significantly improve input efficiency; in addition, voice interaction provides a convenient alternative interaction method for users with visual impairments or reading difficulties.
The HarmonyOS SDK basic voice services (Core Speech Kit) integrate basic AI speech capabilities, including text-to-speech (TextToSpeech) and speech recognition (SpeechRecognizer), which facilitate interaction between the user and the device by converting between speech and text in real time.
Text-to-Speech
It can efficiently synthesize text of up to 10,000 characters into a playable audio stream, converting the text into smooth, natural human speech. It is widely used in scenarios such as audiobook reading, news broadcasting, and station announcements.
The text-to-speech capability is integrated into the system's accessibility features and can provide visually impaired users with Mandarin playback even without a network connection, using the female voice Ling Xiaoshan.
Speech Recognition
It can transcribe speech into text in real time, freeing your hands, and is suitable for scenarios such as voice chat, voice search, voice commands, and voice Q&A.
The speech recognition service converts a piece of audio (up to 60 seconds long) into text, facilitating interaction between the user and the device and enabling real-time voice interaction. Currently the service supports Chinese, and offline models are available.
Capability Advantages
Stable and reliable: an on-device capability that works without a network connection.
Out of the box: native system APIs that are ready to use and take up no application space.
Feature-rich: provides rich extension and tuning parameters for different scenarios.
Function Demo
Development Steps
(i) Text-to-speech
1. To use text-to-speech, first import the classes that implement it into the project.
import { textToSpeech } from '@kit.CoreSpeechKit';
import { BusinessError } from '@kit.BasicServicesKit';
2. Call the createEngine interface to create a TextToSpeechEngine instance.
The createEngine interface can be called in two ways; one of them is used as the example here, and the other can be found in the API Reference.
let ttsEngine: textToSpeech.TextToSpeechEngine;
// Set the engine creation parameters
let extraParam: Record<string, Object> = {"style": 'interaction-broadcast', "locate": 'CN', "name": 'EngineName'};
let initParamsInfo: textToSpeech.CreateEngineParams = {
  language: 'zh-CN',
  person: 0,
  online: 1,
  extraParams: extraParam
};
// Call the createEngine method
textToSpeech.createEngine(initParamsInfo, (err: BusinessError, textToSpeechEngine: textToSpeech.TextToSpeechEngine) => {
  if (!err) {
    console.info('Succeeded in creating engine');
    // Receive the created engine instance
    ttsEngine = textToSpeechEngine;
  } else {
    // If engine creation fails, error code 1003400005 is returned. Possible causes: the engine does not exist, the resource does not exist, or engine creation timed out
    console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
  }
});
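The other calling form mentioned above is Promise-based. The following is a minimal sketch, assuming the Promise overload of createEngine described in the API Reference and reusing initParamsInfo from the snippet above:
// Sketch: Promise-style call of createEngine (see the API Reference for the exact signature)
textToSpeech.createEngine(initParamsInfo)
  .then((engine: textToSpeech.TextToSpeechEngine) => {
    // Receive the created engine instance
    ttsEngine = engine;
    console.info('Succeeded in creating engine');
  })
  .catch((err: BusinessError) => {
    console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
  });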
3. After obtaining the TextToSpeechEngine instance, instantiate SpeakParams and SpeakListener objects, pass in the originalText to be synthesized and played, and call the speak interface to start the broadcast.
// Set the speak callbacks
let speakListener: textToSpeech.SpeakListener = {
  // Broadcast start callback
  onStart(requestId: string, response: textToSpeech.StartResponse) {
    console.info(`onStart, requestId: ${requestId} response: ${JSON.stringify(response)}`);
  },
  // Synthesis complete and broadcast complete callback
  onComplete(requestId: string, response: textToSpeech.CompleteResponse) {
    console.info(`onComplete, requestId: ${requestId} response: ${JSON.stringify(response)}`);
  },
  // Broadcast stop callback
  onStop(requestId: string, response: textToSpeech.StopResponse) {
    console.info(`onStop, requestId: ${requestId} response: ${JSON.stringify(response)}`);
  },
  // Returns the audio stream
  onData(requestId: string, audio: ArrayBuffer, response: textToSpeech.SynthesisResponse) {
    console.info(`onData, requestId: ${requestId} sequence: ${JSON.stringify(response)} audio: ${JSON.stringify(audio)}`);
  },
  // Error callback
  onError(requestId: string, errorCode: number, errorMessage: string) {
    console.error(`onError, requestId: ${requestId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
  }
};
// Set the callbacks
ttsEngine.setListener(speakListener);
let originalText: string = 'Hello, Huawei';
// Set the broadcast parameters
let extraParam: Record<string, Object> = {"queueMode": 0, "speed": 1, "volume": 2, "pitch": 1, "languageContext": 'zh-CN',
  "audioType": "pcm", "soundChannel": 3, "playType": 1};
let speakParams: textToSpeech.SpeakParams = {
  requestId: '123456', // The requestId can be used only once within the same instance; do not reuse it
  extraParams: extraParam
};
// Call the speak method
ttsEngine.speak(originalText, speakParams);
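When the broadcast is no longer needed, the engine should be stopped and released. This is a minimal sketch, assuming the isBusy(), stop(), and shutdown() methods of TextToSpeechEngine from the API Reference:
// Sketch: stop an ongoing broadcast and release the engine
if (ttsEngine.isBusy()) {
  ttsEngine.stop(); // Stop the current broadcast
}
ttsEngine.shutdown(); // Release engine resources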
(ii) Speech recognition
1. To use speech recognition, first import the classes that implement it into the project.
import { speechRecognizer } from '@kit.CoreSpeechKit';
import { BusinessError } from '@kit.BasicServicesKit';
2. Call the createEngine method to initialize the engine and create a SpeechRecognitionEngine instance.
The createEngine method can be called in two ways; one of them is used as the example here, and the other can be found in the API Reference.
let asrEngine: speechRecognizer.SpeechRecognitionEngine;
let requestId: string = '123456';
// Create the engine; the result is returned via callback
// Set the engine creation parameters
let extraParam: Record<string, Object> = {"locate": "CN", "recognizerMode": "short"};
let initParamsInfo: speechRecognizer.CreateEngineParams = {
  language: 'zh-CN',
  online: 1,
  extraParams: extraParam
};
// Call the createEngine method
speechRecognizer.createEngine(initParamsInfo, (err: BusinessError, speechRecognitionEngine: speechRecognizer.SpeechRecognitionEngine) => {
  if (!err) {
    console.info('Succeeded in creating engine.');
    // Receive the created engine instance
    asrEngine = speechRecognitionEngine;
  } else {
    // If the engine cannot be created, error code 1002200008 is returned. Cause: the engine is being destroyed
    console.error(`Failed to create engine. Code: ${err.code}, message: ${err.message}.`);
  }
});
3. After obtaining the SpeechRecognitionEngine instance, instantiate a RecognitionListener object and call the setListener method to set the callbacks for receiving speech recognition events.
// Create the callback object
let setListener: speechRecognizer.RecognitionListener = {
  // Recognition start success callback
  onStart(sessionId: string, eventMessage: string) {
    console.info(`onStart, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
  },
  // Event callback
  onEvent(sessionId: string, eventCode: number, eventMessage: string) {
    console.info(`onEvent, sessionId: ${sessionId} eventCode: ${eventCode} eventMessage: ${eventMessage}`);
  },
  // Recognition result callback, including intermediate and final results
  onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
    console.info(`onResult, sessionId: ${sessionId} result: ${JSON.stringify(result)}`);
  },
  // Recognition complete callback
  onComplete(sessionId: string, eventMessage: string) {
    console.info(`onComplete, sessionId: ${sessionId} eventMessage: ${eventMessage}`);
  },
  // Error callback; error codes are returned through this method
  // For example, error code 1002200006 means the recognition engine is busy (recognition is in progress)
  // For more error codes, see the error code reference
  onError(sessionId: string, errorCode: number, errorMessage: string) {
    console.error(`onError, sessionId: ${sessionId} errorCode: ${errorCode} errorMessage: ${errorMessage}`);
  }
}
// Set the callbacks
asrEngine.setListener(setListener);
4. Set the parameters for starting recognition and call the startListening method to begin recognition.
let audioParam: speechRecognizer.AudioInfo = {audioType: 'pcm', sampleRate: 16000, soundChannel: 1, sampleBit: 16};
let extraParam: Record<string, Object> = {"vadBegin": 2000, "vadEnd": 3000, "maxAudioDuration": 40000};
let recognizerParams: speechRecognizer.StartParams = {
  sessionId: requestId,
  audioInfo: audioParam,
  extraParams: extraParam
};
// Call the start recognition method
asrEngine.startListening(recognizerParams);
5. Pass in the audio stream by calling the writeAudio method. To read the audio from a file, prepare a PCM-format audio file in advance.
let uint8Array: Uint8Array = new Uint8Array();
// The audio stream can be obtained by: 1) recording audio; 2) reading it from an audio file
// Write the audio stream; each write only supports a length of 640 or 1280
asrEngine.writeAudio(requestId, uint8Array);
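As an illustration of the file-reading path, here is a minimal sketch that streams a PCM file to the recognizer in 1280-byte chunks and then ends the session. It assumes the fileIo API from @kit.CoreFileKit and the finish() method of SpeechRecognitionEngine; the file path is a hypothetical placeholder:
import { fileIo } from '@kit.CoreFileKit';

// Hypothetical path to a prepared 16 kHz, 16-bit, mono PCM file
let filePath: string = '/data/storage/el2/base/haps/entry/files/test.pcm';
let file = fileIo.openSync(filePath, fileIo.OpenMode.READ_ONLY);
try {
  let buf: ArrayBuffer = new ArrayBuffer(1280);
  let offset: number = 0;
  let bytesRead: number = fileIo.readSync(file.fd, buf, { offset: offset, length: 1280 });
  while (bytesRead === 1280) {
    // Each writeAudio call accepts exactly 640 or 1280 bytes
    asrEngine.writeAudio(requestId, new Uint8Array(buf));
    offset += bytesRead;
    bytesRead = fileIo.readSync(file.fd, buf, { offset: offset, length: 1280 });
  }
} finally {
  fileIo.closeSync(file);
}
// Tell the engine that no more audio will arrive for this session
asrEngine.finish(requestId);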
Learn more >>
Visit the Core Speech Kit page on the Huawei Developer Alliance official website
Get the Text-to-Speech service development guide
Get the Speech Recognition service development guide