Building a Voice to Text Search
To build a voice search experience, your application needs three things:
- an input
- an output
- and, in the middle, fulfillment
Algolia sits in the middle, in the fulfillment layer. Before looking at that, consider what you need for the outer layers of the round trip.
Input
The speech to text (STT) layer
This is where your user will speak to your application and their speech becomes text. Algolia only handles search that comes from text, so you must have a speech to text (STT) layer.
If you’re building on top of a voice-first platform, like Alexa or Google Assistant, speech to text is built in. The same is true today in the Chrome browser and in native iOS and Android apps. For all other web-based applications, you’ll need an STT service, such as Google Cloud Speech to Text, Azure Cognitive Services, or AssemblyAI. You send the user’s speech to the STT service, receive the transcribed text back, and then send that text to Algolia as a search query.
Steps
- Add a speech to text (STT) layer
  - With browser (Chrome only), native app, or voice platform tooling
  - With a third-party service, such as:
    - Google Cloud Speech to Text
    - Azure Cognitive Services
    - AssemblyAI
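The input round trip described above (speech in, text out, text sent on as a search query) can be sketched as follows. This is a minimal illustration, not a real integration: `Transcriber`, `Searcher`, `voice_search`, and the two stand-in functions are hypothetical names, and the stand-ins would be replaced by calls to whichever STT service and search client you use.

```python
from typing import Any, Callable, Dict

# Hypothetical type aliases: neither belongs to a real SDK.
Transcriber = Callable[[bytes], str]        # audio bytes -> transcript
Searcher = Callable[[str], Dict[str, Any]]  # query text -> search response


def voice_search(audio: bytes, transcribe: Transcriber, search: Searcher) -> Dict[str, Any]:
    """Run one voice round trip: speech -> text -> search results."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Nothing was recognized, so skip the search call entirely.
        return {"query": "", "hits": []}
    return search(transcript)


# Stand-ins for a real STT service and a real search client (assumptions).
def fake_transcribe(audio: bytes) -> str:
    return "show me red running shoes"


def fake_search(query: str) -> Dict[str, Any]:
    return {"query": query, "hits": [{"name": "Red running shoe"}]}


result = voice_search(b"\x00\x01", fake_transcribe, fake_search)
print(result["query"])
```

Passing the transcriber and searcher in as plain callables keeps the STT choice (browser, platform, or third party) separate from the search step.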
Output
Speech Synthesis
Not all voice platforms need speech synthesis, or text to speech (TTS). A mobile website, for example, might only need to show the search results on screen. If you do need it, your choices are again either built in or third party. Voice-first platforms have their own speech synthesis, and all major, modern browsers support speech synthesis through the SpeechSynthesis API. If you want a wider choice of voices, you can use Azure Cognitive Services or Amazon Polly.
Steps
- Determine if speech synthesis is necessary (it might not be if the experience isn’t conversational)
- Implement speech synthesis
  - With browser, native app, or voice platform tooling
  - With third-party services
    - Azure Cognitive Services
    - Amazon Polly
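If you do implement speech synthesis, the text handed to the TTS engine has to be assembled from the search results first. The sketch below shows one possible way to do that. The function name `spoken_summary`, the `name` field on each hit, and the wording of the sentences are all assumptions made for illustration; your records and phrasing will differ.

```python
from typing import Any, Dict, List


def spoken_summary(hits: List[Dict[str, Any]], max_items: int = 3) -> str:
    """Build a short sentence that a TTS engine could read aloud."""
    # "name" is a hypothetical record attribute; skip hits that lack it.
    names = [str(hit["name"]) for hit in hits[:max_items] if hit.get("name")]
    if not names:
        return "Sorry, I could not find anything."
    if len(names) == 1:
        return f"I found {names[0]}."
    if len(names) == 2:
        return f"I found {names[0]} and {names[1]}."
    return f"I found {', '.join(names[:-1])}, and {names[-1]}."


print(spoken_summary([{"name": "Red running shoe"}, {"name": "Blue trail shoe"}]))
```

Capping the number of items read aloud keeps the spoken response short; a screen can show the full list.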
Fulfillment
The fulfillment is the business logic code that powers your application or website. Algolia will be the part of the fulfillment that brings up the relevant content to display to the user, much like Algolia is one part of your website or application today.
There are two parts to the Algolia fulfillment:
- query time settings
- index configurations
Query Time Settings
- Set removeStopWords to the two-letter code of the language used (e.g. en)
  - This removes words like “a,” “an,” or “the” that don’t add value to the query
- Send the entire query string along as optionalWords (no need to split the words)
  - When searching conversationally, searchers might add words that won’t be in any of the records. Marking all of the words as optional means that records don’t need to match all of the words, but records matching more words will rank higher than those matching fewer.
- Set ignorePlurals to true or to the two-letter code of the language used (e.g. en)
  - This makes words like “car” and “cars” equivalent
- Send analyticsTags including voice

You can activate these settings (and more) using the naturalLanguages parameter.
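The query time settings above can be gathered into a single set of search parameters. The sketch below only builds the parameter dictionaries; it does not call Algolia. The function names are hypothetical, and passing the language code as a one-element list is an assumption about the value shape, so check the API reference for the exact types your client expects.

```python
from typing import Any, Dict


def build_voice_search_params(query: str, language: str = "en") -> Dict[str, Any]:
    """Assemble the query time settings listed above for one voice query."""
    return {
        "query": query,
        # Drop words like "a", "an", or "the" for this language.
        "removeStopWords": [language],
        # The whole query string is sent as optional words, unsplit.
        "optionalWords": query,
        # Treat singular and plural forms ("car", "cars") as equivalent.
        "ignorePlurals": [language],
        # Tag the search so voice traffic can be told apart in analytics.
        "analyticsTags": ["voice"],
    }


def build_natural_language_params(query: str, language: str = "en") -> Dict[str, Any]:
    """Alternative: rely on naturalLanguages instead of the individual settings."""
    return {"query": query, "naturalLanguages": [language]}


params = build_voice_search_params("what are the cheapest red shoes")
print(params["optionalWords"])
```

Building the parameters in one place makes it easy to apply the same settings to every query that arrives from the STT layer.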