Bot Orchestrator
What is it?
Bot Orchestrator is a product designed to give a voice to regular chatbots, in real time. It interfaces with an existing chatbot and provides a telephony platform to host the voice bot.
What does it do?
Once integrated with the chatbot, the orchestrator provides a direct dial-in (DID) to receive calls. For each call the orchestrator:
- initializes the environment and requests a welcome sentence from the bot (NLU)
- converts this sentence to an audio stream and plays it to the caller (text-to-speech, TTS)
- listens to the caller's audio response and sends it to our speech recognition engine (ASR)
- post-processes the ASR transcript to correct common mistakes and misinterpretations, and sends it to the bot (NLU)
- forwards the bot's text response to the TTS and plays it back
- loops on these steps, waiting for the caller to speak again, as long as the bot scenario requires.
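The per-call flow above can be summarized as a single loop. The following is a minimal Python sketch, not the orchestrator's actual code: the NLU, TTS, ASR, post-processing and playback components are modeled as hypothetical plain functions, and an ASR result of None is assumed to mean that the caller hung up.

```python
from typing import Callable, Optional

# Hypothetical component signatures, for illustration only.
Nlu = Callable[[Optional[str]], str]    # caller text (None at call start) -> bot text
Tts = Callable[[str], bytes]            # bot text -> audio
Asr = Callable[[], Optional[str]]       # caller audio -> transcript, None when the call ends
PostProcess = Callable[[str], str]      # fixes common ASR mistakes


def run_call(nlu: Nlu, tts: Tts, asr: Asr, post_process: PostProcess,
             play: Callable[[bytes], None]) -> None:
    # 1. Ask the bot for its welcome sentence.
    bot_text = nlu(None)
    while True:
        # 2. Convert the bot text to audio and play it to the caller.
        play(tts(bot_text))
        # 3. Listen to the caller and transcribe the audio.
        transcript = asr()
        if transcript is None:  # assumption: None means the caller hung up
            break
        # 4. Correct the transcript, 5. send it to the bot, then loop.
        bot_text = nlu(post_process(transcript))
```

Injecting the components as functions keeps the loop independent of any particular NLU or TTS vendor, which mirrors the connector approach described below.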
Supported Bot Engines
The orchestrator already has connectors to several NLU engines on the market:
- IBM Watson
- Tock
- Mindsay
- Google DialogFlow
- XBRAIN
- Allo-Media's own Internal NLP
- Orange SMARTLY
- ODIGO
- ILLUIN
- Sopra Steria
Creating a new connector is straightforward, especially when the bot provides a simple synchronous REST API.
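As an illustration, a connector for a synchronous REST bot mostly boils down to one blocking HTTP call per caller turn. The sketch below uses only the Python standard library; the class names, the endpoint URL and the payload fields ("session", "text") are hypothetical, since each bot engine defines its own schema.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class BotConnector(ABC):
    """Hypothetical connector interface: one call per caller turn."""

    @abstractmethod
    def send(self, session_id: str, text: str) -> dict:
        """Send the caller's text to the bot and return its JSON reply."""


class RestBotConnector(BotConnector):
    """Sketch of a connector for a bot exposing a synchronous JSON REST API.

    The payload fields "session" and "text" are assumptions for illustration.
    """

    def __init__(self, url: str, timeout_s: float = 5.0) -> None:
        self.url = url
        self.timeout_s = timeout_s

    def build_request(self, session_id: str, text: str) -> urllib.request.Request:
        body = json.dumps({"session": session_id, "text": text}).encode("utf-8")
        return urllib.request.Request(
            self.url, data=body, method="POST",
            headers={"Content-Type": "application/json"},
        )

    def send(self, session_id: str, text: str) -> dict:
        # Synchronous call: blocks until the bot answers or the timeout expires.
        with urllib.request.urlopen(self.build_request(session_id, text),
                                    timeout=self.timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the connector easy to test without a live bot.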
Generally, exchanges with the bot are based on JSON requests and replies. Using JSON tags in its reply, the bot can request different actions from the orchestrator:
- terminate a call (hangup)
- transfer a call to another number
- request a notification when an audio message (i.e., the bot's answer) has finished playing
- switch to a specific mode / language model
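A connector typically maps these reply tags onto orchestrator actions. The following minimal sketch assumes hypothetical tag names ("text", "hangup", "transfer", "notify_end_of_audio", "mode"); the real tags depend on each bot engine and connector.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BotActions:
    text: str = ""                      # sentence to speak via TTS
    hangup: bool = False                # terminate the call
    transfer_to: Optional[str] = None   # number to transfer the call to
    notify_audio_end: bool = False      # tell the bot when playback has finished
    mode: Optional[str] = None          # mode / language model, e.g. "digit-10-10"


def parse_bot_reply(raw: str) -> BotActions:
    """Map a JSON bot reply to actions (all tag names are hypothetical)."""
    reply = json.loads(raw)
    return BotActions(
        text=reply.get("text", ""),
        hangup=bool(reply.get("hangup", False)),
        transfer_to=reply.get("transfer"),
        notify_audio_end=bool(reply.get("notify_end_of_audio", False)),
        mode=reply.get("mode"),
    )
```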
Configuration
Each bot instance can be tuned with a number of parameters:
- NLU engine
- TTS engine (available: Voxygen, Google TTS), in text or SSML mode
- Voice and voice parameters (language, pitch, speed, etc.)
- Number of simultaneous calls per DID
- Default failure message (played when the NLU does not answer)
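For illustration, these per-bot parameters could be grouped in a single configuration object. This is a sketch only: every field name and default value below is hypothetical, not the product's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class BotConfig:
    nlu_engine: str                               # e.g. "tock", "dialogflow"
    tts_engine: Literal["voxygen", "google"]      # type hint only, not enforced at runtime
    tts_input: Literal["text", "ssml"] = "text"
    voice: str = "default"
    language: str = "fr-FR"
    pitch: float = 1.0
    speed: float = 1.0
    max_concurrent_calls_per_did: int = 10
    failure_message: str = "Sorry, a technical error occurred."

    def __post_init__(self) -> None:
        # Reject obviously invalid values early.
        if self.max_concurrent_calls_per_did < 1:
            raise ValueError("max_concurrent_calls_per_did must be >= 1")
```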
Language Model and Modes
In some situations, the bot NLU engine expects a specific kind of data: a phone number, a customer id, a license plate, an address...
To narrow down the ASR response to these contexts, the optional request parameter mode accepts one of the following modes, along with extra length parameters, in the form: "mode": "{modename}-{min}-{max}".
Modes allow concatenating utterances received from the ASR.
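A mode value in the "{modename}-{min}-{max}" form can be parsed as sketched below. The Mode type and parse_mode function are hypothetical, and accepting a bare mode name without length parameters (for modes with no timeout management) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    name: str
    min_len: Optional[int] = None
    max_len: Optional[int] = None


def parse_mode(value: str) -> Mode:
    """Parse "{modename}-{min}-{max}", or a bare "{modename}" (assumption)."""
    parts = value.split("-")
    if len(parts) == 1:
        return Mode(parts[0])
    if len(parts) != 3:
        raise ValueError(f"expected '{{modename}}-{{min}}-{{max}}', got {value!r}")
    name, lo, hi = parts
    min_len, max_len = int(lo), int(hi)
    if min_len > max_len:
        raise ValueError("min must not exceed max")
    return Mode(name, min_len, max_len)
```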
Length parameters
In the default mode (no specific context), a request is sent to the bot NLU as soon as an end of utterance is received. If nothing is said before a default timeout expires (configurable per DID/bot, usually 10 or 15 seconds), a #VOCALTIMEOUT# event is sent. When using modes, a minimum and a maximum expected length (in characters) are specified, and longer or shorter timeouts are activated depending on the number of characters received. This gives callers time to look up a piece of information they may not know by heart (their customer ID, for instance), while still letting the orchestrator react quickly once enough information has been received.
- When fewer than min characters have been received, a first timeout is activated; this timeout is usually long.
- When more than min but fewer than max characters have been received, a shorter timeout is activated.
- When max or more characters have been received, the end of utterance triggers sending the caller's reply to the bot.
All modes except the basic default mode and those explicitly marked otherwise have this timeout management feature.
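The three length rules can be expressed as a small decision function, run at each end of utterance on the text accumulated so far. This is a sketch: the timeout values are hypothetical (the real ones are configured per DID/bot), and treating exactly min characters as the short-timeout case is an assumption, since the rules above leave that boundary open.

```python
from typing import Optional, Tuple

# Hypothetical values in seconds; real timeouts are configured per DID/bot.
LONG_TIMEOUT_S = 15.0
SHORT_TIMEOUT_S = 3.0


def after_end_of_utterance(received: str, min_len: int,
                           max_len: int) -> Tuple[str, Optional[float]]:
    """Return ("send", None) or ("wait", timeout) for the accumulated text."""
    n = len(received)
    if n >= max_len:
        return ("send", None)            # enough characters: reply to the bot now
    if n < min_len:
        return ("wait", LONG_TIMEOUT_S)  # caller may still be looking up the information
    return ("wait", SHORT_TIMEOUT_S)     # between min and max (assumption: n == min lands here)
```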
Digits
Code: digits
Corrects some common recognition mistakes (quatre vingt -> 80, cette -> 7).
Example for a 10-digit French phone number: "mode": "digit-10-10"
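The correction step can be pictured as a phrase-to-digit substitution followed by keeping only digits. This minimal sketch uses a deliberately tiny, hypothetical table: only the "quatre vingt" and "cette" entries come from the examples above, and a real table would need many more French number forms.

```python
import re

# Hypothetical correction table (French ASR output -> digits).
CORRECTIONS = {
    "quatre vingt": "80",
    "cette": "7",   # homophone of "sept"
    "sept": "7",
    "zéro": "0",
    "six": "6",
    "deux": "2",
    "un": "1",
}


def to_digits(transcript: str) -> str:
    text = transcript.lower()
    # Replace longer phrases first so "quatre vingt" is handled as one unit.
    for phrase in sorted(CORRECTIONS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", CORRECTIONS[phrase], text)
    # Keep only the digit characters.
    return "".join(ch for ch in text if ch.isdigit())
```

With a mode such as "digit-10-10", the length of this result is what the timeout rules compare against min and max.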
Spelling
Code: spell
Returns only letters: if whole words are pronounced, their initial is returned.
Spell Simple
Code: spellsimple
Fixes common mistakes (c'est => C) but returns full words when pronounced, along with single letters.
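The difference between spell and spellsimple can be sketched as follows. The homophone table is hypothetical except for c'est => C, and applying the same corrections in spell mode is an assumption.

```python
# Hypothetical homophone table (French ASR output -> spelled letter).
LETTER_FIXES = {"c'est": "C", "aime": "M", "elle": "L"}


def spell(transcript: str) -> str:
    """'spell' sketch: letters only; a pronounced word contributes its initial."""
    letters = []
    for token in transcript.split():
        token = LETTER_FIXES.get(token.lower(), token)
        letters.append(token[0].upper())
    return "".join(letters)


def spell_simple(transcript: str) -> str:
    """'spellsimple' sketch: fix homophones, keep full words alongside letters."""
    out = []
    for token in transcript.split():
        default = token.upper() if len(token) == 1 else token
        out.append(LETTER_FIXES.get(token.lower(), default))
    return " ".join(out)
```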
Alphanumeric
Code: mixed
Returns both digits and letters.
License plates
Code: immat
Returns a French license plate, in either the new format (e.g. AB123CD) or the old format (e.g. 1234WW99).
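Validating the result of this mode can be sketched with two regular expressions. These patterns are a simplification and an assumption: the old format also allowed Corsican (2A/2B) and three-digit overseas department codes, which this sketch ignores.

```python
import re
from typing import Optional

# Simplified patterns (assumption): see the lead-in for what is ignored.
NEW_FORMAT = re.compile(r"[A-Z]{2}[0-9]{3}[A-Z]{2}")       # e.g. AB123CD
OLD_FORMAT = re.compile(r"[0-9]{1,4}[A-Z]{1,3}[0-9]{2}")   # e.g. 1234WW99


def normalize_plate(transcript: str) -> Optional[str]:
    """Strip spaces and hyphens, uppercase, and return the plate if it matches."""
    candidate = re.sub(r"[\s-]", "", transcript).upper()
    if NEW_FORMAT.fullmatch(candidate) or OLD_FORMAT.fullmatch(candidate):
        return candidate
    return None
```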
Waiting
Code: wait
Instead of reacting to each end of utterance, simply waits for a longer timeout, allowing the caller to pause, for instance when the bot asks for a lengthy explanation.
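The effect of this mode can be sketched as concatenating utterances until the caller stays silent for longer than the timeout. The (start, end, text) utterance tuples and the function name are hypothetical; the real ASR events may look quite different.

```python
from typing import List, Optional, Tuple


def collect_until_silence(utterances: List[Tuple[float, float, str]],
                          silence_timeout_s: float) -> str:
    """Concatenate utterances until a pause exceeds the timeout.

    Each utterance is a hypothetical (start_s, end_s, text) tuple.
    """
    collected = []
    last_end: Optional[float] = None
    for start, end, text in utterances:
        if last_end is not None and start - last_end > silence_timeout_s:
            break  # pause long enough: the reply would be sent to the bot here
        collected.append(text)
        last_end = end
    return " ".join(collected)
```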
Address
Code: adresse
Triggers a specific language model that favors city names over common words. For example, the city of Hyères is preferred over the word hier (French for "yesterday").
Names
Code: noms
Triggers a specific language model dedicated to names. No timeout management is implemented for this mode.
Names Spelling
Code: spellname
Using the default language model, waits for the caller to spell a name, handles sequences such as 'Double L' => LL, and returns a word assembled from the spelled letters.
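The assembly step can be sketched as a scan over the transcript tokens that expands "double X" into two letters. This is an illustration only: it handles just the English keyword "double" and silently ignores any other multi-letter word.

```python
def assemble_spelled_name(transcript: str) -> str:
    """Join spelled letters, expanding 'double X' into XX (sketch only)."""
    tokens = transcript.lower().split()
    letters = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token == "double" and len(nxt) == 1 and nxt.isalpha():
            letters.append(nxt.upper() * 2)  # "double l" -> "LL"
            i += 2
            continue
        if len(token) == 1 and token.isalpha():
            letters.append(token.upper())    # a single spelled letter
        i += 1                               # other words are ignored in this sketch
    return "".join(letters)
```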
DTMF (deprecated)
Code: dtmf
Receives DTMF tones. Deprecated.
Car brands
Code: automarque
Recognizes most well-known car brands.