Bot Orchestrator

What is it?

Bot Orchestrator is a product designed to give a voice to regular chatbots, in real time. It interfaces with an existing chatbot and provides a telephony platform to host the voice bot.

What does it do?

Once integrated with the chatbot, the orchestrator provides a direct dial-in (DID) to receive calls. For each call the orchestrator:

  • initializes the environment and asks the bot (NLU) for a welcome sentence
  • converts this sentence to an audio stream and plays it to the caller (text-to-speech, TTS)
  • listens to the caller's audio response and sends it to our speech recognition engine (ASR)
  • post-processes the ASR response to correct common mistakes and misinterpretations, then sends it to the bot (NLU)
  • forwards the bot's text response to the TTS and plays it back
  • waits for the caller to speak again, repeating these steps for as long as the bot scenario requires.
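
The per-call steps above can be sketched as a simple loop. This is only an illustration under stated assumptions, not the orchestrator's actual code: run_call, ask_bot, speak, listen and post_process are hypothetical names, and the convention that the bot returns None when the scenario is over is assumed for the example.

```python
from typing import Callable, Optional


def run_call(
    ask_bot: Callable[[Optional[str]], Optional[str]],
    speak: Callable[[str], None],
    listen: Callable[[], str],
    post_process: Callable[[str], str] = str.strip,
) -> int:
    """Run one call and return the number of caller turns handled."""
    turns = 0
    # Hypothetical convention: passing None asks the bot for its welcome sentence.
    reply = ask_bot(None)
    # Hypothetical convention: a None reply means the bot scenario is over.
    while reply is not None:
        speak(reply)                          # TTS conversion and playback
        heard = listen()                      # caller audio, transcribed by the ASR
        reply = ask_bot(post_process(heard))  # corrected text goes back to the bot
        turns += 1
    return turns
```

Passing the bot, TTS and ASR as plain callables keeps the loop independent of any particular engine, which matches the connector-based design described in the next section.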

Supported Bot Engines

The orchestrator already has connectors to several NLU engines on the market.

Creating a new connector is a straightforward process, especially when the bot provides a simple (synchronous) REST API.

Generally, exchanges with the bot are based on JSON requests and replies. Using JSON tags in the reply, the bot can request different actions from the orchestrator:

  • terminate a call (hangup)
  • transfer a call to another number
  • notify when an audio message (i.e., the bot's answer) has finished playing
  • switch to a specific mode / language model
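
As an illustration of how a connector might read these actions from a bot reply, here is a minimal sketch. Only the "mode" tag is named in this document; the other tag names ("text", "hangup", "transfer_to", "notify") and the BotReply and parse_bot_reply names are hypothetical and would depend on the bot engine.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BotReply:
    text: str                             # sentence to send to the TTS
    hangup: bool = False                  # terminate the call
    transfer_to: Optional[str] = None     # number to transfer the call to
    notify_on_playback_end: bool = False  # notify when the audio is completed
    mode: Optional[str] = None            # e.g. "digit-10-10"


def parse_bot_reply(raw: str) -> BotReply:
    """Map a JSON reply to the actions the orchestrator should take."""
    data = json.loads(raw)
    # All tag names except "mode" are assumptions made for this sketch.
    return BotReply(
        text=data.get("text", ""),
        hangup=bool(data.get("hangup", False)),
        transfer_to=data.get("transfer_to"),
        notify_on_playback_end=bool(data.get("notify", False)),
        mode=data.get("mode"),
    )
```

For example, parse_bot_reply('{"text": "Goodbye", "hangup": true}') yields a reply whose hangup field is True and whose other actions keep their defaults.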


Each bot instance can be tuned with a number of parameters:

  • NLU engine
  • TTS engine (available: Voxygen, Google TTS), in text mode or SSML
  • Voice and voice parameters (language, pitch, speed, etc.)
  • Number of simultaneous calls per DID
  • Failure default message (when the NLU does not answer)

Language Model and Modes

In some situations, the bot NLU engine expects a specific kind of data: a phone number, a customer id, a license plate, an address...

To narrow down the ASR response to these contexts, the optional request parameter mode accepts one of the modes listed below, along with extra length parameters, in the form: "mode" : "{modename}-{min}-{max}".

Modes allow concatenating utterances received from the ASR.
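
A mode value in this form can be split into its parts as sketched below. The Mode and parse_mode names are hypothetical, and the sketch assumes that the length parameters may be omitted (a bare mode name) and that mode names contain no hyphen.

```python
from typing import NamedTuple, Optional


class Mode(NamedTuple):
    name: str
    min_len: Optional[int]
    max_len: Optional[int]


def parse_mode(value: str) -> Mode:
    """Parse "{modename}-{min}-{max}", or a bare "{modename}"."""
    parts = value.split("-")
    if len(parts) == 1:
        # Assumption: a mode may be given without length parameters.
        return Mode(parts[0], None, None)
    if len(parts) != 3:
        raise ValueError(f"expected modename-min-max, got {value!r}")
    name, low, high = parts
    min_len, max_len = int(low), int(high)
    if min_len > max_len:
        raise ValueError(f"min is greater than max in {value!r}")
    return Mode(name, min_len, max_len)
```

With this sketch, parse_mode("digit-10-10") gives the name "digit" with both lengths set to 10.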

Length parameters

In the default mode (no specific context), a request is sent to the bot NLU as soon as an end of utterance is received. If nothing is said during a default timeout (configurable per DID/bot, usually 10 or 15 seconds), a #VOCALTIMEOUT# event is sent.

When using modes, a minimum and a maximum expected length, in characters, are specified. Depending on the number of characters received, longer or shorter timeouts are activated. This gives callers time to look up a piece of information (their customer id, for instance, which they may not know by heart), while letting the orchestrator react quickly once enough information has been received.

  • When fewer than min characters have been received, a first timeout is activated; this timeout is usually long,
  • When more than min but fewer than max characters have been received, a shorter timeout is activated,
  • When max characters or more have been received, the end of utterance triggers the sending of the caller's reply to the bot.

All modes except the basic default mode, and those explicitly indicated, have this timeout management feature.
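
The three timeout rules above can be sketched as a small function. The names and the two durations are placeholders, since this document does not give the actual values. The case where exactly min characters have been received is not specified above; the sketch treats it like the shorter-timeout case.

```python
from typing import Optional

# Placeholder durations in seconds; the actual values are not documented here.
LONG_TIMEOUT = 20.0
SHORT_TIMEOUT = 3.0


def pick_timeout(chars_received: int, min_len: int, max_len: int) -> Optional[float]:
    """Return the timeout to arm, or None when the reply should be sent
    to the bot at the next end of utterance."""
    if chars_received >= max_len:
        return None              # enough input: send at end of utterance
    if chars_received >= min_len:
        return SHORT_TIMEOUT     # plausible answer so far: react quickly
    return LONG_TIMEOUT          # caller may still be looking for the information
```

With "digit-10-10", min and max are equal, so the caller stays on the long timeout until the tenth digit is received.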



Digits

Code: digit

Corrects some common transcription mistakes (quatre vingt -> 80, cette -> 7).

Example for a French 10-digit phone number: "mode": "digit-10-10"
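
A minimal sketch of this kind of correction is shown below, using only the two examples given above. The CORRECTIONS table and the normalize_digits name are hypothetical; a real table would be much larger, and this sketch does not attempt to parse number words in general.

```python
import re

# The two corrections quoted above; a real table would contain many more.
CORRECTIONS = {"quatre vingt": "80", "cette": "7"}


def normalize_digits(heard: str) -> str:
    """Apply known corrections, then keep only the digits."""
    text = heard.lower()
    for wrong, right in CORRECTIONS.items():
        # Word boundaries avoid rewriting the inside of longer words.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return re.sub(r"\D", "", text)
```

For instance, the transcription "06 quatre vingt 12 cette" would come out as "0680127".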


Spell

Code: spell

Returns only letters. If words are pronounced, their initial is returned.
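
The rule can be illustrated with the short sketch below. The spell_letters name is hypothetical, and the sketch assumes the ASR output has already been split into tokens.

```python
from typing import Iterable


def spell_letters(tokens: Iterable[str]) -> str:
    """Keep single letters as they are and reduce whole words to their initial."""
    letters = []
    for token in tokens:
        word = token.strip()
        # Tokens that do not start with a letter (digits, noise) are ignored.
        if word and word[0].isalpha():
            letters.append(word[0].upper())
    return "".join(letters)
```

Under this sketch, the tokens "a", "bravo", "C" give "ABC".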

Spell Simple

Code: spellsimple

Fixes common mistakes (c'est => C) but returns full words when pronounced, along with single letters.


Mixed

Code: mixed

Returns both digits and letters.

License plates

Code: immat

Returns a French license plate, for example in the new AB123CD or the old 1234WW99 format.
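
As a rough illustration, the two formats shown above could be checked with regular expressions as follows. The patterns are simplified assumptions inferred from the two examples (they ignore special cases such as Corsican or overseas department codes), and normalize_plate is a hypothetical name.

```python
import re
from typing import Optional

# Simplified patterns inferred from the examples AB123CD and 1234WW99.
NEW_FORMAT = re.compile(r"[A-Z]{2}\d{3}[A-Z]{2}")
OLD_FORMAT = re.compile(r"\d{1,4}[A-Z]{1,3}\d{2}")


def normalize_plate(heard: str) -> Optional[str]:
    """Return the plate without spaces or hyphens, or None if neither format matches."""
    compact = re.sub(r"[\s-]", "", heard).upper()
    if NEW_FORMAT.fullmatch(compact) or OLD_FORMAT.fullmatch(compact):
        return compact
    return None
```

Returning None for anything that matches neither pattern leaves the decision to ask the caller again with the bot.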


Wait

Code: wait

Simply waits with a longer timeout instead of reacting to each end of utterance. This lets the caller pause while speaking, for example when the bot asks for a lengthy explanation.


Addresses

Code: adresse

Triggers a specific language model that prefers city names over common words. For example, the city of Hyères would be preferred over the word hier.


Names

Code: noms

Triggers a specific language model dedicated to names. No timeout management is implemented.

Names Spelling

Code: spellname

Using the default language model, waits for the speaker to spell a name, processes expressions such as 'Double L' => LL, and returns a word assembled from the spelled letters.
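
The 'Double L' => LL handling can be sketched as follows. The assemble_spelled_name name is hypothetical, the input is assumed to be already tokenized, and only the keyword "double" is handled.

```python
from typing import Iterable


def assemble_spelled_name(tokens: Iterable[str]) -> str:
    """Build a word from spelled letters, expanding 'double X' into 'XX'."""
    letters = []
    repeat = 1
    for token in tokens:
        word = token.strip().upper()
        if word == "DOUBLE":
            repeat = 2           # the next letter is written twice
            continue
        if len(word) == 1 and word.isalpha():
            letters.append(word * repeat)
        repeat = 1               # any other token cancels a pending 'double'
    return "".join(letters)
```

Under this sketch, the tokens "A", "double", "L", "A", "N" are assembled into "ALLAN".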

DTMF (deprecated)

Code: dtmf

Receives DTMF tones. Deprecated.

Car brands

Code: automarque

Recognizes most well-known car brands.