Available MRCP grammars

Unless otherwise specified, these grammars work in French as well as in English.

Multi-grammars

Multiple grammars can be passed in a single RECOGNIZE command.

Note that grammars cannot be weighted: the first matching grammar is returned. If several grammars match at the same time, the first one listed is returned.
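As a minimal sketch, assuming the server accepts the standard MRCPv2 text/uri-list body for RECOGNIZE (one grammar URI per line), the list of grammars could be built as follows. The helper name build_uri_list is hypothetical, and the rest of the MRCP message (start line, Channel-Identifier, etc.) is left to your MRCP client. Order matters, since the first listed grammar wins when several match.

```python
def build_uri_list(grammar_uris):
    """Return a text/uri-list body: one grammar URI per line, CRLF-terminated.

    The input order is preserved, because the first listed grammar is the
    one returned when several grammars match at the same time.
    """
    if not grammar_uris:
        raise ValueError("at least one grammar URI is required")
    return "".join(uri + "\r\n" for uri in grammar_uris)


body = build_uri_list([
    "builtin:speech/boolean",
    "builtin:speech/keywords?alternatives=facture|commande",
])

# Entity headers that would accompany the body in the RECOGNIZE request.
headers = {
    "Content-Type": "text/uri-list",
    "Content-Length": str(len(body.encode("utf-8"))),
}
```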

Default

Raw ASR, without any interpretation. Numbers are returned spelled out in words, not as digits.

Grammar names:

  • builtin:grammar/none
  • builtin:speech/transcribe: alias to builtin:grammar/none
  • builtin:speech/none: alias to builtin:speech/transcribe

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>je veux changer mon billet</instance>
<input mode="speech" timestamp-start="2020-12-22T11:11:45.620+01:00" timestamp-end="2020-12-22T11:11:47.060+01:00" confidence="1.00">je veux changer mon billet</input>
</interpretation>
</result>
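On the client side, a result like this one can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree and assumes the result carries no XML namespace, as in the examples on this page. The function name parse_result and the returned dictionary layout are illustrative choices, not part of the API.

```python
import xml.etree.ElementTree as ET


def parse_result(xml_text):
    """Return the grammar, confidence, instance text and raw input of the
    first <interpretation> element of a recognition result."""
    # Parsing bytes lets the parser honor the declared UTF-8 encoding.
    root = ET.fromstring(xml_text.encode("utf-8"))
    interp = root.find("interpretation")
    if interp is None:
        raise ValueError("no <interpretation> element in result")
    return {
        "grammar": interp.get("grammar"),
        "confidence": float(interp.get("confidence")),
        "instance": (interp.findtext("instance") or "").strip(),
        "input": (interp.findtext("input") or "").strip(),
    }


SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<result>'
    '<interpretation grammar="session:demo-grammar-0" confidence="1.00">'
    '<instance>je veux changer mon billet</instance>'
    '<input mode="speech" confidence="1.00">je veux changer mon billet</input>'
    '</interpretation>'
    '</result>'
)

result = parse_result(SAMPLE)
```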

Address

Address mode for postal addresses.

A complete match requires all 3 of the following elements to be spoken:

  • Street
  • City
  • Zip code

These 3 elements can be spoken in any order.

The language model is built from real address dictionaries, so use real addresses rather than made-up ones when measuring the performance of the grammar.

About the modes:

  • In normal mode: the caller may omit some of the 3 elements (street, city, zip code). If one or more are missing, you get a "partial-match". A "no-match" only occurs if the ASR is not able to detect anything.
  • In hotword mode: if one or more of the 3 elements are missing, you get a "no-match". Hotword mode only returns "complete" or "no-match".
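The two modes can be summarized as a small decision function. This is only a client-side illustration of the rules above, not the server's logic. The function name is hypothetical, and "the ASR detects nothing" is simplified here to "none of the 3 elements was recognized".

```python
def expected_address_outcome(mode, street, city, zipcode):
    """Return the outcome described above for a given mode.

    mode is "normal" or "hotword"; street, city and zipcode are booleans
    telling whether each element was recognized in the utterance.
    """
    if mode not in ("normal", "hotword"):
        raise ValueError(f"unknown mode: {mode!r}")
    found = sum([bool(street), bool(city), bool(zipcode)])
    if found == 3:
        return "complete"
    if mode == "hotword":
        # Hotword mode never reports a partial match.
        return "no-match"
    # Normal mode: an incomplete address is a partial match, unless
    # nothing at all was recognized (simplifying assumption).
    return "partial-match" if found > 0 else "no-match"
```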

At the moment this feature is only available in French and for addresses in France.

Raw mode

Grammar name:

  • builtin:speech/address

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>37 rue du docteur leroy 72000 le mans</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">trente-sept rue du docteur leroy soixante-douze mille le mans</input>
</interpretation>
</result>

Structured mode

Instead of returning a plain-text result, this grammar returns an XML document with separate fields for street number, street, zip code and city. It is quite robust: in the example below, the address is extracted from a conversational sentence that contains filler words.
Use this grammar for maximum accuracy and detailed address information.

At the moment this feature is only available in French and for addresses in France.

Grammar name:

  • builtin:speech/address?struct

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>
<address>
<number>37</number>
<street>rue du docteur leroy</street>
<zipcode>72000</zipcode>
<city>le mans</city>
</address>
</instance>
<input mode="speech" timestamp-start="2021-03-31T11:12:12.606+02:00" timestamp-end="2021-03-31T11:12:16.716+02:00" confidence="1.00">alors j'habite au trente-sept rue du docteur leroy au mans et euh le code postal c'est soixante-douze mille</input>
</interpretation>
</result>
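The structured fields can be mapped to a small record on the client side. The sketch below uses Python's standard library; the Address class and the parse_address function are illustrative names. It assumes that an element missing from the result (for instance on a partial match) is simply absent from the XML, which is why every field is optional. The zip code is kept as a string so that leading zeros are preserved.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    number: Optional[str]
    street: Optional[str]
    zipcode: Optional[str]
    city: Optional[str]


def parse_address(xml_text):
    """Return the <address> of the first interpretation, or None if absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    node = root.find("interpretation/instance/address")
    if node is None:
        return None
    # findtext returns None for a missing child element.
    return Address(
        number=node.findtext("number"),
        street=node.findtext("street"),
        zipcode=node.findtext("zipcode"),
        city=node.findtext("city"),
    )


SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<result><interpretation grammar="session:demo-grammar-0" confidence="1.00">'
    '<instance><address>'
    '<number>37</number><street>rue du docteur leroy</street>'
    '<zipcode>72000</zipcode><city>le mans</city>'
    '</address></instance>'
    '</interpretation></result>'
)

address = parse_address(SAMPLE)
```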

Boolean

The yes/no grammar. Compatible with hotword mode.

Analyzes the user input to determine whether they accept/confirm or refuse/deny.

When the grammar matches, the interpretation contains either "yes" or "no".

Grammar name:

  • builtin:speech/boolean

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>yes</instance>
<input mode="speech" timestamp-start="2020-12-22T11:11:45.620+01:00" timestamp-end="2020-12-22T11:11:47.060+01:00" confidence="1.00">euh tout à fait</input>
</interpretation>
</result>

Keyword spotting

Matches the user input against a set of keywords that you define. The interpretation returns the matched keyword. Compatible with hotword mode.

This grammar is useful, for example, to offer end users a choice of options at the beginning of their interaction with the IVR.

Grammar name:

  • builtin:speech/keywords?alternatives= + mutually exclusive keywords separated by |
  • Example: builtin:speech/keywords?alternatives=facture|commande|compte|conseiller
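The grammar name can be assembled from a list of keywords. This is a minimal sketch with a hypothetical helper name; it assumes the keywords are passed as-is, without URL encoding, as in the example above.

```python
def keywords_grammar(keywords):
    """Build a keyword-spotting grammar name from a list of keywords."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    for keyword in keywords:
        # "|" is the separator, so it cannot appear inside a keyword.
        if not keyword or "|" in keyword:
            raise ValueError(f"invalid keyword: {keyword!r}")
    return "builtin:speech/keywords?alternatives=" + "|".join(keywords)


grammar = keywords_grammar(["facture", "commande", "compte", "conseiller"])
```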

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>commande</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">
je voudrais euh suivre ma commande
</input>
</interpretation>
</result>

Numbers in digit form

Raw ASR with interpretation of numbers in digit form.

Grammar name:

  • builtin:speech/text2num

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>environ 1575 euros et 28 centimes</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">environ mille cinq cent soixante-quinze euros et vingt-huit centimes</input>
</interpretation>
</result>

Spelling

With the following grammars, the interpretation forces the input to be understood as letters and numbers (returned in digit form) and filters out filler words.

Alphanumeric

Spelling mode for both letters and digits.

Grammar name:

  • builtin:speech/spelling/mixed

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.92">
<instance>e 7 1 9</instance>
<input mode="speech" timestamp-start="2021-02-19T10:28:30.294+01:00" timestamp-end="2021-02-19T10:28:31.944+01:00" confidence="0.98">euh sept un neuf</input>
</interpretation>
</result>

Digits only

Spelling mode restricted to numbers (in digit form) only.

Grammar name:

  • builtin:speech/spelling/digits

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.79">
<instance>1 2</instance>
<input mode="speech" timestamp-start="2021-02-19T10:30:12.684+01:00" timestamp-end="2021-02-19T10:30:17.934+01:00" confidence="0.63">ah ah ah un deux ah ah ah</input>
</interpretation>
</result>

Concerning the grouping of numbers, there are two possibilities:

  • Use of a "regex" or "length" parameter: all the digits are grouped and formatted without spaces.
  • No use of a "regex" or "length" parameter: no formatting is defined, the grouping is done according to what has been said.

Letters only

Spelling mode restricted to letters only. This grammar is especially useful for interpretation of names.

Grammar name:

  • builtin:speech/spelling/letters

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.91">
<instance>t n c a p b f f</instance>
<input mode="speech" timestamp-start="2021-02-19T10:31:54.664+01:00" timestamp-end="2021-02-19T10:32:06.814+01:00" confidence="0.96">alors t n c a p b f quatre-vingt-dix-huit f cinq dix-sept trente-huit soixante et un trente-trois dix-sept</input>
</interpretation>
</result>

Additional options

All of the above spelling grammars (mixed, digits and letters) accept additional options to narrow down the interpretation.

Thanks to these options, you will be able to interpret phone numbers, customer identifiers, social security numbers, licence plates and many other entities.

The regex and length parameters are mutually exclusive.

Clarifications on homophones

With spelling grammars, homophones can distort the expected result. For example, in "mon numéro DE colis", the word "de" may be understood as the digit "2" ("deux"). In most cases, homophones are corrected and left out of the result. Some homophones are not handled yet; new rules are added as such cases are encountered, so feedback is welcome.

Length

Forces the interpretation as a single word of the given length. Compatible with Hotword mode.

Grammar names:

  • builtin:speech/spelling/mixed?length= + integer
  • builtin:speech/spelling/digits?length= + integer
  • builtin:speech/spelling/letters?length= + integer

Response example for builtin:speech/spelling/digits?length=5:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>35170</instance>
<input mode="speech" timestamp-start="2021-02-19T10:33:36.303+01:00" timestamp-end="2021-02-19T10:33:38.193+01:00" confidence="1.00">trente-cinq mille cent soixante-dix</input>
</interpretation>
</result>

To interpret a zip code, see the dedicated Zipcode grammar.

To interpret a phone number, use the spelling digits grammar with the appropriate length.

Regex

The interpretation returns the regular expression match as a single word. Compatible with Hotword mode.

Note that partial matches are not supported when this parameter is set, so hotword mode is recommended with it.

The supported regex syntax is detailed here: https://docs.rs/regex/1.4.3/regex/#syntax

Grammar names:

  • builtin:speech/spelling/mixed?regex= + pattern
  • builtin:speech/spelling/digits?regex= + pattern
  • builtin:speech/spelling/letters?regex= + pattern
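The spelling grammar names follow a regular scheme, so they can be generated. The helper below is a hypothetical sketch: it enforces the rule that regex and length are mutually exclusive, and assumes the pattern is appended as-is, without URL encoding, as in the examples on this page.

```python
def spelling_grammar(kind, length=None, regex=None):
    """Build a spelling grammar name.

    kind is "mixed", "digits" or "letters"; length and regex are optional
    and cannot be combined.
    """
    if kind not in ("mixed", "digits", "letters"):
        raise ValueError(f"unknown spelling grammar: {kind!r}")
    if length is not None and regex is not None:
        raise ValueError("regex and length are mutually exclusive")
    name = f"builtin:speech/spelling/{kind}"
    if length is not None:
        if length <= 0:
            raise ValueError("length must be a positive integer")
        return f"{name}?length={length}"
    if regex is not None:
        return f"{name}?regex={regex}"
    return name
```

For instance, spelling_grammar("digits", length=10) gives a grammar name suited to a 10-digit phone number.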

Response example for builtin:speech/spelling/mixed?regex=[a-z]{2}[0-9]{9}[a-z]{2}:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.98">
<instance>bf969352528jd</instance>
<input mode="speech" timestamp-start="2021-02-19T10:35:20.474+01:00" timestamp-end="2021-02-19T10:35:30.014+01:00" confidence="0.98">b f neuf six neuf trois cinq deux cinq deux huit j d</input>
</interpretation>
</result>

Looking to interpret licence plates? This is probably the grammar you need.

  • For both old and new licence plates in France, the pattern is ([a-z]{2}[0-9]{3}[a-z]{2})|([0-9]{4}[a-z]{3}[0-9]{2})
  • For the UK, the pattern is [a-z]{2}[0-9]{2}[a-z]{3}

Note that these patterns are deliberately simple and could be tightened. For example, in the UK, Z cannot be one of the first two letters, and in France the letter O is never used.
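Before using a pattern in a grammar name, it can be checked locally against a few sample values. The sketch below uses Python's re module as a stand-in for the Rust regex syntax referenced above; these patterns only use character classes, counted repetition, groups and alternation, which behave the same in both. It assumes the pattern must match the whole spelled word, hence re.fullmatch. The last pattern is one possible tightening for new-format French plates that excludes the letter O, following the note above.

```python
import re

FR_PLATE = r"([a-z]{2}[0-9]{3}[a-z]{2})|([0-9]{4}[a-z]{3}[0-9]{2})"
UK_PLATE = r"[a-z]{2}[0-9]{2}[a-z]{3}"
# Illustrative refinement: same as the new French format, without "o".
FR_NEW_PLATE_NO_O = r"[a-np-z]{2}[0-9]{3}[a-np-z]{2}"


def matches(pattern, candidate):
    """Return True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None


samples = ["ab123cd", "1234abc56", "ab12cde", "ao123cd"]
for sample in samples:
    print(sample, matches(FR_PLATE, sample), matches(UK_PLATE, sample))
```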

Zipcode

The interpretation returns a 5-digit zip code that can be split as 2+3 digits, as found in France, Spain or the USA. Compatible with hotword mode.

To conform to the MRCPv2 RFC, note that:

  • in normal mode, it may return a partial-match if what the user said starts like a valid zip code
  • in hotword mode, it only returns match or no-match (you have to set a hotword-max-duration for that)

Grammar names:

  • builtin:speech/spelling/zipcode
  • builtin:speech/zipcode: alias to builtin:speech/spelling/zipcode

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>35170</instance>
<input mode="speech" timestamp-start="2021-02-19T10:37:17.755+01:00" timestamp-end="2021-02-19T10:37:20.125+01:00" confidence="1.00">trente-cinq cent soixante-dix</input>
</interpretation>
</result>
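In the example above, the caller says "trente-cinq" and then "cent soixante-dix", i.e. a 2-digit group followed by a 3-digit group. The sketch below only illustrates what this 2+3 split means; it is a hypothetical client-side helper, not the way the grammar is implemented.

```python
def join_zipcode(first_two, last_three):
    """Join a 2-digit group and a 3-digit group into a 5-digit zip code.

    Both groups are zero-padded, so (6, 0) gives "06000".
    """
    if not 0 <= first_two <= 99 or not 0 <= last_three <= 999:
        raise ValueError("groups must fit in 2 and 3 digits")
    return f"{first_two:02d}{last_three:03d}"


print(join_zipcode(35, 170))  # prints 35170
```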

Beware that this builtin is not universal. For zip code formats other than the one described above, we recommend using the spelling digits grammar with the length option instead.